Project Name	Stars	Most Recent Commit	Total Releases	Latest Release	Open Issues	License	Language
Spring Boot Quick	2,282	7 months ago			13		Java
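Quick-start learning examples based on Spring Boot, integrating open-source frameworks the author has encountered, such as RabbitMQ (delayed queues), Kafka, JPA, Redis, OAuth2, Swagger, JSP, Docker, k3s, k3d, k8s, a MyBatis encryption/decryption plugin, exception handling, log output, multi-module development, multi-environment packaging, caching, web crawling, JWT, GraphQL, Dubbo, ZooKeeper, Async, and more.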
Sparkler	401	a year ago			55	apache-2.0	Java
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Cc Pyspark	280	a year ago			4	mit	Python
Process Common Crawl data with Python and Spark
Docs	102	5 years ago			3
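Source code for "Data Collection: From Getting Started to Giving Up" (《数据采集从入门到放弃》). Contents: introduction to crawlers, the job market, and crawler-engineer interview questions; introduction to the HTTP protocol; using Requests; introduction to the XPath parser; MongoDB and MySQL; multithreaded crawlers; introduction to Scrapy; introduction to Scrapy-redis; deploying with Docker; managing a Docker cluster with Nomad; querying Docker logs with EFK.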
Cc Index Table	78	7 months ago			8	apache-2.0	Java
Index Common Crawl archives in tabular format
Keywordanalysis	33	6 years ago
Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends
Engineeringteam	32	5 years ago			2
A place for organizing the YBIGTA engineering team's materials.
Search_ads_web_service	27	7 years ago					Java
Online search advertisement platform & Realtime Campaign Monitoring [Maybe Deprecated]
Steam_recommendation_system	25	7 years ago					Jupyter Notebook
Recommendation System, Collaborative Filtering, Spark, Hive, Flask, Web Crawler, AWS EC2, AWS RDS
Sparkwarc	13	2 years ago	4	January 11, 2022		apache-2.0	WebAssembly
Load WARC files into Apache Spark with sparklyr

Alternatives To Common_crawl_insight

Alternative Project Comparisons

Common_crawl_insight vs Spring Boot Quick

Common_crawl_insight vs Sparkler

Common_crawl_insight vs Cc Pyspark

Common_crawl_insight vs Docs

Common_crawl_insight vs Cc Index Table

Common_crawl_insight vs Keywordanalysis

Common_crawl_insight vs Engineeringteam

Common_crawl_insight vs Search_ads_web_service

Common_crawl_insight vs Steam_recommendation_system

Common_crawl_insight vs Sparkwarc

Popular Spark Projects

Spark ⭐ 37,661

Apache Spark - A unified analytics engine for large-scale data processing

dependent packages 939 | total releases 46 | latest release May 09, 2021 | most recent commit 3 months ago

Data Science Ipython Notebooks ⭐ 25,668

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

most recent commit 7 months ago

Redash ⭐ 24,479

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

dependent packages 3 | total releases 2 | latest release May 05, 2020 | most recent commit 3 months ago

Docker_practice ⭐ 23,279

Learn and understand Docker&Container technologies, with real DevOps practice!

total releases 9 | latest release December 01, 2021 | most recent commit 4 months ago

Data Engineering Zoomcamp ⭐ 19,461

Free Data Engineering course!

most recent commit 3 months ago

Popular Crawler Projects

Scrapy ⭐ 49,918

Scrapy, a fast high-level web crawling & scraping framework for Python.

dependent packages 445 | total releases 96 | latest release September 18, 2023 | most recent commit 3 months ago

Lux ⭐ 24,752

👾 Fast and simple video download library and CLI tool written in Go

dependent packages 8 | total releases 40 | latest release November 06, 2023 | most recent commit a month ago

Colly ⭐ 21,902

Elegant Scraper and Crawler Framework for Golang

dependent packages 328 | total releases 22 | latest release March 08, 2022 | most recent commit 2 months ago

Easyspider ⭐ 20,149

A visual no-code/code-free web crawler/spider. EasySpider (易采集): a visual browser automation testing, data collection, and web crawler application with a no-code graphical…

most recent commit a month ago

Proxy_pool ⭐ 19,442

Python ProxyPool for web spider

most recent commit 4 months ago

Popular Data Processing Categories

Python

Spark

Crawler

Downloads, Dependent Repos, Dependent Packages, Total Releases, Latest Releases data powered by Libraries.io.