Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Nutch | 2,742 | | 82 | 1 | 3 months ago | 26 | August 22, 2022 | 14 | apache-2.0 | Java |
Apache Nutch is an extensible and scalable web crawler | ||||||||||
Commoncrawl | 466 | | | | 6 years ago | | | 8 | | C++ |
Common Crawl support library to access 2008-2012 crawl archives (ARC files) | ||||||||||
Commoncrawl Crawler | 208 | | | | a year ago | | | | gpl-3.0 | Java |
The Common Crawl Crawler Engine and Related MapReduce code (2008-2012) | ||||||||||
Cc Warc Examples | 46 | | | | 10 years ago | | | 3 | mit | Java |
CommonCrawl WARC/WET/WAT examples and processing code for Java + Hadoop | ||||||||||
Slinky | 39 | | | | 14 years ago | | | | | Python |
Slinky, a high-performance web crawler / text analytics in Python, Redis, Hadoop, R, Gephi | ||||||||||
Wikireverse | 39 | | | | 6 years ago | | | 2 | mit | Java |
Hadoop jobs for WikiReverse project. Parses Common Crawl data for links to Wikipedia articles. | ||||||||||
Engineeringteam | 32 | | | | 5 years ago | | | 2 | | |
A place for organizing the YBIGTA engineering team's materials. | ||||||||||
Ccooo | 27 | | | | 9 years ago | | | | | Clojure |
Common Crawl One-Oh-One (aka "A Common Crawl Experiment") | ||||||||||
Real_time_social_media_mining | 24 | | | | 5 months ago | | | 21 | mit | HTML |
DevOps pipeline for Real Time Social/Web Mining | ||||||||||
Nutch Aws | 23 | | | | 9 years ago | | | 1 | | Makefile |