Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Nutch | 2,742 | 82 | 1 | 2 months ago | 26 | August 22, 2022 | 14 | apache-2.0 | Java | |
Apache Nutch is an extensible and scalable web crawler | ||||||||||
Storm Crawler | 834 | 7 | 10 | 2 months ago | 36 | October 25, 2023 | 34 | apache-2.0 | HTML | |
A scalable, mature and versatile web crawler based on Apache Storm | ||||||||||
Sparkler | 401 | a year ago | 55 | apache-2.0 | Java | |||||
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark. | ||||||||||
Awesome Crawler Cn | 243 | a year ago | mit | |||||||
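A collection of web crawlers, spiders, data scrapers, and web page parsers. New technologies and frameworks keep emerging, so this list is updated continuously... | ||||||||||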
Nutch Htmlunit | 122 | 9 years ago | 1 | apache-2.0 | Java | |||||
A plugin that extends Apache Nutch with Htmlunit to crawl and parse AJAX pages | ||||||||||
Memex Explorer | 106 | 8 years ago | 67 | bsd-2-clause | Python | |||||
Viewers for statistics and dashboarding of Domain Search Engine data | ||||||||||
Crawlerpack | 99 | 51 | 7 years ago | 9 | December 10, 2016 | apache-2.0 | Java | |||
Java web data crawler package | ||||||||||
Clj Web Crawler | 38 | 13 years ago | mit | Clojure | ||||||
A wrapper around Apache commons-client for the Clojure programming language. | ||||||||||
Mongo Elasticsearch Nutch | 15 | 8 years ago | 2 | Shell | ||||||
Docker image for creating a single Apache Nutch server, with mongodb as crawl storage and Elasticsearch for indexing | ||||||||||
Nutch In Java | 14 | a year ago | 1 | mit | Java | |||||
How to use Apache Nutch without command line |