Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Sparkler | 401 | | | | a year ago | | | 55 | apache-2.0 | Java |
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark. | ||||||||||
See | 27 | | | | 2 years ago | | | | gpl-3.0 | Erlang |
Search Engine in Erlang | ||||||||||
Information Retrieval | 15 | | | | a year ago | | | | | Python |
Elasticsearch, MongoDB, Tornado Server, RESTful API, Python, Information Retrieval, Machine Learning, Web Crawler | ||||||||||
Serritor | 13 | 2 | 4 years ago | 12 | June 11, 2020 | apache-2.0 | Java | |||
Serritor is an open source web crawler framework built upon Selenium and written in Java. It can be used to crawl dynamic web pages that require JavaScript to render data. | ||||||||||
Covid19_stats | 9 | | | | 3 years ago | | | | gpl-3.0 | Python |
Scrapes domestic and international data on confirmed, recovered, and deceased COVID-19 cases. | ||||||||||
Inforetrieval | 8 | | | | 5 years ago | | | | mit | HTML |
Inverted indexer, web crawler, sort, search, and Porter stemmer written in Python for information retrieval. | ||||||||||
Web Search Engine Uic | 6 | | | | 5 years ago | | | | | Python |
CS 582 Information Retrieval at the University of Illinois at Chicago. Multithreaded crawling of the UIC domain, inverted index, PageRank, SEO with Context Pseudo-Relevance Feedback | ||||||||||
Sce | 5 | | | | 6 years ago | | | 35 | apache-2.0 | Shell |
Sparkler Crawl Environment - a packaged, dockerized version of http://github.com/USCDataScience/sparkler.git | ||||||||||
Machine_learning_focused_crawler | 5 | | | | 5 years ago | | | | | Python |
A focused web crawler that uses machine learning to fetch more relevant results. |