Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Trafilatura | 2,447 | 66 | 3 months ago | 39 | November 29, 2023 | 66 | gpl-3.0 | Python | ||
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments | ||||||||||
News Please | 1,821 | 6 | 4 | 4 months ago | 121 | August 30, 2023 | 17 | apache-2.0 | Python | |
news-please - an integrated web crawler and information extractor for news that just works | ||||||||||
Holiday Cn | 1,018 | | | | 3 months ago | | | 6 | mit | Python |
📅🇨🇳 Chinese statutory holiday data, automatically scraped daily from State Council announcements | ||||||||||
Bookcorpus | 698 | | | | 9 months ago | | | 5 | mit | Python |
Crawl BookCorpus | ||||||||||
Personrelationknowledgegraph | 480 | | | | 5 years ago | | | 7 | | Python |
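ChinesePersonRelationGraph, person relationship extraction based on NLP methods. A Chinese person-relationship knowledge graph project covering construction of a Chinese person-relationship graph, knowledge-base-driven back-labeling of data, person relation extraction based on distant supervision and bootstrapping, and applications such as knowledge-graph-based question answering. | ||||||||||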
Clipper.js | 311 | | | | 3 months ago | | | 4 | apache-2.0 | TypeScript |
HTML to Markdown converter and crawler. | ||||||||||
Weibo_terminator_workflow | 259 | | | | 7 years ago | | | 3 | | Python |
Updated version of weibo_terminator: a workflow version aimed at getting the job done! | ||||||||||
Lagoujob | 250 | | | | 5 years ago | | | | apache-2.0 | Python |
Job data mining repo for lagou.com | ||||||||||
Fxdesktopsearch | 168 | | | | 3 months ago | | | 19 | apache-2.0 | Java |
A JavaFX based desktop search application. | ||||||||||
Ungoliant | 132 | | | | 6 months ago | 5 | February 24, 2023 | 29 | apache-2.0 | Rust |
🕷️ The pipeline for the OSCAR corpus | ||||||||||