Annotated-WikiExtractor Alternatives

Name: jodaiber/Annotated-WikiExtractor

Simple Wikipedia plain text extractor with article link annotations and Hadoop support.

Stars: 88
License: gpl-3.0
Open Issues: 0
Most Recent Commit: over 15 years ago
Programming Language: Python
Dependent Repos: 0
Dependent Packages: 0
Total Releases: 0

Categories:
Programming Languages > Python
Data Formats > XML
Learning Resources > Article
Data Processing > Hadoop
Companies > Wikipedia
Data Processing > MapReduce

Alternatives To jodaiber/Annotated-WikiExtractor

Project Name	Stars	Repos Using This	Packages Using This	Most Recent Commit	Total Releases	Open Issues	License	Language	Description
datawrangling/trendingtopics	351	0	0	almost 15 years ago	0	10		Ruby	Rails app for tracking trends in server logs, powered by the Cloudera Hadoop Distribution on EC2
jodaiber/Annotated-WikiExtractor	88	0	0	over 15 years ago	0	0	gpl-3.0	Python	Simple Wikipedia plain text extractor with article link annotations and Hadoop support
rossf7/wikireverse	39	0	0	almost 8 years ago	0	2	mit	Java	Hadoop jobs for the WikiReverse project; parses Common Crawl data for links to Wikipedia articles
matpalm/wikipediaPhilosophy	18	0	0	over 13 years ago	0	0		Python	Do all first links on Wikipedia _really_ lead to Philosophy?
argszero/translate	7	0	0	almost 4 years ago	0	0			Translate articles to Chinese
jackiewang20/elasticsearch-springboot-starter-master	5	0	0	over 6 years ago	0	0		Java	

Popular Hadoop Projects

apache/spark ⭐ 37,661

Apache Spark - A unified analytics engine for large-scale data processing

donnemartin/data-science-ipython-notebooks ⭐ 25,668

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

dmlc/xgboost ⭐ 25,253

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

spotify/luigi ⭐ 17,046

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

Tencent/APIJSON ⭐ 16,277

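🏆 Zero-code, full-featured, highly secure ORM library 🚀 Backend APIs and documentation with zero code; the frontend (client) customizes the data and structure of the returned JSON. 🏆 A JSON Transmission Protocol and an ORM Library 🚀 provides APIs and Docs without writing any code.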