Extract

A cross-platform command line tool for parallelised content extraction and analysis.


A cross-platform command line tool for parallelized, distributed content extraction. It is built on top of Apache Tika and was an essential part of the engineering behind the Panama Papers, Swiss Leaks and Luxembourg Leaks investigations.

It supports Redis-backed queueing for distributed, parallel extraction and will write to Solr, plain text files or standard output.
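
The queue-and-workers pattern described above can be sketched roughly as follows. This is an illustration only and is not Extract's actual code or API: the class `ParallelExtractSketch` and its `drain` method are hypothetical, an in-memory `BlockingQueue` stands in for the Redis-backed queue, a plain function stands in for the Tika-based extractor, and a `Consumer` stands in for the output (Solr, a text file or standard output).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch; none of these names come from Extract itself.
public class ParallelExtractSketch {

    /**
     * Drains a shared queue of document paths with a fixed pool of workers.
     * Assumptions: the queue stands in for a Redis-backed queue, the extractor
     * for a Tika-style parser, and the sink for Solr, a file or stdout.
     */
    public static void drain(BlockingQueue<String> queue, int workers,
                             Function<String, String> extractor,
                             Consumer<String> sink) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                String path;
                // poll() returns null once the queue is empty, which ends this worker.
                while ((path = queue.poll()) != null) {
                    sink.accept(extractor.apply(path));
                }
            });
        }
        // Stop accepting new tasks and wait for the workers to finish.
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue =
                new LinkedBlockingQueue<>(List.of("a.pdf", "b.docx", "c.txt"));
        // The sink must be thread-safe because several workers write to it.
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        drain(queue, 2, path -> "text of " + path, results::add);
        results.forEach(System.out::println);
    }
}
```

Because every worker pulls from the same queue, adding workers (or, with a shared external queue, adding machines) spreads the load without assigning documents up front. The order in which results reach the sink is not deterministic.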

For guidance and instructions, please see the wiki.

Credits and Collaboration

Initially developed by Matthew Caruana Galizia at ICIJ.

We welcome contributions! Please submit pull requests or contact us directly.

License

Copyright (c) 2018 International Consortium of Investigative Journalists. See LICENSE.
