Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Extract | 216 | 1 | 5 months ago | 48 | January 02, 2023 | 10 | mit | Java | ||
A cross-platform command line tool for parallelised content extraction and analysis. | ||||||||||
Transformalize | 141 | 19 | 44 | 6 months ago | 49 | April 18, 2019 | 3 | other | C# | |
Configurable Extract, Transform, and Load | ||||||||||
Wasp | 24 | 3 months ago | 4 | other | Scala | |||||
WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you. | ||||||||||
Chompsky | 7 | 10 years ago | Java | |||||||
An NLP pipeline for Wikia data | ||||||||||
Dcat Ap Viewer | 5 | a month ago | 25 | mit | JavaScript | |||||
Viewer of DCAT-AP 2.0.1 compatible dataset metadata | ||||||||||
Solrmongodbdataimporter | 5 | 2 years ago | Java | |||||||
Solr MongoDB Data Import | ||||||||||
Resume | 1 | 3 years ago | ||||||||
Personal resume | ||||||||||
Transformalize.provider.solr | 1 | 9 months ago | 4 | apache-2.0 | C# | |||||
A Solr provider for Transformalize |
A cross-platform command line tool for parallelized, distributed content extraction. It is built on top of Apache Tika and was an essential part of the engineering behind the Panama Papers, Swiss Leaks and Luxembourg Leaks investigations.
It supports Redis-backed queueing for distributed, parallel extraction and will write to Solr, plain text files or standard output.
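The queue-and-workers pattern described above can be pictured with a small sketch. This is not Extract's actual code or API: an in-memory `BlockingQueue` stands in for the Redis-backed queue, `extract` is a placeholder for the Tika-based parsing step, and the `Consumer` sink stands in for Solr, a text file or standard output. The class name, method names and stop marker are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

public class QueueExtractSketch {

    // Hypothetical marker telling a worker that no more paths will arrive.
    private static final String STOP = "\u0000STOP";

    // Placeholder for real extraction; a real worker would parse the file here.
    static String extract(String path) {
        return "text of " + path;
    }

    // Starts `workers` threads that pull paths off the shared queue
    // and hand each result to the sink, then waits for all of them.
    static void drain(BlockingQueue<String> queue, int workers, Consumer<String> sink) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String path = queue.take();
                        if (path.equals(STOP)) {
                            return; // each worker consumes exactly one stop marker
                        }
                        sink.accept(path + "\t" + extract(path));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
    }

    // Queues the paths, appends one stop marker per worker, and collects the output.
    static List<String> run(List<String> paths, int workers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(paths);
        for (int i = 0; i < workers; i++) {
            queue.add(STOP);
        }
        List<String> out = Collections.synchronizedList(new ArrayList<>());
        drain(queue, workers, out::add);
        return out;
    }

    public static void main(String[] args) {
        // Standard output as the sink; the line order depends on thread scheduling.
        run(List.of("a.pdf", "b.docx", "c.txt"), 2).forEach(System.out::println);
    }
}
```

Because the workers share nothing except the queue, the same shape carries over when the queue lives in an external store and the workers run on separate machines, which is the role the Redis-backed queue plays in the description above.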
For guidance and instructions, please see the wiki.
Initially developed by Matthew Caruana Galizia at ICIJ.
We welcome contributions! Please submit pull requests or contact us directly.
Copyright (c) 2018 International Consortium of Investigative Journalists. See LICENSE.