Alternatives To Nifi Extracttext Processor

Project Name	Stars	Dependent Repositories	Dependent Packages	Most Recent Commit	Total Releases	Latest Release	Open Issues	License	Language	Description
Tika	2,007	1,687	570	3 months ago	66	October 17, 2023	49	apache-2.0	Java	The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
Tika Python	1,316	83	54	8 months ago	35	January 02, 2023	4	apache-2.0	Python	Tika-Python is a Python binding to the Apache Tika™ REST services, allowing Tika to be called natively from Python.
Datashare	519			3 months ago	135	November 21, 2023	17	agpl-3.0	Java	A self-hosted search engine for documents.
Sparkler	401			a year ago			55	apache-2.0	Java	Spark-Crawler: an Apache Nutch-like crawler that runs on Apache Spark.
Go Tika	171		4	a year ago	6	April 18, 2022	9	apache-2.0	Go	Go package for using Apache Tika.
Docker Tikaserver	160			2 years ago			9	apache-2.0	Dockerfile	Apache Tika Server as a Docker image.
Pdf2html	117	2	6	3 months ago	26	January 22, 2023	7	apache-2.0	JavaScript	pdf2html is a module that converts PDF files to HTML pages using Apache Tika. It can also generate thumbnail images for PDF files using Apache PDFBox.
Memex Explorer	106			8 years ago			67	bsd-2-clause	Python	Viewers for statistics and dashboarding of Domain Search Engine data.
Php Apache Tika	104	3	3	8 months ago	38	April 14, 2023		mit	PHP	Apache Tika bindings for PHP: extract text and metadata from documents, images, and other formats.
Imagecat	84			6 years ago					Java	ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika, and Apache OODT to ingest tens of millions of files in place (images, though it could be extended to other file types) and to extract metadata and OCR information from them using Tika and Tesseract OCR.




Popular Tika Projects

S3_website ⭐ 2,259

Manage an S3 website: sync, deliver via CloudFront, and benefit from advanced S3 website features.

Total releases: 109; latest release: October 11, 2017; most recent commit: a year ago

Fscrawler ⭐ 1,279

Elasticsearch File System Crawler (FS Crawler)

Dependent packages: 1; total releases: 5; latest release: January 10, 2022; most recent commit: 3 months ago

Lingua ⭐ 622

A natural language detection library for Java and the JVM, described by its authors as the most accurate available, suitable for both long and short text

Dependent packages: 3; total releases: 17; latest release: August 02, 2022; most recent commit: 5 months ago

Gransk ⭐ 237

Document processing for investigations

Most recent commit: 7 years ago

Extract ⭐ 229

A cross-platform command-line tool for parallelised content extraction and analysis.

Dependent packages: 1; total releases: 58; latest release: November 13, 2023; most recent commit: 3 months ago

