Awesome Open Source
Search results for tika
132 search results found
S3_website
⭐
2,259
Manage an S3 website: sync, deliver via CloudFront, benefit from advanced S3 website features.
Tika
⭐
2,007
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
Tika Python
⭐
1,316
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Fscrawler
⭐
1,279
Elasticsearch File System Crawler (FS Crawler)
Lingua
⭐
622
The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike
Datashare
⭐
519
A self-hosted search engine for documents.
Sparkler
⭐
401
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Gransk
⭐
237
Document processing for investigations
Extract
⭐
229
A cross-platform command line tool for parallelised content extraction and analysis.
Go Tika
⭐
171
Go package for using Apache Tika
Pantomime
⭐
171
A tiny Clojure library that deals with MIME types (Internet media types)
Itsy
⭐
168
A threaded web-spider written in Clojure
Docker Tikaserver
⭐
160
Apache Tika Server as a Docker Image
Tikaondotnet
⭐
148
Use the Java Tika text extraction library on the .NET platform
Node Tika
⭐
128
Apache Tika bridge for Node.js. Text and metadata extraction, language detection and more.
Pdf2html
⭐
117
pdf2html is a module that converts PDF files to HTML pages using Apache Tika. It can also generate thumbnail images for PDF files using Apache PDFBox.
Vorbisjava
⭐
109
A library for working with Ogg Vorbis files
Memex Explorer
⭐
106
Viewers for statistics and dashboarding of Domain Search Engine data
Php Apache Tika
⭐
104
Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
Freeeed
⭐
101
Open source eDiscovery
Mlwithtensorflow2ed
⭐
101
Code for Machine Learning with TensorFlow: 2nd Edition Published by Manning Publications
Tika Similarity
⭐
100
Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features.
Verticlesearchengine
⭐
98
Academic Search Engine using Scrapy, MongoDB, Lucene/Solr, Tika, Struts2, jQuery, Bootstrap, D3, CAS
Image_space
⭐
91
Interactive Image similarity and Visual Search and Retrieval application
Imagecat
⭐
84
ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika, and Apache OODT to ingest tens of millions of files in place (images, but it could be extended to other file types) and to extract metadata and OCR information from those files using Tika and Tesseract OCR.
Tika Docker
⭐
81
Convenience Docker images for Apache Tika Server
Pdf Corpora
⭐
60
An index of PDF-centric corpora
Harvester
⭐
59
Web crawling and document processing through a usable interface.
Es Amazon S3 River
⭐
59
Amazon S3 river for Elasticsearch
Rtika
⭐
52
R Interface to Apache Tika
Doc_processing_toolkit
⭐
52
Python library to extract text from PDFs, falling back to OCR when text extraction fails.
Phptikawrapper
⭐
52
Simple PHP Wrapper for Apache Tika
Geoparser
⭐
50
Extract and Visualize location from any file
Pwned Antifas
⭐
48
Fuji
⭐
43
FAIRsFAIR Research Data Object Assessment Service
Rika
⭐
43
A JRuby wrapper for Apache Tika to extract text and metadata from files of various formats.
Xponents
⭐
41
Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction from unstructured data and GIS outputters.
Newman
⭐
39
Quickly analyze and explore email with advanced analytics and visualization.
Cogstack Pipeline
⭐
39
Distributed, fault tolerant batch processing for Natural Language Applications and Search, using remote partitioning
Dflydev Canal
⭐
35
Analyze content to determine the appropriate Internet media type
Vim Office
⭐
34
Read common binary files, such as PDFs and those of Microsoft Office or LibreOffice, in Vim
En4j
⭐
31
Java Desktop Client to Evernote
Querido Diario Toolbox
⭐
30
This project empowers anyone who wants to process data in the context of Querido Diário and carry out their own analyses.
Sentimentanalysisparser
⭐
29
Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text.
Aspen
⭐
29
🔎 📖 ✨ Custom, private search engine for text documents built with NextJS/React/ES6/ES7
Nifi Extracttext Processor
⭐
28
Apache NiFi Custom Processor Extracting Text From Files with Apache Tika
Xltsearch
⭐
28
High-performance, portable and configurable desktop search application / information retrieval system
Ipfs Tika
⭐
27
Java web application taking IPFS hashes, extracting (textual) content and metadata through Apache's Tika.
Clj Tika
⭐
25
Clojure bindings to Apache Tika project
Pandora
⭐
24
A small Pandora's box for prototyping your app with a ready-to-use backend. This is just my compilation of different solutions occasionally applied in hackathons and challenges
Pdf Discovery Demo
⭐
24
Demonstration of searching PDF document with Solr, Tika, and Tesseract
Solr_exploit
⭐
23
Apache Solr remote code execution vulnerability (CVE-2019-0193) exploit
Ruby_tika_app
⭐
23
A Ruby wrapper for the Tika JAR (tika-app.jar) that extracts text from files in many formats, such as PDF, XLS, and DOC
Document_search_engine_architecture
⭐
22
📄🚀 Unleash a powerful Document Search Engine with Apache NiFi for lightning-fast, comprehensive text indexing and search.
Simple Tika Server
⭐
19
Apache Tika as an HTTP service: PUT files and get the metadata as JSON
Utilityscripts
⭐
19
Scripts for managing scrapers
Jhighlight
⭐
18
JHighlight is an embeddable pure Java syntax highlighting library.
Tika Dockers
⭐
18
A suite of Machine Learning / Deep Learning Dockerfiles to allow Apache Tika to extract objects and to produce textual captions for images and video
Tika Server
⭐
18
Apache Tika Server with Tesseract 4 Docker Setup
Imixs Docker
⭐
18
Docker Images for the Imixs-Workflow project
Tika Helm
⭐
17
A Helm chart to deploy Apache Tika on Kubernetes.
Oregon Law Parser
⭐
17
Distill information about amendments to the Oregon Revised Statutes.
Rtika
⭐
16
A JRuby wrapper for Apache Tika
Etllib
⭐
16
This is the ETL lib package. It provides an API to munge and prepare JSON, TSV and other data using Apache Tika and JSON parsing/loading for ETL via Apache OODT (or other libs) into Apache Solr.
Alfresco Transform Core
⭐
15
Tika Text Extract
⭐
15
Extract text from a document by Apache Tika
Polar.usc.edu
⭐
15
Polar USC activities related to NSF Polar CyberInfrastructure program at the University of Southern California
T3ext Extractor
⭐
14
TYPO3 Extension extractor
Page Pipe
⭐
14
Pass pages through a pluggable pipeline to extract information from them.
Nanite
⭐
14
Nanite - a friendly swarm of format-identifying robots.
Tika Server
⭐
14
Apache Tika Server as a Background Service in Node.js
Tokyo
⭐
13
tokyo is a REST API that, given any type of document 📄, identifies the MIME type 🧐, suggests an extension 🦔, and also extracts text 💪.
Tika App Python
⭐
13
Python bindings for Apache Tika
Tika Dl4j Spark Imgrec
⭐
13
Image recognition on Spark cluster powered by Deeplearning4j and Apache Tika
Moodle Search_elastic
⭐
13
An Elasticsearch engine plugin for Moodle's Global Search
Miner
⭐
13
Miner is a PHP library that extracts metadata and interesting text content (such as author and summary) from HTML pages. It acts like a simplified version of the HTML metadata parser in Apache Tika.
Iscc Cli
⭐
13
ISCC: Command Line Tool
Apache Tika Lambda Layer
⭐
12
AWS Lambda layer containing latest version of Apache Tika
Tika Service
⭐
12
Apache Tika running as a web service
Datafusion
⭐
11
Matching between unstructured and structured data sets
Shangridocs
⭐
11
Document exploration tool
Tika Ner Corenlp
⭐
11
Stanford CoreNLP NER addon for Apache Tika's NamedEntityParser
Loophole
⭐
11
Notes on setting up vulnerable environments and reproducing vulnerabilities
Dropwizard Tika Server
⭐
10
A DropWizard wrapper around Apache Tika.
Tika Hadoop Mapreduce
⭐
10
Apache Tika integration with Java MapReduce for Hadoop
Snorkel Extraction
⭐
10
A previous version of Snorkel focused on information extraction
Apachetikabundle
⭐
10
📁 Symfony Bundle for https://github.com/vaites/php-apache-tika
Farcrysolrpro
⭐
10
FarCry Solr Pro plugin Supports: Solr 3.5, FarCry 7.0+, 6.2+, 6.1.4+, 6.0.19+
Textextractor Ue4
⭐
10
Unreal Engine 4 integration of Apache Tika to detect and extract metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
Quarkus Tika
⭐
10
Quarkus Tika extension
Silverstripe Textextraction
⭐
9
Text Extraction API for Silverstripe CMS (mostly used with 'fulltextsearch' module)
Typo3 Solr File Indexer
⭐
9
TYPO3 Extension: solr_file_indexer
Contextactions
⭐
9
Collection of Caja utility scripts for MATE desktop
Gravity
⭐
8
An efficient Java substring search library
Cve 2018 11761
⭐
8
Apache Tika Denial of Service Vulnerability (CVE-2018-11761)
Tika
⭐
8
Docker container to provide Apache Tika RESTful API
Scraper Place
⭐
8
Scraper https://www.marches-publics.gouv.fr/
Kafka Connect Document Source
⭐
8
Kafka connector with content extraction to push extracted document contents.
Ckanext Fulltext
⭐
8
Video Recognition
⭐
8
1-100 of 132 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.