Leechcrawler

Incremental crawling capabilities for Apache Tika. Crawl content from file systems, HTTP(S) sources (web crawling), IMAP(S) servers, or your own arbitrary data sources. LeechCrawler provides additional Tika parsers that implement these crawling capabilities.

Stars: 8
License: bsd-3-clause
Open Issues: 2
Most Recent Commit: 2 years ago
Programming Language: Java

Categories

Programming Languages > Java

Data Processing > Crawler

Data Processing > Tika

Alternatives To Leechcrawler

Project Name	Stars	Packages Using This	Most Recent Commit	Total Releases	Latest Release	Open Issues	License	Language
Fscrawler	1,279	1	3 months ago	5	January 10, 2022	145	apache-2.0	Java
Elasticsearch File System Crawler (FS Crawler)
Sparkler	401		a year ago			55	apache-2.0	Java
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Memex Explorer	106		8 years ago			67	bsd-2-clause	Python
Viewers for statistics and dashboarding of Domain Search Engine data
Harvester	59		7 years ago			3	gpl-3.0	JavaScript
Web crawling and document processing through a usable interface.
Leechcrawler	8		2 years ago			2	bsd-3-clause	Java

Popular Crawler Projects

Scrapy ⭐ 49,918

Scrapy, a fast high-level web crawling & scraping framework for Python.

dependent packages 445 | total releases 96 | latest release September 18, 2023 | most recent commit 3 months ago

Lux ⭐ 24,752

👾 Fast and simple video download library and CLI tool written in Go

dependent packages 8 | total releases 40 | latest release November 06, 2023 | most recent commit 25 days ago

Colly ⭐ 21,902

Elegant Scraper and Crawler Framework for Golang

dependent packages 328 | total releases 22 | latest release March 08, 2022 | most recent commit a month ago

Easyspider ⭐ 20,149

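A visual no-code/code-free web crawler/spider. 易采集 (EasySpider): a visual tool for browser automation testing, data collection, and web crawling that works graphically, without code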

most recent commit 23 days ago

Proxy_pool ⭐ 19,442

Python ProxyPool for web spider

most recent commit 4 months ago

Popular Tika Projects

S3_website ⭐ 2,259

Manage an S3 website: sync, deliver via CloudFront, benefit from advanced S3 website features.

total releases 109 | latest release October 11, 2017 | most recent commit a year ago

Tika ⭐ 2,007

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).

dependent packages 570 | total releases 66 | latest release October 17, 2023 | most recent commit 3 months ago

Tika Python ⭐ 1,316

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

dependent packages 54 | total releases 35 | latest release January 02, 2023 | most recent commit 9 months ago

Lingua ⭐ 622

The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike

dependent packages 3 | total releases 17 | latest release August 02, 2022 | most recent commit 5 months ago

Datashare ⭐ 519

A self-hosted search engine for documents.

total releases 135 | latest release November 21, 2023 | most recent commit 3 months ago
