Trafilatura Alternatives

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Stars: 2,447
License: gpl-3.0
Open Issues: 66
Most Recent Commit: over 2 years ago
Programming Language: Python
Dependent Repos: 0
Dependent Packages: 66
Total Releases: 39
Latest Release: November 29, 2023

Categories

Programming Languages > Python

Machine Learning > Natural Language Processing

Data Processing > Scraper

Data Processing > Corpus

Data Processing > Discovery

Data Processing > Web Crawler

Text Processing > Readability

Machine Learning > Text Mining

Text Processing > Text Extraction

Applications > News Aggregator

Alternatives To adbar/trafilatura

Project Name	Stars	Repos Using This	Packages Using This	Most Recent Commit	Total Releases	Latest Release	Open Issues	License	Language
adbar/trafilatura	2,447	0	66	over 2 years ago	39	November 29, 2023	66	gpl-3.0	Python
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
gsh199449/spider	907	0	0	almost 8 years ago	0		3	gpl-3.0	Java
A configurable web spider with an easy-to-use web console
currentslab/extractnet	118	0	0	over 2 years ago	9	November 06, 2022	3	mit	HTML
A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package
lisc-tools/lisc	81	0	0	over 2 years ago	5	October 15, 2023	1	apache-2.0	Python
Literature Scanner: Automated collection & analyses of the scientific literature.
ardauzunoglu/TRScraper	47	0	0	over 5 years ago	0		1	mit	Python
TRScraper is an application developed for use in natural language processing applications that enables text mining on large platforms with Turkish-language content.
pesoto/Text-Analysis	32	0	0	almost 9 years ago	0		0		Jupyter Notebook
Explaining textual analysis tools in Python. Including Preprocessing, Skip Gram (word2vec), and Topic Modelling.
johnbumgarner/newshound	25	0	0	over 3 years ago	1	October 06, 2021	1	mit
This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around the world in over 50 languages.
0x0be/scrapeadvisor	22	0	0	over 3 years ago	0				Python
A user-friendly Python-based GUI that provides sentiment analysis of users' reviews of a specific TripAdvisor facility
0x01h/hepsiburada-review-scraper	20	0	0	almost 7 years ago	0		0	gpl-3.0	Python
Hepsiburada review/comment and rating scraper. Turkish text dataset creator for data science and NLP projects. 📜
akshitvjain/restaurant-finder-featureReviews	19	0	0	about 6 years ago	0		0	mit	Python
Build a Flask web application to help users retrieve key restaurant information and feature-based reviews (generated by applying market-basket model – Apriori algorithm and NLP on user reviews).

Popular Web Crawler Projects

scrapy/scrapy ⭐ 49,918

Scrapy, a fast high-level web crawling & scraping framework for Python.

huginn/huginn ⭐ 40,328

Create agents that monitor and act on your behalf. Your agents are standing by!

assafelovic/gpt-researcher ⭐ 26,857

An autonomous agent that conducts deep research on any data using any LLM provider

dgtlmoon/changedetection.io ⭐ 13,943

A simple, free, open-source service for website change detection, restock monitoring, and notifications. Monitor which websites had a text change; also supports website defacement monitoring and price change notifications.

apify/crawlee ⭐ 11,229

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

Popular Text Mining Projects

keon/awesome-nlp ⭐ 15,531

A curated list of resources dedicated to Natural Language Processing (NLP)

deanmalmgren/textract ⭐ 3,699

extract text from any document. no muss. no fuss.

jbesomi/texthero ⭐ 2,773

Text preprocessing, representation and visualization from zero to hero.

JasonKessler/scattertext ⭐ 2,131

Beautiful visualizations of how language differs among document types.

chiphuyen/lazynlp ⭐ 1,867

Library to scrape and clean web pages to create massive datasets.

Popular Data Processing Categories

Jupyter Notebook

Dataset

SQL

Validation

Pipeline

Translation

Data Science

Classification

Transaction

Scraper