Alternatives To Similarweb

Project Name	Stars	Most Recent Commit	Open Issues	License	Language	Description
Strumentalia Seealsology	76	6 months ago	7	other	JavaScript	"See also" section scraping on custom levels of depth
Wikipedia Crawler	57	9 years ago	1	mit	Python	A program to crawl the entire Wikipedia and extract and store information from the pages as required.
Tech Seo Crawler	54	2 years ago	16	mit	Python	Build a small, 3-domain internet using GitHub Pages and Wikipedia and construct a crawler to crawl, render, and index.
Wikireverse	39	6 years ago	2	mit	Java	Hadoop jobs for the WikiReverse project. Parses Common Crawl data for links to Wikipedia articles.
Wikiracer	32	7 years ago	-	mit	Go	Finds the shortest path between two Wikipedia articles, using only Wikipedia links.
Word2vec Flask Api	26	7 years ago	-	mit	Python	Flask API for Word2vec
Wikipedia Crawler	25	3 years ago	-	gpl-3.0	Python	Extracts plain text from Wikipedia articles, ideal for linguistic analysis
Crawling For Nomore404	19	a year ago	8	-	Python	-
Similarweb	10	6 years ago	2	-	Python	Similarweb crawler
Wikipedia Title Dataset	10	7 years ago	-	-	Python	Dataset used for Learning Character-level Compositionality with Visual Features (ACL2017)

Popular Crawler Projects

Scrapy ⭐ 49,918

Scrapy, a fast high-level web crawling & scraping framework for Python.

dependent packages 445 | total releases 96 | latest release September 18, 2023 | most recent commit 4 months ago

Lux ⭐ 24,752

👾 Fast and simple video download library and CLI tool written in Go

dependent packages 8 | total releases 40 | latest release November 06, 2023 | most recent commit a month ago

Colly ⭐ 21,902

Elegant Scraper and Crawler Framework for Golang

dependent packages 328 | total releases 22 | latest release March 08, 2022 | most recent commit 2 months ago

Easyspider ⭐ 20,149

A visual no-code/code-free web crawler/spider. 易采集 (EasySpider): a visual browser automation testing, data collection, and crawler tool, graphical and code-free

most recent commit a month ago

Proxy_pool ⭐ 19,442

Python ProxyPool for web spider

most recent commit 4 months ago

Popular Wikipedia Projects

Design Patterns For Humans ⭐ 42,678

An ultra-simplified explanation to design patterns

most recent commit 4 months ago

Hacker Laws ⭐ 24,993

💻📖 Laws, Theories, Principles and Patterns that developers will find useful. #hackerlaws

most recent commit 9 months ago

Javascript Design Patterns For Humans ⭐ 4,191

An ultra-simplified explanation of design patterns implemented in javascript

most recent commit 4 months ago

Mediawiki ⭐ 3,827

🌻 The collaborative editing software that runs Wikipedia. Mirror from https://gerrit.wikimedia.org/g/mediawi See https://mediawiki.org/wiki/Developer_access for contributing.

dependent packages 4 | total releases 167 | latest release September 29, 2023 | most recent commit 4 months ago

Wikiextractor ⭐ 3,440

A tool for extracting plain text from Wikipedia dumps

dependent packages 3 | total releases 4 | latest release October 14, 2021 | most recent commit 8 months ago


Downloads, Dependent Repos, Dependent Packages, Total Releases, Latest Releases data powered by Libraries.io.