Awesome Open Source
Search results for crawler
3,834 search results found
Scrapy
⭐
49,918
Scrapy, a fast high-level web crawling & scraping framework for Python.
Lux
⭐
24,752
👾 Fast and simple video download library and CLI tool written in Go
Colly
⭐
21,902
Elegant Scraper and Crawler Framework for Golang
Easyspider
⭐
20,149
A visual no-code/code-free web crawler/spider. EasySpider (易采集): a visual browser automation testing, data collection, and crawler tool that works graphically without code…
Proxy_pool
⭐
19,442
Python ProxyPool for web spider
Pyspider
⭐
15,943
A Powerful Spider (Web Crawler) System in Python.
Newspaper
⭐
13,147
News, full-text, and article metadata extraction in Python 3.
Examples Of Web Crawlers
⭐
13,142
Some interesting examples of Python crawlers that are friendly to beginners, mainly crawling Taobao, Tmall, WeChat, WeChat Read, Douban, QQ, and similar sites.
Tushare
⭐
12,165
TuShare is a utility for crawling historical data of China stocks
Crawlee
⭐
12,106
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
Webmagic
⭐
11,080
A scalable web crawler framework for Java.
Crawlab
⭐
10,521
Distributed web crawler admin platform for managing spiders regardless of language or framework.
Photon
⭐
10,244
Incredibly fast crawler designed for OSINT.
Python
⭐
9,097
Python scripts: simulated Zhihu login, crawlers, Excel manipulation, WeChat official accounts, remote power-on.
Avbook
⭐
8,777
AV movie management system; crawlers for avmoo, javbus, and javlibrary; online AV video library; AV magnet link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database
Spider Flow
⭐
8,075
A new-generation crawler platform: define crawler workflows graphically and build crawlers without writing code.
Katana
⭐
7,995
A next-generation crawling and spidering framework.
Infospider
⭐
6,856
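INFO-SPIDER is a crawler toolbox 🧰 that brings many data sources together, aiming to help users retrieve their own data safely and quickly. The tool's code is open source, and the process…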
Node Crawler
⭐
6,610
Web Crawler/Spider for NodeJS + server-side jQuery ;-)
Awesome Web Scraping
⭐
6,060
List of libraries, tools and APIs for web scraping and data processing.
Awesome Crawler
⭐
5,859
A collection of awesome web crawlers and spiders in different languages
Wechatsogou
⭐
5,777
A crawler interface for WeChat official accounts, based on Sogou WeChat search
Ferret
⭐
5,540
Declarative web scraping
Scrapy Redis
⭐
5,438
Redis-based components for Scrapy.
Haipproxy
⭐
5,384
💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis
Autoscraper
⭐
5,159
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Proxypool
⭐
5,154
An Efficient ProxyPool with Getter, Tester and Server
Headless Chrome Crawler
⭐
5,051
Distributed crawler powered by Headless Chrome
Douyin_tiktok_download_api
⭐
4,844
🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous Douyin, Kuaishou, T…
Rod
⭐
4,505
A Devtools driver for web automation and scraping
Mygptreader
⭐
4,267
A community-driven way to read and chat with AI bots - powered by chatGPT.
Crawler4j
⭐
4,192
Open Source Web Crawler for Java
Hakrawler
⭐
4,120
Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application
Scylla
⭐
3,819
Intelligent proxy pool for Humans™ to extract content from the internet and build your own Large Language Models in this new AI era
Interesting Python
⭐
3,748
Some interesting Python crawlers and data analysis projects
Dotnetspider
⭐
3,747
DotnetSpider, a .NET Standard web crawling library. It is a lightweight, efficient, and fast high-level web crawling & scraping framework
Ecommercecrawlers
⭐
3,724
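Hands-on 🐍 crawlers 🕷 for a variety of websites and e-commerce data. Includes 🕸: Taobao products, WeChat official accounts, Dianping, Qichacha, recruitment sites, Xianyu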
Wooyun_public
⭐
3,701
This repo is archived. Thanks for wooyun! Crawl and search wooyun.org public bugs (vulnerabilities) and drops (knowledge base)
Arachni
⭐
3,632
Web Application Security Scanner Framework
Proxybroker
⭐
3,561
Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS 🎭
Toapi
⭐
3,417
Every web site provides APIs.
Novel Plus
⭐
3,358
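novel-plus is a full-featured novel CMS with multi-platform (PC, WAP) reading. It includes novel recommendations, search, rankings, reading, a bookshelf, comments, a novel crawler, a member center, an author zone, …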
Browser Fingerprinting
⭐
3,353
Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprinting scripts 🕵️♂️ when scraping the web?
Gau
⭐
3,273
Fetch known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl.
Distribute_crawler
⭐
3,176
A distributed web crawler built with Scrapy, Redis, MongoDB, and Graphite; the underlying storage is a MongoDB cluster, and distribution uses re…
Proxypool
⭐
3,173
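Automatically scrapes ss, ssr, vmess, and trojan node information from Telegram (tg) channels, subscription URLs, and the public internet, then aggregates and deduplicates…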
Puppeteer Sharp
⭐
3,163
Headless Chrome .NET API
Toapi
⭐
3,153
Every web site provides APIs.
Gerapy
⭐
3,144
Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js
Webcollector
⭐
2,974
WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.
Weibo Crawler
⭐
2,820
Sina Weibo crawler: uses Python to crawl Sina Weibo data and download Weibo images and videos
Nutch
⭐
2,742
Apache Nutch is an extensible and scalable web crawler
Red_hawk
⭐
2,689
All in one tool for Information Gathering, Vulnerability Scanning and Crawling. A must have tool for all penetration testers
Work_crawler
⭐
2,671
Download tool for comics and novels: 腾讯漫画 大角虫漫画 有妖气 咪咕 SF漫画 哦漫画 看漫画 漫画柜 汗汗酷漫 動漫伊甸園 快看漫画 微博动漫 733动漫网 大古漫画网 漫画DB 無限動漫 動漫狂 卡推漫画 动漫之家 动漫屋 古风漫画网 36漫画网 亲亲漫画网 乙女漫画 webtoons 咚漫 ニコニコ静画 ComicWalker ヤングエースUP モアイ pixivコミック サイコミ;アルファポリス カクヨム ハーメルン 小説家になろう 起点中文网 八一中文网 顶点小说 落霞小说网 努努书坊 笔趣阁 → epub.
Crawlergo
⭐
2,642
A powerful browser crawler for web vulnerability scanners
Hawk
⭐
2,638
visualized crawler & ETL IDE written with C#/WPF
Querylist
⭐
2,598
🕷️ The elegant, progressive PHP crawler framework!
Python3 Spider
⭐
2,582
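Python crawlers in practice: simulated login to major websites, including but not limited to slider verification, Pinduoduo, Meituan, Baidu, bilibili, Dianping, and Taobao. If you like it, please star ❤️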
Heritrix3
⭐
2,579
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
Googlescraper
⭐
2,540
A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.
Crawler_illegal_cases_in_china
⭐
2,467
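A collection of illegal cases in China involving web crawlers. This project gathers news, materials, and laws and regulations related to lawsuits and violations involving crawler developers in mainland China. It is dedicated to helping those working in mainland China…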
Lianjia Beike Spider
⭐
2,464
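House-price crawler for Lianjia and Beike: collects house-price data for 21 major Chinese cities, including Beijing, Shanghai, Guangzhou, and Shenzhen (residential communities, second-hand homes, rentals, new… MongoDB, Excel, and JSON storage; supports Python 2 and 3; charts for displaying data; richly commented. Please star to show support. For study and reference only; do not use for commercial purposes, at your own risk.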
Trafilatura
⭐
2,447
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Dht
⭐
2,424
BitTorrent DHT Protocol && DHT Spider.
Crawler
⭐
2,417
An easy to use, powerful crawler implemented in PHP. Can execute Javascript.
Gecco
⭐
2,403
Easy to use lightweight web crawler
Decryptlogin
⭐
2,375
DecryptLogin: APIs for logging in to some websites using requests.
Owllook
⭐
2,340
owllook: a novel search engine
Torbot
⭐
2,338
Dark Web OSINT Tool
Feapder
⭐
2,333
🚀🚀🚀 feapder is an easy to use, powerful Python crawler framework. Built-in AirSpider, Spider, …
Grab
⭐
2,292
Web Scraping Framework
Spring Boot Quick
⭐
2,282
🌿 Quick-learning examples based on Spring Boot, integrating open-source frameworks the author has encountered, such as RabbitMQ (delay queues), K…
Weibo_terminater
⭐
2,265
Final Weibo crawler: scrape anything from Weibo, including comments, Weibo content, followers, anything. The Terminator
Awesome Puppeteer
⭐
2,245
A curated list of awesome puppeteer resources.
Gospider
⭐
2,190
Gospider - Fast web spider written in Go
Crawl
⭐
2,148
Dungeon Crawl: Stone Soup official repository
Laravel Sitemap
⭐
2,122
Create and generate sitemaps with ease
Google Play Scraper
⭐
2,108
Node.js scraper to get data from Google Play
Gryffin
⭐
2,061
Gryffin is a large scale web security scanning platform.
Gain
⭐
2,029
Web crawling framework based on asyncio.
Dxy Covid 19 Crawler
⭐
2,000
COVID-19/2019-nCoV Realtime Infection Crawler and API
Abot
⭐
1,991
Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.
Gain
⭐
1,972
Web crawling framework based on asyncio.
Rendora
⭐
1,950
dynamic server-side rendering using headless Chrome to effortlessly solve the SEO problem for modern javascript websites
Finalrecon
⭐
1,949
All In One Web Recon
Gocrawl
⭐
1,929
Polite, slim and concurrent web crawler.
Hauberk
⭐
1,920
A web-based roguelike written in Dart.
Seimicrawler
⭐
1,895
A simple, agile, distributed Java crawler framework with Spring Boot support.
Geziyor
⭐
1,892
Geziyor, blazing fast web crawling & scraping framework for Go. Supports JS rendering.
Amemv Crawler
⭐
1,871
🙌 Easily download all the videos from TikTok (amemv). Downloads the videos of specified Douyin accounts; a Douyin crawler
Xalpha
⭐
1,851
Fund investment management and backtesting engine
Crawler Detect
⭐
1,842
🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent
News Please
⭐
1,821
news-please - an integrated web crawler and information extractor for news that just works
Ambar
⭐
1,797
🔍 Ambar: Document Search Engine
Vulnx
⭐
1,763
vulnx 🕷️ is an intelligent bot and shell that can perform automatic injection and help researchers detect security vulnerabilities in CMS systems. It can perform quick CMS security detection, information collection (including subdomain names, IP address, country information, organizational information, time zone, etc.), and vulnerability scanning.
Domain_analyzer
⭐
1,747
Analyze the security of any domain by finding all the information possible. Made in Python.
Ruia
⭐
1,731
Async Python 3.6+ web scraping micro-framework based on asyncio
Dirhunt
⭐
1,693
Find web directories without bruteforce
Pspider
⭐
1,675
A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
Awesome Web Archiving
⭐
1,669
An Awesome List for getting started with web archiving
Related Searches
Python Crawler (4,545)
Javascript Crawler (1,142)
Crawler Scrapy (988)
Scraper Crawler (896)
Java Crawler (807)
Crawler Spider (709)
1-100 of 3,834 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.