Awesome Open Source


Search results for crawler spider

424 search results found

Crawlertutorial ⭐ 310

A minimalist web-scraping tutorial (fetch, parse, search, multiprocessing, API), using PTT as the example

Node Readability ⭐ 302

Automatically scrape/crawl articles from any site. Makes any web page readable, whether it is in Chinese or English.

Webpalm ⭐ 295

WebPalm is a powerful command-line tool for website mapping and web scraping. With its recursive approach, it can generate a complete tree of all webpages and their links on a website. It can also extract data from the body of each page using regular expressions, making it an ideal tool for web scraping and data extraction.

Crawler ⭐ 288

Crawler code shared by K哥: JS reverse engineering and advanced crawling. Follow the WeChat official account: K哥爬虫

Magic_google ⭐ 287

Google search results crawler: get the Google search results you need

[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn

Ppspider ⭐ 278

Web spider built on Puppeteer; supports task queues and task scheduling via decorators, supports nedb / mongodb, and supports data visualization; a Puppeteer-based web crawler framework that provides flexible task-queue management and scheduling and convenient data-storage options (ne

A web crawler for Go

Zhihu Api ⭐ 267

Unofficial API for Zhihu.

A flexible, friendly crawler framework

Laravel Crawler Detect ⭐ 262

A Laravel wrapper for CrawlerDetect - the web crawler detection library

Hotel Review Analysis ⭐ 254

Sentiment analysis and aspect classification for hotel reviews using machine learning models with MonkeyLearn.

Lagoujob ⭐ 250

Job data mining repo for lagou.com

Awesome Crawler Cn ⭐ 243

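A roundup of internet crawlers, spiders, data collectors, and web page parsers. Because new technologies keep evolving and new frameworks keep appearing, this list will be continually updated..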

JavaScript + BeautifulSoup = JSSoup

Scrapy Jsonrpc ⭐ 238

Scrapy extension to control spiders using JSON-RPC

Go Movies ⭐ 232

Golang spider/crawler for movies

Scrapy Deltafetch ⭐ 232

Scrapy spider middleware to ignore requests to pages containing items seen in previous crawls

Nudecrawler ⭐ 231

Crawl telegra.ph searching for nudes!

Infinitycrawler ⭐ 221

A simple but powerful web crawler library for .NET

Wayback Machine Scraper ⭐ 219

A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.

Zhihuspider ⭐ 215

Multithreaded Zhihu user crawler, based on Python 3

Fast Lianjia Crawler ⭐ 214

An ultra-fast crawler that scrapes data directly through the Lianjia (链家) API; the fastest in the universe~~ 🚀

Finance_news_analysis ⭐ 206

Data mining and analysis of financial news

Webvideobot ⭐ 200

Lightweight Golang image crawler

Portia Dashboard ⭐ 190

portia-dashboard is a visual web crawler based on scrapinghub/portia

A fast tool to fetch URLs from HTML attributes by crawl-in.

Crawlab Lite ⭐ 184

Lite version of Crawlab: a lightweight Crawlab crawler management platform

Chromium_for_spider ⭐ 182

Dynamic crawler for web vulnerability scanners

Digger is a powerful and flexible web crawler implemented in pure Golang

Zhihu Crawler People ⭐ 179

A simple distributed crawler for Zhihu, with data analysis

A loose framework for crawling and scraping web sites.

Spider_reverse ⭐ 178

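Crawler reverse-engineering case studies. Completed: Zhenkunhang (震坤行) | NetEase Yidun (网易易盾) | WeChat mini-program decompilation and reverse engineering (百达星系) | Tonghuashun (同花顺) | RPC decryption | Jiasule (加速乐) | GeeTest slider CAPTCHA (极验滑块验证码) | Juliang Suanshu (巨量算数) | Boss Zhipin (Boss直聘) | Qichacha (企查查) | China Minmetals (中国五矿) | QQ Music | Industrial Policy Big Data Platform (产业政策大数据平台) | Qizhidao (企知道) | Xueqiu (雪球网, acw_sc__v2) | 1688 | Qimai Data (七麦数据) | whggzy | Qiming Technology (企名科技) | mohurd | Yien Data (艺恩数据) | Ouke Yunlian (欧科云链)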

Antch, a fast, powerful and extensible web crawling & scraping framework for Go

🥄 A package for building specific Proxy Pool for different Sites.

Voight Kampff ⭐ 171

Voight-Kampff is a Ruby gem that detects bots, spiders, crawlers and replicants

Fun_crawler ⭐ 170

Crawls some pictures for fun

Douban_crawler ⭐ 169

A project to back up Douban

Qqmusicspider ⭐ 168

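A Scrapy-based QQ Music crawler (QQ Music Spider) that scrapes song information, lyrics, featured comments, and more. It also shares the top 6,400 mainland Chinese, Hong Kong, and Taiwanese artists on QQ Music.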

PHP Link Checker

Leetcode Spider ⭐ 166

Use Node.js to crawl your own LeetCode solution source code

Goribot ⭐ 162

[Crawler/Scraper for Golang] 🕷 A lightweight, distributed-friendly Golang crawler framework.

Ncov2019_data_crawler ⭐ 158

Epidemic data crawler: a data warehouse for the 2019 novel coronavirus, covering trajectory data, co-passenger data, and news reports

Allnewsspider ⭐ 153

The Paper, Sina News, Tencent News, Sohu News, Xinwen Lianbo, The Times, The New York Times, BBC News; aims to crawl all new

Aliexpress Product Scraper ⭐ 152

Get Aliexpress product details as a JSON response, including feedback, variants, shipping info, description, images, etc.

Jlitespider ⭐ 151

A lite distributed Java spider framework :-)

Scrapy_demo ⭐ 150

All kinds of Scrapy demos

Weibosearch ⭐ 144

A distributed Sina Weibo Search spider based on Scrapy and Redis.

Js Reverse ⭐ 144

Scrapy Training ⭐ 141

Scrapy Training companion code

Domain names collector - Crawl websites and collect domain names along with their availability status.

Javbus Api ⭐ 136

A self-hosted JavBus API service

Islandbeauty ⭐ 131

A spider/crawler written in Node.js to download torrents of adult videos.

Image scraping for the MM131 website 🚨

Not Your Average Web Crawler ⭐ 130

A web crawler (for bug hunting) that gathers more than you can imagine.

Deep Deep ⭐ 130

Adaptive crawler which uses Reinforcement Learning methods

Zhihu Spider ⭐ 128

A multithreaded Python crawler that retrieves information from Zhihu user profile pages.

Yispider ⭐ 127

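A distributed crawler platform that helps you better manage and develop crawlers. It includes a built-in set of crawler definition rules (templates): you can use the templates to define crawlers quickly, or use it as a framework to develop crawlers by hand. (A project driven by personal interest, used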

Ok_ip_proxy_pool ⭐ 123

🍿 Crawler proxy IP pool (proxy pool) in Python 🍟 A pretty OK IP proxy pool

Dyer is designed for reliable, flexible and fast web crawling, providing some high-level, comprehensive features without compromising speed.

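🌟:octocat: Powered by Python 3 (simple spider learning). Scrapes Baidu Wenku; NetEase Cloud Music songs; Douban Movies; GitHub; JD.com; QQ Zone; weather; a VIP parsing assistant; TED transcripts; a Wi-Fi cracking script; setting Bing images as the desktop wallpaper; and more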

Pkulaw_spider ⭐ 109

Crawls PKULaw (北大法宝) http://www.pkulaw.cn/Case/

Phpcreeper ⭐ 108

A new generation of multi-process asynchronous event-driven spider engine based on Workerman. http://www.phpcreeper.com

Cross-platform C# web crawler framework, headless browser, parallel crawler. Please star this project! +1.

Crawler_detect ⭐ 106

Ruby gem to detect bots and crawlers via the user agent

Instagram Scraper ⭐ 105

Some Scrapy spiders for crawling Instagram posts using public APIs (no token)

Jkcrawler ⭐ 100

A JK crawler written with Scrapy; images are sourced from Bilibili, Tumblr, Instagram, Weibo, and Twitter

Bilibili_member_crawler ⭐ 98

Bilibili user crawler. Yay~ it's a crawler

Gopa Abandoned ⭐ 97

GOPA, a spider written in Go. (NOTE: this project moved to https://github.com/infinitbyte/gopa)

Blinkist M4a Downloader ⭐ 97

Grabs all of the audio files from all of the Blinkist books

Ant_nest ⭐ 93

Simple, clear, and fast web crawler framework built on Python 3.6+, powered by asyncio.

Douban Movie ⭐ 91

A Golang crawler that scrapes the Douban Movie Top 250

Es6 Crawler Detect ⭐ 88

🕷️ This is an ES6 adaptation of the original PHP library CrawlerDetect. It helps you detect bots/crawlers/spiders via the user agent.

Scrapy_ipproxypool ⭐ 86

Free IP proxy pool; a plugin for the Scrapy crawler framework

Crawler is a bare-bones spider designed to quickly and effectively build an index of all files and pages on a given Web site as well as the link relationship (both incoming and outgoing) between each page.

Aliexscrape ⭐ 84

Get Aliexpress product details in JSON

Arachnid ⭐ 80

Powerful web scraping framework for Crystal

A lightweight asynchronous coroutine-based web crawler framework developed with asyncio and aiohttp

Fetchurls ⭐ 79

A bash script to spider a site, follow links, and fetch urls (with built-in filtering) into a generated text file.

Zhihu_spider ⭐ 79

Large-scale user information crawler for Zhihu

Weibospider ⭐ 79

Weibo crawler: a lightweight Weibo crawler based on the Scrapy framework (Sina Weibo Spider)

Awesome Python Primer ⭐ 78

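An index of high-quality Chinese-language resources for teaching yourself Python, including books / documentation / videos, suited to crawling / web / data analysis / machine learning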

Couch Crawler ⭐ 77

A search engine built on top of couchdb-lucene

Memex Program Index ⭐ 76

A list of memex-related tools and their repository URLs

Dictionary_crawler ⭐ 76

Python code based on Scrapy that crawls well-known online dictionaries such as Oxford, Longman, Cambridge, Webster, and Collins to build a dataset

Taiwan News Crawlers ⭐ 75

Scrapy-based crawlers for Taiwanese news

Get IT books for free from ebook websites such as allitebooks, digilibraries, etc.

Inventus ⭐ 74

Inventus is a spider designed to find subdomains of a specific domain by crawling it and any subdomains it discovers.

Lrabbit_scrapy ⭐ 73

A quick-start Python multithreaded crawler

Scrapy_helper ⭐ 73

Dynamically configurable crawler

Scrapingspider ⭐ 73

A crawler developed in spare time that supports multithreading, keyword filtering, and intelligent recognition of the main body text.

Ctrip_spider ⭐ 73

Scrape Learning (ctrip)

Simpyder ⭐ 73

Ultra-fast asynchronous coroutine-based Python crawler

Wget Lua ⭐ 72

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

Puppeteer Walker ⭐ 72

A Puppeteer walker 🕷 🕸

Dcard Spider ⭐ 71

A spider on Dcard. Strong and speedy.

Gospider ⭐ 70

⚡ Lightweight Golang spider framework

Crawler management system with cluster support and elastic scaling. Supports running feapder, scrapy, selenium, playw

Python Testing Crawler ⭐ 69

A crawler for automated functional testing of a web application

Related Searches

Python Crawler (4,528)

Python Spider (2,155)

Javascript Crawler (1,142)

Spider Scrapy (982)

Scraper Crawler (896)

Java Crawler (593)

Crawler Scrapy (578)

Golang Crawler (509)

101-200 of 424 search results


Copyright 2018-2024 Awesome Open Source. All rights reserved.