Awesome Open Source

Search results for crawler

3,834 search results found

Scrapely ⭐ 1,668

A pure-Python HTML screen-scraping library

Go_spider ⭐ 1,629

[Crawler framework (golang)] An awesome concurrent crawler (spider) framework in Go. The crawler is flexible and modular. It can easily be extended into a customized crawler, or you can use only the default crawl components.

React Snapshot ⭐ 1,619

A zero-configuration static pre-renderer for React apps

Anemone ⭐ 1,615

Anemone web-spider framework

Python Crawler ⭐ 1,576

Learn how to write Python crawlers systematically, from scratch. Python version 3.6

Static Site Generator Webpack Plugin ⭐ 1,538

Minimal, unopinionated static site generator powered by webpack

Open Source Search Engine ⭐ 1,504

Nov 20 2017 -- A distributed open source search engine and spider/crawler written in C/C++ for Linux on Intel/AMD. From gigablast dot com, which has binaries for download. See the README.md file at the very bottom of this page for instructions.

Autocrawler ⭐ 1,454

Google, Naver multiprocess image web crawler (Selenium)

Node Rate Limiter ⭐ 1,444

A generic rate limiter for node.js. Useful for API clients, web crawling, or other tasks that need to be throttled

Bilix ⭐ 1,433

⚡️ Lightning-fast async download tool for bilibili and more

Xsscrapy ⭐ 1,398

XSS spider - 66/66 wavsep XSS detected

Diskover Community ⭐ 1,391

Diskover Community Edition - Open source file indexer, file search engine and data management and analytics powered by Elasticsearch

Catvodtvspider ⭐ 1,365

Lightcrawler ⭐ 1,354

Crawl a website and run it through Google lighthouse

Weixin Game Helper ⭐ 1,352

A collection of helper tools for WeChat mini-games (加减大师, 包你懂我, 大家来找茬 (Tencent edition), 头脑王者, 好友画我, 悦动音符, 我最在行, 星

Swiftlinkpreview ⭐ 1,347

It makes a preview from a URL, grabbing all the information such as title, relevant texts and images.

Php Spider ⭐ 1,316

A configurable and extensible PHP web spider

Openwpm ⭐ 1,307

A web privacy measurement framework

Ast Hook For Js Re ⭐ 1,303

A browser memory roaming solution (still exploratory...)

Wombat ⭐ 1,297

Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.

Article Extractor ⭐ 1,297

Extracts the main article from a given URL with Node.js

Jd Autobuy ⭐ 1,292

Python crawler for JD.com: automatic login and online flash purchasing of products

The complete web scraping toolkit for PHP.

Sotawhat ⭐ 1,280

Returns the latest research results by crawling arXiv papers and summarizing abstracts. Helps you stay afloat with so many new papers every day.

Fscrawler ⭐ 1,279

Elasticsearch File System Crawler (FS Crawler)

Catvodtvspider ⭐ 1,270

Lxspider ⭐ 1,267

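A collection of crawler examples, including but not limited to: Taobao, JD.com, Tmall, Douban, Douyin, Kuaishou, Weibo, WeChat, Alibaba, Toutiao, pdd, Youku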

Grab Site ⭐ 1,254

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

Frontera ⭐ 1,244

A scalable frontier for web crawlers

Cariddi ⭐ 1,228

Take a list of domains, crawl urls and scan for endpoints, secrets, api keys, file extensions, tokens and more

Beanbun ⭐ 1,195

Beanbun is a multi-process web crawler framework written in PHP, with good openness and high extensibility, based on Workerman.

⚡ The fastest directory crawler & globbing library for NodeJS. Crawls 1m files in < 1s

Lightnovel Crawler ⭐ 1,185

Generate and download e-books from online sources.

Scrapy Cluster ⭐ 1,137

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.

Sqliv ⭐ 1,111

Massive SQL injection vulnerability scanner

Tumblr Crawler ⭐ 1,105

Easily download all the photos and videos from specified Tumblr blogs.

Angular Seo ⭐ 1,082

SEO for AngularJS apps made easy.

Newpipeextractor ⭐ 1,070

NewPipe's core library for extracting data from streaming sites

Parliament Scraper ⭐ 1,049

Public Data Scraper for Parliament Data for the EU and other Parliaments

Crawler User Agents ⭐ 1,045

Syntactic patterns of HTTP user-agents used by bots / robots / crawlers / scrapers / spiders. pull-request welcome ⭐

Appcrawler ⭐ 1,023

An automatic app traversal tool based on Appium

Holiday Cn ⭐ 1,018

📅🇨🇳 Chinese statutory holiday data, crawled automatically every day from State Council announcements

Instagram Profilecrawl ⭐ 1,001

📝 Quickly crawl the information (e.g. followers, tags, etc.) of an Instagram profile.

🍻 bilibili video (including bangumi) and danmaku downloader

Introduction to the magnet-link site U3C3, with domain updates

A tool for pixiv.net. A Pixiv crawler that anyone can use.

Dungeonfs ⭐ 966

A FUSE filesystem and dungeon crawling adventure game engine

Python Seo Analyzer ⭐ 956

An SEO tool that analyzes the structure of a site, crawls the site, counts words in the body of the site, and warns of any technical SEO issues.

Linkinator ⭐ 955

🐿 Scurry around your site and find all those broken links.

Querido Diario ⭐ 944

📰 Brazilian government gazettes, accessible to everyone.

Fess is a very powerful and easily deployable Enterprise Search Server.

A tool for pixiv.net. A Pixiv crawler that anyone can use.

Mlscraper ⭐ 935

🤖 Scrape data from HTML websites automatically by just providing examples

Tumblthree ⭐ 922

A Tumblr Blog Backup Application

Instagram Crawler ⭐ 922

Get Instagram posts/profile/hashtag data without using Instagram API

Python website crawler.

Prerender Node ⭐ 916

Express middleware for prerendering javascript-rendered pages on the fly for SEO

Crawlergo_x_xray ⭐ 915

Combines the 360/0Kee-Team/crawlergo dynamic crawler with the passive scanning feature of the Chaitin XRAY scanner

Goclone ⭐ 907

Website Cloner - Utilizes powerful Go routines to clone websites to your computer within seconds.

Bhban_rpa ⭐ 903

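Example code for the book <6개월 치 업무를 하루 만에 끝내는 업무 자동화> ("Work Automation: Finishing Six Months of Work in One Day", 생능출판사, 2020). The examples are written for readers who have never learned Python and cover many areas of work automation, from Excel to design, macros, and crawling.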

Xsrfprobe ⭐ 897

The Prime Cross Site Request Forgery (CSRF) Audit and Exploitation Toolkit.

Crawler ⭐ 897

A high performance web crawler / scraper in Elixir.

Awesome Datahoarding ⭐ 892

List of data-hoarding related tools

Scrawler ⭐ 882

🏳️‍🌈 Media downloader from any sites, including Twitter, Reddit, Instagram, Threads, Facebook, OnlyFans, YouTube, Pinterest, PornHub, XHamster, XVIDEOS, ThisVid etc.

Kimuraframework ⭐ 874

Kimurai is a modern web scraping framework written in Ruby which works out of the box with Headless Chromium/Firefox, PhantomJS, or simple HTTP requests and lets you scrape and interact with JavaScript-rendered websites

Baiduspider ⭐ 872

BaiduSpider, a crawler for Baidu search results; currently supports Baidu web search, Baidu image search, and Baidu Zhidao search

Angrysearch ⭐ 866

Linux file search, instant results as you type

👧 Crawler for model photo sets (part 2)

Zhihu Crawler ⭐ 843

zhihu-crawler is Java-based and high-performance, supports a free HTTP proxy pool, supports horizontal scaling, distributed crawl

Scrapy Selenium ⭐ 842

Scrapy middleware to handle javascript pages using selenium

Storm Crawler ⭐ 834

A scalable, mature and versatile web crawler based on Apache Storm

Scrapyrt ⭐ 793

HTTP API for Scrapy spiders

Icrawler ⭐ 792

A multi-thread crawler framework with many builtin image crawlers provided.

Crawly, a high-level web crawling & scraping framework for Elixir.

Ipfs Search ⭐ 779

Search engine for the Interplanetary Filesystem.

Seccrawler ⭐ 777

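A crawler and push-notification program that helps security researchers get daily security news. It currently crawls 先知社区, 安全客, Seebug Paper, 跳跳糖, 奇安信攻防社区, and 棱角社区, as well as the lab blogs of NSFOCUS, Tencent Xuanwu, Topsec, 360, and others. Continuously updated.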

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Baiduimagespider ⭐ 774

A super lightweight Baidu image crawler

DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.

Computerstudent ⭐ 764

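Systematic study materials for computer science majors (Python, C, C++, computer organization, computer networks, compiler principles, circuits, Google extensions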

Creeper ⭐ 762

🐾 Creeper - The Next Generation Crawler Framework (Go)

Nginx Badbot Blocker ⭐ 759

Block bad, possibly even malicious web crawlers (automated bots) using Nginx

Fetchbot ⭐ 758

A simple and flexible web crawler that follows the robots.txt policies and crawl delays.

Spider_collection ⭐ 754

Python crawlers; currently included: NetEase Cloud Music song crawling, bilibili video crawling, Zhihu Q&A crawling, wallpaper crawling, xvideos

[Unmaintained] A simple and clean video/music/image downloader 👾

X Crawl ⭐ 718

x-crawl is a flexible, multifunctional Node.js crawler library. Its flexible usage and numerous features help you crawl pages, interfaces, and files quickly, safely, and stably.

Skrape.it ⭐ 714

A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing lib, but can also be used to scrape websites in a convenient fashion.

Device_detector ⭐ 711

DeviceDetector is a precise and fast user agent parser and device detector written in Ruby

Packtpub Crawler ⭐ 701

Download your daily free Packt Publishing eBook https://www.packtpub.com/packt/offers/free-learnin

Bookcorpus ⭐ 698

Crawl BookCorpus

Tweetscraper ⭐ 698

TweetScraper is a simple crawler/spider for Twitter Search without using API

Listed Company News Crawl And Text Analysis ⭐ 689

Crawls historical news text data on listed companies (individual stocks) from Sina Finance, 每经网, 金融界, 中国证券网, and 证券时报网, for text

A PTT library that logs in over a direct connection; supports PTT and PTT2

Catgate ⭐ 681

CatGate is a small crawler framework based on a Chrome extension. Because it runs as a browser extension, it needs no simulated login and can mimic real user behavior as closely as possible.

Go Dork ⭐ 677

The fastest dork scanner written in Go.

Staticgen ⭐ 668

Static website generator that lets you use HTTP servers and frameworks you already know

Domain_hunter ⭐ 658

A Burp Suite extension that tries to automatically find all subdomains, similar domains, and related domains of an organization, based on observed traffic.

One Python ⭐ 655

We don't need a lot of libraries. We just need the best ones. | Unofficial recommended first choice.

Word2vec Graph ⭐ 650

Exploring word2vec embeddings as a graph of nearest neighbors

Xxl Crawler ⭐ 650

A distributed web crawler framework (XXL-CRAWLER).

Related Searches

Python Crawler (4,545)

Javascript Crawler (1,142)

Crawler Scrapy (988)

Scraper Crawler (896)

Java Crawler (807)

Crawler Spider (709)

101-200 of 3,834 search results

Copyright 2018-2024 Awesome Open Source. All rights reserved.