Awesome Open Source
Search results for java crawler
186 search results found
Tinkertime
⭐
26
NO LONGER SUPPORTED - The Ultimate KSP Mod Mechanic
Charles
⭐
25
Java web crawling library
Real_time_social_media_mining
⭐
24
DevOps pipeline for Real Time Social/Web Mining
Myhttpclient
⭐
23
Crawler framework that wraps tools such as HttpClient, Htmlunit, and Selenium
Gecco Redis
⭐
23
Gecco crawler with distributed crawling support via Redis
Httpproxy
⭐
23
IP proxy pool implemented in Java, supporting both HTTP and HTTPS
Widow
⭐
23
Distributed, asynchronous web crawler
Teneo
⭐
22
Crawling Framework
⭐
21
Easily crawl news portals or blog sites using Storm Crawler.
Preston
⭐
21
a biodiversity dataset tracker
Cc Dbp
⭐
20
A dataset for knowledge base population research using Common Crawl and DBpedia.
Httrack2warc
⭐
20
Converts HTTrack crawls to WARC files
Eksi
⭐
20
Eksi sözlük crawl, stats, and API work
Jspider
⭐
19
A simple, scalable, and highly efficient web crawler framework for Java.
Sitecrawler
⭐
18
This is a Java library for crawling the content of some web properties (for example, www.salesforce.com and blogs.salesforce.com). It supports dynamic scaling out of the box, depending on available machine power (CPU, RAM) and network capacity. It also has a plugin structure that lets others write code (plugins) that acts on the crawled pages.
Common Crawl Quick Hacks
⭐
18
common crawl quick hack examples
Webtoon Crawler
⭐
17
Let's download webtoons while they are free!
Kairos
⭐
17
Kairos combines a focused crawler and an information extraction engine to convert a list of conference websites into an index filled with fields of metadata that correspond to individual papers. Using event date metadata extracted from the conference website, Kairos proactively harvests metadata about the individual papers soon after they are made public. We use a Maximum Entropy classifier to classify uniform resource locators (URLs) as scientific conference websites and use Conditional Random
Crawler
⭐
17
A simple and flexible web crawler framework for java.
Douyin Crawler
⭐
16
Douyin crawler. Crawls users' posts and likes through a mobile phone proxy.
Cc Bill Tracker
⭐
16
These map reduce functions use Common Crawl data to look at the spread of congressional legislation on the internet
Webhunger
⭐
15
WebHunger is an extensible, full-scale crawler framework that supports distributed crawling, so that users can focus on web page parsing without worrying about the crawling process.
Googleplay Web Crawler
⭐
15
Mapreduce project by Hadoop, Nutch, AWS EMR, Pig, Tez, Hive
Paperwebcrawler
⭐
15
Crawler for paper websites such as IEEE XPLORE
Leetcodecrawler
⭐
15
A tool for crawling the descriptions and accepted submission code of problems on the LeetCode and LeetCode-Cn websites.
Tentacle
⭐
15
An open-source spider written in Java
Groundhog
⭐
14
A framework for crawling GitHub projects and raw data and extracting metrics from them
Twitter Crawler
⭐
14
REST and STREAMING crawlers of Twitter (java)
Nutch In Java
⭐
14
How to use Apache Nutch without command line
Ido
⭐
14
Brings 1.13 and 1.14 movement like swimming and crawling into 1.12.2! Based off https://github.com/pentantan/BetterSwiming
Befree
⭐
13
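Roughly: crawls popular content from sites outside the Great Firewall, such as YouTube, and republishes it to sites accessible from mainland China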
Serritor
⭐
13
Serritor is an open source web crawler framework built upon Selenium and written in Java. It can be used to crawl dynamic web pages that require JavaScript to render data.
Springbootdc
⭐
13
SpringBoot Developer Components
Toolkit
⭐
13
Paper plugin 1.20 - /crash, /crawl, /lunar, /vanish, /sit - client detection
Spotifydiscoverybot
⭐
13
A Java-based bot that automatically crawls for new releases by your followed artists on Spotify. Never miss a release again!
Nio Crawler
⭐
13
Simple Java Crawler using NIO
Robots.txt
⭐
13
🤖 robots.txt as a service. Crawls robots.txt files, downloads and parses them to check rules through an API
Lightshot
⭐
13
Lightshot image grabber
Pubsenti Finder
⭐
12
Weibo comment sentiment analysis, crawler, text classification, web.
Venom Tutorial
⭐
12
A tutorial based on your preferred open source focused crawler for the deep web.
Torfiles
⭐
12
An open-source torrent searching service
Knowledge Distillation
⭐
12
site crawler for knowledge graph
In One File Manager
⭐
12
Desktop File Manager for Windows
Crawler Framework
⭐
12
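Distributed crawler framework: simulates user requests with webdriver, passes messages via Kafka, uses HBase for distributed web page storage, and includes an IP service and a phone-number verification service; the proxy page is accessed through H5 and web versions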
Pttcrawler
⭐
12
PTTCrawler is a powerful ptt crawler written in Java
Fastcrawler
⭐
12
A fast, simple, multithreaded web crawler framework
Chatper15_net_io_img_crawler
⭐
11
Chapter 15: Kotlin file I/O and multithreading
Springbootcrawlerdb
⭐
11
A Spring Boot web crawler setup/example with crawler4j, Jsoup, Spring Data JPA (Hibernate), PostgresDB.
Ghost Login
⭐
11
Designed for web crawlers that need to log in to a website, by simulated means, when collecting web data.
Warc Mapreduce
⭐
11
warc and wet support for Hadoop's mapreduce api
Supermonkey
⭐
11
A crawler for automated Android UI testing.
Gspider
⭐
11
A Groovy spider.
Confluence Static Cache
⭐
11
Generates static file cache for Confluence
Spring Boot Integration Crawler Sample
⭐
10
a spring boot + spring integration crawler sample.
Apk Crawler
⭐
10
APK-Crawler is a tool for collecting apk files.
Dwtc Extractor
⭐
10
Extraction code used to create the Dresden Web Table Corpus
Instagram Crawler
⭐
10
Kaboomzhihu
⭐
9
Batch follow and batch unfollow on Zhihu
Hubs
⭐
9
Hubs is a content crawler application on Android. It provides apis to crawl web content and display data.
Nutch Crawler
⭐
9
Apache Nutch fork tuned for web services and data discovery.
Bilibili Plugin
⭐
9
Bilibili plugin
Codechef Crawler
⭐
9
[deprecated] A web crawler that can download all successful submissions of a user in Codechef. Just provide the username.
Webmuncher
⭐
9
A web scraper/crawler in Java
Dhtcrawler
⭐
9
Crawl torrent in Java
Quora Loader
⭐
9
A realtime read-only locator and extraction library for Quora questions and answers.
Weather Mrs
⭐
9
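Weather data collected by a crawler and distributed in real time via Kafka; Flume gathers the data and loads it into HBase, Hive is then mapped onto HBase, and Superset analyzes and displays the data.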
Nutch Indexer Discovery
⭐
9
Watson Discovery Service indexing plugin for Apache Nutch
Recodoc2
⭐
9
Analysis Platform for Developer Learning Resources
Atom Nuke
⭐
8
Agrotagger
⭐
8
This application indexes web documents, creating RDF triples that link a web URL to some URIs of a SKOS thesaurus
Leethub
⭐
8
An Android Client for LeetCode
Zhihu Crawler
⭐
8
A simple ZhiHu Crawler using WebMagic
Dwtc Tools
⭐
8
Dresden Web Table Corpus Java library
Chronicrawl
⭐
8
Experimental continuous web crawler for web archiving
Adaptive Crawler
⭐
8
Twitter Adaptive Crawler
Jirlbot
⭐
8
Java implementation of the Internet Research Lab Web Crawler (IRLbot) as presented by Hsin-Tsang Lee, Derek Leonard, Xiaoming Wang, and Dmitri Loguinov in their paper "IRLbot: Scaling to 6 Billion Pages and Beyond"
Docker Codesearch
⭐
8
Code Search on Fess
Geeco
⭐
8
Gecco is a lightweight, easy-to-use web crawler developed in Java. Gecco integrates jsoup, httpcl request. If you like this crawler framework, please star or fork!
Bamboo
⭐
8
Web archive collection manager
Ukwa Heritrix
⭐
8
The UKWA Heritrix3 custom modules and Docker builder.
Renren Analysis
⭐
8
A project for crawling and data visualization on renren.com
Leechcrawler
⭐
8
Incremental crawling capabilities for Apache Tika. Crawl content out of e.g. file systems, http(s) sources (web crawling), imap(s) servers, or your own arbitrary data sources. LeechCrawler offers additional Tika parsers providing these crawling capabilities.
Baidu Search Result Crawler
⭐
8
A crawler that fetches the content of Baidu search results.
Questionrecommendation
⭐
8
Programming Questions Recommendation System (question recommendation system for Nowcoder)
Rxcrawler
⭐
8
A Java crawler based on rx-java
Eksiseyler
⭐
8
Sample MVP project that uses a jsoup-web-crawl-like API
Vertx Crawler
⭐
7
Web Crawler based on Vert.x
Ryanaid
⭐
7
ryanair crawler based on webkit
Httrack2arc
⭐
7
HTTrack2Arc is a tool that converts crawls made by HTTrack to Internet Archive ARC files.
Cis555 Project
⭐
7
The final project for CIS555 at the University of Pennsylvania.
Fns_front
⭐
7
🕶👨🎤👩🎤Fashion Network Service.
Spider
⭐
7
A sample spider application.
Flash
⭐
7
A simple Crawler-based search engine that demonstrates the main features of a search engine (web crawling, indexing and ranking) and the interaction between them using Java and a Web Interface.
Cloud Computing Search Engine
⭐
7
A cloud-based web search engine running Hadoop MapReduce on Amazon EC2, consisting of a crawler, an indexer, and PageRank.
Venom Examples
⭐
7
A basic web crawler example
Java Learn
⭐
7
java-learn
Fess Ds Atlassian
⭐
7
DataStore Crawler for JIRA/Confluence
Movie Showtimes
⭐
7
Web Service & Android Application to look up Vietnam movie showtimes
Crawlerzwei
⭐
7
CZwei Crawler for www.2ch.net
Gospy
⭐
7
🕷 a flexible web crawler framework
101-186 of 186 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.