Awesome Open Source
Search results for java crawler
186 search results found
Tinkertime
⭐
26
NO LONGER SUPPORTED - The Ultimate KSP Mod Mechanic
Charles
⭐
25
Java web crawling library
Real_time_social_media_mining
⭐
24
DevOps pipeline for Real Time Social/Web Mining
Myhttpclient
⭐
23
Crawler framework that wraps HttpClient, HtmlUnit, Selenium, and other tools
Httpproxy
⭐
23
IP proxy pool implemented in Java, supporting both HTTP and HTTPS
Widow
⭐
23
Distributed, asynchronous web crawler
Gecco Redis
⭐
23
Gecco crawler with distributed crawling support via Redis
Teneo
⭐
22
Crawling Framework
⭐
21
Easily crawl news portals or blog sites using Storm Crawler.
Preston
⭐
21
a biodiversity dataset tracker
Cc Dbp
⭐
20
A dataset for knowledge base population research using Common Crawl and DBpedia.
Eksi
⭐
20
Ekşi Sözlük crawling, statistics, and API work
Httrack2warc
⭐
20
Converts HTTrack crawls to WARC files
Jspider
⭐
19
A simple, scalable, and highly efficient web crawler framework for Java.
Common Crawl Quick Hacks
⭐
18
common crawl quick hack examples
Sitecrawler
⭐
18
This is a Java library that can be used to crawl the content of web properties (for example, www.salesforce.com and blogs.salesforce.com). It supports dynamic scaling out of the box, depending on available machine power (CPU, RAM) and network capacity. It also has a plugin structure that allows others to write code (plugins) that acts on the crawled pages.
Kairos
⭐
17
Kairos combines a focused crawler and an information extraction engine to convert a list of conference websites into an index filled with fields of metadata that correspond to individual papers. Using event date metadata extracted from the conference website, Kairos proactively harvests metadata about the individual papers soon after they are made public. We use a Maximum Entropy classifier to classify uniform resource locators (URLs) as scientific conference websites and use Conditional Random
Crawler
⭐
17
A simple and flexible web crawler framework for Java.
Webtoon Crawler
⭐
17
Let's download webtoons while they are free!
Douyin Crawler
⭐
16
Douyin crawler. Crawls users' posts and likes through a mobile proxy.
Cc Bill Tracker
⭐
16
These map reduce functions use Common Crawl data to look at the spread of congressional legislation on the internet
Leetcodecrawler
⭐
15
A tool for crawling problem descriptions and accepted submissions from the LeetCode and LeetCode-Cn websites.
Tentacle
⭐
15
An open-source spider written in Java
Paperwebcrawler
⭐
15
Crawler for paper websites such as IEEE Xplore
Webhunger
⭐
15
WebHunger is an extensible, full-scale crawler framework that supports distributed crawling, so that users can focus on web page parsing without worrying about the crawling process.
Googleplay Web Crawler
⭐
15
MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, Hive
Groundhog
⭐
14
A framework for crawling GitHub projects and raw data and extracting metrics from them
Twitter Crawler
⭐
14
REST and STREAMING crawlers of Twitter (java)
Ido
⭐
14
Brings 1.13 and 1.14 movement like swimming and crawling into 1.12.2! Based off https://github.com/pentantan/BetterSwiming
Nutch In Java
⭐
14
How to use Apache Nutch without command line
Toolkit
⭐
13
Paper plugin 1.20 - /crash, /crawl, /lunar, /vanish, /sit - client detection
Springbootdc
⭐
13
SpringBoot Developer Components
Lightshot
⭐
13
Lightshot image grabber
Serritor
⭐
13
Serritor is an open source web crawler framework built upon Selenium and written in Java. It can be used to crawl dynamic web pages that require JavaScript to render data.
Befree
⭐
13
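Roughly: crawls popular content from sites blocked in mainland China, such as YouTube, and reposts it to sites that are accessible from the mainland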
Nio Crawler
⭐
13
Simple Java Crawler using NIO
Robots.txt
⭐
13
🤖 robots.txt as a service. Crawls robots.txt files, downloads and parses them to check rules through an API
Spotifydiscoverybot
⭐
13
A Java-based bot that automatically crawls for new releases by your followed artists on Spotify. Never miss a release again!
Venom Tutorial
⭐
12
A tutorial based on your preferred open source focused crawler for the deep web.
Torfiles
⭐
12
An open-source torrent searching service
Fastcrawler
⭐
12
A fast, simple, multithreaded web crawler framework
Pttcrawler
⭐
12
PTTCrawler is a powerful PTT crawler written in Java
Pubsenti Finder
⭐
12
Weibo comment sentiment analysis, crawler, text classification, web.
Crawler Framework
⭐
12
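Distributed crawler framework: simulates user requests with WebDriver, passes messages via Kafka, stores pages in distributed HBase storage, and includes an IP service, a phone number verification service, and more; the proxy page is accessed through the H5 and "we" versions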
In One File Manager
⭐
12
Desktop File Manager for Windows
Knowledge Distillation
⭐
12
site crawler for knowledge graph
Warc Mapreduce
⭐
11
WARC and WET support for Hadoop's MapReduce API
Confluence Static Cache
⭐
11
Generates static file cache for Confluence
Supermonkey
⭐
11
A crawler for automated Android UI testing.
Gspider
⭐
11
A groovy spider.
Springbootcrawlerdb
⭐
11
A Spring Boot web crawler setup/example with crawler4j, Jsoup, Spring Data JPA (Hibernate), PostgresDB.
Chatper15_net_io_img_crawler
⭐
11
Chapter 15: Kotlin file I/O and multithreading
Ghost Login
⭐
11
Designed for web crawlers that need to log in to a website when collecting web data, using simulated login.
Apk Crawler
⭐
10
APK-Crawler is a tool for collecting apk files.
Instagram Crawler
⭐
10
Spring Boot Integration Crawler Sample
⭐
10
A Spring Boot + Spring Integration crawler sample.
Dwtc Extractor
⭐
10
Extraction code used to create the Dresden Web Table Corpus
Nutch Crawler
⭐
9
Apache Nutch fork tuned for web services and data discovery.
Weather Mrs
⭐
9
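Weather data collected by a crawler, distributed in real time via Kafka, gathered by Flume and loaded into HBase, then mapped between Hive and HBase, with Superset used to analyze and display the data.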
Codechef Crawler
⭐
9
[deprecated] A web crawler that can download all successful submissions of a user in Codechef. Just provide the username.
Dhtcrawler
⭐
9
Crawl torrents in Java
Webmuncher
⭐
9
A web scraper/crawler in Java
Bilibili Plugin
⭐
9
Bilibili plugin
Quora Loader
⭐
9
A realtime read-only locator and extraction library for Quora questions and answers.
Nutch Indexer Discovery
⭐
9
Watson Discovery Service indexing plugin for Apache Nutch
Hubs
⭐
9
Hubs is a content crawler application on Android. It provides apis to crawl web content and display data.
Kaboomzhihu
⭐
9
Bulk follow and bulk unfollow on Zhihu
Recodoc2
⭐
9
Analysis Platform for Developer Learning Resources
Questionrecommendation
⭐
8
Programming Questions Recommendation System (question recommendation system for Nowcoder)
Adaptive Crawler
⭐
8
Twitter Adaptive Crawler
Dwtc Tools
⭐
8
Dresden Web Table Corpus Java library
Agrotagger
⭐
8
This application indexes web documents, creating RDF triples that link a web URL to URIs in a SKOS thesaurus
Leethub
⭐
8
An Android Client for LeetCode
Zhihu Crawler
⭐
8
A simple ZhiHu Crawler using WebMagic
Chronicrawl
⭐
8
Experimental continuous web crawler for web archiving
Jirlbot
⭐
8
Java implementation of the Internet Research Lab Web Crawler (IRLbot) as presented by Hsin-Tsang Lee, Derek Leonard, Xiaoming Wang, and Dmitri Loguinov in their paper "IRLbot: Scaling to 6 Billion Pages and Beyond"
Docker Codesearch
⭐
8
Code Search on Fess
Geeco
⭐
8
Gecco is a lightweight, easy-to-use web crawler developed in Java. Gecco integrates jsoup, httpcl request. If you like this crawler framework, please star or fork it!
Bamboo
⭐
8
Web archive collection manager
Ukwa Heritrix
⭐
8
The UKWA Heritrix3 custom modules and Docker builder.
Leechcrawler
⭐
8
Incremental crawling capabilities for Apache Tika. Crawl content from, for example, file systems, HTTP(S) sources (web crawling), IMAP(S) servers, or your own arbitrary data sources. LeechCrawler offers additional Tika parsers providing these crawling capabilities.
Atom Nuke
⭐
8
Renren Analysis
⭐
8
A project for crawling and visualizing data from renren.com
Eksiseyler
⭐
8
Sample MVP project uses jsoup-web-crawl like API
Rxcrawler
⭐
8
A Java crawler based on rx-java
Baidu Search Result Crawler
⭐
8
A crawler that fetches the content of Baidu search results.
Cis555 Project
⭐
7
The final project for CIS555 at the University of Pennsylvania.
Movie Showtimes
⭐
7
Web Service & Android Application to look up Vietnam movie showtimes
Crawlerzwei
⭐
7
CZwei Crawler for www.2ch.net
Fns_front
⭐
7
🕶👨🎤👩🎤Fashion Network Service.
Crawlrss
⭐
7
Crawl RSS - Heritrix 3 add-on
Vertx Crawler
⭐
7
Web Crawler based on Vert.x
Flash
⭐
7
A simple Crawler-based search engine that demonstrates the main features of a search engine (web crawling, indexing and ranking) and the interaction between them using Java and a Web Interface.
Born2crawl
⭐
7
A highly performant and versatile crawling engine, designed with scalability and extensibility in mind.
Fess Ds Atlassian
⭐
7
DataStore Crawler for JIRA/Confluence
Java Learn
⭐
7
java-learn
Gospy
⭐
7
🕷 a flexible web crawler framework
Ryanaid
⭐
7
Ryanair crawler based on WebKit
Cloud Computing Search Engine
⭐
7
A cloud-based web search engine running Hadoop MapReduce on Amazon EC2, consisting of a crawler, an indexer, and PageRank.
Httrack2arc
⭐
7
HTTrack2Arc is a tool that converts crawls made by HTTrack to Internet Archive ARC files.
101-186 of 186 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.