Awesome Open Source

Search results for java crawler

186 search results found

Tinkertime ⭐ 26

NO LONGER SUPPORTED - The Ultimate KSP Mod Mechanic

Java web crawling library

Real_time_social_media_mining ⭐ 24

DevOps pipeline for Real Time Social/Web Mining

Myhttpclient ⭐ 23

Crawler framework that wraps HttpClient, HtmlUnit, Selenium, and other tools

Gecco Redis ⭐ 23

Gecco crawler with distributed crawling support via Redis

Httpproxy ⭐ 23

An IP proxy pool implemented in Java, supporting both HTTP and HTTPS

Distributed, asynchronous web crawler

Crawling Framework ⭐ 21

Easily crawl news portals or blog sites using Storm Crawler.

a biodiversity dataset tracker

A dataset for knowledge base population research using Common Crawl and DBpedia.

Httrack2warc ⭐ 20

Converts HTTrack crawls to WARC files

Eksi sözlük crawling, statistics, and API experiments

A simple, scalable, and highly efficient web crawler framework for Java.

Sitecrawler ⭐ 18

This is a Java library for crawling the content of web properties (www.salesforce.com and blogs.salesforce.com, for example). It supports dynamic scaling out of the box, depending on available machine power (CPU, RAM) and network capacity. It also has a plugin structure, which allows others to write code (plugins) that acts on the crawled pages.

Common Crawl Quick Hacks ⭐ 18

common crawl quick hack examples

Webtoon Crawler ⭐ 17

Let's download webtoons while they are free!

Kairos combines a focused crawler and an information extraction engine to convert a list of conference websites into an index filled with fields of metadata that correspond to individual papers. Using event date metadata extracted from the conference website, Kairos proactively harvests metadata about the individual papers soon after they are made public. We use a Maximum Entropy classifier to classify uniform resource locators (URLs) as scientific conference websites and use Conditional Random

A simple and flexible web crawler framework for java.

Douyin Crawler ⭐ 16

Douyin crawler. Crawls a user's posts and likes through a mobile phone proxy

Cc Bill Tracker ⭐ 16

These map reduce functions use Common Crawl data to look at the spread of congressional legislation on the internet

Webhunger ⭐ 15

WebHunger is an extensible, full-scale crawler framework that supports distributed crawling and lets users focus on web page parsing without worrying about the crawling process.

Googleplay Web Crawler ⭐ 15

Mapreduce project by Hadoop, Nutch, AWS EMR, Pig, Tez, Hive

Paperwebcrawler ⭐ 15

Crawler for paper websites such as IEEE XPLORE

Leetcodecrawler ⭐ 15

A tool for crawling the descriptions and accepted submissions of problems on the LeetCode and LeetCode-Cn websites.

Tentacle ⭐ 15

An open-source spider written in Java

Groundhog ⭐ 14

A framework for crawling GitHub projects and raw data and extracting metrics from them

Twitter Crawler ⭐ 14

REST and STREAMING crawlers of Twitter (java)

Nutch In Java ⭐ 14

How to use Apache Nutch without command line

Brings 1.13 and 1.14 movement like swimming and crawling into 1.12.2! Based off https://github.com/pentantan/BetterSwiming

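Roughly speaking, crawls popular content from sites outside the firewall, such as YouTube, and republishes it on websites accessible from mainland China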

Serritor ⭐ 13

Serritor is an open source web crawler framework built upon Selenium and written in Java. It can be used to crawl dynamic web pages that require JavaScript to render data.

Springbootdc ⭐ 13

SpringBoot Developer Components

Paper plugin 1.20 - /crash, /crawl, /lunar, /vanish, /sit - client detection

Spotifydiscoverybot ⭐ 13

A Java-based bot that automatically crawls for new releases by your followed artists on Spotify. Never miss a release again!

Nio Crawler ⭐ 13

Simple Java Crawler using NIO

Robots.txt ⭐ 13

🤖 robots.txt as a service. Crawls robots.txt files, downloads and parses them to check rules through an API

Lightshot ⭐ 13

Lightshot image grabber

Pubsenti Finder ⭐ 12

Weibo comment sentiment analysis, crawler, text classification, web.

Venom Tutorial ⭐ 12

A tutorial based on your preferred open source focused crawler for the deep web.

Torfiles ⭐ 12

An open-source torrent searching service

Knowledge Distillation ⭐ 12

site crawler for knowledge graph

In One File Manager ⭐ 12

Desktop File Manager for Windows

Crawler Framework ⭐ 12

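Distributed crawler framework: simulates user requests with WebDriver, passes messages through Kafka, stores pages in a distributed fashion using HBase, and includes an IP service, a phone-number verification service, and more; the proxy page is accessed through the H5 and "we" versions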

Pttcrawler ⭐ 12

PTTCrawler is a powerful PTT crawler written in Java

Fastcrawler ⭐ 12

A fast, simple, multithreaded web crawler framework

Chatper15_net_io_img_crawler ⭐ 11

Chapter 15: Kotlin file I/O operations and multithreading

Springbootcrawlerdb ⭐ 11

A Spring Boot web crawler setup/example with crawler4j, Jsoup, Spring Data JPA (Hibernate), PostgresDB.

Ghost Login ⭐ 11

Designed for web crawlers that need to log in to a website when collecting web data, using simulated login methods.

Warc Mapreduce ⭐ 11

warc and wet support for Hadoop's mapreduce api

Supermonkey ⭐ 11

A crawler for automated Android UI testing.

a groory spider.

Confluence Static Cache ⭐ 11

Generates static file cache for Confluence

Spring Boot Integration Crawler Sample ⭐ 10

a spring boot + spring integration crawler sample.

Apk Crawler ⭐ 10

APK-Crawler is a tool for collecting apk files.

Dwtc Extractor ⭐ 10

Extraction code used to create the Dresden Web Table Corpus

Instagram Crawler ⭐ 10

Kaboomzhihu ⭐ 9

Zhihu bulk follow and bulk unfollow

Hubs is a content crawler application for Android. It provides APIs to crawl web content and display data.

Nutch Crawler ⭐ 9

Apache Nutch fork tuned for web services and data discovery.

Bilibili Plugin ⭐ 9

Bilibili plugin

Codechef Crawler ⭐ 9

[deprecated] A web crawler that can download all successful submissions of a user in Codechef. Just provide the username.

Webmuncher ⭐ 9

A web scraper/crawler in Java

Dhtcrawler ⭐ 9

Crawl torrents in Java

Quora Loader ⭐ 9

A realtime read-only locator and extraction library for Quora questions and answers.

Weather Mrs ⭐ 9

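Weather data collected by a crawler and distributed in real time through Kafka; Flume gathers the data and loads it into HBase, Hive is then mapped onto HBase, and Superset analyzes and displays the data.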

Nutch Indexer Discovery ⭐ 9

Watson Discovery Service indexing plugin for Apache Nutch

Analysis Platform for Developer Learning Resources

Atom Nuke ⭐ 8

Agrotagger ⭐ 8

This application indexes web documents, creating RDF triples that link a web URL to URIs of a SKOS thesaurus

An Android Client for LeetCode

Zhihu Crawler ⭐ 8

A simple ZhiHu Crawler using WebMagic

Dwtc Tools ⭐ 8

Dresden Web Table Corpus Java library

Chronicrawl ⭐ 8

Experimental continuous web crawler for web archiving

Adaptive Crawler ⭐ 8

Twitter Adaptive Crawler

Java implementation of the Internet Research Lab Web Crawler (IRLbot) as presented by Hsin-Tsang Lee, Derek Leonard, Xiaoming Wang, and Dmitri Loguinov in their paper "IRLbot: Scaling to 6 Billion Pages and Beyond"

Docker Codesearch ⭐ 8

Code Search on Fess

Gecco is a lightweight, easy-to-use web crawler developed in Java. Gecco integrates jsoup, httpcl request. If you like this crawler framework, please star or fork!

Web archive collection manager

Ukwa Heritrix ⭐ 8

The UKWA Heritrix3 custom modules and Docker builder.

Renren Analysis ⭐ 8

A project for crawling and visualizing data from renren.com

Leechcrawler ⭐ 8

Incremental crawling capabilities for Apache Tika. Crawl content out of e.g. file systems, http(s) sources (web crawling), imap(s) servers, or your own arbitrary data sources. LeechCrawler offers additional Tika parsers providing these crawling capabilities.

Baidu Search Result Crawler ⭐ 8

A crawler that fetches the content of Baidu search results.

Questionrecommendation ⭐ 8

Programming Questions Recommendation System (question recommendation system for Nowcoder)

Rxcrawler ⭐ 8

A Java crawler based on rx-java

Eksiseyler ⭐ 8

Sample MVP project uses jsoup-web-crawl like API

Vertx Crawler ⭐ 7

Web Crawler based on Vert.x

Ryanair crawler based on WebKit

Httrack2arc ⭐ 7

HTTrack2Arc is a tool that converts crawls made by HTTrack to Internet Archive ARC files.

Cis555 Project ⭐ 7

The final project for CIS555 at the University of Pennsylvania.

Fns_front ⭐ 7

🕶👨‍🎤👩‍🎤Fashion Network Service.

A sample spider application.

A simple Crawler-based search engine that demonstrates the main features of a search engine (web crawling, indexing and ranking) and the interaction between them using Java and a Web Interface.

Cloud Computing Search Engine ⭐ 7

A cloud-based web search engine that runs Hadoop MapReduce on Amazon EC2, consisting of a crawler, an indexer, and PageRank.

Venom Examples ⭐ 7

A basic web crawler example

Java Learn ⭐ 7

Fess Ds Atlassian ⭐ 7

DataStore Crawler for JIRA/Confluence

Movie Showtimes ⭐ 7

Web Service & Android Application to look up Vietnam movie showtimes

Crawlerzwei ⭐ 7

CZwei Crawler for www.2ch.net

🕷 a flexible web crawler framework

101-186 of 186 search results
