Awesome Open Source
Search results for elasticsearch crawler
62 search results found
Wooyun_public
⭐
3,701
This repo is archived. Thanks for wooyun! Crawler and search for wooyun.org public bugs (vulnerabilities) and drops (knowledge base).
Spring Boot Quick
⭐
2,282
🌿 Quick-start learning examples based on springboot, integrating open source frameworks I have come across, such as: rabbitmq (delayed queues), K
News Please
⭐
1,821
news-please - an integrated web crawler and information extractor for news that just works
Diskover Community
⭐
1,391
Diskover Community Edition - Open source file indexer, file search engine and data management and analytics powered by Elasticsearch
Fscrawler
⭐
1,279
Elasticsearch File System Crawler (FS Crawler)
Fess
⭐
943
Fess is very powerful and easily deployable Enterprise Search Server.
Ipfs Search
⭐
779
Search engine for the Interplanetary Filesystem.
Monocle
⭐
326
Monocle helps teams and individuals better organize daily duties and detect anomalies in the way changes are produced and reviewed.
Freshonions Torscraper
⭐
313
Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion
Gopa
⭐
281
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Elasticsearch River Web
⭐
232
Web Crawler for Elasticsearch
News Crawl
⭐
229
News crawling with StormCrawler - stores content as WARC
Weixin_crawler
⭐
209
Efficient crawler for WeChat official account article history and read-count data, powered by scrapy
Od Database
⭐
113
Distributed crawler, database and web frontend for public directories indexing
Bathyscaphe
⭐
83
Fast, highly configurable, cloud native dark web crawler.
Movie Elasticsearch
⭐
76
An open source movie search engine built with SpringBoot2.0+ElasticSearch
Dodder
⭐
71
A distributed DHT crawler that sniffs torrents from BitTorrent network
Docker Diskover
⭐
66
A Docker container for the Diskover space mapping application
Harvester
⭐
59
Web crawling and document processing through a usable interface.
Fishfishjump
⭐
57
Fish Fish Jump is a simple, basic search engine solution written in Python. 🐟 🐟 🐟
Gdht
⭐
48
A distributed self-host DHT torrent search suite
Anilist Crawler
⭐
43
Crawl data from anilist API and store in MariaDB.
Go Crawler Distributed
⭐
39
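A distributed crawler project. It supports secondary development of customized page parsers, uses a microservice architecture throughout, and passes messages through a message queue; gorm, goquery, easyjson, viper, amqp, zap, go-micro. Deployment is containerized with Docker, and the intermediate crawler nodes support horizontal scaling.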
Auctus
⭐
34
Dataset search engine, discovering data from a variety of sources, profiling it, and allowing advanced queries on the index
Openartbrowser
⭐
33
Exploring the world of arts using open data
Crawlerflow
⭐
30
Web Crawlers orchestration framework that lets you create datasets from multiple web sources using yaml configurations.
Playdrone Kitchen
⭐
28
Kitchen for the Google Play Crawler cluster
Trollhunter
⭐
27
Twitter Troll & Fake News Hunter - Crawls news websites and twitter to identify fake news
Goodsearcher
⭐
26
A pyLucene-based search module for searching books from goodreads.com
Crawler Project
⭐
25
Crawler project from "An In-Depth Explanation of the Go Language by a Senior Google Engineer".
Trend Monitoring
⭐
23
Real-time trend data analysis/monitoring system tremo
Deadpool
⭐
22
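A crawler application built on celery as its main framework; crawler tasks can be added flexibly, and crawlers for multiple sites can run at the same time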
Bthello App
⭐
21
Python3 DHT magnet-link torrent crawler, torrent parsing, torrent search, demo address
Gumo
⭐
21
A crawler that extracts data from a dynamic webpage. Written in Node.js.
Crawling Framework
⭐
21
Easily crawl news portals or blog sites using Storm Crawler.
Island
⭐
17
A distributed crawler project
Crawlerx
⭐
16
CrawlerX - an extensible, distributed, scalable crawler system: a web platform for crawling URLs over different kinds of protocols in a distributed way.
V2ex Crawler
⭐
15
A simple single-threaded crawler for V2EX
Information Retrieval
⭐
15
Elasticsearch, MongoDB, Tornado Server, RESTful API, Python, Information Retrieval, Machine Learning, Web Crawler
Mongo Elasticsearch Nutch
⭐
15
Docker image for creating a single Apache Nutch server, with mongodb as crawl storage and Elasticsearch for indexing
Bthello
⭐
14
Python3 DHT magnet-link torrent crawler, torrent parsing, torrent search, demo address
Ants
⭐
14
open source, distributed, restful crawler engine
Newscrawler
⭐
13
News crawler
Torture
⭐
13
FTP search based on Go! and ElasticSearch for the 31. Chaos Communication Congress
My_qa_robot
⭐
13
An AutoQA chatbot based on historical QA pairs and realized through local KB & online crawler
Horizonspider
⭐
13
The spider for ZeroNet search engine Horizon
Memex
⭐
12
Bitinsight
⭐
12
🌍 Bittorrent Network Overview through Infohash Indexing, Metadata and IP visualisations of the DHT network
Microcrawler Js
⭐
11
Scraping made easy...
Emotion_analysis_elastic_pytorch
⭐
11
Deep Emotion Analysis with Elastic and PyTorch
Filecrawler
⭐
11
File Crawler indexes files and searches for hard-coded credentials
Wsu Accessibility Collector
⭐
10
Scans and collects accessibility data for a given set of URLs
News Crawler
⭐
10
Crawler that collects and extracts content of daily published news articles
Bot Marvin
⭐
9
Highly scalable crawler with best features.
Estest
⭐
9
ElasticSearch test
Pandemic Knowledge
⭐
8
A fully-featured multi-source data pipeline for continuously extracting knowledge from COVID-19 data.
Weixin_crawler 1
⭐
8
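Efficient crawler for WeChat official account article history and read-count data, powered by scrapy. WeChat official account crawler, WeChat data collection, official account data collection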
Castroom
⭐
8
Podcast Search Engine
Govalert
⭐
8
hash
Kabutops
⭐
8
Dead simple yet powerful Ruby crawler for easy parallel crawling with support for anonymity.
Docker Nutch Elasticsearch Mongodb
⭐
8
Docker Image for Apache Nutch, Elasticsearch and MongoDB
Elastic Webcrawler
⭐
8
Golang Webcrawler for Elasticsearch
Oeh Search Etl
⭐
7
The Backend includes all data for the ETL process (Scrapy, Postgres, Elasticsearch)
Rubygems Crawler
⭐
7
A little utility to download rubygems.org information - used for an ElasticSearch demo at RubyConf 2013
I_love_indexes
⭐
7
Python Lcv Search Engine
⭐
7
Updated version of Python distributed crawler - a search engine. It uses the Google Chrome web browser as its principal user interface.
Jekyll Search Server
⭐
6
A standalone search crawler and API for Jekyll sites.
Skeleton X
⭐
6
🎉 An SSM scaffold based on Springboot; currently integrates spring-scurity, websocke
Visee
⭐
6
Just a typical search engine in this universe 🔥🔥🔥
Nutchelasticsearch
⭐
6
Systemanalysisdesign
⭐
6
Term Project repository for System Analysis and Design course in ITM, Seoultech.
Colid Indexing Crawler Service
⭐
5
The Indexing Crawler Service (ICS) repository is part of the Corporate Linked Data Catalog (COLID) application. It is responsible for extracting data from an RDF storage system, transforming and enriching the data, and finally sending it via a message queue to the DMP Webservice for indexing.
Elastic_microdata
⭐
5
example app for looking at data in elasticsearch based on a crawl of a site with microdata
Wsu Web Crawler
⭐
5
Crawls URLs for URLs and stores URLs in Elasticsearch.
House Finder
⭐
5
Web crawler, flat search engine and notification tool, that I use to find my new flat!
Capstone
⭐
5
Copyright 2018-2024 Awesome Open Source. All rights reserved.