Awesome Open Source

Search results for elasticsearch crawler

62 search results found

Wooyun_public ⭐ 3,701

This repo is archived. Thanks for wooyun! Crawler and search for WooYun public vulnerabilities and knowledge base (wooyun.org public bugs and drops).

Spring Boot Quick ⭐ 2,282

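🌿 Quick-start learning examples based on Spring Boot, integrating open-source frameworks the author has come across, such as: rabbitmq (delayed queues), K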

News Please ⭐ 1,821

news-please - an integrated web crawler and information extractor for news that just works

Diskover Community ⭐ 1,391

Diskover Community Edition - Open source file indexer, file search engine and data management and analytics powered by Elasticsearch

Fscrawler ⭐ 1,279

Elasticsearch File System Crawler (FS Crawler)

Fess is very powerful and easily deployable Enterprise Search Server.

Ipfs Search ⭐ 779

Search engine for the Interplanetary Filesystem.

Monocle ⭐ 326

Monocle helps teams and individuals better organize daily duties and detect anomalies in the way changes are produced and reviewed.

Freshonions Torscraper ⭐ 313

Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion

[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn

Elasticsearch River Web ⭐ 232

Web Crawler for Elasticsearch

News Crawl ⭐ 229

News crawling with StormCrawler - stores content as WARC

Weixin_crawler ⭐ 209

Efficient crawler for WeChat official account historical articles and read-count data, powered by scrapy

Od Database ⭐ 113

Distributed crawler, database and web frontend for public directories indexing

Bathyscaphe ⭐ 83

Fast, highly configurable, cloud native dark web crawler.

Movie Elasticsearch ⭐ 76

Open-source movie search engine built with SpringBoot2.0 + ElasticSearch

A distributed DHT crawler that sniffs torrents from BitTorrent network

Docker Diskover ⭐ 66

A Docker container for the Diskover space mapping application

Harvester ⭐ 59

Web crawling and document processing through a usable interface.

Fishfishjump ⭐ 57

Fish Fish Jump is a simple, basic search engine solution written in Python. 🐟 🐟 🐟

A distributed self-host DHT torrent search suite

Anilist Crawler ⭐ 43

Crawl data from anilist API and store in MariaDB.

Go Crawler Distributed ⭐ 39

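Distributed crawler project. It supports custom page parsers for secondary development, uses a microservice architecture overall, passes messages through a message queue (gorm, goquery, easyjson, viper, amqp, zap, go-micro), is deployed in containers via Docker, and its intermediate crawler nodes scale horizontally.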

Dataset search engine, discovering data from a variety of sources, profiling it, and allowing advanced queries on the index

Openartbrowser ⭐ 33

Exploring the world of arts using open data

Crawlerflow ⭐ 30

Web Crawlers orchestration framework that lets you create datasets from multiple web sources using yaml configurations.

Playdrone Kitchen ⭐ 28

Kitchen for the Google Play Crawler cluster

Trollhunter ⭐ 27

Twitter Troll & Fake News Hunter - Crawls news websites and twitter to identify fake news

Goodsearcher ⭐ 26

A pyLucene-based search module for searching books from goodreads.com

Crawler Project ⭐ 25

Go crawler project from an in-depth Go course taught by a senior Google engineer.

Trend Monitoring ⭐ 23

Real-time trend data analysis/monitoring system tremo

Deadpool ⭐ 22

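A crawler application built on celery as its main framework; crawler tasks can be added flexibly, and crawlers for multiple sites can run at the same time.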

Bthello App ⭐ 21

Python3 DHT magnet-link torrent crawler, torrent parsing, torrent search; demo address

A crawler that extracts data from a dynamic webpage. Written in node js.

Crawling Framework ⭐ 21

Easily crawl news portals or blog sites using Storm Crawler.

A distributed crawler project

Crawlerx ⭐ 16

CrawlerX - an extensible, distributed, scalable crawler system: a web platform for crawling URLs over different kinds of protocols in a distributed way.

V2ex Crawler ⭐ 15

A simple single-threaded crawler for V2EX

Information Retrieval ⭐ 15

Elasticsearch, MongoDB, Tornado Server, RESTful API, Python, Information Retrieval, Machine Learning, Web Crawler

Mongo Elasticsearch Nutch ⭐ 15

Docker image for creating a single Apache Nutch server, with mongodb as crawl storage and Elasticsearch for indexing

Python3 DHT magnet-link torrent crawler, torrent parsing, torrent search; demo address

open source, distributed, restful crawler engine

Newscrawler ⭐ 13

FTP search based on Go! and ElasticSearch for the 31. Chaos Communication Congress

My_qa_robot ⭐ 13

An AutoQA chatbot based on historical QA pairs and realized through local KB & online crawler

Horizonspider ⭐ 13

The spider for ZeroNet search engine Horizon

Bitinsight ⭐ 12

🌍 Bittorrent Network Overview through Infohash Indexing, Metadata and IP visualisations of the DHT network

Microcrawler Js ⭐ 11

Scraping made easy...

Emotion_analysis_elastic_pytorch ⭐ 11

Deep Emotion Analysis with Elastic and PyTorch

Filecrawler ⭐ 11

File Crawler indexes files and searches for hard-coded credentials

Wsu Accessibility Collector ⭐ 10

Scans and collects accessibility data for a given set of URLs

News Crawler ⭐ 10

Crawler that collects and extracts content of daily published news articles

Bot Marvin ⭐ 9

Highly scalable crawler with best features.

ElasticSearch test

Pandemic Knowledge ⭐ 8

A fully-featured multi-source data pipeline for continuously extracting knowledge from COVID-19 data.

Weixin_crawler 1 ⭐ 8

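Efficient crawler for WeChat official account historical articles and read-count data, powered by scrapy. WeChat official account crawler, WeChat scraping, official account scraping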

Podcast Search Engine

Dead simple yet powerful Ruby crawler for easy parallel crawling with support for anonymity.

Docker Nutch Elasticsearch Mongodb ⭐ 8

Docker Image for Apache Nutch, Elasticsearch and MongoDB

Elastic Webcrawler ⭐ 8

Golang Webcrawler for Elasticsearch

Oeh Search Etl ⭐ 7

The Backend includes all data for the ETL process (Scrapy, Postgres, Elasticsearch)

Rubygems Crawler ⭐ 7

A little utility to download rubygems.org information - used for an ElasticSearch demo at RubyConf 2013

I_love_indexes ⭐ 7

Python Lcv Search Engine ⭐ 7

Updated version of a Python distributed crawler: a search engine. It uses the Google Chrome web browser as its principal user interface.

Jekyll Search Server ⭐ 6

A standalone search crawler and API for Jekyll sites.

Skeleton X ⭐ 6

🎉 SSM scaffold based on Springboot; so far it integrates spring-scurity, websocke

Just a typical search engine in this universe 🔥🔥🔥

Nutchelasticsearch ⭐ 6

Systemanalysisdesign ⭐ 6

Term Project repository for System Analysis and Design course in ITM, Seoultech.

Colid Indexing Crawler Service ⭐ 5

The Indexing Crawler Service (ICS) repository is part of the Corporate Linked Data Catalog (COLID for short) application. It is responsible for extracting data from an RDF storage system, transforming and enriching the data, and finally sending it via a message queue to the DMP Webservice for indexing.

Elastic_microdata ⭐ 5

Example app for looking at data in Elasticsearch based on a crawl of a site with microdata

Wsu Web Crawler ⭐ 5

Crawls URLs for URLs and stores URLs in Elasticsearch.

House Finder ⭐ 5

Web crawler, flat search engine and notification tool, that I use to find my new flat!
