Awesome Open Source
Search results for data extraction
130 search results found
Flashtext
⭐
5,463
Extract keywords from sentences or replace keywords in sentences.
Optimus
⭐
1,446
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Pdflayouttextstripper
⭐
1,390
Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf file for instance. This is a subclass of PDFTextStripper class (from the Apache PDFBox library).
Hacker News Digest
⭐
620
📰 Let ChatGPT Summarize Hacker News for You
Npm Pdfreader
⭐
522
🚜 Parse text and tables from PDF files.
Amazoncaptcha
⭐
368
Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.
Vnstock
⭐
274
A powerful Python library for getting rich data from the Vietnam Stock Market using just a few lines of code
Algoliasearch Netlify
⭐
258
Official Algolia Plugin for Netlify. Index your website to Algolia when deploying your project to Netlify with the Algolia Crawler
Tablenet
⭐
168
Unofficial implementation of "TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images"
Infoboxer
⭐
166
Wikipedia information extraction library
Sypht Python Client
⭐
163
A Python client for the Sypht API
Ralger
⭐
145
ralger makes it easy to scrape a website. Built on the shoulders of titans: rvest, xml2.
Mastiff
⭐
139
Malware static analysis framework
Binarypig
⭐
133
Scalable Binary Data Extraction in Hadoop
Newspaper3_usage_overview
⭐
120
This repository provides usage examples for the Python module Newspaper3k.
Sayn
⭐
117
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Go Htmltable
⭐
110
Structured HTML table data extraction from URLs in Go that has almost no external dependencies
Plotdigitizer
⭐
94
A Python utility to digitize plots.
Benchmarks
⭐
93
Benchmarking PDF libraries
Sypht Java Client
⭐
92
A Java client for the Sypht API
Maldrolyzer
⭐
87
Simple framework to extract "actionable" data from Android malware (C&Cs, phone numbers etc.)
Cyac
⭐
86
High-performance Trie and Aho-Corasick automata (AC automata) Keyword Match & Replace Tool for Python
Flash
⭐
82
Golang Keyword extraction/replacement Datastructure using Tries instead of regexes
Line Segmentation Algorithm To Gcp Vision
⭐
81
Line segmentation algorithm for Google Vision API.
Google Covid19 Mobility Reports
⭐
79
Data extraction of Google's COVID-19 Mobility Reports
Digitizer
⭐
71
R package to extract data from plots and other images. Hosts WebPlotDigitizer locally.
Format_parser
⭐
59
file metadata parsing, done cheap
Clauneck
⭐
57
A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.
Ricloud
⭐
57
Python client for Reincubate's ricloud API. Yes, it works with iOS 13 & iPhone 11 backups!
Sde
⭐
50
Structured Data Extractor. An application to extract structured data from web pages. It uses Data Extraction Based on Partial Tree Alignment (DEPTA) method. (UPDATE: I implemented a newer algorithm: https://github.com/seagatesoft/webdext)
Hext
⭐
48
Domain-specific language for extracting structured data from HTML documents
Itemloaders
⭐
41
Library to populate items using XPath and CSS with a convenient API
Typestream
⭐
39
⚡️ Next-generation data transformation framework for TypeScript that puts developer experience first
Refinery
⭐
37
Refinery is a tool to extract and transform semi-structured data from Excel spreadsheets of different layouts in a declarative way.
Jsonpath
⭐
36
A query expression for extracting data from JSON.
Goscrapy
⭐
34
GoScrapy: Harnessing Go's power for efficient web scraping, inspired by Python's Scrapy framework.
Sypht Golang Client
⭐
34
A Golang client for the Sypht API
Fwdataviz
⭐
33
Fixed Width Data Visualizer plugin for Notepad++. Turns Notepad++ into Excel for fixed-width data files. Displays cursor position data. Jumps to specific fields. Folding Record Blocks. Extracts Data. Builtin dialogs to configure file-type, record-type & fields; Themes & Colors; and Folding. Handles homogenous, mixed & multi-line records.
Scrapemate
⭐
33
Scraping assistant tool. Editing and maintaining CSS/XPath selectors across webpages.
Docwire
⭐
31
DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boost efficiency in text extraction, web data extraction, data mining, document analysis. Offline processing is possible for security and confidentiality
Google Search Results Java
⭐
30
Google Search Results JAVA API via SerpApi
Scraping Medium And Data Analysis
⭐
30
This repository contains code built to scrape Medium and analyse the scraped data
Filtr
⭐
28
Data_extractor
⭐
26
Combine XPath, CSS Selectors and JSONPath for Web data extracting.
Newshound
⭐
25
This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around the world in over 50 languages.
Tinvois Parser
⭐
25
Extract receipt info
Kantan.regex
⭐
25
Regular expression library for Scala
Table Extractor From Image
⭐
25
This repository contains the code that extracts a table from an image and exports it to an Excel.
Msrc Dpu Learning To Represent Edits
⭐
24
C# Data Extraction for "Learning to Represent Edits"
Pyandriller
⭐
23
Forensic data extraction and decoding tool for Android devices
Awesome Receipt Data Extraction
⭐
23
A curated list (and summaries) of awesome research publications on topic of data extraction from photos of receipts.
Kitinerary
⭐
22
Data Model and Extraction System for Travel Reservation information
Articdata
⭐
21
Collection of data extracted from Minecraft.
Web Scraping With Selenium
⭐
20
In this guide on how to web scrape with Selenium, we will be using Python 3. The code should work with any version of Python above 3.6
Wiktionary De Parser
⭐
20
Extract data from German Wiktionary XML files. Allows you to add your own extraction methods 🚀
Spandex
⭐
19
Spatial Analysis and Data Extraction
Svg2data
⭐
19
A Python module for reading data from a plot provided as SVG file.
Purlovia
⭐
18
Project Purlovia - digging up Ark data
Githubuserdataextractor
⭐
17
A tool that displays information and received events about any user on GitHub straight on your terminal screen
Google Maps Scraper
⭐
17
Google Maps scraper with GUI
Datacollection
⭐
17
Data collection, alignment and TAUS repository
Taupe
⭐
17
Taupe takes a downloaded Twitter archive ZIP file, extracts the URLs corresponding to tweets, retweets, replies, quote tweets, and liked tweets, and outputs the results in a comma-separated values (CSV) format that you can use with other software tools.
Jsonknife
⭐
16
Useful functions for JSONB in PostgreSQL
Musical Onset Efficient
⭐
15
Supplementary information and code for the paper: An efficient deep learning model for musical onset detection
Android Forensic Timeline
⭐
15
Forensic timeline generation on Android platform
Etymdb
⭐
14
An Etymological DataBase (v2.1) - described in the LREC paper Methodological Aspects of Developing and Managing an Etymological Lexical Resource: Introducing EtymDB-2.0
Groundhog
⭐
14
A framework for crawling GitHub projects and raw data and to extract metrics from them
Cafr Parsing
⭐
14
Automated data extraction from U.S. state Comprehensive Annual Financial Reports (CAFR).
Kick Off Web Scraping Python Selenium Beautifulsoup
⭐
13
A tutorial-based introduction to web scraping with Python.
Webmiddle
⭐
13
Node.js framework for modular web scraping and data extraction
Pdf Miner
⭐
13
Python-based crawler to mine PDFs from websites and extract useful features for data extraction
Xxblind
⭐
12
eXtremely fast data eXtraction via blind SQL injection
X4_complexcalculator
⭐
12
Complex calculator for X4: Foundations
Smartmuv
⭐
12
An EVM-compatible Solidity Smart Contract Storage/Slot Analyzer and Data Extractor.
Paleofire
⭐
12
paleofire package
Nimdataframe
⭐
11
Dataframe for Nim
Ofx Data Extractor
⭐
11
A module written in TypeScript that provides a utility to extract data from an OFX file in Node.js and Browser
Endominer
⭐
11
Endoscopic and pathological data extraction
Moroccanhousing Etl
⭐
11
Moroccan housing data pipeline using Scrapy, MongoDB, Zyte and DigitalOcean cloud
Azure Data Factory
⭐
11
Learn ETL/ELT data management
Book Depository Dataset
⭐
10
A large collection of books, scraped from bookdepository.com
Ferretapi
⭐
10
🐳 Expose an api for headless chrome and ferret
Lineex
⭐
10
Data Extraction from Scientific Line Charts
Osintifyx
⭐
10
OsintifyX: Powerful Open-source OSINT tool for extracting valuable information from Instagram profiles.
Scrappey Wrapper Python
⭐
10
An API wrapper for Scrappey.com written in Python (cloudflare bypass & solver)
Stellaris Map Generation
⭐
9
Extracts geopolitical data from Stellaris save game files
Justrefs
⭐
9
Just Refs - extract just the references and related topics from any page on the English Wikipedia
Autodias
⭐
9
autoDIAS: a python tool for an automated Distortion/Interaction Activation Strain Analysis
Sypht Node Client
⭐
9
A Node.js client for the Sypht API
Scrape
⭐
9
When you need those jobs hypersonic 🚀 scrape 🔪
Filmweb Export
⭐
9
Data export from the Filmweb service
Inparse
⭐
8
Open Collaborative AI Driven Parser builder for Web Scraping, Data Extraction and Crawling, Knowledge Graph
Exif
⭐
8
ExifTool is a powerful command-line tool that can be used to extract and edit metadata in a wide range of media files, including images, audio, and video. Metadata is information that is stored within a file that describes the file’s content or other attributes.
Awesome Scrapy
⭐
8
🕶 Awesome list of Scrapy tools and libraries
Mbuild
⭐
8
General purpose ROM hacking data extraction and insertion tool SNES/SFC, NES, and others
Gotz
⭐
8
Gotz - Heavy duty ETL to automate data extraction from tons of HTML pages
Financial Documents Ocr Deep Learning
⭐
8
Travisminer
⭐
7
Data extraction and mining scripts for Travis CI data
Capstoneproject_house_prices_prediction
⭐
7
Understand the relationships between various features in relation with the sale price of a house using exploratory data analysis and statistical analysis. Applied ML algorithms such as Multiple Linear Regression, Ridge Regression and Lasso Regression in combination with cross validation. Performed parameter tuning, compared the test scores and suggested a best model to predict the final sale price of a house. Seaborn is used to plot graphs and scikit learn package is used for statistical analysi
Sypht Csharp Client
⭐
7
A C# / .NET client for the Sypht API
1-100 of 130 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.