Awesome Open Source
Search results for data processing
355 search results found
Miller
⭐
8,397
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Bash Oneliner
⭐
7,946
A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
Awesome Web Scraping
⭐
6,060
List of libraries, tools and APIs for web scraping and data processing.
Dali
⭐
4,926
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
Dasel
⭐
4,845
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
Pandera
⭐
3,012
A light-weight, flexible, and expressive statistical data testing library
Dialogpt
⭐
2,314
Large-scale pretraining for dialogue
Broadway
⭐
2,307
Concurrent and multi-stage data ingestion and data processing with Elixir
Texar
⭐
2,008
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow
Bonobo
⭐
1,548
Extract Transform Load for Python 3.5+
Cascalog
⭐
1,378
Data processing on Hadoop without the hassle.
Data Science On Gcp
⭐
1,249
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Satpy
⭐
980
Python package for earth-observing satellite data processing
Bytewax
⭐
957
Python Stream Processing
Numaflow
⭐
866
Kubernetes-native platform to run massively parallel data/streaming jobs
Dataflowjavasdk
⭐
853
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Godel
⭐
834
Large-scale pretrained models for goal-directed dialog
Texar Pytorch
⭐
711
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
Hstream
⭐
668
HStreamDB is an open-source, cloud-native streaming database for IoT and beyond. Modernize your data stack for real-time applications.
Qualitis
⭐
631
Qualitis is a one-stop data quality management platform that supports quality verification, notification, and management for various data sources. It is used to solve various data quality problems caused by data processing. https://github.com/WeBankFinTech/Qualitis
Xidel
⭐
611
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
Collapse
⭐
556
Advanced and Fast Data Transformation in R
Awesome Kafka
⭐
549
A list about Apache Kafka
Haupt
⭐
451
Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon
Eternal
⭐
433
👾~ music, eternal ~ 👾
Text Dedup
⭐
399
All-in-one text de-duplication
Redisgears
⭐
339
Dynamic execution framework for your Redis data
Amadeus
⭐
334
Harmonious distributed data analysis in Rust.
Etl
⭐
327
PHP - ETL (Extract Transform Load) data processing library
Nonechucks
⭐
315
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Lithops
⭐
306
A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
Dolma
⭐
302
Data and tools for generating and inspecting OLMo pre-training data.
Covid Model
⭐
296
Fondant
⭐
293
Production-ready data processing made easy and shareable
Rapidtables
⭐
284
Super fast list of dicts to pre-formatted tables conversion library for Python 2/3
Pulsar Flink
⭐
255
Elastic data processing with Apache Pulsar and Apache Flink
Pysparkling
⭐
253
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Scramjet
⭐
243
Public tracker for Scramjet Cloud Platform, a platform that brings data from many environments together.
Substation
⭐
242
Substation is a cloud-native, event-driven data pipeline toolkit built for security teams.
Machine Learning Notebooks
⭐
241
Machine Learning notebooks for refreshing concepts.
50 Days Of Ml
⭐
237
A day-to-day plan for this challenge (50 Days of Machine Learning). Covers both theoretical and practical aspects.
Padasip
⭐
236
Python Adaptive Signal Processing
Vaspy
⭐
220
Manipulating VASP files with Python.
Pxi
⭐
217
🧚pxi (pixie) is a small, fast, and magical command-line data processor similar to jq, mlr, and awk.
Forte
⭐
215
Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/
Batchflow
⭐
195
BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.
Sepal
⭐
191
Geographical Data Processing in the Cloud
Mech
⭐
189
🦾 Main repository for the Mech programming language. Start here!
Dataflows
⭐
182
DataFlows is a simple, intuitive lightweight framework for building data processing flows in python.
Vue Datagrid
⭐
179
Spreadsheet data grid component. Handles enormous data processing.
Convtools Ita
⭐
176
convtools is a python library to declaratively define conversions for processing collections, doing complex aggregations and joins.
Snap Engine
⭐
175
ESA Earth Observation Toolbox and Java Development Platform
Catmandu
⭐
170
Catmandu - a data processing toolkit
Incubator Wayang
⭐
162
Apache Wayang (incubating) is the first cross-platform data processing system.
Salem
⭐
161
Add geolocalised subsetting, masking, and plotting operations to xarray
Brutalityextractor
⭐
160
A multiprocess decompression tool for high-performance systems
Skills Ml
⭐
159
Data Processing and Machine learning methods for the Open Skills Project
Spatialspark
⭐
141
Big Spatial Data Processing using Spark
Cq
⭐
139
Clojure Command-line Data Processor for JSON, YAML, EDN, XML and more
Machine_learning_a Z
⭐
130
Learning to create Machine Learning Algorithms
Rsgislib
⭐
130
Remote Sensing and GIS Software Library; python module tools for processing spatial data.
Snap Desktop
⭐
129
Desktop GUI for SNAP based on NetBeans Platform
Data Processing Agreements
⭐
127
Collection of Data Processing Agreement (DPA) and GDPR compliance resources
Splitpipeline
⭐
122
Parallel Data Processing in PowerShell
Sayn
⭐
117
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Shadow Experiments
⭐
108
WIP. Do not use.
Distributed Dataset
⭐
107
A distributed data processing framework in Haskell.
Libertem
⭐
104
Open pixelated STEM framework
Pulsar Spark
⭐
103
Spark Connector to read and write with Pulsar
Dampr
⭐
101
Python Data Processing library
Blinkist M4a Downloader
⭐
97
Grabs all of the audio files from all of the Blinkist books
Biofsharp
⭐
94
Open source bioinformatics and computational biology toolbox written in F#.
Alfa
⭐
93
♿ Suite of open and standards-based tools for performing reliable accessibility conformance testing at scale
Cotk
⭐
93
Conversational Toolkit. An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation
Dolphinnext
⭐
92
A graphical user interface for distributed data processing of high throughput genomics
Breast Cancer Risk Prediction
⭐
83
Classification of Breast Cancer diagnosis Using Support Vector Machines
2019 Electronic Design Competition
⭐
70
[Electronic Design Contest] 2019 National Undergraduate Electronic Design Contest (Problem F): paper sheet counting device (based on STM32F407 & FDC2214 & USART HMI)
Financial Statement Pdf Extractor
⭐
70
Python script to extract as much structured information as possible from annual/quarterly reports.
Vip
⭐
68
VIP is a python package/library for angular, reference star and spectral differential imaging for exoplanet/disk detection through high-contrast imaging.
Deep Learn Oil
⭐
68
Deep learning tools for predicting oil well data
Ijcai18 Mama Ads Competition
⭐
68
IJCAI-18 Alimama search ad conversion prediction: preliminary-round solution
Perke
⭐
67
A keyphrase extractor for Persian
Cbrain
⭐
65
CBRAIN is a flexible Ruby on Rails framework for accessing and processing large data on high-performance computing infrastructures.
Mdsplus
⭐
62
The MDSplus data management system
Ipython Notebooks
⭐
61
Smartcitiesdata
⭐
59
The core micro services of UrbanOS as an umbrella project with component documentation
Machine Learning For Solar Energy Prediction
⭐
57
Predict the Power Production of a solar panel farm from Weather Measurements using Machine Learning
Processor
⭐
57
Ontology-driven Linked Data processor and server for SPARQL backends. Apache License.
Nodium
⭐
56
Nodium is an easy-to-use data analysis and automation platform using Rust with a visual node-based interface. It includes a plugin browser for downloading extensions, making it versatile for a wide range of data manipulation tasks. No coding experience required.
Tubes
⭐
55
A series of tubes.
Pyseis
⭐
54
Pure python seismic data processing
Sentieon Scripts
⭐
53
Helper scripts for biological data processing from Sentieon
Prosto
⭐
53
Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
Data_processing_course
⭐
53
Some class materials for a data processing course using PySpark
Pyint
⭐
52
Python- and GAMMA-based interferometry toolbox for processing single or time-series InSAR data.
Blaze
⭐
52
A blazing fast exporter for your Elasticsearch data.
Storm Docker
⭐
52
Docker image packaging for Apache Storm
Tqdj
⭐
51
A progress bar that plays lofi music
Go Dataframe
⭐
50
A simple package to abstract away the process of creating usable DataFrames for data analytics. This package is heavily inspired by the amazing Python library, Pandas.
Lrs3 For Speech Separation
⭐
49
Multi-modal speech separation task data generation script on LRS3 data set.
1-100 of 355 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.