Awesome Open Source

The Top 275 Data Open Source Projects

Sheetjs ⭐19,051
📗 SheetJS Community Edition -- Spreadsheet Data Toolkit
Metabase ⭐18,097
The simplest, fastest way to get business intelligence and analytics to everyone in your company 😋
Virgilio ⭐12,061
Your new Mentor for Data Science E-Learning.
G2 ⭐8,489
📊 The Grammar of Graphics in JavaScript
Awesome Bigdata ⭐8,331
A curated list of awesome big data frameworks, resources and other awesomeness.
Openrefine ⭐6,719
OpenRefine is a free, open source power tool for working with messy data and improving it
Chinese Xinhua ⭐6,536
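📙 Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.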
Raw ⭐6,467
We launched a crowdfunding campaign to develop a brand new version of RAWGraphs
Machine Learning Mindmap ⭐4,272
A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.
Countly Server ⭐4,249
Countly helps you get insights from your application. Available self-hosted or on private cloud.
Knowledge Repo ⭐4,053
A next-generation curated knowledge sharing platform for data scientists and other technical professions.
Bad Data Guide ⭐3,619
An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.
React Refetch ⭐3,336
A simple, declarative, and composable way to fetch data for React components
Tabulator ⭐2,950
Interactive Tables and Data Grids for JavaScript
Data Transfer Project ⭐2,800
The Data Transfer Project makes it easy for people to transfer their data between online service providers. We are establishing a common framework, including data models and protocols, to enable direct transfer of data both into and out of participating online service providers.
Mimesis ⭐2,561
Mimesis is a package for Python, which helps generate big volumes of fake data for a variety of purposes in a variety of languages.
React Query ⭐2,543
⚛️ Hooks for fetching, caching and updating asynchronous data in React
Ckan ⭐2,476
CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers many sites.
Bogus ⭐2,453
📇 A simple and sane fake data generator for C#, F#, and VB.NET. Based on and ported from the famed faker.js.
Browser Compat Data ⭐2,438
This repository contains compatibility data for Web technologies as displayed on MDN
Aresdb ⭐2,185
A GPU-powered real-time analytics storage and query engine.
Weld ⭐2,076
High-performance runtime for data analytics applications
Fake2db ⭐1,998
create custom test databases that are populated with fake data
Onyx ⭐1,929
Distributed, masterless, high performance, fault tolerant data processing
Altair ⭐1,882
✨⚡️ A beautiful feature-rich GraphQL Client for all platforms.
Awesome Json Datasets ⭐1,859
A curated list of awesome JSON datasets that don't require authentication.
Scio ⭐1,725
A Scala API for Apache Beam and Google Cloud Dataflow.
Tera ⭐1,724
An Internet-Scale Database.
Datasets ⭐1,699
A collection of datasets ready to use with TensorFlow
Illacceptanything ⭐1,627
The project where literally anything* goes.
Data Populator ⭐1,614
A plugin for Sketch and Adobe XD to populate your design mockups with meaningful data. Goodbye Lorem Ipsum. Hello JSON.
Pyfunctional ⭐1,595
Python library for creating data pipelines with chain functional programming
Generatedata ⭐1,534
Random data generator in JS, PHP and MySQL.
Kea ⭐1,510
Data Layer for React. Powered by Redux.
Riko ⭐1,494
A Python stream processing engine modeled after Yahoo! Pipes
Data ⭐1,483
Assorted data from the General Services Administration.
Stats ⭐1,413
A well tested and comprehensive Golang statistics library package with no dependencies.
Just Dashboard ⭐1,399
📊 📋 Dashboards using YAML or JSON files
Data Integration ⭐1,355
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Cubes ⭐1,306
Light-weight Python OLAP framework for multi-dimensional data analysis
Kiba ⭐1,267
Data processing & ETL framework for Ruby
Pandas Datareader ⭐1,266
Extract data from a wide range of Internet sources into a pandas DataFrame.
Zhihu Oauth ⭐1,237
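An attempt to reverse-engineer Zhihu's officially unpublished OAuth2 API and provide an elegant way to use it, as a replacement for the zhihu-py3 project. Currently still experimental.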
Core ⭐1,184
Open source Dota 2 data platform
Js Xls ⭐1,131
❌ XLS (Excel 95-2004) + XML 2003 parser (now merged in )
Glom ⭐1,089
☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
Iso 3166 Countries With Regional Codes ⭐1,056
ISO 3166-1 country lists merged with their UN Geoscheme regional codes in ready-to-use JSON, XML, CSV data sets
Athenax ⭐1,056
SQL-based streaming analytics platform at scale
Muze ⭐1,053
Composable data visualisation library for web with a data-first approach
Resonance ⭐1,022
◾️Resonance | 5kb React animation library
Deeplearning Mindmap ⭐1,013
A mindmap summarising Deep Learning concepts.
Codesearchnet ⭐1,002
Datasets, tools, and benchmarks for representation learning of code.
Proteus ⭐977
Proteus : A JSON based LayoutInflater for Android
Textrecognitiondatagenerator ⭐970
A synthetic data generator for text recognition
Pycm ⭐864
Multi-class confusion matrix library in Python
Quilt ⭐847
Quilt is a versioned data portal for AWS
Rest Hooks ⭐847
Delightful data fetching for React.
Graph ⭐821
Graph is a semantic database that is used to create data-driven applications.
Colour ⭐805
Colour Science for Python
Sensei Grid ⭐799
Simple and lightweight data grid in JS/HTML
Web ⭐751
React web interface for the OpenDota platform
Gofakeit ⭐748
Random fake data generator written in Go
Data Forge Ts ⭐748
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Dataframes.jl ⭐705
In-memory tabular data in Julia
Pypika ⭐689
PyPika is a Python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.
Rows ⭐626
A common, beautiful interface to tabular data, no matter the format
Listen To Wikipedia ⭐602
Live, generative music from Wikipedia edits
Valid.js ⭐594
📝 A library for data validation.
Gray Matter ⭐579
Smarter YAML front matter parser, used by metalsmith, Gatsby, Netlify, Assemble, mapbox-gl, phenomic, and many others. Simple to use, and battle tested. Parses YAML by default but can also parse JSON Front Matter, Coffee Front Matter, TOML Front Matter, and has support for custom parsers.
Datafusion ⭐579
DataFusion has now been donated to the Apache Arrow project
Atscan ⭐574
Advanced dork Search & Mass Exploit Scanner
Mcw ⭐570
Microsoft Cloud Workshop Project
F# Data ⭐559
F# Data: Library for Data Access
Faker ⭐553
Faker is a pure Elixir library for generating fake data.
Datasheets ⭐551
Read data from, write data to, and modify the formatting of Google Sheets
Terriajs ⭐543
A library for building rich, web-based geospatial data explorers.
Panini ⭐516
A super simple flat file generator.
Kakajson ⭐516
Fast conversion between JSON and model in Swift.
Vad ⭐454
Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.
Footballdata ⭐442
A hodgepodge of JSON and CSV Football/Soccer data
Pyjanitor ⭐422
Clean APIs for data cleaning. Python implementation of R package Janitor
Fetch ⭐409
Simple & Efficient data access for Scala and Scala.js
Rio ⭐406
A Swiss-Army Knife for Data I/O
Pdpipe ⭐402
Easy pipelines for pandas DataFrames.
Isp Data Pollution ⭐393
ISP Data Pollution to Protect Private Browsing History with Obfuscation
Sklearn Classification ⭐368
Data Science Notebook on a Classification Task, using sklearn and Tensorflow.
Disk.frame ⭐365
Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data
Agile_data_code_2 ⭐365
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Pybaseball ⭐363
Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
Samples ⭐362
Sample projects using Material, Graph, and Algorithm.
Awesome Ai Ml Dl ⭐360
Awesome Artificial Intelligence, Machine Learning and Deep Learning as we learn it. Study notes and a curated list of awesome resources of such topics.
Meza ⭐359
A Python toolkit for processing tabular data
Datacurator Filetree ⭐348
a standard filetree for /r/datacurator [ and r/datahoarder ]
J ⭐341
❌ Multi-format spreadsheet CLI (now merged in )
Anon ⭐335
A UNIX Command To Anonymise Data
Iexfinance ⭐325
Python SDK for IEX Cloud
Data ⭐324
This repository contains general data for Web technologies
Featran ⭐323
A Scala feature transformation library for data science and machine learning
Keypathkit ⭐319
KeyPathKit is a library that provides the standard functions to manipulate data along with a call-syntax that relies on typed keypaths to make the call sites as short and clean as possible.
Migration_data ⭐303
The solution to keep your Rails ActiveRecord migrations up to date
1-100 of 275 projects