Awesome Open Source
Search results for dataset data science
165 search results found
Akshare
⭐
8,269
AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.
Cleanlab
⭐
8,182
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Deeplake
⭐
7,689
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
Fiftyone
⭐
6,327
The open-source tool for building high-quality datasets and computer vision models
Sql Translator
⭐
3,842
SQL Translator is a tool for converting natural language queries into SQL code using artificial intelligence. This project is 100% free and open source.
Igel
⭐
3,037
a delightful machine learning tool that allows you to train, test, and use models without writing code
Whylogs
⭐
2,533
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
Gopup
⭐
2,451
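Data interfaces: Baidu, Google, Toutiao, and Weibo indices; macroeconomic data; interest rate data; currency exchange rates; "Qianlima" and unicorn companies; Xinwen Lianbo news transcripts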
Datascience Pizza
⭐
2,199
🍕 Repository for gathering information on study materials in data analysis and related fields, companies that work with data, and a dictionary of concepts
Codesearchnet
⭐
2,054
Datasets, tools, and benchmarks for representation learning of code.
Osint_collection
⭐
1,821
Maintained collection of OSINT related resources. (All Free & Actionable)
Diffgram
⭐
1,772
The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.
Spark Py Notebooks
⭐
1,515
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Uncertainty Baselines
⭐
1,324
High-quality implementations of standard and SOTA methods on a variety of tasks.
Dataprofiler
⭐
1,310
What's in your data? Extract schema, statistics and entities from datasets
Machine Learning With Python
⭐
1,155
Small-scale machine learning projects to understand the core concepts. Give a star 🌟 if it helps you. BONUS: Interview Bank coming up!
Graph Fraud Detection Papers
⭐
1,148
A curated list of graph-based fraud, anomaly, and outlier detection papers & resources
Qri
⭐
1,053
you're invited to a data party!
Data Juicer
⭐
994
A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Pydataset
⭐
853
Instant access to many datasets in Python.
Awesome Twitter Data
⭐
847
A list of Twitter datasets and related resources.
Datasets For Recommender Systems
⭐
821
A repository of high-quality, topic-centric public data sources for Recommender Systems (RS)
Datastream.io
⭐
761
An open-source framework for real-time anomaly detection using Python, ElasticSearch and Kibana
Tech.ml.dataset
⭐
616
A Clojure high performance data processing system
Datasets
⭐
521
A repository of pretty cool datasets that I collected for network science and machine learning research.
Complete Life Cycle Of A Data Science Project
⭐
499
Complete-Life-Cycle-of-a-Data-Science-Project
Oie Resources
⭐
435
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Dgfraud
⭐
432
A Deep Graph-based Toolbox for Fraud Detection
Carefree Learn
⭐
400
Deep Learning ❤️ PyTorch
Eseur Code Data
⭐
396
Code and data used to create the examples in "Evidence-based Software Engineering based on the publicly available data"
Data Science Hacks
⭐
300
Data Science Hacks consists of tips and tricks to help you become a better data scientist. The hacks are for everyone, from beginner to advanced, and cover Python, Jupyter Notebook, pandas, and so on.
Opendatasets
⭐
290
A Python library for downloading datasets from Kaggle, Google Drive, and other online sources.
Datascience_course
⭐
289
Data Science course in Portuguese
Retriever
⭐
282
Quickly download, clean up, and install public datasets into a database management system
Squirrel Core
⭐
271
A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰
Miceforest
⭐
265
Multiple Imputation with LightGBM in Python
Covid19za
⭐
255
Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa
Awesome Public Real Time Datasets
⭐
240
A list of publicly available datasets with real-time data maintained by the team at bytewax.io
Data Science Resources
⭐
197
👨🏽🏫You can learn about what data science is and why it's important in today's modern world. Are you interested in data science?🔋
Batchflow
⭐
195
BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.
Uci Ml Api
⭐
189
Simple API for UCI Machine Learning Dataset Repository (search, download, analyze)
Public Datasets
⭐
187
The list of public blockchain datasets in BigQuery
Hello Kaggle Guide Kor
⭐
185
A guide for people who are new to Kaggle
Trump Lies
⭐
175
Tutorial: Web scraping in Python with Beautiful Soup
Pureml
⭐
174
Developer platform for production ML.
Setl
⭐
173
A simple Spark-powered ETL framework that just works 🍺
Ml_algo
⭐
171
Machine learning algorithms in Dart programming language
Dud
⭐
158
A lightweight CLI tool for versioning data alongside source code and building data pipelines.
Sweetie Data
⭐
139
This repo contains logstash of various honeypots
Machine Learning Resources
⭐
137
A curated list of awesome machine learning frameworks, libraries, courses, books and many more.
Xgboost_ray
⭐
121
Distributed XGBoost on Ray
Dh Core
⭐
118
Functional data science
Papers With Data
⭐
117
A curated list of papers that released datasets along with their work
Harmonypy
⭐
116
🎼 Integrate multiple high-dimensional datasets with fuzzy k-means and locally linear adjustments.
Xda
⭐
108
R package for exploratory data analysis
Formula1 Datasets
⭐
105
Datasets & Analyses for Formula 1 World Championship
Coffee Quality Database
⭐
102
Building the Coffee Quality Institute Database
Storytelling With Data
⭐
97
Course materials for Dartmouth Course: Storytelling with Data (PSYC 81.09).
Covid 19 Casestudy And Predictions
⭐
90
This repository is a case study, analysis and visualization of COVID-19 Pandemic spread along with prediction models.
Openml R
⭐
90
R package to interface with OpenML
Telemetry
⭐
88
Open-source datasets for anyone interested in working with network anomaly based machine learning, data science and research
Ml Pyxis
⭐
87
Tool for reading and writing datasets of tensors in a Lightning Memory-Mapped Database (LMDB). Designed to manage machine learning datasets with fast reading speeds.
Classix
⭐
85
Fast and explainable clustering in Python
Datasets For Good
⭐
84
List of datasets to apply stats/machine learning/technology to the world of social good.
Medtagger
⭐
82
A collaborative framework for annotating medical datasets using crowdsourcing.
Dataengineeringpilipinas
⭐
80
Data Engineering Pilipinas is a community for data engineers, data analysts, data scientists, developers, AI / ML engineers, and users of closed and open source data tools and methods / techniques in the Philippines. Data Engineering Pilipinas is a PyData group.
Practicalmachinelearning
⭐
79
A curated collection of machine learning resources, including notebooks, code, and books, all of which are either free or open-source
Tiledb Vcf
⭐
79
Efficient variant-call data storage and retrieval library using the TileDB storage library.
Hello Kaggle Guide
⭐
78
For someone who is new at Kaggle
Datacomparer
⭐
74
dataCompareR is an R package that allows users to compare two datasets and view a report on the similarities and differences.
Nn Scratch
⭐
74
Coding up a Neural Network Classifier from Scratch
Foundry
⭐
71
Simplifying the discovery and usage of machine-learning ready datasets in materials science and chemistry
The Sparks Foundation
⭐
68
📌 This repo contains basic to advanced level machine learning / business analysis projects. 👨‍💻
Datasciencewithpython
⭐
66
A repository to store everything for learning DataScience using Python
Vtuber Livechat Dataset
⭐
63
📊 VTuber 1B: Billion-scale Live Chat and Moderation Event Dataset
Visuallayer
⭐
62
Simplify Your Visual Data Ops. Find and visualize issues with your computer vision datasets such as duplicates, anomalies, data leakage, mislabels and others.
Ds With Pysimplegui
⭐
62
Data science and Machine Learning GUI programs/ desktop apps with PySimpleGUI package
Tsgm
⭐
60
Generative modeling of synthetic time series data and time series augmentations
Data Analysis Using Python
⭐
58
Exploratory data analysis 📊 using Python 🐍 of a used car 🚘 database taken from Kaggle
Crypto Trading Strategy Backtester
⭐
57
Easy-to-use cryptocurrency trading strategy simulator and backtester
Sars2pack
⭐
56
An R package with over 50 highly cited, ready-to-use, up-to-date COVID-19 pandemic data resources
Dataset Registry
⭐
56
Dataset registry DVC project
Causeinfer
⭐
54
Machine learning based causal inference/uplift in Python
Knyfe
⭐
54
knyfe is a python utility for rapid exploration of datasets.
Crohme_extractor
⭐
54
CROHME dataset extractor for OFFLINE-text-recognition task.
Mobile Phone Dataset Gsmarena
⭐
53
Python script for creating a mobile phones dataset from the GSMArena website.
Open Geo Data Education
⭐
51
Open Geospatial Datasets for GIS Education: This is a repository of open geospatial datasets to be used in an educational context. I created these files over years of teaching Geographic Data Science and GIS. All original datasets are freely available online with open data licenses (see the dataset attribution for details). All the datasets in this repository have been selected, cleaned, harmonised, and repackaged for GIS exercises in a higher-education context. This is a pretty time-intensive p
Recommender System Datasets
⭐
50
A list of compatible datasets, noting other major repositories containing popular real-world datasets, along with sample code for a range of recommendation tasks.
Covid19africa
⭐
45
Africa open COVID-19 data working group
Pem Dataset1
⭐
43
Proton Exchange Membrane (PEM) Fuel Cell Dataset
Nyisotoolkit
⭐
42
Access data, statistics, and visualizations for New York's electricity grid.
Deep Learning Resources
⭐
41
A curated list of deep learning resources books, courses, papers, libraries, conferences, sample code, and many more.
Work At Olist Data
⭐
41
Apply for a job at Olist's Data Team: https://olist.gupy.io/
Machine Learning
⭐
41
This repository will contain all the material required for beginners in ML and DL. Follow and star this repo for regular updates.
Rdataretriever
⭐
40
R interface to the Data Retriever
Spatiotemporal_datasets
⭐
39
Spatiotemporal datasets collected for network science, deep learning and general machine learning research.
Rusquant
⭐
39
Official version of rusquant package for R
Synthetic Data Gen
⭐
39
Various methods for generating synthetic data for data science and ML
Data Polygamy
⭐
38
Data Polygamy is a topology-based framework that allows users to query for statistically significant relationships between spatio-temporal data sets.
Data Science And Machine Learning Resources
⭐
37
List of Data Science and Machine Learning resources that I frequently use
1-100 of 165 search results