Awesome Open Source
The Top 103 Datasets Open Source Projects
Awesome Public Datasets
⭐
44,003
A topic-centric list of HQ open datasets.
Pix2code
⭐
11,199
pix2code: Generating Code from a Graphical User Interface Screenshot
Datasets
⭐
7,217
🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Label Studio
⭐
5,610
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Datasette
⭐
4,990
An open source multi-tool for exploring and publishing data
Doccano
⭐
4,624
Open source text annotation tool for machine learning practitioners.
Akshare
⭐
3,396
AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.
Hub
⭐
3,013
Fastest unstructured dataset management for TensorFlow/PyTorch. Stream data real-time & version-control it. http://activeloop.ai
Datasets
⭐
2,774
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
Awesome Json Datasets
⭐
2,271
A curated list of awesome JSON datasets that don't require authentication.
Cluedatasetsearch
⭐
1,658
Search all Chinese NLP datasets, with commonly used English NLP datasets included.
Pipedream
⭐
1,567
Connect APIs, remarkably fast. Free for developers.
Chineseglue
⭐
1,440
Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus, and leaderboard
Codesearchnet
⭐
1,390
Datasets, tools, and benchmarks for representation learning of code.
Gopup
⭐
1,292
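Data interfaces: Baidu, Google, Toutiao, and Weibo indices; macroeconomic data; interest rate data; currency exchange rates; "Qianlima" (high-potential) and unicorn companies; Xinwen Lianbo transcripts; film and TV box office data; university lists; epidemic data…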
Coco Annotator
⭐
1,180
✏️ Web-based image segmentation tool for object detection, localization, and keypoints
Colour
⭐
1,157
Colour Science for Python
Pytorch Cpp
⭐
1,038
C++ Implementation of PyTorch Tutorials for Everyone
Dataframes.jl
⭐
976
In-memory tabular data in Julia
Entity Recognition Datasets
⭐
913
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
Ogb
⭐
882
Benchmark datasets, data loaders, and evaluators for graph machine learning
Pydataset
⭐
881
Instant access to many datasets in Python.
Conversational Datasets
⭐
840
Large datasets for conversational AI
Audino
⭐
747
Open source audio annotation tool for humans™
Awesome Transit
⭐
726
Community list of transit APIs, apps, datasets, research, and software 🚌🌟🚋🌟🚂
Loghub
⭐
575
A large collection of system log datasets for AI-powered log analytics
Datasets For Recommender Systems
⭐
573
This is a repository of topic-centric, high-quality public data sources for Recommender Systems (RS)
Annotated Semantic Relationships Datasets
⭐
559
A collection of public and free annotated datasets of relationships between entities/nominals (Portuguese and English)
Datasets
⭐
553
Machine learning datasets used in tutorials on MachineLearningMastery.com
Awesome Twitter Data
⭐
542
A list of Twitter datasets and related resources.
Voice_datasets
⭐
517
🔊 A comprehensive list of open-source datasets for voice and sound computing (50+ datasets).
Awesome Dataset Tools
⭐
513
🔧 A curated list of awesome dataset tools
Openml
⭐
496
Open Machine Learning
Awesome Robotics
⭐
491
A curated list of awesome links and software libraries that are useful for robots.
Chinese Nlp Corpus
⭐
466
Collections of Chinese NLP corpus
Projects
⭐
435
🪐 End-to-end NLP workflows from prototype to production
Geobr
⭐
421
Easy access to official spatial data sets of Brazil in R and Python
Awesome Autonomous Vehicle
⭐
405
A Chinese-language list of autonomous driving resources
Awesome Cybersecurity Datasets
⭐
404
A curated list of amazingly awesome Cybersecurity datasets
Awesome Holistic 3d
⭐
394
A list of papers and resources (data, code, etc.) for holistic 3D reconstruction in computer vision
Video Understanding Dataset
⭐
388
A collection of recent video understanding datasets, under construction!
Animal Matting
⭐
378
Github repository for the paper End-to-end Animal Image Matting
Paperrobot
⭐
375
Code for PaperRobot: Incremental Draft Generation of Scientific Ideas
Dr.sure
⭐
368
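🏫 Deep learning study notes, plus notes on experience using TensorFlow and PyTorch. Dr. Sure adds the latest techniques he comes across to the project from time to time; criticism and corrections are welcome.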
Awesome Segmentation Saliency Dataset
⭐
330
A collection of some datasets for segmentation / saliency detection. Welcome to PR...😄
Chakin
⭐
323
Simple downloader for pre-trained word vectors
Medical Datasets
⭐
321
tracking medical datasets, with a focus on medical imaging
Open3d Ml
⭐
320
An extension of Open3D to address 3D Machine Learning tasks
Cleora
⭐
313
Cleora AI is a general-purpose model for efficient, scalable learning of stable and inductive entity embeddings for heterogeneous relational data.
Tdc
⭐
309
Therapeutics Data Commons: Machine Learning Datasets and Tasks for Therapeutics
Cluecorpus2020
⭐
300
Large-scale pre-training corpus for Chinese (100 GB)
Meglass
⭐
282
An eyeglass face dataset collected and cleaned for face recognition evaluation, CCBR 2018.
Kartaslov
⭐
274
Roapi
⭐
273
Create full-fledged APIs for static datasets without writing a single line of code.
Annotation_tools
⭐
249
Visipedia Annotation Tools
Retriever
⭐
243
Quickly download, clean up, and install public datasets into a database management system
Datasets
⭐
235
source{d} datasets ("big code") for source code analysis and machine learning on source code
Machine Learning Resources
⭐
234
A curated list of awesome machine learning frameworks, libraries, courses, books and many more.
Zr Obp
⭐
233
Open Bandit Pipeline: a python library for bandit algorithms and off-policy evaluation
Automated Resume Screening System
⭐
229
Automated Resume Screening System using Machine Learning (With Dataset)
Ner Datasets
⭐
222
Datasets to train supervised classifiers for Named-Entity Recognition in different languages (Portuguese, German, Dutch, French, English)
Aidl_kb
⭐
221
A Knowledge Base for the FB Group Artificial Intelligence and Deep Learning (AIDL)
Mola
⭐
209
A Modular Optimization framework for Localization and mApping (MOLA)
3d Pointcloud
⭐
209
Papers and Datasets about Point Cloud.
Indonlu
⭐
207
The first-ever vast natural language processing benchmark for the Indonesian language. We provide multiple downstream tasks, pre-trained IndoBERT models, and starter code! (AACL-IJCNLP 2020)
Nlp_datasets
⭐
198
My NLP datasets for Russian language
Datasaurus
⭐
197
R Package 📦 Containing the Datasaurus Dozen datasets 📊
Unify Emotion Datasets
⭐
171
A Survey and Experiments on Annotated Corpora for Emotion Classification in Text
Corus
⭐
162
Links to Russian corpora + Python functions for loading and parsing
Complete Life Cycle Of A Data Science Project
⭐
154
Complete-Life-Cycle-of-a-Data-Science-Project
Robotcar Dataset Sdk
⭐
153
Software Development Kit for the Oxford Robotcar Dataset
Idenprof
⭐
153
The IdenProf dataset is a collection of images of identifiable professionals. It has been collected to enable the development of AI systems that can identify people and the nature of their job simply by looking at an image, just as humans can.
Pins
⭐
152
Pin, Discover and Share Resources
Awesome Nlp Polish
⭐
152
A curated list of resources dedicated to Natural Language Processing (NLP) in Polish. Models, tools, datasets.
Gekko Datasets
⭐
148
Gekko Trading Bot dataset dumps. Ready to use and download history files in SQLite format.
Remo Python
⭐
141
🐰 Python lib for remo - the app for annotations and images management in Computer Vision
Exposure_correction
⭐
132
Project page of the paper "Learning Multi-Scale Photo Exposure Correction", CVPR 2021.
Multi_object_datasets
⭐
127
Multi-object image datasets with ground-truth segmentation masks and generative factors.
Bird Recognition Review
⭐
117
A list of useful resources for bird sound (song and call) recognition, such as datasets, papers, links to open source projects, and competitions
Firstcoursenetworkscience
⭐
116
Tutorials, datasets, and other material associated with textbook "A First Course in Network Science" by Menczer, Fortunato & Davis
Aspect Based Sentiment Analysis
⭐
115
Aspect-Based Sentiment Analysis Experiments
Aesthetics
⭐
114
Image Aesthetics Toolkit - includes Fisher Vector implementation, AVA (Image Aesthetic Visual Analysis) dataset and fast multi-threaded downloader
Cholera
⭐
110
R Package for Analyzing John Snow's 1854 Cholera Map
Wb_srgb
⭐
107
White balance camera-rendered sRGB images (CVPR 2019) [Matlab & Python]
Nlu_datasets_with_task_oriented_dialogue
⭐
105
datasets of natural language understanding and dialogue state tracking
Persian Swear Words
⭐
104
A dataset of inappropriate and offensive Persian words for filtering text
Transitland Datastore
⭐
101
Transitland's centralized web service API for both querying and editing aggregated transit data from around the world
Conversationalir
⭐
100
Overview of venues, research themes and datasets relevant for conversational search.
Doppelganger
⭐
100
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
Crossweigh
⭐
98
CrossWeigh: Training Named Entity Tagger from Imperfect Annotations
Nottingham Dataset
⭐
94
Cleaned version of the Nottingham dataset
Dareblopy
⭐
84
Data Reading Blocks for Python
Atis_dataset
⭐
82
The ATIS (Airline Travel Information System) Dataset
Openml R
⭐
81
R package to interface with OpenML
Photogrammetry_datasets
⭐
81
Collection of 250+ datasets for photogrammetry
Wongnai Corpus
⭐
57
Collection of Wongnai's datasets
Personas
⭐
51
Datasets for Deep learning Personas
Awesome Earth Artificial Intelligence
⭐
46
A curated list of Earth Science's Artificial Intelligence (AI) tutorials, notebooks, software, datasets, courses, books, video lectures and papers. Contributions most welcome.
Describing_a_knowledge_base
⭐
41
Code for Describing a Knowledge Base
Healthcheck
⭐
38
Health Check ✔ is a Machine Learning Web Application made using Flask that can predict mainly three diseases i.e. Diabetes, Heart Disease, and Cancer.
1-100 of 103 projects