Awesome Open Source
Search results for python dataset generation
74 search results found
Nfstream
⭐
1,015
NFStream: a Flexible Network Data Analysis Framework.
Masktheface
⭐
489
Convert face dataset to masked dataset
Bpycv
⭐
401
Computer vision utils for Blender (generate instance annotation, depth and 6D pose with one line of code)
Doppelganger
⭐
259
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
Datasetgpt
⭐
258
A command-line interface to generate textual and conversational datasets with LLMs.
Chatette
⭐
220
A powerful dataset generator for Rasa NLU, inspired by Chatito
Stopes
⭐
212
A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.
Voc2007_for_yolo_torch
⭐
186
👊 Prepare VOC format datasets for ultralytics/yolov3 & yolov5
Open Korean Instructions
⭐
182
A collection of open Korean instruction datasets for training language models.
Bamboo
⭐
151
Bamboo: 4 times larger than ImageNet; 2 times larger than Objects365; built by active learning.
Auto_annotate
⭐
134
Labeling is boring. Use this tool to speed up your next object detection project!
The Youtube Scraper
⭐
101
Download YouTube video description and video comments without using the YouTube API.
Pyreports
⭐
85
pyreports is a Python library that lets you create complex reports from various sources
Download_audioset
⭐
81
📁 This repo makes it easy to download the raw audio files from AudioSet (32.45 GB, 632 classes).
Pansori
⭐
74
Tools for ASR Corpus Generation from Online Video
Prontoqa
⭐
67
Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task.
Smart_categorizer
⭐
65
Trainable categorization tool
Clinical Trial Outcome Prediction
⭐
60
Benchmark dataset and deep learning method (Hierarchical Interaction Network, HINT) for clinical trial approval probability prediction, published in Cell Patterns 2022.
Crypto Trading Strategy Backtester
⭐
57
Easy-to-use cryptocurrency trading strategy simulator and backtester
Gutentag
⭐
53
GutenTAG is an extensible tool to generate time series datasets with and without anomalies; integrated with TimeEval.
Trscraper
⭐
47
TRScraper is an application developed for use in natural language processing work; it enables text mining on major platforms that host Turkish-language content.
Vietnameseocr
⭐
47
Vietnamese Optical Character Recognition. It works with Vietnamese and Latin characters as well.
Id2t
⭐
42
Official ID2T repository. ID2T creates labeled IT network datasets that contain user defined synthetic attacks.
Docker Packing Box
⭐
40
Docker image gathering packers and tools for making datasets of packed executables and training machine learning models for packing detection
Building Dataset Generator
⭐
39
Procedural 3D data generation pipeline for architecture
Vocalforge
⭐
39
Your one-stop solution for voice dataset creation
Mpose2021_dataset
⭐
37
This repository contains the MPOSE2021 Dataset for short-time pose-based Human Action Recognition (HAR).
Supercaustics
⭐
36
Real-time, open-source simulation of transparent objects for deep learning applications
Ransomware Json Dataset
⭐
34
Compiles a JSON dataset from public sources, containing properties that aid in the detection and mitigation of over 1000 variants of ransomware.
Step
⭐
28
Spatial Temporal Graph Convolutional Networks for Emotion Perception from Gaits
Clevr Dialog
⭐
28
Repository to generate CLEVR-Dialog: A diagnostic dataset for Visual Dialog
Celeba Hq Dataset Download
⭐
26
Download the CelebA-HQ dataset easily! Create it with Docker or download it from Google Drive.
Crawlingathome
⭐
24
A client library for LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset.
Es Imagenet Master
⭐
23
Code for generating the ES-ImageNet dataset, with corresponding training code
Diabetes Prediction System
⭐
21
Predict diabetes and its likelihood of occurrence from your own pathology lab reports.
Spotify Musixmatch Data Collector
⭐
20
A Python module to generate large-scale music datasets using both the Spotify and MusixMatch APIs.
Facebook Profile Pictures Downloader
⭐
20
😆 Download public profile pictures from Facebook.
Uk
⭐
17
Phonograms and syntagms: processing tools
Persian_licenceplate_generator
⭐
17
Atomated_LP is a simple tool that helps you generate Persian vehicle number plates to train your CNN.
Babi_tools
⭐
15
Augmentation scripts for the bAbI Dialog Tasks dataset
Fmodetect
⭐
15
[ICCV 2021] FMODetect: Robust Detection of Fast Moving Objects
Oidv4_to_voc
⭐
12
Convert the Open Images v4 dataset to Pascal VOC format XML. Open Images is a dataset of ~9 million images that have been annotated with image-level labels and bounding boxes spanning thousands of classes. https://github.com/openimages/dataset
Synthetic Dataset Generation
⭐
12
Easily create an instance segmentation dataset from an existing pool of objects of interest, distractor objects and background images. Easy configuration, diverse image compositions, multiple blending methods, Dockerized.
Deepscenario
⭐
12
DeepScenario: An Open Driving Scenario Dataset for Autonomous Driving System Testing
Twords
⭐
12
Twitter Word Frequency Analysis
Regen
⭐
11
[ACL'23 Findings] This is the code repo for our ACL'23 Findings paper "ReGen: Zero-Shot Text Classification via Training Data Generation with Progressive Dense Retrieval".
Cr Plates Generator
⭐
11
Costa Rican license plate dataset generator
Layerx Community
⭐
11
LayerX-AI is a comprehensive platform to annotate and manage your machine learning data.
Webtrench
⭐
10
A powerful and easy-to-use web scraper for collecting data from the web. Supports scraping of images, text, videos, metadata, and more. Ideal for machine learning and deep learning engineers. Download and extract data with just one line of code.
Mnist Sequence
⭐
10
A tool to generate image dataset for sequences of handwritten digits using MNIST database
Stream This Dataset
⭐
10
Code to convert static datasets into simulated data streams
Crawlingathome Server
⭐
10
A server powering LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset.
Kabooks
⭐
10
KABooks is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. Using audiobooks, KABooks generates a dataset with segmented audio and aligned text.
Auromat
⭐
10
AUROra MApping Toolkit - Python library / CLI tools for creating and working with georeferenced images for aurora research.
Darth
⭐
10
DATASETS FOR WHOLE E-ARTH
Twitter Stream Downloader
⭐
10
A service for downloading Twitter streaming data. You can save the data either in text files on disk or in a database (MongoDB).
Speech Corpus Dl
⭐
9
Download and preparation tool for free speech corpora.
India Trade Data
⭐
9
A web scraper written in Python to gather trade data for India across commodities and countries
Fraud Detection Datagen
⭐
9
Fraud detection data generation with community structure, ready for NebulaGraph.
Named Entity Recognition
⭐
8
Corpus and a baseline neural network system for Named Entity Recognition in Hindi-English Code-Mixed social media text.
Virtualsoc
⭐
8
Dynamic Social Network Simulation Data with Ground Truth Labels and Features
Hybrid Dataset Factory
⭐
8
A semi-synthetic dataset generation tool, specifically crafted for CNN training in drone racing.
Facerecognition
⭐
8
Built on OpenCV 3.2.0 and Python 3.6.0/Anaconda 4.3.0. Code to detect faces using Haar Cascade and match faces using LBPH (Local Binary Patterns Histogram) Recognition on a live web camera.
Dataset Rising
⭐
8
Toolchain for creating custom datasets and training Stable Diffusion (1.x, 2.x, XL) models and LoRAs
Materials Dataset
⭐
8
Script to create a dataset of PBR materials (SVBRDF) from CC0 sources online.
Imagenetsubsetgenerator
⭐
7
Creates subsets of ImageNet (e.g. ImageNet100)
Common_datasets
⭐
7
Common-datasets is a GitHub repository dedicated to providing a wide collection of common datasets for practicing and learning data science and machine learning.
Chessdata
⭐
7
This is a dataset which contains millions of positions with stockfish evaluations.
Netflowlabeler
⭐
6
A configurable rule-based labeling tool for network flow files.
Japanese Street Addresses Scraper
⭐
6
Scraper for Japanese street addresses (住所).
Logicalconsistency
⭐
6
The official PyTorch implementation of Logical Consistency and Greater Descriptive Power for Facial Hair Attribute Learning - CVPR 2023
Tool Fastbatchimagecrop
⭐
5
A simple UI tool to batch crop images to prepare datasets from images and videos.
Scalexi
⭐
5
scalexi is a versatile open-source Python library, optimized for Python 3.11+, that focuses on facilitating low-code development and fine-tuning of diverse Large Language Models (LLMs).
Instagram Posts Crawler
⭐
5
A program for retrieving Instagram post content and processing it into a dataset.
Escalate_capture
⭐
5
Data capture and experimental interfacing software for chemistry (part 1 of 2)
Copyright 2018-2024 Awesome Open Source. All rights reserved.