Awesome Open Source
Search results for data quality
127 search results found
Made With Ml
⭐
35,496
Learn how to design, develop, deploy and iterate on production-grade ML applications.
Applied Ml
⭐
24,828
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Ydata Profiling
⭐
12,080
One-line data quality profiling and exploratory data analysis for Pandas and Spark DataFrames.
Great_expectations
⭐
9,179
Always know what to expect from your data.
Cleanlab
⭐
8,696
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Kestra
⭐
5,257
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
Feast
⭐
5,053
Feature Store for Machine Learning
Lakefs
⭐
3,900
lakeFS - Data version control for your data lake | Git for data
Openmetadata
⭐
3,512
Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
Deequ
⭐
3,044
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Mlops Course
⭐
2,744
Learn how to design, develop, deploy and iterate on production-grade ML applications.
Data Diff
⭐
2,707
Compare tables within or across databases
Whylogs
⭐
2,533
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
Feathr
⭐
1,886
Feathr – A scalable, unified data and AI engineering platform for enterprise
Featureform
⭐
1,716
The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
Soda Core
⭐
1,644
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
Re Data
⭐
1,499
re_data - fix data issues before your users & CEO would discover them 😊
Odd Platform
⭐
1,047
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Data Centric Ai
⭐
892
A curated, but incomplete, list of data-centric AI resources.
Zingg
⭐
828
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Pointblank
⭐
785
Data quality assessment and metadata reporting for data frames and database tables
Cleanvision
⭐
739
Automatically find issues in image datasets and practice data-centric computer vision.
Chaos_genius
⭐
671
ML powered analytics engine for outlier detection and root cause analysis.
Qualitis
⭐
631
Qualitis is a one-stop data quality management platform that supports quality verification, notification, and management for various data sources. It is used to solve data quality problems caused by data processing. https://github.com/WeBankFinTech/Qualitis
Failed Ml
⭐
585
Compilation of high-profile real-world examples of failed machine learning projects
Datacleaner
⭐
557
The premier open source Data Quality solution
Traceml
⭐
493
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
Piperider
⭐
443
Code review for data in dbt
Awesome Data Catalogs
⭐
441
📙 Awesome Data Catalogs and Observability Platforms.
Encord Active
⭐
385
The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
Lale
⭐
321
Library for Semi-Automated Data Science
Awesome Data Centric Ai
⭐
282
Open-Source Software, Tutorials, and Research on Data-Centric AI 🤖
Datavines
⭐
275
Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.
Feathub
⭐
255
FeatHub - A stream-batch unified feature store for real-time machine learning
Data Drift
⭐
253
Metrics Observability & Troubleshooting
Whylogs Java
⭐
179
Profile and monitor your ML data pipeline end-to-end
Mobydq
⭐
175
🐳 Tool to automate data quality checks on data pipelines
Tracin
⭐
156
Implementation of Estimating Training Data Influence by Tracing Gradient Descent (NeurIPS 2020)
Lakehouse Engine
⭐
154
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
Airflow Provider Great Expectations
⭐
147
Great Expectations Airflow operator
Rdfunit
⭐
143
An RDF Unit Testing Suite
Csvlint
⭐
123
CSV Lint plug-in for Notepad++ for syntax highlighting, csv validation, automatic column and datatype detecting, fixed width datasets, change datetime format, decimal separator, sort data, count unique values, convert to xml, json, sql etc. A plugin for data cleaning and working with messy data files.
Datachecks
⭐
117
Open Source Data Quality Monitoring.
Dataqualitydashboard
⭐
111
A tool to help improve data quality standards in observational data science.
Nbi
⭐
103
NBi is a testing framework (add-on to NUnit) for Business Intelligence and Data Access. The main goal of this framework is to let users create tests with a declarative approach based on an XML syntax. With NBi, you don't need to write C# or Java code to specify your tests, nor do you need Visual Studio or Eclipse to compile your test suite. Just create an XML file and let the framework interpret it and run your tests. The framework is designed as an add-on of NUnit but with
Pandas_dq
⭐
101
Find data quality issues and clean your data in a single line of code with a Scikit-Learn compatible Transformer.
Django Data Quality System
⭐
100
Data governance and data quality checking/monitoring platform (Django + jQuery + MySQL)
Dbt Re Data
⭐
93
re_data - fix data issues before your users & CEO would discover them 😊
Amazon Deequ Glue
⭐
74
Automated data quality suggestions and analysis with Deequ on AWS Glue
Swiple
⭐
72
Swiple enables you to easily observe, understand, validate and improve the quality of your data
Jumbune
⭐
69
Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http://jumbune.com. More details of open source offering are at,
Great_expectations_action
⭐
68
A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.
Sqlbucket
⭐
67
Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.
Leila
⭐
56
Library for evaluating data quality and interacting with the datos.gov.co data portal
Cuallee
⭐
56
A data quality acceleration library to get data sets verified in a friendly interface
Data Quality Gate
⭐
53
Data Quality Gate based on AWS
Pydvl
⭐
52
pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation
Awesome Python For Data Science
⭐
51
A curated list of awesome resources such as books, tutorials, courses, open-source libraries, exercises, and other materials that support Pythonistas in the making, and Pythonistas migrating into Data Science! 📊
Soda Spark
⭐
49
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Bikedna
⭐
46
BikeDNA: Bicycle Infrastructure Data & Network Assessment
Dqlab Career Track
⭐
42
A collection of scripts written to complete DQLab Data Analyst Career Track 📊
Ml_observability_course
⭐
38
Free Open-source ML observability course for data scientists and ML engineers. Learn how to monitor and debug your ML models in production.
Dtcleaner
⭐
37
DTCleaner: data cleaning using multi-target decision trees.
Dqo
⭐
37
Data Quality and Observability platform with custom rules, data quality KPIs and data quality dashboards. Measure the data quality, not only observe it!
Amora Data Build Tool
⭐
37
Amora Data Build Tool enables analysts and engineers to transform data on the data warehouse (BigQuery) by writing Amora Models that describe the data schema using Python's "PEP484 - Type Hints" and select statements with SQLAlchemy. Amora is able to transform Python code into SQL data transformation jobs that run inside the warehouse.
Acharya
⭐
35
A Data Centric annotation tool for your Named Entity Recognition projects
Blast
⭐
31
Blast is a data orchestration tool that can run SQL and Python against Google BigQuery and Snowflake. It supports templating with Jinja, data quality tests, query validation, environment management and more.
Ohsome Quality Api
⭐
31
Data quality estimations for OpenStreetMap
Daiquiri
⭐
30
Data quality reporting for temporal datasets.
Check Engine
⭐
30
Data validation library for PySpark 3.0.0
Hooqu
⭐
24
hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to Python
Osm Data Classification
⭐
24
OpenStreetMap Data Classification
Dqcs
⭐
22
Data quality control system
Data Flare
⭐
21
Data quality control tool built on spark and deequ
Convo
⭐
20
R package based on "Column Names as Contracts" blog post (https://emilyriederer.netlify.app/post/column-nam
Verified Telemetry
⭐
19
Azure Verified Telemetry for IoT is a state-of-the-art solution to seamlessly determine the health of the sensor in real-time.
Dataops
⭐
19
DataOps for Government
Redflag
⭐
19
Safety net for machine learning pipelines. Plays nice with sklearn and pandas.
Data Quality Analysis
⭐
18
The PEDSnet Data Quality Assessment Toolkit (OMOP CDM)
Panda_patrol
⭐
18
Penguin Datalayer Collect
⭐
18
A data layer quality monitoring and validation module, this solution is part of the Raft Suite ecosystem.
Dbt Artifacts Loader
⭐
17
Load dbt artifacts uploaded to GCS to BigQuery in order to track historical dbt results
Dirty Dataimpacts
⭐
17
Code & Datasets
Dqm
⭐
17
A simple platform dedicated to data quality issues detection, especially in the context of online advertising.
Srm
⭐
16
This Chrome Extension automatically performs SRM checks and flags potential data quality issues on supported experimentation platforms.
Cleanlab Studio
⭐
16
Client interface for all things Cleanlab Studio
Hive_compared_bq
⭐
16
hive_compared_bq compares/validates 2 (SQL like) tables, and graphically shows the rows/columns that are different.
Mms_benchmark
⭐
14
The most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 datasets, manually selected from over 350 reported in the scientific literature based on strict quality criteria, and covers 27 languages.
Dq Tools
⭐
14
Makes it simple to store test results and visualise them in a BI dashboard
Qamd
⭐
14
QAMyData, a data quality assurance tool for SPSS, STATA, SAS and CSV files.
Contessa
⭐
13
Easy way to define, execute and store quality rules for your data.
Hands On Great Expectations With Spark
⭐
12
How to evaluate the Quality of your Data with Great Expectations and Spark.
Iau Course
⭐
12
Intelligent Data Analysis (IAU_B) @ FIIT STU in Bratislava
Roomba
⭐
12
A Node.js tool to examine the correctness of Open Data Metadata and build custom dataset profiles
Nlp Data Readiness
⭐
11
This is a document concerning Data Readiness in the context of machine learning and Natural Language Processing.
Awesome Ml Monitoring
⭐
11
A curated list of awesome open source tools and commercial products for monitoring data quality, monitoring model performance, and profiling data 🚀
Dataqtor
⭐
11
🔍Your Data Quality Detector / Gain insight into your data and get it ready for use before you start working with it 💡📊🛠💎
Data Quality Monitoring
⭐
10
Data Quality Monitoring Tool
Greatex
⭐
10
A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in Airflow.
Huemul Bigdatagovernance
⭐
10
Huemul BigDataGovernance is a framework that runs on Spark, Hive and HDFS. It supports implementing a corporate single-source-of-data strategy based on data governance best practices. It lets you implement tables with Primary Key and Foreign Key checks when inserting and updating data through the library, along with validation of nulls, text lengths, minimum/maximum values for numbers and dates, unique values and default values. It also lets you classify fields by applicability of der
1-100 of 127 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.