Awesome Data Temporality

A curated list to help you manage temporal data across many modalities 🚀.
Generative art created by DALL·E.

Data Versioning for Machine Learning

Data versioning is the practice of storing multiple versions of the same data and providing a mechanism for accessing and managing those versions. This is useful in a variety of situations, such as when data is accidentally deleted or corrupted, or when you need to see how the data has changed over time. The vast majority of "data versioning" tools you see today are aimed at managing datasets for machine learning, and the common implementation paradigm is to track versions of your data and models alongside Git commits. The following part of this list is therefore centered around machine learning; other ways to manage temporal data are covered in later sections.
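
To make the idea concrete, here is a minimal, hypothetical sketch of content-addressed data versioning in Python. The cache directory and function names are invented for illustration; real tools such as DVC do this far more robustly (a small hash pointer is committed to Git while the bulk data lives outside the repository).

```python
import hashlib
import shutil
from pathlib import Path

# Hypothetical local cache directory; real tools (e.g. DVC) manage this for you.
CACHE_DIR = Path(".data_cache")

def snapshot(data_file: str) -> str:
    """Store a content-addressed copy of data_file and return its version id.

    The returned hash is a small pointer that can be committed to Git,
    while the (potentially large) data lives outside the repository.
    """
    digest = hashlib.sha256(Path(data_file).read_bytes()).hexdigest()
    CACHE_DIR.mkdir(exist_ok=True)
    target = CACHE_DIR / digest
    if not target.exists():
        shutil.copy2(data_file, target)
    return digest

def restore(version_id: str, destination: str) -> None:
    """Bring a previously snapshotted version back into the working tree."""
    shutil.copy2(CACHE_DIR / version_id, destination)

# Usage: version = snapshot("train.csv"); ...; restore(version, "train.csv")
```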

Time Travel and Temporal Tables

Data time travel refers to the ability to go back in time and access previous versions of data. Enabling time travel requires a system for versioning data: storing multiple versions of the same data and providing a mechanism for accessing and managing those versions. Temporal tables, also known as system-versioned temporal tables, are tables in a database that automatically track the history of data changes and let you query the data as it existed at any point in time. "Time travel" and "temporal tables" are often used interchangeably to mean the same thing, but temporal tables are an implementation-specific feature of certain databases. They are useful for auditing, tracking changes to data over time, and performing point-in-time analysis. You can usually query a temporal table using the FOR SYSTEM_TIME clause in a SELECT statement.
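
As a rough illustration, the snippet below queries a hypothetical system-versioned table using SQL Server's FOR SYSTEM_TIME clause from Python. The connection string and the dbo.Customer table are placeholders, and other engines expose similar but not identical syntax.

```python
import pyodbc  # assumes an accessible SQL Server instance with system versioning enabled

# Connection details and dbo.Customer are placeholders for illustration.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;DATABASE=Sales;Trusted_Connection=yes"
)
cur = conn.cursor()

# FOR SYSTEM_TIME AS OF returns the rows exactly as they existed at that instant.
cur.execute(
    """
    SELECT CustomerId, Name, Address
    FROM dbo.Customer
    FOR SYSTEM_TIME AS OF '2024-01-01T00:00:00'
    """
)
for row in cur.fetchall():
    print(row)
cur.close()
```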

Slowly Changing Dimensions Data Modeling

Slowly changing dimensions (SCDs) are dimensions whose attributes change over time, where those changes need to be tracked in the data warehouse. For example, a customer's address or name might change, and the warehouse needs to record these changes so that historical data can be analyzed correctly; a minimal Type 2 sketch follows the list below.

  • VDK Versatile Data Kit (VDK) is an open source framework that includes support for managing SCD-style data.
  • dbtvault A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
  • dataform Common data models for creating type-2 slowly changing dimensions tables from mutable data sources in Dataform.
  • dbt snapshots dbt's built-in mechanism for capturing Type 2 slowly changing dimension history from mutable source tables
  • DeltaLake Databricks change data capture with Delta Live Tables
  • 6 Kinds 6 Different Types of Slowly Changing Dimensions and How to Apply Them?
  • Data Vault Loading Dimensions from a Data Vault Model
  • SCD Data Warehouse Slowly Changing Dimension Handling in Data Warehouses Using Temporal Database Features
  • Redshift Implement a slowly changing dimension in Amazon Redshift
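
The core mechanics of a Type 2 dimension are small enough to sketch in plain Python. The table and column names below are invented for illustration; in practice this logic usually lives in SQL, dbt snapshots, or your warehouse's merge statements.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class CustomerDimRow:
    """One version of a customer in a Type 2 dimension table."""
    customer_id: int
    address: str
    valid_from: datetime
    valid_to: Optional[datetime] = None   # None means "still current"
    is_current: bool = True

def apply_scd2_update(history: List[CustomerDimRow], customer_id: int,
                      new_address: str, now: datetime) -> None:
    """Close out the customer's current row and append the new version."""
    for row in history:
        if row.customer_id == customer_id and row.is_current:
            if row.address == new_address:
                return  # nothing changed, nothing to record
            row.valid_to = now
            row.is_current = False
    history.append(CustomerDimRow(customer_id, new_address, valid_from=now))

# Usage: the old address is preserved with its validity window.
dim = [CustomerDimRow(1, "12 Oak St", valid_from=datetime(2023, 1, 1))]
apply_scd2_update(dim, 1, "98 Elm Ave", now=datetime(2024, 6, 1))
# dim now holds two rows: the closed-out old address and the current one.
```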

Bi-temporality Tools + Modeling

Bitemporality is a database concept referring to the ability to store and manage data along two independent time axes: valid time (when a fact is true in the real world) and transaction time (when the fact was recorded in the database). In a bitemporal database, data is stored in multiple versions, each corresponding to a specific point in time, so users can query the data as it existed at different points along either axis. This is useful for understanding how data has changed over time, handling late-arriving corrections, and auditing the history of a particular piece of data; a small illustrative sketch follows the list below.

  • Martin Fowler Bitemporal History (explained) from world famous Martin Fowler
  • Crux of Bitemporality The Crux of Bitemporality - Jon Pither
  • Capgemini Enhancing Time Series Data by Applying Bitemporality (opinionated white paper mentioning KDB+)
  • GoldenSource A financial services data modeling software company perspective on bitemporality
  • MarkLogic A deep dive into bitemporality in MarkLogic
  • XTDB Bitemporal graph database by Juxt (formerly Crux) with first-class support for bitemporality
  • ARXIV Bitemporal Property Graphs to Organize Evolving Systems white paper
  • Axway Decision Insights bitemporal capability
  • Cloudera - Data Modeling Bi-temporal data modeling with Envelope
  • Bitemporal Database Book Bitemporal Databases: Modeling and Implementation
  • Speakerdeck An overview of bitemporality
  • Val on Programming (Datomic) Datomic: this is not the history you're looking for
  • Cybertec Implementing "As Of" queries in Postgresql
  • Bitempura.DB Bitempura.DB is a simple, bitemporal key-value database.
  • Modeler (Anchormodeler) (Bi-temporal) data modelling tool inspired by Anchor modeler, for PostgreSQL
  • BarbelHisto Lightweight ultra-fast Java library to store data in bi-temporal format
  • Robinhood Tracking Temporal Data at Robinhood
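
To make the two time axes concrete, here is a deliberately tiny in-memory sketch that distinguishes valid time from transaction time. It is not how any of the products above are implemented, just an illustration of the bookkeeping involved.

```python
from datetime import datetime
from typing import Any, List, NamedTuple, Optional

class Fact(NamedTuple):
    key: str
    value: Any
    valid_from: datetime    # valid time: when the fact became true in the real world
    recorded_at: datetime   # transaction time: when the database learned about it

class BitemporalStore:
    """Append-only toy store: nothing is ever updated or deleted in place."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []

    def put(self, key: str, value: Any, valid_from: datetime,
            recorded_at: Optional[datetime] = None) -> None:
        self._facts.append(Fact(key, value, valid_from, recorded_at or datetime.utcnow()))

    def get(self, key: str, valid_time: datetime, as_known_at: datetime) -> Any:
        """What did we believe, as of `as_known_at`, the value was at `valid_time`?"""
        candidates = [f for f in self._facts
                      if f.key == key
                      and f.valid_from <= valid_time
                      and f.recorded_at <= as_known_at]
        if not candidates:
            return None
        # Prefer the latest valid fact; break ties so later corrections win.
        return max(candidates, key=lambda f: (f.valid_from, f.recorded_at)).value

# A late-arriving correction does not erase what we believed earlier.
store = BitemporalStore()
store.put("alice/address", "12 Oak St", valid_from=datetime(2023, 1, 1),
          recorded_at=datetime(2023, 1, 2))
store.put("alice/address", "98 Elm Ave", valid_from=datetime(2023, 1, 1),  # correction
          recorded_at=datetime(2024, 3, 1))
print(store.get("alice/address", datetime(2023, 6, 1), as_known_at=datetime(2023, 6, 1)))  # 12 Oak St
print(store.get("alice/address", datetime(2023, 6, 1), as_known_at=datetime(2024, 6, 1)))  # 98 Elm Ave
```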

Change Data Capture (CDC) Tools

Change data capture (CDC) is a process that captures and stores data about changes made to a database or other data source. It is often used in data warehousing and data integration scenarios to keep data in different systems up to date and in sync. CDC tracks changes made to a source and stores information about those changes in a separate location, such as another database or a log, so the original source can keep being updated while a record of every change is preserved; a minimal consumer sketch follows the list below.

  • Debezium Change data capture for a variety of databases
  • Supabase realtime Broadcast, Presence, and Postgres Changes via WebSockets
  • airbyte Data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes
  • Flink CDC Connectors for Apache Flink
  • gravity A Data Replication Center
  • brooklin An extensible distributed system for reliable nearline data streaming at scale
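
As a rough sketch, the consumer below reads Debezium-style change events from Kafka using kafka-python. The topic name and broker address are assumptions, and the envelope fields shown (before, after, op) follow Debezium's default event format.

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

# Topic and broker address are assumptions; by default Debezium names
# topics <server>.<schema>.<table>.
consumer = KafkaConsumer(
    "dbserver1.inventory.customers",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")) if m else None,
    auto_offset_reset="earliest",
)

for message in consumer:
    if message.value is None:          # tombstone record
        continue
    payload = message.value.get("payload", message.value)
    op = payload.get("op")             # "c" = create, "u" = update, "d" = delete
    before, after = payload.get("before"), payload.get("after")
    print(f"op={op} before={before} after={after}")
    # A downstream sink would apply `after` (or the delete) here to keep a
    # replica, cache, or search index in sync with the source database.
```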

Soft Delete in ORM Frameworks

Soft delete is a method of deleting data from a database in a way that allows the data to be recovered later. Rather than being physically removed, a soft-deleted row is marked as deleted: it is typically no longer visible to users, but it can still be restored. Soft delete helps prevent accidental or unintended data loss, and it is also useful where data must be retained for compliance or regulatory purposes while being hidden from normal use.
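
A minimal sketch with SQLAlchemy, assuming a hypothetical users table with a deleted_at column; many ORM frameworks (e.g. Django, Rails/ActiveRecord, Hibernate) offer extensions or annotations that apply this filtering automatically.

```python
from datetime import datetime
from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    deleted_at = Column(DateTime, nullable=True)   # NULL means "not deleted"

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(User(name="alice"))
    session.commit()

    # "Delete" by stamping the row instead of issuing a SQL DELETE.
    user = session.query(User).filter_by(name="alice").one()
    user.deleted_at = datetime.utcnow()
    session.commit()

    # Normal reads exclude soft-deleted rows; the data remains recoverable.
    live_users = session.query(User).filter(User.deleted_at.is_(None)).all()
    print(live_users)  # []
```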

Contribution

This list started as a personal collection of interesting things about data versioning. Your contributions and suggestions are warmly welcomed. Read the contribution guidelines.
