Awesome Open Source
Search results for data integration
118 search results found
Airflow
⭐
34,468
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Airbyte
⭐
12,918
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Dagster
⭐
9,467
An orchestration platform for the development, production, and observation of data assets.
Seatunnel
⭐
7,139
SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
Mage Ai
⭐
6,324
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Cloudquery
⭐
5,380
The open source high performance data integration platform built for developers.
Kestra
⭐
5,257
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
Hudi
⭐
4,901
Upserts, Deletes And Incremental Processing on Big Data.
Chunjun
⭐
3,893
A data integration framework
Rudder Server
⭐
3,841
Privacy and Security focused Segment-alternative, in Golang and React
Jitsu
⭐
3,723
Jitsu is an open-source Segment alternative: a fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.
Awesome Single Cell
⭐
2,784
Community-curated list of software packages and data resources for single-cell, including RNA-seq, ATAC-seq, etc.
Fluvio
⭐
2,373
Lean and mean distributed stream processing system written in Rust and WebAssembly.
Incubator Devlake
⭐
2,322
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
Mara Pipelines
⭐
2,053
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Bitsail
⭐
1,514
BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.
Kuwala
⭐
610
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) High-resolution demograp
Hudi Resources
⭐
509
A collection of Apache Hudi resources
Transfer
⭐
495
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift) in real-time.
Harmony
⭐
460
Fast, sensitive and accurate integration of single-cell data with Harmony
Nichenetr
⭐
416
NicheNet: predict active ligand-target links between interacting cells
Seatunnel Web
⭐
365
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Conduit
⭐
321
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
Scarches
⭐
294
Reference mapping for single-cell genomics
Recap
⭐
292
Work with your web service, database, and streaming schemas in a single format.
Cql
⭐
285
Categorical Query Language IDE
Cuelake
⭐
266
Use SQL to build ELT pipelines on a data lakehouse.
Hetionet
⭐
199
Hetionet: an integrative network of disease
Mara Example Project 2
⭐
175
An example mini data warehouse for python project stats, template for new projects
Nomenklatura
⭐
171
Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources
Dataplane
⭐
171
Dataplane is an Airflow inspired unified data platform with additional data mesh and RPA capability to automate, schedule and design data pipelines and workflows. Dataplane is written in Golang with a React front end.
Metl
⭐
154
mito ETL tool
Morph Kgc
⭐
151
Powerful RDF Knowledge Graph Generation with RML Mappings
Scikit Fusion
⭐
136
scikit-fusion: Data fusion via collective latent factor models
Commoncoreontologies
⭐
129
The Common Core Ontology Repository holds the current released version of the Common Core Ontology suite.
Pandora
⭐
127
PANDORA: Advanced Machine Learning for Data Integration, Analysis, and Insightful Discoveries in Health and Disease 💻
Harmonypy
⭐
116
🎼 Integrate multiple high-dimensional datasets with fuzzy k-means and locally linear adjustments.
Megalista
⭐
109
First Party data integration solution built for marketing teams to enable audience and conversion onboarding into Google Marketing products (Google Ads, Campaign Manager, Google Analytics).
Sdm Rdfizer
⭐
99
An Efficient RML-Compliant Engine for Knowledge Graph Construction
Thedataengineeringbook
⭐
96
The Data Engineering Book - a data engineering book by Thai people, for Thai people
Winter
⭐
92
WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation.
Prism
⭐
70
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.
Gecko
⭐
54
Toolbox for including enzyme constraints on a genome-scale model.
Drivers
⭐
53
Low-code Python library enabling access to APIs, tools, data sources in seconds.
Schemamapper
⭐
48
A .NET class library that allows you to import data from different sources into a unified destination
Cosmosr
⭐
46
COSMOS (Causal Oriented Search of Multi-Omic Space) is a method that integrates phosphoproteomics, transcriptomics, and metabolomics data sets.
Cellhint
⭐
44
A tool for semi-automatic cell type harmonization and integration
Datasphere Integration
⭐
38
A data-centric integration platform
Data Product Batch
⭐
36
Template to deploy a Data Product for Batch data processing into a Data Landing Zone of the Data Management & Analytics Scenario (former Enterprise-Scale Analytics). The Data Product template can be used by cross-functional teams to ingest, provide and create new data assets within the platform.
Data Product Streaming
⭐
34
Template to deploy a Data Product for data stream processing into a Data Landing Zone of the Data Management & Analytics Scenario (former Enterprise-Scale Analytics). The Data Product template can be used by cross-functional teams to ingest, provide and create new data assets within the platform.
Mara Etl Tools
⭐
33
Utilities for creating ETL pipelines with mara
Nfcompose
⭐
30
Build REST APIs/Integrations in minutes instead of hours - NF Compose is a (data) integration platform that allows developers to define REST APIs in seconds instead of hours. Generated REST APIs are backed by postgres and support automatic consumer webhook notifications on data changes out of the box.
Cogstack Nifi
⭐
29
Building data processing pipelines for document processing with NLP using Apache NiFi and related services
Mapeathor
⭐
26
Translator of spreadsheet mappings into R2RML, RML or YARRRML
Pyradigm
⭐
24
Research data management in biomedical and machine learning applications
Portal
⭐
24
Adversarial domain translation networks for integrating large-scale atlas-level single-cell datasets
Linkml Model
⭐
23
Link Modeling Language (LinkML) model
Openomics
⭐
23
A bioinformatics API to interface with public multi-omics bio databases for wicked fast data integration.
Mario Py
⭐
21
MARIO: single-cell proteomic data matching and integration using both shared and distinct features
Data Integration Library
⭐
21
The Data Integration Library project provides a library of generic components based on a multi-stage architecture for data ingress and egress.
Plugin Sdk
⭐
20
CloudQuery Go SDK for source and destination plugins
Omicspls
⭐
20
R package for High dimensional data analysis and integration with O2PLS!
Barnard59
⭐
20
An intuitive and flexible RDF pipeline solution designed to simplify and automate ETL processes for efficient data management.
Thymeflow
⭐
19
Installer for Thymeflow, a personal knowledge management system.
Reedelk Runtime
⭐
18
Reedelk Runtime Platform Community Edition
Doctoral Thesis
⭐
17
Generation and Applications of Knowledge Graphs in Systems and Networks Biology
Rpanglaodb
⭐
17
An R package to download and merge labeled single-cell RNA-seq data from the PanglaoDB database into a Seurat object.
R Learning Journey
⭐
16
Some of the projects I made when starting to learn R for data science at university
Assignpop
⭐
15
Population Assignment using Genetic, Non-genetic or Integrated Data in a Machine-learning Framework. Methods in Ecology and Evolution. 2018;9:439–446.
Gellish
⭐
15
Development of the Gellish Communicator reference application and tools for universal data exchange and data integration supporting Formal English and other Gellish formalized natural languages.
Gtfs Bench
⭐
15
GTFS-Madrid-Bench: A Benchmark for Knowledge Graph Construction Engines
Geckopy
⭐
14
Enzyme-constrained genome-scale models in python
Bio2bel
⭐
14
A Python framework for integrating biological databases and structured data sources in Biological Expression Language (BEL)
Robustsinglecell
⭐
13
Robust single cell clustering and comparison of population compositions across tissues and experimental models via similarity analysis.
Databridge.net
⭐
13
Configurable data bridge for permanent ETL jobs
Python Extractor Utils
⭐
12
Framework for developing extractors in Python
Schema Matching
⭐
12
Match schema attributes of relational databases by value similarity. As a study assignment, this isn't well documented, but you can contact me for questions and I may even add docs, if I sense enough interest.
Jedai Ui
⭐
12
UI for JedAI Toolkit
Fastintegration
⭐
12
FastIntegrate integrates thousands of scRNA-seq datasets and outputs batch-corrected values for downstream analysis
Pipes
⭐
12
Pipes for MarkLogic DataHub is a visual programming tool for MarkLogic Data Hub. It integrates with MarkLogic's Data Hub and produces custom code step(s) using a no-code UI environment.
Singer Working Group
⭐
12
Working group for ongoing development and iteration of the Singer Spec, the de-facto protocol for open source data connectors. Please use "Issues" to create discussion items - or use "Discussions" for general questions.
Obg Gen
⭐
11
Ontology-Based GraphQL Server Generation (OBG-gen)
Adtnorm
⭐
11
ADTnorm normalizes the cell surface protein measurements of CITE-seq data, facilitating data integration across batches and across studies.
Integrate
⭐
11
Scripts and resources to create Hetionet v1.0, a heterogeneous network for drug repurposing
Datanator
⭐
11
Toolkit for discovering and aggregating data for whole-cell modeling
Gripnet
⭐
11
GripNet: Graph Information Propagation on Supergraph for Heterogeneous Graphs (PatternRecognit, 2023)
Transmorph
⭐
10
Computational framework for dataset integration
Cerebrum
⭐
10
RDBMS framework for IdM and IAM systems
Mbpls
⭐
10
(Multiblock) Partial Least Squares Regression for Python
Unifuncnet
⭐
10
A multi-reference network annotation tool to support omics analysis
Biodwh2
⭐
10
BioDWH2 is an easy-to-use, automated, graph-based data warehouse and mapping tool for bioinformatics and medical informatics.
Bioen
⭐
9
BioEn - Bayesian Inference Of ENsembles
Muse
⭐
9
MUSE is a deep learning approach characterizing tissue composition through combined analysis of morphologies and transcriptional states for spatially resolved transcriptomics data.
Pymultiomics
⭐
9
Python toolbox for multi-omics data mapping and analysis
D Cca
⭐
9
A Decomposition-based Canonical Correlation Analysis for High-dimensional Datasets (JASA-20 paper)
Analyst
⭐
9
A declarative, SQL-like DSL for data integration tasks.
Socrata Kettle
⭐
9
Socrata plug-in for Pentaho Kettle
Morph Csv
⭐
9
Enhancing virtual KG access over tabular data with RML and CSVW
Pentaho Mongodb Delete Plugin
⭐
8
Pentaho Data Integration Step to Delete MongoDB Document
Thesis
⭐
8
PhD thesis: "Knowledge Graph Construction from Heterogeneous Data Sources exploiting Declarative Mapping Rules"
1-100 of 118 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.