Awesome Open Source
Search results for big data
1,346 search results found
Spark Doc Zh
⭐
1,186
Chinese edition of the official Apache Spark documentation
Egads
⭐
1,136
A Java package to automatically detect anomalies in large scale time-series data
Scikit Learn Intelex
⭐
1,116
Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
Arrow Ballista
⭐
1,111
Apache Arrow Ballista Distributed Query Engine
Datumbox Framework
⭐
1,089
Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
Arcticdb
⭐
1,071
ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
Hazelcast Jet
⭐
1,065
Distributed Stream and Batch Processing
Kube Batch
⭐
1,065
A batch scheduler of kubernetes for high performance workload, e.g. AI/ML, BigData, HPC
Kube Batch
⭐
1,055
A batch scheduler of kubernetes for high performance workload, e.g. AI/ML, BigData, HPC
Odd Platform
⭐
1,047
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Utils4s
⭐
1,033
A collection of test cases and related materials gathered while using Scala and Spark
Daft
⭐
1,012
Distributed DataFrame for Python designed for the cloud, powered by Rust
Phoenix
⭐
1,006
Mirror of Apache Phoenix
Accumulo
⭐
1,003
Apache Accumulo
Autodl
⭐
999
Automated Deep Learning without ANY human intervention. 1st solution for the AutoDL challenge @ NeurIPS.
Traildb
⭐
987
TrailDB is an efficient tool for storing and querying series of events
Adam
⭐
966
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Sparkling Water
⭐
957
Sparkling Water provides H2O functionality inside Spark cluster
Mobius
⭐
937
C# and F# language binding and extensions to Apache Spark
Cds
⭐
935
Data syncing in golang for ClickHouse.
Coding Now
⭐
925
Study notes, plus eBooks I have read, video resources, and blogs I have collected over time and consider worthwhile, ...
Titanoboa
⭐
905
Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for JVM - distributed, highly scalable and fault tolerant.
Tispark
⭐
872
TiSpark is built for running Apache Spark on top of TiDB/TiKV
Dataflowjavasdk
⭐
853
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Incubator Livy
⭐
840
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Sqoop
⭐
820
Mirror of Apache Sqoop
Rakam Api
⭐
798
📈 Collect customer event data from your apps. (Note that this project only includes the API collector, not the visualization platform)
Kafka Streams
⭐
797
equivalent to kafka-streams 🐙 for nodejs ✨🐢🚀✨
Samza
⭐
792
Mirror of Apache Samza
Onlinestats.jl
⭐
786
⚡ Single-pass algorithms for statistics
Blaze
⭐
784
Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow DataFusion at its core.
Gearpump
⭐
758
Lightweight real-time big data streaming engine over Akka
Spark Movie Lens
⭐
757
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Ozone
⭐
753
Scalable, redundant, and distributed object store for Apache Hadoop
Visualpython
⭐
748
GUI-based Python code generator for data science, extension to Jupyter Lab, Jupyter Notebook and Google Colab.
Sciblog_support
⭐
742
Support content for my blog
Incubator Celeborn
⭐
725
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
Flink Boot
⭐
725
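The Lazy Squirrel (懒松鼠) Flink-Boot scaffold lets Flink fully embrace the Spring ecosystem, so developers can build distributed stream-processing programs using the Java web development model. Lazy Squirrel makes crossing domains simpler. It aims to let developers get started at a lower cost ... ORM framework, the Hibernate Validator validation framework, the Spring Retry framework, and so on; see the scaffold features below for details.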
Graphchi Cpp
⭐
710
GraphChi's C++ version. Big Data - small machine.
Nipype
⭐
707
Workflows and interfaces for neuroimaging packages
Pgm Index
⭐
693
🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes
Oozie
⭐
687
Mirror of Apache Oozie
Data Science Career
⭐
661
Career Resources for Data Science, Machine Learning, Big Data and Business Analytics Career Repository
Flink Kubernetes Operator
⭐
657
Apache Flink Kubernetes Operator
Datav Vue
⭐
654
A Powerful Data Visualization Tool. Uses TypeScript And Vue3. Scenario-specific templates. User-friendly interfaces. A tool for building data visualization applications.
Delta Sharing
⭐
654
An open protocol for secure data sharing
Orc
⭐
645
Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
Sdc
⭐
645
Numba extension for compiling Pandas data frames, Intel® Scalable Dataframe Compiler
Dataengineeringproject
⭐
644
Example end to end data engineering project.
Oio Sds
⭐
634
High Performance Software-Defined Object Storage for Big Data and AI, that supports Amazon S3 and Openstack Swift
Cortx
⭐
630
CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Wedatasphere
⭐
624
WeDataSphere is a financial grade, one-stop big data platform suite.
Opendata.cern.ch
⭐
620
Source code for the CERN Open Data portal
Spark Rapids
⭐
619
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Amoro
⭐
617
Amoro is a Lakehouse management system built on open data lake formats.
Listenbrainz Server
⭐
613
Server for the ListenBrainz project, including the front-end (javascript/react) code that it serves and all of the data processing components that LB uses.
Scanner
⭐
602
Efficient video analysis at scale
Courses
⭐
590
Answers for Quizzes & Assignments that I have taken
Eland
⭐
588
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Onedal
⭐
584
oneAPI Data Analytics Library (oneDAL)
Giraph
⭐
582
Mirror of Apache Giraph
Parquetviewer
⭐
574
Simple Windows desktop application for viewing & querying Apache Parquet files
Nussknacker
⭐
564
Low-code tool for automating actions on real time data | Stream processing for the users.
Tugraph Analytics
⭐
557
TuGraph Analytics is the fastest OLAP graph database.
Redislite
⭐
555
Redis in a python module.
Spline
⭐
553
Data Lineage Tracking And Visualization Solution
Bigtop
⭐
549
Bigtop is an Apache Foundation project for Infrastructure Engineers and Data Scientists looking for comprehensive packaging, testing, and configuration of the leading open source big data components.
Bigartm
⭐
537
Fast topic modeling platform
Metorikku
⭐
536
A simplified, lightweight ETL Framework based on Apache Spark
Bigdata Ecosystem
⭐
536
BigData Ecosystem Dataset
Running Elasticsearch Fun Profit
⭐
534
A book about running Elasticsearch
Bigslice
⭐
525
A serverless cluster computing system for the Go programming language
Datawave
⭐
512
DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.
Mockneat
⭐
511
MockNeat - the modern faker lib.
Clickbench
⭐
510
ClickBench: a Benchmark For Analytical Databases
Hudi Resources
⭐
509
A collection of Apache Hudi resources
Magellan
⭐
509
Geo Spatial Data Analytics on Spark
Sidekick
⭐
503
High Performance HTTP Sidecar Load Balancer
Fit Sne
⭐
499
Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)
Thrill
⭐
495
Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++
Decentralized Internet
⭐
486
An SDK/library for decentralized web and distributed computing projects
Vue Bigdata Table
⭐
476
A Vue.js table component for datasets with millions of rows, supporting editing, filtering, pasting, drag-to-resize column widths, and more
Jigsaw
⭐
475
Jigsaw (七巧板) provides a set of web components based on Angular 5/8/9+. Its main purpose is to help application developers build complex, interaction-intensive, user-friendly web pages. Jigsaw supports the development of all applications of ZTE's Big Data Product.
Kafka Connect Hdfs
⭐
473
Kafka Connect HDFS connector
Halodb
⭐
472
A fast, log structured key-value store.
Kusto Query Language
⭐
464
Kusto Query Language is a simple and productive language for querying Big Data.
Conjure Up
⭐
456
Deploying complex solutions, magically.
Circosjs
⭐
454
d3 library to build circular graphs
Sparklearning
⭐
451
A comprehensive Spark guide collated from multiple sources that can be referred to learn more about Spark or as an interview refresher.
Cogcomp Nlp
⭐
448
CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.
Big Data Demo
⭐
448
A data visualization showcase project based on Vue, three.js, and echarts, including 3D model import with interaction and 3D model annotation
Tez
⭐
446
Apache Tez
Awesome Data Catalogs
⭐
441
📙 Awesome Data Catalogs and Observability Platforms.
Helix
⭐
440
Mirror of Apache Helix
Oie Resources
⭐
435
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Ustore
⭐
435
Multi-Modal Database replacing MongoDB, Neo4J, and Elastic with 1 faster ACID solution, with NetworkX and Pandas interfaces, and bindings for C 99, C++ 17, Python 3, Java, GoLang 🗄️
Kotlin Spark Api
⭐
425
This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Mlcraft
⭐
418
Synmetrix – open source semantic layer / Boost your LLM precision
Stroom
⭐
417
Stroom is a highly scalable data storage, processing and analysis platform.
Docker Spark Cluster
⭐
413
A simple Spark standalone cluster for testing purposes
101-200 of 1,346 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.