| apache/spark | 37,661 | | 2,394 | 939 | over 2 years ago | 46 | May 09, 2021 | 186 | apache-2.0 | Scala |
| Apache Spark - A unified analytics engine for large-scale data processing |
| donnemartin/data-science-ipython-notebooks | 25,668 | | 0 | 0 | over 2 years ago | 0 | | 34 | other | Python |
| Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines. |
| heibaiying/BigData-Notes | 14,872 | | 0 | 0 | over 2 years ago | 0 | | 39 | | Java |
| A beginner's guide to big data :star: |
| andkret/Cookbook | 12,557 | | 0 | 0 | over 2 years ago | 0 | | 111 | apache-2.0 | |
| The Data Engineering Cookbook |
| trinodb/trino | 9,118 | | 0 | 29 | over 2 years ago | 83 | November 30, 2023 | 2,496 | apache-2.0 | Java |
| Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io) |
| wangzhiwubigdata/God-Of-BigData | 8,483 | | 0 | 0 | almost 3 years ago | 0 | | 3 | | |
| Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive... |
| h2oai/h2o-3 | 7,485 | | 62 | 33 | about 2 months ago | 49 | August 09, 2023 | 2,746 | apache-2.0 | Jupyter Notebook |
| H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc. |
| apache/hive | 5,222 | | 0 | 0 | over 2 years ago | 0 | | 89 | apache-2.0 | Java |
| Apache Hive |
| apache/ignite | 4,626 | | 15 | 3 | over 2 years ago | 36 | May 04, 2023 | 729 | apache-2.0 | Java |
| Apache Ignite |
| apache/calcite | 4,216 | | 390 | 128 | over 2 years ago | 1,714 | November 07, 2023 | 315 | apache-2.0 | Java |
| Apache Calcite |