Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Data Science Ipython Notebooks | 25,242 | | | | 3 months ago | | | 34 | other | Python |
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines. | ||||||||||
Deeplearning4j | 13,157 | 175 | 110 | | 19 hours ago | 53 | August 10, 2022 | 616 | apache-2.0 | Java |
Suite of tools for deploying and training deep learning models on the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch; a modular and tiny C++ library for running math code; and a Java-based math library on top of the core C++ library. Also includes SameDiff, a PyTorch/TensorFlow-like library for running deep learning with automatic differentiation. ||||||||||
H2O-3 | 6,489 | 18 | 32 | | 9 hours ago | 241 | July 25, 2023 | 2,716 | apache-2.0 | Jupyter Notebook |
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc. | ||||||||||
BigDL | 4,396 | 10 | | | 10 hours ago | 16 | April 19, 2021 | 836 | apache-2.0 | Jupyter Notebook |
Accelerating LLMs with low-bit (INT3 / INT4 / NF4 / INT5 / INT8) optimizations using bigdl-llm. ||||||||||
TensorFlowOnSpark | 3,851 | 5 | | | 3 months ago | 32 | April 21, 2022 | 13 | apache-2.0 | Python |
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters. | ||||||||||
XLearning | 1,729 | | | | 5 months ago | | | 44 | apache-2.0 | Java |
AI on Hadoop | ||||||||||
CaffeOnSpark | 1,272 | | | | 4 years ago | | | 78 | apache-2.0 | Jupyter Notebook |
Distributed deep learning on Hadoop and Spark clusters. | ||||||||||
TonY | 694 | 2 | | | 23 days ago | 52 | May 26, 2022 | 26 | other | Java |
TonY is a framework to natively run deep learning frameworks on Apache Hadoop. | ||||||||||
Dist-Keras | 611 | | | | 5 years ago | 2 | October 26, 2017 | 35 | gpl-3.0 | Python |
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark. | ||||||||||
Metronome | 103 | | | | 9 years ago | | | 3 | apache-2.0 | Java |
Suite of parallel iterative algorithms built on top of Iterative Reduce. ||||||||||
This is a Hadoop InputFormat that can be used to load Druid data from deep storage.
To install this library, run `mvn install`. You can then include it in projects with Maven by using the dependency:
```xml
<dependency>
  <groupId>io.imply</groupId>
  <artifactId>druid-hadoop-inputformat</artifactId>
  <version>0.1-SNAPSHOT</version>
</dependency>
```
Here's an example of creating an RDD in Spark:
```java
import java.util.List;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.joda.time.Interval;

// The Druid-specific types (DruidInputFormat, DimFilter, InputRow) come from
// this library and the Druid dependencies it pulls in. `jsc` is an existing
// JavaSparkContext.
final JobConf jobConf = new JobConf();
final String coordinatorHost = "localhost:8081";
final String dataSource = "wikiticker";
final List<Interval> intervals = null; // null to include all time
final DimFilter filter = null;         // null to include all rows
final List<String> columns = null;     // null to include all columns

// Point the job at the Druid coordinator and datasource, with optional
// time, row, and column pruning.
DruidInputFormat.setInputs(
    jobConf,
    coordinatorHost,
    dataSource,
    intervals,
    filter,
    columns
);

// Each record is a Druid InputRow keyed by NullWritable.
final JavaPairRDD<NullWritable, InputRow> rdd = jsc.newAPIHadoopRDD(
    jobConf,
    DruidInputFormat.class,
    NullWritable.class,
    InputRow.class
);
```
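Once the RDD is created, the rows can be processed with ordinary Spark transformations. As a rough sketch, assuming Druid's `InputRow#getDimension(String)` accessor and a dimension named `page` (the dimension name is illustrative only, not something this README defines):

```java
// Hypothetical follow-up: count total rows and collect one dimension's values.
// InputRow#getDimension(String) returns the List<String> of values for that
// dimension in a row; "page" is a placeholder dimension name.
final long totalRows = rdd.count();

final JavaRDD<String> pages = rdd.values()
    .flatMap(row -> row.getDimension("page").iterator());
```

Since the key is always `NullWritable`, `rdd.values()` is usually the convenient starting point for further processing.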