Druid Hadoop InputFormat

Hadoop InputFormat for http://druid.io/

This is a Hadoop InputFormat that can be used to load Druid data from deep storage.

Installation

To install this library, run mvn install in the project root. This builds the library and installs it into your local Maven repository. You can then include it in Maven projects with this dependency:

<dependency>
  <groupId>io.imply</groupId>
  <artifactId>druid-hadoop-inputformat</artifactId>
  <version>0.1-SNAPSHOT</version>
</dependency>

Example

Here's an example of creating an RDD in Spark:

final JobConf jobConf = new JobConf();
final String coordinatorHost = "localhost:8081";
final String dataSource = "wikiticker";
final List<Interval> intervals = null; // null to include all time
final DimFilter filter = null; // null to include all rows
final List<String> columns = null; // null to include all columns

DruidInputFormat.setInputs(
    jobConf,
    coordinatorHost,
    dataSource,
    intervals,
    filter,
    columns
);

// jsc is an existing JavaSparkContext.
final JavaPairRDD<NullWritable, InputRow> rdd = jsc.newAPIHadoopRDD(
    jobConf,
    DruidInputFormat.class,
    NullWritable.class,
    InputRow.class
);
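
The example above passes null for intervals to read all time. To restrict the read to specific days, you need a list of Interval objects instead. The sketch below assumes Interval is Joda-Time's org.joda.time.Interval, which the README does not state. It uses only the JDK to build ISO-8601 "start/end" strings for consecutive UTC days. DayIntervals and its methods are hypothetical helpers, not part of this library, and the date used in the usage note is arbitrary.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of druid-hadoop-inputformat.
final class DayIntervals {
    private DayIntervals() {}

    // Returns an ISO-8601 "start/end" string covering one UTC day,
    // e.g. 2015-09-12T00:00:00Z/2015-09-13T00:00:00Z.
    static String dayInterval(LocalDate day) {
        String start = day.atStartOfDay(ZoneOffset.UTC).toInstant().toString();
        String end = day.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant().toString();
        return start + "/" + end;
    }

    // Returns one interval string per day, starting at firstDay.
    static List<String> dayIntervals(LocalDate firstDay, int days) {
        if (days < 0) {
            throw new IllegalArgumentException("days must be non-negative");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < days; i++) {
            result.add(dayInterval(firstDay.plusDays(i)));
        }
        return result;
    }
}
```

If Joda-Time is on the classpath, each string (for example, one from DayIntervals.dayIntervals(LocalDate.of(2015, 9, 12), 2)) can be converted with Interval.parse and collected into the intervals list passed to DruidInputFormat.setInputs.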