Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language | Description
---|---|---|---|---|---|---|---|---|---|---|---
Keras | 59,914 | 690 | | | 19 hours ago | 86 | November 28, 2023 | 120 | apache-2.0 | Python | Deep Learning for humans
Netron | 24,666 | 4 | 70 | | 11 hours ago | 609 | December 02, 2023 | 21 | mit | JavaScript | Visualizer for neural network, deep learning and machine learning models
D2l En | 19,977 | | | | 5 days ago | 2 | November 13, 2022 | 101 | other | Python | Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
Ncnn | 18,372 | 1 | | | 3 days ago | 26 | October 27, 2023 | 1,046 | other | C++ | ncnn is a high-performance neural network inference framework optimized for the mobile platform
Onnx | 15,979 | 148 | 493 | | 3 days ago | 31 | October 26, 2023 | 310 | apache-2.0 | Python | Open standard for machine learning interoperability
Best Of Ml Python | 14,693 | | | | 4 days ago | | | 21 | cc-by-sa-4.0 | | 🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
Horovod | 13,687 | 20 | 16 | | a day ago | 77 | June 12, 2023 | 373 | other | Python | Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Wandb | 7,538 | 39 | 618 | | 2 days ago | 265 | November 07, 2023 | 1,061 | mit | Python | 🔥 A tool for visualizing and tracking your machine learning experiments. This repo contains the CLI and Python API.
Ai Learn | 7,447 | | | | a year ago | | | 19 | | | AI learning roadmap with nearly 200 hands-on cases and projects and free companion teaching materials, from zero-background introduction to job-ready practice; covers Python, mathematics, machine learning, data analysis, deep learning, computer vision, natural language processing, PyTorch, TensorFlow, and other popular topics.
Einops | 7,290 | 5 | 767 | | 2 months ago | 14 | October 01, 2023 | 30 | mit | Python | Deep learning operations reinvented (for pytorch, tensorflow, jax and others)
Note: We have merged Analytics Zoo into BigDL 2.0, and our future development will move to the BigDL project.
Distributed TensorFlow, PyTorch, Keras and BigDL on Apache Spark & Ray
Analytics Zoo is an open source Big Data AI platform, and includes the following features for scaling end-to-end AI to distributed Big Data:
- Orca: seamlessly scale out TensorFlow and PyTorch for Big Data (using Spark & Ray)
- RayOnSpark: run Ray programs directly on Big Data clusters
- BigDL Extensions: high-level Spark ML pipeline and Keras-like APIs for BigDL
- Chronos: scalable time series analysis using AutoML
- PPML: privacy preserving big data analysis and machine learning (experimental)
For more information, you may read the docs.
You can use Analytics Zoo on Google Colab without any installation. Analytics Zoo also includes a set of notebooks that you can directly open and run in Colab.
To install Analytics Zoo, we recommend using conda environments.
conda create -n my_env
conda activate my_env
pip install analytics-zoo
To install the latest nightly build, use pip install --pre --upgrade analytics-zoo; see the Python and Scala user guides for more details.
Most AI projects start with a Python notebook running on a single laptop; however, one usually needs to go through a mountain of pain to scale it to handle larger data sets in a distributed fashion. The Orca library seamlessly scales out your single-node TensorFlow or PyTorch notebook across large clusters (so as to process distributed Big Data).
First, initialize Orca Context:
from zoo.orca import init_orca_context
# cluster_mode can be "local", "k8s" or "yarn"
sc = init_orca_context(cluster_mode="yarn", cores=4, memory="10g", num_nodes=2)
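For quick experiments, the same API can run everything on the local machine before moving to a cluster, and the context should be stopped once the application finishes to release resources. Below is a minimal sketch using init_orca_context and stop_orca_context from zoo.orca; argument defaults may vary across Analytics Zoo versions.
from zoo.orca import init_orca_context, stop_orca_context

# run on the local machine for development and testing
sc = init_orca_context(cluster_mode="local", cores=4, memory="10g")

# ... build and run the Orca pipeline here ...

# release the underlying Spark (and Ray, if enabled) resources
stop_orca_context()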
Next, perform data-parallel processing in Orca (supporting standard Spark DataFrames, TensorFlow Dataset, PyTorch DataLoader, Pandas, etc.):
from pyspark.sql.functions import array
df = spark.read.parquet(file_path)
df = df.withColumn('user', array('user')) \
       .withColumn('item', array('item'))
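The same data-parallel style also covers Pandas-like processing: Orca can load files into a distributed collection of Pandas DataFrames (XShards) and apply an ordinary Pandas function to each shard. The sketch below assumes the zoo.orca.data.pandas reader and the transform_shard method from the Orca data guide; exact names may differ across versions, and the ratio column is purely illustrative.
import zoo.orca.data.pandas

# read CSV files into a distributed set of Pandas DataFrames (XShards)
data_shards = zoo.orca.data.pandas.read_csv(file_path)

# apply an ordinary Pandas transformation to every shard in parallel
def add_ratio(df):
    df['ratio'] = df['user'] / (df['item'] + 1)
    return df

data_shards = data_shards.transform_shard(add_ratio)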
Finally, use sklearn-style Estimator APIs in Orca to perform distributed TensorFlow, PyTorch or Keras training and inference:
from tensorflow import keras
from zoo.orca.learn.tf.estimator import Estimator
user = keras.layers.Input(shape=[1])
item = keras.layers.Input(shape=[1])
feat = keras.layers.concatenate([user, item], axis=1)
predictions = keras.layers.Dense(2, activation='softmax')(feat)
model = keras.models.Model(inputs=[user, item], outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

est = Estimator.from_keras(keras_model=model)
est.fit(data=df,
        batch_size=64,
        epochs=4,
        feature_cols=['user', 'item'],
        label_cols=['label'])
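The same Estimator pattern works for PyTorch models. The following is a minimal sketch, assuming the zoo.orca.learn.pytorch Estimator and its from_torch constructor (the exact signature may differ slightly between versions); the SimpleModel class is purely illustrative.
import torch
import torch.nn as nn
from zoo.orca.learn.pytorch import Estimator

# a toy model that concatenates the 'user' and 'item' features
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc = nn.Linear(2, 2)

    def forward(self, user, item):
        x = torch.cat((user, item), dim=1)
        return self.fc(x)

model = SimpleModel()
loss = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# create a distributed Estimator from the PyTorch model and train on the Spark DataFrame
est = Estimator.from_torch(model=model, optimizer=optimizer, loss=loss)
est.fit(data=df,
        batch_size=64,
        epochs=4,
        feature_cols=['user', 'item'],
        label_cols=['label'])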
See the TensorFlow and PyTorch quickstarts, as well as the documentation website, for more details.
Ray is an open source distributed framework for emerging AI applications. RayOnSpark allows users to directly run Ray programs on existing Big Data clusters, and directly write Ray code inline with their Spark code (so as to process the in-memory Spark RDDs or DataFrames).
from zoo.orca import init_orca_context
# cluster_mode can be "local", "k8s" or "yarn"
sc = init_orca_context(cluster_mode="yarn", cores=4, memory="10g", num_nodes=2, init_ray_on_spark=True)
import ray

@ray.remote
class Counter(object):
    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1
        return self.n

# create 5 Counter actors and increment each one once; prints [1, 1, 1, 1, 1]
counters = [Counter.remote() for i in range(5)]
print(ray.get([c.increment.remote() for c in counters]))
See the RayOnSpark user guide and quickstart for more details.
Analytics Zoo makes it easier to develop large-scale deep learning applications on Apache Spark, by providing high-level Spark ML pipeline and Keras-like APIs on top of BigDL (a distributed deep learning framework for Spark).
First, call initNNContext at the beginning of the code:
import com.intel.analytics.zoo.common.NNContext
val sc = NNContext.initNNContext()
Then, define the BigDL model using Keras-style API:
val input = Input[Float](inputShape = Shape(10))
val dense = Dense[Float](12).inputs(input)
val output = Activation[Float]("softmax").inputs(dense)
val model = Model(input, output)
After that, use NNEstimator to train/predict/evaluate the model using Spark DataFrames and ML pipelines:
val trainingDF = spark.read.parquet("train_data")
val validationDF = spark.read.parquet("val_data")
val scaler = new MinMaxScaler().setInputCol("in").setOutputCol("value")
val estimator = NNEstimator(model, CrossEntropyCriterion())
  .setBatchSize(size).setOptimMethod(new Adam()).setMaxEpoch(epoch)
val pipeline = new Pipeline().setStages(Array(scaler, estimator))
val pipelineModel = pipeline.fit(trainingDF)
val predictions = pipelineModel.transform(validationDF)
See the Scala, NNframes and Keras API user guides for more details.
Time series prediction takes observations from previous time steps as input and predicts the values at future time steps. The Chronos library makes it easy to build end-to-end time series analysis by applying AutoML to extremely large-scale time series prediction.
To train a time series model with AutoML, first initialize Orca Context:
from zoo.orca import init_orca_context
# cluster_mode can be "local", "k8s" or "yarn"
sc = init_orca_context(cluster_mode="yarn", cores=4, memory="10g", num_nodes=2, init_ray_on_spark=True)
Next, create an AutoTSTrainer.
from zoo.chronos.autots.deprecated.forecast import AutoTSTrainer
trainer = AutoTSTrainer(dt_col="datetime", target_col="value")
Finally, call fit on the AutoTSTrainer, which applies AutoML to find the best model and hyperparameters; it returns a TSPipeline that can be used for prediction or evaluation.
# train a pipeline with AutoML support
ts_pipeline = trainer.fit(train_df, validation_df)

# predict
ts_pipeline.predict(test_df)
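Besides prediction, the returned TSPipeline can also be evaluated on held-out data and persisted for later use. This is a minimal sketch assuming the evaluate and save methods described in the Chronos user guide; metric names and signatures may vary by version.
# evaluate the fitted pipeline on held-out data
mse, smape = ts_pipeline.evaluate(test_df, metrics=["mse", "smape"])

# persist the whole pipeline (preprocessing + model) for later inference
ts_pipeline.save("/tmp/ts_pipeline")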
See the Chronos user guide and example for more details.
Analytics Zoo PPML provides a Trusted Cluster Environment for protecting the end-to-end Big Data AI pipeline. It combines various low-level hardware and software security technologies (e.g., Intel SGX, LibOS such as Graphene and Occlum, Federated Learning, etc.) and allows users to run unmodified Big Data analysis and ML/DL programs (such as Apache Spark, Apache Flink, TensorFlow, PyTorch, etc.) in a secure fashion on a (private or public) cloud.
See the PPML user guide for more details.
Older Documents