NVTabular is a feature engineering and preprocessing library for tabular data that is designed to easily manipulate terabyte-scale datasets and train deep learning (DL) based recommender systems. It provides a high-level abstraction to simplify code and accelerates computation on the GPU using the RAPIDS Dask-cuDF library.
NVTabular is a component of NVIDIA Merlin, an open source framework for building and deploying recommender systems, and works with the other Merlin components, including Merlin Models, HugeCTR, and Merlin Systems, to provide end-to-end acceleration of recommender systems on the GPU. Extending beyond model training, with NVIDIA's Triton Inference Server, the feature engineering and preprocessing steps performed on the data during training can be automatically applied to incoming data during inference.
When training DL recommender systems, data scientists and machine learning (ML) engineers face challenges such as preparing terabyte-scale datasets, building complex feature engineering and preprocessing pipelines, and repeating that work for every experiment.
NVTabular alleviates these challenges by helping data scientists and ML engineers process datasets that exceed GPU and CPU memory, express transformations at the operation level instead of hand-writing them, and prepare datasets quickly enough to iterate on models.
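To make the operation-level abstraction concrete, here is a minimal sketch of an NVTabular workflow. The column names and file paths are hypothetical placeholders, not part of this project:

```python
import nvtabular as nvt
from nvtabular import ops

# Hypothetical columns standing in for your own schema.
cat_features = ["user_id", "item_id"] >> ops.Categorify()
cont_features = ["price"] >> ops.FillMissing() >> ops.Normalize()

# A Workflow is a graph of column selectors and operators.
workflow = nvt.Workflow(cat_features + cont_features)

# nvt.Dataset wraps Parquet/CSV data and processes it in GPU-sized chunks.
train = nvt.Dataset("train.parquet", engine="parquet")

workflow.fit(train)                                  # collect statistics (categories, means, stds)
workflow.transform(train).to_parquet("train_out/")   # apply the transformations
workflow.save("workflow/")                           # persist the fitted workflow for reuse
```

Statistics are gathered once in fit() and reapplied deterministically in transform(), which is what makes the training-time preprocessing reusable on incoming data at inference time.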
Learn more in the NVTabular core features documentation.
When running NVTabular on the Criteo 1TB Click Logs dataset using a single V100 32GB GPU, feature engineering and preprocessing completed in 13 minutes. On a DGX-1 cluster with eight V100 GPUs, the same feature engineering and preprocessing completed within three minutes. Combined with HugeCTR, the dataset can be processed and a full model trained in only six minutes.
The performance of the Criteo DLRM workflow also demonstrates the effectiveness of the NVTabular library. The original ETL script, written with NumPy, took more than five days to complete; combined with CPU training, the total iteration time was over one week. By optimizing the ETL code in Spark and running it on a DGX-1 equivalent cluster, feature engineering and preprocessing were reduced to three hours, and training completed in one hour.
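The multi-GPU result above relies on NVTabular's Dask-cuDF backend, which partitions the dataset across GPU workers. A rough sketch of scaling a workflow across all local GPUs, assuming dask-cuda is installed and that NVTabular's Dask backend picks up the active Dask client (paths and column names are hypothetical):

```python
import glob

from dask_cuda import LocalCUDACluster
from dask.distributed import Client

import nvtabular as nvt
from nvtabular import ops

# One Dask worker per visible GPU; the active client becomes the default scheduler.
cluster = LocalCUDACluster()
client = Client(cluster)

features = ["user_id", "item_id"] >> ops.Categorify()
workflow = nvt.Workflow(features)

# part_size controls the size of each partition read from the Parquet files.
dataset = nvt.Dataset(glob.glob("criteo/*.parquet"), engine="parquet", part_size="1GB")

workflow.fit(dataset)
workflow.transform(dataset).to_parquet("criteo_out/")
```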
NVTabular requires Python version 3.7 or later. GPU support additionally requires an NVIDIA GPU and a compatible CUDA toolkit; the conda command below installs cudatoolkit 11.2.
NVTabular can be installed with Anaconda from the nvidia channel by running the following command:
conda install -c nvidia -c rapidsai -c numba -c conda-forge nvtabular python=3.7 cudatoolkit=11.2
NVTabular can be installed with pip by running the following command:
pip install nvtabular
Installing NVTabular with pip causes NVTabular to run on the CPU only and might require installing additional dependencies manually. When you run NVTabular in one of our Docker containers, the dependencies are already installed.
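A quick, hypothetical sanity check after either install path; the import works in CPU-only mode as well:

```python
import nvtabular as nvt

# Print the installed NVTabular version to confirm the package is importable.
print(nvt.__version__)
```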
NVTabular Docker containers are available in the NVIDIA Merlin container repository. The following table summarizes the key information about the containers:
Container Name | Container Location | Functionality |
---|---|---|
merlin-hugectr | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-hugectr | NVTabular, HugeCTR, and Triton Inference |
merlin-tensorflow | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow | NVTabular, TensorFlow, and Triton Inference |
merlin-pytorch | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-pytorch | NVTabular, PyTorch, and Triton Inference |
To use these Docker containers, you'll first need to install the NVIDIA Container Toolkit to provide GPU support for Docker. Use the NGC links in the table above for more information about how to launch and run the containers. For the software and model versions that NVTabular supports per container, see the Support Matrix.
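Each of these containers bundles Triton Inference Server, which, as described above, can reapply the training-time feature engineering to incoming data. The same reuse can also be done directly in Python by loading a previously saved workflow; a minimal sketch with hypothetical paths:

```python
import nvtabular as nvt

# Load the workflow that was fitted and saved during training (path is hypothetical).
workflow = nvt.Workflow.load("workflow/")

# Apply the same statistics and transformations to new data before it reaches the model.
incoming = nvt.Dataset("incoming_batch.parquet", engine="parquet")
workflow.transform(incoming).to_parquet("incoming_out/")
```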
We provide a collection of Jupyter notebook examples that demonstrate feature engineering with NVTabular. In addition, NVTabular is used in many of the examples in the other Merlin libraries.
If you'd like to contribute to the library directly, see the Contributing.md. We're particularly interested in contributions or feature requests for our feature engineering and preprocessing operations. To further advance our Merlin Roadmap, we encourage you to share all the details regarding your recommender system pipeline in this survey.
If you're interested in learning more about how NVTabular works, see our NVTabular documentation. We also have API documentation that outlines the specifics of the available calls within the library.