Datapipelines Essentials Python

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, along with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Alternatives To Datapipelines Essentials Python
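Spark
Stars: 37,661; Downloads/Repos Using This/Packages Using This (unsplit): 2,394939; Most Recent Commit: 2 months ago; Total Releases: 46; Latest Release: May 09, 2021; Open Issues: 186; License: apache-2.0; Language: Scala
Apache Spark - A unified analytics engine for large-scale data processing

Data Science Ipython Notebooks
Stars: 25,668; Most Recent Commit: 5 months ago; Open Issues: 34; License: other; Language: Python
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Bigdata Notes
Stars: 14,872; Most Recent Commit: 2 months ago; Open Issues: 39; Language: Java
A beginner's guide to big data :star:

Deeplearning4j
Stars: 13,290; Downloads/Repos Using This/Packages Using This (unsplit): 175119; Most Recent Commit: 3 months ago; Total Releases: 54; Latest Release: August 10, 2022; Open Issues: 624; License: apache-2.0; Language: Java
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learning using automatic differentiation.

Cookbook
Stars: 12,557; Most Recent Commit: 2 months ago; Open Issues: 111; License: apache-2.0
The Data Engineering Cookbook

Doris
Stars: 11,047; Most Recent Commit: 3 days ago; Total Releases: 8; Latest Release: September 27, 2023; Open Issues: 2,332; License: apache-2.0; Language: Java
Apache Doris is an easy-to-use, high performance and unified analytics database.

It_book
Stars: 8,543; Most Recent Commit: 2 years ago; Open Issues: 7
This project collects more than a thousand good, commonly used books that the author has read or heard about over the years; the book you are looking for may well be here. It covers most books for the internet industry, plus interview experience and interview questions. It includes an artificial intelligence series (the common deep learning frameworks TensorFlow, pytorch, and keras; NLP, machine learning, deep learning, and so on), a big data series (Spark, Hadoop, Scala, kafka, and so on), and a programmer's essentials series (C, C++, java, data structures, linux, design patterns, databases, and so on).

God Of Bigdata
Stars: 8,483; Most Recent Commit: 7 months ago; Open Issues: 3
Focused on big data learning and interviews; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

H2o 3
Stars: 6,618; Downloads/Repos Using This/Packages Using This (unsplit): 6233; Most Recent Commit: 2 months ago; Total Releases: 49; Latest Release: August 09, 2023; Open Issues: 2,746; License: apache-2.0; Language: Jupyter Notebook
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Alluxio
Stars: 6,544; Downloads/Repos Using This/Packages Using This (unsplit): 3153; Most Recent Commit: 2 months ago; Total Releases: 73; Latest Release: November 29, 2023; Open Issues: 969; License: apache-2.0; Language: Java
Alluxio, data orchestration for analytics and machine learning in the cloud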

Categories: Python, XML, Spark, Hadoop, Big Data, ETL, Apache Spark, PySpark, XML Parser, ETL Framework