Dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) can generate large simulated or synthetic data sets for tests, POCs, and other uses in Databricks environments, including Delta Live Tables pipelines.
Alternatives To Dbldatagen
| Project Name | Stars | Downloads, Repos Using This, Packages Using This, Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language | Description |
|---|---|---|---|---|---|---|---|---|
| Synapseml | 4,967 | 63 days ago | 12 | November 27, 2023 | 335 | mit | Scala | Simple and Distributed Machine Learning |
| Spark Nlp | 3,578 | 303 months ago | 134 | December 08, 2023 | 43 | apache-2.0 | Scala | State of the Art Natural Language Processing |
| Ibis | 3,404 | 24293 months ago | 68 | December 10, 2023 | 157 | apache-2.0 | Python | The flexibility of Python with the scale and performance of modern SQL. |
| Linkis | 3,224 | 3817 days ago | 3 | July 29, 2023 | 215 | apache-2.0 | Java | Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines. |
| Petastorm | 1,693 | 85 months ago | 86 | February 03, 2023 | 174 | apache-2.0 | Python | Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code. |
| Spark Py Notebooks | 1,515 | a year ago | | | 9 | other | Jupyter Notebook | Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks |
| Mleap | 1,479 | 15125 months ago | 26 | May 07, 2021 | 109 | apache-2.0 | Scala | MLeap: Deploy ML Pipelines to Production |
| Awesome Spark | 1,461 | a year ago | | | 20 | cc0-1.0 | Shell | A curated list of awesome Apache Spark packages and resources. |
| Optimus | 1,446 | 10 days ago | 32 | June 19, 2022 | 29 | apache-2.0 | Python | Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark |
| Sparkmagic | 1,272 | 2563 months ago | 54 | September 13, 2023 | 156 | other | Python | Jupyter magics and kernels for working with remote Spark clusters |
Categories: Python, Spark, PySpark, Faker, Spark Streaming, Data Generation