Sparklyclean

Optimal distributed data deduplication and supervised learning pipeline using Apache Spark
Alternatives To Sparklyclean
| Project Name | Stars | Downloads / Repos / Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language | Description |
|---|---|---|---|---|---|---|---|---|---|
| Splink | 939 | 2 | 5 months ago | 119 | November 14, 2023 | 167 | mit | Python | Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends |
| Zingg | 828 | | 5 months ago | 1 | June 01, 2022 | 76 | agpl-3.0 | Java | Scalable identity resolution, entity resolution, data mastering and deduplication using ML |
| Spark Lucenerdd | 127 | | 6 months ago | 39 | June 02, 2021 | 36 | apache-2.0 | Scala | Spark RDD with Lucene's query and entity linkage capabilities |
| Spark Matcher | 27 | | 8 months ago | | | 5 | gpl-2.0 | Python | Record matching and entity resolution at scale in Spark |
| Sparkclean | 20 | | 5 years ago | | | | apache-2.0 | Python | A Scalable Data Cleaning Library for PySpark. |
| Spark Search | 20 | | 2 years ago | 8 | September 26, 2021 | 32 | apache-2.0 | Scala | Spark Search - high performance advanced search features based on Apache Lucene |
| Sparklyclean | 6 | | 4 years ago | | | | mit | Scala | Optimal distributed data deduplication and supervised learning pipeline using Apache Spark |

Scala
Data Science
Spark
Hadoop
Reducer
Distributed Systems
Data Engineering
Deduplication
Data Cleaning