Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Splink | 939 | 2 | | | 5 months ago | 119 | November 14, 2023 | 167 | mit | Python |
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends | ||||||||||
Zingg | 828 | | | | 5 months ago | 1 | June 01, 2022 | 76 | agpl-3.0 | Java |
Scalable identity resolution, entity resolution, data mastering and deduplication using ML | ||||||||||
Spark Lucenerdd | 127 | | | | 6 months ago | 39 | June 02, 2021 | 36 | apache-2.0 | Scala |
Spark RDD with Lucene's query and entity linkage capabilities | ||||||||||
Spark Matcher | 27 | | | | 8 months ago | | | 5 | gpl-2.0 | Python |
Record matching and entity resolution at scale in Spark | ||||||||||
Sparkclean | 20 | | | | 5 years ago | | | | apache-2.0 | Python |
A Scalable Data Cleaning Library for PySpark. | ||||||||||
Spark Search | 20 | | | | 2 years ago | 8 | September 26, 2021 | 32 | apache-2.0 | Scala |
Spark Search - high performance advanced search features based on Apache Lucene | ||||||||||
Sparklyclean | 6 | | | | 4 years ago | | | | mit | Scala |
Optimal distributed data deduplication and supervised learning pipeline using Apache Spark |