Awesome Open Source
Search results for python deduplication
84 search results found
Borg
⭐
10,158
Deduplicating archiver with compression and authenticated encryption.
Imagededup
⭐
4,789
😎 Finding duplicate images made easy!
Dupeguru
⭐
4,415
Find duplicate files
Dedupe
⭐
3,879
🆔 A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.
Rmlint
⭐
1,692
Extremely fast tool to remove duplicates and other lint from your filesystem
Borgmatic
⭐
1,637
Simple, configuration-driven backup software for servers and workstations
Attic
⭐
1,083
Deduplicating backup program
Splink
⭐
939
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Recordlinkage
⭐
808
A powerful and modular toolkit for record linkage and duplicate detection in Python
Dropship
⭐
687
Instantly transfer files between Dropbox accounts using only their hashes.
Warcprox
⭐
348
WARC writing MITM HTTP/S proxy
Dedupe Examples
⭐
306
🆔 Examples for using the dedupe library
Bedup
⭐
287
Btrfs deduplication
Lsh
⭐
243
Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents
Imgdupes
⭐
220
Finding and deleting near-duplicate images based on perceptual hash.
Nomenklatura
⭐
171
Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources
Mail Deduplicate
⭐
158
📧 CLI to deduplicate mails from mail boxes.
Dduper
⭐
158
Fast block-level out-of-band BTRFS deduplication tool.
Fingerprints
⭐
138
Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation.
Benji
⭐
131
Benji Backup: A block based deduplicating backup software for Ceph RBD images, iSCSI targets, image files and block devices
Py Image Dedup
⭐
125
CLI utility to find near duplicate images and remove all but the best copy.
Pandas Dedupe
⭐
123
Simplifies use of the Dedupe library via Pandas
Encbup
⭐
118
Encrypted backups (without the backups)
Entity Embed
⭐
98
PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
Intraarchivededuplicator
⭐
91
Tool for managing data-deduplication within extant compressed archive files, along with a relatively performant BK tree implementation for fuzzy image searching.
Dedupfs
⭐
90
A Python FUSE file system that features transparent deduplication and compression which make it ideal for archiving backups.
Npbackup
⭐
83
A secure and efficient file backup solution that fits both system administrators (CLI) and end users (GUI)
Record Linkage Resources
⭐
82
Resources for tackling record linkage / deduplication / data matching problems
Rltk
⭐
81
Record Linkage ToolKit (Find and link entities)
Lieu
⭐
72
Dedupe/batch geocode addresses and venues around the world with libpostal
Deduplipy
⭐
66
Python package for deduplication/entity resolution using active learning
Horcrux
⭐
61
The Dropbox for IPFS (without the icky stuff)
Es Dedupe
⭐
51
Tool for removing duplicate documents from Elasticsearch
Quipucords
⭐
49
A tool for discovery, inspection, collection/deduplication, and reporting on an IT environment
Pyjedai
⭐
47
An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.
Awesome
⭐
44
Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.
Clonefile Dedup
⭐
41
Use clonefile to deduplicate files on APFS.
Fastcdc Py
⭐
41
FastCDC implementation in Python https://pypi.org/project/fastcdc/
Deduplication Slides
⭐
37
"1 + 1 = 1 or Record Deduplication with Python" Jupyter Notebook
Scrapy Kafka Redis
⭐
35
Distributed crawling/scraping, Kafka And Redis based components for Scrapy
Mediachain Indexer
⭐
34
search, dedupe, and media ingestion for mediachain
Dude
⭐
33
Duplicates Detector is a cross-platform GUI utility for finding duplicate files, allowing you to delete or link them to save space. Duplicate files are displayed and processed on two synchronized panels for efficient and convenient operation.
Pgdedupe
⭐
32
A simple command line interface to the datamade/dedupe library.
Address Matching
⭐
30
Python script for matching a list of messy addresses against a gazetteer using dedupe.
Django Super Deduper
⭐
27
Utilities for de-duping Django model instances
Spark Matcher
⭐
27
Record matching and entity resolution at scale in Spark
Dedupsqlfs
⭐
25
Deduplicating filesystem via Python3, FUSE and SQLite
Sparkclean
⭐
20
A Scalable Data Cleaning Library for PySpark.
Google Music Manager Uploader
⭐
19
Google Music Manager Uploader module / Easily upload MP3 (folder) to Google Music
Image_deduplication
⭐
19
Image Deduplication in Python
Er Evaluation
⭐
18
An End-to-End Evaluation Framework for Entity Resolution Systems
Tummy Backup
⭐
18
Disc-to-disc backup system using ZFS for deduplication and efficient storage.
Youtube Music Uploader
⭐
17
Python Daemon to upload music to Youtube Music
Pydedupe
⭐
16
Python dedupe library used in Mocality
Mergic
⭐
16
workflow support for reproducible deduplication and merging
Theanimescripter
⭐
16
Welcome to TheAnimeScripter, a comprehensive tool designed for both video processing enthusiasts and professionals all within After Effects.
Borg Qt
⭐
15
A Qt frontend for the command line software BorgBackup.
Neural Scam Artist
⭐
15
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
Nominally
⭐
14
A maximum-strength name parser for record linkage.
Pyhardlinkbackup
⭐
14
Hardlink/Deduplication Backups with Python
Dedupe Geocoder
⭐
14
📍 Demonstration of how dedupe might be used as geocoder
Dupandas
⭐
14
📊 python package for performing deduplication using flexible text matching and cleaning in pandas dataframe
Dugu
⭐
13
Find, remove and avoid duplicates with dugu: The Duplicates Guru
Sefs
⭐
12
seFS - Storage Efficient file system based on Python-fuse bindings
Google Photo Dedup
⭐
12
Google Photo deduplication
Backend
⭐
11
Backend (Docker & API) for matchID project
Marty
⭐
11
An efficient backup tool inspired by Git, saving your bandwidth and providing global deduplication at file level.
Imgdup
⭐
10
Visual similarity image finder and cleaner
Sdhash
⭐
10
Python library for image hashing and deduplication
Patentsview Evaluation
⭐
10
Evaluation and benchmarking of PatentsView disambiguation algorithms
Unisim
⭐
10
UniSim is a package for efficient similarity computation, fuzzy matching, and clustering of data.
Dedup
⭐
10
Find duplicate text files.
Lafs Backup Tool
⭐
9
Tool to securely push incremental (think "rsync --link-dest") backups to tahoe-lafs
Oc_graphenricher
⭐
9
A tool to enrich any OCDM compliant Knowledge Graph, finding new identifiers and deduplicating entities.
Spatiotemporalresampling
⭐
9
Absolutely Eliminate Anime Stutters By SpatiotemporalResampling
Dedup
⭐
8
a deduplication tool
Rdfhash
⭐
8
RDF Graph Compression Tool. Hash RDF subjects based on a checksum of their triples, effectively consolidating together subjects that contain identical definitions. Reduce time taken to mint URIs. Use Blank Nodes to your Advantage
Qpc
⭐
8
A tool for discovery, inspection, collection/deduplication, and reporting on an IT environment
Chickadee
⭐
8
Yet another IP address enrichment tool
Backupdata
⭐
8
create lots of data for backup scalability testing
Mismo
⭐
7
The SQL/Ibis powered sklearn of record linkage
Hashget
⭐
7
Deduplication/backup tool with extremely high 'compression' rate
Dedupe_yara_rule
⭐
7
Deduplication of yara rules
Narrow Down
⭐
6
Fast fuzzy text search
Common_crawl_corpus
⭐
6
Scripts for building a geo-located web corpus using Common Crawl data
Image Deduplication Plugin
⭐
6
Remove exact and approximate duplicates from your dataset in FiftyOne!
Photo Unique
⭐
5
A simple photo deduplication archiver, using pHash
Dedupe Variable Address
⭐
5
Address Variable Type for dedupe
Qlink
⭐
5
Entity Resolution and Record Linkage library
Simd
⭐
5
high-fidelity reliability Simulator IMproved for data Deduplication (SIMD)
Replicat
⭐
5
Configurable and lightweight backup utility with deduplication and encryption
Searchduplicatefiles
⭐
5
File deduplication script written in Python: recursively finds files, computes their MD5 hashes, and removes duplicates.
Jltool
⭐
5
Tools for working with JSON-Lines data, including diff, dedupe, grep and cleanup
Working_with_dna_meth
⭐
5
Pipeline that I use to process DNA methylation data--both theory and implementation included!
Data Linking
⭐
5
administrative data linking
Face_amnesia
⭐
5
Face detection and retrieval in image and video files.
Dupre
⭐
5
Streamlit UI to remove duplicate or near duplicate images
Removedup
⭐
5
Remove duplicates from parallel corpora
Copyright 2018-2024 Awesome Open Source. All rights reserved.