Awesome Open Source
Search results for deduplication
670 search results found
Restic
⭐
21,206
Fast, secure, efficient backup program
Borg
⭐
9,799
Deduplicating archiver with compression and authenticated encryption.
Alertmanager
⭐
5,910
Prometheus Alertmanager
Duplicacy
⭐
4,765
A new generation cloud backup tool
Kopia
⭐
4,662
Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and data deduplication. CLI and GUI included.
Imagededup
⭐
4,625
😎 Finding duplicate images made easy!
Dupeguru
⭐
3,990
Find duplicate files
Libpostal
⭐
3,776
A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
Dedupe
⭐
3,708
🆔 A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.
Node Dev
⭐
2,234
Zero-conf Node.js reloading
Jdupes
⭐
1,668
A powerful duplicate file finder and an enhanced fork of 'fdupes'.
Rmlint
⭐
1,600
Extremely fast tool to remove duplicates and other lint from your filesystem
Borgmatic
⭐
1,481
Simple, configuration-driven backup software for servers and workstations
Yarn Deduplicate
⭐
1,334
Deduplication tool for yarn.lock files
Zbox
⭐
1,233
Zero-details, privacy-focused in-app file system.
Dwarfs
⭐
1,156
A fast high compression read-only file system for Linux and Windows
Gifcap
⭐
1,125
Capture your screen to a GIF in your browser
Attic
⭐
1,083
Deduplicating backup program
Rustic
⭐
1,078
rustic - fast, encrypted, and deduplicated backups powered by Rust
Autorestic
⭐
840
Config driven, easy backup cli for restic.
Recordlinkage
⭐
808
A powerful and modular toolkit for record linkage and duplicate detection in Python
Splink
⭐
784
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Zingg
⭐
782
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Zbackup
⭐
769
ZBackup, a versatile deduplicating backup tool
Rdedup
⭐
757
Data deduplication engine, supporting optional compression and public key encryption.
Free Style
⭐
698
Make CSS easier and more maintainable by using JavaScript
Dropship
⭐
687
Instantly transfer files between Dropbox accounts using only their hashes.
Talisman
⭐
666
Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
Duke
⭐
603
Duke is a fast and flexible deduplication engine written in Java
Cc_net
⭐
599
Tools to download and cleanup Common Crawl data
Duperemove
⭐
578
Tools for deduping file systems
Duplicut
⭐
578
Remove duplicates from MASSIVE wordlist, without sorting it (for dictionary-based password cracking)
Bees
⭐
481
Best-Effort Extent-Same, a btrfs dedupe agent
Csvdedupe
⭐
395
🆔 Command line tool for deduplicating CSV files
Postgresql Patterns Library
⭐
381
A collection of ready-made SQL queries for PostgreSQL covering frequently occurring tasks (retrieving and modifying data, speeding up queries, database maintenance)
Warcprox
⭐
343
WARC writing MITM HTTP/S proxy
Sdfs
⭐
338
Deduplication Based Filesystem
Data Matching Software
⭐
311
A list of free data matching and record linkage software.
Dedupe Examples
⭐
306
🆔 Examples for using the dedupe library
Bedup
⭐
287
Btrfs deduplication
Lsh
⭐
243
Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents
Fast Sass Loader
⭐
239
High performance sass loader for webpack
Tagsistant
⭐
231
Semantic filesystem for Linux, with relation reasoner, autotagging plugins and a deduplication service
Cargo Limit
⭐
231
Cargo with less noise: warnings are skipped until errors are fixed, LSP-independent Neovim integration, etc.
Imgdupes
⭐
220
Finding and deleting near-duplicate images based on perceptual hash.
Pcompress
⭐
219
A Parallelized Data Deduplication and Compression utility
Rabbitmq Message Deduplication
⭐
219
RabbitMQ Plugin for filtering message duplicates
Kvdo
⭐
211
A pair of kernel modules which provide pools of deduplicated and/or compressed block storage.
Flim Springfield
⭐
210
Analysis of The Simpsons
Deduplicator
⭐
204
Filter, Sort & Delete Duplicate Files Recursively
Elasticsearch Entity Resolution
⭐
202
Elasticsearch entity resolution plugin based on Duke
Zpaqfranz
⭐
172
Deduplicating archiver with encryption and paranoid-level tests. Swiss army knife for the serious backup and disaster recovery manager. Ransomware neutralizer. Win/Linux/Unix
Dupe Krill
⭐
170
A fast file deduplicator
Vdo
⭐
170
Userspace tools for managing VDO volumes.
Dedup
⭐
168
Streaming Deduplication Package for Go
Nomenklatura
⭐
165
Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources
Dejavu
⭐
153
Quickly detect already witnessed data.
Zfs Inplace Rebalancing
⭐
151
Simple bash script to rebalance pool data between all mirrors when adding vdevs to a pool.
Dduper
⭐
144
Fast block-level out-of-band BTRFS deduplication tool.
Mail Deduplicate
⭐
143
📧 CLI to deduplicate mails from mail boxes.
Haxlsharp
⭐
134
Automatically concurrent data fetching and request deduplication in C#.
Benji
⭐
131
Benji Backup: A block based deduplicating backup software for Ceph RBD images, iSCSI targets, image files and block devices
Spark Lucenerdd
⭐
127
Spark RDD with Lucene's query and entity linkage capabilities
Fingerprints
⭐
125
Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation.
Pandas Dedupe
⭐
123
Simplifies use of the Dedupe library via Pandas
Encbup
⭐
118
Encrypted backups (without the backups)
Py Image Dedup
⭐
116
CLI utility to find near duplicate images and remove all but the best copy.
Fastcdc Rs
⭐
102
FastCDC implementation in Rust
Go Rsync
⭐
99
Best GTK+ frontend (backup application) for RSYNC utility.
Entity Embed
⭐
98
PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
Dupd
⭐
96
CLI utility to find duplicate files
Destor
⭐
91
An experimental platform for chunk-level data deduplication. Key words: DDFS, Sparse Index, Extreme Binning, SiLo, Sample Index, BLC; CBR, CFL, CAP, HAR; ASM, OPT; GC, Cumulus
Intraarchivededuplicator
⭐
91
Tool for managing data-deduplication within extant compressed archive files, along with a relatively performant BK tree implementation for fuzzy image searching.
Dedupfs
⭐
90
A Python FUSE file system that features transparent deduplication and compression which make it ideal for archiving backups.
Pinnacleapi Documentation
⭐
89
Pinnacle API Documentation
Acid Store
⭐
87
A transactional and deduplicating virtual file system
Sfmc Example Jb Custom Activity
⭐
86
Custom activity examples for Journey Builder.
Laravel Console Logger
⭐
85
Logging and Notifications for Laravel Console Commands.
Blobstash
⭐
83
Your personal database. Mirror of https://git.sr.ht/~tsileo/blobstash
Gencore
⭐
82
Generate duplex/single consensus reads to reduce sequencing noise and remove duplicates
Record Linkage Resources
⭐
82
Resources for tackling record linkage / deduplication / data matching problems
Rltk
⭐
81
Record Linkage ToolKit (Find and link entities)
Daxus
⭐
78
Daxus is a server state management library for React that provides full control over data, leading to a better user experience.
Rocketmqdeduplistener
⭐
73
Idempotent, deduplicating RocketMQ message consumer; supports using MySQL or Redis as the idempotency table and works out of the box
Lieu
⭐
72
Dedupe/batch geocode addresses and venues around the world with libpostal
Kuromojin
⭐
69
Provide a high-level wrapper for kuromoji.js. Cache/Promise API
Savefile
⭐
66
An easy to use library to save arbitrary rust data-structures to disk (or serialize to any other stream)
Deduplipy
⭐
66
Python package for deduplication/entity resolution using active learning
Wget Lua
⭐
65
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Horcrux
⭐
61
The Dropbox for IPFS (without the icky stuff)
Webpack Deduplication Plugin
⭐
59
Plugin for webpack that de-duplicates transitive dependencies in yarn and webpack-based projects.
Vite Plugin Svelte
⭐
58
Svelte integration for Vite, a fast web dev tool.
Minihttp
⭐
57
A small HTTP server.
Backup Bench
⭐
54
Quick and dirty backup tool benchmark with reproducible results
Es Dedupe
⭐
51
Tool for removing duplicate documents from Elasticsearch
Wretch Middlewares
⭐
50
Collection of middlewares for the Wretch library. 🎁
Autoaction
⭐
49
Declarative data loading and action calling within react-redux
Quipucords
⭐
48
A tool for discovery, inspection, collection/deduplication, and reporting on an IT environment
Npbackup
⭐
47
A secure and efficient file backup solution that fits both system administrators (CLI) and end users (GUI)
Reading
⭐
45
1-100 of 670 search results
Copyright 2018-2023 Awesome Open Source. All rights reserved.