Awesome Open Source
Search results for deduplication
670 search results found
Fast, secure, efficient backup program
Deduplicating archiver with compression and authenticated encryption.
A new generation cloud backup tool
Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and data deduplication. CLI and GUI included.
😎 Finding duplicate images made easy!
Find duplicate files
A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
🆔 A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.
Zero-conf Node.js reloading
A powerful duplicate file finder and an enhanced fork of 'fdupes'.
Extremely fast tool to remove duplicates and other lint from your filesystem
Simple, configuration-driven backup software for servers and workstations
Deduplication tool for yarn.lock files
Zero-details, privacy-focused in-app file system.
A fast high compression read-only file system for Linux and Windows
Capture your screen to a GIF in your browser
Deduplicating backup program
rustic - fast, encrypted, and deduplicated backups powered by Rust
Config driven, easy backup cli for restic.
A powerful and modular toolkit for record linkage and duplicate detection in Python
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
ZBackup, a versatile deduplicating backup tool
Data deduplication engine, supporting optional compression and public key encryption.
Instantly transfer files between Dropbox accounts using only their hashes.
Duke is a fast and flexible deduplication engine written in Java
Tools to download and cleanup Common Crawl data
Tools for deduping file systems
Remove duplicates from MASSIVE wordlist, without sorting it (for dictionary-based password cracking)
Best-Effort Extent-Same, a btrfs dedupe agent
🆔 Command line tool for deduplicating CSV files
Postgresql Patterns Library
A collection of ready-made SQL queries for PostgreSQL covering common tasks (retrieving and modifying data, speeding up queries, database maintenance)
WARC writing MITM HTTP/S proxy
Deduplication Based Filesystem
Data Matching Software
A list of free data matching and record linkage software.
🆔 Examples for using the dedupe library
Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents
Fast Sass Loader
High performance sass loader for webpack
Semantic filesystem for Linux, with relation reasoner, autotagging plugins and a deduplication service
Cargo with less noise: warnings are skipped until errors are fixed, LSP-independent Neovim integration, etc.
Finding and deleting near-duplicate images based on perceptual hash.
A Parallelized Data Deduplication and Compression utility
Rabbitmq Message Deduplication
RabbitMQ Plugin for filtering message duplicates
A pair of kernel modules which provide pools of deduplicated and/or compressed block storage.
Analysis of The Simpsons
Filter, Sort & Delete Duplicate Files Recursively
Elasticsearch Entity Resolution
Elasticsearch entity resolution plugin based on Duke
Deduplicating archiver with encryption and paranoid-level tests. Swiss army knife for the serious backup and disaster recovery manager. Ransomware neutralizer. Win/Linux/Unix
A fast file deduplicator
Userspace tools for managing VDO volumes.
Streaming Deduplication Package for Go
Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources
Quickly detect already witnessed data.
Zfs Inplace Rebalancing
Simple bash script to rebalance pool data between all mirrors when adding vdevs to a pool.
Fast block-level out-of-band BTRFS deduplication tool.
📧 CLI to deduplicate mails from mail boxes.
Automatically concurrent data fetching and request deduplication in C#.
Benji Backup: A block based deduplicating backup software for Ceph RBD images, iSCSI targets, image files and block devices
Spark RDD with Lucene's query and entity linkage capabilities
Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation.
Simplifies use of the Dedupe library via Pandas
Encrypted backups (without the backups)
Py Image Dedup
CLI utility to find near duplicate images and remove all but the best copy.
FastCDC implementation in Rust
Best GTK+ frontend (backup application) for the rsync utility.
PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
CLI utility to find duplicate files
An experimental platform for chunk-level data deduplication. Key words: DDFS, Sparse Index, Extreme Binning, SiLo, Sample Index, BLC; CBR, CFL, CAP, HAR; ASM, OPT; GC, Cumulus
Tool for managing data-deduplication within extant compressed archive files, along with a relatively performant BK tree implementation for fuzzy image searching.
A Python FUSE file system that features transparent deduplication and compression, which make it ideal for archiving backups.
Pinnacle API Documentation
A transactional and deduplicating virtual file system
Sfmc Example Jb Custom Activity
Custom activity examples for Journey Builder.
Laravel Console Logger
Logging and Notifications for Laravel Console Commands.
Your personal database. Mirror of https://git.sr.ht/~tsileo/blobstash
Generate duplex/single consensus reads to reduce sequencing noise and remove duplicates
Record Linkage Resources
Resources for tackling record linkage / deduplication / data matching problems
Record Linkage ToolKit (Find and link entities)
Daxus is a server state management library for React that provides full control over data, leading to a better user experience.
Dedupe/batch geocode addresses and venues around the world with libpostal
Provide a high-level wrapper for kuromoji.js. Cache/Promise API
An easy to use library to save arbitrary rust data-structures to disk (or serialize to any other stream)
Python package for deduplication/entity resolution using active learning
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
The Dropbox for IPFS (without the icky stuff)
Webpack Deduplication Plugin
Plugin for webpack that de-duplicates transitive dependencies in yarn and webpack-based projects.
Vite Plugin Svelte
Svelte integration for Vite, a fast web dev tool.
A small HTTP server.
Quick and dirty backup tool benchmark with reproducible results
Tool for removing duplicate documents from Elasticsearch
Collection of middlewares for the Wretch library. 🎁
Declarative data loading and action calling within react-redux
A tool for discovery, inspection, collection/deduplication, and reporting on an IT environment
A secure and efficient file backup solution that fits both system administrators (CLI) and end users (GUI)