| Project | Stars | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language | Description |
|---|---|---|---|---|---|---|---|---|---|---|
| Difftastic | 15,820 | – | – | 14 hours ago | 63 | November 26, 2023 | 153 | MIT | Rust | a structural diff that understands syntax 🟥🟩 |
| DifferenceKit | 3,328 | 19 | – | 6 months ago | 21 | May 07, 2021 | 27 | Apache-2.0 | Swift | 💻 A fast and flexible O(n) difference algorithm framework for Swift collections. |
| TextDistance | 3,217 | 14 | 49 | 2 months ago | 26 | September 28, 2023 | 9 | MIT | Python | 📐 Compute distance between sequences. 30+ algorithms, pure Python implementation, common interface, optional external libs usage. |
| Dwifft | 1,767 | 26 | – | 3 years ago | 12 | October 22, 2018 | 18 | MIT | Swift | Swift diff library. |
| Diff.swift | 935 | 22 | – | 5 years ago | 7 | September 30, 2017 | 8 | MIT | Swift | The fastest diff and patch library in Swift. Includes UICollectionView/UITableView utils. |
| Nanomorph | 714 | 225 | 74 | 3 years ago | 33 | February 18, 2021 | 17 | MIT | JavaScript | 🚅 Hyper-fast diffing algorithm for real DOM nodes. |
| DiffableDataSources | 619 | – | – | 2 years ago | 4 | June 08, 2021 | 13 | Apache-2.0 | Swift | 💾 A library for backporting UITableView/UICollectionViewDiffableDataSource. |
| EditScript | 423 | 2 | – | 8 months ago | 23 | March 14, 2023 | 11 | EPL-1.0 | Clojure | A library to diff and patch Clojure/ClojureScript data structures. |
| Vim Diff Enhanced | 332 | – | – | 4 years ago | – | – | – | – | Vim script | Better diff options for Vim. |
| Htmldiff.net | 267 | 11 | 4 | a month ago | 6 | October 27, 2023 | 35 | MIT | C# | HTML diff algorithm for .NET. |
TextDistance is a Python library for computing distances between two or more sequences using many algorithms.

Features:

- 30+ algorithms
- Pure Python implementation
- Simple usage
- More than two sequences comparing
- Some algorithms have more than one implementation in one class
- Optional numpy usage for maximum speed
Edit-based algorithms:

| Algorithm | Class | Functions |
|---|---|---|
| Hamming | `Hamming` | `hamming` |
| MLIPNS | `MLIPNS` | `mlipns` |
| Levenshtein | `Levenshtein` | `levenshtein` |
| Damerau-Levenshtein | `DamerauLevenshtein` | `damerau_levenshtein` |
| Jaro-Winkler | `JaroWinkler` | `jaro_winkler`, `jaro` |
| Strcmp95 | `StrCmp95` | `strcmp95` |
| Needleman-Wunsch | `NeedlemanWunsch` | `needleman_wunsch` |
| Gotoh | `Gotoh` | `gotoh` |
| Smith-Waterman | `SmithWaterman` | `smith_waterman` |
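To make the edit-based family concrete, here is a minimal pure-Python Levenshtein distance. This is an illustrative sketch only, not textdistance's actual implementation:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `a` into `b` (classic dynamic programming,
    keeping only one rolling row of the DP table)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,          # deletion
                curr[j - 1] + 1,      # insertion
                prev[j - 1] + cost,   # substitution (free on a match)
            ))
        prev = curr
    return prev[-1]

print(levenshtein("test", "text"))  # 1
```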
Token-based algorithms:

| Algorithm | Class | Functions |
|---|---|---|
| Jaccard index | `Jaccard` | `jaccard` |
| Sørensen–Dice coefficient | `Sorensen` | `sorensen`, `sorensen_dice`, `dice` |
| Tversky index | `Tversky` | `tversky` |
| Overlap coefficient | `Overlap` | `overlap` |
| Tanimoto distance | `Tanimoto` | `tanimoto` |
| Cosine similarity | `Cosine` | `cosine` |
| Monge-Elkan | `MongeElkan` | `monge_elkan` |
| Bag distance | `Bag` | `bag` |
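For illustration, the token-based idea can be sketched over plain character sets. This is not the library's code (textdistance additionally supports q-grams and multisets via `qval` and `as_set`):

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard index over character sets: |A ∩ B| / |A ∪ B|."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def sorensen_dice(a: str, b: str) -> float:
    """Sørensen–Dice coefficient: 2·|A ∩ B| / (|A| + |B|)."""
    sa, sb = set(a), set(b)
    return 2 * len(sa & sb) / (len(sa) + len(sb))

# 'test' -> {t, e, s}; 'text' -> {t, e, x}; shared {t, e}
print(jaccard("test", "text"))                  # 0.5
print(round(sorensen_dice("test", "text"), 3))  # 0.667
```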
Sequence-based algorithms:

| Algorithm | Class | Functions |
|---|---|---|
| longest common subsequence similarity | `LCSSeq` | `lcsseq` |
| longest common substring similarity | `LCSStr` | `lcsstr` |
| Ratcliff-Obershelp similarity | `RatcliffObershelp` | `ratcliff_obershelp` |
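As a sketch of the sequence-based family, the longest-common-subsequence length can be computed with straightforward dynamic programming (illustrative only, not textdistance's implementation):

```python
def lcsseq_len(a: str, b: str) -> int:
    """Length of the longest common subsequence of `a` and `b`,
    using one rolling row of the standard DP table."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)   # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[-1]))
        prev = curr
    return prev[-1]

print(lcsseq_len("ab-cd", "abxxcd"))  # 4 ("abcd")
```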
Normalized compression distance with different compression algorithms.
Classic compression algorithms:
| Algorithm | Class | Function |
|---|---|---|
| Arithmetic coding | `ArithNCD` | `arith_ncd` |
| RLE | `RLENCD` | `rle_ncd` |
| BWT RLE | `BWTRLENCD` | `bwtrle_ncd` |
Normal compression algorithms:
| Algorithm | Class | Function |
|---|---|---|
| Square Root | `SqrtNCD` | `sqrt_ncd` |
| Entropy | `EntropyNCD` | `entropy_ncd` |
Work in progress algorithms that compare two strings as array of bits:
| Algorithm | Class | Function |
|---|---|---|
| BZ2 | `BZ2NCD` | `bz2_ncd` |
| LZMA | `LZMANCD` | `lzma_ncd` |
| ZLib | `ZLIBNCD` | `zlib_ncd` |
See blog post for more details about NCD.
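The NCD idea can be sketched with `zlib` from the standard library. The formula is NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. This is an illustrative sketch, not the library's `ZLIBNCD`:

```python
import zlib

def ncd_zlib(x: bytes, y: bytes) -> float:
    """Normalized compression distance using zlib as the compressor C:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Similar inputs compress well together, so the distance is smaller.
a = b"hello world" * 20
b = b"completely different text" * 20
print(ncd_zlib(a, a))  # near 0
print(ncd_zlib(a, b))  # noticeably larger
```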
Phonetic algorithms:

| Algorithm | Class | Functions |
|---|---|---|
| MRA | `MRA` | `mra` |
| Editex | `Editex` | `editex` |
Simple algorithms:

| Algorithm | Class | Functions |
|---|---|---|
| Prefix similarity | `Prefix` | `prefix` |
| Postfix similarity | `Postfix` | `postfix` |
| Length distance | `Length` | `length` |
| Identity similarity | `Identity` | `identity` |
| Matrix similarity | `Matrix` | `matrix` |
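Two of the simple algorithms are easy to sketch in a few lines. The helpers below are hypothetical illustrations, not textdistance's API:

```python
import os

def prefix_similarity(a: str, b: str) -> int:
    """Length of the common prefix of the two strings."""
    return len(os.path.commonprefix([a, b]))

def length_distance(a: str, b: str) -> int:
    """Absolute difference of the two lengths."""
    return abs(len(a) - len(b))

print(prefix_similarity("test", "text"))   # 2 ("te")
print(length_distance("test", "testit"))   # 2
```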
Only pure Python implementation:

```bash
pip install textdistance
```

With extra libraries for maximum speed:

```bash
pip install "textdistance[extras]"
```

With all libraries (required for benchmarking and testing):

```bash
pip install "textdistance[benchmark]"
```

With algorithm-specific extras:

```bash
pip install "textdistance[Hamming]"
```

Algorithms with available extras: `DamerauLevenshtein`, `Hamming`, `Jaro`, `JaroWinkler`, `Levenshtein`.

Via pip:

```bash
pip install -e git+https://github.com/life4/textdistance.git#egg=textdistance
```

Or clone the repo and install with some extras:

```bash
git clone https://github.com/life4/textdistance.git
pip install -e ".[benchmark]"
```
All algorithms have 2 interfaces:

1. Class with algorithm-specific params for customizing.
2. Class instance with default params for quick and simple usage.

All algorithms have some common methods:

- `.distance(*sequences)`: calculate distance between sequences.
- `.similarity(*sequences)`: calculate similarity for sequences.
- `.maximum(*sequences)`: maximum possible value for distance and similarity. For any sequences: `distance + similarity == maximum`.
- `.normalized_distance(*sequences)`: normalized distance between sequences. The return value is a float between 0 and 1, where 0 means equal and 1 means totally different.
- `.normalized_similarity(*sequences)`: normalized similarity for sequences. The return value is a float between 0 and 1, where 0 means totally different and 1 means equal.

Most common init arguments:

- `qval`: q-value for splitting sequences into q-grams. Possible values:
  - `1` (default): compare sequences by chars.
  - `2` or more: transform sequences into q-grams.
  - `None`: split sequences by words.
- `as_set`: for token-based algorithms:
  - `True`: `t` and `ttt` are equal.
  - `False` (default): `t` and `ttt` are different.

For example, Hamming distance:
```python
import textdistance

textdistance.hamming('test', 'text')
# 1

textdistance.hamming.distance('test', 'text')
# 1

textdistance.hamming.similarity('test', 'text')
# 3

textdistance.hamming.normalized_distance('test', 'text')
# 0.25

textdistance.hamming.normalized_similarity('test', 'text')
# 0.75

textdistance.Hamming(qval=2).distance('test', 'text')
# 2
```
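The `qval` behavior can be illustrated with a small q-gram splitter; `to_tokens` below is a hypothetical helper for illustration, not part of textdistance:

```python
def to_tokens(s: str, qval):
    """Split a string into comparison units the way qval implies:
    qval=1 compares by chars, qval>=2 uses q-grams, qval=None uses words."""
    if qval is None:
        return s.split()
    if qval == 1:
        return list(s)
    return [s[i:i + qval] for i in range(len(s) - qval + 1)]

print(to_tokens("test", 1))      # ['t', 'e', 's', 't']
print(to_tokens("test", 2))      # ['te', 'es', 'st']
print(to_tokens("a b c", None))  # ['a', 'b', 'c']
```

With `qval=2`, `'test'` and `'text'` become `['te', 'es', 'st']` and `['te', 'ex', 'xt']`, which explains the Hamming distance of 2 above.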
Any other algorithm has the same interface.
A few articles with examples of how to use textdistance in the real world:
For the main algorithms, textdistance tries to call known external libraries (fastest first) if they are available (installed on your system) and applicable (the implementation can compare the given type of sequences). Install textdistance with extras for this feature.

You can disable this by passing the `external=False` argument on init:
```python
import textdistance

hamming = textdistance.Hamming(external=False)
hamming('text', 'testit')
# 3
```
Supported libraries:
Algorithms:
Without extras installation:
| algorithm | library | time |
|---|---|---|
| DamerauLevenshtein | rapidfuzz | 0.00312 |
| DamerauLevenshtein | jellyfish | 0.00591 |
| DamerauLevenshtein | pyxdameraulevenshtein | 0.03335 |
| DamerauLevenshtein | textdistance | 0.83524 |
| Hamming | Levenshtein | 0.00038 |
| Hamming | rapidfuzz | 0.00044 |
| Hamming | jellyfish | 0.00091 |
| Hamming | distance | 0.00812 |
| Hamming | textdistance | 0.03531 |
| Jaro | rapidfuzz | 0.00092 |
| Jaro | jellyfish | 0.00191 |
| Jaro | textdistance | 0.07365 |
| JaroWinkler | rapidfuzz | 0.00094 |
| JaroWinkler | jellyfish | 0.00195 |
| JaroWinkler | textdistance | 0.07501 |
| Levenshtein | rapidfuzz | 0.00099 |
| Levenshtein | Levenshtein | 0.00122 |
| Levenshtein | jellyfish | 0.00254 |
| Levenshtein | pylev | 0.15688 |
| Levenshtein | distance | 0.28669 |
| Levenshtein | textdistance | 0.53902 |
Total: 24 libs.
Yes, the pure-Python implementation is that slow. Use TextDistance in production only with extras.

TextDistance uses these benchmark results for optimization and tries to call the fastest external library first (if possible).
You can run the benchmark manually on your system:

```bash
pip install "textdistance[benchmark]"
python3 -m textdistance.benchmark
```

TextDistance will show a benchmark results table for your system and save library priorities into a `libraries.json` file in TextDistance's folder. This file will be used by textdistance to call the fastest algorithm implementation. A default `libraries.json` is already included in the package.
All you need is task. See Taskfile.yml for the list of available commands. For example, to run tests including third-party libraries usage, execute `task pytest-external:run`.
PRs are welcome!

If you like textdistance, star it on GitHub. More users, more contributions, more amazing features. Thank you ❤️