The C++ implementation of the improved fts_fuzzy_match algorithm by Forrest Smith described in the 2017 update to his Reverse Engineering Sublime Text's Fuzzy Match article, and published in the lib_fts project.
Table of Contents
fts_fuzzy_match.h
— Fuzzy Match v0.2.0 in C++.test.sh
— test launcher.test.cpp
— test code.test_results.txt
— generated test results.
The fts_fuzzy_match.h
file was taken from Forrest Smith's lib_fts repository, commit d541eb4, 2017-02-19.
The test.sh
and test.cpp
files were created and added by Tristano Ajmone.
For the original documentation, see ../../fuzzy_match.md
.
For the original v0.1.0 algorithm, see: ../../0.1.0/cpp/fts_fuzzy_match.h
.
After one year from the pubblication of Forrest's Reverse Engineering Sublime Text's Fuzzy Match article, Sublime Text author Jon Skinner posted on reddit a comment on Forrest article, confirming the soundness of the overall approach and also providing insights on how to improve the algorithm:
jskinner: Sublime Text author here. Nice writeup! The actual algorithm that Sublime Text uses is similar in principle, but the implementation is rather different in the name of speed.
There are a couple of things you may want to tweak for better quality matching: you likely want to search more exhaustively for a better match, e.g., searching for "lll" in the UE4 filename list matches "SVisualLoggerLogsList.h", but not as well as it should, as it doesn't pickup all the upper case Ls. If you don't mind sacrificing a little speed, then it's also nice to handle transpositions, so that "sta" can still match against "SpawnActorTimer", just with a lower score - Sublime Text 3 does this.
Skinner's feedback was a major breakthrough that prompted Forrest to update the fts_fuzzy_match C++ algorithm to v0.2.0 to include the "exhaustive search" suggested by Skinner, and to add the following update note to the original article:
This project has been updated based on feedback from Sublime Text's Jon Skinner. The algorithm now performs an exhaustive search to find all possible matches and returns the match with the highest score.
Consider the string "
SVisualLoggerLogsList.h
" and the search pattern "LLL
". There are four L's so they can be matched several different ways. The naive approach might match the first three L's. A higher scoring match would skip the first L and match the three CamelCase L's.String: SVisualLoggerLogsList.h Pattern: LLL Possible Matches (in bold): SVisualLoggerLogsList.h SVisualLoggerLogsList.h SVisualLoggerLogsList.h SVisualLoggerLogsList.h
The new exhaustive method finds all matches and returns the one with the highest score. Performing an exhaustive search is slower than finding just the first match. However the small decrease in speed is more than made up for by the increase in result quality.
To run the tests, open the Bash shell and type (GCC required):
./test.sh
Since this is the original implementation of v0.2.0 of the algorithm, the generated test_results.txt
file was copied to ../expected_results.txt
and it's being used as a reference to validate the results generated by ports of the v0.2.0 algorithm to other languages.
All files in this folder are in the public domain.
The fts_fuzzy_match.h
file was taken by Forrest Smith's lib_fts project, released under the following license:
This software is dual-licensed to the public domain and under the following license: you are granted a perpetual, irrevocable license to copy, modify, publish, and distribute this file as you see fit.
The test.sh
and test.cpp
test files were written by Tristano Ajmone and released into the public domain via the [CC0 license].