Returns JSON aggregating the results of different string-comparison algorithms.
This makes it easier to compare the algorithms' results and choose the most appropriate one depending on the objective.
- Cosine
- Levenshtein distance
- Levenshtein
- Trigram
- Jaro Winkler
- Jaro
- Hamming
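To make two of the listed metrics concrete, here is a minimal pure-Ruby sketch of the Hamming and Levenshtein distances. It is not the gem's implementation (the gem delegates to the libraries credited at the end of this README), and the method names `hamming_distance` and `levenshtein_distance` are illustrative only. The gem may also treat edge cases such as strings of unequal length differently.

```ruby
# Illustrative sketch only, not the gem's own code.

# Hamming distance: number of positions at which two equal-length strings differ.
def hamming_distance(a, b)
  raise ArgumentError, 'strings must have the same length' unless a.length == b.length

  a.each_char.zip(b.each_char).count { |x, y| x != y }
end

# Levenshtein distance: minimum number of single-character insertions,
# deletions, or substitutions needed to turn a into b.
def levenshtein_distance(a, b)
  # previous[j] holds the distance between the processed prefix of a and b[0, j].
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost].min
    end
    previous = current
  end
  previous.last
end

hamming_distance('foo', 'bar')            # => 3
levenshtein_distance('foo', 'bar')        # => 3
levenshtein_distance('kitten', 'sitting') # => 3
```

Both are distances, so lower values mean more similar strings.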
First, add this line to your Gemfile:
gem 'string_similarity_comparator'
Then, run
$ bundle
Or install it yourself as:
$ gem install 'string_similarity_comparator'
You can test it in console right away.
>> require 'string_similarity_comparator'
true
>> StringSimilarityComparator::Pool.new('foo', 'bar').calculate
{
:cosine => 0.0,
:levenshtein => 0.333,
:levenshtein_distance => 3,
:trigram => 0.0,
:jaro_winkler => 0.0,
:jaro => 0.0,
:hamming => 3
}
To compare two words, 'foo' and 'bar' for example, call
>> StringSimilarityComparator::Pool.new('foo', 'bar').calculate
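Since `calculate` returns one hash per pair of strings, a common next step is to score several candidates against one target and sort them by a single metric. The sketch below assumes a hypothetical helper, `rank_by`, which is not part of the gem. It takes a block so the scoring call stays explicit, and it sorts in descending order, which suits similarity scores such as `:jaro_winkler` but not the distance keys.

```ruby
# Hypothetical helper (not part of the gem): rank candidates against a target
# by one key of the result hash. The block must return a hash shaped like the
# output of Pool#calculate shown above.
def rank_by(metric, target, candidates)
  candidates
    .map { |candidate| [candidate, yield(target, candidate).fetch(metric)] }
    .sort_by { |_, score| -score } # highest score first
end

# Usage with the gem (assumes it is installed and required):
#
#   rank_by(:jaro_winkler, 'foo', %w[bar food fool]) do |a, b|
#     StringSimilarityComparator::Pool.new(a, b).calculate
#   end
```

The result is an array of `[candidate, score]` pairs, best match first.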
$ git clone [email protected]:coralieco/string_similarity_comparator.git
Then
$ cd string_similarity_comparator
$ bundle
$ ruby lib/string_similarity_comparator/app.rb
Open your browser, usually at localhost:4567.
Then use one of the endpoints:
- with the form: http://localhost:4567/
- with the URL directly: http://localhost:4567/api/v1/string_similarity?string_a=foo&string_b=bar
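The API endpoint can also be called from a script. The following is a minimal sketch using only Ruby's standard library. The helper names `similarity_uri` and `fetch_similarity` are hypothetical, the base URL assumes the local default shown above, and the response body is assumed to be a JSON object.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build the API URL, URL-encoding both strings.
# The default base is an assumption matching the local server above.
def similarity_uri(string_a, string_b, base: 'http://localhost:4567')
  uri = URI.join(base, '/api/v1/string_similarity')
  uri.query = URI.encode_www_form(string_a: string_a, string_b: string_b)
  uri
end

# Fetch and parse the response; assumes the body is a JSON object.
def fetch_similarity(string_a, string_b)
  response = Net::HTTP.get_response(similarity_uri(string_a, string_b))
  raise "unexpected status #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  JSON.parse(response.body)
end

# With the server running:
#   fetch_similarity('foo', 'bar')
```

Building the query with `URI.encode_www_form` keeps strings containing spaces or `&` from breaking the URL.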
I wrote an article about what we do at Appaloosa Store to customize application recommendations using string similarity algorithms: String Similarity Algorithms Compared
It compares some of these algorithms and explains why I chose the Jaro-Winkler algorithm for the Appaloosa use case.
See the LICENSE file in the source.
This gem builds on the work of the following gems:
JaroWinkler: Copyright (c) 2014 Jian Weihang
amatch: Copyright [2017] [Florian Frank]
string-similarity: Copyright (c) 2015 Manuel Hutter
Trigram: Copyright (c) 2014 milk1000cc