[Clayton Brown - Emmie Osawa]
Is there an approved standard library/function/algorithm for comparing
two similar strings and returning a percentage match?
See the std difflib module in Python 2.1; the guts of that appeared in
earlier Python releases as part of the ndiff.py utility; it implements an
algorithm related to Ratcliff and Obershelp's "gestalt" pattern matching.
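A minimal sketch of getting a percentage out of difflib, written for a current Python 3 rather than 2.1. SequenceMatcher and its ratio() method are the real difflib API. The wrapper name percent_match is hypothetical, and scaling the ratio by 100 is just one way to read it as a "percentage match".

```python
import difflib

def percent_match(a: str, b: str) -> float:
    """Return difflib's similarity ratio for a and b as a percentage (hypothetical helper)."""
    # ratio() is 2*M/T, where M is the number of matched characters and
    # T is the total number of characters in both strings.
    return 100.0 * difflib.SequenceMatcher(None, a, b).ratio()

print(percent_match("abcd", "bcde"))  # 75.0
```

Here "bcd" is matched (M = 3) out of T = 8 characters, so the ratio is 6/8.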
I am aware of soundEx.py / .c which is based on the grammar and
phonetics of words, but from what I have read it seems to be flawed,
and was therefore removed from the Python standard library.
It was removed more because Soundex isn't well-defined (even Knuth's
definition changed between editions 2 and 3 of TAoCP volume 3), and it was a
PITA to keep arguing about which was "the right" version. The version we
had didn't correspond to any known published version anyway. In any case,
Soundex was specifically designed to help match Anglo and some West European
surnames, and uses beyond that were always ill-advised.
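To show the kind of detail that varies between definitions, here is a sketch of one commonly cited variant, usually called "American Soundex". In it, H and W do not separate two letters with the same code, while vowels do. Treat this as an assumption for illustration: it is not the version the old module implemented, and other published definitions handle details like this differently.

```python
# One common Soundex variant ("American Soundex"); other definitions differ.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(name: str) -> str:
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = [first]                 # the first letter is kept as-is
    prev = CODES.get(first, "")      # its code still suppresses a repeat
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code:
            if code != prev:         # collapse adjacent identical codes
                result.append(code)
            prev = code
        elif ch not in "HW":
            prev = ""                # vowels (and Y) break a run; H and W do not
    return ("".join(result) + "000")[:4]   # pad or truncate to 4 characters

print(soundex("Robert"), soundex("Rupert"), soundex("Ashcraft"))  # R163 R163 A261
```

Under a variant where H counted as a separator, "Ashcraft" would come out differently, which is exactly the kind of disagreement described above.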
I have noticed similar techniques in other languages, based on shift
matrices, that work out the minimum number of changes needed to
transform string A into string B.
There are dozens of possibilities.
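The "minimum number of changes" idea is usually the Levenshtein edit distance: the fewest single-character insertions, deletions, and substitutions that turn one string into the other. Below is a sketch of the standard dynamic-programming computation, keeping only one row of the table at a time. Turning the distance into a percentage by dividing by the longer string's length is one common convention among several, and both function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    # prev[j] holds the edit distance between the current prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]                              # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,        # delete ca
                           cur[j - 1] + 1,     # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def percent_similar(a: str, b: str) -> float:
    # Assumed normalization: 100% minus edits as a share of the longer string.
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)

print(levenshtein("kitten", "sitting"))  # 3
```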
I am looking more for one that perhaps ignores spaces and a list of
common words (the, a, and, mr, mrs, ...), with a weighted scoring
mechanism...
In that case, there are thousands of possibilities <0.5 wink>.
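As one of those thousands, here is a sketch that drops a list of common words, ignores spaces, and blends two scores with weights. Everything here beyond difflib.SequenceMatcher is an assumption made up for illustration: the STOP_WORDS list, the 0.7/0.3 weights, the choice of a word-overlap (Jaccard) score as the second component, and the function names.

```python
import difflib

# Hypothetical list of common words to ignore; tune for your data.
STOP_WORDS = {"the", "a", "an", "and", "mr", "mrs", "ms", "dr"}

def normalize_tokens(s: str) -> list:
    # Lowercase, treat "." and "," as spaces, split, and drop common words.
    words = s.lower().replace(".", " ").replace(",", " ").split()
    return [w for w in words if w not in STOP_WORDS]

def weighted_match(a: str, b: str, char_weight=0.7, token_weight=0.3) -> float:
    ta, tb = normalize_tokens(a), normalize_tokens(b)
    # Character-level similarity with spaces removed.
    char_score = difflib.SequenceMatcher(None, "".join(ta), "".join(tb)).ratio()
    # Word-level overlap: shared words / all distinct words (Jaccard index).
    sa, sb = set(ta), set(tb)
    token_score = len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0
    return 100.0 * (char_weight * char_score + token_weight * token_score)

print(weighted_match("Mr. John Smith", "john smith"))
```

The weights are arbitrary: raising token_weight favors strings that share whole words, while raising char_weight tolerates misspellings within words.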
difflib-offers-one-ly y'rs - tim