My second thought was "Google", and this example looked promising but failed, sort of (see below).
There are numerous web pages which promise an on-line test (good for high school students who have a homework deadline), or dedicated sites ("Ancestry" or "Genealogy" dot anything) which focus on Names of People or Places and, I suspect, lean heavily on a variant of Soundex.
Below:
My acid test is for Modern English words that "Begin with C pronounced “ess”" (from 38m0s to 39m23s). Examples include "Civil", "Cease" and "Cymbal". (Yes, and pronounCement too, although it does not begin with "C", and anyway I have a better rule, not shown here, for classifying that sort of word.)
Code:
    Sub test()
        ' Probe: the target word with its leading "C" replaced by a leading "S".
        ' If the sounds-like search lands on the leading-C string, the word passes.
        With Selection.Find
            .ClearFormatting
            .Text = "Symbal"
            .MatchFuzzy = False
            .MatchSoundsLike = True
            .Execute Format:=False, Forward:=True, Wrap:=wdFindContinue
        End With
    End Sub
I have a document of notes that includes various examples, such as "Civil", "Cease" and "Cymbal". I loaded the test VBA code with the word "civil" and changed that leading "c" to an "s".
My reasoning: I am only testing words that begin with "c". If I take such a word, say "civil", and a sounds-like search for "sivil" finds it, then "civil" sounds like itself with the "c" replaced by "s", and that makes it a satisfactory target. (The pseudo code says: look at all words and, for those that start with "C", search for that string after replacing the "c" with "s" in the FindWhat test.)
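The pseudo code above could be sketched in VBA roughly as follows. This is only an illustration under assumptions: SoftCProbe and ListSoftCCandidates are hypothetical names, and whether a sounds-like Find behaves sensibly when restricted to a one-word range has not been verified here.

```vba
' Hypothetical helper: build the probe string for a word that starts with "c".
' Returns "" for any word that should not be tested.
Function SoftCProbe(ByVal txt As String) As String
    txt = Trim$(txt)
    If Len(txt) > 1 And LCase$(Left$(txt, 1)) = "c" Then
        SoftCProbe = "s" & Mid$(txt, 2)          ' "civil" -> "sivil"
    Else
        SoftCProbe = ""
    End If
End Function

' Sketch: walk every word in the active document and report those that a
' sounds-like search finds again when the leading "c" is swapped for "s".
Sub ListSoftCCandidates()
    Dim w As Range, hit As Range
    Dim probe As String

    For Each w In ActiveDocument.Words           ' items include trailing spaces
        probe = SoftCProbe(w.Text)
        If Len(probe) > 0 Then
            Set hit = w.Duplicate                ' search only inside this word
            With hit.Find
                .ClearFormatting
                .Text = probe
                .MatchWildcards = False
                .MatchSoundsLike = True
                .Forward = True
                .Wrap = wdFindStop               ' do not leave the word's range
                If .Execute Then
                    Debug.Print Trim$(w.Text) & " sounds like " & probe
                End If
            End With
        End If
    Next w
End Sub
```

Run ListSoftCCandidates from the VBA editor with the notes document active; any matches appear in the Immediate window.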
My Sub test above shows me successfully determining that the word "cymbal" sounds like "symbal" (by locating "cymbal" in my document), and so it satisfies my requirements.
I also had the strings “Christopher” and “Church” in my document. The Sub test correctly told me that "Christopher" failed the test, but it also reported that "Church" satisfied the test: sadly, it matched "Shurch" to "Church".
Looking for a ç (c-cedilla) probably won't help much (as in comparing “cymbal” and “la cymbale”): many Modern English words do not use modified letters such as ç, and even if the French word for a cymbal did have a cedilla, how is my program supposed to know that?
My current status is that the code from the docs.microsoft.com web page is good but not perfect. It seems to work better than Soundex, which was designed for indexing names by sound as pronounced in English, as distinct from classifying Modern English words in general. Soundex is very good at equating Greaves with Greeves with Grieves, or Witt with Wit with Whytte, when you are asking about your flight reservation. It is not so good with "gravitational" or perhaps "generous".
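For comparison, here is a minimal sketch of the classic American Soundex coding in VBA. SoundexCode is a hypothetical helper written for this illustration (it is not what Word's sounds-like search uses, as far as the post shows), and it assumes plain A-Z input.

```vba
' Minimal American Soundex sketch: first letter plus three digits.
' Characters outside A-Z are skipped.
Function SoundexCode(ByVal s As String) As String
    Const CODES As String = "01230120022455012623010202"   ' digits for A..Z
    Dim i As Long, ch As String, code As String, prev As String
    Dim result As String

    s = UCase$(s)
    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        If Asc(ch) >= 65 And Asc(ch) <= 90 Then
            code = Mid$(CODES, Asc(ch) - 64, 1)
            If Len(result) = 0 Then
                result = ch                      ' keep the first letter itself
            ElseIf code <> "0" And code <> prev Then
                result = result & code           ' new consonant group
            End If
            ' H and W do not separate equal codes; vowels (code 0) do.
            If ch <> "H" And ch <> "W" Then prev = code
        End If
    Next i
    SoundexCode = Left$(result & "000", 4)       ' pad or truncate to 4 characters
End Function
```

With this sketch, "Greaves", "Greeves" and "Grieves" all code to G612, and "Witt", "Wit" and "Whytte" all code to W300. Because Soundex keeps the first letter as-is, "cymbal" (C514) and "symbal" (S514) get different codes, so plain Soundex cannot answer the C/S question directly.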
There are more sounds-like conditions than the C/S one I have shown here.
I can run some more tests and measure the accuracy of the docs.microsoft.com code, and then compare that benchmark against any VBA-like method suggested.
(later) This web site suggests that "c" before "i" or "e" is a worthwhile rule in French; that might help me in English to focus on successful matches.
(later still) Getting there! This is with a rule that looks for leading ce/ci/cy (attached Text file)
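A rule of that kind might look something like the sketch below. BeginsWithSoftC is a hypothetical name and this is not the code from the attached file; it only checks the first two letters and has not been tested against exceptions.

```vba
' Hypothetical helper: True when a word begins with "c" followed by e, i or y.
Function BeginsWithSoftC(ByVal txt As String) As Boolean
    Select Case LCase$(Left$(Trim$(txt), 2))
        Case "ce", "ci", "cy"
            BeginsWithSoftC = True
        Case Else
            BeginsWithSoftC = False
    End Select
End Function
```

On the examples in this post it accepts "Civil", "Cease" and "Cymbal" and rejects both "Christopher" and "Church", so it avoids the "Shurch" false match.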
Thanks
Chris