Word2003: Locating/finding foreign (Greek) text

ChrisGreaves
PlutoniumLounger
Posts: 15587
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Post by ChrisGreaves »

I have a 415,000-word text file, https://www.gutenberg.org/files/27942/2 ... 7942-h.htm, which is peppered with Greek words and phrases. I need to manipulate these Greek words and phrases, perhaps by formatting them or by extracting their host paragraphs, as preparation for learning how to pronounce each Greek term.

The attached text file is my crude VBA solution:-
(a) I fabricate a function “strGenerateGreekAphabet” which delivers to me the Greek alphabet in a form which I can use in VBA
(b) Time is not a problem, so with “FindGreek” I loop through each of the 2,662 paragraphs, testing each paragraph with “blnCharacterInText” to see whether it contains at least one of the 52 characters in my Greek character string.

The functions appear to work, and the results will be eyeballed for each Greek word, but I worry that there may be words whose appearance is similar to that of English.
For example, “ABATE” and “MONEY” are English words that can be composed from what appears in MSWord as Greek symbols.

Is this a valid fear, or does my use of CHRW() for Unicode decimal values bypass this?

I suspect that there is nothing special about Greek; this approach should work for any text - English or not - that uses a non-Roman alphabet.

Thanks
Chris
There's nothing heavier than an empty water bottle

Post by ChrisGreaves »

I should add that the VBA code is not perfect, as shown by the unformatted characters within Greek words. Still it gives me a good idea of where to look.
Untitled2.png

HansV
Administrator
Posts: 78412
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Post by HansV »

The A (upper case Alpha) in the Greek alphabet is not the same character as the A in the Latin/Western alphabet.
You should add the accented Greek letters to your list, or use a range: Greek and Coptic is Hex 0370 to Hex 03FF. There is also a Greek Extended range but you probably won't need that.
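To illustrate the point about look-alike letters, here is a minimal sketch (it is not the code from the attached file). IsGreekChar and DemoLookAlikes are hypothetical names; the sketch assumes the two Unicode blocks Greek and Coptic (decimal 880 to 1023) and Greek Extended (decimal 7936 to 8191). Because the test is on the character code rather than on the glyph, a Latin "A" can never be mistaken for a Greek Alpha.

```vba
Option Explicit

' Hypothetical helper: True if the first character of ch lies in
' Greek and Coptic (880-1023) or Greek Extended (7936-8191).
Function IsGreekChar(ByVal ch As String) As Boolean
    Dim code As Long
    If Len(ch) = 0 Then Exit Function      ' empty string: not Greek
    ' AscW returns a signed Integer; the mask maps it to 0..65535.
    code = AscW(ch) And &HFFFF&
    IsGreekChar = (code >= 880 And code <= 1023) Or _
                  (code >= 7936 And code <= 8191)
End Function

Sub DemoLookAlikes()
    ' Latin capital A is code 65; Greek capital Alpha is code 913.
    Debug.Print AscW("A"), IsGreekChar("A")
    Debug.Print AscW(ChrW(913)), IsGreekChar(ChrW(913))
End Sub
```

Running DemoLookAlikes in the Immediate window shows the two different codes, so "ABATE" typed in Latin letters would not be flagged.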
Best wishes,
Hans

SpeakEasy
4StarLounger
Posts: 544
Joined: 27 Jun 2021, 10:46

Post by SpeakEasy »

>a Greek Extended range but you probably won't need that

Chris has "unformatted characters within Greek words" in the example he shows, characters that are from that extended set ...

Chris, here's some code that may prove a little faster, even though I know you say time isn't an issue. It includes two routines: one that marks each individual character, as per your image above, and one that highlights each Greek-containing paragraph, as per the code you attached. If you run the Doit subroutine, both routines run, which makes it easy to spot the paragraphs and highlights the Greek words within them.

Code: Select all

Option Explicit

Sub Doit()
    ' Run both passes: mark the Greek characters, then flag their paragraphs.
    HighlightGreekChar
    HighlightGreekPara
End Sub

Sub HighlightGreekChar()
    ' Apply the character style csGreek to every character in the
    ' Greek and Coptic (880-1023) or Greek Extended (7936-8191) ranges.
    Dim myRange As Range

    Set myRange = ActiveDocument.Content

    With myRange.Find
        .Text = "([" & ChrW(880) & "-" & ChrW(1023) & ChrW(7936) & "-" & ChrW(8191) & "])"
        .Replacement.Text = "\1" ' \1 = the text matched by the first ( ) group
        .Replacement.Style = ActiveDocument.Styles("csGreek")
        .Wrap = wdFindContinue
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
    End With
End Sub

Sub HighlightGreekPara()
    ' Apply the paragraph style psGreek to every paragraph that
    ' contains at least one Greek character.
    Dim myRange As Range

    Set myRange = ActiveDocument.Content

    Do
        With myRange.Find
            .Text = "([" & ChrW(880) & "-" & ChrW(1023) & ChrW(7936) & "-" & ChrW(8191) & "])"
            .MatchWildcards = True
            .Execute
            If .Found = True Then
                .Parent.Expand Unit:=wdParagraph
                .Parent.Style = "psGreek" ' Applies a paragraph style consisting of a red font
                .Parent.Start = .Parent.End ' Resume the search after this paragraph
            End If
        End With
    Loop Until myRange.Find.Found = False
End Sub
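
Both routines assume that the styles csGreek and psGreek already exist in the document; if they do not, looking up the style raises a run-time error. One possible way to create them first is sketched here. The names EnsureGreekStyles and StyleExists and the formatting chosen (bold blue for csGreek, red for psGreek) are assumptions for illustration only.

```vba
Option Explicit

' Hypothetical helper: True if the document already has a style of this name.
Private Function StyleExists(ByVal doc As Document, ByVal styleName As String) As Boolean
    Dim s As Style
    On Error Resume Next
    Set s = doc.Styles(styleName)   ' fails silently if the style is missing
    On Error GoTo 0
    StyleExists = Not s Is Nothing
End Function

Sub EnsureGreekStyles()
    ' Create csGreek (character style) and psGreek (paragraph style)
    ' only when they are missing. The formatting is an assumption.
    With ActiveDocument
        If Not StyleExists(ActiveDocument, "csGreek") Then
            With .Styles.Add(Name:="csGreek", Type:=wdStyleTypeCharacter)
                .Font.Bold = True
                .Font.Color = wdColorBlue
            End With
        End If
        If Not StyleExists(ActiveDocument, "psGreek") Then
            With .Styles.Add(Name:="psGreek", Type:=wdStyleTypeParagraph)
                .Font.Color = wdColorRed
            End With
        End If
    End With
End Sub
```

Calling EnsureGreekStyles once before Doit should be enough.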
Below is an example of the output using the specific styles (psGreek and csGreek) I set up for this. Your output may differ ...

Post by ChrisGreaves »

SpeakEasy wrote:
19 Feb 2022, 20:17
... making it easy to spot the paragraphs, and then highlighting the Greek words in that paragraph. ...
Steve, thanks for this :clapping: :chocciebar: and I especially like building both cs and ps in the output. Today I am dealing with "footnotes" and "gaps". My crude analysis of Greek was sufficiently productive to make me think about the web site https://play.ht/pricing/ (offers USD $14.25 / month for 240,000 words). Since JSMill is 415,000 words, mainly English, I might speak all the Greek in one session before continuing with the bulk of the English text. I can record each Greek "chunk", and then paste the audio in Audacity when I arrive at that particular paragraph (or better yet, provide it to an editor to paste in).

Your better display should make it much easier for me to do all the translations in one session over a few days only.
More later
Chris Thanks again for the superlative code.

Post by ChrisGreaves »

SpeakEasy wrote:
19 Feb 2022, 20:17

Code: Select all

    With myRange.Find
(snip!)
        .Replacement.Text = "\1"
(snip!)
    End With '
OK, Mike, I'll bite (grin)

Please and Thankyou, what is the "\1"? And are there any more like it?
I am used to (Word2003) Edit/Replace using the FindWhat symbol ^&, but not to any such switches or options in VBA.

It works (wonderfully, and I am rediscovering marching-red-ants).
I understand the range of characters [...] and that you have found Greek character sets in Unicode.

But I am puzzled about \1.
Thanks
Chris
Last edited by ChrisGreaves on 21 Feb 2022, 18:47, edited 1 time in total.

Post by HansV »

If you find/replace using wildcards, text in the 'Find what' box enclosed in parentheses ( ) defines a sequence.
In the 'Replace with' box, \1 refers to the first sequence, \2 to the second one etc.
Example:
The 'Find what' box contains

Chris ([0-9]{3})

This will search for 'Chris ' followed by three digits: 'Chris 123' or 'Chris 666' etc.
The 'Replace with' box contains

Greaves \1

This will cause Word to replace 'Chris ' followed by three digits with 'Greaves ' followed by the same three digits. So 'Chris 123' becomes 'Greaves 123' and 'Chris 666' becomes 'Greaves 666'.
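
For anyone who prefers to run the same replacement from VBA, a minimal sketch follows. The routine name ReplaceChrisWithGreaves is hypothetical; the Find properties used are the standard ones of the Word object model, and the pattern is the one from the example above.

```vba
Option Explicit

Sub ReplaceChrisWithGreaves()
    ' Wildcard replace: ( ) in .Text defines a group, \1 in
    ' .Replacement.Text puts the text matched by that group back.
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "Chris ([0-9]{3})"
        .Replacement.Text = "Greaves \1"
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindStop
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

With two groups, for example 'Find what' (Chris) (Greaves) and 'Replace with' \2 \1, the two words would be swapped.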

See Finding and replacing characters using wildcards
Best wishes,
Hans

Post by SpeakEasy »

Hans has pretty much covered it.

And I'm Mike, not Steve ...

Post by ChrisGreaves »

SpeakEasy wrote:
21 Feb 2022, 18:16
Hans has pretty much covered it.
And I'm Mike, not Steve ...
Mike, my apologies. There is a "Steve" in another phpBB site who is almost as helpful as you. That is how I got confused.
Next time that you are in Bonavista I will buy you a coffee. :coffeetime:

I have followed Hans's comments, and am now looking in my current text to put the logic to use, to make sure that I have understood it.
Cheers
Chris

Post by ChrisGreaves »

SpeakEasy wrote:
19 Feb 2022, 20:17
... one that marks each individual character, ...
SpeakEasy, once each foreign character is formatted in a character style, one can patrol the document text looking for all Words that exhibit that character style and write that collection of Words to a new document. With the duplicates eliminated by a little macro ("ParagraphDuplicateAllKeyEntireDescending"), one can then convert the set of (foreign word) paragraphs into a one-column table, duplicate the column, and save the document. Back in the main document, Index, AutoMark, nominating the new document as a Concordance Table, then produces an Index of foreign words within the document.
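
The first part of that workflow (collecting the styled words into a new document without duplicates) might look roughly like the sketch below. It is not the macro named above: CollectGreekWords is a hypothetical name, it assumes the character style csGreek has already been applied, and it treats each contiguous run of csGreek text as one "word". Building the table and running AutoMark are left out.

```vba
Option Explicit

Sub CollectGreekWords()
    ' Find every run of text in the character style csGreek and
    ' write each distinct run, one per paragraph, to a new document.
    Dim src As Document, dst As Document
    Dim rng As Range
    Dim seen As Object          ' Scripting.Dictionary, used as a set
    Dim w As String
    Dim key As Variant

    Set src = ActiveDocument
    Set seen = CreateObject("Scripting.Dictionary")
    Set rng = src.Content

    With rng.Find
        .ClearFormatting
        .Text = ""              ' search on formatting only
        .Style = "csGreek"
        .Format = True
        .MatchWildcards = False
        .Forward = True
        .Wrap = wdFindStop
    End With

    Do While rng.Find.Execute
        w = Trim$(rng.Text)
        If Len(w) > 0 Then
            If Not seen.Exists(w) Then seen.Add w, True
        End If
        rng.Collapse wdCollapseEnd   ' continue after this run
    Loop

    Set dst = Documents.Add
    For Each key In seen.Keys
        dst.Content.InsertAfter CStr(key) & vbCr
    Next key
End Sub
```

Using a dictionary as a set removes the duplicates during collection, so no separate de-duplication pass is needed in this sketch.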

An updated form of the 415,000 word tome, augmented with an index for every Greek word, is now under inspection by students of the UWA Greek Students Association (https://www.uwastudentguild.com/clubs/g ... ssociation), so, once again, my thanks to you. :clapping:

Of course, one does not need to eliminate duplicate words, but the Greek Students in Western Australia don't need to know that, do they?
Cheers
Chris

Post by SpeakEasy »

:grin:

Post by ChrisGreaves »

ChrisGreaves wrote:
19 Feb 2022, 15:35
I suspect that there is nothing special about Greek; this approach should work for any text - English or not - that uses a non-Roman alphabet.
Distributed Proofreaders has a set of tables that would provide fodder for a generic non-English character search.

I imagine that one could detect "Basic Latin" characters (and hence Words, Sentences, Paragraphs, Documents) by searching using the set of Unicode values in the Basic Latin set after reducing the set by the set of Unicode values in the English set.
In the case of Basic Latin one might miss the occasional short word (a? an?), but the sentences and paragraphs, in which a short word is embedded, should be located.
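
A generic version of the idea could take the code range as a parameter, as in this minimal sketch. HasCharInRange and ListParagraphsInRange are hypothetical names, and the Cyrillic block (decimal 1024 to 1279) is used only as an example range. Note that simply testing for "anything above 127" would also flag curly quotes and dashes, which typeset English uses freely, so a specific block is usually the safer test.

```vba
Option Explicit

' Hypothetical helper: True if s contains at least one character whose
' code lies between lo and hi (inclusive, decimal, 0 to 65535).
Function HasCharInRange(ByVal s As String, ByVal lo As Long, ByVal hi As Long) As Boolean
    Dim i As Long
    Dim code As Long
    For i = 1 To Len(s)
        ' AscW returns a signed Integer; the mask maps it to 0..65535.
        code = AscW(Mid$(s, i, 1)) And &HFFFF&
        If code >= lo And code <= hi Then
            HasCharInRange = True
            Exit Function
        End If
    Next i
End Function

Sub ListParagraphsInRange()
    ' Example: print the start of every paragraph that contains a
    ' character from the Cyrillic block (1024-1279).
    Dim para As Paragraph
    For Each para In ActiveDocument.Paragraphs
        If HasCharInRange(para.Range.Text, 1024, 1279) Then
            Debug.Print Left$(para.Range.Text, 60)
        End If
    Next para
End Sub
```

Passing 880 and 1023 instead would give the Greek and Coptic test discussed earlier in the thread.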
Cheers
Chris