Extract formatted sentences from Word into Excel

AmadeusW
NewLounger
Posts: 11
Joined: 21 Jan 2020, 16:11

Extract formatted sentences from Word into Excel

Post by AmadeusW »

Hello everyone,

In the need of extracting (formatted) sentences from a word file into excel.
As an example I want to use the 'Welcome to Word' template file from Word.
And if all is going well, I want to extract also extra data information into excel, something like:
A = Sentence number
B = Original (formatted) sentence, with font size, color, like "This is a Sample of a formatted text."
C = Format of style: Heading, Normal, Title
D = Part of paragraph: Yes or No
E = Part of list: Yes or No
F = Part of numbering: Yes or No
G = Part of table

The definition of a sentence in VBA is difficult, I know. But want to try if it's possible. And how far I can go.


Many thanks.

User avatar
HansV
Administrator
Posts: 78568
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Extract formatted sentences from Word into Excel

Post by HansV »

Welcome to Eileen's Lounge!

That might not be easy, but I'll see if I can come up with something.
Best wishes,
Hans

User avatar
HansV
Administrator
Posts: 78568
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Extract formatted sentences from Word into Excel

Post by HansV »

Here is a first attempt. The macro should be run from Word. It is slow, and as you'll see, pasting formatted sentences also copies images. That makes the result hard to read.
I combined E and F into one column that shows the type of list.

Code: Select all

Sub SentencesToXL()
    Dim xlApp As Object
    Dim xlWbk As Object
    Dim xlWsh As Object
    Dim i As Long
    Dim s As Range
    Dim m As String
    On Error Resume Next
    Set xlApp = GetObject(Class:="Excel.Application")
    On Error GoTo 0
    If xlApp Is Nothing Then
        Set xlApp = CreateObject(Class:="Excel.Application")
    End If
    xlApp.ScreenUpdating = False
    Set xlWbk = xlApp.Workbooks.Add(-4167) ' xlWBatWorksheet
    Set xlWsh = xlWbk.Worksheets(1)
    For i = 1 To ActiveDocument.Sentences.Count
        Set s = ActiveDocument.Sentences(i)
        xlWsh.Range("A" & i).Value = i
        xlWsh.Range("B" & i).Select
        s.Copy
        xlWsh.Paste
        xlWsh.Range("C" & i).Value = s.ParagraphStyle
        xlWsh.Range("D" & i).Value = (Len(s) < Len(s.Paragraphs(1).Range))
        Select Case s.ListFormat.ListType
            Case wdListNoNumbering
                m = "No list"
            Case wdListListNumOnly
                m = "ListNum fields"
            Case wdListBullet
                m = "Bulleted list"
            Case wdListSimpleNumbering
                m = "Simple numbered list"
            Case wdListOutlineNumbering
                m = "Outlined list"
            Case wdListMixedNumbering
                m = "Mixed numbered list"
            Case wdListPictureBullet
                m = "Picture bulleted list"
        End Select
        xlWsh.Range("E" & i).Value = m
        xlWsh.Range("F" & i).Value = s.Information(wdWithInTable)
    Next i
    xlWsh.UsedRange.EntireColumn.AutoFit
    xlApp.Visible = True
    xlApp.ScreenUpdating = True
End Sub
Best wishes,
Hans

User avatar
macropod
4StarLounger
Posts: 508
Joined: 17 Dec 2010, 03:14

Re: Extract formatted sentences from Word into Excel

Post by macropod »

AmadeusW wrote: I want to extract also extra data information into excel, something like:
A = Sentence number
B = Original (formatted) sentence, with font size, color, like "This is a Sample of a formatted text."
C = Format of style: Heading, Normal, Title
You have a plethora of problems there already! As you seem to be aware, VBA has no idea what a grammatical sentence is. For example, consider:
Mr. Smith spent $1,234.56 at Dr. John's Grocery Store, to buy: 10.25kg of potatoes; 10kg of avocados; and 15.1kg of Mrs. Green's Mt. Pleasant macadamia nuts.
For you and me, that would count as one sentence; for VBA it counts as 5 sentences. This: https://gregmaxey.com/word_tip_pages/de ... ences.html" onclick="window.open(this.href);return false; might help with that.
Then there are your font an Style specifications. A given sentence (grammatical or VBA) could contain multiple font faces, point sizes, and Styles. Even in your own example, the sentence might contain three Styles (e.g. Normal, Strong, & another for the underlining).
Paul Edstein
[Fmr MS MVP - Word]

AmadeusW
NewLounger
Posts: 11
Joined: 21 Jan 2020, 16:11

Re: Extract formatted sentences from Word into Excel

Post by AmadeusW »

Many thanks, Hans,

I copied the code VBA in the 'welcom to word' Word file. I can see it copies a lot of information, then I received an error when executed it:
xlWsh.Paste

Did I forget something?

User avatar
HansV
Administrator
Posts: 78568
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Extract formatted sentences from Word into Excel

Post by HansV »

See if it works better if you remove or comment out the line

Code: Select all

    xlApp.ScreenUpdating = False
Best wishes,
Hans

AmadeusW
NewLounger
Posts: 11
Joined: 21 Jan 2020, 16:11

Re: Extract formatted sentences from Word into Excel

Post by AmadeusW »

The line is comment out, but the same error (stop) occured.

User avatar
HansV
Administrator
Posts: 78568
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Extract formatted sentences from Word into Excel

Post by HansV »

And what if you insert a line

Code: Select all

        DoEvents
immediately above the line

Code: Select all

        xlWsh.Paste
Best wishes,
Hans

AmadeusW
NewLounger
Posts: 11
Joined: 21 Jan 2020, 16:11

Re: Extract formatted sentences from Word into Excel

Post by AmadeusW »

The problem found.
It's not the code, but my clipboard. It can handle only 24 items. When I reduced the file to 10 sentences, it works.
Many thanks,

User avatar
HansV
Administrator
Posts: 78568
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Extract formatted sentences from Word into Excel

Post by HansV »

If you turn off all options in the Office Clipboard, you should be able to process more sentences.
Best wishes,
Hans

AmadeusW
NewLounger
Posts: 11
Joined: 21 Jan 2020, 16:11

Re: Extract formatted sentences from Word into Excel

Post by AmadeusW »

I turned off all options for Office Clipboard. So far no more sentences are possibile. Is there a way to clean the clipboard after every paste in the Excel file?

User avatar
HansV
Administrator
Posts: 78568
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Extract formatted sentences from Word into Excel

Post by HansV »

Does this work for you?

1) Copy the following lines to the top of the module (below Option Explicit if you have that):

Code: Select all

Public Declare Function OpenClipboard Lib "user32" (ByVal hwnd As Long) As Long
Public Declare Function EmptyClipboard Lib "user32" () As Long
Public Declare Function CloseClipboard Lib "user32" () As Long

Sub ClearClipboard()
    OpenClipboard (0&)
    EmptyClipboard
    CloseClipboard
End Sub
2) Then insert the following line into the macro below xlWsh.Paste:

Code: Select all

        ClearClipboard
Best wishes,
Hans

AmadeusW
NewLounger
Posts: 11
Joined: 21 Jan 2020, 16:11

Re: Extract formatted sentences from Word into Excel

Post by AmadeusW »

Just added the code to the top of the module (and I see two macro's) and the line below xlWsh.Paste (adjust the typo ClearClipbpard in ClearClipboard),
but no extra sentences are copied without the stop at xlWsh.Paste.
To test, I deleted the Range C-F (to save clipboard space).

User avatar
HansV
Administrator
Posts: 78568
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Extract formatted sentences from Word into Excel

Post by HansV »

Thanks for correcting the typo. I will so too in my reply.
Best wishes,
Hans

User avatar
HansV
Administrator
Posts: 78568
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Extract formatted sentences from Word into Excel

Post by HansV »

And what was the result?
Best wishes,
Hans

AmadeusW
NewLounger
Posts: 11
Joined: 21 Jan 2020, 16:11

Re: Extract formatted sentences from Word into Excel

Post by AmadeusW »

The same error (stop) at xlWsh.Paste is still there, when I try to convert the whole document. Only (approx.) 1 to 2 pages at the time works.

AmadeusW
NewLounger
Posts: 11
Joined: 21 Jan 2020, 16:11

Re: Extract formatted sentences from Word into Excel

Post by AmadeusW »

This morning I reset my computer, and cleared all memory. The codes works fine (ClearClipboard and SentenceToXL).
Many thanks,

AmadeusW
NewLounger
Posts: 11
Joined: 21 Jan 2020, 16:11

Re: Extract formatted sentences from Word into Excel

Post by AmadeusW »

Is there a way to resize the images, so the text is better to read in Excel (as you mentioned right away)?

User avatar
HansV
Administrator
Posts: 78568
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Extract formatted sentences from Word into Excel

Post by HansV »

That might be difficult. Would it be OK to remove all images?
Best wishes,
Hans

AmadeusW
NewLounger
Posts: 11
Joined: 21 Jan 2020, 16:11

Re: Extract formatted sentences from Word into Excel

Post by AmadeusW »

Sorry for the delay, couldn't login yesterday.
Is it possbile to use the text "Image from line x". Something like ("image from line "& i).
If this is difficult, a remove is fine.