Extract formatted sentences from Word into Excel
-
- NewLounger
- Posts: 11
- Joined: 21 Jan 2020, 16:11
Extract formatted sentences from Word into Excel
Hello everyone,
In the need of extracting (formatted) sentences from a word file into excel.
As an example I want to use the 'Welcome to Word' template file from Word.
And if all is going well, I want to extract also extra data information into excel, something like:
A = Sentence number
B = Original (formatted) sentence, with font size, color, like "This is a Sample of a formatted text."
C = Format of style: Heading, Normal, Title
D = Part of paragraph: Yes or No
E = Part of list: Yes or No
F = Part of numbering: Yes or No
G = Part of table
The definition of a sentence in VBA is difficult, I know. But want to try if it's possible. And how far I can go.
Many thanks.
In the need of extracting (formatted) sentences from a word file into excel.
As an example I want to use the 'Welcome to Word' template file from Word.
And if all is going well, I want to extract also extra data information into excel, something like:
A = Sentence number
B = Original (formatted) sentence, with font size, color, like "This is a Sample of a formatted text."
C = Format of style: Heading, Normal, Title
D = Part of paragraph: Yes or No
E = Part of list: Yes or No
F = Part of numbering: Yes or No
G = Part of table
The definition of a sentence in VBA is difficult, I know. But want to try if it's possible. And how far I can go.
Many thanks.
-
- Administrator
- Posts: 78568
- Joined: 16 Jan 2010, 00:14
- Status: Microsoft MVP
- Location: Wageningen, The Netherlands
Re: Extract formatted sentences from Word into Excel
Welcome to Eileen's Lounge!
That might not be easy, but I'll see if I can come up with something.
That might not be easy, but I'll see if I can come up with something.
Best wishes,
Hans
Hans
-
- Administrator
- Posts: 78568
- Joined: 16 Jan 2010, 00:14
- Status: Microsoft MVP
- Location: Wageningen, The Netherlands
Re: Extract formatted sentences from Word into Excel
Here is a first attempt. The macro should be run from Word. It is slow, and as you'll see, pasting formatted sentences also copies images. That makes the result hard to read.
I combined E and F into one column that shows the type of list.
I combined E and F into one column that shows the type of list.
Code: Select all
Sub SentencesToXL()
Dim xlApp As Object
Dim xlWbk As Object
Dim xlWsh As Object
Dim i As Long
Dim s As Range
Dim m As String
On Error Resume Next
Set xlApp = GetObject(Class:="Excel.Application")
On Error GoTo 0
If xlApp Is Nothing Then
Set xlApp = CreateObject(Class:="Excel.Application")
End If
xlApp.ScreenUpdating = False
Set xlWbk = xlApp.Workbooks.Add(-4167) ' xlWBatWorksheet
Set xlWsh = xlWbk.Worksheets(1)
For i = 1 To ActiveDocument.Sentences.Count
Set s = ActiveDocument.Sentences(i)
xlWsh.Range("A" & i).Value = i
xlWsh.Range("B" & i).Select
s.Copy
xlWsh.Paste
xlWsh.Range("C" & i).Value = s.ParagraphStyle
xlWsh.Range("D" & i).Value = (Len(s) < Len(s.Paragraphs(1).Range))
Select Case s.ListFormat.ListType
Case wdListNoNumbering
m = "No list"
Case wdListListNumOnly
m = "ListNum fields"
Case wdListBullet
m = "Bulleted list"
Case wdListSimpleNumbering
m = "Simple numbered list"
Case wdListOutlineNumbering
m = "Outlined list"
Case wdListMixedNumbering
m = "Mixed numbered list"
Case wdListPictureBullet
m = "Picture bulleted list"
End Select
xlWsh.Range("E" & i).Value = m
xlWsh.Range("F" & i).Value = s.Information(wdWithInTable)
Next i
xlWsh.UsedRange.EntireColumn.AutoFit
xlApp.Visible = True
xlApp.ScreenUpdating = True
End Sub
Best wishes,
Hans
Hans
-
- 4StarLounger
- Posts: 508
- Joined: 17 Dec 2010, 03:14
Re: Extract formatted sentences from Word into Excel
You have a plethora of problems there already! As you seem to be aware, VBA has no idea what a grammatical sentence is. For example, consider:AmadeusW wrote: I want to extract also extra data information into excel, something like:
A = Sentence number
B = Original (formatted) sentence, with font size, color, like "This is a Sample of a formatted text."
C = Format of style: Heading, Normal, Title
Mr. Smith spent $1,234.56 at Dr. John's Grocery Store, to buy: 10.25kg of potatoes; 10kg of avocados; and 15.1kg of Mrs. Green's Mt. Pleasant macadamia nuts.
For you and me, that would count as one sentence; for VBA it counts as 5 sentences. This: https://gregmaxey.com/word_tip_pages/de ... ences.html" onclick="window.open(this.href);return false; might help with that.
Then there are your font an Style specifications. A given sentence (grammatical or VBA) could contain multiple font faces, point sizes, and Styles. Even in your own example, the sentence might contain three Styles (e.g. Normal, Strong, & another for the underlining).
Paul Edstein
[Fmr MS MVP - Word]
[Fmr MS MVP - Word]
-
- NewLounger
- Posts: 11
- Joined: 21 Jan 2020, 16:11
Re: Extract formatted sentences from Word into Excel
Many thanks, Hans,
I copied the code VBA in the 'welcom to word' Word file. I can see it copies a lot of information, then I received an error when executed it:
xlWsh.Paste
Did I forget something?
I copied the code VBA in the 'welcom to word' Word file. I can see it copies a lot of information, then I received an error when executed it:
xlWsh.Paste
Did I forget something?
-
- Administrator
- Posts: 78568
- Joined: 16 Jan 2010, 00:14
- Status: Microsoft MVP
- Location: Wageningen, The Netherlands
Re: Extract formatted sentences from Word into Excel
See if it works better if you remove or comment out the line
Code: Select all
xlApp.ScreenUpdating = False
Best wishes,
Hans
Hans
-
- NewLounger
- Posts: 11
- Joined: 21 Jan 2020, 16:11
Re: Extract formatted sentences from Word into Excel
The line is comment out, but the same error (stop) occured.
-
- Administrator
- Posts: 78568
- Joined: 16 Jan 2010, 00:14
- Status: Microsoft MVP
- Location: Wageningen, The Netherlands
Re: Extract formatted sentences from Word into Excel
And what if you insert a line
immediately above the line
Code: Select all
DoEvents
Code: Select all
xlWsh.Paste
Best wishes,
Hans
Hans
-
- NewLounger
- Posts: 11
- Joined: 21 Jan 2020, 16:11
Re: Extract formatted sentences from Word into Excel
The problem found.
It's not the code, but my clipboard. It can handle only 24 items. When I reduced the file to 10 sentences, it works.
Many thanks,
It's not the code, but my clipboard. It can handle only 24 items. When I reduced the file to 10 sentences, it works.
Many thanks,
-
- Administrator
- Posts: 78568
- Joined: 16 Jan 2010, 00:14
- Status: Microsoft MVP
- Location: Wageningen, The Netherlands
Re: Extract formatted sentences from Word into Excel
If you turn off all options in the Office Clipboard, you should be able to process more sentences.
Best wishes,
Hans
Hans
-
- NewLounger
- Posts: 11
- Joined: 21 Jan 2020, 16:11
Re: Extract formatted sentences from Word into Excel
I turned off all options for Office Clipboard. So far no more sentences are possibile. Is there a way to clean the clipboard after every paste in the Excel file?
-
- Administrator
- Posts: 78568
- Joined: 16 Jan 2010, 00:14
- Status: Microsoft MVP
- Location: Wageningen, The Netherlands
Re: Extract formatted sentences from Word into Excel
Does this work for you?
1) Copy the following lines to the top of the module (below Option Explicit if you have that):
2) Then insert the following line into the macro below xlWsh.Paste:
1) Copy the following lines to the top of the module (below Option Explicit if you have that):
Code: Select all
Public Declare Function OpenClipboard Lib "user32" (ByVal hwnd As Long) As Long
Public Declare Function EmptyClipboard Lib "user32" () As Long
Public Declare Function CloseClipboard Lib "user32" () As Long
Sub ClearClipboard()
OpenClipboard (0&)
EmptyClipboard
CloseClipboard
End Sub
Code: Select all
ClearClipboard
Best wishes,
Hans
Hans
-
- NewLounger
- Posts: 11
- Joined: 21 Jan 2020, 16:11
Re: Extract formatted sentences from Word into Excel
Just added the code to the top of the module (and I see two macro's) and the line below xlWsh.Paste (adjust the typo ClearClipbpard in ClearClipboard),
but no extra sentences are copied without the stop at xlWsh.Paste.
To test, I deleted the Range C-F (to save clipboard space).
but no extra sentences are copied without the stop at xlWsh.Paste.
To test, I deleted the Range C-F (to save clipboard space).
-
- Administrator
- Posts: 78568
- Joined: 16 Jan 2010, 00:14
- Status: Microsoft MVP
- Location: Wageningen, The Netherlands
Re: Extract formatted sentences from Word into Excel
Thanks for correcting the typo. I will so too in my reply.
Best wishes,
Hans
Hans
-
- Administrator
- Posts: 78568
- Joined: 16 Jan 2010, 00:14
- Status: Microsoft MVP
- Location: Wageningen, The Netherlands
-
- NewLounger
- Posts: 11
- Joined: 21 Jan 2020, 16:11
Re: Extract formatted sentences from Word into Excel
The same error (stop) at xlWsh.Paste is still there, when I try to convert the whole document. Only (approx.) 1 to 2 pages at the time works.
-
- NewLounger
- Posts: 11
- Joined: 21 Jan 2020, 16:11
Re: Extract formatted sentences from Word into Excel
This morning I reset my computer, and cleared all memory. The codes works fine (ClearClipboard and SentenceToXL).
Many thanks,
Many thanks,
-
- NewLounger
- Posts: 11
- Joined: 21 Jan 2020, 16:11
Re: Extract formatted sentences from Word into Excel
Is there a way to resize the images, so the text is better to read in Excel (as you mentioned right away)?
-
- Administrator
- Posts: 78568
- Joined: 16 Jan 2010, 00:14
- Status: Microsoft MVP
- Location: Wageningen, The Netherlands
Re: Extract formatted sentences from Word into Excel
That might be difficult. Would it be OK to remove all images?
Best wishes,
Hans
Hans
-
- NewLounger
- Posts: 11
- Joined: 21 Jan 2020, 16:11
Re: Extract formatted sentences from Word into Excel
Sorry for the delay, couldn't login yesterday.
Is it possbile to use the text "Image from line x". Something like ("image from line "& i).
If this is difficult, a remove is fine.
Is it possbile to use the text "Image from line x". Something like ("image from line "& i).
If this is difficult, a remove is fine.