Extract data after key word

gailb
3StarLounger
Posts: 254
Joined: 09 May 2020, 14:00

Extract data after key word

Post by gailb »

I found this code to extract information from a group of Word docs in a folder.

In column A of the spreadsheet I'm trying to extract the data after a key word, and then column B, the name of the file.

In column A I only get the first letter and column B is working fine. With the below, I'm trying to extract, My Data is stored here in column A.

1.1.1. Source of Count. My Data is stored here.

Also, in some documents, there may be multiple instances of Source of Count. I would like to capture those also. I supposed the file name would have to go in Column A with the results beginning in Column B and to the right.

Code: Select all

Sub ExtractInfo()
    Dim myPath  As String
    Dim myFile  As String
    Dim myText  As String
    Dim xlRow   As Long
    Dim xl      As Object: Set xl = CreateObject("excel.application")
        
    xl.Workbooks.Add
    xl.Visible = True
    
    myPath = "C:\Users\admin\Library\"
    myFile = Dir(myPath & "*.docx")
    
    xlRow = 1
    Do While myFile <> ""
        Documents.Open myPath & myFile
        Selection.HomeKey Unit:=wdStory
        With Selection.Find
           .ClearFormatting
           .MatchWildcards = True
           .Text = "Source of Count. [<A-Z.]"
           .Execute
        End With
         Selection.MoveStart Count:=Len("Source of Count.  ")
        myText = Selection.Text
        xl.ActiveWorkbook.ActiveSheet.Cells(xlRow, 1) = myText
        xl.ActiveWorkbook.ActiveSheet.Cells(xlRow, 2) = myFile
        xlRow = xlRow + 1
        ActiveWindow.Close False
        myFile = Dir
    Loop

End Sub

User avatar
HansV
Administrator
Posts: 78235
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Extract data after key word

Post by HansV »

Welcome to Eileen's Lounge!

I'll investigate your problem. Stay tuned.
Best wishes,
Hans

User avatar
HansV
Administrator
Posts: 78235
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Extract data after key word

Post by HansV »

Here is a modified version of the macro:

Code: Select all

Sub ExtractInfo()
    Dim myPath  As String
    Dim myFile  As String
    Dim myText  As String
    Dim xlRow   As Long
    Dim xlApp   As Object
    Dim xlWbk   As Object
    Dim xlWsh   As Object
    Dim doc     As Document

    On Error Resume Next
    Set xlApp = GetObject(Class:="Excel.Application")
    If xlApp Is Nothing Then
        Set xlApp = CreateObject(Class:="Excel.Application")
    End If
    On Error GoTo 0
    xlApp.Visible = True

    Set xlWbk = xlApp.Workbooks.Add(Template:=-4167)
    Set xlWsh = xlWbk.Worksheets(1)
    xlApp.ScreenUpdating = False

    myPath = "C:\Users\admin\Library\"
    myFile = Dir(myPath & "*.docx")

    Application.ScreenUpdating = False
    Do While myFile <> ""
        Set doc = Documents.Open(myPath & myFile)
        Selection.HomeKey Unit:=wdStory
        With Selection.Find
           .ClearFormatting
           .MatchWildcards = True
           .Text = "Source of Count. [<A-Z.]"
           Do While .Execute
                Selection.MoveStart Count:=Len("Source of Count. ")
                Selection.MoveEndUntil Cset:="."
                myText = Selection.Text
                xlRow = xlRow + 1
                xlWsh.Cells(xlRow, 1) = myText
                xlWsh.Cells(xlRow, 2) = myFile
                Selection.Collapse Direction:=wdCollapseEnd
           Loop
        End With
        doc.Close SaveChanges:=False
        myFile = Dir
    Loop
    Application.ScreenUpdating = False

    xlWsh.Range("A1:B1").EntireColumn.AutoFit
    xlApp.ScreenUpdating = True
End Sub
Best wishes,
Hans

gailb
3StarLounger
Posts: 254
Joined: 09 May 2020, 14:00

Re: Extract data after key word

Post by gailb »

Hi Hans and thank you for your time.

This is working great so far, but two adjustments if possible.

1. On some of the file there is one space after Source of Count., and sometimes two. Can this be modified to allow either one space or two?
2. Also, my mistake, after Source of Count., it can be multiple sentences to make up a paragraph. Right now, it just finds the end of the first period.

User avatar
HansV
Administrator
Posts: 78235
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Extract data after key word

Post by HansV »

Try this version:

Code: Select all

Sub ExtractInfo()
    Dim myPath  As String
    Dim myFile  As String
    Dim myText  As String
    Dim xlRow   As Long
    Dim xlApp   As Object
    Dim xlWbk   As Object
    Dim xlWsh   As Object
    Dim doc     As Document

    On Error Resume Next
    Set xlApp = GetObject(Class:="Excel.Application")
    If xlApp Is Nothing Then
        Set xlApp = CreateObject(Class:="Excel.Application")
    End If
    On Error GoTo 0
    xlApp.Visible = True

    Set xlWbk = xlApp.Workbooks.Add(Template:=-4167)
    Set xlWsh = xlWbk.Worksheets(1)
    xlApp.ScreenUpdating = False

    myPath = "C:\Users\admin\Library\"
    myPath = "C:\Users\gebruiker\Documents\"
    myFile = Dir(myPath & "*.docx")

    Application.ScreenUpdating = False
    Do While myFile <> ""
        Set doc = Documents.Open(myPath & myFile)
        Selection.HomeKey Unit:=wdStory
        With Selection.Find
           .ClearFormatting
           .MatchWildcards = True
           .Text = "Source of Count.[ ]{1,}[<A-Z.]"
           Do While .Execute
                Selection.MoveStart Count:=Len("Source of Count. ")
                Selection.MoveEnd Unit:=wdParagraph
                myText = Trim(Selection.Text)
                xlRow = xlRow + 1
                xlWsh.Cells(xlRow, 1) = myText
                xlWsh.Cells(xlRow, 2) = myFile
                Selection.Collapse Direction:=wdCollapseEnd
           Loop
        End With
        doc.Close SaveChanges:=False
        myFile = Dir
    Loop
    Application.ScreenUpdating = False

    xlWsh.Range("A1:B1").EntireColumn.AutoFit
    xlApp.ScreenUpdating = True
End Sub
Best wishes,
Hans

gailb
3StarLounger
Posts: 254
Joined: 09 May 2020, 14:00

Re: Extract data after key word

Post by gailb »

Thank you. This does it.

User avatar
HansV
Administrator
Posts: 78235
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Extract data after key word

Post by HansV »

Good to hear that. Thanks for the feedback.
Best wishes,
Hans

gailb
3StarLounger
Posts: 254
Joined: 09 May 2020, 14:00

Re: Extract data after key word

Post by gailb »

Hi Hans,

It turns out there are three other paragraphs of information I would like to collect.

Code: Select all

.Text = "Department.[ ]{1,}[<A-Z.]"

Code: Select all

.Text = "Definition.[ ]{1,}[<A-Z.]"

Code: Select all

.Text = "Place of Order.[ ]{1,}[<A-Z.]"
How can I loop these other find's into the routine?

User avatar
HansV
Administrator
Posts: 78235
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Extract data after key word

Post by HansV »

Try this version:

Code: Select all

Sub ExtractInfo()
    Dim mySrch  As Variant
    Dim myPath  As String
    Dim myFile  As String
    Dim myText  As String
    Dim xlRow   As Long
    Dim xlApp   As Object
    Dim xlWbk   As Object
    Dim xlWsh   As Object
    Dim doc     As Document

    On Error Resume Next
    Set xlApp = GetObject(Class:="Excel.Application")
    If xlApp Is Nothing Then
        Set xlApp = CreateObject(Class:="Excel.Application")
    End If
    On Error GoTo 0
    xlApp.Visible = True

    Set xlWbk = xlApp.Workbooks.Add(Template:=-4167)
    Set xlWsh = xlWbk.Worksheets(1)
    xlApp.ScreenUpdating = False

    myPath = "C:\Users\admin\Library\"
    myPath = "C:\Users\gebruiker\Documents\"
    myFile = Dir(myPath & "*.docx")

    Application.ScreenUpdating = False
    Do While myFile <> ""
        Set doc = Documents.Open(myPath & myFile)
        For Each mySrch In Array("Source of Count.", "Department.", "Definition.", "Place of Order.")
            Selection.HomeKey Unit:=wdStory
            With Selection.Find
               .ClearFormatting
               .MatchWildcards = True
               .Text = mySrch & "[ ]{1,}[<A-Z.]"
               Do While .Execute
                    Selection.MoveStart Count:=Len(mySrch) + 1
                    Selection.MoveEnd Unit:=wdParagraph
                    myText = Trim(Selection.Text)
                    xlRow = xlRow + 1
                    xlWsh.Cells(xlRow, 1) = myText
                    xlWsh.Cells(xlRow, 2) = myFile
                    Selection.Collapse Direction:=wdCollapseEnd
               Loop
            End With
        Next mySrch
        doc.Close SaveChanges:=False
        myFile = Dir
    Loop
    Application.ScreenUpdating = False

    xlWsh.Range("A1:B1").EntireColumn.AutoFit
    xlApp.ScreenUpdating = True
End Sub
Best wishes,
Hans

snb
4StarLounger
Posts: 547
Joined: 14 Nov 2012, 16:06

Re: Extract data after key word

Post by snb »

Alternative (also faster):

Code: Select all

Sub M_snb()
    c00 = "C:\Users\admin\Library\"
    c01 = Dir(c00 & "*.docx")

    Do While c01 <> ""
        With GetObject(c00 & c01)
            c02 = c02 & vbCrLf & c00 & c01 & "," & Join(Filter(Split(.Content, vbCr), "Source of Count"), vbCrLf & c00 & c01 & ",")
            .Close 0
        End With
        c01 = Dir
    Loop
    
   CreateObject("scripting.filesystemobject").createtextfile("C:\snb_001.csv").write c02
   
   GetObject("C:\snb_001.csv").Visible = True
End Sub

gailb
3StarLounger
Posts: 254
Joined: 09 May 2020, 14:00

Re: Extract data after key word

Post by gailb »

Hi Hans,

I've been trying this all day to loop thru the columns, but have seem to fail.

Code: Select all

    Application.ScreenUpdating = False
    xlWsh.Range("A1:E1").value = Array("Standard", "Title", "Definition", "Source of Count", "Method of Count")
    Do While myFile <> ""
        Set doc = Documents.Open(myPath & myFile)
        For Each mySrch In Array("Title.", "Definition.", "Source of Count.", "Method of Count.")
            For xlCol = 2 To UBound(mySrch(xlCol))
            Selection.HomeKey Unit:=wdStory
            With Selection.Find
                .ClearFormatting
                .MatchWildcards = True
                .Text = mySrch & "[ ]{1,}[<A-Z.]"
                Do While .Execute
                    Selection.MoveStart Count:=Len(mySrch) + 2
                    Selection.MoveEnd Unit:=wdParagraph
                    myText = Trim(Selection.Text)
                    xlRow = xlRow + 1
                    xlWsh.Cells(xlRow, 1) = myFile
                    xlWsh.Cells(xlRow, xlCol) = myText
                    Selection.Collapse Direction:=wdCollapseEnd
                Loop
            End With
            Next xlCol
        Next mySrch
        doc.Close SaveChanges:=False
        myFile = Dir
        xlRow = xlWsh.Range("A" & Rows.Count).End(xlUp).Row
    Loop
    Application.ScreenUpdating = False
Hi snb,

I'm afraid I get a variable not defined here and I don't know what to do with the .csv part.

User avatar
HansV
Administrator
Posts: 78235
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Extract data after key word

Post by HansV »

Hi Gail,

To make the code run, add a line

Code: Select all

    Dim xlCol   As Long
near the beginning, and change xlUp to its value -4162

But I'm not sure it will do what you want. Could you explain in more detail what you want to accomplish? Thanks in advance.
Best wishes,
Hans

snb
4StarLounger
Posts: 547
Joined: 14 Nov 2012, 16:06

Re: Extract data after key word

Post by snb »

Comment out 'option explicit' and run the code.

If you open the csv-file in Excel you are ready.

gailb
3StarLounger
Posts: 254
Joined: 09 May 2020, 14:00

Re: Extract data after key word

Post by gailb »

Hi Hans,

I'm trying to get all Titles in column B, Definitions in column C, Source of Count in column D, and Method of Count in column E.

User avatar
HansV
Administrator
Posts: 78235
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Extract data after key word

Post by HansV »

Will those terms always occur in that order in the documents, i.e. a Title, then a Definition, then a Source of Count, next a Method of Count, and then perhaps the next Title, Defintion etc.?
Or can they be in a random order?
Best wishes,
Hans

gailb
3StarLounger
Posts: 254
Joined: 09 May 2020, 14:00

Re: Extract data after key word

Post by gailb »

Yes, they will always appear in the same order, one followed by the other.

User avatar
HansV
Administrator
Posts: 78235
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Extract data after key word

Post by HansV »

Here is a new version:

Code: Select all

Sub ExtractInfo()
    Dim myPath  As String
    Dim myFile  As String
    Dim myText  As String
    Dim xlRow   As Long
    Dim xlApp   As Object
    Dim xlWbk   As Object
    Dim xlWsh   As Object
    Dim doc     As Document

    On Error Resume Next
    Set xlApp = GetObject(Class:="Excel.Application")
    If xlApp Is Nothing Then
        Set xlApp = CreateObject(Class:="Excel.Application")
    End If
    On Error GoTo 0
    xlApp.Visible = True

    Set xlWbk = xlApp.Workbooks.Add(Template:=-4167)
    Set xlWsh = xlWbk.Worksheets(1)
    xlApp.ScreenUpdating = False
    xlWsh.Range("A1:E1").Value = Array("Standard", "Title", "Definition", "Source of Count", "Method of Count")
    xlRow = 1

    myPath = "C:\Users\admin\Library\"
    myFile = Dir(myPath & "*.docx")

    Application.ScreenUpdating = False
    Do While myFile <> ""
        Set doc = Documents.Open(myPath & myFile)
        Selection.HomeKey Unit:=wdStory
        With Selection.Find
           .ClearFormatting
           .MatchWildcards = True
           Do While .Execute(FindText:="Title.[ ]{1,}[<A-Z.]")
                Selection.MoveStart Count:=Len("Title.") + 1
                Selection.MoveEnd Unit:=wdParagraph
                myText = Trim(Selection.Text)
                xlRow = xlRow + 1
                xlWsh.Cells(xlRow, 1) = myFile
                xlWsh.Cells(xlRow, 2) = myText
                Selection.Collapse Direction:=wdCollapseEnd
                .Execute FindText:="Definition.[ ]{1,}[<A-Z.]"
                Selection.MoveStart Count:=Len("Definition.") + 1
                Selection.MoveEnd Unit:=wdParagraph
                myText = Trim(Selection.Text)
                xlWsh.Cells(xlRow, 3) = myText
                Selection.Collapse Direction:=wdCollapseEnd
                .Execute FindText:="Source of Count.[ ]{1,}[<A-Z.]"
                Selection.MoveStart Count:=Len("Source of Count.") + 1
                Selection.MoveEnd Unit:=wdParagraph
                myText = Trim(Selection.Text)
                xlWsh.Cells(xlRow, 4) = myText
                Selection.Collapse Direction:=wdCollapseEnd
                .Execute FindText:="Method of Count.[ ]{1,}[<A-Z.]"
                Selection.MoveStart Count:=Len("Method of Count.") + 1
                Selection.MoveEnd Unit:=wdParagraph
                myText = Trim(Selection.Text)
                xlWsh.Cells(xlRow, 5) = myText
                Selection.Collapse Direction:=wdCollapseEnd
           Loop
        End With
        doc.Close SaveChanges:=False
        myFile = Dir
    Loop
    Application.ScreenUpdating = False

    xlWsh.Range("A1:E1").EntireColumn.AutoFit
    xlApp.ScreenUpdating = True
End Sub
Best wishes,
Hans

snb
4StarLounger
Posts: 547
Joined: 14 Nov 2012, 16:06

Re: Extract data after key word

Post by snb »

Avoid, like in Excel, the UI methods 'select', 'selection', etc.
Use the Word Objects in the VBA library:

Code: Select all

Sub M_snb()
  c00 = "C:\Users\admin\Library\"
  c01 = Dir(c00 & "*.docx")

  Do While c01 <> ""
    With GetObject(c00 & c01)
      c02=.content
      Close 0
    End With

    for each it in Array("Standard", "Title", "Definition", "Source of Count", "Method of Count")
      c03 = c03 & vbCrLf & c00 & c01 & "," & Join(Filter(Split(.Content, vbCr), it), vbCrLf & c00 & c01 & ",")
    next
    c01 = Dir
  Loop
    
  CreateObject("scripting.filesystemobject").createtextfile("C:\snb_001.csv").write join(filter(split(c03,vbcr),","),vbcr)
   
  createObject(""Excel.Application").workbooks.open("C:\snb_001.csv").application.Visible = True
End Sub

gailb
3StarLounger
Posts: 254
Joined: 09 May 2020, 14:00

Re: Extract data after key word

Post by gailb »

Thanks Hans. This last edit does it fine.

snb, sorry, this still does not work for me, but that is most likely due to my overall knowledge level.

snb
4StarLounger
Posts: 547
Joined: 14 Nov 2012, 16:06

Re: Extract data after key word

Post by snb »

Please be more specific: what 'doesn't work ' ?