Alternative to IExplorer in data mining in VBA

User avatar
ChrisGreaves
PlutoniumLounger
Posts: 15587
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Alternative to IExplorer in data mining in VBA

Post by ChrisGreaves »

See also this thread
For my sins I have taken on a little data-mining project. I did this successfully for Clients back in the day, and have since deleted the folder Greaves\Clients, but have found some personal code that uses VBA to fire up ie.exe and I can save a web page in an Excel 2003 worksheet in text form. Good enough for me. In VBA I can send the data to Word2003 and make use of my library of Word2003/VBA string utilities to grab data for my buddy.

But ... nowadays ie.exe pops up doo-dads that suggest I migrate to Edge, opens up new tabs and pop-up boxes etc.

Question: Does Chrome, Edge, or any modern browser apart from ie have a VBA interface that would let me pass a URL as a parameter and store the resultant web page on my hard drive (for later analysis?)

I am not looking for specific VBA code at this stage, just a top-level decision as to whether I should continue with ie or bite the bullet and use a different browser.
My browser-of-choice is FireFox, and while I haven't looked, I have not heard of a VBA interface for FireFox.

For the record, here is a simple VBA to fire up ie.exe and load a web page:-

Code: Select all

Function OpenIEDocument(ie As Object, strURL As String, lngLimitWait As Long)
    ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    ''' Function:   OpenIEDocument
    '''
    ''' Comments:   Given a URL, attempt to locate it on the internet using ie.
    '''
    ''' Arguments:  VARIANT             Instance of Internet Explorer
    '''             STRING              URL of a wanted page
    '''             LONG                Limit of "doevents" waits until we should conceded failure
    '''
    ''' Returns:    NONE
    '''
    ''' Comments:
    '''
    ''' Date        Developer           Action
    ''' --------------------------------------------------------------------------
    ''' 2010/02/27  Chris Greaves   Created
    '''
    Dim lngCount ' tally of loops made waiting for the page
    With ie
        .Visible = True
        On Error GoTo Failed
        .Navigate strURL
        On Error GoTo 0
        Do Until .ReadyState = 4
            lngCount = lngCount + 1 ' 5116:
            If lngCount > lngLimitWait Then ' 5116:
                Exit Do ' 5116:
            Else ' 5116:
                DoEvents
            End If ' 5116:
        Loop
    End With
    Exit Function
Failed:
    Dim lngError As Long
    lngError = Err.Number
    Dim strDescription As String
    strDescription = Err.Description
    Select Case lngError
    Case Else
        '            Stop
        Resume Next
    End Select
'Sub TESTOpenIEDocument()
'    Dim ie As Object
'    Set ie = CreateObject("InternetExplorer.Application")
'    Call OpenIEDocument(ie, "https://chrisgreaves.com", 1000)
'End Sub
End Function
I re-read the following EL forum pages before posting:-
https://eileenslounge.com/search.php?st ... t+explorer
https://eileenslounge.com/viewtopic.php ... 2A#p272990
https://eileenslounge.com/viewtopic.php ... 2A#p231289
https://eileenslounge.com/viewtopic.php ... A&start=40
https://eileenslounge.com/viewtopic.php ... %2A#p99520
https://eileenslounge.com/viewtopic.php ... %2A#p65368

Thanks
Chris
Last edited by ChrisGreaves on 16 Oct 2021, 12:54, edited 1 time in total.
There's nothing heavier than an empty water bottle

User avatar
HansV
Administrator
Posts: 78415
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Alternative to IExplorer in data mining in VBA

Post by HansV »

As far as I know, Microsoft hasn't released an Edge browser object for use in VBA etc. yet. They do have a Microsoft Edge (Chromium) WebView2 control, but that is for .NET.
You might try Selenium. That should keep you off the streets for a while... :evilgrin:
Best wishes,
Hans

User avatar
ChrisGreaves
PlutoniumLounger
Posts: 15587
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Re: Alternative to IExplorer in data mining in VBA

Post by ChrisGreaves »

HansV wrote:
03 Oct 2021, 13:50
You might try Selenium. That should keep you off the streets for a while... :evilgrin:
Thanks Hans, on my way! I shall report back here with any working examples.
P.S. Being off the streets right now is my Problem, not my solution.
Cheers
Chris
There's nothing heavier than an empty water bottle

User avatar
SpeakEasy
4StarLounger
Posts: 544
Joined: 27 Jun 2021, 10:46

Re: Alternative to IExplorer in data mining in VBA

Post by SpeakEasy »

Or you could try the MSXML library ...

Code: Select all

Set xmlhttp = CreateObject("msxml2.serverxmlHTTP")
xmlhttp.Open "Get", "https://chrisgreaves.com", False
xmlhttp.send

' At this point xmlHTTP.ResponseText contains the html of your webpage
' so do what you like with it

User avatar
ChrisGreaves
PlutoniumLounger
Posts: 15587
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Re: Alternative to IExplorer in data mining in VBA

Post by ChrisGreaves »

SpeakEasy wrote:
05 Oct 2021, 19:37
Or you could try the MSXML library ...
Thank you, SpeakEasy .
This method certainly has the quality of brevity; however I may have misunderstood your solution.

I installed the MSXML6.0 library and ran a small test:-

Code: Select all

Sub test()
    Dim xmlhttp
    Set xmlhttp = CreateObject("msxml2.serverxmlHTTP")
    Dim strURL As String
    strURL = "https://www.airbnb.ca/s/Bonavista--NL/homes?tab_id=home_tab&refinement_paths%5B%5D=%2Fhomes&flexible_trip_dates%5B%5D=november&flexible_trip_dates%5B%5D=october&flexible_trip_lengths%5B%5D=weekend_trip&date_picker_type=calendar&query=Bonavista%2C%20NL&place_id=ChIJ-cBbX_irdUsRf6S4lD9WxuQ&checkin=2021-11-29&checkout=2021-11-30&source=structured_search_input_header&search_type=autocomplete_click"
    xmlhttp.Open "Get", strURL, False
    xmlhttp.send
' At this point xmlHTTP.ResponseText contains the html of your webpage
' so do what you like with it
    Dim doc As Document
    Set doc = Documents.Add
    Selection.TypeText (xmlhttp.ResponseText)
End Sub
"xmlHTTP.ResponseText" does indeed contain the html of the web page, but as presented in the word document that does not give me the data i seek (in the example above, descriptions of twenty bed-and-breakfast properties in Bonavista).

I suspect I still need the browser in order for the browser to execute the html code and bring in the database from the web site (in this case AirBNB.ca) database.
Cheers
Chris
You do not have the required permissions to view the files attached to this post.
There's nothing heavier than an empty water bottle

User avatar
SpeakEasy
4StarLounger
Posts: 544
Joined: 27 Jun 2021, 10:46

Re: Alternative to IExplorer in data mining in VBA

Post by SpeakEasy »

Your problem is using

Code: Select all

Selection.TypeText (xmlhttp.ResponseText)
due to the under documented fact that it has a slightly weird 32k character limit. And your example page is much bigger than that
Try

Code: Select all

Selection.Text = xmlhttp.responseText
instead

User avatar
ChrisGreaves
PlutoniumLounger
Posts: 15587
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Re: Alternative to IExplorer in data mining in VBA

Post by ChrisGreaves »

SpeakEasy wrote:
05 Oct 2021, 23:39
Try

Code: Select all

Selection.Text = xmlhttp.responseText
Hi SpeakEasy.
Same result:
Untitled.png
I think my problem is that the code "contains the html of your webpage", which is only partway to what I want.
It is, in effect, the machinery behind the web page presented to the user, whereas I want to web page as presented to the user.
Untitled2.png
Much of the data is there, but parsing the HTML to find 20 occurrences of the number of bedrooms is harder than parsing the one-paragraph texts of the same data (one needs only grab the paragraph of text, check that it is "bedrooms", and extract the leading digit-string.
Images, should i need them, are not present in the HTML but are brought into the web page by the code within the html.

I am no expert on HTMl, so I am thrown off my horse by all this gobbledy-gook, just as anyone else is who looks at, say, my Word2003/VBA code!

Cheers
Chris
You do not have the required permissions to view the files attached to this post.
There's nothing heavier than an empty water bottle

User avatar
SpeakEasy
4StarLounger
Posts: 544
Joined: 27 Jun 2021, 10:46

Re: Alternative to IExplorer in data mining in VBA

Post by SpeakEasy »

Code: Select all

    Set xmlhttp = CreateObject("msxml2.serverxmlHTTP")
    xmlhttp.Open "GET", "https://www.airbnb.ca/s/Bonavista--NL/homes?tab_id=home_tab&refinement_paths%5B%5D=%2Fhomes&flexible_trip_dates%5B%5D=november&flexible_trip_dates%5B%5D=october&flexible_trip_lengths%5B%5D=weekend_trip&date_picker_type=calendar&query=Bonavista%2C%20NL&place_id=ChIJ-cBbX_irdUsRf6S4lD9WxuQ&checkin=2021-11-29&checkout=2021-11-30&source=structured_search_input_header&search_type=autocomplete_click", False
    xmlhttp.send
    ' At this point xmlHTTP.ResponseText contains the html of your webpage
    ' so do what you like with it

    'Selection.Text = (xmlhttp.responseText)
    
    CreateObject("scripting.filesystemobject").CreateTextFile("d:\downloads\deleteme\sourcehtml.html", , True).Write xmlhttp.responseText
    Set doc = Documents.Add
    Selection.InsertFile "d:\downloads\deleteme\sourcehtml.html"

User avatar
ChrisGreaves
PlutoniumLounger
Posts: 15587
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Re: Alternative to IExplorer in data mining in VBA

Post by ChrisGreaves »

SpeakEasy wrote:
06 Oct 2021, 13:44
CreateObject("scripting.filesystemobject").CreateTextFile("d:\downloads\deleteme\sourcehtml.html", , True).Write xmlhttp.responseText
Set doc = Documents.Add
Selection.InsertFile "d:\downloads\deleteme\sourcehtml.html"[
Speakeasy, I am trying to come up with a superlative to "superlatively superb".

Your solution is elegant on several points:
(a) It works
(b) It is simple, easy to understand
(c) It is browser-independent (which is, I think, the thrust of “Selenium”
(d) It can deliver to any sort of document (word processor, spreadsheet, text file, ...
(e) It will run in any MSOffice application (PowerPoint, Outlook, Word, Excel, ...)
I got the impression from reading a couple of web pages that MSxml version 6 SDK is on the way out, or not supported. For all that it seems like something one should have in one’s tool-chest.
Please: Can you recommend a beginner’s guide to MSxml2?
[later] https://analystcave.com/vba-xml-working-xml-files/ and its section on “

Code: Select all

Creating documents is also quite straight forward in VBA
” looks interesting.
Thanks
Chris Greaves
There's nothing heavier than an empty water bottle

User avatar
ChrisGreaves
PlutoniumLounger
Posts: 15587
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Re: Alternative to IExplorer in data mining in VBA

Post by ChrisGreaves »

ChrisGreaves wrote:
03 Oct 2021, 13:43
For my sins I have taken on a little data-mining project.
' https://www.mrexcel.com/board/threads/s ... a.1160977/

Code: Select all

Sub HeadlessSelenium_CD()
   
    Dim CD As Selenium.ChromeDriver
    Dim strHTML As String

    ' Instantiate Selenium through the ChromeDriver
    Set CD = New Selenium.ChromeDriver
   
    ' Run Selenium in Headless mode
    CD.AddArgument "--headless"
    CD.Start
   
    ' Navigate to the URL
    Dim strURL As String
    strURL = "http://www.ChrisGreaves.com"
    strURL = "http://www.LancasterInn.ca/"
    CD.Get strURL
    Debug.Print CD.ActiveElement.Text
    strHTML = CD.PageSource
    CD.Close
    Set CD = Nothing

End Sub
That feels better. This code was copied from a WORD2003\VBA module where it was pasted from an Excel2003\VBA module, so you can take your pick.
Set a reference to the Selenium Type Library:-
Untitled.png
I recommend the first of a zillion video tutorials at WiseOwl

Onwards and ever Upward (Or Upwards and ever Onward)

Along the way I found VBA code to institute a search in Yahoo, so in theory I have permutations of browsers (FireFox, Chrome, Edge, the unsupported Internet Explorer, etc) and search engines (Yahoo, Google, DuckDuckGo etc) to cope with differences between web sites (with bulk data worth mining) and individual sites (which can be quite cruddy at times).

The WiseOwl tutes are good. If ever I want to dissect a web page, or build a front-end in Excel that will grab data from a well-laid-out web site, or fill in data boxes on a web page and press the <Enter> key, WiseOwl will be there.

Cheers
Chris
You do not have the required permissions to view the files attached to this post.
There's nothing heavier than an empty water bottle

LisaGreen
5StarLounger
Posts: 964
Joined: 08 Nov 2012, 17:54

Re: Alternative to IExplorer in data mining in VBA

Post by LisaGreen »

ChrisGreaves wrote:
06 Oct 2021, 11:51
... my Word2003/VBA code!
I'm pretty sure there's nowt wrong with your word code!!
Lisa

User avatar
ChrisGreaves
PlutoniumLounger
Posts: 15587
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Re: Alternative to IExplorer in data mining in VBA

Post by ChrisGreaves »

LisaGreen wrote:
16 Oct 2021, 17:51
I'm pretty sure there's nowt wrong with your word code!!
You must be new here ... :grin: :grin: :grin:
Cheers
Chris
There's nothing heavier than an empty water bottle

LisaGreen
5StarLounger
Posts: 964
Joined: 08 Nov 2012, 17:54

Re: Alternative to IExplorer in data mining in VBA

Post by LisaGreen »

About getting the web code and grabbing data.

I think it's a bit chicken and egg. It's good if you know what you're looking for. That's why ( I mention in a different thread ) I'm trying python out. In this instance it's specifically youtube playlists I want so I can use excel/word/access to build a list.

Lisa

User avatar
ChrisGreaves
PlutoniumLounger
Posts: 15587
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Re: Alternative to IExplorer in data mining in VBA

Post by ChrisGreaves »

LisaGreen wrote:
16 Oct 2021, 18:33
It's good if you know what you're looking for.
Yes. Most of the tutorials focus on a particular web site with well-defined pages.
Hence the WiseOwl showed me exactly how to institute a search by prepping the search box in yahoo!
My needs are weirder. I want to issue a URL to an individual web site, and then download that unique page into a DOC so that I can parse the heck out of it.

My universe consists of pages with a common format, AND individually constructed web pages - for example corporate or personal web sites.

I was doing this with gay abandon fifteen or twenty years ago, but time have changed:-
Being told that InternetExplorer is no longer "good"
Captchas
Heavy formatting rather than text
Images and animations
Pop-ups and slide-ins telling me to use Firefox rather than my current browser (up to date Firefox!)
javascript in the web page so useful data is not there in the web page itself but is reconstituted by the browser on-the-fly.
A large part of the past week has been spent getting over those hurdles.
Cheers
Chris
There's nothing heavier than an empty water bottle

gnh1201
NewLounger
Posts: 1
Joined: 15 Apr 2022, 05:56

Re: Alternative to IExplorer in data mining in VBA

Post by gnh1201 »

You can control a Chromium-based browser (Chrome, Edge, etc) using MS JScript. It runs on the WSH, just like VBScript.

Example: crawltest.js

Code: Select all

// Based on: gnh1201/welsonjs (github)

var Chrome = require("lib/chrome");

function main() {
    var wbInstance = Chrome.create().setVendor("msedge").open("[protocol]://[domain]:[port]/");
    sleep(5000);
    console.log(wbInstance.getHTML("body"));
}

exports.main = main;
How to run:

Code: Select all

cscript app.js crawltest
VBScript can run this using the "WScript.Shell".
Last edited by gnh1201 on 15 Apr 2022, 06:22, edited 1 time in total.