Excel Web Query vs MSXML2.XMLHTTP60

565 Views Asked by At

I am trying to get data from the web. I tried two avenues in Excel VBA. Both work but yield different results and there is also a noticeable speed difference.

The first method is using Excel Web Query tool.

VBA code I obtained by recording a macro.

ActiveWorkbook.Queries.Add name:="Table 0", Formula:= _
  "let" & Chr(13) & "" & Chr(10) & "    Source = Web.Page(Web.Contents(""" & webSite & """))," & Chr(13) & "" & Chr(10) & "    Data0 = Source{0}[Data]" & Chr(13) & "" & Chr(10) & "in" & Chr(13) & "" & Chr(10) & "    Data0"
    With ActiveSheet.ListObjects.Add(SourceType:=0, Source:="OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=""Table 0"";Extended Properties=""""" _
      , Destination:=Range("$A$1")).QueryTable
    .CommandType = xlCmdSql
    .CommandText = Array("SELECT * FROM [Table 0]")
    .RowNumbers = False
    .FillAdjacentFormulas = False
    .PreserveFormatting = False
    .RefreshOnFileOpen = False
    .BackgroundQuery = False
    .RefreshStyle = xlInsertDeleteCells
    .SavePassword = False
    .SaveData = False
    .AdjustColumnWidth = True
    .RefreshPeriod = 0
    .PreserveColumnInfo = True
    .ListObject.DisplayName = "Table_0"
    .Refresh BackgroundQuery:=False
  End With

The second method is using MSXML2.XMLHTTP60. This method is extremely fast but the table hasn't any headers, which I need.

I noticed the code of the Web Query, to retrieve the headers

Data0 = Source{0}[Data]

is not working using XMLHTTP60.

Is there a way to retrieve the column headers by using MSXML2.XMLHTTP60 by adding or modifying some of code below?

Sub test()
    
    Dim XMLRequest As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim HTMLDiv As MSHTML.IHTMLElement
    Dim HTMLTable As MSHTML.IHTMLElement
    
    Dim HTMLDiv2 As MSHTML.IHTMLElement
    Dim HTMLTable2 As MSHTML.IHTMLElement
       
    XMLRequest.Open "GET", "webSite", False
    XMLRequest.send
        
    If XMLRequest.Status <> 200 Then
        MsgBox XMLRequest.Status & "-" & XMLRequest.statusText
        Exit Sub
    End If
        
    HTMLDoc.body.innerHTML = XMLRequest.responseText
        
    Set HTMLDiv = HTMLDoc.getElementById("__next")
    Set HTMLTable = HTMLDiv.getElementsByTagName("table")(0)
        
    Set HTMLDiv = HTMLDoc.getElementsByClassName("tables-container")
    Set HTMLTable2 = HTMLDiv.getElementsByClassName("tables-container")(0).getElementsByClassName("time")
        
    Debug.Print HTMLDiv.className
    
    Call subWriteTableToWs(HTMLTable)
        
End Sub
1

There are 1 best solutions below

1
On

I've had the same problem. The only solution I could find was to save the web page as a single file or save the HTML page and then have a cursor reading the page.

This might work if you do it for one page, but done in a loop for several pages then I am not sure it would a feasible solution.

I would really be happy to read about a solution based on XMLHTTP60

You can also find a similar issue in this post where the headers could have been scraped using XMLHTTP60 but not the table

Excel VBA IE verses XMLHTTP discrepencies