Loop through line of code and change integer for getElementsByClassName

71 Views Asked by At

Previously posted on the MrExcel forum

www.mrexcel.com/board/threads/change-integer-in-code-line-for-htmldoc-getelementsbyclassname.1146814/

My original line of code was

Set DogRows1 = HTMLDoc.getElementsByClassName("rpb-greyhound rpb-greyhound-1 hover-opacity"

It works perfectly for the integer 1. However, I require to increment this by 1 and change to 2, 3, 4, 5 and 6 for other webpages, as below.

Set DogRows1 = HTMLDoc.getElementsByClassName("rpb-greyhound rpb-greyhound-6 hover-opacity"

I tried declaring some variables and adding a For Next Loop, however it will not loop through. What am I doing wrong? Have I put the For Next Loops in the wrong place?

Dim StartRaceNumber As Integer
Dim LastRaceNumber As Integer

XMLReq.Open "GET", DogPageURL, False
XMLReq.send

If XMLReq.Status <> 200 Then
    MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
    Exit Sub
End If

HTMLDoc.body.innerhtml = XMLReq.responseText
Set XMLReq = Nothing

LastRaceNumber = 6

For StartRaceNumber = 1 To LastRaceNumber
    Set DogRows1 = HTMLDoc.getElementsByClassName("rpb-greyhound rpb-greyhound-" & StartRaceNumber & " hover-opacity")
    For Each DogRow1 In DogRows1
        Set DogNameLink1 = DogRow1.getElementsByTagName("a")(0)
        NextHref = DogRow1.getAttribute("href")
        NextURL = DogURL & Mid(NextHref, InStr(NextHref, ":") + 28)
        Debug.Print DogRow1.innerText, NextURL
    Next DogRow1
Next StartRaceNumber
2

There are 2 best solutions below

0
On

Sure SIM

The scraping order is as follows:

Get Greyhound URL racecards Greyhound Races

Get Greyhound URL Dog information List of Greyhounds in the race

Get Greyhound Form details, this is an example for Greyhound#1 Form of Each Greyhound #1

Then loop to the next race and repeat.

As I said, from the code I can scrape only the form for greyhound#1 details for each race. I need to get the other dogs too if you can help?

These are my modules, hopefully they have imported correctly >

Option Explicit

Const DogURL As String = "https://www.timeform.com/greyhound-racing/racecards" Sub ListDogRace()

Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument

Dim TFRaceList As MSHTML.IHTMLElement
Dim TFRaces As MSHTML.IHTMLElementCollection
Dim TFRace As MSHTML.IHTMLElement

Dim NextHref As String
Dim NextURL As String

XMLReq.Open "GET", DogURL, False
XMLReq.send

If XMLReq.Status <> 200 Then
    MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
    Exit Sub
End If

HTMLDoc.body.innerhtml = XMLReq.responseText
Set XMLReq = Nothing

Set TFRaces = HTMLDoc.getElementsByClassName("wfr-race bg-light-gray hover-opacity")

For Each TFRace In TFRaces

    NextHref = TFRace.getAttribute("href")
    NextURL = DogURL & Mid(NextHref, InStr(NextHref, ":") + 28)    
    ListDogsOnPage TFRace.innerText, NextURL
Next TFRace

End Sub Sub ListDogsOnPage(DogName As String, DogPageURL As String)

Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument

Dim DogRow1 As MSHTML.IHTMLElement
Dim DogRows1 As MSHTML.IHTMLElementCollection

Dim DogNameLink1 As MSHTML.IHTMLElement

Dim NextHref As String
Dim NextURL As String

Dim StartRaceNumber As Integer
Dim LastRaceNumber As Integer

XMLReq.Open "GET", DogPageURL, False
XMLReq.send

If XMLReq.Status <> 200 Then
    MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
    Exit Sub
End If

HTMLDoc.body.innerhtml = XMLReq.responseText
Set XMLReq = Nothing

LastRaceNumber = 6

For StartRaceNumber = 1 To LastRaceNumber
    Set DogRows1 = HTMLDoc.getElementsByClassName("rpb-greyhound rpb-greyhound-" & StartRaceNumber & " hover-opacity"
For Each DogRow1 In DogRows1
        Set DogNameLink1 = DogRow1.getElementsByTagName("a")(0)
        NextHref = DogRow1.getAttribute("href")
        NextURL = DogURL & Mid(NextHref, InStr(NextHref, ":") + 28)
        Debug.Print DogRow1.innerText, NextURL
    Next DogRow1
Next StartRaceNumber

End Sub

0
On

Can I just confirm, it is only the URL on the page of each race for each greyhound I need, so I can scrape the greyhound's form.

As an example:

Nottingham 11.06

#1 BALLYBOUGH GARY https://www.timeform.com/greyhound-racing/greyhound-form/ballybough-gary/59297

#2 SALACRES BRUISER

https://www.timeform.com/greyhound-racing/greyhound-form/salacres-bruiser/59746

#3 FOLLOW MY LEAD

https://www.timeform.com/greyhound-racing/greyhound-form/follow-my-lead/54898

#4 HONOUR SAMURAI

https://www.timeform.com/greyhound-racing/greyhound-form/honour-samurai/53100

#5 NIDDERDALEFLURRY

https://www.timeform.com/greyhound-racing/greyhound-form/nidderdaleflurry/56446

#6 SPORTY MELODY

https://www.timeform.com/greyhound-racing/greyhound-form/sporty-melody/58746

I already have a Power Query function I have developed to scrape the form data from that url page. I am just struggling to get that full list of 6x greyhound form urls (as above) for each and every race.

If that makes sense?