VBA Excel get text inside HTMLObject

4.2k Views Asked by At

I know this is really easy for some of you out there. But I have been going deep on the internet and I can not find an answer. I need to get the company name that is inside the

tbody tr td a eBay-tradera.com

and

td class="bS aR" 970,80

/td /tr /tbody

<tbody id="matrix1_group0">
<tr class="oR" onmouseover="onMouseOver(this, false)" onmouseout="onMouseOut(this, false)" onclick="onClick(this, false)">
<td class="bS">&nbsp;</td>
<td>
<a href="aProgramInfoApplyRead.action?programId=175&amp;affiliateId=2014848" title="http://www.tradera.com/" target="_blank">

   eBay-Tradera.com

</a>
</td>
<td class="aR">
<a href="/pan/aProgramInfoApplyRead.action?programId=175&amp;affiliateId=2014848" title="/pan/aProgramInfoApplyRead.action?programId=175&amp;affiliateId=2014848">175</a></td>
<td class="bS aR">0</td><td class="bS aR">0</td><td class="bS aR">187</td>
<td class="aR">0,00%</td><td class="bS aR">124</td>
<td class="aR">0,00%</td>
<td class="bS aR">26</td>
<td class="aR">20,97%</td>
<td class="bS aR">32</td>
<td class="aR">60,80</td>
<td class="aR">25,81%</td>
<td class="bS aR">5&nbsp;102,00</td>
<td class="bS aR">0,00</td>
<td class="aR">0,00</td>
<td class="bS aR">

   970,80

</td>
</tr>
</tbody>

This is my code, where I only try to get the a tag to start of with but I cant get that to work either

Set TDelements = document.getElementById("matrix1_group0").document.getElementsbytagname("a").innerHTML
r = 0
C = 0

For Each TDelement In TDelements
Blad1.Range("A1").Offset(r, C).Value = TDelement.innerText
r = r + 1
Next

Thanks on beforehand I know that this might be to simple. But I hope that other people might have the same issue and this will be helpful for them as well. The reason for the "r = r + 1" is because there are many more companies on this list. I just wanted to make it as easy as I could. Thanks again!

2

There are 2 best solutions below

4
On

You will need to specify the element location in the table. Ebay seems to be obfuscating the class-names so we cannot rely on those being consistent. Nor would I usually rely on the elements by their table index being consistent but I don't see any way around this.

I am assuming that this is the HTML document you are searching

<tbody id="matrix1_group0">
  <tr class="oR" onmouseover="onMouseOver(this, false)" onmouseout="onMouseOut(this, false)" onclick="onClick(this, false)">
    <td class="bS">&nbsp;</td>
    <td>
      <a href="aProgramInfoApplyRead.action?programId=175&amp;affiliateId=2014848" title="http://www.tradera.com/" target="_blank">
        eBay-Tradera.com  <!-- <=== You want this? -->
      </a>
    </td>
    <!-- ... -->
  </tr>
  <!-- ... -->
</tbody>

We can ignore the rest of the document as the table element has an ID. In short, we assume that

.getElementById("matrix1_group0").getElementsByTagName("TR")

will return a collection of html row objects sorted by their appearance.

Set matrix = document.getElementById("matrix1_group0")
Set firstRow = matrix.getElementsByTagName("TR")(1)
Set firstRowSecondCell = firstRow.getElementsByTagName("TD")(2)
traderaName = firstRowSecondCell.innerText

Of course you could inline this all as

document.getElementById("matrix1_group0").getElementsByTagName("TR")(1).getElementsByTagName("TD")(2).innerText

but that would make debugging harder. Also if the web-page is ever presented to you in a different format then this won't work. Ebay is deliberately making it hard for you to scrape data off of it for security.

1
On

With only the HTML you have shown you can use CSS selectors to obtain these:

a[href*='aProgramInfoApplyRead.action?programId']

Which says a tag with attribute href that contains the string 'aProgramInfoApplyRead.action?programId'. This matches two elements but the first is the one you want.

CSS Selector:

CSS selector


VBA:

You can use .querySelector method of .document to retrieve the first match

Debug.Print ie.document.querySelector("a[href*='aProgramInfoApplyRead.action?programId']").innerText