Back on May 19th, 2021, I wrote this Q&A about suspected recent (April-May 2021) changes to an interface in mshtml.dll affecting late-bound referencing. This is a part 2, if you will.
Previously, in questions such as this and this, I have remarked on the lack of support for various CSS selectors in mshtml.dll, in particular pseudo-classes. In those questions, I highlighted that nth-child() and nth-of-type() were not implemented in MSHTML.
Typically, as demonstrated here, unsupported selector syntax can result in:
Run-time error '-2140143604 (8070000c)': Could not complete the operation due to error 8070000c.
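Because unsupported syntax surfaces as a run-time error, rather than as a query that simply returns Nothing, one way to test a selector defensively is to trap that error. Below is a minimal sketch, assuming an early-bound reference to Microsoft HTML Object Library. SelectorIsSupported is a hypothetical helper name; it treats any error raised by querySelector as "not supported" and logs anything other than 8070000c.

```vba
Option Explicit
''Required references:
'' Microsoft HTML Object Library

' -2140143604 (&H8070000C) is the error number quoted above for unsupported syntax.
Private Const ERR_SELECTOR_NOT_SUPPORTED As Long = &H8070000C

' Hypothetical helper: True if MSHTML accepts the selector syntax.
' A valid selector that matches nothing still returns True.
Public Function SelectorIsSupported(ByVal html As MSHTML.HTMLDocument, _
                                    ByVal selector As String) As Boolean
    Dim node As Object
    On Error Resume Next
    Set node = html.querySelector(selector)
    SelectorIsSupported = (Err.Number = 0)
    ' Report errors that are not the "unsupported selector" one.
    If Err.Number <> 0 And Err.Number <> ERR_SELECTOR_NOT_SUPPORTED Then
        Debug.Print "Unexpected error " & Err.Number & " for: " & selector
    End If
    On Error GoTo 0
End Function
```

For example, SelectorIsSupported(html, "meta:nth-of-type(2)") can be used to branch to a fallback approach on set-ups without the update.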
I expect some things to break as Internet Explorer (IE), to which MSHTML is closely tied (see this), is no longer supported on various versions/platforms. What I did not expect to find was a recent improvement in supported CSS selectors. Take the following example:
```vba
Option Explicit
''Required references:
'' Microsoft HTML Object Library

Public Sub CssTest()
    Const URL = "https://books.toscrape.com/"
    Dim html As MSHTML.HTMLDocument
    Set html = New MSHTML.HTMLDocument

    ' Fetch the page synchronously and load its markup into the document.
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .send
        html.body.innerHTML = .responseText
    End With

    ' :nth-of-type() is the pseudo-class that previously raised 8070000c.
    Debug.Print html.querySelector("meta:nth-of-type(2)").outerHTML
End Sub
```
Prior to April-May 2021, this would have errored out due to the use of unimplemented syntax.
Now, on my set-up, where mshtml.dll was updated in early May at the latest, I get the same result as I would from an automated Internet Explorer instance, where this syntax was already supported:
<meta name="created" content="24th Jun 2016 09:29">
So, what are the currently supported CSS selectors available to VBA?
I covered the 'why do we care?' in the previous Q&A, so I won't repeat it here. I will, however, restate my set-up:
My set-up:
OS Name Microsoft Windows 10 Pro
Version 10.0.19042 Build 19042
System Type x64-based PC
Microsoft® Excel® 2019 MSO (16.0.13929.20206) 32-bit (Microsoft Office Professional Plus)
Version 2104 Build 13929.20373
mshtml.dll file 11.00.19041.985
ieframe.dll file 11.0.19041.964
Feedback:
As with the prior Q&A, I would appreciate feedback on set-ups that do or do not see these changes. I will add any feedback here for others to reference.
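To make reports easier to compare, something like the following sketch could gather the relevant versions plus one pass/fail line per selector. It assumes an Excel host (Application.Build is Excel-specific) and early binding to Microsoft HTML Object Library; the selector list is illustrative only, ReportSelectorSupport/SelectorSupportReport are hypothetical names, and this is not the test script mentioned at the end of this post.

```vba
Option Explicit
''Required references:
'' Microsoft HTML Object Library

Public Sub ReportSelectorSupport()
    Debug.Print SelectorSupportReport()
End Sub

' Builds a plain-text report: versions first, then one line per selector.
Public Function SelectorSupportReport() As String
    Dim html As MSHTML.HTMLDocument
    Dim selectors As Variant, selector As Variant
    Dim node As Object, status As String, report As String

    ' A 32-bit Office process is redirected from System32 to SysWOW64 by WOW64,
    ' so this should report the copy of each DLL that process would load.
    With CreateObject("Scripting.FileSystemObject")
        report = "mshtml.dll  " & .GetFileVersion(Environ$("SystemRoot") & "\System32\mshtml.dll") & vbCrLf
        report = report & "ieframe.dll " & .GetFileVersion(Environ$("SystemRoot") & "\System32\ieframe.dll") & vbCrLf
    End With
    ' Assumes an Excel host.
    report = report & "Excel " & Application.Version & " build " & Application.Build & vbCrLf

    ' Small local document, so no network access is needed.
    Set html = New MSHTML.HTMLDocument
    html.body.innerHTML = "<ul><li>one</li><li>two</li><li>three</li></ul><p>text</p>"

    ' Illustrative list only; add whichever selectors you want to check.
    selectors = Array("li:nth-child(2)", "li:nth-of-type(2)", "li:last-child", _
                      "p:only-of-type", "li:not(:first-child)", "ul > li + li", "li ~ li")

    For Each selector In selectors
        Set node = Nothing
        On Error Resume Next
        Set node = html.querySelector(CStr(selector))
        If Err.Number <> 0 Then
            status = "ERROR " & Hex$(Err.Number)
        ElseIf node Is Nothing Then
            status = "ok (no match)"
        Else
            status = "ok (matched)"
        End If
        On Error GoTo 0
        report = report & status & vbTab & selector & vbCrLf
    Next selector

    SelectorSupportReport = report
End Function
```

Returning a string rather than printing directly keeps the report easy to paste into a comment or copy to a worksheet.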
tl;dr:
There is much greater support for CSS selectors, and for Element.querySelector() (allowing greater flexibility in chaining querySelector(All) calls). This enormously enhances the expressivity of the MSHTML library, in terms of CSS selectors, and brings it on a par with Selenium Basic.

Motivation:
I have wanted to write a list of supported selectors for some time, given the lack of documentation on this for VBA and the trial-and-error nature of learning what does and doesn't work. This latest change has spurred me to do so, and to include the other libraries that currently support CSS selectors.
CAVEATS:
Before and After:
Traditionally, the expressivity of CSS selectors within VBA was as follows, with respect to the libraries supporting them:
Selenium implemented by far the most CSS selectors.
Current state:
The current state of implemented selectors is, I believe, as follows (apologies for the image quality, even when the table is enlarged; please see the JSFiddle for the clearest view of the table):
I also include this as a simplified HTML insert so you can click on the hyperlinks: click Run code snippet below the code insert, then the Full page link. Apologies, the table is large, and even so I haven't covered every conceivable selector, only the main ones I consider likely to be frequently useful. Inserting a fancier table took me over the body character limit, so here we are. For the fancy version, please see this JSFiddle, where the newly supported selectors are shaded.
12 newly supported pseudo-classes and an expanded Element.querySelector:
If you run the above snippet and view the full page, you will see that there are now at least 12 newly supported pseudo-classes, as well as a note on the expanded Element.querySelector. Bam, kapow, ker-sploosh, shut the proverbial front door ... welcome to VBA CSS Canaan, Scraper's Shangri-la, Nerd Nirvana!
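As a rough illustration of both points together, the sketch below uses a pseudo-class in a document-level query and then calls querySelector on the returned elements rather than on the document. It assumes early binding to Microsoft HTML Object Library, an mshtml.dll that includes these changes, and that books.toscrape.com still uses the markup described in the comments; ChainedSelectorTest and SecondBookSummary are hypothetical names.

```vba
Option Explicit
''Required references:
'' Microsoft HTML Object Library

' Assumed markup (books.toscrape.com listing page):
' <ol class="row"><li><article class="product_pod">
'   <h3><a title="...">...</a></h3> ... <p class="price_color">...</p>
' </article></li>...</ol>
Public Sub ChainedSelectorTest()
    Const URL = "https://books.toscrape.com/"
    Dim html As MSHTML.HTMLDocument
    Dim listing As Object

    Set html = New MSHTML.HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .send
        html.body.innerHTML = .responseText
    End With

    Debug.Print SecondBookSummary(html)

    ' querySelector(..).querySelectorAll(..): a node list scoped to one element.
    Set listing = html.querySelector("ol.row")
    If Not listing Is Nothing Then
        Debug.Print listing.querySelectorAll("p.price_color").Length & " prices"
    End If
End Sub

' Returns "title | price" for the 2nd listed book, or "" if the assumed markup
' is not found. Variables are As Object, so element members are late bound.
Public Function SecondBookSummary(ByVal html As MSHTML.HTMLDocument) As String
    Dim entry As Object, article As Object, link As Object, price As Object

    ' Pseudo-class in a document-level query.
    Set entry = html.querySelector("ol.row > li:nth-of-type(2)")
    If entry Is Nothing Then Exit Function

    ' Element-level querySelector calls on the returned nodes.
    Set article = entry.querySelector("article.product_pod")
    If article Is Nothing Then Exit Function
    Set link = article.querySelector("h3 > a")
    Set price = article.querySelector("p.price_color")
    If link Is Nothing Or price Is Nothing Then Exit Function

    SecondBookSummary = link.getAttribute("title") & " | " & price.innerText
End Function
```

Keeping the fetch separate from the parsing means SecondBookSummary can also be exercised against a small local HTMLDocument, without network access.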
I think there may also have been interesting updates to ieframe.dll, but the focus here is on recent mshtml.dll changes. You may wish to review the IE support under the Lifecycle announcements here and here, or search for Lifecycle FAQ - Internet Explorer and Microsoft Edge.

As the benefit of the expanded Element.querySelector() was not covered in the last Q&A, I will briefly mention it here. By expanded, I mean an increase in the number of elements on which you can call querySelector, such that you can chain calls, i.e. .querySelector(..).querySelector(..) and .querySelector(..).querySelectorAll(..).

Previously, this was largely not possible, as exemplified by this question. Typically, the workaround was to chain traditional methods onto the returned node, e.g. html.querySelector("body").getElementsByTagName("li"); this led to unsightly chaining and to hard-to-follow, as well as limited, paths to target elements. Better, IMHO, was the idea of a surrogate MSHTML.HTMLDocument variable, which would carry the innerHTML of the current node returned by querySelector and thus allow you to call querySelector(All) again, thereby gaining much faster matching, clearer syntax and greater versatility. Numerous examples of that approach can be found here.

End Notes:
This is a document under revision. All feedback on improvements is welcome.
Thanks:
Finally, a big thanks to @SIM for running a test script of mine to examine this on a different set-up.