I want to use PEP 634 – Structural Pattern Matching to match an HtmlElement that has a particular attribute. The attributes are accessible through an .attrib attribute that returns an instance of the _Attrib class, and IIUC it has all the methods it needs to be a collections.abc.Mapping.
The PEP says this:
For a mapping pattern to succeed the subject must be a mapping, where being a mapping is defined as its class being one of the following:
- a class that inherits from collections.abc.Mapping
- a Python class that has been registered as a collections.abc.Mapping
- ...
Here's what I'm trying to do, but it doesn't print the href:
from collections.abc import Mapping
from lxml.html import HtmlElement, fromstring
el = fromstring('<a href="https://stackoverflow.com/">StackOverflow</a>')
Mapping.register(type(el.attrib)) # lxml.etree._Attrib
assert(isinstance(el.attrib, Mapping)) # It's True even before registering _Attrib.
match el:
    case HtmlElement(tag='a', attrib={'href': href}):
        print(href)
This matches and prints attrib:
match el:
    case HtmlElement(tag='a', attrib=Mapping() as attrib):
        print(attrib)
This does not match, as expected:
match el:
    case HtmlElement(tag='a', attrib=list() as attrib):
        print(attrib)
I also tried this and it works:
class Upperer:
    def __getitem__(self, key): return key.upper()
    def __len__(self): return 1
    def get(self, key, default): return self[key]
Mapping.register(Upperer) # It doesn't work without this line.
match Upperer():
    case {'href': href}:
        print(href) # Prints "HREF"
I understand that using XPath/CSS selectors would be easier, but at this point I just want to know what the problem is with the _Attrib class and my code.
Also, I don't want to unpack the element and convert the _Attrib instance to dict as follows:
match el.tag, dict(el.attrib):
    case 'a', {'href': href}:
        print(href)
or use guards:
match el:
    case HtmlElement(tag='a', attrib=attrs) if 'href' in attrs:
        print(attrs['href'])
It works but it doesn't look right. I'd like to find a solution so the original case HtmlElement(tag='a', attrib={'href': href}) works. Or something that's very close to it.
Python version I'm using is 3.11.4.
There seems to be a problem with using match case to compare two objects for equality, since pattern matching with match case statements is typically used to match a value against different patterns, not to compare objects for equality. In Python, the == operator is normally used for that. If you want to compare two objects for equality, you should use == instead of match case.
Write a class for comparison:
Use an if statement to determine whether they are equal:
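A minimal sketch of those two steps, assuming a hypothetical Link class that holds a tag and an href (the class and its fields are made up for illustration and are not part of lxml):

```python
class Link:
    """Hypothetical value class that compares by tag and href."""

    def __init__(self, tag, href):
        self.tag = tag
        self.href = href

    def __eq__(self, other):
        # Only compare against other Link objects.
        if not isinstance(other, Link):
            return NotImplemented
        return (self.tag, self.href) == (other.tag, other.href)

    def __hash__(self):
        # Keep hashing consistent with __eq__.
        return hash((self.tag, self.href))


a = Link('a', 'https://stackoverflow.com/')
b = Link('a', 'https://stackoverflow.com/')

if a == b:
    result = 'equal'
else:
    result = 'different'
print(result)  # equal
```

Note that this only tests whether two whole objects are equal; unlike a mapping pattern, it does not pull the href value out of the element.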