I am writing a PowerShell script that runs on Windows 10, using the HTML Agility Pack library, version 1.11.43.
In this library, the GetAttributeValue method of HTML element nodes has four overloads:
public string GetAttributeValue(string name, string def)
public int GetAttributeValue(string name, int def)
public bool GetAttributeValue(string name, bool def)
public T GetAttributeValue<T>(string name, T def)
I have written the following PowerShell script to test these methods:
$libPath = "HtmlAgilityPack.1.11.43\lib\netstandard2.0\HtmlAgilityPack.dll"
Add-Type -Path $libPath
$dom = New-Object -TypeName "HtmlAgilityPack.HtmlDocument"
$dom.Load("test.html", [System.Text.Encoding]::UTF8)
foreach ($node in $dom.DocumentNode.DescendantNodes()) {
    if ("#text" -ne $node.Name) {
        $node.OuterHTML
        " " + $node.GetAttributeValue("class", "")
        " " + $node.GetAttributeValue("class", 0)
        " " + $node.GetAttributeValue("class", $true)
        " " + $node.GetAttributeValue("class", $false)
        " " + $node.GetAttributeValue("class", $null)
    }
}
File 'test.html':
<p class="true"></p>
<p class="false"></p>
<p></p>
<p class="any other text"></p>
Test script execution result:
<p class="true"></p>
true
0
True
True
True
<p class="false"></p>
false
0
False
False
False
<p></p>
0
True
False
False
<p class="any other text"></p>
any other text
0
True
False
False
I know that the expression $node.Attributes["class"] can also be used to get the attribute value of an HTML element. I also understand polymorphism, method overloading, and generic methods, so there is no need to explain those.
I have three questions:
1. When $node.GetAttributeValue("class", $null) is called from a PowerShell script, which of the four overloads of the GetAttributeValue method is invoked? I think it is the fourth (generic) overload.
2. If so, why does a call with $null as the second argument behave exactly like a call with $false?
3. In the C# source code, the fourth overload is only compiled under the following condition:
   #if !(METRO || NETSTANDARD1_3 || NETSTANDARD1_6)
   I tried the library builds for NETSTANDARD1_6 and for NETSTANDARD2_0, and the test script behaves the same way with both. But with NETSTANDARD1_6 the fourth overload should be unavailable, right? In that case, which overload of GetAttributeValue is invoked when the second argument is $null?
tl;dr
To achieve what you unsuccessfully attempted with $node.GetAttributeValue("class", $null), i.e., to return the attribute value as a [string] and default to $null if there is none, pass [NullString]::Value cast to [string] as the default-value argument, so that PowerShell selects the string overload and passes a true null. Casting $null itself ([string] $null) works too, but makes "" (the empty string) rather than $null the default value.
While the overload resolution that you're seeing is surprising, you can resolve ambiguity during PowerShell's method overload resolution with casts.
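Since HtmlAgilityPack may not be at hand, here is a minimal, library-independent sketch of cast-based disambiguation. It compiles a hypothetical stand-in class, OverloadProbe (not part of HtmlAgilityPack), whose three non-generic GetAttributeValue overloads have the same parameter lists as the string/int/bool overloads from the question but merely report which one was selected. The generic overload is deliberately left out, as in the NETSTANDARD1_6 build.

```powershell
# Hypothetical stand-in for HtmlAgilityPack.HtmlNode: same parameter lists as the
# three non-generic GetAttributeValue overloads, but each one only reports
# which overload was selected and what default value it received.
Add-Type -TypeDefinition @'
public static class OverloadProbe
{
    public static string GetAttributeValue(string name, string def)
    {
        return "string overload, def = " + (def == null ? "null" : "'" + def + "'");
    }

    public static string GetAttributeValue(string name, int def)
    {
        return "int overload, def = " + def;
    }

    public static string GetAttributeValue(string name, bool def)
    {
        return "bool overload, def = " + def;
    }
}
'@

# A cast gives PowerShell the type information that a bare $null lacks.
[OverloadProbe]::GetAttributeValue('class', [string] $null)  # string overload, def = ''
[OverloadProbe]::GetAttributeValue('class', [int] $null)     # int overload, def = 0
[OverloadProbe]::GetAttributeValue('class', [bool] $null)    # bool overload, def = False

# Without a cast, PowerShell picks an overload on its own; in the question's
# tests, this behaved like the bool overload.
[OverloadProbe]::GetAttributeValue('class', $null)
```

Note that the return types of the stand-in methods differ from the real ones; that doesn't matter here, because overload resolution only looks at the arguments.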
Alternatively, in PowerShell (Core) 7.3+[1], you can now call generic methods with explicit type arguments, e.g., $node.GetAttributeValue[string]('class', [NullString]::Value).
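The difference between $null and [NullString]::Value for [string]-typed .NET method parameters, which the note that follows explains, can be observed without HtmlAgilityPack. This sketch assumes a hypothetical helper class, NullProbe, compiled with Add-Type; its single method has one string parameter (so overload resolution plays no role) and reports whether it received a true null.

```powershell
# Hypothetical helper: one method, one string parameter; it only reports
# whether the argument that actually arrived is null.
Add-Type -TypeDefinition @'
public static class NullProbe
{
    public static bool IsNull(string s)
    {
        return s == null;
    }
}
'@

# PowerShell's own conversions of $null to the three non-generic def types.
[bool] $null            # False
[int] $null             # 0
'' -eq [string] $null   # True: [string] $null is the empty string, not $null

# Passing $null to a .NET string parameter quietly turns it into "".
[NullProbe]::IsNull($null)                # False
# [NullString]::Value makes PowerShell pass a true null.
[NullProbe]::IsNull([NullString]::Value)  # True
```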
Note:
When you pass $null to a [string]-typed parameter (both in cmdlets and .NET methods), PowerShell actually converts it quietly to "" (the empty string). [NullString]::Value tells PowerShell to pass a true null instead, and is mostly needed for calling .NET methods where passing null vs. "" can result in different behavior.
Therefore, if you were to call $node.GetAttributeValue('class', [string] $null) or, in PS 7.3+, $node.GetAttributeValue[string]('class', $null), you'd get "" (the empty string) as the default value if attribute class doesn't exist.
By contrast, [NullString]::Value causes a true $null value to be returned if the attribute doesn't exist; you can test for that with $null -eq ...
As for your questions:
On a general note, PowerShell's overload resolution is complex, and for the ultimate source of truth you'll have to consult the source code. The following is based on the de-facto behavior as of PowerShell 7.2.6 and musings about logic that could be applied.
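Short of reading the source code, you can at least check which overloads PowerShell has to choose from in the library build you've actually loaded: referencing a method without parentheses (e.g., $node.GetAttributeValue) outputs its overload definitions. The sketch below uses a hypothetical stand-in class with the four signatures from the question, so it runs without HtmlAgilityPack; with the real library loaded, the same inspection on a node should reveal whether a given build (e.g., the NETSTANDARD1_6 one) includes the generic overload.

```powershell
# Hypothetical stand-in with the four signatures quoted in the question,
# including the generic one.
Add-Type -TypeDefinition @'
public static class OverloadListProbe
{
    public static string GetAttributeValue(string name, string def) { return def; }
    public static int GetAttributeValue(string name, int def) { return def; }
    public static bool GetAttributeValue(string name, bool def) { return def; }
    public static T GetAttributeValue<T>(string name, T def) { return def; }
}
'@

# A method reference without (...) yields a PSMethod object; its
# OverloadDefinitions property lists one signature string per overload.
$defs = [OverloadListProbe]::GetAttributeValue.OverloadDefinitions
$defs

# With HtmlAgilityPack loaded, the equivalent for a real node would be:
#   $node.GetAttributeValue.OverloadDefinitions
```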
In practice, the public bool GetAttributeValue(string name, bool def) overload is chosen. Why it, specifically, is chosen among the available overloads is ultimately immaterial, because the fundamental problem is that, to PowerShell, $null provides insufficient information about the type it may be a stand-in for, so PowerShell cannot generally be expected to select a specific overload (for that, you need a cast, as noted at the top):
- In C#, passing null to the second parameter in a non-generic call unambiguously implies the overload with the string-typed def parameter, because among the non-generic overloads, string is the only .NET reference type for def, and therefore the only type that can directly accept a null argument.
- This is not true in PowerShell, which has much more flexible, implicit type-conversion rules: from PowerShell's perspective, $null can bind to any of the def parameter types, because PowerShell allows $null to be converted to each of them; specifically, [bool] $null yields $false, [int] $null yields 0, and, perhaps surprisingly, as discussed above, [string] $null yields "" (the empty string).
- Curiously, however, even using [NullString]::Value doesn't make a difference, although PowerShell should know that this special value represents a $null value for a string parameter; see GitHub issue #18072.
With the generic invocation syntax available in v7.3+, the generic overload definitely works, and a $null default-value argument is converted to the type specified as the type argument (assuming PowerShell allows such a conversion; it wouldn't work with [datetime], for instance, because [datetime] $null causes an error).
Even with the non-generic syntax, PowerShell does select the generic overload by inference, but only when you pass an actual object rather than $null. When you pass $null, the generic overload is not considered (and cannot be, in the absence of type information), so whether the library build includes the generic overload doesn't make a difference.
[1] As of this writing, v7.3 hasn't been released yet, but preview versions are available; see the repo.
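Regarding explicit type arguments in versions before 7.3 (including Windows PowerShell): there is no syntax for them there. One possible workaround, sketched here under the assumption that the loaded build actually contains the generic overload, is plain .NET reflection: look up the generic method definition, close it over [string] with MakeGenericMethod, and invoke it. Reflection passes a null argument through as a true null, with no PowerShell conversion involved. GenericProbe is a hypothetical stand-in so that the sketch runs on its own; it simply returns the default value it receives.

```powershell
# Hypothetical stand-in: one non-generic and one generic overload.
Add-Type -TypeDefinition @'
public static class GenericProbe
{
    public static bool GetAttributeValue(string name, bool def) { return def; }
    public static T GetAttributeValue<T>(string name, T def) { return def; }
}
'@

# Find the open generic definition among the overloads.
$open = [GenericProbe].GetMethods() |
    Where-Object { $_.Name -eq 'GetAttributeValue' -and $_.IsGenericMethodDefinition }

# Close it over [string], i.e., the equivalent of GetAttributeValue<string>.
$closed = $open.MakeGenericMethod([string])

# Static method, so the target is $null; the arguments go in an object[].
$result = $closed.Invoke($null, @('class', $null))
$null -eq $result   # True: def arrived (and came back) as a true null

# Assumption, untested here: for a real node, look the method up on
# [HtmlAgilityPack.HtmlNode] instead and pass $node instead of $null as the target.
```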