html-agility-pack extract a background image


How do I extract the URL from the following HTML?

That is, extract:

http://media.somesite.com.au/img-101x76.jpg

from:

<div class="media-img">
    <div class=" searched-img" style="background-image: url(http://media.somesite.com.au/img-101x76.jpg);"></div>
</div>

Best answer:

In XPath 1.0 you can generally combine the substring-after() and substring-before() functions to extract part of a text value. However, HAP's SelectNodes() and SelectSingleNode() can only return nodes, not strings, so those XPath functions won't help with these two methods.
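
For illustration only, here is a minimal sketch of what the substring-after()/substring-before() combination looks like when the XPath engine can return a string. It deliberately uses the built-in System.Xml.XPath extensions rather than HAP, which is an assumption made for the example: it only works because this particular snippet happens to be well-formed XML, which real-world HTML often is not. The helper name ExtractUrlWithXPath is hypothetical.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper: evaluates an XPath 1.0 string expression with
// System.Xml.XPath (not HAP). The input must be well-formed XML.
static string ExtractUrlWithXPath(string xml)
{
    var doc = XDocument.Parse(xml);
    // Take the text after "url(", then everything before the next ")".
    const string xpath =
        "substring-before(substring-after(" +
        "//div[contains(@class,'searched-img')]/@style, 'url('), ')')";
    // A string-valued XPath expression is returned as a System.String.
    return (string)doc.XPathEvaluate(xpath);
}

var html = @"<div class='media-img'>
    <div class=' searched-img' style='background-image: url(http://media.somesite.com.au/img-101x76.jpg);'></div>
</div>";
Console.WriteLine(ExtractUrlWithXPath(html));
// prints: http://media.somesite.com.au/img-101x76.jpg
```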

One possible approach is to get the entire value of the style attribute using XPath and HAP, then process that value further in .NET, for example with a regex:

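using System;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

var html = @"<div class='media-img'>
    <div class=' searched-img' style='background-image: url(http://media.somesite.com.au/img-101x76.jpg);'></div>
</div>";
var doc = new HtmlDocument();
doc.LoadHtml(html);
// Select the inner div by its class, then read the raw style attribute.
var div = doc.DocumentNode.SelectSingleNode("//div[contains(@class,'searched-img')]");
var style = div.GetAttributeValue("style", "");
// Capture everything between "url(" and the next ")".
var url = Regex.Match(style, @"url\(([^)]*)\)").Groups[1].Value;
Console.WriteLine(url);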

Output:

http://media.somesite.com.au/img-101x76.jpg
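
CSS also allows the url() value to be wrapped in single or double quotes, which a plain "everything between the parentheses" pattern would include in the result. As a rough sketch of how the regex step could tolerate that, the helper below (ExtractCssUrl is a hypothetical name, not part of HAP or .NET) strips optional matching quotes and returns an empty string when no url() is present. It does not handle URLs that themselves contain a closing parenthesis or a quote character.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the first url(...) value found in a CSS
// style string, or an empty string when there is none.
static string ExtractCssUrl(string style)
{
    // Group 1 captures an optional opening quote; \1 requires the same
    // character (or nothing) again before the closing parenthesis.
    var match = Regex.Match(style, @"url\(\s*(['""]?)(?<url>[^'"")]*?)\1\s*\)");
    return match.Success ? match.Groups["url"].Value : "";
}

// Unquoted and quoted forms yield the same URL.
Console.WriteLine(ExtractCssUrl("background-image: url(http://media.somesite.com.au/img-101x76.jpg);"));
Console.WriteLine(ExtractCssUrl("background-image: url('http://media.somesite.com.au/img-101x76.jpg');"));
// A style without url(...) yields an empty string (length 0).
Console.WriteLine(ExtractCssUrl("color: red;").Length);
```

The string passed in would be whatever div.GetAttributeValue("style", "") returns, so a missing style attribute simply produces an empty result instead of an exception.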