XPath - Extract specific file name from string


I'm trying to extract just the filename from a JavaScript link in import.io, e.g. googlebolver.htm from href="javascript:finpopup('googlebolver.htm',920,620,0)"

I've managed to get to the 'link' (javascript:finpopup('googlebolver.htm',920,620,0)) with the following XPath:

//*[text()='GOOGLE.MAPS']/@href

but I would like to get the actual address on its own. Because I am running the import.io Extractor on multiple URLs, I want it to match something like *.htm rather than one specific filename.

I believe this may be possible using the substring() function, but I don't know how to do it. The following questions on this site looked promising, but one only works for fixed-length strings, and the other I don't completely understand and it only works for a specific 'word':
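To illustrate why a fixed-length substring() does not generalise, here is a small Python sketch that imitates XPath 1.0 substring(string, start, length), which uses a 1-based start position. It is simplified to positive integer arguments only (the real XPath function also rounds non-integer arguments), and the helper name xpath_substring and the second href ('map.htm') are hypothetical examples, not taken from the question's site.

```python
def xpath_substring(s: str, start: int, length: int) -> str:
    """Simplified XPath 1.0 substring(): 1-based start, positive integer args only."""
    return s[start - 1:start - 1 + length]

prefix = "javascript:finpopup('"
start = len(prefix) + 1  # 1-based position of the first filename character (22)

href = "javascript:finpopup('googlebolver.htm',920,620,0)"
print(xpath_substring(href, start, 16))   # googlebolver.htm

# Hypothetical second link with a shorter filename: the fixed length of 16
# now runs past the filename and picks up the popup arguments as well.
other = "javascript:finpopup('map.htm',920,620,0)"
print(xpath_substring(other, start, 16))  # map.htm',920,620
```

The start offset stays constant because the javascript:finpopup(' prefix is the same for every link, but the length would have to change for every filename, which is exactly the limitation described above.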

  1. Extract value from javascript object in site using xpath and import.io
  2. How to use substring() with Import.io?

Thanks in advance for your help

EDIT: Here is the URL

Answer by legrass

You can use the standard XPath functions substring-after() and substring-before() to select the text that comes after a delimiter such as (' and before a delimiter such as ',

In your example, it would be:

substring-before(substring-after(//*[text()='GOOGLE.MAPS']/@href,"('"),"',")
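If you want to check the expression's logic outside import.io, the following minimal Python sketch mimics the XPath 1.0 behaviour of substring-after() and substring-before(): each returns the empty string when the delimiter does not occur. The helper names and the extra sample hrefs are hypothetical, and the sketch assumes non-empty delimiters (Python's str.partition rejects an empty separator).

```python
def substring_after(s: str, sep: str) -> str:
    # XPath 1.0: text after the first occurrence of sep, or "" if sep is absent
    _, found, tail = s.partition(sep)
    return tail if found else ""

def substring_before(s: str, sep: str) -> str:
    # XPath 1.0: text before the first occurrence of sep, or "" if sep is absent
    head, found, _ = s.partition(sep)
    return head if found else ""

def extract_filename(href: str) -> str:
    # Same nesting as substring-before(substring-after(@href, "('"), "',")
    return substring_before(substring_after(href, "('"), "',")

for href in [
    "javascript:finpopup('googlebolver.htm',920,620,0)",
    "javascript:finpopup('map.htm',920,620,0)",  # hypothetical
    "http://example.com/page.htm",               # hypothetical, no ('
]:
    print(repr(extract_filename(href)))
# 'googlebolver.htm'
# 'map.htm'
# ''
```

Because the expression relies on delimiters rather than offsets, it works for filenames of any length. It does assume that the filename itself never contains (' or ',.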

Note: I don't know whether import.io supports these standard XPath functions.