How to use XPath in JavaScript to select from namespaced OOXML?

1.5k Views Asked by At

I am building a tool to get data from a user specified XML file with an XPath expression. Simple XML files and an XML file with a single namespace works fine but I do not get the following XML file to work where the same prefix is overridden at different hierarchal levels:

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">
 <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
  <Author>Microsoft Office User</Author>
  <LastAuthor>Microsoft Office User</LastAuthor>
  <Created>2019-12-03T15:40:13Z</Created>
  <Version>16.00</Version>
 </DocumentProperties>
 <OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office">
  <AllowPNG/>
 </OfficeDocumentSettings>
 <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
  <WindowHeight>13820</WindowHeight>
  <WindowWidth>23740</WindowWidth>
  <WindowTopX>1520</WindowTopX>
  <WindowTopY>1620</WindowTopY>
  <ProtectStructure>False</ProtectStructure>
  <ProtectWindows>False</ProtectWindows>
 </ExcelWorkbook>
 <Styles>
  <Style ss:ID="Default" ss:Name="Normal">
   <Alignment ss:Vertical="Bottom"/>
   <Borders/>
   <Font ss:FontName="Calibri" x:Family="Swiss" ss:Size="12" ss:Color="#000000"/>
   <Interior/>
   <NumberFormat/>
   <Protection/>
  </Style>
 </Styles>
 <Worksheet ss:Name="Blad1">
  <Table ss:ExpandedColumnCount="2" ss:ExpandedRowCount="8" x:FullColumns="1"
   x:FullRows="1" ss:DefaultColumnWidth="65" ss:DefaultRowHeight="16">
   <Row>
    <Cell><Data ss:Type="String">dfgdfgdfg</Data></Cell>
    <Cell><Data ss:Type="Number">1150</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="String">werwerwe</Data></Cell>
    <Cell><Data ss:Type="Number">889</Data></Cell>
   </Row>
  </Table>
  <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
   <PageSetup>
    <Header x:Margin="0.3"/>
    <Footer x:Margin="0.3"/>
    <PageMargins x:Bottom="0.75" x:Left="0.7" x:Right="0.7" x:Top="0.75"/>
   </PageSetup>
   <Selected/>
   <Panes>
    <Pane>
     <Number>3</Number>
     <RangeSelection>R1C1:R8C2</RangeSelection>
    </Pane>
   </Panes>
   <ProtectObjects>False</ProtectObjects>
   <ProtectScenarios>False</ProtectScenarios>
  </WorksheetOptions>
 </Worksheet>
</Workbook>

The xpath I am testing with is

/Workbook[@xmlns="urn:schemas-microsoft-com:office:spreadsheet"]/Worksheet[@ss:Name="Blad1"]/Table[@ss:ExpandedColumnCount="2"]/Row[1]/Cell[2]/Data[@ss:Type="Number"]/text()

and this is the code for parsing and searching for the node:

// downloadedData is a string with the xml, xPath is the string with the xpath

var parser, xmlDoc;

parser = new DOMParser();
xmlDoc = parser.parseFromString(downloadedData, "text/xml");

var xmlEvaluator = new XPathEvaluator();
var xmlResolver = xmlEvaluator.createNSResolver(xmlDoc);
var node = xmlEvaluator.evaluate(xPath, xmlDoc, xmlResolver, XPathResult.FIRST_ORDERED_NODE_TYPE, null);

/// node always empty here

edit: removed typo in code (unrelated to the real issue)

2

There are 2 best solutions below

1
On BEST ANSWER

@MichaelKay is right in pointing out that namespaces are not attributes. Here are some supporting details for making the necessary adjustments so that your XPath will work...

  1. Fix an unrelated bug. Change

    xmlDoc = parser.parseFromString(downloadedData, ''), "text/xml");
    

    which is syntactically incorrect to

    xmlDoc = parser.parseFromString(downloadedData, 'text/xml');
    
  2. Observe that your XML already declares namespace prefixes for all needed components, so you won't have to create a custom nsResolver(); calling xmlEvaluator.createNSResolver() as you're doing will suffice.

  3. Fix your XPath to use namespace prefixes properly:

    xPath = '/ss:Workbook/ss:Worksheet[@ss:Name="Blad1"]/ss:Table[@ss:ExpandedColumnCount="2"]/ss:Row[1]/ss:Cell[2]/ss:Data[@ss:Type="Number"]/text()';
    

Your code will now function properly, allowing XPath selection against your OOXML.

See also How does XPath deal with XML namespaces?

0
On

Namespaces in the XPath data model are not attributes, so this is never going to work:

Workbook[@xmlns="urn:schemas-microsoft-com:office:spreadsheet"]

Instead, you need to bind namespace prefixes and use them in your element names:

x:Workbook

where namespace prefix x is bound to the URI "urn:schemas-microsoft-com:office:spreadsheet". The binding is established using a namespaceResolver as explained here:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Introduction_to_using_XPath_in_JavaScript