Regex for newline in XML

2.3k Views Asked by cYn At 30 July 2014 at 16:01

I've been trying hard to figure this out, with no luck. I'm trying to parse this XML data in Postgres:

<map>
  <entry>
    <string>id</string>
    <string>555</string>
  </entry>
  <entry>
    <string>label</string>
    <string>Need This Value</string>
  </entry>
  <entry>
    <string>key</string>
    <string>748</string>
  </entry>
</map>

I'm trying to get the value in the string element right after <string>label</string>. Note that the Postgres version I'm working with does not have the XML (libxml) functions installed.

I have tried many variations of:

substring(xmlStringData from E'<string>label</string>\\n<string>(.*?)</string>')

but with no luck.


There are 3 best solutions below

cYn On 30 July 2014 at 16:14 BEST ANSWER

So I seem to have figured it out. I just needed to account for the spaces after the newline. The solution was:

substring(event_data from E'<string>label</string>\\n\\s*?<string>(.*?)</string>')
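
To see why the extra \s*? matters, here is a minimal sketch that uses Python's re module as a stand-in for the Postgres regex engine. That substitution is an assumption: the two engines differ in details, but \n, \s and non-greedy quantifiers act the same way on this input. The variable names are illustrative only.

```python
import re

# The relevant part of the XML from the question, indentation intact.
xml = (
    "  <entry>\n"
    "    <string>label</string>\n"
    "    <string>Need This Value</string>\n"
    "  </entry>\n"
)

# First attempt: expects <string> immediately after the newline.
strict = re.search(r"<string>label</string>\n<string>(.*?)</string>", xml)

# Fixed pattern: \s*? also consumes the indentation after the newline.
relaxed = re.search(r"<string>label</string>\n\s*?<string>(.*?)</string>", xml)

print(strict)            # None: spaces sit between "\n" and "<string>"
print(relaxed.group(1))  # Need This Value
```

The first pattern fails only because of the four spaces of indentation on the following line, which is exactly what the fix accounts for.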

Federico Piazza On 30 July 2014 at 16:14

If your <entry> list has a fixed layout, you can use the following regex and take the capturing group of the 4th match to get the content.

<string>(.*?)<\/string>
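
As a rough illustration of the "4th match" idea, the sketch below runs the same pattern with Python's re module (an assumption; it is not the Postgres engine, and the \/ escape is not needed in Python, so a plain / is used). It only works while the entries keep the order shown in the question.

```python
import re

xml = """<map>
  <entry>
    <string>id</string>
    <string>555</string>
  </entry>
  <entry>
    <string>label</string>
    <string>Need This Value</string>
  </entry>
  <entry>
    <string>key</string>
    <string>748</string>
  </entry>
</map>"""

# findall returns the captured group of every match, in document order.
values = re.findall(r"<string>(.*?)</string>", xml)

print(values)     # ['id', '555', 'label', 'Need This Value', 'key', '748']
print(values[3])  # 4th match (index 3): Need This Value
```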


On the other hand, if you want the value to be the first captured group, you can use the following regex, in which only the last alternative captures:

<string>id<\/string>|<string>\d+<\/string>|<string>label<\/string>|<string>(.*?)<\/string>
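
A small sketch of how this alternation behaves, again using Python's re module as an assumed stand-in for the target engine. The first three alternatives match and discard the elements that are not wanted, so the first non-empty capture is the label's value. Note that later elements such as key are captured too, so only the first capture is meaningful.

```python
import re

xml = """<map>
  <entry>
    <string>id</string>
    <string>555</string>
  </entry>
  <entry>
    <string>label</string>
    <string>Need This Value</string>
  </entry>
  <entry>
    <string>key</string>
    <string>748</string>
  </entry>
</map>"""

pattern = re.compile(
    r"<string>id</string>"
    r"|<string>\d+</string>"
    r"|<string>label</string>"
    r"|<string>(.*?)</string>"
)

# Only the last alternative has a capturing group, so matches produced by
# the first three alternatives report None for group 1 and are skipped.
captured = [m.group(1) for m in pattern.finditer(xml) if m.group(1) is not None]

print(captured[0])  # Need This Value
print(captured)     # ['Need This Value', 'key']
```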


Erwin Brandstetter On 30 July 2014 at 16:44

xpath() would be the right tool here. Because, you know ...

RegEx match open tags except XHTML self-contained tags

While stuck with your unfortunate situation, this would do the trick:

WITH t(x) AS (SELECT '<map>
  <entry>
    <string>id</string>
    <string>555</string>
  </entry>
  <entry>
    <string>label</string>
    <string>Need This Value</string>
  </entry>
  <entry>
    <string>key</string>
    <string>748</string>
  </entry>
</map>'::text
)
SELECT substring(x, '<string>label</string>[\s]*?<string>(.*?)</string>')
FROM  t

Returns:

substring
---------------
Need This Value

regexp explained:

<string>label</string> .. finds the position
[\s] .. whitespace (including \n and \r)
*? .. do this "non-greedy", so ignore whitespace up until ...
<string> .. the next string element
(.*?) .. capturing parentheses, any characters, non-greedy
</string> .. up to the next appearance of the end tag

This is halfway reliable, unless you throw in unconventional XML formatting - which is why you should use an XML parser to begin with ...
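
For comparison, here is a minimal sketch of the parser-based approach done outside the database with Python's standard xml.etree.ElementTree. This is an assumption about where the parsing could happen, since the Postgres build in question lacks libxml, and value_for is a hypothetical helper name. Because it walks the element tree, it does not depend on indentation or line breaks.

```python
import xml.etree.ElementTree as ET

xml = """<map>
  <entry>
    <string>id</string>
    <string>555</string>
  </entry>
  <entry>
    <string>label</string>
    <string>Need This Value</string>
  </entry>
  <entry>
    <string>key</string>
    <string>748</string>
  </entry>
</map>"""


def value_for(root, key):
    """Return the second <string> of the <entry> whose first <string> equals key."""
    for entry in root.findall("entry"):
        strings = entry.findall("string")
        if len(strings) == 2 and strings[0].text == key:
            return strings[1].text
    return None  # no entry with that key


root = ET.fromstring(xml)
print(value_for(root, "label"))  # Need This Value

# The same lookup works when the XML has no whitespace at all.
compact = ET.fromstring(
    "<map><entry><string>label</string><string>X</string></entry></map>"
)
print(value_for(compact, "label"))  # X
```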
