Regular expression to delete XML element names


While building a fairly complex XML document, I used "place-holder" elements as temporary wrappers. Once the XML is ready, I need to delete those place-holder tags while keeping their contents.

Sample Input

<consumers>
  <place-holder_1>
    <consumer>
      <val>1</val>
    </consumer>
  </place-holder_1>
  <place-holder_2>
    <consumer-info>
      <val>2</val>
    </consumer-info>
  </place-holder_2>
</consumers>

Sample Output

<consumers>
  <consumer>
    <val>1</val>
  </consumer>
  <consumer-info>
    <val>2</val>
  </consumer-info>
</consumers>

Basically, I am looking for a regex that deletes every tag whose name contains "place-holder", in a generic way. The 'place-holder' tag name can carry any number from 1 to 10 as a suffix.

I am struggling to come up with a regex for this.

There are 3 best solutions below

4
On BEST ANSWER

The following regex matches the opening and closing place-holder tags, together with any leading whitespace:

^\s*<\/?place-holder_\d{1,2}>

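Once matched, replace each match with an empty string. The pattern has no capturing group, so replace the whole match, and enable multiline mode so that ^ matches at the start of every line. Note that \d{1,2} accepts any suffix from 0 to 99, which is broader than the 1 to 10 range in the question.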
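For illustration, here is a minimal Python sketch of that substitution. The choice of Python, the name strip_placeholders, and the shortened sample are assumptions made for this example; the pattern itself is the one from the answer.

```python
import re

# Pattern from the answer above. re.MULTILINE makes ^ match at the start
# of every line instead of only at the start of the string.
PLACEHOLDER_TAG = re.compile(r"^\s*</?place-holder_\d{1,2}>", re.MULTILINE)


def strip_placeholders(xml_text: str) -> str:
    """Remove place-holder start and end tags, leaving their children in place."""
    return PLACEHOLDER_TAG.sub("", xml_text)


sample = """<consumers>
  <place-holder_1>
    <consumer>
      <val>1</val>
    </consumer>
  </place-holder_1>
</consumers>"""

result = strip_placeholders(sample)
print(result)
```

The match stops at the closing >, so the newline after each removed tag stays behind as an empty line, and the inner elements keep their original indentation. If the layout matters, the result can be re-indented afterwards.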

0
On

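Alternatively, to limit the suffix to a single digit or 10, you can use:

<\/?place-holder_(10|\d)>

Replace every match with an empty string. Do not wrap the pattern in a lookahead such as (?=(...)): a lookahead is zero-width, so the match itself would be empty and the replacement would leave the text unchanged.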

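The difference between a lookahead-wrapped pattern and a consuming one can be checked with a short Python sketch. The variable names and the test string are assumptions for this example, and the consuming variant uses [1-9] instead of \d so that a suffix of 0 is rejected, matching the 1 to 10 range in the question.

```python
import re

text = "<place-holder_10><val>1</val></place-holder_10>"

# Lookahead form: matches an empty string in front of each tag,
# with the tag text available only in capturing group 1.
lookahead = re.compile(r"(?=(</?place-holder_(10|\d)>))")

# Consuming form: the match itself is the tag, so it can be deleted.
consuming = re.compile(r"</?place-holder_(?:10|[1-9])>")

unchanged = lookahead.sub("", text)          # nothing is removed
captured = [m.group(1) for m in lookahead.finditer(text)]
stripped = consuming.sub("", text)           # tags are removed

print(unchanged)
print(captured)
print(stripped)
```

The lookahead form is still useful for locating the tags without consuming them, but for deletion the consuming form is the one to substitute.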

0
On

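You should be able to use XSLT in Kettle. The stylesheet below copies everything unchanged (the identity transform) and, for any element whose local name starts with 'place-holder', outputs only its children.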

XSLT 1.0

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!--Identity Transform (https://www.w3.org/TR/xslt#copying)-->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*[starts-with(local-name(),'place-holder')]">
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>

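Where an XSLT processor is not available, the same tree-based idea can be sketched with Python's standard xml.etree.ElementTree module. This is a different technique from the stylesheet above, not a translation of it: the function name unwrap_placeholders is hypothetical, the sketch assumes the document has no namespaces, and it assumes the root element is not itself a place-holder.

```python
import xml.etree.ElementTree as ET


def unwrap_placeholders(parent: ET.Element, prefix: str = "place-holder") -> None:
    """Replace each child whose tag starts with `prefix` by that child's children."""
    i = 0
    while i < len(parent):
        child = parent[i]
        if isinstance(child.tag, str) and child.tag.startswith(prefix):
            # Remove the wrapper and put its children at the same position.
            parent.remove(child)
            for offset, grandchild in enumerate(list(child)):
                parent.insert(i + offset, grandchild)
            # Do not advance i: the moved nodes may be wrappers themselves.
        else:
            unwrap_placeholders(child, prefix)
            i += 1


sample = """<consumers>
  <place-holder_1><consumer><val>1</val></consumer></place-holder_1>
  <place-holder_2><consumer-info><val>2</val></consumer-info></place-holder_2>
</consumers>"""

root = ET.fromstring(sample)
unwrap_placeholders(root)
tags = [el.tag for el in root.iter()]
print(tags)
print(ET.tostring(root, encoding="unicode"))
```

Because this works on the parsed tree rather than on text, it cannot leave a stray or mismatched tag behind. Whitespace that belonged to the removed wrappers is not preserved, so the serialized output may need re-indenting.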