How to delete duplicate lines from a block using sed

Asked by Swarnim Lakra on 27 October 2021 at 18:50 (164 views)

Suppose we have blocks of lines like the example below:

<segment1>
    <element="1" prop="blah"/>
    <element="2" prop="blah"/>
    .
    .
</segment1>

<segment2>
    <element="1" prop="blah"/>
    <element="2" prop="blah"/>
    .
    .
    <element="1" prop="blah"/>
    <element="2" prop="blah"/>
</segment2>

<segment3>
    <element="1" prop="blah"/>
    <element="2" prop="blah"/>
    .
    .
</segment3>

Here, segment 2 contains duplicate lines that need to be deleted (sort order does not matter). How can I restrict sed so that it deletes duplicates from segment 2 only? In this example segment 2 happens to be the second segment, but that will not always be the case: the target segment can appear at any position, and it could even be nested inside another block.

My idea is to use the segment's start and end tags as the boundaries and apply this command only between them: gsed -ni 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
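The following sketch shows why the one-liner has to be bounded. Run on its own it removes repeats across the whole input, so a line in segment 2 is dropped merely because it already appeared in segment 1. It assumes GNU sed, drops -i so the result goes to stdout, and uses a hypothetical helper name (dedupe_all) and made-up sample lines.

```shell
#!/usr/bin/env bash
# Sketch: the question's one-liner applied to a whole input, not bounded to a segment.
# Assumes GNU sed; -i is dropped so the result goes to stdout.
# dedupe_all is a hypothetical helper name; the sample lines are made up.
set -euo pipefail

dedupe_all() {
  # LC_ALL=C so the bracket range [ -~] means printable ASCII.
  # Keeps the first occurrence of each line and drops later repeats anywhere in the input.
  LC_ALL=C sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
}

printf '%s\n' '<segment1>' '    x' '</segment1>' \
  '<segment2>' '    x' '    y' '    x' '</segment2>' | dedupe_all
```

With this input, segment 1 comes through intact, but segment 2 shrinks to <segment2>, y and </segment2>: both x lines are removed because x was already seen in segment 1. Wrapping the command in a range does not fix this by itself either, because -n would then suppress every line outside the range.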

Answered by potong on 27 October 2021 at 19:30

This might work for you (GNU sed):

sed -E '/<segment2>/,/<\/segment2>/{G;/^([^\n]*)(\n.*)*\n\1(\n|$)/!{P;h};d}' file

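Use an address range from <segment2> to </segment2>, so the commands in braces run only on the lines of that segment.

For each line in the range, append (G) the lines already seen, which are kept in the hold space, to the current line. If the current line is not among them, print it (P) and copy the combined pattern space back to the hold space (h), so it joins the list of seen lines.

Otherwise the line is a duplicate and nothing is printed. In either case d then deletes the pattern space, so sed's automatic output does not print the line a second time.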
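Because the range is matched by tag name rather than by position, the same command works wherever the segment sits. The sketch below wraps the answer's command in a shell function that takes the segment name as a parameter. The function name dedupe_segment and the sample lines are hypothetical, and GNU sed is assumed (gsed on macOS).

```shell
#!/usr/bin/env bash
# Sketch: the answer's GNU sed command with the segment name as a parameter.
# dedupe_segment is a hypothetical helper; the name is spliced into the sed
# regex unescaped, so it must not contain regex metacharacters or "/".
set -euo pipefail

dedupe_segment() {
  local seg=$1
  # Inside <seg> ... </seg>: G appends the lines seen so far (hold space);
  # if the current line is not among them, print it (P) and save the
  # combined text (h). d always ends the cycle without automatic output.
  sed -E '/<'"$seg"'>/,/<\/'"$seg"'>/{G;/^([^\n]*)(\n.*)*\n\1(\n|$)/!{P;h};d}'
}

printf '%s\n' '<segment1>' '    <element="1" prop="blah"/>' '    <element="1" prop="blah"/>' '</segment1>' \
  '<segment2>' '    <element="1" prop="blah"/>' '    <element="2" prop="blah"/>' \
  '    <element="1" prop="blah"/>' '</segment2>' | dedupe_segment segment2
```

Only the named segment is touched: the repeated line in segment 1 survives, while segment 2 keeps just the first copy of each element. Two caveats: the hold space is never cleared, so if the same segment name occurs twice, lines from the first occurrence count as already seen in the second; and GNU sed's -i option can be added to the sed call to edit a file in place, as in the question.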
