
How to delete duplicate lines from a block using sed


Suppose we have a block of lines like the example below:

<segment1>
    <element="1" prop="blah"/>
    <element="2" prop="blah"/>
    .
    .
</segment1>

<segment2>
    <element="1" prop="blah"/>
    <element="2" prop="blah"/>
    .
    .
    <element="1" prop="blah"/>
    <element="2" prop="blah"/>
</segment2>

<segment3>
    <element="1" prop="blah"/>
    <element="2" prop="blah"/>
    .
    .
</segment3>

Here, for example, segment 2 contains duplicate lines that need to be deleted (ordering doesn't matter here). How can sed be restricted so that it deletes duplicates from segment 2 only? In this example the affected block happens to be the second segment, but that will not always be the case; the block could also be nested inside another block (a subset of a subset).

My thought is to use a label, a start address, and an end address together with the command gsed -ni 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'


Answered by potong

This might work for you (GNU sed):

sed -E '/<segment2>/,/<\/segment2>/{G;/^([^\n]*)(\n.*)*\n\1(\n|$)/!{P;h};d}' file

Use a range between <segment2> and </segment2>.
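Before deleting anything, it can help to confirm that the range address selects only the intended block. The sketch below reuses the same two patterns with -n and p; the sample input and the variable names input and selected are made up for illustration.

```shell
# Hypothetical sample input, held in a shell variable.
input='<segment1>
    <element="1" prop="blah"/>
</segment1>
<segment2>
    <element="1" prop="blah"/>
    <element="1" prop="blah"/>
</segment2>'

# -n suppresses automatic printing; p prints only the lines
# from <segment2> through </segment2>, inclusive.
selected=$(printf '%s\n' "$input" | sed -n '/<segment2>/,/<\/segment2>/p')
printf '%s\n' "$selected"
```

Only the four lines of the segment2 block should be printed; nothing from segment1 appears.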

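Within the range, G appends the hold space (a copy of the lines already seen in the range) to the current line. If the current line does not appear among them, P prints it and h saves the combined pattern space back to the hold space.

In either case, d then deletes the pattern space, so a line that has already been seen is never printed.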
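Since the question notes that the affected block will not always be segment2, the same command can be tried with the block name taken from a shell variable. This is only a sketch under assumptions: it requires GNU sed (the \n inside the bracket expression is a GNU extension), the variable names seg, input, and result are hypothetical, and it does not attempt to handle nested blocks.

```shell
# Sample input modelled on the question (placeholder "." lines omitted).
input='<segment1>
    <element="1" prop="blah"/>
    <element="2" prop="blah"/>
</segment1>
<segment2>
    <element="1" prop="blah"/>
    <element="2" prop="blah"/>
    <element="1" prop="blah"/>
    <element="2" prop="blah"/>
</segment2>
<segment3>
    <element="1" prop="blah"/>
    <element="2" prop="blah"/>
</segment3>'

seg=segment2   # hypothetical variable: name of the block to de-duplicate

# The sed script stays single-quoted; only $seg is spliced in, so the
# shell never touches the backslashes, the "$", or the "!".
result=$(printf '%s\n' "$input" |
  sed -E '/<'"$seg"'>/,/<\/'"$seg"'>/{G;/^([^\n]*)(\n.*)*\n\1(\n|$)/!{P;h};d}')
printf '%s\n' "$result"
```

With this input, the repeated element lines inside segment2 are dropped, while segment1 and segment3 pass through unchanged. The value of seg is inserted into a regular expression, so a name containing regex metacharacters would need escaping first.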