Delete first three lines containing a certain word

I am having a bit of trouble with a sed command. I need to delete the first 3 lines that contain a specified word from a file.

My sed command only checks the first 6 lines, whether or not they contain the word. It deletes the lines containing the word, but it stops after 6 lines, not after 3 lines that contain the word.

This is my command:

sed "1,6{/.*$word.*/d;}" "$filename"

So, for example, I want to delete the first 3 lines containing the word "please", for the file content:

leave this line alone
leave this line alone
1 please delete this line
leave this line alone
2 please delete this line
leave this line alone
leave this line alone
3 please delete this line
leave this line alone

The output I currently get is:

leave this line alone
leave this line alone
leave this line alone
leave this line alone
leave this line alone
3 please delete this line
leave this line alone

There are 4 best solutions below

jhnc On

As noted in the comments, an awk solution is simple.

For example:

awk '!index($0,q) || ++c>n' n=3 q=please "$filename"

or slightly more confusingly but more efficiently (since we can stop checking lines after we've found three matches):

awk 'c==n || !index($0,q) || !++c' n=3 q=please "$filename"
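As a rough sketch, the second one-liner could be wrapped in a shell function with the logic spelled out as separate rules. The function name delete_first_n and its argument order are made up for illustration. Note that index() does a literal substring match (not a regex), and that -v interprets backslash escapes in the value.

```shell
# Hypothetical helper: delete the first N lines containing a fixed string.
# Usage: delete_first_n WORD N [FILE...]   (reads stdin if no FILE is given)
delete_first_n() {
    word=$1
    count=$2
    shift 2
    awk -v q="$word" -v n="$count" '
        c == n        { print; next }   # enough lines deleted: pass the rest through
        !index($0, q) { print; next }   # no match: keep the line
        { c++ }                         # match: count it and print nothing
    ' "$@"
}
```

For example, delete_first_n please 3 "$filename" should behave like the one-liners above.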

It is not very convenient to try to count with sed, although it is possible. Parameterising arguments is also complicated.

Ignoring both those issues, here is a sed script to delete the first 3 occurrences of lines containing "please":

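sed '
    # Only lines containing "please" are candidates for deletion.
    /please/ {
        # The hold space holds one character per line deleted so far.
        x
        # This succeeds only if the hold space has at least 3 characters.
        s/././3
        x
        # 3 lines already deleted: print this line unchanged.
        t
        # Otherwise add one character to the counter and delete the line.
        x
        s/^/ /
        x
        d
    }
' "$filename"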

Like the second awk, we can avoid scanning lines after we have found enough matches:

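sed '
    # Lines without "please" are printed unchanged.
    /please/!b
    # This succeeds only once the hold space holds 3 counter characters.
    x
    s/././3
    x
    t flush
    # Fewer than 3 deletions so far: count this one and delete the line.
    x
    s/^/^/
    x
    d
:flush
    # Print every remaining line without testing it again.
    n
    b flush
' "$filename"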

Applying any of these scripts to:

1 leave this line alone
2 leave this line alone
1 please delete this line
3 leave this line alone
2 please delete this line
4 leave this line alone
5 leave this line alone
3 please delete this line
6 leave this line alone
4 please leave this line alone
7 leave this line alone

produces:

1 leave this line alone
2 leave this line alone
3 leave this line alone
4 leave this line alone
5 leave this line alone
6 leave this line alone
4 please leave this line alone
7 leave this line alone

jthill On

GNU sed offers a simpler way to do it:

sed '0,/word/{//d}
     0,/word/{//d}
     0,/word/{//d}
'

or even, since there's only the one search,

sed '/word/ { 0,//{//d}; 0,//{//d}; 0,//{//d}; }'

which also scans each line only once, to fix a (minor but possibly measurable) scaling issue noticed in comments.
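If the number of lines to delete varies, the first form could be generated in a loop rather than typed out. This is only a sketch: it assumes GNU sed (for the 0,/re/ address) and a word containing no / or other characters special to sed, and the function name delete_first_n_gnu is hypothetical.

```shell
# Hypothetical helper: build one "0,/WORD/{//d}" command per line to delete.
# Usage: delete_first_n_gnu WORD N [FILE...]   (GNU sed only)
delete_first_n_gnu() {
    word=$1
    count=$2
    shift 2
    script=
    i=0
    while [ "$i" -lt "$count" ]; do
        # Each command deletes the first matching line that reaches it.
        script="${script}0,/${word}/{//d}
"
        i=$((i + 1))
    done
    sed "$script" "$@"
}
```

Called as delete_first_n_gnu word 3 "$filename", this builds the same three-command script shown above.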

potong On

This might work for you (GNU sed):

sed '/please/{x;s/^./&/m3;x;t;H;d}' file

Use the hold space as a counter for the lines containing the word please.

If a line contains please then:

Swap to the hold space, substitute the first character of the third line with itself and swap back to the pattern space.

If the substitution was successful, carry out no further processing of this line.

Otherwise, append the line to the hold space and delete the current line.

Adam Liss On

The problem is that the command specifies 1,6, which will match only the first 6 lines. You could write a sed script that does what you want, but not a one-liner.

On the other hand, this is easy to do with awk, like this:

awk '/.*please.*/ {n++} !/.*please.*/ || n > 3 {print}'

There are actually two rules here. The first matches lines that contain the word "please" and counts them. The second prints lines that do not (!) contain the word, as well as every line seen once more than 3 matches have been counted (that is, from the fourth matching line onward).