remove consecutive multi-line duplicates (with bash)

229 Views Asked by pmlk At 19 December 2022 at 18:05

In a text file, I want to remove duplicates that span two lines: in four consecutive lines, the first two are the same as the last two. I want to keep only the first (or last) two lines and preserve the order of the lines in the file.

Example

Consider a file input.txt where foo\nbar is repeated and baz\nboo is repeated, each in consecutive two-line blocks.

1
foo
bar
foo
bar
2
3
baz
boo
baz
boo
4

Desired contents:

1
foo
bar
2
3
baz
boo
4

Things considered: `uniq`, `sed`

The same task is fairly simple for single-line duplicates: `uniq input.txt`. However, `man uniq` doesn't suggest an option that would make it work for my use case.

I also had a look at `sed`, but couldn't get it to work. EDIT: I didn't try anything specific, as the docs I found only cover searching and replacing within a two-line block, not four lines.

There are 2 best solutions below

anubhava On 19 December 2022 at 18:10 BEST ANSWER

If a Perl solution is acceptable:

perl -0777 -pe 's/(.+\R.+\R)\1/$1/g' file

1
foo
bar
2
3
baz
boo
4

Ed Morton On 19 December 2022 at 18:27

With GNU sed for `-E` (EREs) and `-z` (NUL-separated records, so a text file without NUL bytes is read into memory as a single record):

$ sed -Ez 's/((.*\n){2})\1/\1/g' file
1
foo
bar
2
3
baz
boo
4

I also think you need GNU sed for the backreference in the regexp, as I don't think that's part of POSIX for EREs, but I'm not 100% sure on that one.
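
If GNU sed is not available, the same idea can be sketched with plain `awk`, which avoids backreferences entirely. This is only an illustration under my own assumptions: the function name `dedupe_pairs_awk` is hypothetical, the whole input is buffered in memory, and lines are compared as whole strings, so a match cannot start in the middle of a line and pairs containing empty lines are collapsed too.

```shell
# Hypothetical helper: collapse a two-line block that is immediately repeated.
dedupe_pairs_awk() {
    awk '
        # Buffer every line; appending "" forces string (not numeric) comparison later.
        { line[NR] = $0 "" }
        END {
            i = 1
            while (i <= NR) {
                # Lines i, i+1 form a pair; lines i+2, i+3 are the candidate repeat.
                if (i + 3 <= NR && line[i] == line[i+2] && line[i+1] == line[i+3]) {
                    print line[i]
                    print line[i+1]
                    i += 4          # skip the pair and its duplicate
                } else {
                    print line[i]
                    i++
                }
            }
        }
    ' "$@"
}

# Usage with the sample input from the question:
printf '%s\n' 1 foo bar foo bar 2 3 baz boo baz boo 4 | dedupe_pairs_awk
```

The function also accepts file names, e.g. `dedupe_pairs_awk input.txt`. Like the regex approaches, it works left to right on non-overlapping blocks, so a pair repeated three times in a row is left with two copies.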
