Looking to print only lines that have a duplicate first field. e.g. from data that looks like this:
1 abcd
1 efgh
2 ijkl
3 mnop
4 qrst
4 uvwx
Should print out:
1 abcd
1 efgh
4 qrst
4 uvwx
(FYI - first field is not always 1 character long in my data)
Yes, you give it the same file as input twice. Since you don't know ahead of time whether the current record is unique or not, you build up an array keyed on `$1` during the first pass, then on the second pass you only output records whose `$1` has been seen more than once. I'm sure there are ways to do it with only a single pass through the file, but I doubt they will be as "clean" (a rough single-pass sketch follows the explanation below).
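Assembled from the pieces explained below, the command is presumably something like this (the `./infile` name comes from that explanation; substitute your own file, passed twice):

```
awk 'FNR==NR{a[$1]++;next} (a[$1] > 1)' ./infile ./infile
```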
Explanation
- `FNR==NR`: This is only true while `awk` is reading the first file. It essentially compares the total number of records seen so far (`NR`) with the record number within the current input file (`FNR`).
- `a[$1]++`: Build an associative array `a` whose key is the first field (`$1`) and whose value is incremented by one each time that key is seen.
- `next`: Skip the rest of the script when this is reached and start over with the next input record.
- `(a[$1] > 1)`: This is only evaluated on the second pass of `./infile`, and it only prints records whose first field (`$1`) we've seen more than once. Essentially, it is shorthand for `if (a[$1] > 1) { print $0 }`.
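As for the single-pass idea mentioned above, here is one possible variant, a minimal sketch only: it buffers the first line seen for each key and flushes it once that key shows up a second time. Note that when duplicates are not adjacent, the output order can differ from the two-pass version, because a key's first line is only printed when its second occurrence arrives.

```
awk '{
    count[$1]++
    if (count[$1] == 1) {
        # First time this key is seen: remember the whole line.
        saved[$1] = $0
    } else {
        # Second (or later) occurrence: flush the remembered first line, then print this one.
        if (count[$1] == 2) print saved[$1]
        print
    }
}' ./infile
```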
Proof of Concept
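For example, with the sample data from the question saved as `./infile`, the two-pass command should produce exactly the requested output:

```
$ awk 'FNR==NR{a[$1]++;next} (a[$1] > 1)' ./infile ./infile
1 abcd
1 efgh
4 qrst
4 uvwx
```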