Unique count of a value in a gzipped file based on other constraints on surrounding lines

I have a log file with data like this:

Operation=ABC,
CustomerId=12,
..
..
..
Counters=qwe=1,wer=2,mbn=4,Hello=0, 
----
Operation=CQW,
CustomerId=10,
Time=blah,
..
..
Counters=qwe=1,wer=2,mbn=4,Hello=0,jvnf=2,njfs=4
----
Operation=ABC,
CustomerId=12,
Metric=blah
..
..
Counters=qwe=1,wer=2,mbn=4,Hello=1, uisg=2,vieus=3
----
Operation=ABC,
CustomerId=12,
Metric=blah
..
..
Counters=qwe=1,wer=2,mbn=4,Hello:0, uisg=2,vieus=3
----

Now, I want to find all the unique CustomerIds where Operation=ABC and Hello=0 (in Counters).

All of this info is contained in .gz files in a directory.

Here is what I tried first, just to count how many times "Hello=0" appears in the lines following Operation=ABC:

zgrep -A 20 "Operation=ABC" * | grep "Hello=0" | wc -l

This gave me the number of times "Hello=0" was found near Operation=ABC (about 250).
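One caveat with this count: -A 20 prints a fixed 20-line window, so it can run past the ---- separator and pick up a Hello=0 from a different record (in the sample, the window after the first Operation=ABC reaches the Counters line of the Operation=CQW record). A record-aware count is sketched here, assuming every record ends with a line starting with ---- and that the .gz files sit in one directory; the function name count_abc_hello0 is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count records that contain both Operation=ABC
# and Hello=0. Assumes each record ends with a line starting with "----".
# Usage: count_abc_hello0 [DIR]   (DIR defaults to the current directory)
count_abc_hello0() {
  gzip -dc "${1:-.}"/*.gz | awk '
    /^----/               { if (op && hello) n++; op = hello = 0; next }
    /^Operation=ABC(,|$)/ { op = 1 }
    /Hello=0([^0-9]|$)/   { hello = 1 }
    # A last record without a trailing "----" still counts.
    END                   { if (op && hello) n++; print n + 0 }
  '
}
```

The state is reset at every separator, so a Hello=0 only counts when it shares a record with Operation=ABC.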

To get the unique customer IDs, I tried this:

zgrep -A 20 "Operation=ABC" * | grep "Hello=0" -B 10 | grep "CustomerId" | uniq -c 

This gave me no results. What am I getting wrong here?

There are 2 best solutions below

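Actually, this works; I was just being impatient. Two caveats: uniq only merges adjacent duplicate lines, so sort first, and with several files zgrep prefixes every line with its file name unless -h is given:

zgrep -h -A 20 "Operation=ABC" * | grep -B 10 "Hello=0" | grep "CustomerId" | sort | uniq -c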
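Why sort matters before uniq -c: uniq only collapses runs of identical adjacent lines, so an ID that reappears after a different ID starts a new group. A small demonstration on made-up input:

```shell
#!/bin/sh
# Made-up input: CustomerId=12 appears twice, but not on adjacent lines.
ids='CustomerId=12,
CustomerId=10,
CustomerId=12,'

# Without sort: three groups, because the two 12s are not adjacent.
printf '%s\n' "$ids" | uniq -c

# With sort: two groups, one with count 2 for CustomerId=12,
# and one with count 1 for CustomerId=10,
printf '%s\n' "$ids" | sort | uniq -c
```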

You don't need this many grep and zgrep calls; a single awk script can do it.

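awk -F'=' '
# Print the buffered record only if it had Operation=ABC and Hello=0
# and its CustomerId has not been printed before.
function flush() {
  if (op && hello && id != "" && !seen[id]++) {
    print value
  }
  op = hello = 0; id = value = ""
}
/^--/{
  flush()
  next
}
/^Operation=ABC(,|$)/{
  op = 1
}
/^CustomerId=/{
  id = $2
}
/Hello=0([^0-9]|$)/{
  hello = 1
}
{
  # Buffer every line of the current record.
  value = (value != "" ? value ORS : "") $0
}
END{
  # The last record may not be followed by a separator.
  flush()
}'  <(gzip -dc input_file.gz)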

Output (tested on your sample only):

Operation=ABC,
CustomerId=12,
..
..
..
Counters=qwe=1,wer=2,mbn=4,Hello=0,
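The awk script above prints whole matching records from a single file, whereas the question asks for the distinct CustomerIds across every .gz file in a directory. A minimal sketch of that variant, assuming each record ends with a line starting with ---- and that no record spans two files; the function name abc_hello0_ids is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct CustomerId whose record has
# Operation=ABC and Hello=0. Assumes each record ends with a line that
# starts with "----" and that no record spans two files.
# Usage: abc_hello0_ids [DIR]   (DIR defaults to the current directory)
abc_hello0_ids() {
  gzip -dc "${1:-.}"/*.gz | awk -F'=' '
    function flush() {
      # Report the id only for a matching record, and only the first time.
      if (op && hello && id != "" && !seen[id]++) print id
      op = hello = 0; id = ""
    }
    /^----/               { flush(); next }
    /^Operation=ABC(,|$)/ { op = 1 }
    /^CustomerId=/        { id = $2; sub(/,.*$/, "", id) }
    /Hello=0([^0-9]|$)/   { hello = 1 }
    END                   { flush() }
  '
}
```

Call it as abc_hello0_ids DIR. Because seen[] is only updated for records that pass both checks, a CustomerId whose first record has Hello=1 is still reported if a later record has Hello=0.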