Grep a string with number greater than 45


I have multiple files in a directory. I want to extract every line, from all the files, that contains an integer value greater than 45.

Currently, I am using:

grep "IO resumed after" *

It displays every line, in every file, that contains the string "IO resumed after". I want to add one more condition so that grep returns only the lines of the form "IO resumed after [number > 45] seconds".


There are 3 best solutions below

0
Ivan On BEST ANSWER

It looks like I need to learn awk; until then, here is a bash solution. If the seconds value has no decimal point, this works:

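while IFS= read -r line; do
    number=${line##*after}      # strip everything up to and including "after"
    number=${number%%seconds*}  # strip "seconds" and everything following it
    ((number > 45)) && echo "$line"
done < <(grep "IO resumed after" *)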

Otherwise, we have to use bc:

while IFS= read -r line; do
    number=${line##*after}
    number=${number%%seconds*}
    # bc prints 1 when the comparison is true, 0 otherwise
    [ "$(bc <<< "$number > 45")" = 1 ] && echo "$line"
done < <(grep "IO resumed after" *)
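To see what the two parameter expansions in these loops do, here is a minimal sketch that applies them to a single line. The function name extract_seconds and the sample line are made up for illustration, and the blank next to each keyword is included in the pattern so that the result is the bare number.

```shell
# Hypothetical helper (not part of the answer): isolate the number between
# "after " and " seconds" using only parameter expansion.
extract_seconds() {
    line=$1
    number=${line##*after }      # remove the longest prefix ending in "after "
    number=${number%% seconds*}  # remove the longest suffix starting at " seconds"
    printf '%s\n' "$number"
}

# Sample line as grep would print it for a file named server.log (assumed name).
extract_seconds "server.log:foo IO resumed after 46 seconds bar"
```

The filename prefix that grep adds when it searches several files does no harm, because the first expansion removes everything before the number anyway.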
0
kvantour On

It is better to use awk for this:

awk 'match($0,"IO resumed after") { if (substr($0,RSTART+RLENGTH)+0 > 45) print }' file

This searches for the string "IO resumed after". If the string is found, awk takes everything after it and adds zero, which forces a numeric conversion: when the remainder starts with a number (optionally preceded by blanks), the result is that number; otherwise the result is 0.
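The conversion can be tried in isolation. The sketch below is only an illustration: leading_number is a hypothetical helper, and the sample strings are invented.

```shell
# Hypothetical helper: print the number awk derives from the start of a string.
# Note that -v interprets backslash escapes, so this is meant for plain text only.
leading_number() {
    awk -v s="$1" 'BEGIN { print s + 0 }'
}

leading_number " 46.5 seconds"   # leading blanks are skipped, trailing text is ignored
leading_number "46seconds"       # no blank is needed before the trailing text
leading_number "about 46"        # does not start with a number, so the value is 0
```

A line with no number directly after the search string therefore compares as 0 and is not printed.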

This will only work if the line looks like:

xxxxIO resumed after_nnnnyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy

where x and y are random characters, underscore is any sequence of blanks, n is a digit.

You can test it with the following set of commands:

$ seq 40 0.5 50 | awk '{print "foo IO resumed after",$0,"random stuff"}' \
  | awk 'match($0,"IO resumed after") { if (substr($0,RSTART+RLENGTH)+0 > 45) print }'

which outputs:

foo IO resumed after 45.5 random stuff
foo IO resumed after 46.0 random stuff
foo IO resumed after 46.5 random stuff
foo IO resumed after 47.0 random stuff
foo IO resumed after 47.5 random stuff
foo IO resumed after 48.0 random stuff
foo IO resumed after 48.5 random stuff
foo IO resumed after 49.0 random stuff
foo IO resumed after 49.5 random stuff
foo IO resumed after 50.0 random stuff
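Since the question is about several files, the same awk test can be wrapped so that it prints the file name in front of each hit, as grep does. This is a sketch under assumptions: the function name io_resumed_over and the variable limit are invented here, and the threshold is passed as the first argument instead of being hard-coded.

```shell
# Hypothetical wrapper: io_resumed_over LIMIT FILE...
# Prints "file:line" for every line whose number after "IO resumed after"
# is greater than LIMIT.
io_resumed_over() {
    limit=$1
    shift
    awk -v limit="$limit" '
        match($0, "IO resumed after") {
            # FILENAME is the file awk is currently reading
            if (substr($0, RSTART + RLENGTH) + 0 > limit + 0)
                print FILENAME ":" $0
        }' "$@"
}

# Example call (the glob is an assumption about how the files are named):
# io_resumed_over 45 *.log
```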
2
Bodo On

You can use alternatives and repeat counts to define a search pattern for numbers greater than 45.

This solution assumes the numbers are integer numbers without a decimal point.

grep 'IO resumed after \(4[6-9]\|[5-9][0-9]\|[0-9]\{3,\}\) seconds'

or shorter with extended regular expressions (grep -E, formerly egrep):

grep -E 'IO resumed after (4[6-9]|[5-9][0-9]|[0-9]{3,}) seconds'

I tested the pattern with

for i in 1 10 30 44 45 46 47 48 49 50 51 60 99 100 1234567
do
echo "foo IO resumed after $i seconds bar"
done | grep 'IO resumed after \(4[6-9]\|[5-9][0-9]\|[0-9]\{3,\}\) seconds'

which prints

foo IO resumed after 46 seconds bar
foo IO resumed after 47 seconds bar
foo IO resumed after 48 seconds bar
foo IO resumed after 49 seconds bar
foo IO resumed after 50 seconds bar
foo IO resumed after 51 seconds bar
foo IO resumed after 60 seconds bar
foo IO resumed after 99 seconds bar
foo IO resumed after 100 seconds bar
foo IO resumed after 1234567 seconds bar

If the numbers can have a decimal point, it is difficult to define a pattern for numbers > 45 that also accepts values such as 45.1.
The following pattern allows an optional decimal point or comma followed by digits, and implements the condition >= 46.

grep 'IO resumed after \(4[6-9]\|[5-9][0-9]\|[0-9]\{3,\}\)\([.,][0-9]*\)\{,1\} seconds'
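A quick check of the decimal-aware pattern, modeled on the test loop used earlier in this answer. It is written in the extended (grep -E) spelling, which is an assumption about the available grep; the variable name decimal_pattern and the sample values are invented.

```shell
# Extended-regex spelling of the decimal-aware pattern (hypothetical variable name).
decimal_pattern='IO resumed after (4[6-9]|[5-9][0-9]|[0-9]{3,})([.,][0-9]*)? seconds'

# 45, 45.1 and 45.9 should be rejected; the remaining values should be printed.
for i in 45 45.1 45.9 46 46.5 46,5 100.25
do
    echo "foo IO resumed after $i seconds bar"
done | grep -E "$decimal_pattern"
```

The run shows the stated limitation: values between 45 and 46, such as 45.1, are not reported.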

2nd edit:

The patterns above don't handle possible leading zeros. As suggested by user kvantour in a comment, the pattern can be extended to handle this. Furthermore, if it is not required to check the seconds part, the pattern for the decimals can be omitted.

Pattern for numbers >= 45 with optional leading zeros:

grep 'IO resumed after 0*\(4[5-9]\|[5-9][0-9]\|[1-9][0-9]\{2,\}\)'
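The last pattern can be checked the same way as the earlier ones. The sketch below uses the extended (grep -E) spelling of that pattern; the names pattern and check_pattern and the list of sample values are assumptions made for the illustration.

```shell
# Extended-regex spelling of the pattern with optional leading zeros.
pattern='IO resumed after 0*(4[5-9]|[5-9][0-9]|[1-9][0-9]{2,})'

# Hypothetical helper: build one test line per argument and filter them.
check_pattern() {
    for i in "$@"
    do
        echo "foo IO resumed after $i seconds bar"
    done | grep -E "$pattern"
}

# 7 and 044 should be rejected; every other value is >= 45 and should be printed.
check_pattern 7 044 45 045 45.1 0046 99 100 00123
```

Because the part after the integer digits is no longer checked, 45.1 is accepted, but so is exactly 45; lines with that value need a separate check if a strict "greater than 45" is required.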