AWK matching values in a column and performing calculation

166 Views Asked by At

I'm new at AWK and I'm trying to figure out an answer for my problem. I have a flat file with the following values:

403 | SanMateo   | f | 2015-04-09 18:50:24.38
403 | SanMateo   | t | 2015-04-09 18:45:24.36
403 | SanMateo   | t | 2015-04-09 18:40:24.383
403 | SanMateo   | f | 2015-04-09 18:35:24.357
403 | SanMateo   | t | 2015-04-09 18:30:24.355
404 | RedwoodCity| f | 2015-04-09 18:35:50.308
404 | RedwoodCity| t | 2015-04-09 18:30:50.242
404 | RedwoodCity| f | 2015-04-09 18:25:50.245
404 | RedwoodCity| t | 2015-04-09 18:20:50.242
404 | RedwoodCity| f | 2015-04-09 18:15:50.242

I want to use awk to compare $1 of the current line to $1 of the next line, and $3 ~/f/. if the statement is true then subtract $4 of the next line from $4 of the current line and write the difference in a new column of the current line and if false then do nothing. what I have so far is this:

awk 'BEGIN {FS="|";} {if (NR $1 ~ NR++ $1 && $3 ~ /f/) subtract = NR $4 - NR++ $4; {print subtract}}' allHealthRecords_Sorted

and obviously that's not working. Can someone please help?

3

There are 3 best solutions below

1
glenn jackman On BEST ANSWER

save this as time_diff.awk

BEGIN {FS = "[[:blank:]]*\\|[[:blank:]]*"}

# convert "YYYY-mm-dd HH:MM:SS.fff" to a number
function to_time(timestamp,       fraction) {
    fraction = timestamp
    sub(/\..*$/, "", timestamp)
    gsub(/[-:]/, " ", timestamp)
    sub(/.*\./, "0.", fraction)
    return mktime(timestamp) + fraction
}

# gawk has no builtin abs() function
function abs(val) { 
    return( val < 0 ? -1*val : val) 
}

# add the time diff if the condition is met
NR > 1 {
    diff = 0
    if ($1+0 == key && flag == "f") 
        diff = abs( to_time($4) - to_time(time) )
    print line (diff > 0 ? " | " diff : "")
} 

{
    # remember the previous line's values
    key = $1+0; flag = $3; time = $4; line = $0
}

END {print}

Then

$ gawk -f time_diff.awk file
        403 | SanMateo| f                | 2015-04-09 18:50:24.38 | 300.02
        403 | SanMateo| t                | 2015-04-09 18:45:24.36
        403 | SanMateo| t                | 2015-04-09 18:40:24.383
        403 | SanMateo| f                | 2015-04-09 18:35:24.357 | 300.002
        403 | SanMateo| t                | 2015-04-09 18:30:24.355
        404 | RedwoodCity| f                | 2015-04-09 18:35:50.308 | 300.066
        404 | RedwoodCity| t                | 2015-04-09 18:30:50.242
        404 | RedwoodCity| f                | 2015-04-09 18:25:50.245 | 300.003
        404 | RedwoodCity| t                | 2015-04-09 18:20:50.242
        404 | RedwoodCity| f                | 2015-04-09 18:15:50.242
3
user448810 On

In the BEGIN action, read the first line with getline and save the values of $1 and $4.

On each line thereafter, compare $1 to the saved value from the previous line. If they are the same, and $3 ~ /f/, do the desired process. Then save the values of $1 and $4 for the next line.

That should be enough to get you started. If you have trouble writing the code, come back and ask more questions.

1
Ed Morton On

You don't show your expected output so we can't test it, and $4 is a date so idk what you mean by "subtract" but this is basically the right approach:

$ cat tst.awk         
BEGIN{ FS="[[:space:]]*[|][[:space:]]*"; OFS=" | " }
split(prev,p) { print prev ( ($1==p[1])&&(p[3]=="f") ? OFS p[4] - $4 : "") }
{ prev = $0 }
END { print prev ( ($1==p[1])&&(p[3]=="f") ? OFS p[4] - $4 : "") }

$ awk -f tst.awk file
403 | SanMateo   | f | 2015-04-09 18:50:24.38 | 0
403 | SanMateo   | t | 2015-04-09 18:45:24.36
403 | SanMateo   | t | 2015-04-09 18:40:24.383
403 | SanMateo   | f | 2015-04-09 18:35:24.357 | 0
403 | SanMateo   | t | 2015-04-09 18:30:24.355
404 | RedwoodCity| f | 2015-04-09 18:35:50.308 | 0
404 | RedwoodCity| t | 2015-04-09 18:30:50.242
404 | RedwoodCity| f | 2015-04-09 18:25:50.245 | 0
404 | RedwoodCity| t | 2015-04-09 18:20:50.242
404 | RedwoodCity| f | 2015-04-09 18:15:50.242

i.e. you have a buffer of 1 line so you're always operating on and outputing the previous line that you read.