How to remove rows from a CSV with no data using AWK

413 Views Asked by At

I am working with a large csv in a linux shell that I narrowed down to 3 columns:
Species name, Latitude, and Longitude.

awk -F "\t" '{print $10,","$22,",",$23}' occurance.csv > three_col.csv


The file ends up looking like this:

      species         | Lat     | Long   |
----------------------|---------|---------
Leucoraja erinacea    | 41.0748 | 72.9461|
Brevoortia tyrannus   | 39.0748 | 70.9461|
Paralichthys dentatus |         | 73.2354|
Paralichthys dentatus |         |        |
Leucoraja erinacea    | 41.0748 |        |
Brevoortia tyrannus   |         |        |
Brevoortia tyrannus   |         |        |
Paralichthys dentatus | 39.0748 | 70.9461|
Brevoortia tyrannus   | 39.0748 | 70.9461|

However this is what I want it to Look: Notice all species with no lat or long data have been removed

      species         | Lat     | Long   |
----------------------|---------|---------
Leucoraja erinacea    | 41.0748 | 72.9461|
Brevoortia tyrannus   | 39.0748 | 70.9461|
Paralichthys dentatus | 39.0748 | 70.9461|
Brevoortia tyrannus   | 39.0748 | 70.9461|

I've been trying to remove rows that are lacking either Lat or Long data. Using a line like this:

awk -F "\t" BEGIN '{print $1,$2,$3}' END '{$2!=" " && $3!= " " }' three_col.csv > del_blanks.csv

but it results in this error even with small changes that I make trying to solve the problem

awk: line 1: syntax error at or near end of line

How can I get rid of these rows with missing data, is this something I need a "for" loop for?

3

There are 3 best solutions below

0
Hai Vu On BEST ANSWER

Since I don't know what your occurance.csv file looks like, this is a shot in the dark:

awk -F "\t" '$22 && $23 {print $10,","$22,",",$23}' occurance.csv > three_col.csv

The expression $22 && $23 says: Both field 22 and field 23 must not be blank. It is a condition to filter out those lines which don't qualify. It is a shorthand for $22 != "" && $3 != "".

0
RARE Kpop Manifesto On

perhaps something like this ?

 mawk '($!NF=$10","$22","$23)!~",,$"' FS='\t' OFS=','

You already know only fields 10/22/23 needs to be printed, so you can first overwrite $0 with those just 3 columns, already-split by OFS

afterwards simply use a quick regex check, since 2 consecutive OFS at the tail is the sign $22 and $23 are empty - saving the print statement and pattern-action blocks.

0
ufopilot On
    awk -F "|" '
        {
        if (substr($1,1,1) == "-"){  
          e = ""
        }else{
          e=FS
        }
        gsub(/[ \t]+$/, "", $2)
        gsub(/[ \t]+$/, "", $3)
        
        if(length($2) !=0 && length($3) !=0){
          printf "%s%s%-9s%s%-8s%s\n", $1, FS, $2, FS, $3, e
        }
}' file.txt

      species         | Lat     | Long   |
----------------------|---------|---------
Leucoraja erinacea    | 41.0748 | 72.9461|
Brevoortia tyrannus   | 39.0748 | 70.9461|
Paralichthys dentatus | 39.0748 | 70.9461|
Brevoortia tyrannus   | 39.0748 | 70.9461|