Replace character in one column of CSV file with awk gsub

I want to use awk to translate a CSV file into a new CSV file that has only a subset of the original columns. I also want to replace spaces with underscores in one of the columns only. I've tried this:

gawk -F "," '
{
  name=gsub(/ /,"_",$1);
  label=$2;
  print ","name","label","
}' ./in.csv >> ./out.csv

But gsub() returns the number of match occurrences, not the replacement string. So I get something like this:

,1,label

instead of:

,name_nospace,label

How do I use awk gsub like this to replace a character for one column only?

0
On BEST ANSWER

Don't:

name=gsub()

as gsub returns the number of substitutions, not a string. Just

gsub()

and print the field you modified (gsub changes its target in place), i.e.:

gsub(/ /,"_",$1);
label=$2;
print "," $1 "," label "," # or whatever you were doing
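
To see both effects side by side, here is a minimal sketch. The input line is made up for illustration, since the question doesn't show in.csv. It captures gsub's return value in n and then prints n next to the modified field:

```shell
# Hypothetical one-line input; the real in.csv is not shown in the question.
result=$(printf 'name nospace,label,other\n' | awk -F',' '{
  n = gsub(/ /, "_", $1)   # n holds the number of replacements; $1 is changed in place
  print n "|" $1
}')
echo "$result"
```

With that input, n is the count of spaces replaced and $1 carries the underscored text, which is why assigning the result of gsub to name printed a number in the question.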
0
On

To modify "name", change:

name=gsub(/ /,"_",$1)

to (gawk and newer mawk only):

name=gensub(/ /,"_","g",$1)

or (any awk):

name=$1
gsub(/ /,"_",name)

You should also be setting OFS instead of hard-coding commas, especially if you're modifying fields, so your script should be written as:

awk '
BEGIN { FS=OFS="," }
{
  name=$1
  gsub(/ /,"_",name)
  label=$2
  print "", name, label, ""
}' ./in.csv

assuming there's some reason for using variables instead of modifying the fields directly.
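
As a rough illustration of why OFS matters once you modify a field: assigning to a field makes awk rebuild $0 using OFS, which defaults to a single space. The sample line below is an assumption, not taken from the question's data:

```shell
# Hypothetical sample record, for illustration only.
line='first name,some label,extra'

# FS only: modifying $1 rebuilds $0 with the default OFS (a single space).
without_ofs=$(printf '%s\n' "$line" | awk -F',' '{ gsub(/ /, "_", $1); print }')

# FS and OFS both set to a comma: the rebuilt record keeps its commas.
with_ofs=$(printf '%s\n' "$line" | awk 'BEGIN { FS = OFS = "," } { gsub(/ /, "_", $1); print }')

printf '%s\n%s\n' "$without_ofs" "$with_ofs"
```

The first command loses the commas in the rebuilt record, while the second keeps them, so setting OFS once in BEGIN avoids having to paste "," between every value you print.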

0
On
gawk -F "," '
{
  gsub(/ /,"_",$1);
  # print only ",NameValue,LabelValue," as output,
  # i.e. 4 fields with the first and last empty, as in the OP
  print "," $1 "," $2 ","
}' ./in.csv >> ./out.csv

In this case a sed solution is also possible:

sed -e ':under' -e 's/^\([^ ,]*\) /\1_/;t under' -e 's/^\([^,]*,[^,]*,\).*/,\1/' ./in.csv >> ./out.csv