Can RS be set to "empty" to split a string's characters into records?


Is there a way in awk (most likely gawk) to set the record separator RS to an empty value so that each character of a string is processed as a separate record? Something like setting FS to empty, which puts each character in its own field:

$ echo abc | awk -F '' '{print $2}'
b

but with each character as a separate record, something like this (where ? stands for the setting I'm looking for):

$ echo abc | awk -v RS='?' '{print $0}'
a
b
c

The most obvious attempt:

$ echo abc | awk -v RS=''  '{print $0}'
abc

didn't get me there (according to the GNU awk documentation, an empty RS is meant for something else).

Am I basically stuck using a for loop or similar?

EDIT:

@xhienne's answer was what I was looking for, but even using it (20 chars, plus a questionable variable A :-) ):

$ echo  abc | awk -v A="\n" -v RS='(.)' -v ORS="" '{print(RT==A?NR:RT)}'
abc4

it wouldn't help me shorten my earlier code that uses length. Then again, how could I ever beat the Pyth code +Qfql+Q :D

There are 3 answers below.

Accepted answer (score 6):

If you just want to print one character per line, @klashxx's answer is OK. But since you are golfing, sed 's/./&\n/g' would be shorter (with GNU sed, which understands \n in the replacement).

If you truly want a separate record for each character, the closest solution I have found is:

echo -n abc | awk -v RS='(.)' '{ print RT }'

(use gawk; your input character is in RT, not $1)

[update] If RS is set to the null string, awk takes that to mean that records are separated by blank lines. If I had just set RS='.', the record separator would have been a literal dot (i.e. a fixed string). But when RS is longer than one character, gawk treats it as a regex. The parentheses in '(.)' don't change what it matches; they only make RS three characters long, so gawk reads it as a regex meaning "any single character". So what I did here is give gawk a regex that makes every character a record separator. Each record between two separators is then empty, and I use another gawk feature to get the character back: the special variable RT (record terminator) holds the text that matched the regex.

Here are the relevant parts of the gawk manual:

Normally, records are separated by newline characters. You can control how records are separated by assigning values to the built-in variable RS. If RS is any single character, that character separates records. Otherwise, RS is a regular expression. Text in the input that matches this regular expression separates the record.

If RS is set to the null string, then records are separated by blank lines.

Gawk sets RT to the input text that matched the character or regular expression specified by RS.

Answer (score 0):

No, there is no setting of RS that will do what you want. It looks like your requirement is to append a newline after every character that is not a newline; if so, this will produce the output you want:

$ echo 'abc' | awk -v ORS= 'gsub(/[^\n]/,"&\n")'
a
b
c

That will work in any awk on any UNIX system.
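To see how this behaves on multi-line input, here is the same one-liner wrapped in a hypothetical shell function `one_per_line`. Because gsub() returns the number of substitutions it made, it doubles as the pattern: lines with at least one character are printed, while empty lines (zero substitutions) print nothing.

```shell
#!/bin/sh
# Portable: uses only POSIX awk features (gsub, ORS, regex literals).
one_per_line() {
    # ORS is empty because gsub already appends "\n" after every character.
    # gsub() returns the substitution count, so it also acts as the pattern:
    # non-empty lines are printed, empty lines are skipped.
    awk -v ORS= 'gsub(/[^\n]/, "&\n")'
}

printf 'ab\n\ncd\n' | one_per_line
```

Here the empty middle line disappears from the output, which prints a, b, c and d on separate lines.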

Answer (score 4):

It is not possible.

The empty string "" (a string without any characters) has a special meaning as the value of RS. It means that records are separated by one or more blank lines and nothing else.
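A quick way to see this "paragraph mode" meaning of an empty RS in any POSIX awk; the function name `show_paragraphs` and the sample input are made up for illustration.

```shell
#!/bin/sh
# RS set to the empty string is paragraph mode: records are separated by
# one or more blank lines, and newline also acts as a field separator.
show_paragraphs() {
    awk -v RS= '{ print "record " NR " has " NF " field(s), first=" $1 }'
}

printf 'abc\ndef\n\n\nghi\n' | show_paragraphs
```

The two blank lines count as a single separator, so this reports two records. Fed the question's echo abc, it reports a single record containing abc, which is why RS='' printed the input unchanged.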

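A simple alternative, using an empty FS (supported by gawk) so that each character becomes its own field:

echo abc | awk 'BEGIN{FS="";OFS="\n"}{$1=$1}1'

Running $1=$1 as an action rather than as the pattern keeps lines whose first character is 0 from being silently skipped.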
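POSIX leaves the behavior of an empty FS unspecified, so on an awk that does not support it, a portable sketch falls back to the for loop the question hoped to avoid. The function name `chars_per_line` is hypothetical.

```shell
#!/bin/sh
# Portable POSIX awk: print each character of every line on its own line.
chars_per_line() {
    awk '{
        # substr(s, i, 1) extracts the i-th character (awk strings are 1-based).
        for (i = 1; i <= length($0); i++)
            print substr($0, i, 1)
    }'
}

echo abc | chars_per_line
```

It is longer than the field-splitting trick, but it relies only on length() and substr(), which every awk provides.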