Pass a non-printing character to a shell command as the Hadoop Streaming separator


I am using Hadoop Streaming and want to change the separator between keys and values.

I noticed that I can change it with this argument:

hadoop jar \
/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.3.0-mr1-cdh5.1.0.jar \
-D stream.map.output.field.separator=. \
...

which will use . instead of \t as the new separator.

How can I pass a non-printing character such as ^A (start of heading), which is Hive's default separator, to the command?

There is 1 solution below

If you're talking about bash or other common Linux shells, you can enter any character literally by preceding it with Ctrl+V. I'd put it in single quotes just to be sure the shell doesn't treat it specially in any way. The argument would then look something like stream.map.output.field.separator='^A', where you produce the ^A by typing Ctrl+V followed by Ctrl+A.

Mind you, the shell will pass it correctly, but I can't vouch for Hadoop and the way it parses properties.
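If typing the control character interactively is awkward (in a script, for instance), bash's ANSI-C quoting can produce the same byte. The following is a minimal sketch, assuming bash; the variable names sep and sep_printf are only for illustration, and the hadoop invocation is left commented out with its elided parts untouched. As above, this only shows what the shell passes along, not how Hadoop parses the property.

```shell
#!/usr/bin/env bash
# Bash ANSI-C quoting: $'\001' expands to the single byte 0x01 (Ctrl-A, SOH).
sep=$'\001'

# Alternative for shells without $'...': let printf emit the octal escape.
sep_printf=$(printf '\001')

# Dump the byte in hex to check what the shell will actually pass.
printf '%s' "$sep" | od -An -tx1

# Hypothetical use; the jar path and remaining options are elided as in the question:
# hadoop jar ... -D stream.map.output.field.separator="$sep" ...
```

Double-quoting "$sep" at the point of use keeps the shell from word-splitting or otherwise interpreting the value.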

An alternative is to replace the input file's separators with tabs using sed.

sed -e 's/^A/<tab>/g' <filename> | hadoop …

Here, you produce the ^A with Ctrl+V followed by Ctrl+A, and the tab character with Ctrl+V followed by Tab or Ctrl+I.
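The same sed substitution can be written without typing any control characters. This is a sketch assuming bash, where $'...' expands \001 and \t before sed ever sees the script; the function name to_tabs and the sample input are made up for illustration.

```shell
#!/usr/bin/env bash
# Replace every Ctrl-A (0x01) on stdin with a tab.
# $'...' makes bash expand \001 and \t itself, so this does not
# depend on sed understanding those escapes.
to_tabs() {
  sed $'s/\001/\t/g'
}

# Sample line using Ctrl-A between fields, as in Hive's default output.
printf 'key\001value\001extra\n' | to_tabs

# For a plain one-character swap, tr is an alternative:
# tr '\001' '\t'
```

In a pipeline this would sit in front of the streaming job, for example to_tabs <filename | hadoop …, with the hadoop arguments as in the question.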