I am using Hadoop Streaming and I want to change the separator between keys and values.
I noticed that I can change it using this argument:
hadoop jar \
/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.3.0-mr1-cdh5.1.0.jar \
-D stream.map.output.field.separator=. \
...
which will use . instead of \t as the new separator.
How can I pass a non-printing character like ^A (start of heading), which is the default separator in Hive, to the command?
If you're talking about bash or other common Linux shells, you can enter any character literally by preceding it with Ctrl+V. I'd put it in single quotes just to be sure the shell doesn't treat it specially in any way. So it will be something like
stream.map.output.field.separator='^A'
where you produce the ^A by typing Ctrl+V followed by Ctrl+A.
Mind you, the shell will pass it correctly, but I can't vouch for Hadoop and the way it parses properties.
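If typing the control character by hand is awkward (for example inside a script), the same byte can be generated with printf instead. This is only a sketch of the shell side: the variable name sep is my own, and whether Hadoop accepts the value is still the open question mentioned above.

```shell
# Build the ^A character (SOH, byte 0x01) without typing it literally.
# "sep" is a hypothetical variable name used only for this illustration.
sep=$(printf '\001')

# Check what the shell is holding: dump the variable as hex bytes.
printf '%s' "$sep" | od -An -tx1

# The property would then be passed quoted, with the rest of the
# command left elided as in the question:
#   -D stream.map.output.field.separator="$sep"
```

The od line should show a single byte with value 01, which confirms the shell passes the character through unchanged. In bash, $'\001' is an equivalent way to write the same character.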
An alternative is to replace the input file's separators with tabs using sed. In the sed expression you produce the ^A with Ctrl+V followed by Ctrl+A, and the tab character with Ctrl+V followed by Tab or by Ctrl+I.
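For a script, the sed substitution can be written without any hand-typed control characters. A minimal sketch, assuming the data arrives on standard input; the function name ctrl_a_to_tab and the file names in the usage comment are placeholders of mine, not anything from the original setup.

```shell
# Hypothetical helper: read ^A-separated text on stdin and
# write tab-separated text on stdout.
ctrl_a_to_tab() {
  soh=$(printf '\001')   # the ^A byte (start of heading)
  tab=$(printf '\t')     # a literal tab character
  # Both variables expand to the raw bytes before sed sees the expression.
  sed "s/${soh}/${tab}/g"
}

# Usage (file names are placeholders):
#   ctrl_a_to_tab < hive_output.txt > streaming_input.tsv
```

Since this only swaps one single character for another, tr '\001' '\t' would do the same job; sed is used here to stay with the approach described above.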