How do I use Hadoop streaming cmdenv with Oozie?


I have a Hadoop streaming job with the parameter:

 -cmdenv TEXT_DIR=cachetextdir

How do I specify this in an Oozie workflow?

I am assuming I can point to cachetextdir in Oozie with:

 <archive>hdfs://localhost:54310/user/vm/textinput/cachetextdir.tar.gz#cachetextdir</archive>

Best answer

It looks like the <env> element in the <streaming> syntax of the Oozie map-reduce action will do the job:

        <streaming>
            <mapper>[MAPPER-PROCESS]</mapper>
            <reducer>[REDUCER-PROCESS]</reducer>
            <record-reader>[RECORD-READER-CLASS]</record-reader>
            <record-reader-mapping>[NAME=VALUE]</record-reader-mapping>
            ...
            <env>[NAME=VALUE]</env>
            ...
        </streaming>

UPDATE: Yes, it does. This works for me:

    <streaming>
      <mapper>python smspipelineHadoop.py</mapper>
      <env>TEXT_DIR=cachetextdir</env>
    </streaming>
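
For context, here is a rough sketch of how that <streaming> block and the <archive> line from the question might fit together in a complete workflow. I have not run this exact file, so treat it as an assumption-laden outline. The workflow name, the node names, the schema version, and the ${jobTracker}, ${nameNode}, ${scriptPath}, ${inputDir} and ${outputDir} properties are all placeholders I made up; only the mapper, the env value and the archive path come from the posts above.

    <!-- Sketch only: names and ${...} properties are hypothetical placeholders -->
    <workflow-app xmlns="uri:oozie:workflow:0.5" name="streaming-cmdenv-wf">
      <start to="streaming-node"/>
      <action name="streaming-node">
        <map-reduce>
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <streaming>
            <mapper>python smspipelineHadoop.py</mapper>
            <!-- Equivalent of "-cmdenv TEXT_DIR=cachetextdir" on the command line -->
            <env>TEXT_DIR=cachetextdir</env>
          </streaming>
          <configuration>
            <property>
              <name>mapred.input.dir</name>
              <value>${inputDir}</value>
            </property>
            <property>
              <name>mapred.output.dir</name>
              <value>${outputDir}</value>
            </property>
          </configuration>
          <!-- Assumed location of the mapper script; ships it to the task's working directory -->
          <file>${scriptPath}#smspipelineHadoop.py</file>
          <!-- Unpacked in the task's working directory under the name after the '#' -->
          <archive>hdfs://localhost:54310/user/vm/textinput/cachetextdir.tar.gz#cachetextdir</archive>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      <kill name="fail">
        <message>Streaming job failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
      </kill>
      <end name="end"/>
    </workflow-app>

The part after the # in <archive> is the symlink name, so it should match the value given to TEXT_DIR. The mapper can then read the directory through the TEXT_DIR environment variable.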