Can we use Sqoop2 import to create only a file, and not a Hive table?


I have tried running the below commands in Sqoop2:

This one works; tab-separated part files (part-m-00000, part-m-00001, etc.) were created:

sqoop import --connect jdbc:oracle:thin:@999.999.999.999:1521/SIDNAME --username god --table TABLENAME --fields-terminated-by '\t' --lines-terminated-by '\n' -P

This one fails:

sqoop import -Dmapreduce.job.user.classpath.first=true \
-Dmapreduce.output.basename=`date +%Y-%m-%d` \
--connect jdbc:oracle:thin:@999.999.999.999:1521/SIDNAME \
--username nbkeplo \
--P \
--table TABLENAME \
--columns "COL1, COL2, COL3" \
--target-dir /usr/data/sqoop \
-–as-parquetfile \
-m 10

Error:

20/01/08 09:21:23 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
20/01/08 09:21:23 ERROR tool.BaseSqoopTool: Unrecognized argument: -–as-parquetfile
20/01/08 09:21:23 ERROR tool.BaseSqoopTool: Unrecognized argument: -m
20/01/08 09:21:23 ERROR tool.BaseSqoopTool: Unrecognized argument: 10

Try --help for usage instructions.

I want the output to be .parquet files and not a Hive table (I want to use the files with Apache Spark directly, without going through Hive). Is this .parquet file creation possible with Sqoop import?
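For context, this is roughly how I intend to read the output: a minimal PySpark sketch, assuming the /usr/data/sqoop target directory from the second command above:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-sqoop-parquet").getOrCreate()

# spark.read.parquet accepts an HDFS directory of part files;
# no Hive metastore is involved.
df = spark.read.parquet("/usr/data/sqoop")
df.printSchema()
df.show(10)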

2 Answers

Chris Marotta:

Importing directly to HDFS (as Avro, SequenceFile, text, or Parquet files) is possible with Sqoop. When you import into Hive, the data is still written to HDFS, just inside the Hive warehouse directory for managed tables. And Spark is able to read from any HDFS location it has permission to access.

Your two code snippets are not equivalent, and you didn't mention which troubleshooting steps you have tried. One concrete problem is visible in the error output: the -–as-parquetfile in the failing command begins with an en dash rather than two ASCII hyphens (a common copy-paste artifact), which is why the parser reports it as unrecognized and then rejects the following -m 10 as well.

I would add the --split-by, --fields-terminated-by, and --lines-terminated-by arguments to your command, as sketched below.
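A minimal sketch of the corrected command, assuming COL1 is a sensible split column; all dashes here are plain ASCII hyphens, and note that the delimiter arguments only take effect for text-format output, not Parquet:

sqoop import \
  --connect jdbc:oracle:thin:@999.999.999.999:1521/SIDNAME \
  --username nbkeplo \
  -P \
  --table TABLENAME \
  --columns "COL1,COL2,COL3" \
  --split-by COL1 \
  --fields-terminated-by '\t' \
  --lines-terminated-by '\n' \
  --target-dir /usr/data/sqoop \
  --as-parquetfile \
  -m 10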

Voila:

The following works:

sqoop import \
--connect jdbc:oracle:thin:@999.999.999.999:1521/SIDNAME \
--username user \
--target-dir /xxx/yyy/zzz \
--as-parquetfile \
--table TABLE1 \
-P
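
Once the import completes, the target directory contains .parquet part files that Spark can read directly (see the PySpark sketch in the question above). A quick sanity check, using the same placeholder path:

hdfs dfs -ls /xxx/yyy/zzz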