I am trying to export some data from HDFS to MySQL using Sqoop. The problem is that an uncompressed file exports properly, but when I export the same file compressed with LZO, the Sqoop job fails. I am running this in the standard Cloudera CDH4 VM environment. The columns in the file are tab-separated, and null is represented as '\N'.
The file content:
[cloudera@localhost ~]$ cat dipayan-test.txt
dipayan koramangala 29
raju marathahalli 32
raju marathahalli 32
raju \N 32
raju marathahalli 32
raju \N 32
raju marathahalli 32
raju marathahalli \N
raju marathahalli \N
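For context, a .lzo file like this is typically produced locally and then pushed into HDFS with something along the following lines (illustrative only; it assumes the lzop command-line tool is installed and may not match the exact steps used here):
lzop dipayan-test.txt                                  # writes dipayan-test.txt.lzo next to the original
hadoop fs -mkdir /user/cloudera/dipayan-test
hadoop fs -put dipayan-test.txt.lzo /user/cloudera/dipayan-test/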
Description of the MySQL table:
mysql> describe sqooptest;
+---------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+-------+
| name | varchar(100) | YES | | NULL | |
| address | varchar(100) | YES | | NULL | |
| age | int(11) | YES | | NULL | |
+---------+--------------+------+-----+---------+-------+
3 rows in set (0.01 sec)
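For reference, a table with this layout can be recreated from the shell with something like the following (it assumes the bigdata database already exists; MySQL displays a plain INT as int(11) by default):
mysql -u root -p bigdata -e "CREATE TABLE sqooptest (name VARCHAR(100), address VARCHAR(100), age INT);"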
File in HDFS:
[cloudera@localhost ~]$ hadoop fs -ls /user/cloudera/dipayan-test
Found 1 items
-rw-r--r-- 3 cloudera cloudera 138 2014-02-16 23:18 /user/cloudera/dipayan-test/dipayan-test.txt.lzo
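A quick sanity check at this point: hadoop fs -text decompresses a file on the fly when a matching codec is registered with the cluster, while hadoop fs -cat just dumps the raw bytes, so comparing the two shows whether an LZO codec is actually available:
hadoop fs -cat /user/cloudera/dipayan-test/dipayan-test.txt.lzo     # raw compressed bytes
hadoop fs -text /user/cloudera/dipayan-test/dipayan-test.txt.lzo    # readable rows only if an LZO codec is configured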
Sqoop command:
sqoop export --connect "jdbc:mysql://localhost/bigdata" --username "root" --password "XXXXXX" --driver "com.mysql.jdbc.Driver" --table sqooptest --export-dir /user/cloudera/dipayan-test/ --input-fields-terminated-by '\t' -m 1 --input-null-string '\\N' --input-null-non-string '\\N'
Error:
[cloudera@localhost ~]$ sqoop export --connect "jdbc:mysql://localhost/bigdata" --username "root" --password "mysql" --driver "com.mysql.jdbc.Driver" --table sqooptest --export-dir /user/cloudera/dipayan-test/ --input-fields-terminated-by '\t' -m 1 --input-null-string '\\N' --input-null-non-string '\\N'
14/02/16 23:19:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/02/16 23:19:26 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
14/02/16 23:19:26 INFO manager.SqlManager: Using default fetchSize of 1000
14/02/16 23:19:26 INFO tool.CodeGenTool: Beginning code generation
14/02/16 23:19:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM sqooptest AS t WHERE 1=0
14/02/16 23:19:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM sqooptest AS t WHERE 1=0
14/02/16 23:19:27 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-0.20-mapreduce
14/02/16 23:19:27 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-0.20-mapreduce/hadoop-core.jar
Note: /tmp/sqoop-cloudera/compile/676bc185f1efffa3b0de0a924df4a02d/sqooptest.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/02/16 23:19:29 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/676bc185f1efffa3b0de0a924df4a02d/sqooptest.jar
14/02/16 23:19:29 INFO mapreduce.ExportJobBase: Beginning export of sqooptest
14/02/16 23:19:30 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM sqooptest AS t WHERE 1=0
14/02/16 23:19:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/02/16 23:19:31 INFO input.FileInputFormat: Total input paths to process : 1
14/02/16 23:19:31 INFO input.FileInputFormat: Total input paths to process : 1
14/02/16 23:19:31 INFO mapred.JobClient: Running job: job_201402162201_0013
14/02/16 23:19:32 INFO mapred.JobClient: map 0% reduce 0%
14/02/16 23:19:41 INFO mapred.JobClient: Task Id : attempt_201402162201_0013_m_000000_0, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.util.NoSuchElementException
at java.util.AbstractList$Itr.next(AbstractList.java:350)
at sqooptest.__loadFromFields(sqooptest.java:225)
at sqooptest.parse(sqooptest.java:174)
at org.apach
14/02/16 23:19:48 INFO mapred.JobClient: Task Id : attempt_201402162201_0013_m_000000_1, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.util.NoSuchElementException
at java.util.AbstractList$Itr.next(AbstractList.java:350)
at sqooptest.__loadFromFields(sqooptest.java:225)
at sqooptest.parse(sqooptest.java:174)
at org.apach
14/02/16 23:19:55 INFO mapred.JobClient: Task Id : attempt_201402162201_0013_m_000000_2, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.util.NoSuchElementException
at java.util.AbstractList$Itr.next(AbstractList.java:350)
at sqooptest.__loadFromFields(sqooptest.java:225)
at sqooptest.parse(sqooptest.java:174)
at org.apach
14/02/16 23:20:04 INFO mapred.JobClient: Job complete: job_201402162201_0013
14/02/16 23:20:04 INFO mapred.JobClient: Counters: 7
14/02/16 23:20:04 INFO mapred.JobClient: Job Counters
14/02/16 23:20:04 INFO mapred.JobClient: Failed map tasks=1
14/02/16 23:20:04 INFO mapred.JobClient: Launched map tasks=4
14/02/16 23:20:04 INFO mapred.JobClient: Data-local map tasks=4
14/02/16 23:20:04 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=29679
14/02/16 23:20:04 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
14/02/16 23:20:04 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/02/16 23:20:04 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/02/16 23:20:04 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
14/02/16 23:20:04 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 33.5335 seconds (0 bytes/sec)
14/02/16 23:20:04 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
14/02/16 23:20:04 INFO mapreduce.ExportJobBase: Exported 0 records.
14/02/16 23:20:04 ERROR tool.ExportTool: Error during export: Export job failed!
This works perfectly if the file is not compressed and I work directly with the dipayan-test.txt file.
I need help resolving this issue, and I would also like to know whether I am missing something when working with LZO files.
Chances are that your table doesn't have the correct columns. You can always go into the .java file Sqoop generates for you and debug from there, starting at the line the stack trace points to:
sqooptest.java:225
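In this particular run the generated source is already on disk at the path the log prints, so the parsing code around that line can be inspected directly:
less /tmp/sqoop-cloudera/compile/676bc185f1efffa3b0de0a924df4a02d/sqooptest.java
If you want Sqoop to write the generated .java and the compiled classes to a more convenient location on the next run, the --outdir and --bindir code-generation options control where they go (assuming your Sqoop version supports them; most 1.4.x releases do).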