I want to read orders data, which is stored as a sequence file in HDFS on the Cloudera VM, and create an RDD from it. Below are my steps:
1) Importing orders data as sequence file:
sqoop import --connect jdbc:mysql://localhost/retail_db --username retail_dba --password cloudera --table orders -m 1 --target-dir /ordersDataSet --as-sequencefile
2) Reading the file in Spark 1.6 (Scala):
val sequenceData=sc.sequenceFile("/ordersDataSet",classOf[org.apache.hadoop.io.Text],classOf[org.apache.hadoop.io.Text]).map(rec => rec.toString())
3) When I try to read data from the above RDD, it throws the error below:
Caused by: java.io.IOException: WritableName can't load class: orders
at org.apache.hadoop.io.WritableName.getClass(WritableName.java:77)
at org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:2108)
... 17 more
Caused by: java.lang.ClassNotFoundException: Class orders not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2185)
at org.apache.hadoop.io.WritableName.getClass(WritableName.java:75)
... 18 more
I don't know why it says that it can't find orders. Where am I going wrong?
I also tried code from these two links, but with no luck:
1) Refer sequence part
2) Refer step no. 8
I figured out the solution to my own problem. Well, I am going to write a lengthy solution but I hope it will make some sense.
1) When I tried to read the data that was imported into HDFS using Sqoop, it gave an error for the following reasons:
A) A sequence file is all about key-value pairs. When I import the data using Sqoop, it is not stored as key-value pairs, which is why reading it throws an error.
B) If you read the first few characters of the file, from which you can figure out the two classes that must be passed as input when reading a sequence file, you get data as below:
Above you can see only one class, i.e. org.apache.hadoop.io.LongWritable, and when I pass this while reading the sequence data it throws the error mentioned in the post.
I don't think that point B is the main reason for the error, but I am fairly sure that point A is the real culprit.
2) Below is how I solved my problem.
I imported the data as an Avro data file into another destination using Sqoop. Then I created the DataFrame from Avro in the following way:
Now I created key-value pairs and saved them as a sequence file:
Now when I read the first few characters of the file written above, it gives me the two classes I need when reading the file, as below:
Now when I print the data, it displays as below:
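A minimal sketch of the steps in part 2 on Spark 1.6, not necessarily the exact code used here. It assumes the Avro import was done with Sqoop's --as-avrodatafile option, that the spark-avro package (com.databricks.spark.avro) is on the classpath, and that the first column (the order id) is never null. The object name, the helper toPair, and the two directory parameters are illustrative names only.

```scala
import org.apache.hadoop.io.Text
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object OrdersAsSequenceFile {
  // Pure helper: the first field becomes the key, the whole record the value.
  // Assumption: the first field (the order id) is never null.
  def toPair(fields: Seq[Any]): (String, String) =
    (fields.head.toString, fields.mkString(","))

  // `sc` is the SparkContext that spark-shell already provides.
  // avroDir and seqDir are hypothetical HDFS directories.
  def run(sc: SparkContext, avroDir: String, seqDir: String): Unit = {
    val sqlContext = new SQLContext(sc)

    // Load the Avro files written by Sqoop into a DataFrame.
    val ordersDF = sqlContext.read.format("com.databricks.spark.avro").load(avroDir)

    // Build (key, value) pairs and write them out as a sequence file.
    // String keys and values are stored as org.apache.hadoop.io.Text.
    val pairs = ordersDF.rdd.map(row => toPair(row.toSeq))
    pairs.saveAsSequenceFile(seqDir)

    // Read the file back with the two classes it was written with.
    val readBack = sc.sequenceFile(seqDir, classOf[Text], classOf[Text])
      .map { case (k, v) => (k.toString, v.toString) }
    readBack.take(5).foreach(println)
  }
}
```

In spark-shell this could be called as OrdersAsSequenceFile.run(sc, "/ordersAvro", "/ordersSeq"), where both paths are made-up names. The Text objects are converted to String straight away because Hadoop reuses the same Writable instances while reading.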
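The "read the first few characters" check used in both parts can also be done in code. The sketch below assumes the usual sequence file header layout (the magic bytes SEQ, one version byte, then the key class name and the value class name, each written as a length-prefixed string) and only handles class names shorter than 128 bytes, where the length prefix is a single byte. SeqHeaderPeek and its method names are hypothetical. Since the stack trace in the question fails inside SequenceFile$Reader.getValueClass, the value class name is the part worth checking.

```scala
import java.io.{DataInputStream, InputStream}

object SeqHeaderPeek {
  // Reads one class name: a one-byte length followed by that many UTF-8 bytes.
  // A negative byte would mean a multi-byte length, which this sketch skips.
  private def readName(in: DataInputStream): String = {
    val len = in.readByte().toInt
    require(len >= 0, "multi-byte length prefixes are not handled in this sketch")
    val buf = new Array[Byte](len)
    in.readFully(buf)
    new String(buf, "UTF-8")
  }

  // Returns (keyClassName, valueClassName) as plain strings, without trying
  // to load either class.
  def keyAndValueClass(raw: InputStream): (String, String) = {
    val in = new DataInputStream(raw)
    val magic = new Array[Byte](3)
    in.readFully(magic)
    require(new String(magic, "US-ASCII") == "SEQ", "not a sequence file")
    in.readByte() // format version, ignored here
    val key = readName(in)
    val value = readName(in)
    (key, value)
  }
}

// Example use from spark-shell (the part file name is an assumption):
//   val fs = org.apache.hadoop.fs.FileSystem.get(sc.hadoopConfiguration)
//   val in = fs.open(new org.apache.hadoop.fs.Path("/ordersDataSet/part-m-00000"))
//   try println(SeqHeaderPeek.keyAndValueClass(in)) finally in.close()
```

Because this only reads strings from the header, it works even when the value class named there is not on the classpath, which is exactly the situation where sc.sequenceFile fails.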
Last but not least, thank you everyone for your much appreciated efforts. Cheers!!