In Spark, I can do both:
case class MyObj(i: Int, j: String)
import org.apache.spark.sql.{Encoder, Encoders}
implicit val myEncoder: Encoder[MyObj] = Encoders.product[MyObj]
val ds = spark.createDataset(Seq(MyObj(1, "a"), MyObj(2, "b"), MyObj(3, "c")))
ds.show
//+---+---+
//| i| j|
//+---+---+
//| 1| a|
//| 2| b|
//| 3| c|
//+---+---+
and,
case class MyObj(i: Int, j: String)
import org.apache.spark.sql.{Encoder, Encoders}
implicit val myEncoder: Encoder[MyObj] = Encoders.kryo[MyObj]
val ds = spark.createDataset(Seq(MyObj(1, "a"), MyObj(2, "b"), MyObj(3, "c")))
ds.show
//+--------------------+
//| value|
//+--------------------+
//|[01 00 24 6C 69 6...|
//|[01 00 24 6C 69 6...|
//|[01 00 24 6C 69 6...|
//+--------------------+
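For reference, the schema difference behind the two outputs can be made explicit with printSchema. A minimal sketch, assuming the same spark session and MyObj case class as above (the nullability flags shown are typical but may vary by Spark version):

// Assumes `spark` (SparkSession) and case class MyObj(i: Int, j: String) from above.
import org.apache.spark.sql.Encoders

val productDs = spark.createDataset(Seq(MyObj(1, "a")))(Encoders.product[MyObj])
productDs.printSchema()
// root
//  |-- i: integer
//  |-- j: string

val kryoDs = spark.createDataset(Seq(MyObj(1, "a")))(Encoders.kryo[MyObj])
kryoDs.printSchema()
// root
//  |-- value: binary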
- I assume they are using different serializers? Encoders.product does not show a binary format (like Encoders.kryo does) when calling .show — does that mean something? Shouldn't it also serialize objects into a binary format?
- What's the difference between the two serializers regarding performance?