Difference between Encoders.kryo[T] and Encoders.product[T] in Spark SQL

355 Views Asked by At

In Spark, I can do both:

case class MyObj(val i: Int, val j: String)

import org.apache.spark.sql.{Encoder, Encoders}
implicit val myEncoder: Encoder[MyObj] = Encoders.product[MyObj] 

val ds = spark.createDataset(Seq(new MyObj(1, "a"),new MyObj(2, "b"),new MyObj(3, "c")))

ds.show

//+---+---+
//|  i|  j|
//+---+---+
//|  1|  a|
//|  2|  b|
//|  3|  c|
//+---+---+

and,

case class MyObj(val i: Int, val j: String)

import org.apache.spark.sql.{Encoder, Encoders}
implicit val myEncoder: Encoder[MyObj] = Encoders.kryo[MyObj] 

val ds = spark.createDataset(Seq(new MyObj(1, "a"),new MyObj(2, "b"),new MyObj(3, "c")))

ds.show

//+--------------------+
//|               value|
//+--------------------+
//|[01 00 24 6C 69 6...|
//|[01 00 24 6C 69 6...|
//|[01 00 24 6C 69 6...|
//+--------------------+
  • I assume they are using different serializers?
  • Encoders.product is not showing binary format (like what Encoders.kryo does) when doing .show, does it mean something? Shouldn't it do the job for serializing objects into binary format?
  • What's the difference between the two serializers regarding performance?
0

There are 0 best solutions below