Somehow, in Spark 2.0, I can use DataFrame.map(r => r.getAs[String]("field")) without problems, but Dataset.map(r => r.getAs[String]("field")) gives an error saying that r doesn't have the getAs method.

What's the difference between r in a Dataset and r in a DataFrame, and why does r.getAs only work with a DataFrame?
After doing some research on Stack Overflow, I found a helpful answer under the question "Encoder error while trying to map dataframe row to updated row". I hope it helps:
Dataset has a type parameter: class Dataset[T]. T is the type of each record in the Dataset. That T might be anything (well, anything for which you can provide an implicit Encoder[T], but that's beside the point).

A map operation on a Dataset applies the provided function to each record, so the r in the map operations you showed will have the type T.

Lastly, DataFrame is actually just an alias for Dataset[Row], which means each record has the type Row. And Row has a method named getAs that takes a type parameter and a String argument, hence you can call getAs[String]("field") on any Row. For any T that doesn't have this method, this will fail to compile.
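
To make the distinction concrete, here is a minimal sketch, assuming Spark 2.x or later is on the classpath and a local master is acceptable. The case class Record, the object GetAsExample, the helper names, and the sample data are all hypothetical and only serve the illustration. The String encoder is passed explicitly so the helpers do not depend on any implicits being in scope.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoders, Row, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Record(field: String, n: Int)

object GetAsExample {
  // DataFrame is Dataset[Row], so r is a Row and getAs is available.
  def fieldsFromRows(df: DataFrame): Array[String] =
    df.map((r: Row) => r.getAs[String]("field"))(Encoders.STRING).collect()

  // Here r is a Record. It has no getAs method, so read the field directly.
  def fieldsFromRecords(ds: Dataset[Record]): Array[String] =
    ds.map((r: Record) => r.field)(Encoders.STRING).collect()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("getAs-example")
      .getOrCreate()
    import spark.implicits._ // needed for toDF and as[Record]

    val df: DataFrame = Seq(("a", 1), ("b", 2)).toDF("field", "n")
    val ds: Dataset[Record] = df.as[Record]

    println(fieldsFromRows(df).mkString(","))
    println(fieldsFromRecords(ds).mkString(","))

    // The next line would not compile, because Record has no getAs:
    // ds.map(r => r.getAs[String]("field"))

    // Converting back to a DataFrame yields Rows again, so getAs works.
    println(fieldsFromRows(ds.toDF()).mkString(","))

    spark.stop()
  }
}
```

If you have a typed Dataset and still want Row-style access, calling toDF() first (as in the last call) is one option; otherwise, access the fields of T directly.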