I have a SparkSQL DataFrame.
Some entries in this data are empty but they don't behave like NULL or NA. How could I remove them? Any ideas?
In R I can easily remove them, but in SparkR it says there is a problem with the S4 system/methods.
Thanks.
SparkR Column provides a long list of useful methods including isNull and isNotNull:
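(A sketch only: it assumes a SparkR 2.x+ session started with sparkR.session(), and the DataFrame df with columns x and y is purely illustrative.)

```r
library(SparkR)
sparkR.session()  # older releases use sparkR.init() / sparkRSQL.init() instead

# Illustrative data; R's NA values become nulls in the Spark DataFrame
df <- createDataFrame(data.frame(x = c(1, NA, 3),
                                 y = c("a", NA, "c"),
                                 stringsAsFactors = FALSE))

# Keep rows where x is not null
head(where(df, isNotNull(df$x)))

# Or select the rows where x is null
head(where(df, isNull(df$x)))
```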
Please keep in mind that there is no distinction between NA and NaN in SparkR.

If you prefer operations on a whole data frame, there is a set of NA functions including fillna and dropna:
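(Again a sketch, reusing the illustrative df from above; the replacement values are arbitrary.)

```r
# Drop every row containing at least one null (how = "any" is the default)
dropna(df)

# Drop only rows where all values are null
dropna(df, how = "all")

# Replace nulls instead of dropping rows: a single numeric fills numeric columns,
# a named list targets specific columns
fillna(df, value = 0)
fillna(df, value = list(x = 0, y = "missing"))
```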
Both can be adjusted to consider only a subset of columns (cols), and dropna has some additional useful parameters. For example, you can specify the minimal number of non-null columns:
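(A final sketch with the same hypothetical df and column names.)

```r
# Keep rows with at least 2 non-null values; when minNonNulls is set,
# the how argument is ignored
dropna(df, minNonNulls = 2)

# Both dropna and fillna can be restricted to a subset of columns via cols
dropna(df, how = "any", cols = c("x"))
fillna(df, value = 0, cols = c("x"))
```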