Spark SQL sql("<some aggregate query>").first().getDouble(0) give me inconsistent results

1.4k Views Asked by At

I have the below query which is supposed to find an average of the column values and return me the result which is a single number.

val avgVal = hiveContext.sql("select round(avg(amount), 4) from users.payment where dt between '2018-05-09' and '2018-05-09'").first().getDouble(0)

I'm facing inconsistent behavior at this statement. This often fails with below error however it gives non-NULL results when executed through Hive."

18/05/10 11:01:12 ERROR ApplicationMaster: User class threw exception: java.lang.NullPointerException: Value at index 0 in null
java.lang.NullPointerException: Value at index 0 in null
    at org.apache.spark.sql.Row$class.getAnyValAs(Row.scala:475)
    at org.apache.spark.sql.Row$class.getDouble(Row.scala:243)
    at org.apache.spark.sql.catalyst.expressions.GenericRow.getDouble(rows.scala:192)

The reason why I use HiveContext instead of SQLContext is that the later doesn't support some of the aggregation functions which I use extensively in my code.

Could you please help me understand why this problem occurs and how to solve?

2

There are 2 best solutions below

3
Bejond On

You need to divide query and get into two parts:

var result = hiveContext.sql("select round(avg(amount), 4) from users.payment where dt between '2018-05-09' and '2018-05-09'");
var first = result.first();
if (first != null && !first.isNullAt(0)) {
var avgVal = first.getDouble(0);
}

This would avoid NPE. This would also be needed in List and array.

For insert or update query, you even need to surround with try...catch block to catch runtime exception.

2
J-Alex On

Let's analyze the case and possible reasons when this exception can be thrown.

Row row = hiveContext.sql("select info, name, desc, id from users.payment where dt between '2018-05-09' and '2018-05-09'").first();

If the value of the row above returned something like:

[null, Kevin, cash, 300]

Trying to get getDouble(0) will lead to java.lang.NullPointerException: Value at index 0 in null

You can try the following approach:

Row row = hiveContext.sql("select round(avg(amount), 4) from users.payment where dt between '2018-05-09' and '2018-05-09'").first();

if (!row.isNullAt(0))
   double d = row.getDouble(0);
else
   logger.error("Value at index zero is null");

If you'll check the sources, the library class doing opposite:

private static Object getAnyValAs(Row $this, int i) {
    if($this.isNullAt(i)) {
        throw new NullPointerException((new StringContext(scala.Predef..MODULE$.wrapRefArray((Object[])(new String[]{"Value at index ", " is null"})))).s(scala.Predef..MODULE$.genericWrapArray(new Object[]{BoxesRunTime.boxToInteger(i)})));
    } else {
        return $this.getAs(i);
    }
}