How to use the "display" function in a Scala 2.11 / Spark 2.0 notebook in DSX


In DSX, is there a way to use "display" in a Scala 2.11 / Spark 2.0 notebook? (I know it can be done in a Python notebook with PixieDust.) E.g.:

display(spark.sql("""SELECT COUNT(zip), SUM(pop), city FROM hive_zips_table
                     WHERE state = 'CA' GROUP BY city ORDER BY SUM(pop) DESC"""))

But I want to do the same in a Scala notebook. Currently I am just using the show command below, which only gives the data in a tabular format with no graphics:

spark.sql("""SELECT COUNT(zip), SUM(pop), city FROM hive_zips_table
             WHERE state = 'CA' GROUP BY city ORDER BY SUM(pop) DESC""").show()
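For clarity, here is what that query computes, sketched in plain Python over a few hypothetical in-memory rows standing in for hive_zips_table (no Spark required; the row values are made up for illustration):

```python
from collections import defaultdict

# Hypothetical rows standing in for hive_zips_table.
rows = [
    {"zip": "94101", "pop": 80000, "city": "San Francisco", "state": "CA"},
    {"zip": "94102", "pop": 75000, "city": "San Francisco", "state": "CA"},
    {"zip": "95814", "pop": 30000, "city": "Sacramento",    "state": "CA"},
    {"zip": "10001", "pop": 50000, "city": "New York",      "state": "NY"},
]

counts = defaultdict(int)  # COUNT(zip) per city
pops = defaultdict(int)    # SUM(pop) per city
for r in rows:
    if r["state"] == "CA":  # WHERE state = 'CA'
        counts[r["city"]] += 1       # GROUP BY city
        pops[r["city"]] += r["pop"]

# ORDER BY SUM(pop) DESC
result = sorted(
    ((counts[c], pops[c], c) for c in counts),
    key=lambda t: t[1],
    reverse=True,
)
print(result)  # [(2, 155000, 'San Francisco'), (1, 30000, 'Sacramento')]
```

display() would render rows like these as an interactive table or chart instead of printed tuples.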

2 Answers


Note:

  • PixieDust currently works with Spark 1.6 and Python 2.7.
  • PixieDust currently supports Spark DataFrames, Spark GraphFrames, and Pandas DataFrames.

Reference: https://github.com/ibm-cds-labs/pixiedust/wiki

But if you can use Spark 1.6, here is a quick workaround to use that fancy display function:

You can go the other way around, since PixieDust lets you use Scala and Python in the same Python notebook via the %%scala cell magic.

https://github.com/ibm-cds-labs/pixiedust/wiki/Using-Scala-language-within-a-Python-Notebook

Step 1: Create a notebook with Python 2 and Spark 1.6, then install PixieDust and import it:

!pip install --user --no-deps --upgrade pixiedust
import pixiedust

Then define your variables or your DataFrame in Scala under the %%scala magic. PixieDust passes Scala variables whose names start with a double underscore (e.g. __df) back to the Python kernel:

%%scala
import org.apache.spark.sql._

print(sc.version)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val __df = sqlContext.read.json("people.json")

__df.show()

or create your DataFrame however you like, for example:

val __df = sqlContext.sql("""SELECT COUNT(zip), SUM(pop), city FROM hive_zips_table
                             WHERE state = 'CA' GROUP BY city ORDER BY SUM(pop) DESC""")

(Note: do not chain .show() onto the assignment; .show() returns Unit, so __df would no longer hold the DataFrame.)

Step 2: In a separate (Python) cell, run the following to access the __df variable from Python:

display(__df)

Reference to my sample notebook:

Thanks, Charles.


You can get a similar result in Zeppelin:

z.show(dataframe)