Register temp table in dataframe not working


Below is my script for running SQL against a DataFrame in Python:

# Start the shell with the spark-csv package (Spark 1.x):
pyspark --packages com.databricks:spark-csv_2.10:1.4.0

# Then, inside the pyspark shell (sc is already defined there):
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.read.format('com.databricks.spark.csv').options(header='true', inferSchema='true').load('file:///root/Downloads/data/flight201601short.csv')

df.show(5) shows the following result:

[Screenshot: output of df.show(5)]

Then I register the DataFrame as a temp table:

df.registerTempTable("flight201601")

and then tried to run an SQL query like this:

sqlContext.sql("select distinct CARRIER from flight201601")

It doesn't produce the expected result. Instead I get:

[Screenshot: output of the query]

I also tried:

sqlContext.sql("select * from flight201601")

and it gives me:

[Screenshot: output of the query]

So it seems the registerTempTable method only creates the table schema and does NOT populate the table. What am I missing?

Answer 1 (accepted):

You have to call the show() method on the DataFrame returned by sqlContext.sql to see the query result. The Spark documentation says:

The sql function on a SQLContext enables applications to run SQL queries programmatically and returns the result as a DataFrame.

sqlDF = sqlContext.sql("select * from flight201601")
sqlDF.show()

Answer 2:

@PasLeChoix

When you execute the statement below

df = sqlContext.sql("select * from flight201601")
df.show()

Spark returns a DataFrame, so you need to keep the result in a DataFrame and call show() to display it on the console, as @abaghel mentioned.

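By default, when a statement returns a DataFrame, the console only prints the DataFrame's representation, which lists its column names and types (for example DataFrame[CARRIER: string]). That is what you are seeing in your case. The query itself only runs when you call an action such as show() or collect().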
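The behaviour described above comes from lazy evaluation: the object knows its schema up front, but rows are only computed when an action runs. The following is a hypothetical toy model in plain Python (LazyTable is made up for illustration and is not how Spark is implemented) that mimics this split between the echoed representation and an action like show().

```python
from typing import Callable, List, Tuple


class LazyTable:
    """Hypothetical toy model of a lazily evaluated table (not Spark's implementation)."""

    def __init__(self, columns: List[Tuple[str, str]], compute: Callable[[], List[tuple]]) -> None:
        self.columns = columns    # schema is known immediately
        self._compute = compute   # rows are produced only on demand
        self.runs = 0             # how many times the rows have been computed

    def __repr__(self) -> str:
        # Mimics how the shell echoes a DataFrame: column names and types, no rows.
        return "DataFrame[%s]" % ", ".join(f"{name}: {typ}" for name, typ in self.columns)

    def collect(self) -> List[tuple]:
        # An "action": the only place the rows are actually computed.
        self.runs += 1
        return self._compute()

    def show(self) -> None:
        # Another action, built on collect().
        for row in self.collect():
            print(row)


source = [("AA", 1), ("DL", 2), ("AA", 3)]
distinct_carriers = LazyTable(
    [("CARRIER", "string")],
    lambda: sorted({(carrier,) for carrier, _ in source}),
)

print(repr(distinct_carriers))  # schema only
print(distinct_carriers.runs)   # nothing computed yet
distinct_carriers.show()        # rows are computed here
```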