Databricks not displaying correct output


In Azure Databricks I am applying a filter to show only the rows where the Region column has the value 'weu'.

display(df.where(col("Region") == 'weu'))

But the output DataFrame I am getting has Region values eus and sea. Can anyone explain why this is happening?


There is 1 best solution below

DileeprajnarayanThumula On

I have used some sample data like below:

Region  Value
weu      1
eus      2
sea      3

The likely reason you are seeing Region values such as eus and sea is that the column contains values with leading or trailing white space.

I have tried the below approach:

Filter the DataFrame based on the Region column, applying the trim() function to remove any leading or trailing white spaces before filtering.

from pyspark.sql.functions import col, trim
filtered_df = dilip_df.where(trim(col("Region")) ==  "weu")
display(filtered_df)


Also you can check:

from pyspark.sql.functions import col, lower
dilip_df.select(lower(col("Region"))).distinct().show()

This selects the Region column from the dilip_df DataFrame, converts all of its values to lowercase with the lower() function, and returns only the distinct values with the distinct() function.
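
Padding is hard to spot in a displayed table, so it can help to look at the raw distinct values with repr(). Below is a minimal sketch in plain Python, assuming the distinct values have already been collected to the driver (for example with [row["Region"] for row in dilip_df.select("Region").distinct().collect()]). The helper name describe_values and the sample list are hypothetical and only for illustration.

```python
def describe_values(values):
    """Pair each value with its repr() and length so padding becomes visible."""
    described = []
    for value in values:
        if value is None:
            # Nulls have no length; report them as-is.
            described.append((repr(value), None))
        else:
            described.append((repr(value), len(value)))
    return described


# Hypothetical sample standing in for the collected distinct Region values.
sample_regions = ["weu", "weu ", " eus", "sea", None]

for shown, length in describe_values(sample_regions):
    print(shown, length)
```

A value whose length is greater than its visible text, or whose repr() shows spaces inside the quotes, is a candidate for trim() before filtering.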