MS Fabric and geospatial data


Can Microsoft Fabric's Data Lakehouse handle geospatial data such as shp, geojson, and other formats?

I've tried getting the geopandas GeoDataFrame into PySpark and writing it with spark_df.write.parquet(destination_path), but I can't seem to get the data in.

So, is it possible, and are there any good tutorials out there specifically for geospatial data?

Answer from JayashankarGS:

First, read the geospatial data from the shp or geojson file and convert it to a Spark DataFrame, using the code block below.

import geopandas as gpd

# Read the geospatial file; geopandas handles geojson, shp, and other formats
gdf = gpd.read_file('/dbfs/json/points.geojson')
# Spark has no native geometry type, so serialize each geometry to a WKT string
gdf['geom'] = [geom.wkt for geom in gdf['geometry']]
# Drop the shapely geometry column before creating the Spark DataFrame
s_df = spark.createDataFrame(gdf.drop("geometry", axis=1, inplace=False))
s_df.printSchema()
s_df.show()
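
The same read step works for a shapefile; only the path changes (the path below is just a placeholder, and the .shp file needs its .shx and .dbf sidecar files next to it):

gdf = gpd.read_file('/dbfs/shp/points.shp')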


Next, write the data to OneLake. Before writing, make sure you enable Azure Data Lake Storage credential passthrough under the Advanced options.


Now, go to your Lakehouse > Files and create a new folder; I've created one named geodata. Then open the properties of that folder and copy the ABFS path.

Then run the code below to write the data:

# Destination: the ABFS path of the geodata folder under your Lakehouse Files
oneLake = "<your_abfs_link>/Files/geodata"
# Write the Spark DataFrame as parquet, overwriting any existing files
s_df.write.format("parquet").mode("overwrite").save(oneLake)


The data is now written to OneLake.


You can read it back using the same path.
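
For example, a minimal read-back check, reusing the oneLake path from above (note that the geom column comes back as a plain string, since the parquet file stores WKT text):

odf = spark.read.parquet(oneLake)
odf.printSchema()
odf.show()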


To convert the geom column back to a geometry type, you can use the code below.

from shapely.wkt import loads

# Read the parquet back from OneLake and pull it into pandas
odf = spark.read.parquet(oneLake)
p_df = odf.toPandas()
# Parse the WKT strings back into shapely geometry objects
p_df['geometry'] = [loads(i) for i in p_df['geom']]
p_df = p_df.drop("geom", axis=1)
print(p_df.dtypes)

p_df
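
If you need a full GeoDataFrame again rather than a plain pandas DataFrame, you can wrap the result. A minimal sketch; the CRS here is only an assumption, so set it to whatever your source file actually used:

import geopandas as gpd
# EPSG:4326 is an assumed CRS for illustration; replace with your data's CRS
g_df = gpd.GeoDataFrame(p_df, geometry='geometry', crs='EPSG:4326')
print(type(g_df))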
