Saved Delta file reads as a DataFrame: is it still part of Delta Lake?

Asked by BigMadAndy on 29 July 2025 at 06:58 (234 views)

I'm having trouble understanding the concept of Delta Lake. Here is an example.

I read a Parquet file:

taxi_df = spark.read.format("parquet").load("dbfs:/mnt/randomcontainer/taxirides.parquet")  # "header" is a CSV option; Parquet files carry their own schema
Then I save it as a managed table with saveAsTable:

taxi_df.write.format("delta").mode("overwrite").saveAsTable("taxi_managed_table")
Next, I read back the managed table I just stored:

taxi_read_from_managed_table = spark.read.format("delta").load("dbfs:/user/hive/warehouse/taxi_managed_table/")
When I check its type, it is pyspark.sql.dataframe.DataFrame, not a DeltaTable:

type(taxi_read_from_managed_table) # returns pyspark.sql.dataframe.DataFrame
Only after I convert it explicitly with the following command do I get a DeltaTable:

from delta.tables import DeltaTable

taxi_delta_table = DeltaTable.convertToDelta(spark, "parquet.`dbfs:/user/hive/warehouse/taxismallmanagedtable/`")

type(taxi_delta_table)  # returns delta.tables.DeltaTable


Does that mean that the table I read back (the DataFrame) is not a Delta table and won't get the automatic optimizations Delta Lake provides?

How can I tell whether something is part of Delta Lake or not?

As I understand it, Delta Live Tables only work with delta.tables.DeltaTable objects. Is that correct?


Answer by Alex Ott on 19 February 2023 at 10:54

When you use spark.read...load(), it returns a Spark DataFrame object that you can use to process the data. Under the hood, this DataFrame is still backed by the Delta Lake table. A DataFrame abstracts the data source, so you can work with different sources and apply the same operations to all of them.

On the other hand, DeltaTable is a specific object that exposes Delta-specific operations. You don't even need convertToDelta to get one: just use the DeltaTable.forPath or DeltaTable.forName functions to obtain an instance.

P.S. If you saved the data with .saveAsTable(my_name), you don't need .load(); just use spark.read.table(my_name).
