In the Databricks Delta Live Tables documentation, I saw some `stream()` and `readStream()` calls in SQL code examples, but I can't figure out what these are or what the difference between them is. Are those functions documented anywhere? While `spark.readStream` exists in Python/Scala/Java/R, I couldn't find any SQL examples. So how do I use it in SQL, and what are its parameters? Sometimes `stream()` also works in SQL; is it just an alias for `readStream()`, which in turn is an alias for Python's `spark.readStream`?
Actually, `STREAM` is an alias for reading a Delta streaming table incrementally. Suppose a pipeline first creates a Delta Live Table named `streaming_bronze` from cloud files.
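A minimal Python sketch of such a bronze table (the `json` source format and the `/mnt/landing/` path are assumptions for illustration; this only runs inside a Delta Live Tables pipeline on Databricks):

```python
import dlt

@dlt.table(comment="Bronze: raw files ingested incrementally via Auto Loader")
def streaming_bronze():
    # "cloudFiles" is Auto Loader: each pipeline update picks up only new files
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")   # assumed source format
        .load("/mnt/landing/")                 # hypothetical landing path
    )
```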
To read data from the `streaming_bronze` table incrementally in Python, you use `dlt.read_stream("streaming_bronze")`.
The same thing in SQL is done with the `STREAM` function. For more information, see the Delta Live Tables documentation.
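The SQL equivalent might look like this (the `streaming_silver` name and the predicate are assumptions; `LIVE.` refers to tables defined in the same pipeline):

```sql
-- Streaming (incremental) read of streaming_bronze via STREAM()
CREATE OR REFRESH STREAMING LIVE TABLE streaming_silver
AS SELECT *
FROM STREAM(LIVE.streaming_bronze)
WHERE id IS NOT NULL;
```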
Next, the SQL counterpart of `spark.readStream`: in Python, a Delta streaming table is created with the `@dlt.table` decorator on a function that returns a streaming DataFrame, and SQL has an equivalent statement that also creates a Delta streaming table.
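That SQL statement could be sketched as follows, using the `cloud_files()` table-valued function as the Auto Loader source (the path and format are assumptions, matching nothing in particular):

```sql
-- SQL counterpart of @dlt.table + spark.readStream.format("cloudFiles")
CREATE OR REFRESH STREAMING LIVE TABLE streaming_bronze
AS SELECT * FROM cloud_files("/mnt/landing/", "json");
```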
Then, to read that data again incrementally, you use `STREAM(<table_name>)`. Basically, whenever you want to add data incrementally from a streaming table, you need to use the `STREAM` function.