My team is writing a streaming application to load files into our data lake. Our environment is Azure, and the application is built on Spark and Databricks. The stream reads mostly CSV files with a set schema, and we use Auto Loader to pick the files up from their directories. It is important that we can detect when a schema changes, which is why we chose Auto Loader.

During testing, when a file has a new column added or a column renamed, Auto Loader recognizes the change. However, when a column is dropped from a file, no schema mismatch is raised; the missing column is simply set to null in the stream. This may be an edge case, but there is a slim chance we will see it. Is there a way to get Auto Loader to identify columns that are missing from a file, or a way within a readStream to recognize when a file's schema loses a column?
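For context, our readStream looks roughly like the sketch below. The paths, checkpoint/schema locations, and column names are placeholders, not our real ones, and `spark` is the session Databricks provides.

```python
# Sketch of our Auto Loader readStream (paths and schema are placeholders).
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

expected_schema = StructType([
    StructField("id", IntegerType()),     # hypothetical columns
    StructField("name", StringType()),
    StructField("amount", StringType()),
])

df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "/mnt/lake/_schemas/orders")
    # addNewColumns surfaces added/renamed columns by failing the stream,
    # but a column dropped from a file just arrives as null
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
    .option("header", "true")
    .schema(expected_schema)
    .load("/mnt/lake/raw/orders/")
)
```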
We have tried Auto Loader's schema-inference options (`cloudFiles` inference settings) with different configurations, and we have also tried supplying an explicit schema to the readStream.
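To illustrate the check we are looking for, here is a minimal sketch (plain Python, outside Spark) that compares a CSV file's header row against the expected columns; the schema and column names are hypothetical:

```python
import csv
import io

def missing_columns(header_line: str, expected: list[str]) -> list[str]:
    """Return the expected columns that are absent from a CSV header row."""
    header = next(csv.reader(io.StringIO(header_line)))
    present = {col.strip() for col in header}
    return [col for col in expected if col not in present]

# A file that silently dropped the "amount" column
expected = ["id", "name", "amount"]
print(missing_columns("id,name", expected))  # -> ['amount']
```

Something like this could run as a pre-check on each new file, but we would much prefer a native Auto Loader or readStream option that flags the mismatch for us.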