I have an ADLS account with several folders, which in turn have sub-folders, and so on until we reach CSV or Parquet data.
How can I get the folder names and sub-folder names, along with the file format, in Databricks? There are also some junk folders that I don't want to consider at all, like Folder123, Folder_dummy, etc.
Suggestions, please.
You can add a wildcard character in places where you don't know all the possible folder names. For example, if you want to query a Parquet file from a nested path, you can use something like this:
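A sketch of such a query in Spark SQL, using a hypothetical mount path and file name — each `*` stands for one folder level whose name is unknown:

```sql
-- Hypothetical path: /mnt/datalake/root is the mount point, sales_data.parquet
-- is the target file; each * matches any folder name at that level.
SELECT *
FROM parquet.`/mnt/datalake/root/*/*/sales_data.parquet`
```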
You can use wildcards to any extent, as long as you know which Parquet file you are querying and give that file name explicitly, using Databricks/Spark SQL.