Get ADLS directory and sub-directory paths down to the file format in a table using Databricks


I have an ADLS account with several folders, which in turn have sub-folders, and so on until we reach either CSV or Parquet data.

How can I get the folder names and sub-folder names, together with the file format, in Databricks? There are also some junk folders I don't want to consider at all, such as Folder123, Folder_dummy, etc.

Suggestions, please.


There is 1 answer below


You can use a wildcard character in places where you don't know all the possible folder names. For example, if you want to query a Parquet file from a nested path, you can use this:

select * from parquet.`{Your ADLS folder}/*/{SomeSpecificFolder}/{your parquet}.parquet`

You can extend the wildcard to any depth, as long as you know which Parquet file you are querying and name it explicitly in Databricks/Spark SQL.
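If you also need the folder/format inventory the question asks for (rather than querying one known file), you can walk the tree with `dbutils.fs.ls` and skip the junk folders. This is a minimal sketch: the junk-folder prefixes and the root path are assumptions you would adapt, and the listing function is passed in as a parameter so the same logic works with `dbutils.fs.ls` in a notebook.

```python
# Recursive walk over an ADLS path, yielding (folder, file_format) pairs
# for every folder that directly contains CSV or Parquet files.
# JUNK_PREFIXES and the example root path below are assumptions -- adjust
# them to your own layout.

JUNK_PREFIXES = ("Folder123", "Folder_dummy")  # hypothetical junk folders

def walk(path, ls):
    """`ls` is a listing function like dbutils.fs.ls: it takes a path and
    returns entries with .name and .path attributes, where directory
    names end in '/'."""
    entries = ls(path)
    # Collect the file formats present directly in this folder.
    fmts = {e.name.rsplit(".", 1)[-1].lower()
            for e in entries
            if e.name.lower().endswith((".csv", ".parquet"))}
    for fmt in sorted(fmts):
        yield path, fmt
    # Recurse into sub-folders, skipping the junk ones.
    for e in entries:
        if e.name.endswith("/") and not e.name.startswith(JUNK_PREFIXES):
            yield from walk(e.path, ls)

# In a Databricks notebook you would call something like:
# root = "abfss://container@account.dfs.core.windows.net/root/"  # hypothetical
# for folder, fmt in walk(root, dbutils.fs.ls):
#     print(folder, fmt)
```

You could feed the yielded pairs into `spark.createDataFrame` to get them as a table. Note `dbutils.fs.ls` is not recursive by itself, which is why the function calls itself on each sub-directory.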