Show objects dependent on an external location in Unity Catalog


While working with Databricks, I needed to remove an external location.

I ran SHOW EXTERNAL LOCATIONS with these results:

(screenshot of the SHOW EXTERNAL LOCATIONS output)

After this I ran DROP EXTERNAL LOCATION gold_prd

and received the error:

Storage Location has dependent catalogs, schemas, managed or external tables/volumes; You may use force option to delete it but the managed storage data under this location cannot be purged by Unity Catalog anymore.

I have no catalogs, schemas, or other objects under this location, but before forcing the operation, is there a way to list all objects that depend on this location?
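
For reference, the force option the error mentions would be something like the following (I have not run it yet, since I want to check the dependencies first):

%python
# The FORCE option drops the external location even if dependent objects exist;
# Unity Catalog can then no longer purge managed data stored under this path.
spark.sql("DROP EXTERNAL LOCATION gold_prd FORCE")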


There are 2 best solutions below

Pratik Lad

AFAIK, there is no consolidated way to list all objects dependent on an external location. You need to check individual tables or schemas using the following commands.

-- For tables
DESCRIBE TABLE EXTENDED tablename;
-- For schemas
DESCRIBE SCHEMA employees;

You can check this document and verify whether any tables created from this location are currently active in other catalogs. Your metadata is typically saved in the metastore; only the data of external tables is kept in the external location.
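
If you want to pull the location field out of that output programmatically, a small sketch like this should work (the three-level table name is just a placeholder; the Location row comes from the EXTENDED section of the DESCRIBE output):

%python
# Placeholder table name; DESCRIBE TABLE EXTENDED returns (col_name, data_type, comment)
# rows, and the extended section contains a "Location" row holding the storage path.
rows = spark.sql("DESCRIBE TABLE EXTENDED my_catalog.my_schema.my_table").collect()
location = next((r["data_type"] for r in rows if r["col_name"] == "Location"), None)
print(location)

The same idea applies to DESCRIBE SCHEMA, whose output also carries a Location entry.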

Gam

This is the ugly, slow, and dirty solution I wrote to check for dependent objects. I think it could also be extended to check for volumes (a sketch for that follows the script).

%python
import pandas as pd

# Collect the URLs of all external locations; every object location found below
# is compared against this list
check_locations = []

query = "SHOW EXTERNAL LOCATIONS"
result = spark.sql(query)
for row in result.collect():
  check_locations.append(row["url"])

#######################
#      catalogs       #
#######################

query = "SHOW CATALOGS"
result = spark.sql(query)
for row in result.collect():
  catalog_name = row["catalog"]
  query = f"DESCRIBE CATALOG EXTENDED {catalog_name}"
  result = spark.sql(query)
  # DESCRIBE CATALOG EXTENDED returns (name, value) rows; pivot them so the
  # item names become column headers, then read the Storage Location value
  df = pd.DataFrame(result.collect()).transpose()
  if not df.empty:
    df.columns = df.iloc[0]
    df = df[1:]
    if 'Storage Location' in df.columns:
      location = df.iloc[0]["Storage Location"]
      for check_location in check_locations:
        if check_location in location:
          print(catalog_name)
          print(f"\t{location}")

#######################
#       schemas       #
#######################

  query = f"SHOW SCHEMAS IN {catalog_name}"
  result = spark.sql(query)
  for row in result.collect():
    schema_name = row["databaseName"]
    query = f"DESCRIBE SCHEMA {catalog_name}.{schema_name}"
    result = spark.sql(query)
    df = pd.DataFrame(result.collect()).transpose()
    if not df.empty:
      df.columns = df.iloc[0]
      df = df[1:]
      if 'Location' in df.columns:
        location = df.iloc[0]["Location"]
        for check_location in check_locations:
          if check_location in location:
            print(catalog_name)
            print(f'\t{schema_name}')
            print(f"\t\t{location}")
        
#######################
#       tables        #
#######################

    if schema_name != 'information_schema':
      query = f"SHOW TABLES IN {catalog_name}.{schema_name}"
      result = spark.sql(query)
      for row in result.collect():
        table_name = row["tableName"]
        # DESCRIBE DETAIL reports the table's storage location
        # (Delta tables only; views and other formats may raise an error here)
        query = f"DESCRIBE DETAIL {catalog_name}.{schema_name}.{table_name}"
        result = spark.sql(query)
        for row in result.collect():
          location = row["location"]
          for check_location in check_locations:
            if check_location in location:
              print(catalog_name)
              print(f'\t{schema_name}')
              print(f'\t\t{table_name}')
              print(f"\t\t\t{location}")