I have a large working SQL query that I parameterized and registered as a SQL UDF in Databricks Workspace. Parts of the large query were repeated elsewhere, so, I wanted to separate it out into two SQL UDF's: one, coverage_cohort(), would be called by other functions (condition_a(), condition_b()), etc. In a previous post I got an answer to issues I was having with passing parameters between SQL UDF's (the issue in there was the use of temporary functions). I am now registering the functions permanently.
So, now I have two functions, coverage_cohort() and tdap(). tdap() has four parameters (asofdate, age_lower, age_upper, doses), three of which are passed through to coverage_cohort() (in which they are named as_of_date, age_lower, age_upper). The coverage_cohort() function registers fine and executes when called on its own. The tdap() function throws the following error when I try to register it:
AnalysisException: [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.ACCESSING_OUTER_QUERY_COLUMN_IS_NOT_ALLOWED]
Unsupported subquery expression: Accessing outer query column is not allowed in this location
Filter ((isnull(DELETION_DATE#8593) AND (cast(VACC_DATE#8574 as date) <= cast(outer(as_of_date#8401) as date))) AND (isnull(HISTORICAL#8577) OR ((HISTORICAL#8577 = Y) AND (cast(REGISTRY_ENTRY_STAMP#8728 as date) <= cast(VACC_DATE#8574 as date)))))
The line reference in the error points to a WHERE statement in the first CTE of tdap(), but the Filter described in the error corresponds to a WHERE statement in coverage_cohort().
tdap() WHERE statement the error is pointing to:
WHERE vm.DELETION_DATE IS NULL
AND vv.FAMILY_CODE = 24
AND vv.INVALID_REASON_CODE IS NULL
AND vv.DERIVED_DOSE is NULL
AND date(vv.VACC_DATE) BETWEEN date(pm.PAT_BIRTH_DATE) AND date(tdap.asofdate)
AND datediff(YEAR, date(pm.PAT_BIRTH_DATE), date(vv.VACC_DATE)) > 9
coverage_cohort() WHERE statement that is listed in the error:
WHERE vm.DELETION_DATE IS NULL
AND date(vm.VACC_DATE) <= date(coverage_cohort.as_of_date)
AND (
(vm.HISTORICAL IS NULL)
OR
(vm.HISTORICAL = "Y" AND date(pm.REGISTRY_ENTRY_STAMP) <= date(vm.VACC_DATE))
)
In case it matters, coverage_cohort() is called from tdap() once in a FROM statement along with some LEFT JOINS in a later CTE of the query:
SELECT
...
FROM func_schema.coverage_cohort(as_of_date => tdap.asofdate, age_lower => tdap.age_lower, age_upper => tdap.age_upper) wd
LEFT JOIN tdap_cte_a ...
LEFT JOIN tdap_cte_b ...
LEFT JOIN my_db.table ...
I apologize for the lack of a reprex, but I couldn't isolate the problem in a smaller example. I also don't want to share the whole query. As I said, the larger UDF worked before I split it into two, so, it seems like it must be an issue with how I'm trying to refer to parameters between the two queries, but none of the iterations I've tried have changed the error. Thank you!