How to validate if parts of a string are in a list of accepted values?

62 Views Asked by At

I have to validate a string that is separated by dashes into 3 parts like so:

'substring1-substring2-substring3'

Substrings 2 and 3 need to be validated to be within a list of accepted values for substring 2 and another list of accepted values for substring 3.

The issue I'm having is that substring 2 and substring 3 have values that contain dashes, so breaking apart the main string to access and validate substring 2 and 3 has become pretty difficult, and I'm not sure how else to tackle this.

I'm doing this in Databricks SQL.

So far, I've used the LIKE operator to select '% - % - %' which just validates that there are two dashes.

I've tried splitting the string by the "-" but because there are dashes in both substrings 2 and 3, I can't figure out a way to select for either of them.

1

There are 1 best solutions below

0
On

This is how you can solve it and not a fully working solution, but I have no doubt it will work. I am assuming you can loop through the list of acceptable values for substring 2 and through the list of acceptable values for string 3 and concatenate them by | Alternatively, you can have two acceptable_two and acceptable_three tables that you can aggregate in a string separated by | By the end of this concatenation, you will have two strings looking like below:

alternative_two = acceptable_two_1|acceptable_two_2|...|acceptable_two_n
alternative_three = acceptable_tree_1|acceptable_three_2|...|acceptable_three_m

Then you can put together a regex like this:

regex = .*-[alternative_two]-[aternative_three]

Once you generated the regex validating will consist in only calling (or whatever the syntax might be):

my_string.matches(regex)

Note I did not give you a working solution as I don't have a databricks-sql running environment, but it should be no problem for you to implement.