ConstraintSuggestionRunner does not resolve column names enclosed in backticks

I am importing a dataset from an Excel sheet that has a column name containing a dot, like this: "abc.xyz".

I went through a couple of Stack Overflow questions, which say that such a column name can be wrapped in backticks, like this: "`abc.xyz`". So I renamed every column whose name contains a dot to the same name enclosed in backticks:

df.columns.foreach(item => {
  if (item.contains(".")) {
    df.withColumnRenamed(item, s"`$item`")
  }
})

Now, when I pass this DataFrame to ConstraintSuggestionRunner like this:

val suggestionResult = ConstraintSuggestionRunner()
      .onData(df)
      .addConstraintRules(Rules.DEFAULT)
      .setKLLParameters(KLLParameters(sketchSize = 2048, shrinkingFactor = 0.64, numberOfBuckets = 10))
      .run()

I get errors like:

ERROR Main: org.apache.spark.sql.AnalysisException: cannot resolve '`abc.xyz`' given input columns:

How can I resolve this error?

There is 1 solution below

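The escaping has to be handled inside Deequ, and the issue for it is still open. Renaming a column to s"`$item`" makes the backticks part of the column name; it does not escape the name. Note also that withColumnRenamed returns a new DataFrame instead of modifying df, so the result of each call in your loop is discarded.

You can try replacing the dots with another character, such as an underscore (_), and then pass the DataFrame with the renamed columns to ConstraintSuggestionRunner: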

val df1 = df.toDF(df.columns.map(_.replaceAll("[.]+", "_")):_*)

val suggestionResult = ConstraintSuggestionRunner()
      .onData(df1)
      .addConstraintRules(Rules.DEFAULT)
      .setKLLParameters(KLLParameters(sketchSize = 2048, shrinkingFactor = 0.64, numberOfBuckets = 10))
      .run()
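
One thing to watch with this rename: two different original names can end up identical after the replacement (for example "a.b" and "a_b"), and the suggested constraints will refer to the new names, not the ones from the Excel sheet. The sketch below is plain Scala with no Spark or Deequ dependency. ColumnNames, sanitize and renameMap are hypothetical names introduced here for illustration; they are not part of either library. It applies the same replacement as above, fails early on a collision, and keeps a lookup from each new name back to its original.

```scala
// Hypothetical helper for illustration; not part of Spark or Deequ.
object ColumnNames {

  // Same replacement as in the answer: each run of dots becomes one underscore.
  def sanitize(name: String): String = name.replaceAll("[.]+", "_")

  // Builds a lookup from sanitized name -> original name.
  // Throws IllegalArgumentException if two columns map to the same sanitized name.
  def renameMap(columns: Seq[String]): Map[String, String] = {
    val pairs = columns.map(c => sanitize(c) -> c)
    val collisions = pairs.groupBy(_._1).collect {
      case (sanitized, group) if group.size > 1 => sanitized
    }
    require(collisions.isEmpty,
      s"Sanitized column names collide: ${collisions.mkString(", ")}")
    pairs.toMap
  }
}
```

Assuming the df from the question, this could be used as val back = ColumnNames.renameMap(df.columns.toSeq) followed by val df1 = df.toDF(df.columns.map(ColumnNames.sanitize): _*). The back map can then be used to translate column names in the suggestions to the original headers.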