SQL Parser Visitor + Metabase + Presto

681 Views Asked by At

I'm facing what seems to be a quite easy problem, but I'm not able to put my head around the problem to find a suitable solution.

Problem:
I need to append the schema into my SQL statement, in a "weird"(with schema in double quotes) way.

FROM "SCHEMA".tableB tableB
LEFT JOIN "SCHEMA".tableC tableC

Context
Basically, we are hosting and exposing a Metabase tool that will connect and perform query on our Hive database using Presto SQL.

Metabase allow the customer to write SQL statements and some customers, they just don't type the schema on statements. Today we are throwing and error for those queries, but I could easily retrieve the schema value from the Authorization header, since in our multi-tenant product the schema is the tenant id where this user is logged, and with that information in hands, I could append to the customer SQL statement and avoid the error.

Imagine that the customer typed the follow statement:

SELECT tableA.*
     , (tableA.valorfaturado + tableA.valorcortado) valorpedido       
  FROM (SELECT from_unixtime(tableB.datacorte / 1000) datacorte
             , COALESCE((tableB.quantidadecortada * tableC.preco), 0) valorcortado
             , COALESCE((tableB.quantidade * tableC.preco), 0) valorfaturado
             , tableB.quantidadecortada
          FROM tableB tableB
          LEFT JOIN tableC tableC
            ON tableC.numeropedido = tableB.numeropedido
           AND tableC.codigoproduto = tableB.codigoproduto
           AND tableC.codigofilial = tableB.codigofilial
          LEFT JOIN tableD tableD
            ON tableD.numero = tableB.numeropedido
         WHERE (CASE
                  WHEN COALESCE(tableB.codigofilial, '') = '' THEN
                    tableD.codigofilial
                  ELSE
                    tableB.codigofilial
                END) = '10'
           AND from_unixtime(tableB.datacorte / 1000) BETWEEN from_iso8601_timestamp('2020-07-01T03:00:00.000Z') AND from_iso8601_timestamp('2020-08-01T02:59:59.999Z')) tableA
 ORDER BY datacorte

I should convert this into (adding the "SCHEMA"):

SELECT tableA.*
     , (tableA.valorfaturado + tableA.valorcortado) valorpedido       
  FROM (SELECT from_unixtime(tableB.datacorte / 1000) datacorte
             , COALESCE((tableB.quantidadecortada * tableC.preco), 0) valorcortado
             , COALESCE((tableB.quantidade * tableC.preco), 0) valorfaturado
             , tableB.quantidadecortada
          FROM "SCHEMA".tableB tableB
          LEFT JOIN "SCHEMA".tableC tableC
            ON tableC.numeropedido = tableB.numeropedido
           AND tableC.codigoproduto = tableB.codigoproduto
           AND tableC.codigofilial = tableB.codigofilial
          LEFT JOIN "SCHEMA".tableD tableD
            ON tableD.numero = tableB.numeropedido
         WHERE (CASE
                  WHEN COALESCE(tableB.codigofilial, '') = '' THEN
                    tableD.codigofilial
                  ELSE
                    tableB.codigofilial
                END) = '10'
           AND from_unixtime(tableB.datacorte / 1000) BETWEEN from_iso8601_timestamp('2020-07-01T03:00:00.000Z') AND from_iso8601_timestamp('2020-08-01T02:59:59.999Z')) tableA
 ORDER BY datacorte

Still trying to find a solution that uses only presto-parser and Visitor + Instrumentation solution. Also, I know about JSQLParser and I tried, but I alway come back to try to find a "plain" solution scared that JSQLParser will not be able to support all the Presto/Hive queries, that are a little bit different than standard SQL;

I create a little project on GitHub with test case to validate..

https://github.com/genyherrera/prestosqlerror

But for those that don't want to clone a repository, here are the classes and dependencies:

import java.util.Optional;

import com.facebook.presto.sql.SqlFormatter;
import com.facebook.presto.sql.parser.ParsingOptions;
import com.facebook.presto.sql.parser.SqlParser;

public class SchemaAwareQueryAdapter {
    // Inspired from
    // https://github.com/prestodb/presto/tree/master/presto-parser/src/test/java/com/facebook/presto/sql/parser

    private static final SqlParser SQL_PARSER = new SqlParser();

    public String rewriteSql(String sqlStatement, String schemaId) {
        com.facebook.presto.sql.tree.Statement statement = SQL_PARSER.createStatement(sqlStatement, ParsingOptions.builder().build());
        SchemaAwareQueryVisitor visitor = new SchemaAwareQueryVisitor(schemaId);
        statement.accept(visitor, null);
        return SqlFormatter.formatSql(statement, Optional.empty());
    }
}
public class SchemaAwareQueryVisitor extends DefaultTraversalVisitor<Void, Void> {
    private String schemaId;

    public SchemaAwareQueryVisitor(String schemaId) {
        super();
        this.schemaId = schemaId;
    }

    /**
     * The customer can type:
     * [table name]
     * [schema].[table name]
     * [catalog].[schema].[table name]
     */
    @Override
    protected Void visitTable(Table node, Void context) {
        List<String> parts = node.getName().getParts();
        // [table name] -> is the only one we need to modify, so let's check by parts.size() ==1
        if (parts.size() == 1) {
            try {
                Field privateStringField = Table.class.getDeclaredField("name");
                privateStringField.setAccessible(true);
                QualifiedName qualifiedName = QualifiedName.of("\""+schemaId+"\"",node.getName().getParts().get(0));
                privateStringField.set(node, qualifiedName);
            } catch (NoSuchFieldException | SecurityException | IllegalArgumentException | IllegalAccessException e) {
                throw new SecurityException("Unable to execute query");
            }
        }
        return null;
        
    }
}
import static org.testng.Assert.assertEquals;

import org.gherrera.prestosqlparser.SchemaAwareQueryAdapter;
import org.testng.annotations.Test;

public class SchemaAwareTest {
        private static final String schemaId = "SCHEMA";
        private SchemaAwareQueryAdapter adapter = new SchemaAwareQueryAdapter();

        @Test
        public void testAppendSchemaA() {
            String sql = "select * from tableA";
            String bound = adapter.rewriteSql(sql, schemaId);
            assertEqualsFormattingStripped(bound,
                         "select * from \"SCHEMA\".tableA");
        }
        
        private void assertEqualsFormattingStripped(String sql1, String sql2) {
            
            assertEquals(sql1.replace("\n", " ").toLowerCase().replace("\r", " ").replaceAll(" +", " ").trim(),
                         sql2.replace("\n", " ").toLowerCase().replace("\r", " ").replaceAll(" +", " ").trim());
            
        }
}
<dependencies>
        <dependency>
            <groupId>com.facebook.presto</groupId>
            <artifactId>presto-parser</artifactId>
            <version>0.229</version>
        </dependency>
        <dependency>
            <groupId>org.testng</groupId>
            <artifactId>testng</artifactId>
            <version>6.10</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

PS: I was able to add the schema without the doubles quotes, but them I got into identifiers must not start with a digit; surround the identifier with double quotes error. Basically this error comes from SqlParser$PostProcessor.exitDigitIdentifier(...) method..

Thanks

1

There are 1 best solutions below

0
On

I was able to find a solution for my case, either way will share on Presto Slack my finding to see if that is expected behavior.

So, if you want to append with double quote your schema, you will need to create your own Vistor class and you'll need to override the method visitTable and when you Qualify the name of your table with schema, (here's the tick), pass the schema as UPPERCASE, so it will not match the regex pattern on class SqlFormatter on method formatName and it will add the double-quote..

public class SchemaAwareQueryVisitor extends DefaultTraversalVisitor<Void, Void> {
  private String schemaId;

  public SchemaAwareQueryVisitor(String schemaId) {
    super();
    this.schemaId = schemaId;
  }

  @Override
  protected Void visitTable(Table node, Void context) {
      try {
        Field privateStringField = Table.class.getDeclaredField("name");
        privateStringField.setAccessible(true);
        QualifiedName qualifiedName = QualifiedName.of(schemaId, node.getName().getParts().get(0));
        privateStringField.set(node, qualifiedName);
      } catch (NoSuchFieldException
          | SecurityException
          | IllegalArgumentException
          | IllegalAccessException e) {
        throw new SecurityException("Unable to execute query");
      }
    return null;
  }
}