I'm facing what seems to be a quite easy problem, but I'm not able to put my head around the problem to find a suitable solution.
Problem:
I need to append the schema into my SQL statement, in a "weird"(with schema in double quotes) way.
FROM "SCHEMA".tableB tableB
LEFT JOIN "SCHEMA".tableC tableC
Context
Basically, we are hosting and exposing a Metabase tool that will connect and perform query on our Hive database using Presto SQL.
Metabase allow the customer to write SQL statements and some customers, they just don't type the schema on statements. Today we are throwing and error for those queries, but I could easily retrieve the schema value from the Authorization header, since in our multi-tenant product the schema is the tenant id where this user is logged, and with that information in hands, I could append to the customer SQL statement and avoid the error.
Imagine that the customer typed the follow statement:
SELECT tableA.*
, (tableA.valorfaturado + tableA.valorcortado) valorpedido
FROM (SELECT from_unixtime(tableB.datacorte / 1000) datacorte
, COALESCE((tableB.quantidadecortada * tableC.preco), 0) valorcortado
, COALESCE((tableB.quantidade * tableC.preco), 0) valorfaturado
, tableB.quantidadecortada
FROM tableB tableB
LEFT JOIN tableC tableC
ON tableC.numeropedido = tableB.numeropedido
AND tableC.codigoproduto = tableB.codigoproduto
AND tableC.codigofilial = tableB.codigofilial
LEFT JOIN tableD tableD
ON tableD.numero = tableB.numeropedido
WHERE (CASE
WHEN COALESCE(tableB.codigofilial, '') = '' THEN
tableD.codigofilial
ELSE
tableB.codigofilial
END) = '10'
AND from_unixtime(tableB.datacorte / 1000) BETWEEN from_iso8601_timestamp('2020-07-01T03:00:00.000Z') AND from_iso8601_timestamp('2020-08-01T02:59:59.999Z')) tableA
ORDER BY datacorte
I should convert this into (adding the "SCHEMA"):
SELECT tableA.*
, (tableA.valorfaturado + tableA.valorcortado) valorpedido
FROM (SELECT from_unixtime(tableB.datacorte / 1000) datacorte
, COALESCE((tableB.quantidadecortada * tableC.preco), 0) valorcortado
, COALESCE((tableB.quantidade * tableC.preco), 0) valorfaturado
, tableB.quantidadecortada
FROM "SCHEMA".tableB tableB
LEFT JOIN "SCHEMA".tableC tableC
ON tableC.numeropedido = tableB.numeropedido
AND tableC.codigoproduto = tableB.codigoproduto
AND tableC.codigofilial = tableB.codigofilial
LEFT JOIN "SCHEMA".tableD tableD
ON tableD.numero = tableB.numeropedido
WHERE (CASE
WHEN COALESCE(tableB.codigofilial, '') = '' THEN
tableD.codigofilial
ELSE
tableB.codigofilial
END) = '10'
AND from_unixtime(tableB.datacorte / 1000) BETWEEN from_iso8601_timestamp('2020-07-01T03:00:00.000Z') AND from_iso8601_timestamp('2020-08-01T02:59:59.999Z')) tableA
ORDER BY datacorte
Still trying to find a solution that uses only presto-parser
and Visitor + Instrumentation solution.
Also, I know about JSQLParser and I tried, but I alway come back to try to find a "plain" solution scared that JSQLParser will not be able to support all the Presto/Hive queries, that are a little bit different than standard SQL;
I create a little project on GitHub with test case to validate..
https://github.com/genyherrera/prestosqlerror
But for those that don't want to clone a repository, here are the classes and dependencies:
import java.util.Optional;
import com.facebook.presto.sql.SqlFormatter;
import com.facebook.presto.sql.parser.ParsingOptions;
import com.facebook.presto.sql.parser.SqlParser;
public class SchemaAwareQueryAdapter {
// Inspired from
// https://github.com/prestodb/presto/tree/master/presto-parser/src/test/java/com/facebook/presto/sql/parser
private static final SqlParser SQL_PARSER = new SqlParser();
public String rewriteSql(String sqlStatement, String schemaId) {
com.facebook.presto.sql.tree.Statement statement = SQL_PARSER.createStatement(sqlStatement, ParsingOptions.builder().build());
SchemaAwareQueryVisitor visitor = new SchemaAwareQueryVisitor(schemaId);
statement.accept(visitor, null);
return SqlFormatter.formatSql(statement, Optional.empty());
}
}
public class SchemaAwareQueryVisitor extends DefaultTraversalVisitor<Void, Void> {
private String schemaId;
public SchemaAwareQueryVisitor(String schemaId) {
super();
this.schemaId = schemaId;
}
/**
* The customer can type:
* [table name]
* [schema].[table name]
* [catalog].[schema].[table name]
*/
@Override
protected Void visitTable(Table node, Void context) {
List<String> parts = node.getName().getParts();
// [table name] -> is the only one we need to modify, so let's check by parts.size() ==1
if (parts.size() == 1) {
try {
Field privateStringField = Table.class.getDeclaredField("name");
privateStringField.setAccessible(true);
QualifiedName qualifiedName = QualifiedName.of("\""+schemaId+"\"",node.getName().getParts().get(0));
privateStringField.set(node, qualifiedName);
} catch (NoSuchFieldException | SecurityException | IllegalArgumentException | IllegalAccessException e) {
throw new SecurityException("Unable to execute query");
}
}
return null;
}
}
import static org.testng.Assert.assertEquals;
import org.gherrera.prestosqlparser.SchemaAwareQueryAdapter;
import org.testng.annotations.Test;
public class SchemaAwareTest {
private static final String schemaId = "SCHEMA";
private SchemaAwareQueryAdapter adapter = new SchemaAwareQueryAdapter();
@Test
public void testAppendSchemaA() {
String sql = "select * from tableA";
String bound = adapter.rewriteSql(sql, schemaId);
assertEqualsFormattingStripped(bound,
"select * from \"SCHEMA\".tableA");
}
private void assertEqualsFormattingStripped(String sql1, String sql2) {
assertEquals(sql1.replace("\n", " ").toLowerCase().replace("\r", " ").replaceAll(" +", " ").trim(),
sql2.replace("\n", " ").toLowerCase().replace("\r", " ").replaceAll(" +", " ").trim());
}
}
<dependencies>
<dependency>
<groupId>com.facebook.presto</groupId>
<artifactId>presto-parser</artifactId>
<version>0.229</version>
</dependency>
<dependency>
<groupId>org.testng</groupId>
<artifactId>testng</artifactId>
<version>6.10</version>
<scope>test</scope>
</dependency>
</dependencies>
PS: I was able to add the schema without the doubles quotes, but them I got into identifiers must not start with a digit; surround the identifier with double quotes
error. Basically this error comes from SqlParser$PostProcessor.exitDigitIdentifier(...)
method..
Thanks
I was able to find a solution for my case, either way will share on Presto Slack my finding to see if that is expected behavior.
So, if you want to append with double quote your schema, you will need to create your own Vistor class and you'll need to override the method
visitTable
and when you Qualify the name of your table with schema, (here's the tick), pass the schema as UPPERCASE, so it will not match the regex pattern on classSqlFormatter
on methodformatName
and it will add the double-quote..