Find token's position in an SQL query using JSqlParser

1.1k Views Asked by At

I'm trying to wrap my head around finding the position (line is enough) of an SQL object (token?) in an SQL query, using JSqlParser.

For example, in the query:

SELECT a
FROM table
WHERE a = 1 AND
      b = 2 AND
      c = 3

When looking at the expression b = 2, I would like to be able to know that it's in line 4. Or, when looking at the SelectItem "a", I would like to know it's in line 1.

Didn't see anything in the documentation nor in the objects themselves regarding their positions. If this is not currently possible, can you please suggest a good approach to adding this capability on my own (easiest and quickest in terms of performance).

1

There are 1 best solutions below

10
On BEST ANSWER

Without a patched version of JSqlParser, you can't. At the moment JSqlParser V1.2-SNAPSHOT does provide you with AST - node information for some specific parsed tokens, but not for all. Using the following code snipped:

public static void main(String[] args) throws JSQLParserException {
    String sql = "SELECT a\n"
            + "FROM table\n"
            + "WHERE a = 1 AND\n"
            + "      b = 2 AND\n"
            + "      c = 3";

    SimpleNode node = (SimpleNode) CCJSqlParserUtil.parseAST(sql);

    node.jjtAccept(new CCJSqlParserDefaultVisitor() {
        @Override
        public Object visit(SimpleNode node, Object data) {
            System.out.println(node.toString() + " firstToken=" + node.jjtGetFirstToken().image 
                    + " line=" + node.jjtGetFirstToken().beginLine);
            return super.visit(node, data);
        }
    }, null);
}

You can get informations about all saved AST nodes. Exploring the interface of SimpleNode you also the absolute position and range of your token within your source sql string.

This "limitation" comes from JSqlParsers feature driven development and the massive performance impact on generating all AST-Nodes. Additionally some AST nodes are linked with JSQlParsers parse objects like Column, Table, etc. which is not done automatically using JavaCC but has to be done within the grammar definition.

At first I mentioned a patched JSqlParser. This is indeed very easy to do by modifiing the pom.xml and setting

<nodeDefaultVoid>true</nodeDefaultVoid>

to false. Then for all productions AST nodes are generated. But you will see, it results in a huge tree, even for your simple statement.

If you need this information for some specific productions like your EqualTo objects, this could be integrated via a feature request for JSqlParser (github).

Why those two kind of trees?

There is a difference between the parse tree and the abstract syntax tree in JSqlParser. For objects extended from ASTNodeAccessImpl there is a link between those two. The delivered parse tree objects are a more practical approach to the parsed part of your sql. E.g you are able to set a Table object into a Column object to reference the column from this table. Using a AST approach this would result in constructing more nodes to construct a point, identifier, etc.

The AST nodes have the information you need, the exact position of a token within your original sql text. If you look at a Function object, you can use the method getASTNode to get the corresponding AST - node to get the exact position within your sql. But as I wrote, this is only implemented for a few parse tree objects. If you need specific one to get this information, I will happily extend JSqlParser for it, but keep in mind, that getting all of these nodes has a huge performance and memory impact.