Conflicts When Trying to Prevent SQL Clauses from Entering the Cypher Parser


I am working on a project that adds support for writing Cypher queries directly in psql for Apache AGE. Currently, to run a Cypher query with Apache AGE, we have to wrap it in a SQL query. For example:

SELECT * FROM cypher('graph_name', $$
MATCH (v)
RETURN v
$$) as (v agtype);

With the new support, typing MATCH (v) RETURN v; alone should produce the same result. To achieve this, we implemented the HandleCypherCmds function and call it from psql's mainloop.c, inside the PSCAN_SEMICOLON condition.

Here is the relevant code:

/*
 * Send command if semicolon found, or if end of line and we're in
 * single-line mode.
 */
if (scan_result == PSCAN_SEMICOLON ||
    (scan_result == PSCAN_EOL && pset.singleline))
{
    /*
     * Save line in history.  We use history_buf to accumulate
     * multi-line queries into a single history entry.  Note that
     * history accumulation works on input lines, so it doesn't
     * matter whether the query will be ignored due to \if.
     */
    if (pset.cur_cmd_interactive && !line_saved_in_history)
    {
        pg_append_history(line, history_buf);
        pg_send_history(history_buf);
        line_saved_in_history = true;
    }

    /* execute query unless we're in an inactive \if branch */
    if (conditional_active(cond_stack))
    {
        /* route Cypher commands, detected by keyword prefix (temporary) */
        if (pg_strncasecmp(query_buf->data, "MATCH", 5) == 0 ||
                pg_strncasecmp(query_buf->data, "OPTIONAL", 8) == 0 ||
                pg_strncasecmp(query_buf->data, "EXPLAIN", 7) == 0 ||
                pg_strncasecmp(query_buf->data, "CREATE", 6) == 0)
        {
            cypherCmdStatus = HandleCypherCmds(scan_state,
                                cond_stack,
                                query_buf,
                                previous_buf);

            success = cypherCmdStatus != PSQL_CMD_ERROR;

            if (cypherCmdStatus == PSQL_CMD_SEND)
            {
                success = SendQuery(convert_to_psql_command(query_buf->data));
            }
        }
        else
            success = SendQuery(query_buf->data);

        slashCmdStatus = success ? PSQL_CMD_SEND : PSQL_CMD_ERROR;
        pset.stmt_lineno = 1;

        /* transfer query to previous_buf by pointer-swapping */
        {
            PQExpBuffer swap_buf = previous_buf;

            previous_buf = query_buf;
            query_buf = swap_buf;
        }
        resetPQExpBuffer(query_buf);

        added_nl_pos = -1;
        /* we need not do psql_scan_reset() here */
    }
    else
    {
        /* if interactive, warn about non-executed query */
        if (pset.cur_cmd_interactive)
            pg_log_error("query ignored; use \\endif or Ctrl-C to exit current \\if block");
        /* fake an OK result for purposes of loop checks */
        success = true;
        slashCmdStatus = PSQL_CMD_SEND;
        pset.stmt_lineno = 1;
        /* note that query_buf doesn't change state */
    }
}

Currently, the code uses temporary constraints (the keyword-prefix checks above) to keep SQL clauses out of the Cypher parser, because SQL input there produces syntax errors. These constraints are not practical to maintain: they only work if the user writes the Cypher clause correctly, and keywords such as CREATE and EXPLAIN also start ordinary SQL statements. I tried using the parser variables instead, but they are only set once the query has entered the Cypher parser, which results in the same errors.

I have been unable to find a solution to this problem. Could someone please assist me in implementing this feature?

There are 5 best solutions below

0
On BEST ANSWER

This problem is now solved. In case someone is interested in the answer, this is how we resolved the issue:

All SQL and Cypher clauses enter the Cypher parser. In the parser, we have boolean variables that help to differentiate between SQL and Cypher clauses. The following example shows the rules for the DROP clause:

drop_clause:
    DROP GRAPH if_exists_opt IDENTIFIER cascade_opt { graph_name = $4; drop_graph = true; }
    | DROP VLABEL IDENTIFIER cascade_opt { label_name = $3; drop_label = true; }
    | DROP ELABEL IDENTIFIER cascade_opt { label_name = $3; drop_label = true; }
    | DROP CONSTRAINT identifier_opt ON IDENTIFIER assert_patt_opt
    ;

The variables (e.g. drop_graph) become true only if the query matches a Cypher rule, for example DROP GRAPH graph_name. If the query is a SQL statement, such as DROP TABLE table_name, the variables remain false, the parser returns false, and the statement continues to the SQL parser. The complete code is in the AgeSQL repository, in the files mainloop.c and cypher.y.
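To illustrate the routing idea only, here is a minimal sketch in plain C. It is not the AgeSQL code: a hand-written toy recognizer stands in for the generated drop_clause rule, and the names CypherFlags, recognize_drop and is_cypher_query are made up for this example. It also ignores the DROP CONSTRAINT alternative and does not validate optional parts such as IF EXISTS or CASCADE.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the parser's boolean variables. */
typedef struct {
    bool drop_graph;
    bool drop_label;
} CypherFlags;

/* Case-insensitive comparison of a whole token against a keyword. */
static bool token_is(const char *tok, const char *kw)
{
    if (tok == NULL)
        return false;
    while (*tok != '\0' && *kw != '\0') {
        if (toupper((unsigned char)*tok) != toupper((unsigned char)*kw))
            return false;
        tok++;
        kw++;
    }
    return *tok == '\0' && *kw == '\0';
}

/*
 * Toy recognizer standing in for the grammar's drop_clause rule.
 * A flag is set only when the input has a Cypher-side form:
 * DROP GRAPH <name>, DROP VLABEL <name> or DROP ELABEL <name>.
 */
void recognize_drop(const char *query, CypherFlags *flags)
{
    char buf[256];
    char *first, *second, *third;

    flags->drop_graph = false;
    flags->drop_label = false;

    /* strtok modifies its input, so work on a bounded copy. */
    strncpy(buf, query, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    first = strtok(buf, " \t\n;");
    second = strtok(NULL, " \t\n;");
    third = strtok(NULL, " \t\n;");

    if (!token_is(first, "DROP") || third == NULL)
        return;
    if (token_is(second, "GRAPH"))
        flags->drop_graph = true;
    else if (token_is(second, "VLABEL") || token_is(second, "ELABEL"))
        flags->drop_label = true;
}

/*
 * Returns true when the Cypher side claims the query. A false result
 * means no flag was set, so the caller falls through to the SQL path.
 */
bool is_cypher_query(const char *query)
{
    CypherFlags flags;

    recognize_drop(query, &flags);
    return flags.drop_graph || flags.drop_label;
}
```

In the main loop, a caller would send the query down the Cypher path when is_cypher_query returns true and otherwise pass the untouched buffer to SendQuery, as the original else branch does.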

1
On

First, introduce logic that detects whether the query is a Cypher query. If it is, separate it from the SQL part and pass the two parts to their parsers separately.

0
On
  • You can use a placeholder or some other marker syntax to identify Cypher queries embedded in SQL queries. While parsing the SQL query, look for that marker and extract the Cypher queries it encloses.

  • In the second step, execute the extracted Cypher queries separately and substitute their results into the SQL query at the positions where the Cypher queries appeared.
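The extraction step could look roughly like the sketch below. The `<cypher>` and `</cypher>` delimiters are invented for this example (neither psql nor Apache AGE defines them), and the function name extract_cypher is hypothetical as well. It only pulls out the first marked fragment and does not notice markers that appear inside string literals.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical delimiters; not part of Apache AGE or psql. */
#define CYPHER_OPEN  "<cypher>"
#define CYPHER_CLOSE "</cypher>"

/*
 * Copy the first marked Cypher fragment of `input` into `out`.
 * Returns false if no complete marker pair is found or if `out`
 * is too small to hold the fragment plus its terminating NUL.
 */
bool extract_cypher(const char *input, char *out, size_t out_size)
{
    const char *start = strstr(input, CYPHER_OPEN);
    const char *end;
    size_t len;

    if (start == NULL)
        return false;
    start += strlen(CYPHER_OPEN);

    end = strstr(start, CYPHER_CLOSE);
    if (end == NULL)
        return false;

    len = (size_t)(end - start);
    if (len + 1 > out_size)
        return false;

    memcpy(out, start, len);
    out[len] = '\0';
    return true;
}
```

The substitution step (putting the Cypher result back into the SQL text) is left out, because how it would work depends on details the answer does not specify.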

0
On

One approach that comes to mind is pre-processing the query. Instead of sending the whole query to the Cypher parser, add a pre-processing step that separates the SQL parts from the Cypher parts using regular expressions or a custom function.
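As a rough idea of what such a custom function might look like, here is a small classifier in plain C. All names (QueryRoute, classify_query, and the helpers) are made up for this sketch. It assumes that a Cypher CREATE is always followed by an opening parenthesis, which is a heuristic rather than a rule: it misclassifies forms such as CREATE p = (a)-[:KNOWS]->(b), and it does not handle leading comments or EXPLAIN.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { ROUTE_SQL, ROUTE_CYPHER } QueryRoute;

/* Skip leading whitespace. */
static const char *skip_ws(const char *p)
{
    while (isspace((unsigned char)*p))
        p++;
    return p;
}

/*
 * If `p` starts with the upper-case keyword `kw` (compared
 * case-insensitively) and the keyword ends at a word boundary,
 * return a pointer just past it; otherwise return NULL.
 */
static const char *match_keyword(const char *p, const char *kw)
{
    size_t i;

    for (i = 0; kw[i] != '\0'; i++) {
        if (toupper((unsigned char)p[i]) != kw[i])
            return NULL;
    }
    /* Reject longer identifiers such as MATCHES or CREATED. */
    if (isalnum((unsigned char)p[i]) || p[i] == '_')
        return NULL;
    return p + i;
}

/* Decide which parser should receive the query (heuristic). */
QueryRoute classify_query(const char *query)
{
    const char *p = skip_ws(query);
    const char *rest;

    if (match_keyword(p, "MATCH") != NULL)
        return ROUTE_CYPHER;

    /* OPTIONAL counts as Cypher only when MATCH follows it. */
    rest = match_keyword(p, "OPTIONAL");
    if (rest != NULL)
        return match_keyword(skip_ws(rest), "MATCH") != NULL
                   ? ROUTE_CYPHER : ROUTE_SQL;

    /*
     * CREATE is ambiguous. Assumption: "CREATE (" starts a Cypher
     * pattern, anything else (CREATE TABLE ...) is SQL.
     */
    rest = match_keyword(p, "CREATE");
    if (rest != NULL)
        return *skip_ws(rest) == '(' ? ROUTE_CYPHER : ROUTE_SQL;

    return ROUTE_SQL;
}
```

Compared with a bare prefix check, looking at the token after the keyword keeps CREATE TABLE away from the Cypher parser, and the word-boundary test stops identifiers that merely begin with MATCH from being routed there.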

0
On

There are two approaches that could resolve this conflict.

  1. Use a pre-processing step to separate the SQL clauses from the Cypher clauses. The pre-processing step would identify the SQL clauses and remove them from the query. The remaining query would then be parsed by the Cypher parser.

  2. Use a placeholder to recognize the Cypher queries in SQL queries. The placeholder would be a special keyword or symbol indicating that the query that follows is a Cypher query. The parser would then treat that query as Cypher rather than SQL.

Note: The best approach depends on the specific application. If the application only uses Cypher queries, the pre-processing step is the simplest approach. If the application uses both SQL and Cypher queries, the placeholder approach may be more appropriate.