How to do a case-insensitive query in tree-sitter

I'm trying to create a tree-sitter grammar and use it in a language server I am implementing, in order to support features like finding all references to a variable. Given the grammar, I can write a query that finds all references to a variable with a specific name (e.g. myVar). However, the language I am writing the language server for has case-insensitive variables (e.g. myVar can be referenced as MYVAR, MyVaR, myvar, etc.).

How would I be able to write a tree-sitter query to match a pattern where a token must case-insensitively match a particular string?

I could write the query to not filter by the variable name and implement my own filtering of the results, but I was wondering if there was a way to handle this within the query itself rather than implementing custom filtering code.
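For reference, the custom filtering I would like to avoid looks roughly like the sketch below. The capture shape ({ name, text }) and the function name filterByNameIgnoreCase are assumptions made for illustration, not the API of any particular tree-sitter binding.

```javascript
// Hypothetical capture shape: { name: string, text: string }.
// A real binding may expose the matched text differently.
function filterByNameIgnoreCase(captures, target) {
  const wanted = target.toLowerCase();
  // Keep only captures whose text equals the target, ignoring case.
  return captures.filter((c) => c.text.toLowerCase() === wanted);
}

// Stand-in for the results of an unfiltered query over set_testing.txt.
const captures = [
  { name: "variable", text: "myVar" },
  { name: "variable", text: "MYVAR" },
  { name: "variable", text: "myVar2" },
  { name: "variable", text: "MyVaR" },
];

console.log(filterByNameIgnoreCase(captures, "myVar").map((c) => c.text));
// prints [ 'myVar', 'MYVAR', 'MyVaR' ]
```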

Example

Here is a simplified example case to show what I mean.

Given the following grammar, I want to query for all of the set_statements that set a new value to the variable myVar.

module.exports = grammar({
  name: 'mylang',

  rules: {
    source_file: $ => repeat($._statement),
    _statement: $ => choice(
      $.set_statement,
    ),
    set_statement: $ => seq(
      'set',
      field("variable", $.identifier),
      field("value", $._expression),
    ),
    _expression: $ => choice(
      $.integer_literal
    ),

    identifier: $ => /[a-zA-Z0-9]+/,
    integer_literal: $ => /[0-9]+/,
  }
});

Normally I would be able to do this with a query like the following.

(
    (set_statement
        variable: (identifier) @variable)
    (#eq? @variable "myVar")
)

However, as we can see with the following example of running the query, this only picks up on the references to myVar that use the same casing as the query.

$ cat set_testing.txt 
set myVar 0
set MYVAR 23
set myVar2 72
set MyVaR 14
$ tree-sitter query find_variable.query set_testing.txt
set_testing.txt
  pattern: 0
    capture: variable, start: (0, 4), text: "myVar"

I want to create a query that would instead find:

$ tree-sitter query find_variable.query set_testing.txt
set_testing.txt
  pattern: 0
    capture: variable, start: (0, 4), text: "myVar"
  pattern: 0
    capture: variable, start: (1, 4), text: "MYVAR"
  pattern: 0
    capture: variable, start: (3, 4), text: "MyVaR"
Best answer

Use the #match? predicate instead of #eq?, with a regular expression that accepts either case for each letter of the identifier, in this case myVar.

Change find_variable.query to:

(
    (set_statement
        variable: (identifier) @variable)
    (#match? @variable "^[mM][yY][vV][aA][rR]$")
)

Now running tree-sitter query find_variable.query set_testing.txt returns:

set_testing.txt
  pattern: 0
    capture: variable, start: (0, 4), text: "myVar"
  pattern: 0
    capture: variable, start: (1, 4), text: "MYVAR"
  pattern: 0
    capture: variable, start: (3, 4), text: "MyVaR"

Tree-sitter does not support case-insensitive regular expression searches (see Issue #261), so each letter has to be written as its own character class, which makes the regular expression a little longer.
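Writing the character classes by hand gets tedious for longer names, so the pattern can be generated. The following is a minimal sketch, assuming identifiers match the grammar's /[a-zA-Z0-9]+/ rule. The helper name caseInsensitivePattern is hypothetical and not part of tree-sitter.

```javascript
// Hypothetical helper (not part of tree-sitter): builds a pattern for
// #match? that accepts any casing of `name`.
function caseInsensitivePattern(name) {
  // Assumption: identifiers follow the grammar's /[a-zA-Z0-9]+/ rule,
  // so no regex metacharacters need escaping.
  if (!/^[a-zA-Z0-9]+$/.test(name)) {
    throw new Error(`not a valid identifier: ${name}`);
  }
  const body = Array.from(name, (ch) => {
    const lower = ch.toLowerCase();
    const upper = ch.toUpperCase();
    // Digits have no case, so they are emitted unchanged.
    return lower === upper ? ch : `[${lower}${upper}]`;
  }).join("");
  // Anchor both ends so that myVar2 does not match a search for myVar.
  return `^${body}$`;
}

console.log(caseInsensitivePattern("myVar"));
// prints ^[mM][yY][vV][aA][rR]$
```

The resulting string can be substituted into the (#match? @variable "...") predicate when the language server builds the query text.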