OCamlLex case-insenstitive

439 Views Asked by At

Is there a way to have case in-sensitive token in Ocamllex specification? I already tried to make case in-sensitive token in this way:

let token = parser
    ...
   | ['C''c']['A''a']['S''s']['E''e'] { CASE }
    ...

but I'm searching for something else, if exists.

2

There are 2 best solutions below

0
On BEST ANSWER

Use an ordinary token lexer that accepts both lower- and upper-case, and look up keywords in a table, ignoring case:

{
type token = Case | Test | Ident of string

let keyword_tbl = Hashtbl.create 64

let _ = List.iter (fun (name, keyword) ->
    Hashtbl.add keyword_tbl name keyword) [
    "case", Case;
    "test", Test;
  ]
}

let ident_char = ['a'-'z' 'A'-'Z' '_']

rule next_token = parse
  | ident_char+ as s {
      let canon = String.lowercase s in
      try Hashtbl.find keyword_tbl canon
      with Not_found ->
        (* `Ident canon` if you want case-insensitive vars as well
         * as keywords *)
        Ident s
    }
0
On

As proposed by @gsg, the correct way to solve this is to use an ordinary token lexer that accepts both lower- and upper-case, and then look up keywords in a table. Having a regexp for each keyword is actually an anti-pattern mentioned by the documentation of ocamllex:

12.7 Common errors

ocamllex: transition table overflow, automaton is too big The deterministic automata generated by ocamllex are limited to at most 32767 transitions. The message above indicates that your lexer definition is too complex and overflows this limit. This is commonly caused by lexer definitions that have separate rules for each of the alphabetic keywords of the language, as in the following example. […]

http://caml.inria.fr/pub/docs/manual-ocaml-4.00/manual026.html#toc111

The documentation continues with an explicit code example and a rewrite using a lookup table.

The lookup-table should encapsulate the “case-insensitivity” of lookups, we should therefore use a functorial associative structure (read: Map or HashTbl) that allows us to define our own comparison function. Since our lookup-table is likely to be immutable, we pick the Map:

{
type token = Case | Test | Ident of string

module KeywordTable =
  Map.Make(struct
    type t = string
    let compare a b =
      String.(compare (lowercase a) (lowercase b))
  end)

let keyword_table =
  List.fold_left
    (fun (k, v) -> KeywordTable.add k v))
    [
      "case", Case;
      "test", Test;
    ]
    KeywordTable.empty
}

let ident_char = ['a'-'z' 'A'-'Z' '_']

rule next_token = parse
  | ident_char+ as s {
      try KeywordTable.find keyword_table s
      with Not_found -> Ident s
    }