Which is the key and which is the item for a keywords hash table?


I need to implement a lexical analyzer, and I need a data structure to store the keywords. I was advised to use a hash table, and one suggestion was the C# Hashtable from System.Collections. But I have a problem: this hash table needs both a key and an item, and I only have the keyword. Should I use the keyword as the key, as the item, or as both? And since the keywords are all distinct, could I use another data structure instead, for example a binary tree? What I really want to know is how compilers handle this.

Best answer

In general, keywords have only syntactic value, so in most compilers they are only used to select an appropriate grammatical rule. Their "value", as such, is consumed immediately. Since their identity is the only useful information, it is probably more appropriate to use a HashSet than a HashMap.
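As a minimal sketch of the set-based approach (written in Python for brevity, where frozenset plays the role of a HashSet; the keyword list and the classify helper are hypothetical, for a toy language), the keyword itself is the only thing stored and the lookup is a plain membership test:

```python
# Hypothetical keyword list for a toy language; not taken from any real compiler.
KEYWORDS = frozenset({"if", "else", "while", "return"})


def classify(lexeme: str) -> str:
    """Return "KEYWORD" if the lexeme is reserved, otherwise "IDENTIFIER"."""
    return "KEYWORD" if lexeme in KEYWORDS else "IDENTIFIER"
```

With a key/item table such as Hashtable, the same idea means using the keyword as the key; the item can be anything, since it is never read.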

However, there might be a set of keywords which are syntactically identical, forming what is effectively an enumeration type. In such cases, the enumeration value could be the value associated with the keyword.
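A sketch of that case, assuming a hypothetical toy language whose type keywords are syntactically interchangeable (PrimitiveType, TYPE_KEYWORDS and lookup_type_keyword are illustrative names): here a map is the natural fit, with the keyword as the key and the enumeration value as the item.

```python
from enum import Enum, auto


class PrimitiveType(Enum):
    """Hypothetical enumeration shared by a family of type keywords."""
    INT = auto()
    FLOAT = auto()
    BOOL = auto()


# The keyword is the key; the enumeration value is the associated item.
TYPE_KEYWORDS = {
    "int": PrimitiveType.INT,
    "float": PrimitiveType.FLOAT,
    "bool": PrimitiveType.BOOL,
}


def lookup_type_keyword(lexeme):
    """Return the PrimitiveType for a type keyword, or None for any other word."""
    return TYPE_KEYWORDS.get(lexeme)
```

The parser can then use one grammar rule for all three keywords and read the enumeration value to tell them apart.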

For a hand-built lexical analyzer, a hash set or a similar data structure is a simple option, but most compilers actually compile the keywords, together with the other lexical token patterns, into a finite state automaton. This lets the scanner recognize keywords during the lexical scan itself, without any external data structure.
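To illustrate the automaton idea, here is a hand-written sketch for a hypothetical language with just two keywords, "if" and "int". A scanner generator would normally produce such a table from the token patterns; the state names, KEYWORD_EDGES, ACCEPT and scan_word are all assumptions made for this example. The keyword spellings are edges in the same automaton that recognizes identifiers, so no separate lookup is needed:

```python
# States of the automaton. IDENT is the catch-all "ordinary identifier" state.
START, I, IF, IN, INT, IDENT = range(6)

# Explicit transitions that spell out the keywords "if" and "int".
KEYWORD_EDGES = {
    (START, "i"): I,
    (I, "f"): IF,
    (I, "n"): IN,
    (IN, "t"): INT,
}

# Token kind reported when the scan stops in each state.
ACCEPT = {I: "IDENT", IF: "KW_IF", IN: "IDENT", INT: "KW_INT", IDENT: "IDENT"}


def scan_word(text, pos=0):
    """Scan one word starting at pos; return (token_kind, lexeme) or None."""
    state = START
    end = pos
    while end < len(text) and (text[end].isalnum() or text[end] == "_"):
        ch = text[end]
        if state == START and ch.isdigit():
            break  # a word may not begin with a digit
        # Any word character without a keyword edge falls into IDENT.
        state = KEYWORD_EDGES.get((state, ch), IDENT)
        end += 1
    if state == START:
        return None  # no word at this position
    return ACCEPT[state], text[pos:end]
```

For instance, scan_word("if") yields ("KW_IF", "if"), while scan_word("iffy") yields ("IDENT", "iffy"), because the extra characters move the automaton out of the keyword state.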

Regardless, in almost all languages the set of keywords is fixed, so it is most appropriate to use an efficient data structure compiled into the lexical scanner. For example, instead of a binary tree, it would be reasonable to use a sorted static array of strings that can be binary searched. Alternatively, a preconstructed trie could be used; this would be almost equivalent to the finite state automaton referred to above.
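Both alternatives can be sketched briefly, again with a hypothetical keyword list and illustrative names (SORTED_KEYWORDS, is_keyword, build_trie, trie_contains). In a real scanner the table or trie would be built once, ahead of time, rather than at each run:

```python
from bisect import bisect_left

# Hypothetical keyword list; it must stay sorted for binary search to work.
SORTED_KEYWORDS = ("break", "else", "if", "return", "while")


def is_keyword(lexeme):
    """Binary-search the sorted static table for the lexeme."""
    i = bisect_left(SORTED_KEYWORDS, lexeme)
    return i < len(SORTED_KEYWORDS) and SORTED_KEYWORDS[i] == lexeme


def build_trie(words):
    """Build a trie as nested dicts; the empty-string key marks end of word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True
    return root


KEYWORD_TRIE = build_trie(SORTED_KEYWORDS)


def trie_contains(trie, lexeme):
    """Walk the trie one character at a time, as an automaton would."""
    node = trie
    for ch in lexeme:
        node = node.get(ch)
        if node is None:
            return False
    return "" in node
```

The trie walk consumes one character per step and stops as soon as no edge matches, which is why it behaves much like the automaton described earlier.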