I am creating a lexical analyser for a basic language of my own design. It must read text input and return one token each time it is called. I would like it to distinguish between identifiers, constants, etc., based on a list that I define in advance.
I need to read the text file using an input stream. A while loop will go through the chars one at a time, but I need it to recognise whether the chars scanned form an identifier or an operator such as '+', '-', '*', '/', etc. What would be the best way to do this?
I am fairly new to programming, so any advice on how to construct this would be appreciated. Many thanks for any answers.
The StreamTokenizer class will probably help you out the most. It will read and distinguish between identifiers, numbers, and strings. You can also configure it to identify operators, such as +, *, etc.
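As a rough sketch of how that might look (not a definitive implementation): the class name `LexerSketch`, the `tokenize` helper, the `KEYWORDS` set, and the token labels are all made up for illustration. Only `StreamTokenizer` and its members come from the JDK. Note that by default `StreamTokenizer` treats `/` as a comment character and `-` as part of a number, so this sketch resets both with `ordinaryChar` to get them back as operators.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class LexerSketch {
    // Hypothetical pre-determined keyword list; replace with your language's own.
    static final Set<String> KEYWORDS = Set.of("if", "while");

    // Reads every token from the input and returns a labelled list.
    static List<String> tokenize(Reader in) throws IOException {
        StreamTokenizer st = new StreamTokenizer(in);
        st.ordinaryChar('/'); // '/' is a comment character by default
        st.ordinaryChar('-'); // '-' is part of a number by default

        List<String> tokens = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            switch (st.ttype) {
                case StreamTokenizer.TT_WORD:
                    // Words are keywords if they are in the list, identifiers otherwise.
                    String kind = KEYWORDS.contains(st.sval) ? "KEYWORD" : "IDENT";
                    tokens.add(kind + "(" + st.sval + ")");
                    break;
                case StreamTokenizer.TT_NUMBER:
                    // Numbers are always delivered as a double in nval.
                    tokens.add("NUM(" + st.nval + ")");
                    break;
                case '"':
                case '\'':
                    // For quoted strings, ttype is the quote char and sval the body.
                    tokens.add("STR(" + st.sval + ")");
                    break;
                default:
                    // Any other single character, e.g. + - * / =
                    tokens.add("OP(" + (char) st.ttype + ")");
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // For a file, pass e.g. a BufferedReader wrapped around a FileReader.
        System.out.println(tokenize(new StringReader("while x = y + 42 * z / 2 - 1")));
    }
}
```

To return one token per call, as in the question, you could keep the `StreamTokenizer` in a field and move the body of the loop into a method that calls `nextToken()` once. Since every number comes back as a `double`, you would need your own check if the language has to tell integer constants apart from decimal ones.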