Background Information:
I want to create a programming language. I know which tools exist for this, but I don't have any good examples of how to use them. I really don't want to use Flex or Bison, because they don't teach the underlying concepts that I feel are needed for writing a compiler. I understand the general idea: take the source strings, tokenize them, feed the tokens to a file that acts as the grammar and parses them, and eventually end up with an actual program that runs the language. The problem is that I don't know how to write a tokenizer or a parser. I have general ideas, but I understand things better when I can see examples. If someone could post one or a few examples, that would be great!
My question is as follows:
Can someone post examples of how to write a syntax tokenizer/parser in C?
If you want to write a very complex syntax parser in C, without using any existing pattern matching code, it's usually best to implement a state machine and then process source code char by char.
The output of Flex+Bison is also just a state machine. Flex uses regular expressions to split a string into tokens, which are then passed to a Bison state machine that processes one token after another, depending on the current state of the machine. But you don't need a regex tokenizer; you can tokenize the input as part of the state machine processing. A regex matcher can itself be implemented as a state machine, so token generation can be a direct part of your state machine.
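To make the idea of tokenizing inside a state machine more concrete, here is a minimal sketch. The token kinds, the Token struct and the tokenize function are hypothetical names made up for this illustration, and the toy grammar (identifiers, unsigned integers, single-character symbols) is an assumption, not something Flex would generate.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for a toy language (assumption for this sketch). */
typedef enum { TOK_IDENT, TOK_NUMBER, TOK_SYMBOL } TokenKind;

typedef struct {
    TokenKind kind;
    char text[32];              /* token text, truncated after 31 characters */
} Token;

/* The state records which kind of token we are currently inside. */
typedef enum { ST_START, ST_IDENT, ST_NUMBER } LexState;

/* Splits src into at most max tokens, reading one character at a time.
   Returns the number of tokens written to out. */
size_t tokenize(const char *src, Token *out, size_t max)
{
    LexState state = ST_START;
    size_t count = 0, len = 0;
    const char *p;

    assert(src != NULL && out != NULL);

    for (p = src; count < max; p++) {
        unsigned char c = (unsigned char)*p;

        switch (state) {
        case ST_START:
            if (isalpha(c) || c == '_') {          /* identifier begins */
                out[count].kind = TOK_IDENT;
                out[count].text[0] = (char)c;
                len = 1;
                state = ST_IDENT;
            } else if (isdigit(c)) {               /* number begins */
                out[count].kind = TOK_NUMBER;
                out[count].text[0] = (char)c;
                len = 1;
                state = ST_NUMBER;
            } else if (c != '\0' && !isspace(c)) { /* one-character symbol */
                out[count].kind = TOK_SYMBOL;
                out[count].text[0] = (char)c;
                out[count].text[1] = '\0';
                count++;
            }                                      /* whitespace is skipped */
            break;

        case ST_IDENT:
        case ST_NUMBER: {
            int more = (state == ST_IDENT) ? (isalnum(c) || c == '_')
                                           : isdigit(c);
            if (more) {
                if (len < sizeof out[count].text - 1)
                    out[count].text[len++] = (char)c;
            } else {                               /* token ends here */
                out[count].text[len] = '\0';
                count++;
                state = ST_START;
                p--;        /* look at this character again in ST_START */
            }
            break;
        }
        }

        if (c == '\0' && state == ST_START)
            break;                                 /* end of input */
    }
    return count;
}
```

Calling tokenize("x1 = 42+y", tokens, 8) on an array of 8 Token values should yield five tokens: the identifier x1, the symbol =, the number 42, the symbol + and the identifier y. A parser state machine could consume these tokens, or the two machines could be merged into one.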
Here's an interesting link for you; it's not about C in particular, more a general overview of how state machines work, but once you've got the concept, it's easy to translate it into C code:
Parsing command line arguments using a finite state machine and backtracking
Here's some sample code of a super primitive CSV parser:
The code makes the following assumptions:
(that's what CSV implies, but not all CSV files use a comma for that purpose)
(usually this is optional unless they contain spaces or quotation marks)
(this is usually allowed)
This code definitely cannot parse every kind of CSV data you feed it, but when you feed it this file:
It will produce the following output:
And it's only supposed to give you an idea of how to parse complex syntax with a state machine. This code is far from production quality and, as you can see, such a switch quickly grows huge, so I'd at least put the state code into functions or even turn every state into something like a struct or an object for data encapsulation; otherwise this whole thing soon becomes unmanageable.
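As a rough sketch of what "state code in functions" could look like, the following puts each state of a small CSV line parser into its own handler function and dispatches through a table. It is not the sample code referred to above. All names (CsvLine, parse_csv_line, the S_ states) are hypothetical, and the assumptions are my own: fields are separated by commas, quoting is optional, and a quote character inside a field is treated as an error.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FIELDS    8
#define MAX_FIELD_LEN 64

typedef enum { S_FIELD_START, S_UNQUOTED, S_QUOTED, S_AFTER_QUOTE,
               S_ERROR, S_COUNT } StateId;

/* Parser data shared by all state handlers. */
typedef struct {
    char   fields[MAX_FIELDS][MAX_FIELD_LEN];
    size_t nfields;             /* number of completed fields */
    size_t len;                 /* length of the field in progress */
} CsvLine;

/* Appends c to the current field; silently truncates overlong fields. */
static void put_char(CsvLine *l, char c)
{
    if (l->nfields < MAX_FIELDS && l->len < MAX_FIELD_LEN - 1)
        l->fields[l->nfields][l->len++] = c;
}

/* Terminates the current field; fields beyond MAX_FIELDS are dropped. */
static void end_field(CsvLine *l)
{
    if (l->nfields < MAX_FIELDS) {
        l->fields[l->nfields][l->len] = '\0';
        l->nfields++;
    }
    l->len = 0;
}

/* Each handler consumes one character and returns the next state. */
static StateId on_field_start(CsvLine *l, char c)
{
    if (c == '"') return S_QUOTED;
    if (c == ',' || c == '\0') { end_field(l); return S_FIELD_START; }
    put_char(l, c);
    return S_UNQUOTED;
}

static StateId on_unquoted(CsvLine *l, char c)
{
    if (c == ',' || c == '\0') { end_field(l); return S_FIELD_START; }
    if (c == '"') return S_ERROR;        /* quote in an unquoted field */
    put_char(l, c);
    return S_UNQUOTED;
}

static StateId on_quoted(CsvLine *l, char c)
{
    if (c == '"') return S_AFTER_QUOTE;
    if (c == '\0') return S_ERROR;       /* closing quote is missing */
    put_char(l, c);                      /* commas are plain data here */
    return S_QUOTED;
}

static StateId on_after_quote(CsvLine *l, char c)
{
    if (c == ',' || c == '\0') { end_field(l); return S_FIELD_START; }
    return S_ERROR;                      /* text after the closing quote */
}

static StateId on_error(CsvLine *l, char c)
{
    (void)l; (void)c;
    return S_ERROR;
}

typedef StateId (*StateFn)(CsvLine *, char);

/* Dispatch table, in the same order as the StateId enum. */
static const StateFn handlers[S_COUNT] = {
    on_field_start, on_unquoted, on_quoted, on_after_quote, on_error
};

/* Parses one line without a trailing newline.
   Returns the number of fields, or -1 on a syntax error. */
int parse_csv_line(const char *line, CsvLine *out)
{
    StateId state = S_FIELD_START;
    const char *p;

    assert(line != NULL && out != NULL);
    out->nfields = 0;
    out->len = 0;

    for (p = line; ; p++) {
        state = handlers[state](out, *p);   /* the '\0' is fed in as well */
        if (state == S_ERROR) return -1;
        if (*p == '\0') break;
    }
    return (int)out->nfields;
}
```

With this layout the main loop stays a few lines long no matter how many states exist, and adding a state means adding one enum value, one function and one table entry. Moving the per-state data into its own struct would be the next step toward the encapsulation mentioned above.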