Is Ragel not designed to be used with files?


I'm looking into Ragel and can't figure out a reasonable way to read input from a file. As far as I understand, it requires a memory buffer that is never split in the middle of a token. That is a lot of work to implement, especially when I don't know the token sizes in advance (e.g. strings containing newlines, escapes, etc.). If I have to implement all of that myself, I'm not sure I still need Ragel.

Is there no better way?

If you map the file into memory (mmap on POSIX, CreateFileMapping on Windows), the entire file is available as one contiguous block of memory, so no token can be split across buffer boundaries.
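A minimal POSIX sketch of the mapping approach in C++ (the language of the examples linked further down). The `MappedFile` class is hypothetical; it only wraps the standard `open`, `fstat`, `mmap` and `munmap` calls. A Windows version would use `CreateFileMapping` and `MapViewOfFile` instead.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical RAII wrapper: maps a whole file read-only (POSIX only).
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("open failed: " + path);

        struct stat st;
        if (::fstat(fd, &st) != 0) {
            ::close(fd);
            throw std::runtime_error("fstat failed: " + path);
        }
        size_ = static_cast<std::size_t>(st.st_size);

        // mmap rejects a zero length, so an empty file is left unmapped.
        if (size_ > 0) {
            void* addr = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (addr == MAP_FAILED) {
                ::close(fd);
                throw std::runtime_error("mmap failed: " + path);
            }
            data_ = static_cast<const char*>(addr);
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }

    ~MappedFile() {
        if (data_ != nullptr) ::munmap(const_cast<char*>(data_), size_);
    }

    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    const char* begin() const { return data_; }
    const char* end() const { return data_ + size_; }
    std::size_t size() const { return size_; }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

With Ragel you would then set `p` to `begin()` and `pe` (and `eof`) to `end()` before `write exec`, so the whole file is scanned in a single pass.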

Also look at the Ragel User Guide (5.9 Maintaining Pointers to Input Data), which has sample code for dealing with this situation. For strings or tokens that might exceed a fixed buffer size, you can use a variable-sized buffer that grows as needed.
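One way to implement such a growing buffer, sketched under assumptions: a hand-written word scanner stands in for Ragel-generated code, and `read_words` is a hypothetical helper. When a buffer ends mid-token, the unfinished token is moved to the front of the buffer and more input is read after it; if a single token fills the whole buffer, the buffer is doubled. The names `ts`, `p` and `pe` only mirror Ragel's variables.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a stream into runs of non-space bytes,
// reading through a small buffer that keeps partial tokens and grows.
std::vector<std::string> read_words(std::istream& in, std::size_t capacity = 16) {
    std::vector<std::string> tokens;
    std::vector<char> buf(capacity > 0 ? capacity : 1);
    std::size_t have = 0;  // bytes of an unfinished token kept at buf[0..have)
    bool eof = false;

    while (!eof) {
        // The partial token fills the whole buffer: grow instead of failing.
        if (have == buf.size()) buf.resize(buf.size() * 2);

        // Read new data after the preserved partial token.
        std::size_t want = buf.size() - have;
        in.read(buf.data() + have, static_cast<std::streamsize>(want));
        std::size_t got = static_cast<std::size_t>(in.gcount());
        eof = got < want;  // a short read means the stream is exhausted

        const char* p = buf.data();
        const char* pe = buf.data() + have + got;
        const char* ts = (have > 0) ? p : nullptr;  // start of current token

        for (; p != pe; ++p) {
            bool space = std::isspace(static_cast<unsigned char>(*p)) != 0;
            if (ts == nullptr && !space) {
                ts = p;                      // token starts
            } else if (ts != nullptr && space) {
                tokens.emplace_back(ts, p);  // token ends
                ts = nullptr;
            }
        }

        have = 0;
        if (ts != nullptr) {
            if (eof) {
                tokens.emplace_back(ts, pe);  // last token ends at end of input
            } else {
                have = static_cast<std::size_t>(pe - ts);
                std::memmove(buf.data(), ts, have);  // shift partial token to front
            }
        }
    }
    return tokens;
}
```

For example, with `std::istringstream in("one twentycharacterword three");`, the call `read_words(in, 4)` returns all three words even though the middle one is much longer than the initial 4-byte buffer.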

It's tricky to get right, and the details depend on the kind of parsing you are doing.

For files, you can simply map the entire file into memory and process it in one go. That's the best option.

If you read data in chunks, you can still parse it, but you need to preserve the parser's state variables (at minimum Ragel's current-state variable `cs`) across calls to parse. I normally do this by making them members of a class with a `parse` method that incrementally consumes one buffer of data at a time.
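A minimal sketch of that structure, assuming a trivial grammar (counting whitespace-separated words) and a hand-written two-state machine in place of Ragel output. In real Ragel code, `cs` would be the generated current-state variable, initialised by `write init` and kept as a member; the `WordCounter` class itself is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical incremental parser: its state survives between parse() calls.
class WordCounter {
public:
    // Consume one buffer [p, pe); may be called any number of times.
    void parse(const char* p, const char* pe) {
        for (; p != pe; ++p) {
            bool space = std::isspace(static_cast<unsigned char>(*p)) != 0;
            switch (cs) {
            case kBetween:
                if (!space) {
                    cs = kInWord;
                    ++words;  // count a word when it starts
                }
                break;
            case kInWord:
                if (space) cs = kBetween;
                break;
            }
        }
    }

    // Convenience overload for whole strings.
    void parse(const std::string& s) { parse(s.data(), s.data() + s.size()); }

    std::size_t count() const { return words; }

private:
    enum State { kBetween, kInWord };
    State cs = kBetween;  // stands in for Ragel's `cs`; persists across calls
    std::size_t words = 0;
};
```

Feeding `"hel"`, `"lo wor"` and `"ld\n"` in three calls yields a count of 2, the same as parsing `"hello world\n"` at once, because the state in `cs` carries the half-read word across each call boundary.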

If you are extracting tokens from the data, you need to copy any partially matched token into a string before `parse` returns. When you resume parsing and finish matching the token, you concatenate the newly matched part with the previously saved part, and that is the complete token. In the worst case, this token buffer can grow as large as the original file.
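A sketch of that capture-and-concatenate approach, again with a hand-written matcher standing in for Ragel and a hypothetical `TokenCollector` class (tokens are runs of non-space bytes). The matched prefix is copied into `pending` when a buffer ends mid-token, and `finish()` flushes a token that ends exactly at end of input.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical collector for tokens that may span buffer boundaries.
class TokenCollector {
public:
    void parse(const std::string& chunk) {
        const char* p = chunk.data();
        const char* pe = p + chunk.size();
        // `ts` marks where the current token began in this buffer
        // (Ragel scanners use a variable of the same name).
        const char* ts = in_token ? p : nullptr;

        for (; p != pe; ++p) {
            bool space = std::isspace(static_cast<unsigned char>(*p)) != 0;
            if (!in_token && !space) {
                in_token = true;
                ts = p;
            } else if (in_token && space) {
                // Complete token = saved prefix + part matched in this buffer.
                pending.append(ts, p);
                tokens.push_back(std::move(pending));
                pending.clear();
                in_token = false;
            }
        }
        if (in_token) pending.append(ts, pe);  // save the unfinished prefix
    }

    // Call once at end of input to flush a token that ended at EOF.
    void finish() {
        if (in_token) {
            tokens.push_back(std::move(pending));
            pending.clear();
            in_token = false;
        }
    }

    std::vector<std::string> tokens;

private:
    bool in_token = false;
    std::string pending;  // prefix of a token split across buffers
};
```

Calling `parse("hel")`, `parse("lo wo")`, `parse("rld  x")` and then `finish()` collects `hello`, `world` and `x`.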

You can see some examples of this here:

  1. If the parser has marked the start of a token but hasn't completed it yet, the partial token is stored: https://github.com/kurocha/async-http/blob/eff77f61f7a85a3ac21f7a8f51ba07f069063cbe/source/Async/HTTP/V1/RequestParser.rl#L52-L54
  2. Parser state is preserved across calls to parse: https://github.com/kurocha/async-http/blob/eff77f61f7a85a3ac21f7a8f51ba07f069063cbe/source/Async/HTTP/V1/Parser.hpp#L57-L73
  3. The loop that reads data and calls parse: https://github.com/kurocha/async-http/blob/eff77f61f7a85a3ac21f7a8f51ba07f069063cbe/source/Async/HTTP/V1/Protocol.cpp#L36-L54