I'm looking into Ragel and can't figure out a reasonable way to read input from a file. As far as I understand, it requires a memory buffer that never ends in the middle of a token. That seems like quite a lot of work to implement, especially when I don't know the token sizes in advance, e.g. strings with newlines, escapes, etc. If I have to implement all that myself, I'm not sure I need Ragel any more.
Is there no better way?
If you map the file into memory (mmap on POSIX, CreateFileMapping plus MapViewOfFile on Windows), you'll have the entire file available as one contiguous chunk of memory, so no token can be split across a buffer boundary.
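As a rough sketch of the mmap route on a POSIX system: the helper names `map_file` and `unmap_file` are made up for illustration, and error handling is kept minimal.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only.
 * Returns the mapping and stores its length in *len.
 * Returns NULL on failure, and also for an empty file,
 * because mmap rejects a zero-length mapping. */
static const char *map_file(const char *path, size_t *len)
{
    assert(path != NULL && len != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (addr == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return addr;
}

/* Hypothetical helper: release a mapping obtained from map_file. */
static void unmap_file(const char *data, size_t len)
{
    munmap((void *)data, len);
}
```

With the mapping in hand, the scanner can presumably be run in a single pass by pointing Ragel's `p` at the start of the mapping and setting both `pe` and `eof` to `data + len` before `%% write exec;`.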
Also look at the Ragel User Guide (section 5.9, "Maintaining Pointers to Input Data"), which has some sample code for dealing with that situation. For strings or tokens that might exceed a fixed buffer size, you could use a variable-sized buffer that grows as necessary.
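One possible shape for such a growing buffer, as a sketch only: `GrowBuf`, `growbuf_prepare` and `growbuf_keep` are hypothetical names, not part of Ragel, and the 4096-byte starting size is an arbitrary assumption. The idea is to move the unfinished token (from Ragel's `ts` up to `pe`) to the front of the buffer after each scan, and to double the allocation only when that token already fills the whole buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable input buffer; none of these names come from Ragel. */
typedef struct {
    char  *data; /* start of the allocation */
    size_t cap;  /* allocated size in bytes */
    size_t have; /* bytes of an unfinished token kept at the front */
} GrowBuf;

/* Make sure there is free space after the kept bytes, doubling the
 * capacity when the kept token fills the buffer. Returns where the next
 * read should store its data and sets *space to the room available there,
 * or returns NULL if the allocation fails. */
static char *growbuf_prepare(GrowBuf *b, size_t *space)
{
    if (b->have == b->cap) {
        size_t ncap = b->cap ? b->cap * 2 : 4096;
        char *ndata = realloc(b->data, ncap);
        if (ndata == NULL)
            return NULL;
        b->data = ndata;
        b->cap = ncap;
    }
    *space = b->cap - b->have;
    return b->data + b->have;
}

/* After scanning up to pe: move the unfinished token starting at ts to
 * the front of the buffer. Pass ts == NULL when no token is in progress. */
static void growbuf_keep(GrowBuf *b, const char *ts, const char *pe)
{
    if (ts == NULL) {
        b->have = 0;
        return;
    }
    assert(ts >= b->data && ts <= pe && pe <= b->data + b->cap);
    b->have = (size_t)(pe - ts);
    memmove(b->data, ts, b->have); /* regions may overlap, so not memcpy */
}
```

In a read loop you would call `growbuf_prepare`, read up to `space` bytes into the returned pointer, run `%% write exec;`, then call `growbuf_keep`. Because the kept token always starts at `b->data`, `ts` should be reset to `b->data` (and `te` shifted by the same amount) after each call, since `realloc` may move the allocation.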