Using FsCheck in C#, I need to generate a list of values where certain values may not occur next to each other. This is a list of tokens for a lexer. So for example, I need to not generate two identifiers next to each other, or a keyword next to an identifier because when they are turned into a string, the lexer would see them as one token. I can't simply do something like:
Gen.ListOf(Gen.Elements("fizz", "buzz", "bazz", "+", " ", "if")).Where(list => ...);
In a reasonable length list, there are very likely to be two tokens next to each other that shouldn't be. Thus the filter would discard too many values, and generation would be very slow. I've considered adding or removing tokens to try to fix the list, but this seems very complicated, and I'm worried it would produce lists that were too short or too long.
What I want to be able to do is generate the list "one element at a time." So I would randomly pick the length, then randomly pick each element respecting whether it was allowed to follow the preceding element until I had the full list. This seems like it should be doable recursively, but I can't figure out how to express it with the generator combinators. Is there a way to do that? I wish I could write an arbitrary lambda that could directly invoke generators inside it rather than needing to compose them. Is something like that possible?
Note: I've simplified the example. Token generation is much more complicated.
Say you have
identifier
andkeyword
both of typeGen<string>
, you could do something like (C#-y pseudocode)You could also create a list of enum value + string so the enum value defines what kind of token you are generating (it seems kind of strange to me to figure out the kind of token from the text you just generated...this also sounds like it would be exactly the code you are trying to test).
Generally I would perhaps advise a different approach, depending on how many tokens you have and what all the constraints are. In such cases it is often easier to define a number of "patterns", i.e. lists of tokens that are legal and that you can string one after another without breaking (too much, possibly doing another filter after works). An example of that (in F#) is here: https://github.com/fsprojects/fantomas/blob/master/src/Fantomas.Tests/FormattingPropertyTests.fs
These patterns typically end up being your grammar or syntax tree, so the basic structure should be straightforward.