This is a minimal PackCC grammar example.
I am trying to retrieve and print the $$ value after parsing. The word is matched, but the printf call displays only garbage.
```
%value "char*"
word <- < [a-z]+[\n]* > {$$ = $1;}
%%
int main(void)
{
    char* val = "Value";
    // Create a file to parse.
    FILE* f = freopen("text.txt", "w", stdin);
    if(f != NULL) {
        // Write the text to parse.
        fprintf(f, "example\n");
        // Set the file in read mode.
        f = freopen("text.txt", "r", stdin);
        pcc_context_t *ctx = pcc_create(NULL);
        // I expect val to receive the "$$" value from the parse.
        while(pcc_parse(ctx, &val));
        printf("val: %s\n",val);
        pcc_destroy(ctx);
        fclose(f);
    }
    else {
        puts("File is NULL");
    }
    return 0;
}
```
The PackCC doc says that $$ is:
The output variable, to which the result of the rule is stored.
And it says that the pcc_parse function:
Parses an input text (from standard input by default) and returns the result in `ret`. The `ret` can be `NULL` if no output data is needed. This function returns 0 if no text is left to be parsed, or a non-0 value otherwise.
There is no problem with your use of `$$`, in the sense that the `char *` value stored in `$$` by the `word` action is faithfully returned into `val`.

The problem is that the `char *` value is a pointer to dynamically allocated memory, and by the time the parser returns, that memory has already been freed. So the pointer returned into `val` is a dangling pointer, and by the time `printf` is called, the memory region has been reused for some other object.

The documentation for PackCC, such as it is, does not go into any detail about its memory-management strategy, so it's not really clear how long the `$1` pointer in a rule remains valid. I think it would be safest to assume that it is only valid until the end of the last action in the rule. But it is certainly not reasonable to assume that the pointer will outlast a call to `pcc_parse`. After all, the parser has no way to know that you have stored the pointer outside of the parser context. The parser cannot rely on the programmer to `free` capture strings produced during rules; having to `free` every capture, even the ones never used, would be a severe inconvenience. To avoid memory leaks, the parser therefore must `free` its capture buffers.

The problem is easy to see if you are able to use valgrind or a similar tool. (Valgrind is available for most Linux distributions and for OS X since v10.9.x. Other platforms might be supported.) Running your parser under valgrind produced the following error report (truncated):
That's a lot to go through, but it shows that there was an attempt to use the first byte of a 9-byte dynamically allocated memory region that had already been freed ("Address 0x5232e20 is 0 bytes inside a block of size 9 free'd"). Furthermore, the backtrace shows that the error was triggered by a call to `strlen`, which had been called by `printf`; `printf` was called from your `main` function. (Unfortunately, PackCC does not issue `#line` directives, making it impossible to correlate the line numbers in the generated C parser with the line numbers in the original PEG grammar file. However, in this case it's clear where the `printf` is, since there's really only one possibility inside the `main` function.) Valgrind also shows you where the memory was dynamically allocated; although you'd need a copy of the generated parser handy to see how all the parts fit together, the names of the functions in the call trace are somewhat helpful.

The solution is basically the same as the way you must handle `yytext` in a parser that relies on a (f)lex-based scanner: since the string passed to the action lives in memory whose lifetime is about to end, any token you want to use later must be copied. The simplest way to do that is to use `strdup` (or an equivalent, if you're not able to use standard POSIX interfaces), changing the action to `{$$ = strdup($1);}`.
Once the action copies the capture with `strdup`, the word `example` will be printed as expected (including the newline character that terminates it).

You also must remember to `free` the copies you have made in order to avoid leaking memory. Valgrind will also help you detect memory leaks, so it can help you catch errors resulting from forgetting to do so.