I want to tell strtok() to use everything except alphanumeric characters as delimiters.
My attempt so far is the example from the reference:
    /* strtok example */
    #include <stdio.h>
    #include <string.h>

    int main ()
    {
        char str[] = "- This, a sample string.";
        char * pch;
        printf ("Splitting string \"%s\" into tokens:\n", str);
        pch = strtok (str, " ,.-");
        while (pch != NULL)
        {
            printf ("%s\n", pch);
            pch = strtok (NULL, " ,.-");
        }
        return 0;
    }
However, I am going to parse real text files (that contain reviews for a site). Currently I check which other delimiters occur and extend the second argument of strtok(). For example, I saw a [, so I changed it to " ,.-[", and so on. But I might miss something, and a new text file may contain a new delimiter.
Can't I do something smarter (and actually correct, because this is not)?
For example if I get:
[Hello_sir I'm George]
I would like to get these tokens:
Hello
sir
I
m
George
The problem is that I don't know in advance which characters are the delimiters.
I would like to say: use everything except alphanumeric characters as delimiters.
EDIT
I thought of going character by character and checking whether each one is alphanumeric, but I was hoping for something built-in, such as a way to feed strtok() the delimiters I want.
The only way to do that with strtok (without overwriting the source string's non-alphanumeric characters with something else) would be to pass a delimiter string that contains all the non-alphanumeric characters. You could build this once at run time, on first use, by collecting every non-zero character value for which isalnum() is false into a string, delims. Then use delims as your delimiter string.

However, this is both ugly and inefficient. You would be better off writing a hand-rolled parser, borrowing the source of strtok if necessary.