Parse string separated with semicolon using a regex


I'm trying to separate a string of numbers using a regex. The following C code works when the numbers are separated with a comma:

#include <stdio.h>

int main()
{ 
    char *multiple = "0.20,0.37,0.75,0.56";
    char one[4];
    char two[4];
    char three[4];
    char four[4];

    sscanf(multiple, "%[^','], %[^','], %[^','], %[^',']", one, two, three, four);
    printf("one %s, two %s, three %s, four %s\n", one, two, three, four);

    return 0;
}

However, in my code the numbers are separated with semicolons, and I would like to do the same thing. It doesn't work in this case:

#include <stdio.h>

int main()
{
    char *multiple = "0.20;0.37;0.75;0.56";
    char one[4];
    char two[4];
    char three[4];
    char four[4];

    sscanf(multiple, "%[^';'], %[^';'], %[^';'], %[^';']", one, two, three, four);
    printf("one %s, two %s, three %s, four %s\n", one, two, three, four);

    return 0;
}

Can anyone let me know why this is the case and how to fix it?


There are 3 best solutions below

1
On BEST ANSWER

scanf doesn't support regular expressions. It supports scansets: %[...] matches a run of characters from a given set, and %[^...] matches a run of characters not in the set. When your format contains %[^';'] it matches any sequence of one or more characters except ' and ;. When your format contains a comma (,) it matches a literal comma.

So when you say:

sscanf(multiple, "%[^';'], %[^';'], %[^';'], %[^';']", one, two, three, four);

it matches as many characters other than ' and ; as it can, and stores them in one. It then tries to match a literal comma, which fails: any comma would already have been consumed into one, so the next character can only be a ;, a ', or the end of the string. sscanf therefore returns 1 (one thing matched and stored).
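To see that failure directly, here is a minimal sketch that runs the question's format string and reports what sscanf returns. The helper name count_fields_with_comma_format and the 32-byte buffers (with matching %31 widths) are my own choices for illustration, not part of the question.

```c
#include <stdio.h>

/* Hypothetical helper: applies the question's format (scanset excludes
   ' and ;, but the literal separator is still a comma) and returns the
   number of fields sscanf stored. */
int count_fields_with_comma_format(const char *input)
{
    /* Assumed buffer size; %31 keeps each field within 32 bytes. */
    char one[32], two[32], three[32], four[32];

    return sscanf(input, "%31[^';'], %31[^';'], %31[^';'], %31[^';']",
                  one, two, three, four);
}
```

Calling it with "0.20;0.37;0.75;0.56" should give 1: the first field is stored, then the literal comma in the format fails to match the semicolon in the input.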

What you want is

if (sscanf(multiple, "%[^;];%[^;];%[^;];%[^;]", one, two, three, four) != 4)
    /* failed -- do something appropriate */

You should always check the return value of scanf to see if it matched all your patterns and grabbed as many things as you think it should.
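As a sketch of what that check might look like in practice, the function below wraps the call and reports success only when all four fields were stored. The struct fields type, the parse_fields name, and the 16-byte buffers are assumptions made for this example. The field widths (%15) are also my addition: "0.20" is four characters plus a terminating NUL, so the 4-byte arrays in the question are one byte too small, and a width limit keeps sscanf from writing past the end of a buffer.

```c
#include <stdio.h>

/* Hypothetical container for the four fields. 15 characters plus the
   terminating NUL is an assumed limit, not something from the question. */
struct fields {
    char one[16];
    char two[16];
    char three[16];
    char four[16];
};

/* Returns 1 if all four semicolon-separated fields were stored, else 0. */
int parse_fields(const char *input, struct fields *out)
{
    /* %15[^;] reads up to 15 characters that are not ';'.
       The ';' between conversions matches one literal semicolon. */
    int n = sscanf(input, "%15[^;];%15[^;];%15[^;];%15[^;]",
                   out->one, out->two, out->three, out->four);

    return n == 4;
}
```

With "0.20;0.37;0.75;0.56" this should return 1 and fill all four members; with a short input such as "0.20;0.37" sscanf stores only two fields, so the function returns 0.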

Note also the lack of spaces in the format -- a space will match (and skip) any sequence of 0 or more whitespace characters in your string. That may actually be what you want (strip leading whitespace in each of the fields you extract), but is not what you described.
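If you do want leading whitespace stripped, a space before each conversion does that. This is only a sketch with two fields to keep it short; the function name parse_pair_skip_ws and the 16-byte buffer size are assumptions for the example.

```c
#include <stdio.h>

/* Hypothetical helper: parses two semicolon-separated fields, skipping
   any whitespace in front of each one. Both buffers must hold at least
   16 bytes. Returns the number of fields sscanf stored. */
int parse_pair_skip_ws(const char *input, char *first, char *second)
{
    /* The space before each %15[^;] skips zero or more whitespace
       characters; whitespace after a field is kept as part of it. */
    return sscanf(input, " %15[^;]; %15[^;]", first, second);
}
```

Given " 0.20; 0.37", this should store "0.20" and "0.37" and return 2. Trailing whitespace inside a field (as in "0.20 ;0.37") is not removed by this format.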

1
On

Here, this should work:

#include <stdio.h>

int main()
{
    /* The input uses semicolons, matching the separators in the format below. */
    const char *multiple = "0.20;0.37;0.75;0.56";
    char one[10];
    char two[10];
    char three[10];
    char four[10];

    sscanf(multiple, "%[^';'];%[^';'];%[^';'];%[^';']", one, two, three, four);
    /* The fields are strings, so print them with %s rather than %f. */
    printf("one %s, two %s, three %s, four %s\n", one, two, three, four);

    return 0;
}
0
On

Do the same thing as in your comma version, but with semicolons:

sscanf(multiple, "%[^';'];%[^';'];%[^';'];%[^';']", one, two, three, four);

As for why, I don't know: the format specifiers in the scanf family of functions are not generally considered to be a species of regular expression, and I do not know every detail of how scanf works.