std::string to int / double in one pass


I'm parsing a string which may contain either a real or an integral value. I would like to parse that string and get either the integral or the real value in a single pass.

I could use std::stoi and std::stod, but if I call stoi first and the string holds a real, it will stop at the dot and I will have to call stod, causing a second parse. And if I call stod first and the string contains an integral, it will accept it as a valid real value, losing the information that it is an integral.

Is there some kind of function that can parse both types in a single pass? Or do I first have to look for a dot manually and call the right function?

Thank you. :)


There are 3 best solutions below

4
On BEST ANSWER

You will not find a standard call to achieve this for the simple reason that a string of digits without a dot is both a valid integer and a valid double.

If your criterion is "double if and only if dot", then look for the dot by hand. Alternatively, read as double and check that the fractional part is zero.
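A minimal sketch of the "look for the dot by hand" option, assuming C++17 and that the whole string is a single well-formed number in plain dot notation. The helper name parse_number and the choice of std::variant<long, double> as the result type are illustrative assumptions, not something the question prescribes. Scientific notation such as "1e5" has no dot, so this sketch would treat it as an integer and stop at the 'e'.

```cpp
#include <string>
#include <variant>

// Hypothetical helper implementing the "double if and only if dot" rule.
// std::stol / std::stod throw std::invalid_argument or std::out_of_range
// when the text is not a usable number.
std::variant<long, double> parse_number(const std::string& text)
{
    if (text.find('.') != std::string::npos)
        return std::stod(text);   // contains a dot: treat as real
    return std::stol(text);       // no dot: treat as integer
}
```

The caller can then ask which alternative is held with std::holds_alternative<long>(result) or dispatch with std::visit. Note that the string is scanned once for the dot and once by the conversion, so this is "one conversion", not strictly one pass over the characters.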

0
On

Since you said (in the comments above) that simple dot notation is all you want in real numbers, and you want a single-pass (i.e. no back-stepping to already-parsed input), and (again from your comment) are more after the programming experience than efficiency / maintainability / extendability, how about this:

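#include <cstdlib>

char const * input = /*...*/;
char * parse_end;   // strtol / strtod take a char **, not a char const **

// parse integer (or pre-dot part of real)
long integer = strtol( input, &parse_end, 10 );

if ( *parse_end == '.' )
{
    // you have a real number -- parse the fractional part (".5" is valid strtod input)
    double fraction = strtod( parse_end, &parse_end );
    // integer + fraction is your result; for input starting with '-' the
    // fraction has to be subtracted instead (assumes no leading whitespace)
    double real = ( *input == '-' ) ? integer - fraction : integer + fraction;
}
else
{
    // integer is your result
}

// in either case, parse_end is your position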

Why did I use C functions? std::stoi reports where it stopped as an index, but std::stod expects a string, so I'd have to do a substr() or similar. The C functions work with pointers, which makes things easier.

What I said in my comment holds true: As a brain experiment this holds some value, but any real parsing work should make use of existing solutions like Boost.Spirit. Getting familiar with such building blocks is, IMHO, more valuable than learning how to roll your own.

0
On

You should parse it yourself, using std::string::substr, std::string::find_first_of, std::string::find_first_not_of, etc.

As you know, std::stoi and std::stof each interpret the longest leading substring that matches a valid representation of the requested type. You might think that, when both parses succeed, the integral result always differs from the real result, but it doesn't.

Example 1: think about "123.". std::stoi will parse the substring "123" and std::stof will parse the whole "123.". "123." is a valid floating-point literal, but it represents an exact integer.

Example 2: think about "123.0". This is a trivial real value representation. std::stoi will parse the substring "123" and std::stof will parse the whole "123.0". The two results are arithmetically equal.
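The two examples can be checked with the optional position argument of std::stoi and std::stof, which receives the index of the first character that was not parsed. The helper names below are hypothetical and only wrap that argument; this is a small sketch, not a parsing strategy.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of characters std::stoi consumes from text.
std::size_t consumed_by_stoi(const std::string& text)
{
    std::size_t pos = 0;
    (void)std::stoi(text, &pos);   // pos = index of first unparsed character
    return pos;
}

// Hypothetical helper: number of characters std::stof consumes from text.
std::size_t consumed_by_stof(const std::string& text)
{
    std::size_t pos = 0;
    (void)std::stof(text, &pos);
    return pos;
}
```

For "123." the integer parse consumes 3 characters and the float parse consumes 4; for "123.0" they consume 3 and 5. In both cases the parsed values compare equal, which is why the consumed length, not the value, tells the two spellings apart.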

This is where you have to decide what to parse and what not to. See the cppreference.com articles on integer literals and floating-point literals for the possible patterns.

Because of these difficulties, many lexers just tokenize the input (splitting it on spaces) and check whether the full token matches any valid representation. I think that if you don't know whether the input is integral or an approximate real, you should just parse it with std::stof.
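The "full token must match" check could look roughly like the sketch below. The TokenKind enum and the classify function are hypothetical names, the token is assumed to contain no whitespace, and the check makes up to two parsing attempts, so it is not the single pass the question asks for. Because std::strtod also accepts exponents, hexadecimal floats, "inf" and "nan", such tokens would be classified as real here.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

enum class TokenKind { Integer, Real, Invalid };   // hypothetical names

// Classify one whitespace-free token by requiring that the conversion
// function consume the whole token without a range error.
TokenKind classify(const std::string& token)
{
    if (token.empty())
        return TokenKind::Invalid;

    const char* begin = token.c_str();
    const char* token_end = begin + token.size();
    char* end = nullptr;

    errno = 0;
    std::strtol(begin, &end, 10);            // first attempt: decimal integer
    if (end == token_end && errno == 0)
        return TokenKind::Integer;

    errno = 0;
    std::strtod(begin, &end);                // second attempt: floating point
    if (end == token_end && errno == 0)
        return TokenKind::Real;

    return TokenKind::Invalid;
}
```

With this rule "123" is an integer, "123." and "123.0" are reals, and "12a" is rejected because neither function consumes the trailing character.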

In addition, solutions that cast float to int can misbehave. A float holding an integral value is not guaranteed to compare equal to an int with the same integral value. That is because float, commonly implemented as IEEE 754 binary32 (IEEE 754-1985 single precision), has a 24-bit significand. So a string representing an integer that fits in a 32-bit signed int may not be exactly representable as a float: you lose precision. double, commonly IEEE 754-2008 binary64, has a 53-bit significand and represents every int32_t exactly, but the same problem appears with int64_t and so on.
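A short sketch of that precision loss, assuming float is IEEE 754 binary32 and double is binary64 with the default round-to-nearest mode (true on common platforms, but not guaranteed by the C++ standard). The helper names are hypothetical.

```cpp
#include <cstdint>

// True when value converts to float and back without changing.
bool survives_float(std::int32_t value)
{
    // Widen to 64 bits so that 2^31 (the rounded INT32_MAX) stays in range.
    return static_cast<std::int64_t>(static_cast<float>(value)) == value;
}

// True when value converts to double and back without changing.
bool survives_double(std::int64_t value)
{
    const double as_double = static_cast<double>(value);
    // Values near INT64_MAX round up to 2^63, which int64_t cannot hold;
    // casting that back would be undefined, so reject it first.
    if (as_double >= 9223372036854775808.0)
        return false;
    return static_cast<std::int64_t>(as_double) == value;
}
```

Under those assumptions 16777216 (2^24) survives a trip through float but 16777217 does not, while double keeps 16777217 intact and first loses integers at 9007199254740993 (2^53 + 1).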