How to compare two strings which can have null character \0 in between in C++?


I want to compare two strings s1 and s2, both of which can contain null characters in the middle. I need both case-sensitive and case-insensitive comparison, like strcmp and strcasecmp. Suppose my strings are:

std::string s1="Abcd\0abcd";
std::string s2="Abcd\0cccc";

Currently, I'm calling strcmp(s1.c_str(), s2.c_str()) and strcasecmp(s1.c_str(), s2.c_str()), but both report the strings as equal in this case because they stop comparing at the first \0. Are there any libraries I can use to compare these strings?


There are 2 best solutions below

0
Jan Schultke On

Case-sensitive comparison

Case-sensitive comparison is simple. However, the std::string_view has to be created with the sv literal so that it keeps the embedded null characters. The constructors that take an explicit length would also work, but no solution is as concise as sv literals (or s literals for std::string).

#include <string_view>

using namespace std::string_view_literals;

// prior to C++17, you can use std::string and "..."s
std::string_view s1 = "Abcd\0abcd"sv;
std::string_view s2 = "Abcd\0cccc"sv;

bool eq = s1 == s2; // false

std::string_view stores its length explicitly and does not treat null characters specially, so the overloaded == operator compares the full contents.

In general, you should avoid C functions like strcmp; there are much better alternatives in C++ that don't require null-terminated strings.
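To make the difference concrete, here is a minimal sketch (assuming C++17) that puts the two approaches side by side. The helper names equal_as_c_strings, equal_full_length and three_way are made up for illustration and are not part of any library.

```cpp
#include <cstring>      // std::strcmp
#include <string_view>  // std::string_view, operator""sv

using namespace std::string_view_literals;

// The sv literal keeps all 9 characters, including the embedded '\0'.
constexpr std::string_view s1 = "Abcd\0abcd"sv;
constexpr std::string_view s2 = "Abcd\0cccc"sv;

// Hypothetical helper: what strcmp sees, i.e. only the text before the first '\0'.
bool equal_as_c_strings(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}

// Hypothetical helper: compares the full contents, embedded nulls included.
bool equal_full_length(std::string_view a, std::string_view b) {
    return a == b;
}

// Hypothetical helper: strcmp-style result (negative, zero, or positive)
// computed over the full length via std::string_view::compare.
int three_way(std::string_view a, std::string_view b) {
    return a.compare(b);
}
```

With these, equal_as_c_strings(s1.data(), s2.data()) is true because both buffers start with "Abcd" followed by '\0', while equal_full_length(s1, s2) is false and three_way(s1, s2) is negative.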

Case-insensitive comparison

Case-insensitive comparison is slightly more difficult, but can be easily done with std::ranges::equal or std::equal.

#include <cctype>    // std::tolower
#include <algorithm> // std::ranges::equal or std::equal

// C++20
bool eq = std::ranges::equal(s1, s2, [](unsigned char a, unsigned char b) {
    return std::tolower(a) == std::tolower(b);
});
// C++14 and later (alternative to the above; use one or the other)
bool eq = std::equal(std::begin(s1), std::end(s1), std::begin(s2), std::end(s2),
    [](unsigned char a, unsigned char b) {
        return std::tolower(a) == std::tolower(b);
    });

Note: it's important that the lambda accepts unsigned char, not char; std::tolower doesn't work properly if we input negative values, and char may be negative.

Note: std::tolower doesn't handle Unicode strings. See also "Case-insensitive string comparison in C++" for more robust solutions.
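If an ordering is needed rather than just equality (the question mentions strcasecmp, which returns a negative, zero, or positive value), the same idea can be extended. The following is only a sketch: compare_ci is a hypothetical name, and it inherits the limitation that std::tolower only handles single-byte characters in the current C locale.

```cpp
#include <algorithm>    // std::min
#include <cctype>       // std::tolower
#include <cstddef>      // std::size_t
#include <string_view>

// Hypothetical helper: case-insensitive three-way comparison over the full
// length of both views. Returns -1, 0, or 1, in the spirit of strcasecmp.
int compare_ci(std::string_view a, std::string_view b) {
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        // Convert to unsigned char first; std::tolower must not get negative values.
        const int ca = std::tolower(static_cast<unsigned char>(a[i]));
        const int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb) {
            return ca < cb ? -1 : 1;
        }
    }
    // The common prefix matches, so the shorter string orders first.
    if (a.size() == b.size()) {
        return 0;
    }
    return a.size() < b.size() ? -1 : 1;
}
```

An embedded '\0' is treated like any other character here, so the comparison continues past it.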

1
selbie On

This:

std::string s1="Abcd\0abcd";

Will result in std::string holding only "Abcd", since construction from a const char* stops at the first null char in the string literal.

Passing the length explicitly will include the full binary string, with the null chars that appear in the middle:

std::string s1("Abcd\0abcd", 9);
std::string s2("Abcd\0cccc", 9);
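Counting the length by hand is easy to get wrong. One possible way around it, sketched here with a hypothetical helper named from_literal, is to let the compiler deduce the array size of the string literal:

```cpp
#include <cstddef>  // std::size_t
#include <string>

// Hypothetical helper: builds a std::string from a string literal and keeps
// any embedded '\0'. N is the array size of the literal, which includes the
// implicit terminating '\0', so N - 1 characters are copied.
template <std::size_t N>
std::string from_literal(const char (&literal)[N]) {
    return std::string(literal, N - 1);
}
```

from_literal("Abcd\0abcd") has size 9, whereas std::string("Abcd\0abcd") has size 4. This only works when the argument is an actual array such as a string literal, not a plain const char*.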

Then you can do:

if (s1 < s2) {
}

Or case insensitive:

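#include <algorithm> // std::transform
#include <cctype>    // std::tolower

// the lambdas take unsigned char because std::tolower is undefined for negative values
auto s1lower = s1;
std::transform(s1lower.begin(), s1lower.end(), s1lower.begin(),
    [](unsigned char c){ return static_cast<char>(std::tolower(c)); });

auto s2lower = s2;
std::transform(s2lower.begin(), s2lower.end(), s2lower.begin(),
    [](unsigned char c){ return static_cast<char>(std::tolower(c)); });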

if (s1lower < s2lower) {
    ...
}
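If copying both strings is undesirable, the same ordering can probably be had without temporaries by using std::lexicographical_compare with a case-folding predicate. This is a sketch, not a drop-in replacement; less_ci is a hypothetical name.

```cpp
#include <algorithm>  // std::lexicographical_compare
#include <cctype>     // std::tolower
#include <string>

// Hypothetical helper: case-insensitive "less than" over the full length of
// both strings, embedded '\0' included, without making lowercase copies.
bool less_ci(const std::string& a, const std::string& b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](unsigned char x, unsigned char y) {
            // unsigned char parameters keep std::tolower away from negative values
            return std::tolower(x) < std::tolower(y);
        });
}
```

For example, less_ci(std::string("Abcd\0abcd", 9), std::string("ABCD\0CCCC", 9)) is true, because the strings first differ after the null character, where 'a' orders before 'c'.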