Transforming a string_view in-place


std::transform, as of C++20, is declared constexpr. I have a bunch of string utility functions that take std::string arguments, but a lot of the usage ends up just passing in small, short, character literal sequences at compile-time. I thought I would leverage this fact and declare versions that are constexpr and take std::string_views instead of creating temporary std::string variables just to throw them away...

ORIGINAL STD::STRING VERSION:

[[nodiscard]] std::string ToUpperCase(std::string string) noexcept {
    std::transform(string.begin(), string.end(), string.begin(), [](unsigned char c) -> unsigned char { return std::toupper(c, std::locale("")); });
    return string;
}

NEW STD::STRING_VIEW VERSION:

[[nodiscard]] constexpr std::string_view ToUpperCase(std::string_view stringview) noexcept {
    std::transform(stringview.begin(), stringview.end(), stringview.begin(), [](unsigned char c) -> unsigned char { return std::toupper(c, std::locale("")); });
    return stringview;
}

But MSVC complains:

error C3892: '_UDest': you cannot assign to a variable that is const

Is there a way to call std::transform with a std::string_view and put it back into the std::string_view or am I going to have to create a local string and return that, thereby defeating the purpose of using std::string_view in the first place?

[[nodiscard]] constexpr std::string ToUpperCase(std::string_view stringview) noexcept {
    std::string copy{stringview};
    std::transform(stringview.begin(), stringview.end(), copy.begin(), [](unsigned char c) -> unsigned char { return std::toupper(c, std::locale("")); });
    return copy;
}

There are 2 best solutions below

5
passing_through On

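You can't transform a std::string_view in place: it is a read-only view whose iterators only give const access, and it may refer to memory that must not be modified. What if it has been constructed from a char const* pointing at a string literal?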

a lot of the usage ends up just passing in small, short, character literal sequences at compile-time.

...but you can lift string literals to the type level

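#include<algorithm>  // for std::copy_n
#include<array>
#include<cstddef>  // for std::size_t
#include<utility>  // for std::index_sequence

namespace impl {
    // Wrapper that lets a string literal be used as a template argument (C++20).
    template<std::size_t n> struct Str {
        std::array<char, n> raw{};
        constexpr Str(char const (&src)[n + 1]) { std::copy_n(src, n, raw.begin()); }
    };
    // Deduction guide: the stored length excludes the terminating '\0'.
    template<std::size_t n> Str(char const (&)[n]) -> Str<n - 1>;
}
// Each character becomes a template argument; value is the null-terminated text.
template<char... cs> struct Str { static char constexpr value[]{cs..., '\0'}; };
// str_v<"abc"> is an object of type Str<'a', 'b', 'c'>.
template<impl::Str s>
auto constexpr str_v = []<std::size_t... is>(std::index_sequence<is...>) {
    return Str<s.raw[is]...>{};
}(std::make_index_sequence<s.raw.size()>{});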

...and add a special case. In general, this hack can be avoided with range/tuple polymorphic algorithms.

[[nodiscard]] constexpr auto ToUpperCase(auto str) {
    for (auto&& c: str) c = ConstexprToUpper(c); // ConstexprToUpper is a helper you supply; std::toupper is not constexpr
    return str;
}
template<char... cs> [[nodiscard]] constexpr auto ToUpperCase(Str<cs...>) {
    return Str<ConstexprToUpper(cs)...>{};
}

So, to use that transformation optimized for character literal sequences, now write ToUpperCase(str_v<"abc">) instead of ToUpperCase("abc"sv). If you always want string_view as output, return std::string_view{Str<ConstexprToUpper(cs)...>::value} in that overload.

0
alfC On

As noted in one comment, std::span<char> is a better vocabulary type for this because its individual elements can be modified, which makes it a sort of mutable string view (msv). Also, I wouldn't mark the function [[nodiscard]], because calling it is useful even when the result is not assigned:

#include<algorithm>  // for std::transform
#include<cassert>
#include<cctype>  // for std::toupper
#include<span>
#include<string>
#include<string_view>

constexpr auto ToUpperCase(std::span<char> msv) noexcept {
    std::transform(msv.begin(), msv.end(), msv.begin(), [](unsigned char c) -> unsigned char { return std::toupper(c); });
    return msv;
}

int main() {
    auto a = std::string{"compiler"};
    auto&& tmp = ToUpperCase(a);
    auto b = std::string{tmp.begin(), tmp.end()};
    assert( a == "COMPILER");
    assert( b == "COMPILER");
}

https://godbolt.org/z/zPr968PYr


This departs somewhat from your original aim, but I think the following is more elegant, although it is prone to code bloat and ugly compilation errors. It has the same effect in the cases shown.

Also, I don't like the design of span (or string_view, for that matter).

(Exercise: add Concepts)

#include<algorithm>  // for std::transform
#include<cctype>  // for std::toupper
#include<utility>  // for std::forward

template<class StringRange>
constexpr StringRange&& ToUpperCase(StringRange&& stringview) noexcept {
    std::transform(stringview.begin(), stringview.end(), stringview.begin(),
        [](unsigned char c) -> unsigned char { return std::toupper(c); });
    return std::forward<StringRange>(stringview);
}

https://godbolt.org/z/e9aWKMerE

I find myself using this idiom quite a bit lately.