Initializing very large C++ std::bitset at compile time


I want to store a static constant bitset of 2^16 (65536) bits, with a specific sequence of 1s and 0s that never changes.

I thought of using an initializer string, as proposed in this post:

std::bitset<1<<16> myBitset("101100101000110 ... "); // the ellipsis stands for the rest of the actual 65536-character sequence

But the compiler (VS2013) gives me the "string too long" error.

UPDATE

I tried splitting the string into smaller chunks, as proposed in the post linked above, like so:

std::bitset<1<<16> myBitset("100101 ..."
                            "011001 ..."
                            ...
                            );

But I get error C1091: compiler limit: string exceeds 65535 bytes in length. My string is 65536 bytes (technically 65537, counting the terminating null character).

What are my other options?

UPDATE

Thanks to luk32, this is the beautiful code I ended up with:

const std::bitset<1<<16> bs = (std::bitset<1<<16>("101011...")
    << 7* (1<<13)) | (std::bitset<1<<16>("110011...")
    << 6* (1<<13)) | (std::bitset<1<<16>("101111...")
    << 5* (1<<13)) | (std::bitset<1<<16>("110110...")
    << 4* (1<<13)) | (std::bitset<1<<16>("011011...")
    << 3* (1<<13)) | (std::bitset<1<<16>("111011...")
    << 2* (1<<13)) | (std::bitset<1<<16>("111001...")
    << 1* (1<<13)) | std::bitset<1<<16>("1100111...");

BEST ANSWER

You didn't really split the literal: adjacent string literals are concatenated into a single literal during compilation anyway, so you are still hitting the compiler's limit. I don't think there's a way to raise this limit in MSVC.

You can split it into two literals, initialize two bitsets, shift the first one and OR it with the other.

Something like:

#include <bitset>
#include <iostream>

int main()
{
    std::bitset<8> dest("0110");    // high half
    std::bitset<8> lowBits("1001"); // low half

    dest <<= dest.size()/2;         // move the high half into place: 01100000
    dest |= lowBits;                // merge in the low half: 01101001
    std::cout << dest << '\n';
}

If you look at the clang output at -O2, it gets optimized to loading the constant 105, which is 01101001.

My testing shows that if you replace 8 with 1<<16, the generated code uses SSE, so it should be a pretty safe bet. Unlike the 8- and 16-bit cases, the literals were not optimized away, so there might be some runtime overhead, but I am not sure you can do much better.

EDIT:

I did some more tests; here is my playground:

#include <iostream>
#include <string>
#include <bitset>
 

using namespace std;
int main()
{
    //static const std::bitset<16> set1( "01100110011001100110011001100110");
    static const std::bitset<16> set2(0b01100110011001100110011001100110);

    static const std::bitset<16> high(0b01100110);
    static const std::bitset<16> low (0b01100110);
    static const std::bitset<16> set3 = (high << 8) | low;
    std::cout << (set3 == set2) << '\n';
}

I couldn't get compile-time optimization for the const char* constructor on any compiler except clang, and there it worked only up to 14 characters. There seems to be some promise in making a bunch of bitsets initialized from unsigned long long values and then shifting and combining them:

static const std::bitset<128> high(0b0110011001100110011001100110011001100110011001100110011001100110);
static const std::bitset<128> low (0b1001100110011001100110011001100110011001100110011001100110011001);
static const std::bitset<128> set3 = (high << high.size()/2) | low;
std::cout << set3 << '\n';

This makes compilers stick to storing the data in binary form. If you could use a somewhat newer compiler with constexpr, I think it would be possible to declare the data as an array of bitsets constructed from unsigned long long values, have them concatenated by a constexpr function, and bind the result to a constexpr variable, which should ensure the best optimization possible. The compiler could still go against you, but it would have no reason to. Maybe even without constexpr it would generate pretty much optimal code.

ANSWER

You may consider skipping compilation altogether, and simply:

  • Assemble the data into an object file (segment .rodata), exporting symbols for it and its size.
  • Declare these symbols as extern const in a .h file.
  • Use these symbols and link your program to this object file.

I don't have MASM32 handy to write a complete answer that actually works, but I use this technique often with GAS and LD, and it sidesteps a lot of issues (loading on demand, security descriptors of an otherwise separate data file) while keeping compile times blazingly fast.

Note that this is, in short, what the VS resource compiler does, so you may also include your data as a resource and get a pointer to it.

ANSWER

It's impossible to have a static std::bitset like that because:


If construction at runtime is acceptable, simply split the string literal into several smaller ones, each shorter than 2048 characters, as long as the total length stays below 65536:

ANSI compatibility requires a compiler to accept up to 509 characters in a string literal after concatenation. The maximum length of a string literal allowed in Microsoft C is approximately 2,048 bytes. However, if the string literal consists of parts enclosed in double quotation marks, the preprocessor concatenates the parts into a single string, and for each line concatenated, it adds an extra byte to the total number of bytes.

[...]

While an individual quoted string cannot be longer than 2048 bytes, a string literal of roughly 65535 bytes can be constructed by concatenating strings.

https://learn.microsoft.com/en-us/cpp/c-language/maximum-string-length?view=msvc-160

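As the documentation says, longer strings must be built up from concatenated pieces. Here the literal supplies only LENGTH - 1 characters and the remaining bit is set separately. Since std::bitset maps the leftmost character to the most significant bit, the character left out must be the leading one:

const int LENGTH = 1 << 16;
std::bitset<LENGTH> myBitset(
    "00101 ..."   // must be one shorter than the following lines: 2ᴺ - 1 characters (the sequence's first character is dropped)
    "100101 ..."  // 2ᴺ characters
    ...
    "011001 ...", // 2ᴺ characters
    LENGTH - 1    // number of characters to read
);
myBitset[LENGTH - 1] = 1; // the dropped first character is the most significant bit (assumed to be '1' here)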

Alternatively, just use a character array instead of a string literal:

static const char BITSET[LENGTH] = {
    '1', '0', '0', '1',...
    ...
    '0', '1', '0', '0'
};
std::bitset<LENGTH> myBitset(BITSET, sizeof(BITSET));