How is date encoded/stored in MySQL?

544 Views Asked by At

I have to parse date from raw bytes I get from the database for my application on C++. I've found out that date in MySQL is 4 bytes and the last two are month and day respectively. But the first two bytes strangely encode the year, so if the date is 2002-08-30, the content will be 210, 15, 8, 31. If the date is 1996-12-22, the date will be stored as 204, 15, 12, 22. Obviously, the first byte can't be bigger than 255, so I've checked year 2047 -- it's 255, 15, and 2048 -- it's 128, 16.

At first I thought that the key is binary operations, but I did not quite understand the logic:

2047: 0111 1111 1111
255:  0000 1111 1111
15:   0000 0000 1111

2048: 1000 0000 0000
128:  0000 1000 0000
16:   0000 0001 0000

Any idea?

5

There are 5 best solutions below

0
On BEST ANSWER

Based on what you provide, it seems to be N1 - 128 + N2 * 128.

0
On

It seems that the logic of encoding is to erase the most significant bit of the first number and to write the second number from this erased bit like this:

2002 from 210 and 15:

1101 0010 -> _101 0010;
0000 1111 + _101 0010 -> 0111 1101 0010

2048 from 128 and 16:

1000 0000 -> _000 0000
0001 0000 + _000 0000 -> 1000 0000 0000
0
On

Which version???

DATETIME used to be encoded in packed decimal (8 bytes). But, when fractional seconds were added, the format was changed to something like

  • Length indication (1 byte)
  • INT UNSIGNED for seconds-since-1970 (4 bytes)
  • fractional seconds (0-3 bytes)

DATE is stored like MEDIUMINT UNSIGNED (3 bytes) as days since 0000-00-00 (or something like that).

How did you get the "raw bytes"? There is no function to let you do that. Select HEX(some-date) first converts to a string (like "2022-03-22") then takes the hex of it. That gives you 323032322D30332D3232.

0
On

About Code refrence the answer.

About document content check below words :

getBytes document links to ColumnMetaData document url.

ColumnMetaData document links to protobuf encoding url.

protobuf encoding url / Protocol Buffers Documentation Documentation say :

Base 128 Varints

Variable-width integers, or varints, are at the core of the wire format. They allow encoding unsigned 64-bit integers using anywhere between one and ten bytes, with small values using fewer bytes.

Each byte in the varint has a continuation bit that indicates if the byte that follows it is part of the varint. This is the most significant bit (MSB) of the byte (sometimes also called the sign bit). The lower 7 bits are a payload; the resulting integer is built by appending together the 7-bit payloads of its constituent bytes.

So, for example, here is the number 1, encoded as 01 – it’s a single byte, so the MSB is not set:

0000 0001
^ msb

And here is 150, encoded as 9601 – this is a bit more complicated:

10010110 00000001
^ msb    ^ msb

How do you figure out that this is 150? First you drop the MSB from each byte, as this is just there to tell us whether we’ve reached the end of the number (as you can see, it’s set in the first byte as there is more than one byte in the varint). Then we concatenate the 7-bit payloads, and interpret it as a little-endian, 64-bit unsigned integer:

10010110 00000001        // Original inputs.
 0010110  0000001        // Drop continuation bits.
 0000001  0010110        // Put into little-endian order.
 10010110                // Concatenate.
 128 + 16 + 4 + 2 = 150  // Interpret as integer.

Because varints are so crucial to protocol buffers, in protoscope syntax, we refer to them as plain integers. 150 is the same as 9601.

Each byte in the varint has a continuation bit that indicates if the byte that follows it is part of the varint.

0
On

We had the same issue and developed the following C++20 helper methods for production use with mysqlx (MySQL Connector/C++ 8.0 X DevAPI) to properly read DATE, DATETIME and TIMESTAMP fields:

#pragma once

#include <vector>
#include <cstddef>
#include <chrono>
#include <mysqlx/xdevapi.h>

namespace mysqlx {

static inline std::vector<uint64_t>
mysqlx_raw_as_u64_vector(const mysqlx::Value& in_value)
{
  std::vector<uint64_t> out;

  const auto bytes = in_value.getRawBytes();
  auto ptr = reinterpret_cast<const std::byte*>(bytes.first);
  auto end = reinterpret_cast<const std::byte*>(bytes.first) + bytes.second;

  while (ptr != end) {
    static constexpr std::byte carry_flag{0b1000'0000};
    static constexpr std::byte value_mask{0b0111'1111};

    uint64_t v = 0;
    uint64_t shift = 0;
    bool is_carry;
    do {
      auto byte = *ptr;
      is_carry = (byte & carry_flag) == carry_flag;
      v |= std::to_integer<uint64_t>(byte & value_mask) << shift;

      ++ptr;
      shift += 7;
    } while (is_carry && ptr != end && shift <= 63);

    out.push_back(v);
  }

  return out;
}

static inline std::chrono::year_month_day
read_date(const mysqlx::Value& value)
{
  const auto vector = mysqlx_raw_as_u64_vector(value);
  if (vector.size() < 3)
    throw std::out_of_range{"Value is not a valid DATE"};

  return std::chrono::year{static_cast<int>(vector.at(0))} / static_cast<int>(vector.at(1)) / static_cast<int>(vector.at(2));
}

static inline std::chrono::system_clock::time_point
read_date_time(const mysqlx::Value& value)
{
  const auto vector = mysqlx_raw_as_u64_vector(value);
  if (vector.size() < 3)
    throw std::out_of_range{"Value is not a valid DATETIME"};

  auto ymd = std::chrono::year{static_cast<int>(vector.at(0))} / static_cast<int>(vector.at(1)) / static_cast<int>(vector.at(2));
  auto sys_days = std::chrono::sys_days{ymd};

  auto out = std::chrono::system_clock::time_point(sys_days);

  auto it = vector.begin() + 2;
  auto end = vector.end();

  if (++it == end)
    return out;
  out += std::chrono::hours{*it};

  if (++it == end)
    return out;
  out += std::chrono::minutes{*it};

  if (++it == end)
    return out;
  out += std::chrono::seconds{*it};

  if (++it == end)
    return out;
  out += std::chrono::microseconds{*it};

  return out;
}

} //namespace

Which can then be used as follows:

auto row = table.select("datetime", "date").execute().fetchOne();
auto time_point = read_date_time(row[0]);
auto year_month_day = read_date(row[1]);

getBytes document links to ColumnMetaData document url.

ColumnMetaData document links to protobuf encoding url.

protobuf encoding url / Protocol Buffers Documentation Documentation say :

Base 128 Varints

Variable-width integers, or varints, are at the core of the wire format. They allow encoding unsigned 64-bit integers using anywhere between one and ten bytes, with small values using fewer bytes.

Each byte in the varint has a continuation bit that indicates if the byte that follows it is part of the varint. This is the most significant bit (MSB) of the byte (sometimes also called the sign bit). The lower 7 bits are a payload; the resulting integer is built by appending together the 7-bit payloads of its constituent bytes.

So, for example, here is the number 1, encoded as 01 – it’s a single byte, so the MSB is not set:

0000 0001
^ msb

And here is 150, encoded as 9601 – this is a bit more complicated:

10010110 00000001
^ msb    ^ msb

How do you figure out that this is 150? First you drop the MSB from each byte, as this is just there to tell us whether we’ve reached the end of the number (as you can see, it’s set in the first byte as there is more than one byte in the varint). Then we concatenate the 7-bit payloads, and interpret it as a little-endian, 64-bit unsigned integer:

10010110 00000001        // Original inputs.
 0010110  0000001        // Drop continuation bits.
 0000001  0010110        // Put into little-endian order.
 10010110                // Concatenate.
 128 + 16 + 4 + 2 = 150  // Interpret as integer.

Because varints are so crucial to protocol buffers, in protoscope syntax, we refer to them as plain integers. 150 is the same as 9601.