Binary data to date/time stamp, unable to find the proper conversion using Python


First post, long-time lurker. Sorry, I couldn't get the formatting the way I wanted.

I'm trying to convert part of a binary file to a date/time (in Python). But whatever I try I'm unable to find the proper conversion.

My guess is that the leftmost byte (0x30) is not part of the data, and that the remaining 8 bytes contain the relevant data.

Below are the binary parts, in both decimal and hex, and the date/time they represent. Any help is highly appreciated.

48  101 26  235 227 242 150 197 65
30  65  1a  eb  e3  f2  96  c5  41  
  -- should read as 16 December 2023 at 15:03

48  198 54  133 112 138 151 197 65
30  c6  36  85  70  8a  97  c5  41 
  -- should read as 17 December 2023 at 12:37

48  74  38  27  107 41  116 196 65
30  4a  26  1b  6b  29  74  c4  41 
  -- should read as 1 October 2022 at 12:49

I've tried unpacking the data as either a double or a long long int and then deriving a date from it. I've searched the site and tried ChatGPT, to no avail.

Extra sample data

30  23  84  b1  a8  b5  97  c5  41 : 17 December 2023 at 18:45
30  3f  91  e7  96  b5  97  c5  41 : 17 December 2023 at 18:45 (slightly later)
30  a6  d6  2f  d1  b5  97  c5  41 : 17 December 2023 at 18:46
30  e8  16  9c  b9  b5  97  c5  41 : 17 December 2023 at 18:47

Best answer, by Pierre D

The reason I asked in comments for some more examples, especially ones close to each other by time, was to see what parts in the binary values were changing. I considered several types of encoding (some even based on textual representations of the timestamps). I looked at temporenc. I looked at floating point representations of seconds since the Epoch.

But one thing struck me: it was quite interesting to see that among these three examples:

{
    '30 65 1a eb e3 f2 96 c5 41': '16 December 2023 at 15:03',
    '30 c6 36 85 70 8a 97 c5 41': '17 December 2023 at 12:37',
    '30 23 84 b1 a8 b5 97 c5 41': '17 December 2023 at 18:45',
}

the c5 byte (2nd from right) is constant, while the 3rd byte from the right is 97 for Dec. 17 and 96 for Dec. 16.
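
The byte-by-byte comparison can be reproduced with a few lines of Python. This is only a sketch; the names `samples`, `rows` and `constant_positions` are mine and not part of the original data.

```python
# The three sample strings quoted above, split into their nine bytes.
samples = [
    '30 65 1a eb e3 f2 96 c5 41',  # 16 December 2023 at 15:03
    '30 c6 36 85 70 8a 97 c5 41',  # 17 December 2023 at 12:37
    '30 23 84 b1 a8 b5 97 c5 41',  # 17 December 2023 at 18:45
]
rows = [s.split() for s in samples]

# Positions (0-based, from the left) whose byte is identical in every sample.
constant_positions = [
    i for i in range(len(rows[0]))
    if len({row[i] for row in rows}) == 1
]
print(constant_positions)  # [0, 7, 8]

# The 3rd byte from the right, which follows the day in these samples.
print([row[-3] for row in rows])  # ['96', '97', '97']
```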

Further, I started looking at the whole integer value of the bytes in reverse order (excluding the first and last ones that are constant and may be delimiters).

I then noticed that the difference between the int values of two consecutive timestamps was roughly proportional to the time difference between them in seconds. The factor is close to 8_388_608, which is 2 ** 23.
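
As a rough check of that factor, the sketch below divides the difference between consecutive integer values by the difference between the stated times. The helper name `raw_int` is hypothetical, and the stated times are only given to the minute, so the ratios can only be expected to land near 2 ** 23, not exactly on it.

```python
from datetime import datetime

def raw_int(k):
    # Middle seven bytes, reversed, read as one hexadecimal integer.
    return int(''.join(k.split()[1:-1][::-1]), 16)

# Stated local times; all three fall in December, so no DST change between them.
samples = [
    ('30 65 1a eb e3 f2 96 c5 41', datetime(2023, 12, 16, 15, 3)),
    ('30 c6 36 85 70 8a 97 c5 41', datetime(2023, 12, 17, 12, 37)),
    ('30 23 84 b1 a8 b5 97 c5 41', datetime(2023, 12, 17, 18, 45)),
]

# Integer difference per second of stated time difference.
ratios = [
    (raw_int(k1) - raw_int(k0)) / (t1 - t0).total_seconds()
    for (k0, t0), (k1, t1) in zip(samples, samples[1:])
]
for r in ratios:
    print(f'{r:,.0f}  ({r / 2**23:.4f} x 2**23)')
```

Both ratios come out within one percent of 2 ** 23, which is what motivates the right shift by 23 bits.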

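Fast-forward a few more steps, and we get:

def f(k):
    # Drop the first and last bytes, reverse the rest, parse as hex,
    # shift right by 23 bits and subtract an empirically fitted offset.
    return (int(''.join(k.split()[1:-1][::-1]), 16) >> 23) - 4927272860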

That function gives a fairly good approximation of the timestamps provided, in seconds since the Unix epoch. One more thing: there was a conspicuous 3600-second error for the October date, so I figured your dates include daylight saving time. Since you are in Europe, I used Zurich's time zone.
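
To make the one-hour discrepancy concrete, the sketch below compares `f` against two of the stated times while deliberately ignoring daylight saving, treating every stated time as UTC+1. That fixed offset (the name `fixed_cet`) is an assumption made only for this demonstration.

```python
from datetime import datetime, timedelta, timezone

def f(k):
    return (int(''.join(k.split()[1:-1][::-1]), 16) >> 23) - 4927272860

# Assumption for this demo only: every stated time is at a fixed UTC+1.
fixed_cet = timezone(timedelta(hours=1))

stated = {
    '30 4a 26 1b 6b 29 74 c4 41': datetime(2022, 10, 1, 12, 49, tzinfo=fixed_cet),
    '30 65 1a eb e3 f2 96 c5 41': datetime(2023, 12, 16, 15, 3, tzinfo=fixed_cet),
}

# Estimated minus stated, in seconds.
residuals = {k: f(k) - t.timestamp() for k, t in stated.items()}
for k, r in residuals.items():
    print(k, r)
```

The December sample is off by 23 seconds, the October one by 3570 seconds, which is what a summer-time offset of UTC+2 would explain.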

Putting it all together:

import pandas as pd


tz = 'Europe/Zurich'

examples = {
    '30 65 1a eb e3 f2 96 c5 41': '16 December 2023 at 15:03',
    '30 c6 36 85 70 8a 97 c5 41': '17 December 2023 at 12:37',
    '30 4a 26 1b 6b 29 74 c4 41': '1 October 2022 at 12:49',
    '30 23 84 b1 a8 b5 97 c5 41': '17 December 2023 at 18:45',
    '30 3f 91 e7 96 b5 97 c5 41': '17 December 2023 at 18:45:30',
    '30 a6 d6 2f d1 b5 97 c5 41': '17 December 2023 at 18:46',
    '30 e8 16 9c b9 b5 97 c5 41': '17 December 2023 at 18:47',
}

examples = dict(sorted([
    (k, pd.Timestamp(v, tz=tz)) for k, v in examples.items()
], key=lambda item: item[1]))

Then:

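def f(k):
    # Estimated Unix time in seconds: middle 7 bytes, reversed, parsed as hex,
    # shifted right by 23 bits, minus an empirically fitted offset.
    return (int(''.join(k.split()[1:-1][::-1]), 16) >> 23) - 4927272860

def to_time(k, tz):
    # f(k) is in seconds; a bare number passed to pd.Timestamp is nanoseconds.
    return pd.Timestamp(f(k) * 1e9, tz=tz)

# Same as '%F %T %Z', spelled out because those shortcuts are not portable.
fmt = '%Y-%m-%d %H:%M:%S %Z'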

test = [
    (
        f'{v:{fmt}}',  # given time
        f'{to_time(k, tz=tz):{fmt}}', # estimate from bytes
        (to_time(k, tz=tz) - v).total_seconds(), # difference in seconds
    )
    for k, v in examples.items()
]

>>> test
[('2022-10-01 12:49:00 CEST', '2022-10-01 12:49:30 CEST', 30.0),
 ('2023-12-16 15:03:00 CET', '2023-12-16 15:03:23 CET', 23.0),
 ('2023-12-17 12:37:00 CET', '2023-12-17 12:36:37 CET', -23.0),
 ('2023-12-17 18:45:00 CET', '2023-12-17 18:45:25 CET', 25.0),
 ('2023-12-17 18:45:30 CET', '2023-12-17 18:44:49 CET', -41.0),
 ('2023-12-17 18:46:00 CET', '2023-12-17 18:46:46 CET', 46.0),
 ('2023-12-17 18:47:00 CET', '2023-12-17 18:45:59 CET', -61.0)]

With more examples and more information, you may be able to refine the constants used above. I tried to express the offset in terms of an origin date, but the result wasn't satisfying. One attempt was:

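origin = pd.Timestamp('2018-01-05 18:48:33')
offset = int(origin.value / 1e9)  # origin as seconds since the Unix epoch

def f(k):
    # [3:-2] drops the '41c' prefix and the '30' suffix of the reversed hex string.
    return (int(''.join(k.split()[::-1])[3:-2], 16) >> 23) + offset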

but I didn't find it much better from an "Occam's razor" perspective.
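
For what it is worth, that origin suggests another reading, offered here as an unverified assumption and not as a conclusion of the answer above. 2018-01-05 18:48:33 lies 2 ** 29 seconds (plus one) after 2001-01-01 00:00:00, and keeping the top 29 bits of a 52-bit mantissa is how one would extract the integer part of an IEEE 754 double whose exponent is 29. The sketch below therefore reads the last eight bytes as a little-endian double counting seconds from 2001-01-01 UTC. The names `ASSUMED_EPOCH` and `decode` are hypothetical, and the samples are compatible with this guess but do not prove it.

```python
import struct
from datetime import datetime, timedelta, timezone

# Assumption, not confirmed by the source: seconds count from 2001-01-01 UTC.
ASSUMED_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)

def decode(k):
    raw = bytes.fromhex(''.join(k.split()))
    # Skip the leading 0x30 byte; read the remaining 8 bytes as a
    # little-endian IEEE 754 double.
    (seconds,) = struct.unpack('<d', raw[1:])
    return ASSUMED_EPOCH + timedelta(seconds=seconds)

for k in (
    '30 65 1a eb e3 f2 96 c5 41',  # stated: 16 December 2023 at 15:03
    '30 c6 36 85 70 8a 97 c5 41',  # stated: 17 December 2023 at 12:37
    '30 4a 26 1b 6b 29 74 c4 41',  # stated: 1 October 2022 at 12:49
):
    print(decode(k).isoformat())
```

The results are in UTC. Once shifted to Central European time (one hour ahead in December, two in early October), these three samples land inside the stated minute, and the values differ from the fitted `f` above by a constant 28 seconds.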