Excel raw date number representation incoherency

149 Views Asked by At

[C# 9.0 and net7.0]

(* main request edit)
(Why can excel have the same date value representation as different dates? 43708 can be 31/08/19 and 01/09/23)

As stated by @user8728340's response, this happens because of a misalignment of bool literals representations. When scheffer represented it as "true" and the excel default (and what the library expected) was "1".

(A github issue and pool request have been created to solve the problem)

More information about the outcome of this observation can be found in the problems github issue ("DateTime incoherent value")

I have a .xlsx file from a HttpClient request and read these bytes as a stream. I'm trying to use ExcelDataReader in the stream; I detected that datetime values are not correct. The table has only one column of date and all dates start in 2021.

All the datetimes read with the IExcelDataReader are specifically 31/08/2019 - this cannot be because of the above statement.

When I tested it in my personal computer the dates returned right what I realized is in order to transfer this file to my computer I had to save it using Excel.

The stream from the request has ~2079 bytes while the (saved by Excel) file has ~5569 so I compared the byte structure in both files in notepad++ and tested a lot to ensure that the original version in fact never returns the dates right (every other data is correct) but the modified one is perfectly fine. I tested this behavior in 3 machines and the results are the same. So, here is the deal

I'm already using the ExcelDataReader library so I would like not to use another to solve the same problem.

The origin file from the request ends with "Implementation by Anton Scheffer" literary written to the file

Clarification

this is the origin file from the web request

While the Excel saved version eds with a more understandable manner

The modified by Excel version

Looking into the .xlsx file structure as a .zip in windows, we can access the entry: xl/worksheets/sheet1.xml, storing the table's data from the file. In the 14º column, the date one, the value written to the file is 43708 this value can be interpreted by excel in two ways,
as 31/08/19 or 01/09/23. I don't know what defines that.

3

There are 3 best solutions below

0
user8728340 On BEST ANSWER

The Excel library that created this file - PL/SQL "as_xlsx" by Anton Schleffer - saves files with a non-standard value for the "date1904" workbook property. Excel itself sets the date1904 property to 1, but in files produced by "as_xlsx" it is true. ExcelDataReader does not support this particular deviation and proceeds to return wrong dates off by 4 years.

3
James Jonatah On

I detected that the date always shifted 1462 days in the past constantly. I know it doens't solve the root of the problem but its a solution to our business for now. Adding the missing days to the readed value fixes the problem.

Thanks to @Flydog57 for explaining that xlsx files are structured as zip files so i could access the file internal content and read the xml raw data. The values written to the xml's content at the target column is 43708 which is, the number representation of the date 31/08/2019 literally. In the excel saved version of the file the same value 43708 is represented as 01/09/2023, the reason for that remains unknown.

comparasion:
raw file: at the raw request content excel saved file: at the excel's modified version

1
Ron Rosenfeld On

What you are seeing seems dependent on the date system set in your Excel application for the workbook.

Using the 1904 date system, 43708 will convert to 1-Sep-2023 Using the 1900 date system, 43708 will convert to 31-Aug-2019

The setting is under File=>Options=>Advanced=>When Calculating this Workbook

enter image description here

Excel stores dates as serial numbers starting at either 1/1/1900 or 1/1/1904 depending on the date system selected.

Most windows installs default to the 1900 date system. Mac installs used to default to the 1904 date system. You should be aware that the 1900 date system has a deliberate error (supposedly to be compatible with a Lotus 123 accidental error) wherein 1900 is considered a leap year.

See Differences between the 1900 and the 1904 date system in Excel for a more detailed explanation.

Depending on your requirements, you may need to change the date system in the workbook, or use the 1462 constant you have derived.