R jsonlite stream_in losing precision


I am reading in an ndjson file (~1 GB) with large IDs. The IDs are around 19 digits long and lose precision when streamed in: the last 4-5 digits differ. How can I avoid this? Thank you!

library(jsonlite)
# Note: 19-digit numeric literals are rounded to the nearest double as soon as R parses them
data_out <- data.frame(userID = c(1123581321345589000, 3141592653589793000, 2718281828459045000),
                       variable = c("a", "b", "c"))

con_out <- file("test_output.json", open = "wb")
jsonlite::stream_out(data_out, con_out, auto_unbox = TRUE)
close(con_out)

con_in <- file("test_output.json")
data_in <- jsonlite::stream_in(con_in)

> format(data_in$userID, scientific = F)
[1] "1123581321345590016" "3141592653589790208" "2718281828459039744"

Edit: I have no control over the input file or its format. If I open the input file in a text editor, the IDs are correct; the "error" happens when streaming in.
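For context on why the digits change even though the file itself is correct: jsonlite returns JSON numbers that don't fit in R's 32-bit integers as doubles, and a double has a 53-bit significand, so above 2^53 (about 9.0e15) not every whole number can be stored exactly. 19-digit IDs are far past that, so they get rounded to the nearest representable value. The same rounding already happens to the numeric literals in the reproducible example above, before stream_out() is even called. A quick base-R check (no jsonlite needed):

```r
# R doubles have a 53-bit significand: every whole number up to 2^53 is exact,
# but beyond that there are gaps between representable values.
limit <- 2^53
sprintf("%.0f", limit)          # "9007199254740992" (16 digits)

# Past the limit, adding 1 no longer changes the stored value
(limit + 1) == limit            # TRUE

# Near 1.1e18 consecutive doubles are 128 apart, so two different
# 19-digit IDs can collapse into the same number
1123581321345589000 == 1123581321345589001   # TRUE

# Printing shows the nearest representable double, not the digits typed in
sprintf("%.0f", 1123581321345589000)
```

So whatever the fix is, the IDs have to stay as text (or some 64-bit integer type) before R turns them into doubles.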


Answer:

You could convert userID to character before writing it, so the IDs are stored as JSON strings instead of numbers:

library(jsonlite)
data_out <- data.frame(userID = c(1123581321345589000, 3141592653589793000, 2718281828459045000),
                       variable = c("a", "b", "c"))

# Convert to character
data_out$userID <- as.character(data_out$userID)

con_out <- file("test_output.json", open = "wb")
jsonlite::stream_out(data_out, con_out, auto_unbox = TRUE)
#> Complete! Processed total of 3 rows.
close(con_out)

con_in <- file("test_output.json")
data_in <- jsonlite::stream_in(con_in)
#> opening file input connection.
#>  Found 3 records... Imported 3 records. Simplifying...
#> closing file input connection.

identical(data_in,data_out)
#> [1] TRUE
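If you can't change the file, the IDs have to be kept as text while parsing. jsonlite's fromJSON() has a bigint_as_char argument for this: integers too large for a double are returned as character strings instead. I haven't confirmed whether stream_in() passes that argument through, so this sketch reads the file in pages with readLines() and parses each line with fromJSON() itself. read_ndjson_bigint() and its pagesize argument are hypothetical names, not part of jsonlite. The sketch assumes flat records (no nested objects or arrays) and IDs that fit in a signed 64-bit integer (at most 9223372036854775807), as the ones in the question do. Expect it to be slower than stream_in() on a 1 GB file.

```r
library(jsonlite)

# Hypothetical helper: read an ndjson file in pages of lines and parse each
# line with fromJSON(bigint_as_char = TRUE), so integers too big for a double
# come back as character strings. Assumes flat records (no nested fields).
read_ndjson_bigint <- function(path, pagesize = 10000) {
  con <- file(path, open = "r", encoding = "UTF-8")
  on.exit(close(con))
  pages <- list()
  repeat {
    lines <- readLines(con, n = pagesize)
    if (length(lines) == 0) break              # end of file
    lines <- lines[nzchar(lines)]               # skip blank lines
    records <- lapply(lines, fromJSON, bigint_as_char = TRUE)
    rows <- lapply(records, as.data.frame, stringsAsFactors = FALSE)
    pages[[length(pages) + 1]] <- do.call(rbind, rows)
  }
  do.call(rbind, pages)
}

# Usage: two records whose IDs are well above 2^53
path <- tempfile(fileext = ".json")
writeLines(c('{"userID":1123581321345589123,"variable":"a"}',
             '{"userID":3141592653589793238,"variable":"b"}'), path)
data_in <- read_ndjson_bigint(path)
data_in$userID
```

The userID column should come back as character with every digit intact; converting it back to numeric would reintroduce the rounding, so keep it as a string if you only need it as an identifier.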