How to prevent Columns from being Decimals in DataFrames when retrieved from Postgres

168 Views Asked by At

I have a DataFrame df which I retrieved from a Postgres database as follows

using DataFrames, LibPQ

con = LibPQ.Connection(con_string)
result = execute(con, "SELECT * FROM [table]")
df = DataFrame(result)
close(con)

Sorry, I cannot make this reproducible.

Now, either DataFrames or LibPQ is turning NUMERIC Postgres columns into type Decimals.Decimal. This might be cool for being as accurate as possible, but it provides problems when I try to plot anything with these columns.

eltype.(eachcol(df))

5-element Vector{Union}:
 Union{Missing, String}
 Union{Missing, TimeZones.ZonedDateTime}
 Union{Missing, Int32}
 Union{Missing, Date}
 Union{Missing, Decimals.Decimal}

As very nicely explained here by Bogumił Kamiński I can change the columns of a specific type to some other type. The caveat is that I cannot even test whether a column is of type Union{Missing, Decimals.Decimal}, because the Decimals package is not loaded. OK, I thought, let's load the Decimals package then - but it doesn't work, because the package must be installed first...

Is there some other way to turn these columns into Float64s? Without having to install the entire package? I know that I could change the column types by using the column names, like

df.my_column = Float64.(df.my_column)

but I will not know the relevant column names in advance.

3

There are 3 best solutions below

5
On BEST ANSWER

You can use Union{Missing, AbstractFloat} as type selector as Decimal <: AbstractFloat.

Since Union{Missing, AbstractFloat} is not a concrete type you need to write eltype(col) <: Union{Missing, AbstractFloat} to check a subtyping condition.


By the way if you have LibPQ.jl installed then you also have access to Decimals.jl:

julia> LibPQ.Decimals.Decimal
Decimals.Decimal
0
On

The logic behind how it maps each data type to each Julia type is in https://github.com/iamed2/LibPQ.jl/blob/master/src/parsing.jl. For instance there are the lines:

## integers
_DEFAULT_TYPE_MAP[:int2] = Int16
_DEFAULT_TYPE_MAP[:int4] = Int32
_DEFAULT_TYPE_MAP[:int8] = Int64
## floating point
_DEFAULT_TYPE_MAP[:float4] = Float32
_DEFAULT_TYPE_MAP[:float8] = Float64
## oid
_DEFAULT_TYPE_MAP[:oid] = Oid
## numeric
_DEFAULT_TYPE_MAP[:numeric] = Decimal

It does not appear that there is an easy way to overwrite the last default type map if you just want a Float64 rather than trying to deal with Decimals in return for additional accuracy at the 16th decimal place. But what you can do is:

  • do cast(whatever as float8) in your postgres query for all of the numeric columns being output.
  • Fork the package. Change this line and call your forked package rather than this one.
2
On

You can use identity to properly type every column in DataFrame.

julia> df=DataFrame(A=Number[1,2],B=Union{Missing,AbstractFloat}[3,4])
2×2 DataFrame
 Row │ A       B          
     │ Number  Abstract…? 
─────┼────────────────────
   1 │      1         3.0
   2 │      2         4.0

julia> identity.(df)
2×2 DataFrame
 Row │ A      B       
     │ Int64  Float64 
─────┼────────────────
   1 │     1      3.0
   2 │     2      4.0