Converting R dataframe with NA in text column to Python with rpy2

I want to convert an R data frame to Python using rpy2, but I don't know how to convert an NA that appears in a text column to a Python missing value.

This is an example which shows my problem.

import rpy2.robjects as ro
ro.r('n = c(1,2)')
ro.r("b = c(NA,'def')")
ro.r("df = data.frame(n,b)")
rdf = ro.r('df')
from rpy2.robjects.conversion import localconverter
from rpy2.robjects import pandas2ri
with localconverter(ro.default_converter + pandas2ri.converter):
    df = ro.conversion.rpy2py(rdf)

Produces:

>>> print(df)
     n              b
1  1.0  NA_character_
2  2.0            def

Similar code used to work with an older version of rpy2:

import rpy2.robjects as ro
ro.r('n = c(1,2)')
ro.r("b = c(NA,'def')")
ro.r("df = data.frame(n,b)")
rdf = ro.r('df')
from rpy2.robjects import pandas2ri
df = pandas2ri.ri2py(rdf)

Produced:

>>> print(df)
     n    b
0  1.0  NaN
1  2.0  def

How do I get back the old behaviour?

There is 1 best solution below

Matyasch On

This seems to be a bug as of rpy2 version 3.5.14, which also appeared with the numpy converter. Until it is fixed, you could use the deprecated pandas2ri.activate().

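import rpy2.robjects as ro
from rpy2.robjects import pandas2ri

# Build the example data frame on the R side.
ro.r('n = c(1,2)')
ro.r("b = c(NA,'def')")
ro.r("df = data.frame(n,b)")
rdf = ro.r('df')

# Deprecated: registers the pandas conversion rules globally.
pandas2ri.activate()
df = ro.conversion.rpy2py(rdf)
print(df)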

Outputs:

     n     b
1  1.0  None
2  2.0   def

Note that the first value is None rather than NaN, which makes sense because column b holds strings, not numbers. If b were a numeric column, e.g. ro.r("b = c(NA, 3)"), you would get NaN.
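
If you would rather keep localconverter and clean up afterwards, one option is to replace the leftover NA object in the converted DataFrame yourself. The following is only a sketch under assumptions: replace_sentinel is a hypothetical helper (not part of rpy2 or pandas), and FAKE_NA is a stand-in object so the example runs without R installed. With rpy2, I would expect the object to pass as the sentinel to be rpy2.rinterface.NA_Character, but I have not verified that it is the exact object left in the column.

```python
import pandas as pd


def replace_sentinel(df, sentinel, replacement=None):
    """Return a copy of df in which cells that *are* `sentinel` become `replacement`.

    Hypothetical helper: only object-dtype columns are scanned, and the
    comparison is by identity (`is`), not by equality.
    """
    out = df.copy()
    for col in out.columns:
        if out[col].dtype == object:
            cleaned = [replacement if v is sentinel else v for v in out[col]]
            # dtype=object keeps None as None instead of coercing it.
            out[col] = pd.Series(cleaned, index=out.index, dtype=object)
    return out


class _FakeNA:
    """Stand-in for the NA object left behind by the conversion (assumption)."""

    def __repr__(self):
        return "NA_character_"


FAKE_NA = _FakeNA()

# Mimic the unwanted result from the question.
broken = pd.DataFrame({"n": [1.0, 2.0], "b": [FAKE_NA, "def"]}, index=[1, 2])

fixed = replace_sentinel(broken, FAKE_NA)
print(fixed)
print(pd.isna(fixed["b"]))  # pandas treats both None and NaN as missing
```

Pass replacement=float("nan") instead if you prefer NaN in the text column, as the old ri2py output had.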