First, happy new year to everybody, and happy coding in 2017.
I have a Python pandas dataframe that I need to convert to an R dataframe. My Python pandas dataframe looks like this:
'data.frame': 302 obs. of 19 variables:
$ typ : chr "page" "area" "par" "line" ...
$ id : chr "page_1" "block_1_1" "par_1_1" "line_1_1" ...
$ page : num 1 1 1 1 1 1 1 1 1 1 ...
$ area : num NA 1 1 1 2 2 2 2 3 3 ...
$ par : num NA NA 1 1 NA 2 2 2 NA 3 ...
$ line : num NA NA NA 1 NA NA 2 2 NA NA ...
$ x1 : num 0 0.02 36.91 36.91 0.03 ...
$ y1 : num 0 26.1 4.2 4.2 26.1 ...
$ x2 : num 100 5.95 36.92 36.92 5.97 ...
$ y2 : num 100 26.09 8.29 8.29 44.54 ...
$ length : num 100 5.93 0.02 0.02 5.93 ...
$ heigth : num 100 0.01 4.09 4.09 18.44 ...
$ txt : chr "" "" "" "" ...
$ strong : chr "" "" "" "" ...
$ special : chr "" "" "" "" ...
$ AVGx : num 50 2.98 36.91 36.91 3 ...
$ AVGy : num 50 26.09 6.24 6.24 35.31 ...
$ SC_NR : chr "41151000029" "41151000029" "41151000029" "41151000029" ...
$ DOK_LFNR: chr "640" "640" "640" "640" ...
I am using:
from rpy2.robjects import pandas2ri
pandas2ri.activate()
r_dataframe = pandas2ri.py2ri(dataframe)
and I got the following R dataframe:
'data.frame': 302 obs. of 19 variables:
$ typ : Factor w/ 5 levels "area","line",..: 3 1 4 2 1 4 2 5 1 4 ...
$ id : Factor w/ 302 levels "block_1_1","block_1_10",..: 77 1 78 28 12 89 39 216 21 100 ...
$ page : num 1 1 1 1 1 1 1 1 1 1 ...
$ area : num NA 1 1 1 2 2 2 2 3 3 ...
$ par : num NA NA 1 1 NA 2 2 2 NA 3 ...
$ line : num NA NA NA 1 NA NA 2 2 NA NA ...
$ x1 : num 0 0.02 36.91 36.91 0.03 ...
$ y1 : num 0 26.1 4.2 4.2 26.1 ...
$ x2 : num 100 5.95 36.92 36.92 5.97 ...
$ y2 : num 100 26.09 8.29 8.29 44.54 ...
$ length : num 100 5.93 0.02 0.02 5.93 ...
$ heigth : num 100 0.01 4.09 4.09 18.44 ...
$ txt : Factor w/ 189 levels "","[e]","{minutes}",..: 1 1 1 1 1 1 1 107 1 1 ...
$ strong : Factor w/ 3 levels "","0","1": 1 1 1 1 1 1 1 2 1 1 ...
$ special : Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
$ AVGx : num 50 2.98 36.91 36.91 3 ...
$ AVGy : num 50 26.09 6.24 6.24 35.31 ...
$ SC_NR : Factor w/ 1 level "41151000029": 1 1 1 1 1 1 1 1 1 1 ...
$ DOK_LFNR: Factor w/ 1 level "640": 1 1 1 1 1 1 1 1 1 1 ...
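In the str() output above, `typ` appears as `Factor w/ 5 levels "area","line",..: 3 1 4 2 ...`. R stores a factor as 1-based integer codes into a vector of levels, which are sorted alphabetically by default, so numbers show up where the strings used to be. The pure-Python sketch below needs neither R nor rpy2, and `decode_factor` is a hypothetical helper name. It decodes the first four codes. The first two levels are taken from the str() output. "page" and "par" as levels 3 and 4 are inferred by matching the codes against the original column values. The fifth level is elided in the output, so it is left out.

```python
def decode_factor(codes, levels):
    """Map 1-based R factor codes to their level strings (None stands in for NA)."""
    return [None if code is None else levels[code - 1] for code in codes]

# First four levels of `typ`, in R's default alphabetical order
typ_levels = ["area", "line", "page", "par"]
# First four codes from the str() output: 3 1 4 2
typ_codes = [3, 1, 4, 2]

print(decode_factor(typ_codes, typ_levels))
# ['page', 'area', 'par', 'line']
```

The decoded values match the first values of the original `typ` column ("page" "area" "par" "line"). This lookup is the work `as.character` does on a factor.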
The issue is that the string columns in the R dataframe have factor type instead of chr type. I managed to fix it with this R code:
# find the factor columns and convert them back to character
i <- sapply(df, is.factor)
df[i] <- lapply(df[i], as.character)
Is there a way to do that during the conversion directly?
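The two R lines in the fix above do two things. They find every factor column with `sapply(df, is.factor)`, and then they replace each of those columns with its level strings with `lapply(df[i], as.character)`. The sketch below mirrors that logic in plain Python so the steps are easy to see. It assumes a hypothetical representation: a frame is a dict of columns and a factor is a `Factor(codes, levels)` tuple. `Factor`, `factors_to_character` and `frame` are illustrative names, not an rpy2 API. The sample values are taken from the str() output above.

```python
from collections import namedtuple

# Hypothetical stand-in for an R factor: 1-based integer codes plus level labels
Factor = namedtuple("Factor", ["codes", "levels"])

def factors_to_character(frame):
    """Return a copy of `frame` with every Factor column turned into strings."""
    out = {}
    for name, column in frame.items():
        if isinstance(column, Factor):  # like sapply(df, is.factor)
            # like as.character(): look each code up in levels; None stands for NA
            out[name] = [None if c is None else column.levels[c - 1]
                         for c in column.codes]
        else:
            out[name] = column  # non-factor (e.g. numeric) columns are left as-is
    return out

frame = {
    "typ": Factor([3, 1, 4, 2], ["area", "line", "page", "par"]),
    "page": [1.0, 1.0, 1.0, 1.0],
    "DOK_LFNR": Factor([1, 1, 1, 1], ["640"]),
}
fixed = factors_to_character(frame)
print(fixed["typ"])       # ['page', 'area', 'par', 'line']
print(fixed["DOK_LFNR"])  # ['640', '640', '640', '640']
```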
I am using:
Python 2.7.12
rpy2 2.8.2
pandas 0.18.1
Thanks,
Fabien
Consider converting the factor columns to character in Python by importing R's base package. Apparently, the pandas2ri.py2ri() method only uses the default settings of R's data.frame(), which converts strings to factors (stringsAsFactors defaulted to TRUE in R at the time). Below uses the rclass attribute as described in the rpy2 docs: