A large data frame (a couple of million rows, a few thousand columns) is created with pandas in Python. This data frame is to be passed to R using PyRserve. The transfer has to be quick, a few seconds at most.
Pandas has a to_json function. Is converting to and from JSON the only way to transfer such large objects, and is it OK for objects of this size?
I can always write the data frame to disk and read it back (reading is fast using fread, and that is what I have done so far), but what is the best way to do this?
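For reference, the disk round-trip I am using now looks roughly like the sketch below; the file name and the small stand-in frame are illustrative, and the R side is shown only as a comment:

```python
import numpy as np
import pandas as pd

# Illustrative stand-in; the real frame is ~2e6 rows x ~3e3 columns
df = pd.DataFrame(np.random.rand(1000, 50))

# Python side: dump the frame as CSV; index=False keeps the file flat
df.to_csv("frame.csv", index=False)

# R side (executed through PyRserve/Rserve), data.table's fread parses it quickly:
#   df <- data.table::fread("frame.csv")
```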
Without having tried it out, to_json seems like a very bad idea, and it gets worse with larger data frames, because serializing and parsing JSON adds a lot of overhead on both the writing and the reading side. I'd recommend using rpy2 (which is supported directly by pandas) or, if you want to write something to disk (maybe because the data frame is only generated once), HDF5 (see this thread for more information on interfacing pandas and R using this format).
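A minimal sketch of the in-memory route via rpy2, using the rpy2 3.x conversion API (older rpy2 versions use pandas2ri.activate() instead; the variable names here are made up):

```python
import pandas as pd
from rpy2 import robjects
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter

df = pd.DataFrame({"x": range(5), "y": [0.5 * i for i in range(5)]})

# Convert the pandas DataFrame to an R data.frame without touching disk
with localconverter(robjects.default_converter + pandas2ri.converter):
    r_df = robjects.conversion.py2rpy(df)

# Bind it in R's global environment and use it from R code
robjects.globalenv["df"] = r_df
print(robjects.r("summary(df)"))
```

Note that rpy2 embeds R in the Python process, so this avoids both the Rserve socket and any text serialization.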
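The HDF5 route, as a sketch, assuming PyTables is installed on the Python side and an HDF5 reader such as the rhdf5 package on the R side; the file name and key are made up, and the PyTables on-disk layout may need some unpacking when read from R:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(1000, 50))

# Binary, typed storage: no text parsing on either side
df.to_hdf("frame.h5", key="df", mode="w")

# R side, e.g. with the rhdf5 package:
#   df <- rhdf5::h5read("frame.h5", "df")
```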