I am very new to using servers for data analysis, and I am completely baffled by what a "tbl_df""tbl""data.frame" object, and "tbl_Redshift" "tbl_dbi" "tbl_sql" "tbl_lazy" "tbl" is in R? I know that something related to the Redshift server and tbl-lazy tables but that's it.

I have been following on a previous analyst's work, and I have used collect() to pull a bunch of data onto the local machine, doing some transformation before trying to upload it back to the server. However, I cannot upload the data using compute() for some reason, and I believe the reason is because the object itself became "tbl_df""tbl""data.frame" from "tbl_Redshift" "tbl_dbi" "tbl_sql" "tbl_lazy" "tbl".

Is there a way to transform this back so I can upload it? Thanks in advance.

1

There are 1 best solutions below

1
Simon.S.A. On BEST ANSWER

Let's begin by ensuring the core concepts are clear:

  • tbl_df, tbl, and data.frame are all local objects where the data exist in local R memory.
  • tbl_Redshift, tbl_dbi, and tbl_sql are all remote objects where the data exists in the database and R has a pointer/query/connection to the database in local R memory.

The collect command retrieves data from a remote database into local R memory. Hence it converts an object of type tbl_Redshift, tbl_dbi, or tbl_sql into an object of type tbl_df, tbl, 0r data.frame.

The compute command works for remote objects (tbl_Redshift, tbl_dbi, or tbl_sql) and saves an temporary/intermediate stage of a calculation to the database, creating a new remote object. This helps reduce additional computation. It can not be used to copy data from local R memory.

Put another way:

  • collect : remote data input -> local data output
  • compute : remote data input -> remote date output

It sounds like what you need is : local data input -> remote data output.

For this, there are two broad options:

  1. Redesign your code to avoid collect. If the data is never copied from the server into local R memory then you never have to transfer it back.
  2. Use a function designed for copying data to a server. I use the DBI::dbWriteTable command for this purpose. Example function below.
copy_from_r_to_sql <- function(db_conn, db, schema, sql_table_name, r_table_name) {
  DBI::dbWriteTable(
    db_conn,
    DBI::Id(
      catalog = db,
      schema = schema,
      table = sql_table_name
    ),
    r_table_name
  )
}

(This is a cut down version of one of my dbplyr helpers here)