I am attempting to loop-apply a function to a list of filepaths in R, with the goal of appending the output to an out-of-memory SQLite database using dbplyr. The database is currently empty and has a predefined structure.

Here is the database structure (I do not know how to create a proper reprex for a db connection)

database
# Source:   table<epi_table> [0 x 14]
# Database: sqlite 3.44.2 [/home/me/repositories/some_path/data/some_path.db]
# ℹ 14 variables: doc_index <int>, content_type <chr>, style_name <chr>, text <chr>, level <dbl>,
#   num_id <int>, row_id <int>, is_header <int>, cell_id <dbl>, col_span <dbl>, row_span <int>,
#   successful_import <int>, filepath <chr>, doc_vs_docx <chr>

The function I loop-apply to the list of filepaths produces outputs like this for each iteration:

new_row <- structure(list(doc_index = 1L, content_type = "paragraph", style_name = NA_character_, 
    text = "Some_text", level = NA_real_, num_id = NA_integer_, 
    row_id = NA_integer_, is_header = NA, cell_id = NA_real_, 
    col_span = NA_real_, row_span = NA_integer_, successful_import = TRUE, 
    filepath = "some_file.docx", doc_vs_docx = "docx"), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame"))

new_row
# A tibble: 1 × 14
  doc_index content_type style_name text      level num_id row_id is_header cell_id col_span row_span successful_import filepath       doc_vs_docx
      <int> <chr>        <chr>      <chr>     <dbl>  <int>  <int> <lgl>       <dbl>    <dbl>    <int> <lgl>             <chr>          <chr>      
1         1 paragraph    NA         Some_text    NA     NA     NA NA             NA       NA       NA TRUE              some_file.docx docx     

I attempted to use the following code to insert rows into the database:

database |> rows_insert(new_rows, conflict = 'ignore')

However, I encountered an error regarding different object classes:

Error in `auto_copy()`:
! `x` and `y` must share the same src.
ℹ `x` is a <tbl_SQLiteConnection/tbl_dbi/tbl_sql/tbl_lazy/tbl> object.
ℹ `y` is a <tbl_df/tbl/data.frame> object.
ℹ Set `copy = TRUE` if `y` can be copied to the same source as `x` (may be slow).

If I intend to add regular tibble rows to a tbl object, do I need to convert it to a tbl object beforehand? Is dbplyr not that useful for this particular use case? I am not much interested in learning proper SQL synthax right now, and always relied on dbplyr.

Related: Is it possible to insert (add) a row to a SQLite db table using dplyr package?

1

There are 1 best solutions below

0
On

I eventually understood DBI::dbWriteTable is so simple and efficient, we should not be looking elsewhere.

Although dbplyr can be a go-to for anyone averse to SQL, it is usually recommended as a data analysis tool, and there is some strong opinion against using the package to manipulate/edit databases see Hadley's criticism in a comment here and here.

I found a lot of comments suggesting we use the DBI package instead. I managed to add the rows by simply using DBI::dbWriteTable(), as in:

dbWriteTable(con, "epi_table", new_row, append = TRUE, row.names = FALSE)

And now I can even keep using dbplyr to query the updated table with collect(database)