R sqldf SELECT when col may be missing

Question

R sqldf SELECT when col may be missing

96 Views Asked by linguist_at_large At 17 August 2025 at 17:21

I have a number of .csv files. Certain columns appear in all files, but some others only appear in some files. There are also many other columns I don't want.

I'm using R sqldf to iterate through the .csv files, select the desired columns, and write it to a new .csv file. However, I don't know how to tell sqldf that some columns may not be found in some of the .csv files, in which case I want it to create it anyway (even though it will be blank).

In the example below, the column userlocation1 only appears in some of the .csv files. So the code doesn't work if a file doesn't have it.

I've tried try-catch but that just caused that some files weren't processed at all.

for (fileNumber in fileNumbers){
  newfileName <- paste(sub(paste("\\.", sep = ""), "", fileNames[fileNumber]),
                       "-clean", ".csv", sep = "")
  tx <- paste(readLines(fileNames[fileNumber], encoding="UTF-8"), collapse="%%%")  
  tx_as_table <- fread(text = tx, header = TRUE, sep = ",", nrows = 501, 
                 stringsAsFactors = FALSE, fill = TRUE, select = 1:50)  
  dfmediaurl <- sqldf('select created_at, text, id_str, in_reply_to_status_id, 
                username1, userscreen_name1, userlocation1 from tx_as_table')
  write.table(x = dfmediaurl, file = newfileName, append = FALSE, quote = TRUE, sep 
              = ";", row.names = FALSE, col.names = TRUE)
  
}

Original Q&A

There are 1 best solutions below

**G. Grothendieck** · Answer 1

To refer to all columns use * as shown in this example with the builtin anscombe data frame:

sqldf("select * from anscombe limit 3")
##   x1 x2 x3 x4   y1   y2    y3   y4
## 1 10 10 10  8 8.04 9.14  7.46 6.58
## 2  8  8  8  8 6.95 8.14  6.77 5.76
## 3 13 13 13  8 7.58 8.74 12.74 7.71

or to show only certain columns such that the table may not have some of them then intersect the column names with those actually in the table creating a character vector of column names to use, convert that to a comma separated string using toString and insert that into the SQL statement. Note that fn$ must prefix sqldf to enable string interpolation. verbose=TRUE will also display the SQL statement that is run (i.e. it will also show the output lines that begin with sqldf: below -- note that the second last sqldf: line in the output displays the generated SQL statement):

# use those columns among x3, x4 and x5 that are in anscombe
cols <- toString(intersect(c("x3", "x4", "x5"), names(anscombe)))

fn$sqldf("select $cols from anscombe limit 3", verbose = TRUE)
## sqldf: library(RSQLite)
## sqldf: m <- dbDriver("SQLite")
## sqldf: connection <- dbConnect(m, dbname = ":memory:")
## sqldf: initExtension(connection)
## sqldf: dbWriteTable(connection, 'anscombe', anscombe, row.names = FALSE)
## sqldf: dbGetQuery(connection, 'select x3, x4 from anscombe limit 3')
## sqldf: dbDisconnect(connection)
##   x3 x4
## 1 10  8
## 2  8  8
## 3 13  8

R sqldf SELECT when col may be missing

There are 1 best solutions below

Related Questions in R

Related Questions in SQLDF

Trending Questions

Popular # Hahtags

Popular Questions