Protects dplyr against SQL injections?

92 Views Asked by At

I would like to use R-Shiny to develop a web app that accesses an SQL database with sensitive data. So far I have written the database query as a pure SQL string. However, this approach is very susceptible to SQL injections. My intention is therefore to write the database query with dplyr, as the query is written in R and not in pure SQL. Unfortunately, I have not yet been able to find any more detailed information or an estimate on this topic online. Therefore I would like to clarify my request in this question.
So far I have only read a statement about the dbplyr::translate_sql() function that is supposed to protect SQL injections: https://dbplyr.tidyverse.org/articles/sql-translation.html#vectors
As my data is very confidential, dbplyr / dplyr should provide 100% protection against SQL injections.
I would be very pleased to receive further information and assessments on this topic.

UPDATE
As suggested in the comments, I have added an example for reproducibility: (adapted from: https://shiny.posit.co/r/articles/build/pool-dplyr/)

library(shiny)
library(DBI)
library(pool)
library(tidyverse)
library(dbplyr)

pool <- dbPool(
  drv = RMySQL::MySQL(),
  dbname = "shinydemo",
  host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
  username = "guest",
  password = "guest")

ui <- fluidPage(
  textInput("ID", "Enter your ID:", "5"),
  tableOutput("tbl"),
  numericInput("nrows", "How many cities to show?", 10),
  plotOutput("popPlot"))

server <- function(input, output, session) {
  output$tbl <- renderTable({
    pool %>% tbl("City") %>%
      filter(ID == !!input$ID)
  })
  output$popPlot <- renderPlot({
    df <- pool %>% tbl("City") %>%
      head(as.integer(input$nrows)[1]) %>% collect()
    pop <- df$Population
    names(pop) <- df$Name
    barplot(pop)
  })
}

shinyApp(ui, server)
1

There are 1 best solutions below

1
On BEST ANSWER

The threat of SQL injection depends on how the user-provided text is incorporated into the dynamic SQL query. So if you are trying to protect against injection, you need to assess each of the ways user input is used when querying the database.

In the example you have posted there are two pieces of user input:

  1. input$nrows which is limited to numeric by the input widget
  2. input$ID which is free text, and hence has greater risk of SQL injection.

You can use any of the approaches below to investigate each piece of user input. Overall, the approach I would choose is to Sanitise and then Review.

This answer focuses on the case where you are using dbplyr to translate from R to SQL. If other approaches are used to access the database (such as the DBI package) then a different approach will be required.


Option 1) Review the SQL query that is generated

Consider where input$ID is used in the example code:

pool %>%
  tbl("City") %>%
  filter(ID == !!input$ID)

In order to execute this, dbplyr will translate the dplyr commands from R into SQL. We can review this translation using the show_query command.

pool %>%
  tbl("City") %>%
  filter(ID == !!input$ID) %>%
  show_query()

This will probably produce an SQL query similar to the following:

SELECT *
FROM city
WHERE ID == 'my_id_value'

You do not need the entire Shiny app to test this for SQL injection. You can simply vary this query and see what happens. For example, try a simple alternative query:

attempt_inject = 'my_id_value;SELECT 123'
pool %>%
  tbl("City") %>%
  filter(ID == !!attempt_inject) %>%
  show_query()

I suspect this will produce SQL similar to the following:

SELECT *
FROM city
WHERE ID == 'my_id_value;SELECT 123'

Which would seem an effective prevention for SQL injection. But you will need to test this as thoroughly as required to ensure you are confident.


Option 2) Review source code

The dbplyr package is open source and available online. You could review the source code to confirm how inputs are handled during translation.

My suspicion is that unless inputs are of type sql, then they will be escaped as either text or an SQL object.

There is an sql() function that converts input to type sql. One of its uses is to tell dbplyr not to translate the contents of the function but to use them as-is. This means that if your user can submit R objects as inputs then this is a significant area of vulnerability. However, this is unlikely to be the case as your app restricts users to submitting text and numeric input.


Option 3) Sanitise your inputs before passing them to dbplyr

R has several advantages over pure SQL, in that you can use R to validate users' input before including it in a dbplyr query.

Consider this an an alternative to your output$tbl component of the app:

output$tbl <- renderTable({
  current_ID = as.character(input$ID)
  acceptable_ids = pool %>%
    tbl("City") %>%
    select(ID) %>%
    distinct() %>%
    collect() %>%
    pull()
  req(current_ID %in% acceptable_ids)

  pool %>%
    tbl("City") %>%
    filter(ID == !!current_ID)
})

This approach enforces that the user input is converted to type character and is a value found in the ID column, before passing the user input through to dbplyr.

The exact choice of checks will depend on how user input is used. I often include a check that user input does not contain special characters (such as ;{}[]* or spaces).

dbplyr includes commands to delimit certain inputs. Investigate and use these commands where applicable.

Notes

  • req is a Shiny command that stops execution of a component if the condition is not met. You can use stopifnot outside a Shiny context.
  • Fetching the list of acceptable ids within the renderTable statement is not efficient app design. This should be calculated once and reused multiple times.