I'm using the following code however it is creating multiple connections when calling the map function and they are not closing. As a result my rds database is getting flooded with connections. Is there any way to change this code to prevent so many connections?
connect.to.database <- function (dbname, schema = "public", host, port, user, pass) {
con <- dbConnect(RPostgres::Postgres(),
dbname = dbname,
user = user,
password = pass,
host = host,
port = port)
# this puts the schema in the search path, which means that instead of
# having to use <schema name>.<table name> you can just write <table name>
res <- dbSendQuery(con, paste0("SET search_path TO ",
dbQuoteIdentifier(con, schema),
", public"))
# check for errors
dbFetch(res)
dbClearResult(res)
con
}
schemas <- dbGetQuery(connect.to.database(dbname, "public", host, port, user, password), paste0("SELECT schema_name FROM information_schema.schemata"))
schema_names <- schemas %>% pull()
schemas_tables <- map(.x = schema_names,~dbGetQuery(connect.to.database(dbname, "public", host, port, user, password), paste0("SELECT table_name FROM information_schema.tables WHERE table_schema = ","'",.x,"'")) %>% mutate(schema_name = .x)) %>%
bind_rows()
Create a single global connection object and use it within
map
. (I remove the unnecessarypaste0
from your first query.)You may want to consider parameterized queries vice constructing query strings manually. While there are security concerns about malicious SQL injection (e.g., XKCD's Exploits of a Mom aka "Little Bobby Tables"), it is also a concern for malformed strings or Unicode-vs-ANSI mistakes, even if it's a single data analyst running the query. Both
DBI
(withodbc
) andRODBC
support parameterized queries, either natively or via add-ons.That would change this to:
But frankly, I think it might be much easier to use
IN
instead of=
. Again, using parameter-binding.(No
map
required.)Or I believe you can do it in one query, not two.
Remember
... to close the connection when you are done.