RMariaDB on Databricks


I'm trying to get R (either from a notebook or from RStudio) to connect to MariaDB on Azure Databricks (runtime 10.1). However, whether I add RMariaDB in the cluster's Libraries tab or run install.packages("RMariaDB") in RStudio, the installation fails with:

-----------------------------[ ANTICONF ]-----------------------------
Configure could not find suitable mysql/mariadb client library. Try installing:
* deb: libmariadb-dev (Debian, Ubuntu)
* rpm: mariadb-connector-c-devel | mariadb-devel | mysql-devel (Fedora, CentOS, RHEL)
* csw: mysql56_dev (Solaris)
* brew: mariadb-connector-c (OSX)
If you already have a mysql client library installed, verify that either
mariadb_config or mysql_config is on your PATH. If these are unavailable
you can also set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------[ ERROR MESSAGE ]----------------------------
<stdin>:1:10: fatal error: mysql.h: No such file or directory
compilation terminated.
-----------------------------------------------------------------------

I have installed Python packages, R packages and Java JAR files on Databricks before, but never C libraries. I found the Ubuntu library and downloaded it to my laptop, but the 'Upload library' function in Databricks seems to accept only JARs.

Does anyone know how to get R to talk to MariaDB on Databricks? Alternatively, is it possible to run the query in a Python cell of a notebook (I have this working) and access the data in an R cell?

Thanks



Answer by Alex Ott

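The easiest way to do this on Spark/Databricks is to use Spark's JDBC data source (spark.read.jdbc in Python/Scala, read.jdbc in SparkR; see the docs). You just need to provide the JDBC URL, user name and password. Because the read goes through a JDBC driver in Spark's JVM, you don't need RMariaDB or the MariaDB C client library at all. In R: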

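library(SparkR)
sparkR.session()  # gets the existing Spark session (or creates one)
jdbcUrl <- "jdbc:mysql://<host>:3306/databasename"
# extra named arguments (user, password) are passed on as JDBC connection properties
df <- read.jdbc(jdbcUrl, "table", user = "username", password = "password")
head(df)  # fetch the first rows into R to check the connection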
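For the second part of the question (running the query in a Python cell and using the result in an R cell), one common approach on Databricks is to register the result as a temporary view, since the Python and R cells of one notebook share the same Spark session. A minimal sketch, assuming the Python cell already holds a Spark DataFrame named `df`; the view name `mariadb_result` is hypothetical. If the Python side holds a pandas DataFrame instead, it would first need converting with `spark.createDataFrame(pdf)`.

```r
# Python cell, run first (shown here as a comment); assumes df is a Spark DataFrame:
#   %python
#   df.createOrReplaceTempView("mariadb_result")

library(SparkR)
sparkR.session()  # same Spark session the Python cell used

# Read the temporary view registered from Python as a SparkDataFrame
rdf <- sql("SELECT * FROM mariadb_result")

# Pull the rows into a local R data.frame; only sensible for small results
local_df <- collect(rdf)
str(local_df)
```

Keeping the data in a SparkDataFrame (`rdf`) and calling `collect()` only when needed avoids copying a large table into the R driver process.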