Deploy a tidymodels model to GCP using Docker and vetiver


I'm trying to follow along with this Julia Silge MLOps video, where she uses vetiver and tidymodels to deploy to AWS SageMaker. However, after running up hundreds of dollars in bills on AWS :( I've moved to GCP because they offer $300 of free credit.

I'm at the stage of creating the Docker image to push to GCP. However, when I run:

docker run --env-file C:/Users/John/Documents/.Renviron --rm -p 8000:8000 penguins

I get the following error:

[screenshot of the error message]

I'm slightly confused because I've set the .Renviron to include the service account JSON file, as shown below:

[screenshot of the .Renviron file]

Based on the reply from VonC below, I've set the /path/in/container to '/documents/':

[screenshot]

In the screenshot below, I can see that this /path/in/container has been pushed to the image:

[screenshot of the image's environment variables]

I can run gcs_list_buckets(projectId = "my-project-id") and see the buckets I've created, so it looks as though I'm fully connected to my cloud environment.

Having researched this for a number of days, it appears that I have to supply a full path in my environment variables to enable authentication. Am I missing something?


Accepted answer (score 5):

You mentioned that you have set your environment variables in the .Renviron file. However, when you run your Docker container, it cannot locate or properly use the credentials file specified in the GCE_AUTH_FILE environment variable.
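
As a first sanity check (a minimal sketch; it only assumes the GCE_AUTH_FILE variable from your .Renviron), you can verify from an R session inside the container that the file is actually visible:

# Does R inside the container see the credentials file?
auth_path <- Sys.getenv("GCE_AUTH_FILE")
cat("GCE_AUTH_FILE =", auth_path, "\n")
cat("file exists:", file.exists(auth_path), "\n")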

For testing, you could try setting the environment variables explicitly in your Docker image.
Modify your Dockerfile to include them:

# Use the appropriate base image
FROM r-base:latest

# Set environment variables
ENV GCE_AUTH_FILE=/path/to/your-service-account-file.json
ENV GCE_DEFAULT_PROJECT_ID=your-project-id
ENV GCS_DEFAULT_BUCKET=your-bucket-name

# (other Dockerfile commands)

When running your Docker container, you should mount the directory containing your service account file to the Docker container using a volume. Your docker run command might look something like this:

docker run --env-file C:/Users/John/Documents/.Renviron -v C:/path/to/directory/with/credentials:/path/in/container --rm -p 8000:8000 penguins

Here /path/to/directory/with/credentials is the path on your host machine that contains your service account JSON file, and /path/in/container is the path inside the Docker container where you want to mount this directory.

Still for testing: before trying to authenticate in your R script, print the environment variables to make sure they are being set correctly.

print(Sys.getenv("GCE_AUTH_FILE"))
print(Sys.getenv("GCE_DEFAULT_PROJECT_ID"))
print(Sys.getenv("GCS_DEFAULT_BUCKET"))

/path/in/container refers to the path inside your Docker container where you wish to have access to your .json and .Renviron files. This path does not exist until you create it; it is up to you to define it when you run the docker run command with the -v option. The -v option creates a bind mount, which allows you to specify a file or directory on your host system (i.e., your personal computer or wherever you are running the Docker daemon) and a path in the Docker container where that file or directory will be accessible.

docker run --env-file C:/path/to/your/project/directory/.Renviron -v C:/path/to/your/project/directory:/path/in/container --rm -p 8000:8000 penguins
  • C:/path/to/your/project/directory/ is the path on your host system where your .json and .Renviron files are located.
  • /path/in/container is the path inside the Docker container where those files will be accessible. You can name this whatever you like; it is just a path in the Linux file system of the Docker container.

In your R script, or wherever you are using these files inside the Docker container, you would use the /path/in/container to refer to these files. For example, in your .Renviron file inside the Docker container, you might set GCE_AUTH_FILE like so:

GCE_AUTH_FILE=/path/in/container/your-service-account-file.json

This way, the R processes running inside the Docker container will be able to find and use the service account file for authentication.


The OP TheGoat adds in the comments:

I'm actually working in an R project, and the code above was pointing to the wrong .Renviron file: there's actually one in the directory of my R project folder. I figured this out using your suggestion to print the environment variables.

I modified my Dockerfile to include the three ENV parameters, and my docker run statement looks as follows: docker run --env-file C:/MLOps-in-R/.Renviron -v C:/MLOps-in-R:/documents --rm -p 8000:8000 penguins, where the path in the container is /documents.

Using Docker Desktop, I can see that my ENV GCE_AUTH_FILE is prefixed with '/documents'.
The error once I run the docker run command is as follows: No .httr-oauth file exists in current working directory. Do library authentication steps to provide credentials.

The error message "No .httr-oauth file exists in current working directory. Do library authentication steps to provide credentials", is from the googleAuthR package and indicates that httr package authentication has not been properly set up within your R environment in the Docker container.

To resolve this, you need to use the gar_auth_service() function from the googleAuthR package to authenticate using the service account JSON file, and specify the path to this file using the GCE_AUTH_FILE environment variable.

In your Dockerfile, ensure that you have installed the necessary R packages. You will need both the googleAuthR and httr packages. Here is how you can install them in a Dockerfile:

# other Dockerfile commands

RUN R -e "install.packages(c('googleAuthR', 'httr'), dependencies=TRUE)"

In your R script that is being used with plumber (likely plumber.R given your error message), you should set up authentication using googleAuthR::gar_auth_service() before making any GCP API calls.
For instance:

library(googleAuthR)

# Authenticate using the service account file specified in the GCE_AUTH_FILE environment variable
gar_auth_service(Sys.getenv("GCE_AUTH_FILE"))

Include the above lines at the beginning of your R script to authenticate using the service account file before making any API calls.
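
For illustration, the top of that plumber file might then look something like this (a sketch only; the /ping endpoint is a placeholder and not part of your actual vetiver-generated code):

# plumber.R -- a minimal sketch; your vetiver-generated file will look different
library(googleAuthR)

# Authenticate with the mounted service account key before any GCP calls
gar_auth_service(Sys.getenv("GCE_AUTH_FILE"))

#* Health check endpoint (placeholder; vetiver adds its own endpoints)
#* @get /ping
function() {
  list(status = "ok")
}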

Before deploying your application, test the authentication locally to ensure it is working correctly. Run your R script in a local R session and check that you are able to authenticate without any errors.

Make sure that GCE_AUTH_FILE in your .Renviron file points to the correct path in the Docker container, like so:

GCE_AUTH_FILE=/documents/your-service-account-file.json

TheGoat replies in the comments:

I'm still having issues authenticating with my account. I feel as though I've taken a few steps backwards: I'm now getting a 403 insufficient permission error when I try gcs_list_buckets, even though I have the .Renviron file with the correct JSON file for my service account.

A "403 Insufficient Permission" error usually indicates that the service account you are using does not have the necessary permissions to perform the action you are trying to execute. It is not just about setting the GCE_AUTH_FILE variable correctly; the service account associated with that file must also have sufficient permissions to interact with the Google Cloud Storage (GCS).

First, verify your service account permissions:

  1. Go to the GCP Console and navigate to "IAM & Admin" > "Service accounts".
  2. Locate the service account associated with your project and check the permissions it has. It should have roles that grant permission to interact with GCS. If not, you will need to edit the roles to include the necessary permissions, such as "Storage Admin" or "Storage Object Admin".

Make sure that the service account JSON key file (GCE_AUTH_FILE) you are using corresponds to the service account you verified above. If you have multiple service accounts, it can be easy to mix them up.
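
One way to check which service account a key file actually belongs to (a sketch, assuming the jsonlite package is installed) is to read its client_email field:

# Print the service account email embedded in the JSON key file
library(jsonlite)

key <- fromJSON(Sys.getenv("GCE_AUTH_FILE"))
print(key$client_email)  # compare with the account listed under IAM & Admin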

Before dealing with Docker, ensure that your local R session can successfully call gcs_list_buckets() with the current .Renviron settings. That can help you isolate the problem.

# Load googleCloudStorageR library
library(googleCloudStorageR)

# Test list buckets
gcs_list_buckets("your-project-id")

And double-check .Renviron:

# Print the current value to verify
print(Sys.getenv("GCE_AUTH_FILE"))

If it works locally but fails in Docker, consider adding debugging statements in your R code inside the Docker container. Log the environment variable values to ensure they are being picked up correctly.

Also, reattempt authentication: run googleAuthR::gar_auth_service(Sys.getenv("GCE_AUTH_FILE")) to authenticate manually. If it fails, it should provide a more detailed error message that could be useful for debugging.
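
For example (a sketch combining both suggestions):

# Log the environment variables, then attempt authentication and catch errors
message("GCE_AUTH_FILE: ", Sys.getenv("GCE_AUTH_FILE"))
message("File exists: ", file.exists(Sys.getenv("GCE_AUTH_FILE")))

tryCatch(
  googleAuthR::gar_auth_service(Sys.getenv("GCE_AUTH_FILE")),
  error = function(e) message("Authentication failed: ", conditionMessage(e))
)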