Trouble Deploying Flask-Based Machine Learning Model to Vertex AI Endpoint with Custom Container


I'm working on deploying a Flask application that serves a machine learning model using PyTorch, packaged as a Docker container, to a Vertex AI endpoint for online predictions. Despite the Flask application starting successfully within the container (as indicated by my logs), my attempts to deploy the model to the Vertex AI endpoint consistently fail.

Here are the details of my setup:

Model Name: EnsembleFlaskClassifierV2
Model ID: 8528383931077099520
Region: asia-southeast1
Container Image: asia-southeast1-docker.pkg.dev/fyp-jx-416511/ensembleflask/ensembleflask-app:latest
Machine Type for Deployment: n1-standard-4

My Flask application (ensemble_deploy_1.py) initializes and loads PyTorch models successfully and is designed to predict with an ensemble method upon receiving a POST request. The Flask app is set to run on port 5001, and this is correctly exposed and specified in the Dockerfile.
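
One thing worth checking against this setup: Vertex AI tells a custom container which port and routes to serve through the AIP_HTTP_PORT, AIP_HEALTH_ROUTE, and AIP_PREDICT_ROUTE environment variables, and the port is 8080 unless a different container port was declared when the model was uploaded. The sketch below shows one way the app could read those values instead of hard-coding 5001. The names ServingConfig and resolve_serving_config, and the local fallback routes /health and /predict, are assumptions made for illustration only.

```python
import os
from typing import Mapping, NamedTuple


class ServingConfig(NamedTuple):
    """Port and routes the server should listen on (hypothetical helper type)."""
    port: int
    health_route: str
    predict_route: str


def resolve_serving_config(env: Mapping[str, str] = os.environ) -> ServingConfig:
    """Read the values Vertex AI injects, falling back to local defaults.

    The fallbacks (8080, /health, /predict) are assumptions for local runs;
    on Vertex AI the AIP_* variables are expected to be set by the platform.
    """
    return ServingConfig(
        port=int(env.get("AIP_HTTP_PORT", "8080")),
        health_route=env.get("AIP_HEALTH_ROUTE", "/health"),
        predict_route=env.get("AIP_PREDICT_ROUTE", "/predict"),
    )
```

With something like this, the Flask app could register a GET handler that returns 200 on config.health_route and the POST handler on config.predict_route (for example through app.add_url_rule), and the server would be started on config.port. The alternative is to keep port 5001 and the /predict route and declare them, together with a health route, in the container settings when uploading the model.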

Dockerfile Configuration:

# Use an official PyTorch image as the base
FROM pytorch/pytorch:1.7.1-cuda11.0-cudnn8-runtime

# Set the working directory in the container
WORKDIR /usr/src/app

# Install dependencies
RUN pip install numpy Pillow flask torch torchvision

# Copy the Flask script, model files, and entrypoint script into the container
COPY ensemble_deploy_1.py .
COPY Models/DenseNet_Optimal.pt .
COPY Models/ResNext_Optimal.pt .
COPY Models/MobileNetV2_Optimal.pt .
COPY entrypoint.sh .

# Make the entrypoint script executable
RUN chmod +x entrypoint.sh

# Set environment variable to specify the Flask application
ENV FLASK_APP=ensemble_deploy_1.py

# EXPOSE command is commented out because it's not necessary for Vertex AI
# EXPOSE 5001

# Use entrypoint.sh to start the service
ENTRYPOINT ["./entrypoint.sh"]

entrypoint.sh:

#!/bin/sh
# This script sets the FLASK_APP environment variable and starts the Flask server
export FLASK_APP=ensemble_deploy_1.py
flask run --host=0.0.0.0 --port=5001

ensemble_deploy_1.py:

from flask import Flask, request, jsonify
import torch
from torchvision import models, transforms
from PIL import Image
import io
import torch.nn.functional as F
import logging
from logging.handlers import RotatingFileHandler

app = Flask(__name__)

# Define global device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Define the preprocess function
def preprocess_image(image_bytes):
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.Grayscale(num_output_channels=3),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    image = preprocess(image).unsqueeze(0)
    return image.to(device)

# Model initialization functions
def initialize_model(model_name):
    if model_name == "densenet":
        model = models.densenet121(pretrained=False)
        num_ftrs = model.classifier.in_features
        model.classifier = torch.nn.Linear(num_ftrs, 7)
    elif model_name == "resnext":
        model = models.resnext50_32x4d(pretrained=False)
        num_ftrs = model.fc.in_features
        model.fc = torch.nn.Linear(num_ftrs, 7)
    elif model_name == "mobilenetv2":
        model = models.mobilenet_v2(pretrained=False)
        num_ftrs = model.classifier[1].in_features
        model.classifier = torch.nn.Sequential(
            torch.nn.Dropout(0.2),
            torch.nn.Linear(num_ftrs, 7)
        )
    else:
        raise ValueError("Invalid model name")
    return model.to(device).eval()

# Initialize and load models outside the request handler
model_paths = {
    'densenet': 'DenseNet_Optimal.pt',
    'resnext': 'ResNext_Optimal.pt',
    'mobilenetv2': 'MobileNetV2_Optimal.pt'
}

models = {name: initialize_model(name) for name in model_paths}
for name, model in models.items():
    model.load_state_dict(torch.load(model_paths[name], map_location=device))

# Define your F1 scores
all_model_f1_scores = {
    'Angry': {'DenseNet': 0.62, 'ResNext': 0.63, 'MobileNetV2': 0.60},
    'Disgust': {'DenseNet': 0.57, 'ResNext': 0.73, 'MobileNetV2': 0.63},
    'Fear': {'DenseNet': 0.51, 'ResNext': 0.54, 'MobileNetV2': 0.51},
    'Happy': {'DenseNet': 0.89, 'ResNext': 0.89, 'MobileNetV2': 0.88},
    'Neutral': {'DenseNet': 0.67, 'ResNext': 0.66, 'MobileNetV2': 0.66},
    'Sad': {'DenseNet': 0.57, 'ResNext': 0.58, 'MobileNetV2': 0.58},
    'Surprise': {'DenseNet': 0.81, 'ResNext': 0.81, 'MobileNetV2': 0.79},
}

class_names = ['Angry', 'Disgust', 'Fear', 'Happy', 'Neutral', 'Sad', 'Surprise']

def predict_with_ensemble(image_tensor, models, f1_scores, class_names):
    # Ensure the image tensor is on the correct device
    image_tensor = image_tensor.to(device)
    weighted_preds = torch.zeros(1, len(class_names), device=device)
    
    for model_name, model in models.items():
        with torch.no_grad():
            outputs = model(image_tensor)
            probs = F.softmax(outputs, dim=1)
            for i, class_name in enumerate(class_names):
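                # Multiply by the F1 score if available, else default to 1 (no weighting).
                # Keys are matched case-insensitively: the models dict uses lowercase
                # names ('densenet') while the F1 table uses 'DenseNet', so a direct
                # lookup would always fall back to the default weight of 1.
                class_f1 = {k.lower(): v for k, v in f1_scores.get(class_name, {}).items()}
                f1_weight = class_f1.get(model_name.lower(), 1)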
                weighted_preds[:, i] += probs[:, i] * f1_weight

    final_pred = torch.argmax(weighted_preds, dim=1)
    predicted_class = class_names[final_pred.item()]
    return predicted_class

# Setup logging
handler = RotatingFileHandler('app.log', maxBytes=10000, backupCount=3)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.addHandler(handler)

@app.route('/predict', methods=['POST'])
def predict():
    logger.info("Received prediction request")
    if request.method == 'POST':
        # Validate that the request carries an uploaded file; reject with 400 otherwise
        if 'file' not in request.files:
            return jsonify({'error': 'No file part'}), 400
        file = request.files['file']
        if file.filename == '':
            return jsonify({'error': 'No selected file'}), 400
        if file:
            image_bytes = file.read()
            image_tensor = preprocess_image(image_bytes)
            predicted_class = predict_with_ensemble(image_tensor, models, all_model_f1_scores, class_names)
            return jsonify({'predicted_class': predicted_class})

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=5001)
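
A note on the request format: the handler above reads a multipart upload from request.files, whereas requests sent through a Vertex AI endpoint's predict method arrive as a JSON body with a top-level "instances" list, and the response is expected to carry a top-level "predictions" list. The sketch below shows how such a body could be decoded into image bytes and how the result could be wrapped. The function names and the "image_b64" field are hypothetical choices for this example, not part of the original app.

```python
import base64
import json
from typing import Any, Dict, List


def decode_instances(body: bytes, image_key: str = "image_b64") -> List[bytes]:
    """Extract raw image bytes from a {"instances": [...]} request body.

    `image_key` is a hypothetical field name chosen for this sketch; each
    instance is assumed to hold one base64-encoded image under that key.
    """
    payload = json.loads(body)
    instances = payload.get("instances") if isinstance(payload, dict) else None
    if not isinstance(instances, list) or not instances:
        raise ValueError('request body must contain a non-empty "instances" list')
    return [base64.b64decode(instance[image_key]) for instance in instances]


def build_response(predicted_classes: List[str]) -> Dict[str, Any]:
    """Wrap one predicted class per instance in the {"predictions": [...]} shape."""
    return {"predictions": [{"predicted_class": name} for name in predicted_classes]}
```

In the Flask handler, decode_instances(request.get_data()) would replace the request.files lookup, each decoded image would go through preprocess_image and predict_with_ensemble, and jsonify(build_response(...)) would be returned.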

Logs:

INFO * Serving Flask app 'ensemble_deploy_1.py'
INFO * Debug mode: off
ERROR * CUDA initialization: Found no NVIDIA driver on your system.
WARNING: This is a development server. Do not use it in a production deployment.

I tried deploying to regions other than asia-southeast1 and tried other machine types for deployment. I also tested the container locally, and it works.

I've verified that the FLASK_APP environment variable is set, and running the container locally works as expected. However, deploying to Vertex AI fails.
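
Since "works locally" can mean different things, a local check that mimics what the platform does may narrow this down: Vertex AI first probes a health route with GET and only then sends JSON POST requests to the predict route. The sketch below performs those two calls against a running container using only the standard library. The function name check_container and its parameters are hypothetical, and the base URL and routes have to match however the container was actually started.

```python
import json
import urllib.request
from typing import Tuple


def check_container(base_url: str, health_route: str, predict_route: str,
                    request_body: dict, timeout: float = 10.0) -> Tuple[int, dict]:
    """GET the health route, then POST a JSON body to the predict route.

    Returns the health status code and the decoded prediction response.
    urllib raises HTTPError on 4xx/5xx answers, which is the failure signal.
    """
    with urllib.request.urlopen(base_url + health_route, timeout=timeout) as resp:
        health_status = resp.status

    request = urllib.request.Request(
        base_url + predict_route,
        data=json.dumps(request_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return health_status, json.loads(resp.read())
```

For example, if the container port is published on localhost, check_container("http://localhost:5001", "/health", "/predict", {"instances": [...]}) would fail right away against the app as posted, because it defines no health route and expects a file upload rather than JSON.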

Is additional configuration needed when deploying a Flask-based container to a Vertex AI endpoint? I'm rather lost at this point.
