MLFlow: Consider running at a lower rate. How do I do so?


I am currently trying to set up a self-managed instance of MLflow to evaluate Azure OpenAI. I put together the following code based on the demos and starter code I found in the documentation:

import mlflow
import openai
import pandas as pd

system_prompt = (
    "The following is a conversation with an AI assistant. "
    "The assistant is helpful and very friendly."
)

example_questions = pd.DataFrame(
    {
        "question": [
            "How do you create a run with MLflow?",
            "How do you log a model with MLflow?",
            "What is the capital of France?",
        ]
    }
)

# Start an MLflow run
with mlflow.start_run() as run:
    mlflow.autolog()
    mlflow.log_param("system_prompt", system_prompt)

    # Create a question answering model using prompt engineering
    # with OpenAI. Log the model to MLflow Tracking
    logged_model = mlflow.openai.log_model(
        model="gpt-3.5-turbo",
        task=openai.ChatCompletion,
        artifact_path="model",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "{question}"},
        ],
    )

    mlflow.evaluate(
        model=logged_model.model_uri,
        model_type="question-answering",
        data=example_questions,
    )

Whenever I run this, I get the following exception: MlflowException: 3 tasks failed. See logs for details. The logs also say, "Consider running at a lower rate."

I am confused about how to lower the rate, as I can't find any documentation for it, and I don't know whether there is something else entirely that I am missing.


There is 1 answer below

Nicolas R

If you have a look at the mlflow GitHub project for this error, you will find it here: https://github.com/mlflow/mlflow/blob/8a723062c79d1f6382cf2c1139487df903d14c67/mlflow/openai/api_request_parallel_processor.py#L353

If you check how status_tracker.num_rate_limit_errors is set in that same file:

=> It's incremented whenever "rate limit" is part of the error message, and, as stated in the comment at the beginning of that block, those errors are what produce the "Consider running at a lower rate" message in the logs.
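To make that concrete, here is a simplified sketch of the pattern described above (this is illustrative only, not the actual MLflow source): failed tasks are counted, rate-limit errors are detected via the substring "rate limit", and the "Consider running at a lower rate" hint is emitted when any were seen.

class StatusTracker:
    """Minimal stand-in for the tracker used by the parallel request processor."""
    def __init__(self):
        self.num_tasks_failed = 0
        self.num_rate_limit_errors = 0

def record_error(error_message, tracker):
    # Every failed task is counted; if the message mentions "rate limit",
    # the dedicated rate-limit counter is bumped as well.
    tracker.num_tasks_failed += 1
    if "rate limit" in error_message.lower():
        tracker.num_rate_limit_errors += 1

tracker = StatusTracker()
for message in [
    "Rate limit reached for gpt-35-turbo",
    "Rate limit reached for gpt-35-turbo",
    "Rate limit reached for gpt-35-turbo",
]:
    record_error(message, tracker)

if tracker.num_rate_limit_errors > 0:
    # This is where the hint from the question shows up in the logs.
    print(
        f"{tracker.num_tasks_failed} tasks failed. "
        "Consider running at a lower rate."
    )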

So basically, you will get this kind of error when you make too many requests at the same time to your Azure OpenAI model (the API responds with a 429 status code). It can be due to:

  • too many tokens sent per minute (accumulated across several requests): you reached your "TPM" quota (tokens-per-minute)
  • but also too many requests, even if each one contains only a few tokens: you reached your "RPM" quota (requests-per-minute). The sketch after this list shows how hitting either quota surfaces on the client side.
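For context, this is roughly what hitting either quota looks like on the client side with the pre-1.0 openai SDK that the question's code uses (the Azure endpoint, API key, API version, and deployment name below are placeholders, and error handling is kept minimal):

import time
import openai

# Placeholder Azure OpenAI configuration (assumes the pre-1.0 openai SDK,
# matching the openai.ChatCompletion usage in the question).
openai.api_type = "azure"
openai.api_base = "https://<your-resource>.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "<your-api-key>"

try:
    openai.ChatCompletion.create(
        engine="<your-gpt-35-turbo-deployment>",  # Azure uses the deployment name
        messages=[{"role": "user", "content": "ping"}],
    )
except openai.error.RateLimitError as err:
    # Exceeding either the TPM or the RPM quota returns HTTP 429, which the SDK
    # raises as RateLimitError; its message contains "rate limit", which is
    # exactly what MLflow's parallel processor counts.
    print("429 from Azure OpenAI:", err)
    time.sleep(10)  # back off before retrying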


To fix that:

  • increase the quotas of your deployed model in Azure - see how-to here
  • and/or check if you can add some delays in mlflow processing, for example by spreading the evaluation requests out over time, as in the sketch below
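MLflow does not expose an obvious "rate" knob in the snippet from the question, so one possible workaround (my own suggestion under that assumption, not a documented MLflow option) is to evaluate the data in smaller chunks and pause between them, which spreads the requests out under the RPM/TPM quotas:

import time
import mlflow

def evaluate_in_chunks(model_uri, data, chunk_size=2, delay_seconds=30):
    """Run mlflow.evaluate over small slices of the data, sleeping between
    slices so the underlying OpenAI requests are spread out over time."""
    results = []
    for start in range(0, len(data), chunk_size):
        chunk = data.iloc[start : start + chunk_size]
        results.append(
            mlflow.evaluate(
                model=model_uri,
                model_type="question-answering",
                data=chunk,
            )
        )
        if start + chunk_size < len(data):
            time.sleep(delay_seconds)  # simple client-side throttling
    return results

# Usage inside the existing run, replacing the single mlflow.evaluate call:
# evaluate_in_chunks(logged_model.model_uri, example_questions)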