MLFlow: Consider running at a lower rate. How do I do so?


I am currently trying to set up a self-managed instance of MLflow to evaluate Azure OpenAI. I put together the following code based on the demos and starter code I found in the documentation:

import mlflow
import openai
import pandas as pd

system_prompt = (
    "The following is a conversation with an AI assistant. "
    "The assistant is helpful and very friendly."
)

example_questions = pd.DataFrame(
    {
        "question": [
            "How do you create a run with MLflow?",
            "How do you log a model with MLflow?",
            "What is the capital of France?",
        ]
    }
)

# Start an MLflow run
with mlflow.start_run() as run:
    mlflow.autolog()
    mlflow.log_param("system_prompt", system_prompt)

    # Create a question answering model using prompt engineering
    # with OpenAI. Log the model to MLflow Tracking
    logged_model = mlflow.openai.log_model(
        model="gpt-3.5-turbo",
        task=openai.ChatCompletion,
        artifact_path="model",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "{question}"},
        ],
    )

    mlflow.evaluate(
        model=logged_model.model_uri,
        model_type="question-answering",
        data=example_questions,
    )

Whenever I run this, I get the following exception: MlflowException: 3 tasks failed. See logs for details. The logs also say, "Consider running at a lower rate."

I am confused about how to lower the rate, as I can't find any documentation for it, and I don't know whether there is something else entirely that I am missing.


There is 1 answer below

Nicolas R

If you have a look at the mlflow GitHub project for this error, you will find it here: https://github.com/mlflow/mlflow/blob/8a723062c79d1f6382cf2c1139487df903d14c67/mlflow/openai/api_request_parallel_processor.py#L353

If you check how status_tracker.num_rate_limit_errors is set in that same file:

=> It's incremented whenever "rate limit" is part of the error message, and, as stated in the comment at the beginning of that block, those errors are what produce the "Consider running at a lower rate" message in the logs.
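To make that concrete, here is a simplified sketch of the pattern described above (this is illustrative only, not the actual MLflow source): failed tasks are counted, rate-limit errors are detected via the substring "rate limit", and the "Consider running at a lower rate" hint is emitted when any were seen.

class StatusTracker:
    """Minimal stand-in for the tracker used by the parallel request processor."""
    def __init__(self):
        self.num_tasks_failed = 0
        self.num_rate_limit_errors = 0

def record_error(error_message, tracker):
    # Every failed task is counted; if the message mentions "rate limit",
    # the dedicated rate-limit counter is bumped as well.
    tracker.num_tasks_failed += 1
    if "rate limit" in error_message.lower():
        tracker.num_rate_limit_errors += 1

tracker = StatusTracker()
for message in [
    "Rate limit reached for gpt-35-turbo",
    "Rate limit reached for gpt-35-turbo",
    "Rate limit reached for gpt-35-turbo",
]:
    record_error(message, tracker)

if tracker.num_rate_limit_errors > 0:
    # This is where the hint from the question shows up in the logs.
    print(
        f"{tracker.num_tasks_failed} tasks failed. "
        "Consider running at a lower rate."
    )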

So basically, you will get this kind of error when you make too many requests at the same time to your Azure OpenAI model (the API responds with a 429 status code). It can be due to:

  • too many tokens sent per minute (accumulated across several requests): you reached your "TPM" quota (tokens-per-minute)
  • but also too many requests, even if each one contains only a few tokens: you reached your "RPM" quota (requests-per-minute). The sketch after this list shows how hitting either quota surfaces on the client side.
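For context, this is roughly what hitting either quota looks like on the client side with the pre-1.0 openai SDK that the question's code uses (the Azure endpoint, API key, API version, and deployment name below are placeholders, and error handling is kept minimal):

import time
import openai

# Placeholder Azure OpenAI configuration (assumes the pre-1.0 openai SDK,
# matching the openai.ChatCompletion usage in the question).
openai.api_type = "azure"
openai.api_base = "https://<your-resource>.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "<your-api-key>"

try:
    openai.ChatCompletion.create(
        engine="<your-gpt-35-turbo-deployment>",  # Azure uses the deployment name
        messages=[{"role": "user", "content": "ping"}],
    )
except openai.error.RateLimitError as err:
    # Exceeding either the TPM or the RPM quota returns HTTP 429, which the SDK
    # raises as RateLimitError; its message contains "rate limit", which is
    # exactly what MLflow's parallel processor counts.
    print("429 from Azure OpenAI:", err)
    time.sleep(10)  # back off before retrying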


To fix that:

  • increase the quotas of your deployed model in Azure - see how-to here
  • and/or check if you can add some delays in mlflow processing, for example by spreading the evaluation requests out over time, as in the sketch below
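MLflow does not expose an obvious "rate" knob in the snippet from the question, so one possible workaround (my own suggestion under that assumption, not a documented MLflow option) is to evaluate the data in smaller chunks and pause between them, which spreads the requests out under the RPM/TPM quotas:

import time
import mlflow

def evaluate_in_chunks(model_uri, data, chunk_size=2, delay_seconds=30):
    """Run mlflow.evaluate over small slices of the data, sleeping between
    slices so the underlying OpenAI requests are spread out over time."""
    results = []
    for start in range(0, len(data), chunk_size):
        chunk = data.iloc[start : start + chunk_size]
        results.append(
            mlflow.evaluate(
                model=model_uri,
                model_type="question-answering",
                data=chunk,
            )
        )
        if start + chunk_size < len(data):
            time.sleep(delay_seconds)  # simple client-side throttling
    return results

# Usage inside the existing run, replacing the single mlflow.evaluate call:
# evaluate_in_chunks(logged_model.model_uri, example_questions)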