Truncated Output from AI21 Bedrock Model with Langchain Library in Python


I'm using the Langchain library to make predictions with the AI21 Bedrock model. I have implemented the following code:

import boto3
from langchain.chains import ConversationChain
from langchain.llms.bedrock import Bedrock
from langchain.memory import ConversationBufferMemory

# Bedrock runtime client passed to the Langchain wrapper
boto3_bedrock = boto3.client("bedrock-runtime")

ai21_llm = Bedrock(model_id="ai21.j2-ultra-v1", client=boto3_bedrock)
memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=ai21_llm, verbose=False, memory=memory
)

try:
    print(conversation.predict(input="write a paragraph about the wonders of wonder bread"))
except ValueError as error:
    if "AccessDeniedException" in str(error):
        print(f"\x1b[41m{error}\
        \nTo troubleshoot this issue, please refer to the following resources.\
         \nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html\
         \nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n")
        # Raise a custom error whose traceback is hidden in a notebook,
        # so only the red troubleshooting message above is shown.
        class StopExecution(ValueError):
            def _render_traceback_(self):
                pass
        raise StopExecution
    else:
        raise error

However, I'm encountering an issue where the output of conversation.predict() is truncated. For instance, the output I get is:

"Wonder Bread is a type of bread that is sold in stores. It is made from flour, water,"

I expected a complete paragraph, but it cuts off. I've checked the Langchain memory documentation here, but I didn't find anything that would suggest it's affecting the output size.

How can I debug this issue to find out why the output is truncated? Are there any limitations with AI21 or Langchain that could be causing this? Any help would be appreciated.
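As a first debugging step, I'm thinking of turning on the verbose flag that is already in the code above (and, if my Langchain version has it, the global debug flag) so I can see exactly what the chain sends to Bedrock. A minimal sketch, assuming set_debug is available under langchain.globals in this version:

from langchain.globals import set_debug  # import path may differ between Langchain versions

set_debug(True)  # log the full inputs/outputs of every chain step
conversation = ConversationChain(llm=ai21_llm, verbose=True, memory=memory)
print(conversation.predict(input="write a paragraph about the wonders of wonder bread"))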

The Langchain memory documentation doesn't seem helpful either.

The following code, which calls the Bedrock runtime directly with boto3, does work, so I suspect the issue is something in Langchain:

import json

import boto3
import botocore.exceptions

# Bedrock runtime client and the same prompt used above
bedrock_runtime = boto3.client("bedrock-runtime")
prompt_data = "write a paragraph about the wonders of wonder bread"

body = json.dumps({"prompt": prompt_data, "maxTokens": 200})
modelId = "ai21.j2-mid-v1"  # change this to use a different version from the model provider
accept = "application/json"
contentType = "application/json"

try:
    response = bedrock_runtime.invoke_model(
        body=body, modelId=modelId, accept=accept, contentType=contentType
    )
    response_body = json.loads(response.get("body").read())

    print(response_body.get("completions")[0].get("data").get("text"))

except botocore.exceptions.ClientError as error:
    if error.response['Error']['Code'] == 'AccessDeniedException':
        print(f"\x1b[41m{error.response['Error']['Message']}\
            \nTo troubleshoot this issue, please refer to the following resources.\
            \nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html\
            \nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n")
    else:
        raise error
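To figure out why the completion stops, I also plan to dump the raw response from this direct call and look for some kind of finish reason. A rough sketch; note that the finishReason field is my assumption about the AI21 Jurassic response schema, not something I've verified:

# response_body comes from the invoke_model call above
print(json.dumps(response_body, indent=2))  # dump every field the model returned

# If a finish reason is present, a length-related value would point at maxTokens
# being the cause of the cut-off (the "finishReason" key is an assumption).
print(response_body.get("completions")[0].get("finishReason"))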

There are 3 answers below.

Answer 1:

This is just a stream of completions:

response_body.get("completions")[0].get("data").get("text")

If you want the full response as a string, use the following:

print(response_body.get("results")[0].get("outputText"))
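Since both key layouts show up above, a more defensive way to pull out the text, whichever shape comes back, could look like this (a sketch that only uses the key names already mentioned in this thread):

# Try the "completions" layout from the question first, then fall back to the
# "results"/"outputText" layout shown above.
if "completions" in response_body:
    text = response_body["completions"][0]["data"]["text"]
else:
    text = response_body["results"][0]["outputText"]
print(text)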
Answer 2:

I was having the same issue. What worked for me was to set the max_tokens parameter of the LLM to -1. Using OpenAI, for example:

from langchain.llms import OpenAI

llm = OpenAI(
    temperature=0,
    model_name='gpt-3.5-turbo-1106',
    max_tokens=-1
)

In your case:

ai21_llm = Bedrock(model_id="ai21.j2-ultra-v1", client=boto3_bedrock, max_tokens=-1)
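If the Bedrock class doesn't accept max_tokens directly, another option is to pass the provider-specific setting through model_kwargs, mirroring how the working boto3 snippet in the question sends maxTokens in the request body. A sketch, assuming Langchain's Bedrock wrapper forwards model_kwargs into that body:

ai21_llm = Bedrock(
    model_id="ai21.j2-ultra-v1",
    client=boto3_bedrock,
    model_kwargs={"maxTokens": 1024},  # AI21's parameter name, as in the invoke_model example
)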

Hope it helps :)

Answer 3:

To configure the output token limit with the Amazon Bedrock SDK and Anthropic Claude v2 or v2.1, you can use this additional config:

bedrock_llm = Bedrock(client=bedrock_client,
                      model_id="anthropic.claude-v2:1",
                      model_kwargs={'max_tokens_to_sample': 10000,
                                    'temperature': 0.5,
                                    'top_k': 250,
                                    'top_p': 0.7,
                                    'stop_sequences': ['Human:']})
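To wire this back into the setup from the question, the configured LLM can be dropped into the same ConversationChain (a sketch reusing the names from the question):

conversation = ConversationChain(
    llm=bedrock_llm, verbose=False, memory=ConversationBufferMemory()
)
print(conversation.predict(input="write a paragraph about the wonders of wonder bread"))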