How do I work around "This model's maximum context length exceeded" errors?


I am using LangChain and OpenAI to interact with my Postgres database. It works well on small databases, but when the database is larger (say, over 10K rows), I get the following error:

This model's maximum context length is 4097 tokens. However, your messages resulted in 5100 tokens. Please reduce the length of the messages
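The limit in that message is counted in model tokens, not characters or words. As a rough rule of thumb for English text, one token is about four characters; a minimal sketch of that estimate (the 4-characters-per-token ratio is an approximation, not the model's actual tokenizer):

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text.
    For exact counts you would use the model's own tokenizer (e.g. tiktoken)."""
    return max(1, len(text) // 4)

print(estimate_tokens("SELECT * FROM properties WHERE price > 500000"))  # → 11
```

Summing such estimates over the prompt, the injected table schema, and any sample rows shows how a large database can push a request past the 4097-token window even when the user's question itself is short.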

An example public database that shows this error is the MindsDB real estate database:

postgresql+psycopg2://demo_user:demo_password@REDACTED:5432/demo

My code is below. This Python code can connect to any Postgres database and lets you chat with it, but it fails as noted above on larger databases:

import sys
from langchain import SQLDatabase
from langchain.chat_models import ChatOpenAI
from langchain_experimental.sql import SQLDatabaseChain

import environ
env = environ.Env()
environ.Env.read_env()

API_KEY = env('OPENAI_API_KEY')

if API_KEY == "":
    print("Missing OpenAI API key")
    exit()

if len(sys.argv) < 2:
    print("Missing db connection string.  Example 'postgresql+psycopg2://postgres:1234@localhost:6667/mydb'")
    exit()

dbstring = sys.argv[1]

print("Using OpenAI with key ["+API_KEY+"] and Database ["+dbstring+"]")

# Setup database
db = SQLDatabase.from_uri(
    dbstring,
)

# setup llm
llm = ChatOpenAI(model_name="gpt-3.5-turbo",
    temperature=0,
    max_tokens=1000,
    openai_api_key=API_KEY)

# Create db chain
QUERY = """
Given an input question, first create a syntactically correct postgresql query to run, then look at the results of the query and return the answer.
Use the following format:

Question: Question here
SQLQuery: SQL Query to run
SQLResult: Result of the SQLQuery
Answer: Final answer here

{question}
"""

# Setup the database chain
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)

def get_prompt():
    print("Type 'exit' to quit")

    while True:
        prompt = input("Enter a prompt: ")

        if prompt.lower() == 'exit':
            print('Exiting...')
            break
        else:
            try:
                question = QUERY.format(question=prompt)
                print(db_chain.run(question))
            except Exception as e:
                print(e)

get_prompt()
There is 1 answer below.

You can use this split_query method to chunk your data. You can modify it based on your requirements, but you have to make sure contextual details are not lost when you chunk the data. Try a few options (splitting, or adjusting max tokens) so that it works for your use case. Note that split() produces whitespace-separated words, not model tokens, so the count is only a rough proxy for the model's limit.

def split_query(query, max_tokens):
    # split() yields whitespace-separated words, which only roughly
    # approximate model tokens; treat max_tokens as an approximate budget.
    words = query.split()
    chunks = []
    chunk = []
    for word in words:
        if len(chunk) + 1 > max_tokens:
            chunks.append(" ".join(chunk))
            chunk = []
        chunk.append(word)
    if chunk:
        chunks.append(" ".join(chunk))
    return chunks

def get_prompt():
    print("Type 'exit' to quit")

    while True:
        prompt = input("Enter a prompt: ")

        if prompt.lower() == 'exit':
            print('Exiting...')
            break
        else:
            try:
                query = prompt
                chunks = split_query(query, 4097)
                for chunk in chunks:
                    question = QUERY.format(question=chunk)
                    print(db_chain.run(question))
            except Exception as e:
                print(e)

get_prompt()
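For reference, the chunking idea above can be exercised standalone; a minimal sketch using a compact slicing variant of the same word-based split (re-declared here so it runs on its own; it splits on whitespace-separated words, which only roughly approximate model tokens):

```python
def split_query(query, max_words):
    """Split a query into chunks of at most max_words whitespace-separated words."""
    words = query.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

chunks = split_query("list all houses with three bedrooms under 500k", 4)
print(chunks)  # → ['list all houses with', 'three bedrooms under 500k']
```

In practice you would pass a budget well below the model's 4097-token limit, since the database schema and the chain's own prompt text also consume part of the context window.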