LlamaIndex times out when evaluating OpenAI response

Question

434 Views Asked by Katie At 30 November 2025 at 23:25

I am struggling to find the correct way to evaluate a response from OpenAI with LlamaIndex. I am using Streamlit and LlamaIndex to build a gpt-3.5 RAG application over a set of blog posts. I now want to determine whether a blog post was used to generate a response and, specifically, which one. I am currently using `RelevancyEvaluator` for this: by calling `evaluator.evaluate()` I hope to get back whether an article was used (and, later, which article).

This does not work as intended. The first message I send to ChatGPT works, and the evaluator tells me whether a document was used. The second message, however, causes the system to time out: I still get the response from ChatGPT, but the `evaluator.evaluate()` call times out.

What I have tried so far:

- I have tried using `index.as_chat_engine()` instead of `index.as_query_engine()`, but the same behaviour occurs.
- I have tried using prompt engineering, but this hallucinates some answers.
- I have checked that I am not hitting any OpenAI rate limits (I am not on the basic tier, where you only get 3 calls a minute).

I have attached a slightly redacted and reduced version of the code below. It follows the tutorials that LlamaIndex provides very closely:

@st.cache_resource(show_spinner=False)
def load_data():
   with st.spinner(text="Loading and indexing knowledge – hang tight! This should take 1-2 minutes."):
       reader = SimpleDirectoryReader(input_dir="./data", recursive=True)
       docs = reader.load_data()
       service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-3.5-turbo", temperature=0.5, system_prompt="...."))
       index = VectorStoreIndex.from_documents(docs, service_context=service_context)
       return index, service_context
 
index, service_context = load_data()
chat_engine = index.as_query_engine()
 
if prompt := st.chat_input("Your question"): # Prompt for user input and save to chat history
   st.session_state.messages.append({"role": "user", "content": prompt})
 
for message in st.session_state.messages: # Display the prior chat messages
   with st.chat_message(message["role"]):
       st.write(message["content"])
 
 
if st.session_state.messages[-1]["role"] != "assistant":
   with st.chat_message("assistant", avatar=assistant_img):
       with st.spinner("Thinking..."):
           evaluator = RelevancyEvaluator(service_context=service_context)
           response = chat_engine.query(prompt)
           st.write(response.response)
           response_str = response.response
           for source_node in response.source_nodes:
               eval_result = evaluator.evaluate(
                       query=prompt, response=response_str, contexts=[source_node.get_content()]
               )
               print("RESULT")
               print(str(eval_result.passing))
               print(eval_result.feedback)
           message = {"role": "assistant", "content": response.response}
           st.session_state.messages.append(message) # Add response to message history

If anyone could explain why this behaviour is occurring, or how I can fix it, I would be very grateful!

There is 1 best solution below

**Qian-Hua Wu** · Answer 1

Qian-Hua Wu On 06 February 2024 at 07:17

I encountered the same issue and managed to solve it by adding the following code at the top of my program:

import nest_asyncio
nest_asyncio.apply()
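For context, `nest_asyncio` patches `asyncio` so that an event loop can be re-entered while it is already running. The thread does not establish exactly why the second `evaluator.evaluate()` call times out, so treat the following only as a standard-library sketch of the general restriction that the patch relaxes, not as a diagnosis. It assumes a synchronous wrapper that starts its own event loop around an async call; `fake_evaluate`, `sync_wrapper`, and `caller_inside_loop` are hypothetical names for illustration and are not part of LlamaIndex.

```python
import asyncio


async def fake_evaluate() -> bool:
    """Hypothetical stand-in for an async evaluation call."""
    await asyncio.sleep(0)
    return True


def sync_wrapper() -> bool:
    """Synchronous facade that starts its own event loop (an assumption)."""
    coro = fake_evaluate()
    try:
        return asyncio.run(coro)
    except RuntimeError:
        # asyncio.run() never consumed the coroutine, so close it explicitly
        # to avoid a "coroutine was never awaited" warning.
        coro.close()
        raise


async def caller_inside_loop() -> str:
    """Call the synchronous facade while an event loop is already running."""
    try:
        sync_wrapper()
        return "ok"
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"


# With no loop running, the wrapper is free to start one.
outside = sync_wrapper()

# With a loop already running, stock asyncio refuses to start a nested one.
inside = asyncio.run(caller_inside_loop())

print(outside)
print(inside)
```

With unpatched `asyncio`, the first call succeeds and the nested call reports a `RuntimeError`. Calling `nest_asyncio.apply()` before any event loop is used, as in the snippet above, is intended to lift that restriction.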
