I'm working on a project that uses the llama_index library, where I query an index through a query engine. The response, an instance of the Response class, is expected to contain a list of source nodes, each representing a chunk of a source document that contributed to the response.
I have written a function that extracts the unique filenames of the source documents from the response. However, the function consistently returns only one filename, even when it's evident that the response is derived from multiple source documents. Here's my function:
from llama_index import StorageContext, load_index_from_storage

def answer(vectorIndex, question):
    # Note: vectorIndex is not used; the index is reloaded from ./storage.
    storage_context = StorageContext.from_defaults(persist_dir='./storage')
    vIndex = load_index_from_storage(storage_context)
    query_engine = vIndex.as_query_engine()
    query_response = query_engine.query(question)
    # Collect the unique file names of the nodes behind the response.
    filenames = set()
    for source_node in query_response.source_nodes:
        filenames.add(source_node.node.extra_info["file_name"])
    filenames = list(filenames)
    return {"answer": query_response.response, "filenames": filenames}
I'm not sure whether the problem lies in how I extract the filenames or in the underlying query engine. When I inspect query_response.source_nodes, it appears to contain the same source node repeated several times rather than several distinct nodes.
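One way to check this is to count how many of the returned nodes come from each file, which shows whether the list really holds only one document. The following is a minimal sketch, not llama_index code: `summarize_sources` and `fake_node` are hypothetical helpers, the file names are made up, and the stand-in objects only mimic the `.node.extra_info` layout used in the function above.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_sources(source_nodes):
    """Count retrieved nodes per file name (hypothetical helper)."""
    counts = Counter()
    for source_node in source_nodes:
        # Assumes each item exposes .node.extra_info, as in the function above.
        info = source_node.node.extra_info or {}
        counts[info.get("file_name", "<unknown>")] += 1
    return dict(counts)


def fake_node(file_name):
    # Stand-in that mimics the attribute layout; not a real llama_index node.
    return SimpleNamespace(
        node=SimpleNamespace(extra_info={"file_name": file_name})
    )


# Made-up sample data for illustration only.
sample = [fake_node("a.pdf"), fake_node("a.pdf"), fake_node("b.pdf")]
print(summarize_sources(sample))
```

Passing `query_response.source_nodes` instead of `sample` should give one entry per distinct file, with the number of retrieved nodes from each.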
Could anyone provide some insight into what I might be doing wrong, or provide some advice on how to debug this issue? I would like to return all unique source documents that contribute to a response.