I have a working RAG chatbot using Zephyr, a conversation chain that retrieves from PDF files, and a Gradio Blocks UI. Everything works fine, but I would like to show the "source" metadata as a URL in my chatbot's answers, or find some other way to surface the source document's URL or content.

This is what the metadata looks like when I print the source documents returned with return_source_documents enabled:

metadata={'page': 1, 'source': 'pdfs-folder/my-document.pdf'}
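
For reference, that metadata sits on the LangChain Document objects in the response's source_documents list, so (if I understand the structure correctly) it can be read like this:

# Sketch: reading the source path off each returned Document,
# where `response` is the dict returned by the chain call.
for doc in response["source_documents"]:
    print(doc.metadata["source"])    # -> 'pdfs-folder/my-document.pdf'
    print(doc.page_content[:200])    # the retrieved chunk text itself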

I'd like the source key/value pair to appear in the conversation answer, something like this:

Question:  How far does the apple fall from the tree?

Answer:  Not very far, and here's the source where we got the info from: folder/document.pdf

Or any other method to show the actual source document content in a Gradio output bubble or answer.
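
The naive version I can picture is plain string formatting, something like this sketch (assuming response is the chain's output dict), but I'm not sure where best to hook it in:

# Hypothetical: tack the first document's source path onto the answer text.
source = response["source_documents"][0].metadata["source"]
final_answer = f"{response['answer']}\n\nSource: {source}"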

Here is my prompt template:

prompt_template: str = """\
<|system|>
You are a helpful, respectful and honest assistant. Use the following pieces of context to answer the question at the end. Always respond to questions in English unless asked to do otherwise. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something incorrectly. If you don't know the answer to a question, please don't share false information.
</s>
{context}
{chat_history}
<|user|>
{question}
</s>
<|assistant|>
"""

PROMPT = PromptTemplate.from_template(template=prompt_template)
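
As a sanity check, the template takes three input variables, which PromptTemplate.from_template infers from the braces (hypothetical values below):

# Render the template once to eyeball the final prompt string.
print(PROMPT.format(
    context="(retrieved PDF chunks)",
    chat_history="",
    question="How far does the apple fall from the tree?",
))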

Here is my conversation chain:

from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory


conv_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    combine_docs_chain_kwargs={"prompt": PROMPT},
    retriever=vectordb.as_retriever(search_kwargs={"k": 2}),
    chain_type="stuff",
    return_source_documents=True,  # adds "source_documents" to the output dict
    memory=ConversationBufferMemory(
        memory_key="chat_history",
        return_messages=True,
        input_key="question",
        output_key="answer",  # tells the memory which output to store
    ),
    get_chat_history=lambda h: h,
)
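
Calling the chain directly shows the output shape (a quick sketch; since memory is attached, only the question has to be passed in):

# The output dict should carry "answer" (per output_key) and
# "source_documents" (per return_source_documents=True).
result = conv_chain({"question": "How far does the apple fall from the tree?"})
print(result["answer"])
for doc in result["source_documents"]:
    print(doc.metadata)  # e.g. {'page': 1, 'source': 'pdfs-folder/my-document.pdf'}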

Here is my Gradio UI:

import gradio as gr
import os
import time

### define the UI and event handlers

with gr.Blocks() as demo:

    gr.Markdown(
    """
    # Demo Company Custom Dataset Chatbot
    """)

    
    chatbot = gr.Chatbot(
        bubble_full_width=False,
        avatar_images=(None, os.path.join(os.path.dirname(__file__), "images/dell-logo-sm.jpg")),
    )


    def vote(data: gr.LikeData):
        if data.liked:
            print("You upvoted this response: " + data.value)
        else:
            print("You downvoted this response: " + data.value)

    chatbot.like(vote, None, None)  # Adding this line causes the like/dislike icons to appear in your chatbot


    def user(user_message, history):
        # Get response from QA chain
        response = conv_chain({"question": user_message, "chat_history": history})

        # Append user message and response to chat history
        history.append((user_message, response["answer"]))
        return gr.update(value=""), history


    
    with gr.Row():
        msg = gr.Textbox(
            scale=4,
            placeholder="Input text or click an example button below, then press enter",
            container=False,
        )
        submit_btn = gr.Button("Submit")
        submit_btn.click(user, [msg, chatbot], [msg, chatbot], queue=False)
        msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False)  # let Enter submit too, as the placeholder promises

    
   
    clear = gr.Button("Clear")
    clear.click(lambda: None, None, chatbot, queue=False)
        

if __name__ == "__main__":
    demo.queue(max_size=10)  # enable queuing (websockets, no timeouts) and cap the number of waiting users
    demo.launch(share=False, debug=True, server_name="xxxxxx", server_port=xxx, allowed_paths=["images/xxxx"])
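
For what it's worth, the direction I've been sketching (untested) is a variant of the handler above that appends the source paths to the answer before pushing it into the Chatbot; since gr.Chatbot renders Markdown, the paths could even become links if they map to real URLs:

def user_with_sources(user_message, history):
    # Same chain call as in user(), but format the sources into the bubble text.
    response = conv_chain({"question": user_message, "chat_history": history})
    sources = "\n".join(
        f"- {doc.metadata.get('source', 'unknown')} (page {doc.metadata.get('page', '?')})"
        for doc in response["source_documents"]
    )
    answer = f"{response['answer']}\n\n**Sources:**\n{sources}"
    history.append((user_message, answer))
    return gr.update(value=""), history

Is something like this the right way to do it, or is there a cleaner hook in LangChain or Gradio for surfacing source documents?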
