I use the code below, based on the Haystack tutorials:
from haystack import Pipeline
from haystack.nodes import AnswerParser, PromptNode, PromptTemplate

# retriever is created elsewhere (setup omitted here)
lfqa_prompt = PromptTemplate("deepset/question-answering-with-references", output_parser=AnswerParser(reference_pattern=r"Document\[(\d+)\]"))
prompt_node = PromptNode(model_name_or_path="google/flan-t5-large", default_prompt_template=lfqa_prompt)

pipe = Pipeline()
pipe.add_node(component=retriever, name="retriever", inputs=["Query"])
pipe.add_node(component=prompt_node, name="prompt_node", inputs=["retriever"])

output = pipe.run(query="A question?")
print(output["answers"][0].answer)
The very first time I ran this, the line below took a while because it downloaded the model to my cache:
prompt_node = PromptNode(model_name_or_path="google/flan-t5-large", default_prompt_template=lfqa_prompt)
My assumption was that the next run would use the cached model. As expected, it no longer downloads anything, but it still takes a lot of time.
Can I reduce this time by saving the already loaded model?
I couldn't reproduce your example exactly, since it also uses a retriever, but I tried a minimal example along the lines of the one below:
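This sketch (assuming Haystack v1; the hard-coded Document and query are only placeholders) feeds the prompt node a document directly instead of going through a retriever, so only the model loading and generation time matter:

from haystack import Document, Pipeline
from haystack.nodes import AnswerParser, PromptNode, PromptTemplate

lfqa_prompt = PromptTemplate("deepset/question-answering-with-references", output_parser=AnswerParser(reference_pattern=r"Document\[(\d+)\]"))
prompt_node = PromptNode(model_name_or_path="google/flan-t5-base", default_prompt_template=lfqa_prompt)

pipe = Pipeline()
pipe.add_node(component=prompt_node, name="prompt_node", inputs=["Query"])

# Pass documents directly so no retriever setup or retrieval step is involved.
output = pipe.run(
    query="Where is Berlin located?",
    documents=[Document(content="Berlin is the capital of Germany and lies in the north-east of the country.")],
)
print(output["answers"][0].answer)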
I changed your model to google/flan-t5-base, and the script instantly produced the correct response every time I ran it. My best guess is that your retriever might be doing some time-consuming setup. Please try the example above.
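If you want to confirm where the time goes, you could time the two setup steps separately, roughly like this sketch (replace the commented line with your own retriever setup):

import time

start = time.perf_counter()
# retriever = ...  # your existing retriever setup goes here
print(f"Retriever setup took {time.perf_counter() - start:.1f}s")

start = time.perf_counter()
prompt_node = PromptNode(model_name_or_path="google/flan-t5-large", default_prompt_template=lfqa_prompt)
print(f"PromptNode model load took {time.perf_counter() - start:.1f}s")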