Improving accuracy of queries in Azure OpenAI and Cognitive Search


I have a blob storage with around 1500 accounting documents, indexed with Cognitive Search with the OCR skill integrated. The index itself looks fine, but the model's query results are inaccurate. For example, when I run the query "Tell me everything about the document where the date is 20.09.2023" in the playground, the model responds "The requested information is not found in the retrieved data." To understand why, I set up a Log Analytics workspace and found the following log entry when filtering for the query executed by the model: "..search=Everything about document with date 20.09.2023...". The search query is just a fragment of the question, sent without any interpretation of what is being asked, so it does not retrieve anything useful. This happens with almost every question. Is there a way to make the chat issue better queries? The model is GPT 3.5 Turbo 0301, and the problem occurs with both keyword and semantic search.

Here is an example of a data point from the index:

{
  "@search.score": 1,
  "content": "",
  "metadata_storage_path": "my path",
  "merged_content": "My ocr extracted content"
},

There is 1 solution below


I am assuming you are using Azure OpenAI Studio and have configured it to use your own data, and that the data consists of documents stored in Azure Blob Storage and indexed with Azure Cognitive Search.

This is a typical RAG (Retrieval Augmented Generation) setup. It can work very well, but how well depends on your content and the type of questions you ask.

Your query (question) will be simplified before it is sent to the search index. The purpose is to produce a search query that returns snippets of data from your index, which the model can then use to produce an answer.

Your data needs to contain semantic statements or facts, and your question must be something those facts can answer. Open your Azure Search service and inspect the indexed data, then try asking questions about statements you find there.
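One way to inspect the index is to query it yourself, bypassing the chat, and see what comes back for the exact text you expect to find. Below is a minimal Python sketch that only builds the request body for the Search Documents REST call. The helper name `build_phrase_query` is hypothetical, and the field names (`merged_content`, `metadata_storage_path`) are taken from the sample document in the question, so adjust them to your index. Whether a date such as 20.09.2023 matches at all also depends on how the analyzer tokenized the OCR text, which this sketch does not address.

```python
import json


def build_phrase_query(text, field="merged_content", top=5):
    """Build a Search Documents request body (hypothetical helper).

    Wrapping the text in double quotes requests a phrase match in the
    simple query syntax, so the value is looked up as written rather
    than as loose keywords.
    """
    # Escape embedded double quotes so they do not end the phrase early.
    escaped = text.replace('"', '\\"')
    return {
        "search": f'"{escaped}"',
        # Assumption: the OCR output lives in this field, as in the sample.
        "searchFields": field,
        "select": f"metadata_storage_path,{field}",
        "queryType": "simple",
        "top": top,
    }


body = build_phrase_query("20.09.2023")
print(json.dumps(body, indent=2))
```

You would POST this body to your index's docs/search endpoint with your own service name, index name, API version and key, or paste the same search text into the Search explorer in the portal. If the phrase query returns nothing, the date is probably not present in `merged_content` in that form, and no rewording of the chat question will retrieve it.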