Azure OpenAI and Cognitive Search data architecture for RAG


I was wondering if someone could suggest the best approach to building an index over varied data and using it as efficiently as possible with OpenAI.

Imagine we need to store two types of content. On one side we have articles with:

  1. Publish date
  2. Title
  3. Author(s)
  4. Content (around 10,000 words)

On the other side, we have an event session/presentation with attributes:

  1. Event Name
  2. Event session/presentation title
  3. Speakers (each speaker has a name, job title and company)
  4. Transcript of the session, basically the same as the Article content, a bunch of sentences

All this needs to be stored in one index.

My question is how to structure the index for the RAG pattern so that Azure Search returns the most relevant content and OpenAI can give the most relevant answers to requests such as:

  1. Top 5 takeaways from a session where the speaker was XYZ (Azure Search should return the session content as the most relevant result)
  2. Top 5 latest articles written by XYZ
  3. Tell me what is happening around topic XYZ (e.g. Apple M3 chips); this can be mentioned in some of the articles or sessions

What would be the best approach to handle this?

thanks

Best answer

Create an index with Title, Content, Keywords, and Url fields. Map these fields to your semantic configuration.
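As a rough sketch of that shape, the index below is written as a plain Python dict mirroring what the REST index JSON looks like, assuming the 2023-11-01 API version. The index name `rag-content`, the `id` key field, and the configuration name `default` are placeholders, not something the answer specifies. The small helper only checks that every field the semantic configuration refers to is actually defined.

```python
# Hypothetical index definition; only Title, Content, Keywords and Url come from the answer.
INDEX_DEFINITION = {
    "name": "rag-content",  # placeholder name
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},  # assumed key field
        {"name": "title", "type": "Edm.String", "searchable": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        {"name": "keywords", "type": "Collection(Edm.String)",
         "searchable": True, "filterable": True},
        {"name": "url", "type": "Edm.String", "searchable": False},
    ],
    "semantic": {
        "configurations": [
            {
                "name": "default",  # placeholder name
                "prioritizedFields": {
                    "titleField": {"fieldName": "title"},
                    "prioritizedContentFields": [{"fieldName": "content"}],
                    "prioritizedKeywordsFields": [{"fieldName": "keywords"}],
                },
            }
        ]
    },
}


def undefined_semantic_fields(index: dict) -> list:
    """Return semantic-config field names that are missing from index['fields']."""
    defined = {field["name"] for field in index["fields"]}
    missing = []
    for config in index["semantic"]["configurations"]:
        prioritized = config["prioritizedFields"]
        referenced = [prioritized["titleField"]["fieldName"]]
        referenced += [f["fieldName"] for f in prioritized.get("prioritizedContentFields", [])]
        referenced += [f["fieldName"] for f in prioritized.get("prioritizedKeywordsFields", [])]
        missing += [name for name in referenced if name not in defined]
    return missing
```

Running `undefined_semantic_fields(INDEX_DEFINITION)` before creating the index is a cheap way to catch a renamed or misspelled field.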

Map your source content as best you can to this model. Event Name and Title are titles. Large sections of text (the transcript or the article content) map to Content. A list of speakers, topics, and similar short values maps to Keywords.
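A minimal sketch of that mapping, assuming the source records arrive as plain dicts. The input keys (`authors`, `event_name`, `session_title`, `speakers`, `transcript`) and the `id`/`url` values are hypothetical, and joining the event name and session title into one title string is just one possible choice.

```python
def unique(values):
    """Drop duplicates and empty strings while keeping the original order."""
    return [v for v in dict.fromkeys(values) if v]


def article_to_doc(article: dict) -> dict:
    """Map an article record onto the shared Title/Content/Keywords/Url shape."""
    return {
        "id": article["id"],
        "title": article["title"],
        "content": article["content"],
        "keywords": unique(article.get("authors", [])),
        "url": article.get("url", ""),
    }


def session_to_doc(session: dict) -> dict:
    """Map an event session onto the same shape; speakers become keywords."""
    keywords = []
    for speaker in session.get("speakers", []):
        # Each speaker has a name, job title and company (per the question).
        keywords += [speaker.get("name", ""), speaker.get("company", "")]
    return {
        "id": session["id"],
        "title": f'{session["event_name"]}: {session["session_title"]}',
        "content": session["transcript"],
        "keywords": unique(keywords),
        "url": session.get("url", ""),
    }
```

Because both functions return the same keys, articles and sessions can live in one index and be ranked together.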

Add any other properties you may need to the index. These will not be part of your RAG setup, however.
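For example, "top 5 latest articles written by XYZ" looks more like a filter-and-sort than a relevance query, so extra properties could serve it outside the semantic configuration. The sketch below builds the request parameters for such a query. The field names `docType`, `authors`, and `publishDate` are hypothetical and would need to be defined as filterable/sortable fields; the parameter keys (`search`, `filter`, `orderby`, `top`) follow the REST search request body as I understand it.

```python
def odata_quote(value: str) -> str:
    """Quote a string literal for an OData filter (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def latest_articles_by(author: str, top: int = 5) -> dict:
    """Build query parameters for the newest articles by one author."""
    return {
        "search": "*",  # match everything; the filter does the narrowing
        "filter": (
            "docType eq 'article' and "
            f"authors/any(a: a eq {odata_quote(author)})"
        ),
        "orderby": "publishDate desc",
        "top": top,
    }
```

The returned documents can then be passed to the model as context, with the retrieval itself staying deterministic.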

Use the SDK to push content. This makes it trivial to enrich/modify/map your content before indexing it. If you use the built-in connectors you can still map and enrich, but it is difficult to develop, maintain, and debug.
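A hedged sketch of what the push step might look like. The `client` here is anything with an `upload_documents(documents=...)` method; with the Python SDK that would be an `azure.search.documents.SearchClient`, which is not constructed here so the block stays dependency-free. The `enrich` hook and the batch size of 500 are illustrative choices, not recommendations from the answer.

```python
from typing import Callable, Iterable, Iterator, List, Optional


def batched(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` documents."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def push_documents(
    client,
    docs: Iterable[dict],
    enrich: Optional[Callable[[dict], dict]] = None,
    batch_size: int = 500,
) -> int:
    """Optionally enrich each document, upload in batches, return the count sent."""
    total = 0
    for batch in batched(docs, batch_size):
        if enrich is not None:
            # This is where mapping/enrichment happens before indexing.
            batch = [enrich(doc) for doc in batch]
        client.upload_documents(documents=batch)
        total += len(batch)
    return total
```

Keeping the enrichment in ordinary code like this is what makes it easy to unit test and debug, which is the point the answer makes about the SDK route.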