I was wondering if someone could suggest the best approach to build an index with various kinds of data and use it as efficiently as possible with OpenAI.
Imagine we need to support storing two types of content. On one side, we have articles with:
- Publish date
- Title
- Author(s)
- Content (around 10,000 words)
On the other side, we have an event session/presentation with attributes:
- Event Name
- Event session/presentation title
- Speakers (each speaker has a name, job title and company)
- Transcript of the session, basically the same as the Article content, a bunch of sentences
All this needs to be stored in one index.
My question is how to structure the index for the RAG pattern so that the search returns the most relevant content and OpenAI can give the most relevant answers to requests such as:
- Top 5 takeaways from a session where the speaker was XYZ (Azure Search should return the session content as the most relevant result)
- Top 5 latest articles written by XYZ
- Tell me what is happening around topic XYZ (for example, Apple M3 chips); this may be mentioned in some of the articles or sessions
What would be the best approach to handle this?
Thanks
Create an index with Title, Content, Keywords, and Url fields. Map these properties to your semantic config.
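As a rough illustration of that step, here is a minimal sketch of what such an index definition might look like, written as a plain Python dict in the shape of a REST request body. The index name `content-index`, the extra `id` key field, and the configuration name `default` are my own placeholders, and the JSON layout follows my understanding of the Azure AI Search REST API, so check it against the API version you are using.

```python
import json


def build_index_definition(index_name: str) -> dict:
    """Return a REST-style index definition with a semantic configuration.

    The field names mirror the Title/Content/Keywords/Url model from the
    answer; "id" is an assumed key field, since every index needs a key.
    """
    return {
        "name": index_name,
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True},
            {"name": "title", "type": "Edm.String", "searchable": True},
            {"name": "content", "type": "Edm.String", "searchable": True},
            {
                "name": "keywords",
                "type": "Collection(Edm.String)",
                "searchable": True,
                "filterable": True,
            },
            {"name": "url", "type": "Edm.String", "searchable": False},
        ],
        # Semantic configuration: tells the semantic ranker which fields
        # play the title, content, and keywords roles.
        "semantic": {
            "configurations": [
                {
                    "name": "default",  # placeholder name
                    "prioritizedFields": {
                        "titleField": {"fieldName": "title"},
                        "prioritizedContentFields": [{"fieldName": "content"}],
                        "prioritizedKeywordsFields": [{"fieldName": "keywords"}],
                    },
                }
            ]
        },
    }


index_definition = build_index_definition("content-index")
print(json.dumps(index_definition, indent=2))
```

Keeping the definition as data makes it easy to review and version before sending it to the service.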
Map your source content to this model as best you can. Event Name and Title map to Title. Large blocks of text (the session transcript or the article content) map to Content. A list of speakers, topics,
Add any other properties you may need to the index. These will not be part of your RAG setup, however.
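Even though they sit outside the semantic config, extra properties can still be used to filter and sort. Here is a small sketch for the "top 5 latest articles written by XYZ" case. The field names `docType`, `authors`, and `publishDate` are hypothetical and would have to exist in the index as filterable/sortable fields. The keys of the returned dict are chosen to match the keyword arguments of `SearchClient.search` in the Python SDK, but treat that pairing as an assumption to verify.

```python
def latest_articles_by_author(author: str, top: int = 5) -> dict:
    """Build query options for the newest articles by one author.

    Assumes hypothetical index fields: docType (string),
    authors (collection of strings), publishDate (date, sortable).
    """
    # OData string literals escape a single quote by doubling it.
    safe_author = author.replace("'", "''")
    return {
        "search_text": "*",  # match everything, let the filter narrow it
        "filter": (
            "docType eq 'article' "
            f"and authors/any(a: a eq '{safe_author}')"
        ),
        "order_by": ["publishDate desc"],  # newest first
        "top": top,
    }


options = latest_articles_by_author("XYZ")
print(options["filter"])
```

With a real client this would presumably be used as `search_client.search(**options)`.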
Use the SDK to push content. This makes it trivial to enrich/modify/map your content before indexing it. If you use the built-in connectors you can still map and enrich, but it is difficult to develop, maintain, and debug.
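To make the mapping and push steps a bit more concrete, here is a minimal sketch of mapping both content types onto the shared model before pushing. The input dict shapes (`event_name`, `session_title`, `job_title`, and so on), the hashed `id`, and the choice to put authors and speaker details into Keywords are all my assumptions, not something the service requires.

```python
import hashlib


def make_id(url: str) -> str:
    """Derive a stable document key from the URL (assumed to be unique)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def article_to_document(article: dict) -> dict:
    """Map an article onto the shared Title/Content/Keywords/Url model."""
    return {
        "id": make_id(article["url"]),
        "title": article["title"],
        "content": article["content"],
        # Assumption: authors go into keywords so they are searchable.
        "keywords": list(article["authors"]),
        "url": article["url"],
    }


def session_to_document(session: dict) -> dict:
    """Map an event session onto the same shared model."""
    keywords = []
    for speaker in session["speakers"]:
        # Assumption: name, job title, and company all become keywords.
        keywords.extend(
            [speaker["name"], speaker["job_title"], speaker["company"]]
        )
    return {
        "id": make_id(session["url"]),
        # Event name and session title are combined into one title.
        "title": f'{session["event_name"]}: {session["session_title"]}',
        "content": session["transcript"],
        "keywords": keywords,
        "url": session["url"],
    }


article = {
    "title": "Sample article",
    "authors": ["XYZ"],
    "content": "Article body goes here.",
    "url": "https://example.com/articles/sample",
}
session = {
    "event_name": "Sample Event",
    "session_title": "Sample session",
    "speakers": [
        {"name": "XYZ", "job_title": "Engineer", "company": "Example Co"}
    ],
    "transcript": "Transcript text goes here.",
    "url": "https://example.com/sessions/sample",
}
documents = [article_to_document(article), session_to_document(session)]

# With the azure-search-documents package, pushing would then look like:
#   SearchClient(endpoint, index_name, credential).upload_documents(documents=documents)
```

Because the mapping is ordinary code, it can be unit tested and debugged locally before anything is sent to the index.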