I am writing a Python application that should vectorize CVs and then use LangChain to answer questions like "CV with the most working experience in software development" (no CV contains personal information). I went with the Qdrant vector database, running locally in Docker. When I read the PDF files I have to split them into chunks of ~1000 tokens, which means that a single CV might be split into up to 10 different points in Qdrant. Of course I keep the metadata: page (the page of the original CV PDF) and source (the full path including the filename). After I have embedded all CVs, how should I form a request to, for example, search for the CV with the longest working experience in software development?
How to preserve points "tenacity" in Qdrant using langchain?
Asked by Ruby Rain. There are 2 answers below.

The problem you're facing cannot be easily solved by semantic search. To find the most relevant documents with a query such as "CV with the most working experience in software development", you need to order the set of CVs you have by their experience, and that sounds like something that might be solved by SQL query generation if you can extract your data into a tabular format.
See: https://python.langchain.com/docs/use_cases/qa_structured/sql#case-1-text-to-sql-query
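To illustrate the tabular idea, here is a minimal sketch using Python's built-in `sqlite3`. It assumes you have already extracted one row per job entry from each CV; the `experience` table, its columns, and the sample rows are all hypothetical and only there to show the shape of the query that a text-to-SQL step would need to produce.

```python
import sqlite3

# Hypothetical rows extracted from the CVs (for example by a parser or an LLM).
# Each row is one job entry: (source path, field of work, years in that job).
ROWS = [
    ("cvs/a.pdf", "software development", 4.0),
    ("cvs/a.pdf", "software development", 3.5),
    ("cvs/b.pdf", "software development", 6.0),
    ("cvs/b.pdf", "project management", 5.0),
    ("cvs/c.pdf", "software development", 2.0),
]


def most_experienced(rows, field):
    """Return (source, total_years) for the CV with the most years in `field`,
    or None if no CV has any entry for that field."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE experience (source TEXT, field TEXT, years REAL)")
    conn.executemany("INSERT INTO experience VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT source, SUM(years) AS total_years
        FROM experience
        WHERE field = ?
        GROUP BY source
        ORDER BY total_years DESC
        LIMIT 1
        """,
        (field,),
    ).fetchone()
    conn.close()
    return result


print(most_experienced(ROWS, "software development"))
```

The ordering and aggregation happen in SQL, across all CVs at once, which is exactly the part a similarity search cannot do.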
A single embedding encodes the semantic information of a particular document chunk. Assuming there is info like "I have ten years of experience working as a Software Developer", our embedding will encode it, but that doesn't mean the embedding of "I have fifteen years of experience working as a Software Developer" will be far away from the first one. The two sentences mean almost the same thing, so the distance between their vectors tells you nothing about which number is larger.
Embeddings might be useful to find the devs working with a similar stack, but vector similarity cannot sort the results by a quantity such as years of experience, so for that you need to use other means.
It sounds like you could try to generate SQL queries based on user prompts rather than implement RAG. Semantic search can't solve this kind of problem. The question "CV with the most working experience in software development" cannot be answered with a single document in your collection, as it's a question about all the documents taken together (finding the longest experience requires comparing against every other document).
Qdrant, with a proper embedding model, can help you find the CVs of people working in specific technologies or with the desired experience.
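Since each CV is split into several points, a search like this returns chunks, not whole CVs. A minimal sketch of collapsing chunk-level hits back into one entry per CV, using the `source` and `page` metadata described in the question, could look like the following. The shape of the `hits` list (plain dicts with `source`, `page`, and `score`) and the sample values are assumptions for illustration, not the format any particular client returns.

```python
from collections import defaultdict


def group_hits_by_cv(hits):
    """Collapse chunk-level hits into one entry per CV, best score first.

    Each hit is assumed to be a dict with 'source', 'page' and 'score' keys.
    """
    grouped = defaultdict(lambda: {"best_score": float("-inf"), "pages": set()})
    for hit in hits:
        entry = grouped[hit["source"]]
        # Keep the highest-scoring chunk as the score of the whole CV.
        entry["best_score"] = max(entry["best_score"], hit["score"])
        entry["pages"].add(hit["page"])
    ranked = sorted(
        grouped.items(), key=lambda item: item[1]["best_score"], reverse=True
    )
    return [
        (source, info["best_score"], sorted(info["pages"]))
        for source, info in ranked
    ]


# Hypothetical chunk hits, as they might come back from a similarity search.
hits = [
    {"source": "cvs/a.pdf", "page": 0, "score": 0.81},
    {"source": "cvs/b.pdf", "page": 1, "score": 0.78},
    {"source": "cvs/a.pdf", "page": 2, "score": 0.74},
]

for source, score, pages in group_hits_by_cv(hits):
    print(source, score, pages)
```

Taking the best chunk score per CV is only one possible choice; averaging or summing the scores would rank CVs differently.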