I'm building a QA system. My problem is that one question may have multiple answers, and the answers are located at different positions in the context. For example:
Question: What does Chris have to do?
Context: ....Chris has to wash dishes....(more text)....Chris has to do his homework....
Correct answers:
- wash dishes
- do homework
After extracting the answers for a question, I use a clustering algorithm to deduplicate them and obtain the distinct answers. I therefore need a dataset containing one-question, many-answers pairs like the one above to evaluate my clustering algorithm and sentence embedding model.
Is there any public dataset that supports one question paired with multiple correct (non-duplicated) answers? I tried MS MARCO, but most of the multiple answers in that dataset are duplicates.
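To make the evaluation goal concrete, here is a minimal sketch of how the clustering step could be scored once such a dataset is available. It computes pairwise precision, recall, and F1 between predicted cluster labels and gold labels. The function name `pairwise_f1`, the example answers, and both label lists are hypothetical illustrations, not taken from any dataset, and pairwise F1 is only one possible metric.

```python
from itertools import combinations


def pairwise_f1(predicted, gold):
    """Score a clustering by comparing every pair of extracted answers.

    predicted, gold: one cluster label per extracted answer (same order).
    A pair counts as a true positive when both labelings put the two
    answers in the same cluster.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")

    tp = fp = fn = 0
    for i, j in combinations(range(len(gold)), 2):
        same_pred = predicted[i] == predicted[j]
        same_gold = gold[i] == gold[j]
        if same_pred and same_gold:
            tp += 1
        elif same_pred:
            fp += 1  # merged two answers that are actually distinct
        elif same_gold:
            fn += 1  # failed to merge two duplicate answers

    # With no pairs in a denominator, treat the score as perfect (assumption).
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical extracted answers for "What does Chris have to do?"
answers = ["wash dishes", "wash the dishes", "do homework", "do his homework"]
gold = [0, 0, 1, 1]       # two distinct answers, each extracted twice
predicted = [0, 0, 0, 1]  # the clustering wrongly merged "do homework"

precision, recall, f1 = pairwise_f1(predicted, gold)
# precision = 1/3, recall = 1/2, F1 = 0.4
```

The pairwise formulation does not require the predicted and gold cluster IDs to match, only that they group the same answers together, which is why it is convenient for comparing a deduplication step against annotated answer sets.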
Muc2004 is a document-level event extraction dataset. For each event role, there are multiple answers. For example:
Question: Who are the victims of the attack?
Context: ....because of Carlos Valencia Garcia's death sentence is the last night....(more text)...The assassination of Maria Elena Diaz...
Correct answers: