Let's say I have a set of documents split into train and test sets, and I have document-level ground truth labels (0/1) for all of them.
Consider a scenario where I intend to run a deep-learning sentence-regression experiment. If I extract all individual sentences from the documents in both sets, I have no ground truth at the sentence level. To assign a target value (a numerical score to regress on) to each sentence in the train set, I devised a method that uses the first n words from a train SHAP beeswarm summary plot obtained in a prior document-level classification experiment. Concretely, the method first checks whether a sentence comes from a 0 or a 1 document (recall that I have ground truth at the document level). If it comes from a 0 document, the scoring mechanism uses the first n words from the train SHAP plot in one way; if it comes from a 1 document, it uses the same n words in a different way.
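To make the setup concrete, here is a minimal sketch of such a class-conditional scoring mechanism. The original post does not specify how the top-n SHAP words are used, so the scoring rule below (fraction of sentence tokens among the top-n words, signed by the document label) is a hypothetical stand-in; the function name and its arguments are my own illustration, not the actual method.

```python
def sentence_score(sentence, doc_label, top_n_shap_words):
    """Assign a hypothetical target score to a train sentence.

    sentence          -- list of tokens from one sentence
    doc_label         -- 0 or 1, the document-level ground truth
    top_n_shap_words  -- first n words taken from the train SHAP
                         beeswarm plot of the document classifier
    """
    if not sentence:
        return 0.0
    top = set(top_n_shap_words)
    # Fraction of the sentence's tokens that are top-n SHAP words.
    frac = sum(1 for w in sentence if w in top) / len(sentence)
    # Class-conditional use of the same word list: one rule for
    # sentences from 1-documents, a different rule for 0-documents
    # (here, simply flipping the sign as an illustrative choice).
    return frac if doc_label == 1 else -frac

# Example: two of three tokens are top SHAP words.
print(sentence_score(["cheap", "offer", "now"], 1, ["cheap", "offer"]))
```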
How do I assign a numerical score to sentences in the test set? My evaluation metric is MAE, which requires comparing target scores against predicted scores.
Do I mirror the scoring mechanism used for the train sentences, i.e. first check whether a test sentence comes from a 0 or a 1 document and then use the first n words from the train SHAP plot to assign a target estimate?
Or do I skip the 0/1 check and use the first n words from the train SHAP plot in a generic, uniform fashion?
Which method is correct/appropriate? I suspect neither is strictly wrong, and I also suspect one should not use the test SHAP plot. My goal is to avoid data leakage entirely.