Could you please assist me in finding resources related to training models like BERT using RLHF? Additionally, I'm curious about the scarcity of research on applying reinforcement learning to encoder-only models. Are there any specific challenges or issues associated with this area that limit the research? Any guidance or insights would be greatly appreciated. Thank you!
While exploring the TRL library, I found this issue (https://github.com/huggingface/trl/issues/747), and I have been brainstorming a lot about how to define the trajectories for this.