Tag: dpo
2 posts
Paper Review: Diffusion Model Alignment Using Direct Preference Optimization
Adapting DPO from language models to image generation — training Stable Diffusion XL on 851K human preferences to sig...
Paper Review: Zephyr: Direct Distillation of LM Alignment
My review of the paper Zephyr Direct Distillation of LM Alignment