FIPO: Teaching LLMs Which Thoughts Actually Matter
20 April 2026
FIPO is an RL algorithm that addresses one of the core limitations of RL for LLM reasoning: credit assignment. Instead of giving every token in a rollout the same outcome advantage, it re-weights each token by a discounted future-KL signal, enabling longer and more effective reasoning chains.
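To make the re-weighting idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not FIPO's actual formulation: the function names, the `1 + future_kl` weight, the clipping range, the default `gamma`, and the choice to include the current token in the discounted sum are all hypothetical. The only parts taken from the summary above are the single rollout-level outcome advantage and a discounted sum of per-token KL values over the rest of the rollout.

```python
import numpy as np


def discounted_future_sum(per_token_signal, gamma):
    """Return F with F[t] = sum over k >= t of gamma**(k - t) * signal[k].

    Assumption: the sum includes the current token (k = t).
    """
    signal = np.asarray(per_token_signal, dtype=float)
    future = np.zeros_like(signal)
    running = 0.0
    # Walk backwards so each position reuses the discounted sum of its successors.
    for t in range(len(signal) - 1, -1, -1):
        running = signal[t] + gamma * running
        future[t] = running
    return future


def reweighted_advantages(outcome_advantage, per_token_kl,
                          gamma=0.9, clip_range=(0.5, 2.0)):
    """Scale one rollout-level advantage into per-token advantages.

    Hypothetical weighting: weight[t] = clip(1 + future_kl[t], lo, hi).
    The clip keeps any single token from dominating the update.
    """
    future_kl = discounted_future_sum(per_token_kl, gamma)
    weights = np.clip(1.0 + future_kl, clip_range[0], clip_range[1])
    return outcome_advantage * weights


if __name__ == "__main__":
    # Toy rollout of 5 tokens; only token 2 shows a policy shift.
    kl = [0.0, 0.0, 0.4, 0.0, 0.0]
    print(reweighted_advantages(1.0, kl, gamma=0.5))
```

In this toy rollout, the token with the KL spike and the tokens leading up to it get a larger share of the advantage, while the tokens after it keep the plain outcome advantage. With all-zero KL the weights collapse to 1, which recovers the uniform per-token advantage that the summary contrasts against.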