Tag: reasoning
4 posts
FIPO: Teaching LLMs Which Thoughts Actually Matter
FIPO is an RL algorithm that fixes one of the core limitations of RL for LLM reasoning: credit assignment. Instead of...
Paper Review: M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
My review of the paper M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper Review: Training Large Language Models to Reason in a Continuous Latent Space
Coconut lets LLMs reason in latent space instead of generating text tokens, enabling breadth-first exploration of rea...
Paper Review: Reverse Thinking Makes LLMs Stronger Reasoners
My review of the paper Reverse Thinking Makes LLMs Stronger Reasoners