Tag: rl
9 posts
Collaborative Reinforcement Learning: Why HACRL Trains Models in Teams Instead of Isolation
HACRL proposes a new paradigm for reinforcement learning - instead of training models in isolation, multiple agents c...
Paper Review: Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
My review of the paper Sharing is Caring Efficient LM Post-Training with Collective RL Experience Sharing
Paper Review: Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
My review of the paper Pref-GRPO Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
Paper Review: Group Sequence Policy Optimization
My review of the paper Group Sequence Policy Optimization
Paper Review: ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
My review of the paper ProRL Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper Review: Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Only ~20% of tokens actually matter when training LLMs to reason with RL. Updating the low-entropy majority actively ...
Paper Review: Visual Planning: Lets Think Only with Images
My review of the paper Visual Planning Let's Think Only with Images
Paper Review: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
How pure reinforcement learning (without supervised fine-tuning) can teach LLMs to reason, producing open-source mode...
Paper Review: Training Language Models to Self-Correct via Reinforcement Learning
My review of the paper Training Language Models to Self-Correct via Reinforcement Learning