Tag: rl

9 posts

Mar 16, 2026
Collaborative Reinforcement Learning: Why HACRL Trains Models in Teams Instead of Isolation
HACRL proposes a new paradigm for reinforcement learning: instead of training models in isolation, multiple agents c...
paperreview deeplearning rl llm
Sep 15, 2025
Paper Review: Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
My review of the paper Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
paperreview deeplearning nlp llm
Sep 01, 2025
Paper Review: Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
My review of the paper Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
paperreview deeplearning imagegeneration cv
Aug 04, 2025
Paper Review: Group Sequence Policy Optimization
My review of the paper Group Sequence Policy Optimization
paperreview deeplearning llm rl
Jun 30, 2025
Paper Review: ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
My review of the paper ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
paperreview deeplearning llm rl
Jun 09, 2025
Paper Review: Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Only ~20% of tokens actually matter when training LLMs to reason with RL. Updating the low-entropy majority actively ...
paperreview deeplearning llm rl
May 26, 2025
Paper Review: Visual Planning: Let's Think Only with Images
My review of the paper Visual Planning: Let's Think Only with Images
paperreview deeplearning llm rl
Jan 27, 2025
Paper Review: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
How pure reinforcement learning (without supervised fine-tuning) can teach LLMs to reason, producing open-source mode...
paperreview deeplearning llm rl
Sep 23, 2024
Paper Review: Training Language Models to Self-Correct via Reinforcement Learning
My review of the paper Training Language Models to Self-Correct via Reinforcement Learning
paperreview deeplearning rl llm
