Tag: rl – Andrey Lukyanenko

Apr 24, 2026

DeepSeek-V4 Review: Why Million-Token Context Needs Efficient Attention, Not Just Larger Windows

DeepSeek V4 pairs a hybrid sparse-attention stack with on-policy distillation across domain specialists to bring 1M-t...

paperreview deeplearning llm moe

Apr 20, 2026

FIPO: Teaching LLMs Which Thoughts Actually Matter

FIPO is an RL algorithm that addresses one of the core limitations of RL for LLM reasoning: credit assignment. Instead of...

paperreview deeplearning llm rl

Apr 06, 2026

Book Review: A Practical Guide to Reinforcement Learning from Human Feedback

A review of Sandip Kulkarni's book on RLHF, covering its strengths as a structured learning resource, its reliance on b...

blogpost books rl rlhf

Mar 16, 2026

Collaborative Reinforcement Learning: Why HACRL Trains Models in Teams Instead of Isolation

HACRL proposes a new paradigm for reinforcement learning: instead of training models in isolation, multiple agents c...

paperreview deeplearning rl llm

Sep 15, 2025

Paper Review: Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing

My review of the paper Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing

paperreview deeplearning nlp llm

Sep 01, 2025

Paper Review: Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

My review of the paper Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

paperreview deeplearning imagegeneration cv

Aug 04, 2025

Paper Review: Group Sequence Policy Optimization

My review of the paper Group Sequence Policy Optimization

paperreview deeplearning llm rl

Jun 30, 2025

Paper Review: ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

My review of the paper ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

paperreview deeplearning llm rl

Jun 09, 2025

Paper Review: Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Only ~20% of tokens actually matter when training LLMs to reason with RL. Updating the low-entropy majority actively ...

paperreview deeplearning llm rl

May 26, 2025

Paper Review: Visual Planning: Let's Think Only with Images

My review of the paper Visual Planning: Let's Think Only with Images

paperreview deeplearning llm rl

Jan 27, 2025

Paper Review: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

How pure reinforcement learning (without supervised fine-tuning) can teach LLMs to reason, producing open-source mode...

paperreview deeplearning llm rl

Sep 23, 2024

Paper Review: Training Language Models to Self-Correct via Reinforcement Learning

My review of the paper Training Language Models to Self-Correct via Reinforcement Learning

paperreview deeplearning rl llm