Andrey Lukyanenko's personal site

Paper Review: Group Sequence Policy Optimization

04 August 2025

My review of the paper Group Sequence Policy Optimization

paperreview deeplearning llm rl

Paper Review: Subliminal Learning: Language models transmit behavioral traits via hidden signals in data

28 July 2025

My review of the paper Subliminal Learning: Language models transmit behavioral traits via hidden signals in data

paperreview deeplearning llm distillation

Paper Review: ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

30 June 2025

My review of the paper ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

paperreview deeplearning llm rl

Paper Review: V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

23 June 2025

A self-supervised video model trained on 1M+ hours of video that understands motion, anticipates actions, and — with just 62 hours of robot data — performs zero-shot robotic pick-and-place planning.

paperreview deeplearning cv selfsupervised

Paper Review: Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

09 June 2025

Only ~20% of tokens actually matter when training LLMs to reason with RL. Updating the low-entropy majority actively hurts performance — a finding that challenges standard RLVR practice.

paperreview deeplearning llm rl

Paper Review: SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents

02 June 2025

My review of the paper SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents

paperreview deeplearning llm evaluation
