Paper Review: Group Sequence Policy Optimization
04 August 2025
My review of the paper Group Sequence Policy Optimization