Paper Review: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
27 January 2025
How large-scale reinforcement learning, even without supervised fine-tuning as a preliminary step, can teach LLMs to reason, producing open-source models that rival OpenAI-o1 on math and coding benchmarks.