Tag: llm
68 posts
Collaborative Reinforcement Learning: Why HACRL Trains Models in Teams Instead of Isolation
HACRL proposes a new paradigm for reinforcement learning: instead of training models in isolation, multiple agents c...
Beyond Positional Bias: How DroPE Unlocks Zero-Shot Long Context in LLMs
A review of DroPE, a simple but counterintuitive method that extends LLM context length by dropping positional embedd...
Kimi K2.5 Review: Native Multimodality and Agent Swarms at 1 Trillion Parameters
A deep-dive review of Kimi K2.5, a next-generation open multimodal model that combines native vision-language trainin...
Paper Review: mHC: Manifold-Constrained Hyper-Connections
My review of the paper mHC Manifold-Constrained Hyper-Connections
Paper Review: SAM 3: Segment Anything with Concepts
Meta's unified model for detecting, segmenting, and tracking objects using text or image prompts — trained on 4M conc...
Paper Review: HunyuanImage 3.0 Technical Report
My review of the paper HunyuanImage 3.0 Technical Report
Paper Review: Chronos-2: From Univariate to Universal Forecasting
Chronos-2 extends zero-shot time series forecasting to multivariate and covariate settings with a new group attention...
Paper Review: The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
A biologically inspired LLM built as a graph of spiking neurons with Hebbian learning — it matches GPT-2 scaling whil...
Paper Review: Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
My review of the paper Sharing is Caring Efficient LM Post-Training with Collective RL Experience Sharing
Paper Review: Group Sequence Policy Optimization
My review of the paper Group Sequence Policy Optimization
Paper Review: Subliminal Learning: Language models transmit behavioral traits via hidden signals in data
My review of the paper Subliminal Learning Language models transmit behavioral traits via hidden signals in data
Paper Review: ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
My review of the paper ProRL Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper Review: Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Only ~20% of tokens actually matter when training LLMs to reason with RL. Updating the low-entropy majority actively ...
Paper Review: SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
My review of the paper SWE-rebench An Automated Pipeline for Task Collection and Decontaminated Evaluation of Softwar...
Paper Review: Visual Planning: Let's Think Only with Images
My review of the paper Visual Planning Let's Think Only with Images
Paper Review: AlphaEvolve: A coding agent for scientific and algorithmic discovery
DeepMind's autonomous coding agent that evolves algorithms through LLM-driven iteration — it discovered the first imp...
Paper Review: AgentA/B: Automated and Scalable Web A/BTesting with Interactive LLM Agents
My review of the paper AgentA/B Automated and Scalable Web A/BTesting with Interactive LLM Agents
Paper Review: M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
My review of the paper M1 Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper Review: Large Language Diffusion Models
LLaDA replaces autoregressive token generation with diffusion-based masked prediction, rivaling LLaMA3 8B while natur...
Paper Review: Titans: Learning to Memorize at Test Time
A new architecture that pairs attention with a learnable long-term memory module, scaling to 2M+ tokens and outperfor...
Paper Review: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
How pure reinforcement learning (without supervised fine-tuning) can teach LLMs to reason, producing open-source mode...
Paper Review: Training Large Language Models to Reason in a Continuous Latent Space
Coconut lets LLMs reason in latent space instead of generating text tokens, enabling breadth-first exploration of rea...
Paper Review: Byte Latent Transformer: Patches Scale Better Than Tokens
My review of the paper Byte Latent Transformer Patches Scale Better Than Tokens
Paper Review: Reverse Thinking Makes LLMs Stronger Reasoners
My review of the paper Reverse Thinking Makes LLMs Stronger Reasoners
Paper Review: Project Sid: Many-agent simulations toward AI civilization
What happens when you put 1k AI agents in Minecraft and let them self-organize? They form governments, transmit cultu...
Paper Review: Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
My review of the paper Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper Review: Unbounded: A Generative Infinite Game of Character Life Simulation
My review of the paper Unbounded A Generative Infinite Game of Character Life Simulation
Paper Review: Training Language Models to Self-Correct via Reinforcement Learning
My review of the paper Training Language Models to Self-Correct via Reinforcement Learning
Paper Review: Agentic Retrieval-Augmented Generation for Time Series Analysis
My review of the paper Agentic Retrieval-Augmented Generation for Time Series Analysis
Paper Review: Winning Amazon KDD Cup24
My review of the paper Winning Amazon KDD Cup24
Paper Review: Wolf: Captioning Everything with a World Summarization Framework
My review of the paper Wolf Captioning Everything with a World Summarization Framework
Paper Review: RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs
My review of the paper RankRAG Unifying Context Ranking with Retrieval-Augmented Generation in LLMs
Paper Review: Unveiling Encoder-Free Vision-Language Models
My review of the paper Unveiling Encoder-Free Vision-Language Models
Paper Review: Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
My review of the paper Husky A Unified, Open-Source Language Agent for Multi-Step Reasoning
Paper Review: FlowMind: Automatic Workflow Generation with LLMs
My review of the paper FlowMind Automatic Workflow Generation with LLMs
Paper Review: Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
My review of the paper Ferret-v2 An Improved Baseline for Referring and Grounding with Large Language Models
Paper Review: Chronos: Learning the Language of Time Series
Amazon's framework that tokenizes time series data for pretrained language models, enabling zero-shot forecasting tha...
Paper Review: Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting
My review of the paper Lag-Llama Towards Foundation Models for Probabilistic Time Series Forecasting
Paper Review: Ferret: Refer and Ground Anything Anywhere at Any Granularity
My review of the paper Ferret Refer and Ground Anything Anywhere at Any Granularity
Paper Review: DocLLM: A layout-aware generative language model for multimodal document understanding
My review of the paper DocLLM A layout-aware generative language model for multimodal document understanding
Paper Review: Pixel Aligned Language Models
My review of the paper Pixel Aligned Language Models
Paper Review: Orca 2: Teaching Small Language Models How to Reason
My review of the paper Orca 2 Teaching Small Language Models How to Reason
Paper Review: Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models
My review of the paper Chain-of-Note Enhancing Robustness in Retrieval-Augmented Language Models
Paper Review: Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
My review of the paper Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
Paper Review: Collaborative Large Language Model for Recommender Systems
My review of the paper Collaborative Large Language Model for Recommender Systems
Paper Review: Zephyr: Direct Distillation of LM Alignment
My review of the paper Zephyr Direct Distillation of LM Alignment
Paper Review: Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
My review of the paper Self-RAG Learning to Retrieve, Generate, and Critique through Self-Reflection
Paper Review: PaLI-3 Vision Language Models: Smaller, Faster, Stronger
My review of the paper PaLI-3 Vision Language Models Smaller, Faster, Stronger
Paper Review: InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
My review of the paper InstructRetro Instruction Tuning post Retrieval-Augmented Pretraining
Paper Review: Mistral 7B
My review of the paper Mistral 7B
Paper Review: Think before you speak: Training Language Models With Pause Tokens
My review of the paper Think before you speak Training Language Models With Pause Tokens
Paper Review: QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
My review of the paper QA-LoRA Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper Review: DreamLLM: Synergistic Multimodal Comprehension and Creation
My review of the paper DreamLLM Synergistic Multimodal Comprehension and Creation
Paper Review: Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
My review of the paper Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
Paper Review: RecMind: Large Language Model Powered Agent For Recommendation
My review of the paper RecMind Large Language Model Powered Agent For Recommendation
Paper Review: Giraffe: Adventures in Expanding Context Lengths in LLMs
My review of the paper Giraffe Adventures in Expanding Context Lengths in LLMs
Paper Review: OBELISC: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
My review of the paper OBELISC An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
Paper Review: Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
A systematic survey of what's broken in RLHF — from reward hacking to evaluation gaps — and what techniques can fix, ...
Paper Review: UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition
My review of the paper UniversalNER Targeted Distillation from Large Language Models for Open Named Entity Recognition
Paper Review: Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
My review of the paper Skeleton-of-Thought Large Language Models Can Do Parallel Decoding
Paper Review: Retentive Network: A Successor to Transformer for Large Language Models
My review of the paper Retentive Network A Successor to Transformer for Large Language Models
Paper Review: Multilingual End to End Entity Linking
My review of the paper Multilingual End to End Entity Linking
Paper Review: Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
My review of the paper Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Paper Review: Chain of Hindsight Aligns Language Models with Feedback
My review of the paper Chain of Hindsight Aligns Language Models with Feedback
Paper Review: DarkBERT: A Language Model for the Dark Side of the Internet
My review of the paper DarkBERT A Language Model for the Dark Side of the Internet
Paper Review: Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
My review of the paper Distilling Step-by-Step Outperforming Larger Language Models with Less Training Data and Small...
Paper Review: Hyena Hierarchy: Towards Larger Convolutional Language Models
My review of the paper Hyena Hierarchy Towards Larger Convolutional Language Models
Paper Review: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
My review of the paper Visual ChatGPT Talking, Drawing and Editing with Visual Foundation Models