Tag: distillation
3 posts
Paper Review: Subliminal Learning: Language models transmit behavioral traits via hidden signals in data
My review of the paper Subliminal Learning: Language models transmit behavioral traits via hidden signals in data
Paper Review: M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
My review of the paper M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper Review: Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
My review of the paper Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Small...