Tag: efficiency
4 posts
MiniMax Sparse Attention: Per-Group Block Selection for Cheap Million-Token Inference
MiniMax Sparse Attention is a practical sparse-attention design for million-token LLMs: it uses a lightweight learne...
Gamma-World: Simplex Agent Encoding and Hub Attention for Multi-Agent World Models
A review of Gamma-World, NVIDIA's generative multi-agent world model that produces shared, action-controllable video ...
Beyond Positional Bias: How DroPE Unlocks Zero-Shot Long Context in LLMs
A review of DroPE, a simple but counterintuitive method that extends LLM context length by dropping positional embedd...
Paper Review: Linformer: Self-Attention with Linear Complexity
My review of the paper "Linformer: Self-Attention with Linear Complexity".