Tag: moe
2 posts
MiniMax Sparse Attention: Per-Group Block Selection for Cheap Million-Token Inference
MiniMax Sparse Attention is a practical sparse-attention design for million-token LLMs. It uses a lightweight learne...
DeepSeek-V4 Review: Why Million-Token Context Needs Efficient Attention, Not Just Larger Windows
DeepSeek-V4 pairs a hybrid sparse-attention stack with on-policy distillation across domain specialists to bring 1M-t...