Tag: moe

2 posts

MiniMax Sparse Attention: Per-Group Block Selection for Cheap Million-Token Inference

MiniMax Sparse Attention is a practical sparse-attention design for million-token LLMs - it uses a lightweight learne...

paperreview deeplearning llm attention

DeepSeek-V4 Review: Why Million-Token Context Needs Efficient Attention, Not Just Larger Windows

DeepSeek V4 pairs a hybrid sparse-attention stack with on-policy distillation across domain specialists to bring 1M-t...

paperreview deeplearning llm moe