Andrey Lukyanenko

Blogposts

Data science, career and other topics

MiniMax Sparse Attention: Per-Group Block Selection for Cheap Million-Token Inference

15 June 2026

MiniMax Sparse Attention is a practical sparse-attention design for million-token LLMs: a lightweight learned indexer selects the relevant KV blocks, and exact attention is computed only over those blocks. The paper is important because it connects architecture, training stability, and GPU kernels into a deployable long-context system, powering the open-weight MiniMax-M3 model.

Testing MiniMax M3 on real tasks: repo refactor, screenshot debugging, and Spotify recommendations

10 June 2026

A hands-on look at MiniMax M3 through Claude Code: what its new MiniMax Sparse Attention (MSA) is, how it differs from the lightning-attention and full-attention designs of earlier MiniMax models, and how the model handles three real tasks (auditing and refactoring an old idle game, debugging two stubborn UI bugs from screenshots, and turning years of Spotify history into music recommendations).

Gamma-World: Simplex Agent Encoding and Hub Attention for Multi-Agent World Models

01 June 2026

A review of Gamma-World, NVIDIA's generative multi-agent world model that produces shared, action-controllable video rollouts for multiple independently acting agents. It places agents at the vertices of a regular simplex in rotary space for permutation symmetry, routes cross-agent interaction through learnable hub tokens to keep attention linear in the number of agents, and distills a diffusion teacher into a causal student that streams at 24 FPS.
