Tag: efficiency – Andrey Lukyanenko

Jun 22, 2026

LocateAnything Explained: Parallel Box Decoding and how it makes visual grounding faster and more precise

A review of LocateAnything, an NVIDIA vision-language model that treats each bounding box as one atomic unit and deco...

paperreview deeplearning computervision objectdetection

Jun 15, 2026

MiniMax Sparse Attention: Per-Group Block Selection for Cheap Million-Token Inference

MiniMax Sparse Attention is a practical sparse-attention design for million-token LLMs - it uses a lightweight learne...

paperreview deeplearning llm attention

Jun 01, 2026

Gamma-World: Simplex Agent Encoding and Hub Attention for Multi-Agent World Models

A review of Gamma-World, NVIDIA's generative multi-agent world model that produces shared, action-controllable video ...

paperreview deeplearning computervision generativemodels

Feb 23, 2026

Beyond Positional Bias: How DroPE Unlocks Zero-Shot Long Context in LLMs

A review of DroPE, a simple but counterintuitive method that extends LLM context length by dropping positional embedd...

paperreview deeplearning llm attention

Jun 10, 2020

Paper Review: Linformer: Self-Attention with Linear Complexity

My review of the paper "Linformer: Self-Attention with Linear Complexity".

paperreview deeplearning attention transformer