Tag: attention
12 posts
Beyond Positional Bias: How DroPE Unlocks Zero-Shot Long Context in LLMs
A review of DroPE, a simple but counterintuitive method that extends LLM context length by dropping positional embedd...
Paper Review: NeoBERT: A Next-Generation BERT
A compact 250M-parameter bidirectional encoder that incorporates RoPE, SwiGLU, and modern pretraining to outperform m...
Paper Review: Titans: Learning to Memorize at Test Time
A new architecture that pairs attention with a learnable long-term memory module, scaling to 2M+ tokens and outperfor...
Paper Review: Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
BERT rebuilt with modern tricks — 2 trillion training tokens, 8192 context length, Flash Attention, and rotary embedd...
Paper Review: Differential Transformer
My review of the paper Differential Transformer
Paper Review: Masked Attention is All You Need for Graphs
My review of the paper Masked Attention is All You Need for Graphs
Paper Review: Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
My review of the paper Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Paper Review: Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
My review of the paper Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper Review: DocLLM: A layout-aware generative language model for multimodal document understanding
My review of the paper DocLLM: A layout-aware generative language model for multimodal document understanding
Paper Review: Long-Short Transformer: Efficient Transformers for Language and Vision
My review of the paper Long-Short Transformer: Efficient Transformers for Language and Vision
Paper Review: CoAtNet: Marrying Convolution and Attention for All Data Sizes
My review of the paper CoAtNet: Marrying Convolution and Attention for All Data Sizes
Paper Review: Linformer: Self-Attention with Linear Complexity
My review of the paper Linformer: Self-Attention with Linear Complexity