Tag: attention

12 posts

Feb 23, 2026
Beyond Positional Bias: How DroPE Unlocks Zero-Shot Long Context in LLMs
A review of DroPE, a simple but counterintuitive method that extends LLM context length by dropping positional embeddings...
paperreview deeplearning llm attention
Mar 03, 2025
Paper Review: NeoBERT: A Next-Generation BERT
A compact 250M-parameter bidirectional encoder that incorporates RoPE, SwiGLU, and modern pretraining to outperform m...
paperreview deeplearning nlp transformer
Feb 03, 2025
Paper Review: Titans: Learning to Memorize at Test Time
A new architecture that pairs attention with a learnable long-term memory module, scaling to 2M+ tokens and outperforming...
paperreview deeplearning llm nlp
Dec 23, 2024
Paper Review: Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
BERT rebuilt with modern tricks — 2 trillion training tokens, 8192 context length, Flash Attention, and rotary embeddings...
paperreview deeplearning nlp transformer
Oct 14, 2024
Paper Review: Differential Transformer
My review of the paper Differential Transformer
paperreview deeplearning transformer attention
Jul 29, 2024
Paper Review: Masked Attention is All You Need for Graphs
My review of the paper Masked Attention is All You Need for Graphs
paperreview deeplearning graph transformer
Apr 01, 2024
Paper Review: Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
My review of the paper Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
paperreview deeplearning cv rnn
Mar 04, 2024
Paper Review: Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
My review of the paper Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
paperreview deeplearning recurrent attention
Jan 08, 2024
Paper Review: DocLLM: A layout-aware generative language model for multimodal document understanding
My review of the paper DocLLM: A layout-aware generative language model for multimodal document understanding
paperreview deeplearning llm attention
Jul 12, 2021
Paper Review: Long-Short Transformer: Efficient Transformers for Language and Vision
My review of the paper Long-Short Transformer: Efficient Transformers for Language and Vision
paperreview deeplearning cv nlp
Jun 10, 2021
Paper Review: CoAtNet: Marrying Convolution and Attention for All Data Sizes
My review of the paper CoAtNet: Marrying Convolution and Attention for All Data Sizes
paperreview deeplearning cv pretraining
Jun 10, 2020
Paper Review: Linformer: Self-Attention with Linear Complexity
My review of the paper Linformer: Self-Attention with Linear Complexity
paperreview deeplearning attention transformer

← All tags