Tag: transformer
36 posts
Beyond Positional Bias: How DroPE Unlocks Zero-Shot Long Context in LLMs
A review of DroPE, a simple but counterintuitive method that extends LLM context length by dropping positional embedd...
Paper Review: RWKV-7 Goose with Expressive Dynamic State Evolution
My review of the paper RWKV-7 Goose with Expressive Dynamic State Evolution
Paper Review: Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
My review of the paper Audio Flamingo 2 An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Ab...
Paper Review: Large Language Diffusion Models
LLaDA replaces autoregressive token generation with diffusion-based masked prediction, rivaling LLaMA3 8B while natur...
Paper Review: NeoBERT: A Next-Generation BERT
A compact 250M-parameter bidirectional encoder that incorporates RoPE, SwiGLU, and modern pretraining to outperform m...
Paper Review: SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
Google's upgraded vision-language encoders that add self-supervised learning and online data curation to SigLIP, deli...
Paper Review: Goku: Flow Based Video Generative Foundation Models
My review of the paper Goku Flow Based Video Generative Foundation Models
Paper Review: Titans: Learning to Memorize at Test Time
A new architecture that pairs attention with a learnable long-term memory module, scaling to 2M+ tokens and outperfor...
Paper Review: Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
BERT rebuilt with modern tricks — 2 trillion training tokens, 8192 context length, Flash Attention, and rotary embedd...
Paper Review: Byte Latent Transformer: Patches Scale Better Than Tokens
My review of the paper Byte Latent Transformer Patches Scale Better Than Tokens
Paper Review: Contextual Document Embeddings
My review of the paper Contextual Document Embeddings
Paper Review: Differential Transformer
My review of the paper Differential Transformer
Paper Review: Masked Attention is All You Need for Graphs
My review of the paper Masked Attention is All You Need for Graphs
Paper Review: FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization
My review of the paper FastViT A Fast Hybrid Vision Transformer using Structural Reparameterization
Paper Review: Meta-Transformer: A Unified Framework for Multimodal Learning
My review of the paper Meta-Transformer A Unified Framework for Multimodal Learning
Paper Review: Retentive Network: A Successor to Transformer for Large Language Models
My review of the paper Retentive Network A Successor to Transformer for Large Language Models
Paper Review: Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
My review of the paper Hiera A Hierarchical Vision Transformer without the Bells-and-Whistles
Paper Review: Scaling Transformer to 1M tokens and beyond with RMT
My review of the paper Scaling Transformer to 1M tokens and beyond with RMT
Paper Review: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
My review of the paper Visual ChatGPT Talking, Drawing and Editing with Visual Foundation Models
Paper Review: PaLM-E: An Embodied Multimodal Language Model
My review of the paper PaLM-E An Embodied Multimodal Language Model
Paper Review: In-Context Instruction Learning
My review of the paper In-Context Instruction Learning
Paper Review: LLaMA: Open and Efficient Foundation Language Models
My review of the paper LLaMA Open and Efficient Foundation Language Models
Paper Review: Scaling Vision Transformers to 22 Billion Parameters
My review of the paper Scaling Vision Transformers to 22 Billion Parameters
Paper Review: Dual PatchNorm
My review of the paper Dual PatchNorm
Paper Review: Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios
My review of the paper Next-ViT Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial S...
Paper Review: NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
My review of the paper NÜWA Visual Synthesis Pre-training for Neural visUal World creAtion
Paper Review: Swin Transformer V2: Scaling Up Capacity and Resolution
My review of the paper Swin Transformer V2 Scaling Up Capacity and Resolution
Paper Review: SwinIR: Image Restoration Using Swin Transformer
My review of the paper SwinIR Image Restoration Using Swin Transformer
Paper Review: Long-Short Transformer: Efficient Transformers for Language and Vision
My review of the paper Long-Short Transformer Efficient Transformers for Language and Vision
Paper Review: Are Pre-trained Convolutions Better than Pre-trained Transformers?
My review of the paper Are Pre-trained Convolutions Better than Pre-trained Transformers?
Paper Review: Language-agnostic BERT Sentence Embedding
My review of the paper Language-agnostic BERT Sentence Embedding
Paper Review: VirTex: Learning Visual Representations from Textual Annotations
My review of the paper VirTex Learning Visual Representations from Textual Annotations
Paper Review: Linformer: Self-Attention with Linear Complexity
My review of the paper Linformer Self-Attention with Linear Complexity
Paper Review: End-to-End Object Detection with Transformers
My review of the paper End-to-End Object Detection with Transformers
Paper Review: SpERT: Span-based Joint Entity and Relation Extraction with Transformer Pre-training
My review of the paper SpERT Span-based Joint Entity and Relation Extraction with Transformer Pre-training
Paper Review: Transformer Reasoning Network for Image-Text Matching and Retrieval
My review of the paper Transformer Reasoning Network for Image-Text Matching and Retrieval