Tag: attention
- Paper Review: Differential Transformer (14 Oct 2024)
- Paper Review: Masked Attention is All You Need for Graphs (29 Jul 2024)
- Paper Review: Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures (01 Apr 2024)
- Paper Review: Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (04 Mar 2024)
- Paper Review: DocLLM: A layout-aware generative language model for multimodal document understanding (08 Jan 2024)
- Paper Review: Long-Short Transformer: Efficient Transformers for Language and Vision (12 Jul 2021)
- Paper Review: CoAtNet: Marrying Convolution and Attention for All Data Sizes (10 Jun 2021)
- Paper Review: Linformer: Self-Attention with Linear Complexity (10 Jun 2020)