Tag: deeplearning
- Paper Review: PaperBanana: Automating Academic Illustration for AI Scientists (09 Feb 2026)
- Paper Review: mHC: Manifold-Constrained Hyper-Connections (26 Jan 2026)
- Top-10 ML papers I read in 2025 (24 Dec 2025)
- Paper Review: NitroGen: A Foundation Model for Generalist Gaming Agents (22 Dec 2025)
- Paper Review: SAM 3: Segment Anything with Concepts (24 Nov 2025)
- Paper Review: HunyuanImage 3.0 Technical Report (17 Nov 2025)
- Paper Review: Chronos-2: From Univariate to Universal Forecasting (03 Nov 2025)
- Paper Review: The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain (27 Oct 2025)
- Paper Review: LongLive: Real-time Interactive Long Video Generation (06 Oct 2025)
- Paper Review: Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing (15 Sep 2025)
- Paper Review: Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning (01 Sep 2025)
- Paper Review: DINOv3 (25 Aug 2025)
- Paper Review: Group Sequence Policy Optimization (04 Aug 2025)
- Paper Review: Subliminal Learning: Language models transmit behavioral traits via hidden signals in data (28 Jul 2025)
- Paper Review: ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models (30 Jun 2025)
- Paper Review: V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning (23 Jun 2025)
- Paper Review: Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning (09 Jun 2025)
- Paper Review: SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents (02 Jun 2025)
- Paper Review: Visual Planning: Let's Think Only with Images (26 May 2025)
- Paper Review: AlphaEvolve: A coding agent for scientific and algorithmic discovery (15 May 2025)
- Paper Review: AgentA/B: Automated and Scalable Web A/B Testing with Interactive LLM Agents (28 Apr 2025)
- Paper Review: M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models (21 Apr 2025)
- Paper Review: TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes (07 Apr 2025)
- Paper Review: Video-T1: Test-Time Scaling for Video Generation (24 Mar 2025)
- Paper Review: RWKV-7 Goose with Expressive Dynamic State Evolution (24 Mar 2025)
- Paper Review: Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities (17 Mar 2025)
- Paper Review: Large Language Diffusion Models (10 Mar 2025)
- Paper Review: NeoBERT: A Next-Generation BERT (03 Mar 2025)
- Paper Review: SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features (24 Feb 2025)
- Paper Review: Goku: Flow Based Video Generative Foundation Models (17 Feb 2025)
- Paper Review: Titans: Learning to Memorize at Test Time (03 Feb 2025)
- Paper Review: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (27 Jan 2025)
- Paper Review: STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution (13 Jan 2025)
- Paper Review: Training Large Language Models to Reason in a Continuous Latent Space (06 Jan 2025)
- Paper Review: Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference (23 Dec 2024)
- Paper Review: Byte Latent Transformer: Patches Scale Better Than Tokens (16 Dec 2024)
- Paper Review: Reverse Thinking Makes LLMs Stronger Reasoners (09 Dec 2024)
- Paper Review: Project Sid: Many-agent simulations toward AI civilization (25 Nov 2024)
- Paper Review: Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level (11 Nov 2024)
- Paper Review: Unbounded: A Generative Infinite Game of Character Life Simulation (29 Oct 2024)
- Paper Review: Contextual Document Embeddings (21 Oct 2024)
- Paper Review: Differential Transformer (14 Oct 2024)
- Paper Review: Depth Pro: Sharp Monocular Metric Depth in Less Than a Second (07 Oct 2024)
- Paper Review: Training Language Models to Self-Correct via Reinforcement Learning (23 Sep 2024)
- Paper Review: Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency (16 Sep 2024)
- Paper Review: Agentic Retrieval-Augmented Generation for Time Series Analysis (04 Sep 2024)
- Paper Review: Winning Amazon KDD Cup'24 (19 Aug 2024)
- Paper Review: Wolf: Captioning Everything with a World Summarization Framework (12 Aug 2024)
- Paper Review: Diffusion Feedback Helps CLIP See Better (05 Aug 2024)
- Paper Review: Masked Attention is All You Need for Graphs (29 Jul 2024)
- Paper Review: RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs (22 Jul 2024)
- Paper Review: Unveiling Encoder-Free Vision-Language Models (15 Jul 2024)
- Paper Review: Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning (01 Jul 2024)
- Paper Review: Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling (17 Jun 2024)
- Paper Review: σ-GPTs: A New Approach to Autoregressive Models (10 Jun 2024)
- Paper Review: LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models (03 Jun 2024)
- Paper Review: YOLOv10: Real-Time End-to-End Object Detection (27 May 2024)
- Paper Review: Chameleon: Mixed-Modal Early-Fusion Foundation Models (20 May 2024)
- Paper Review: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models (13 May 2024)
- Paper Review: FlowMind: Automatic Workflow Generation with LLMs (06 May 2024)
- Paper Review: Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models (15 Apr 2024)
- Paper Review: Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction (08 Apr 2024)
- Paper Review: Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures (01 Apr 2024)
- Paper Review: Chronos: Learning the Language of Time Series (25 Mar 2024)
- Paper Review: Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks (19 Mar 2024)
- Paper Review: NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models (11 Mar 2024)
- Paper Review: Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (04 Mar 2024)
- Paper Review: YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information (26 Feb 2024)
- Paper Review: LiRank: Industrial Large Scale Ranking Models at LinkedIn (19 Feb 2024)
- Paper Review: Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting (12 Feb 2024)
- Paper Review: Lumiere: A Space-Time Diffusion Model for Video Generation (29 Jan 2024)
- Paper Review: Scalable Pre-training of Large Autoregressive Image Models (22 Jan 2024)
- Paper Review: Ferret: Refer and Ground Anything Anywhere at Any Granularity (15 Jan 2024)
- Paper Review: DocLLM: A layout-aware generative language model for multimodal document understanding (08 Jan 2024)
- Paper Review: StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation (25 Dec 2023)
- Paper Review: Pixel Aligned Language Models (18 Dec 2023)
- Paper Review: EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything (12 Dec 2023)
- Paper Review: Translatotron 3: Speech to Speech Translation with Monolingual Data (07 Dec 2023)
- Paper Review: Adversarial Diffusion Distillation (04 Dec 2023)
- Paper Review: Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion (30 Nov 2023)
- Paper Review: Diffusion Model Alignment Using Direct Preference Optimization (27 Nov 2023)
- Paper Review: Orca 2: Teaching Small Language Models How to Reason (23 Nov 2023)
- Paper Review: Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models (20 Nov 2023)
- Paper Review: Deep Learning for Day Forecasts from Sparse Observations (16 Nov 2023)
- Paper Review: Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM (13 Nov 2023)
- Paper Review: CogVLM: Visual Expert for Pretrained Language Models (09 Nov 2023)
- Paper Review: Collaborative Large Language Model for Recommender Systems (06 Nov 2023)
- Paper Review: SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding (02 Nov 2023)
- Paper Review: Zephyr: Direct Distillation of LM Alignment (30 Oct 2023)
- Paper Review: Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture (26 Oct 2023)
- Paper Review: Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (23 Oct 2023)
- Paper Review: PaLI-3 Vision Language Models: Smaller, Faster, Stronger (19 Oct 2023)
- Paper Review: InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining (16 Oct 2023)
- Paper Review: Mistral 7B (12 Oct 2023)
- Paper Review: Think before you speak: Training Language Models With Pause Tokens (09 Oct 2023)
- Paper Review: QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models (05 Oct 2023)
- Paper Review: LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models (02 Oct 2023)
- Paper Review: DreamLLM: Synergistic Multimodal Comprehension and Creation (28 Sep 2023)
- Paper Review: FreeU: Free Lunch in Diffusion U-Net (25 Sep 2023)
- Paper Review: Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers (21 Sep 2023)
- Paper Review: SLiMe: Segment Like Me (18 Sep 2023)
- Paper Review: TSMixer: An All-MLP Architecture for Time Series Forecasting (14 Sep 2023)
- Paper Review: Explaining grokking through circuit efficiency (11 Sep 2023)
- Paper Review: Contrastive Feature Masking Open-Vocabulary Vision Transformer (07 Sep 2023)
- Paper Review: RecMind: Large Language Model Powered Agent For Recommendation (04 Sep 2023)
- Paper Review: CoTracker: It is Better to Track Together (31 Aug 2023)
- Paper Review: Giraffe: Adventures in Expanding Context Lengths in LLMs (28 Aug 2023)
- Paper Review: OBELISC: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents (24 Aug 2023)
- Paper Review: LISA: Reasoning Segmentation via Large Language Model (21 Aug 2023)
- Paper Review: FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization (17 Aug 2023)
- Paper Review: Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback (10 Aug 2023)
- Paper Review: UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition (10 Aug 2023)
- Paper Review: Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding (07 Aug 2023)
- Paper Review: Tracking Anything in High Quality (03 Aug 2023)
- Paper Review: TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning (31 Jul 2023)
- Paper Review: Meta-Transformer: A Unified Framework for Multimodal Learning (27 Jul 2023)
- Paper Review: Retentive Network: A Successor to Transformer for Large Language Models (24 Jul 2023)
- Paper Review: Llama 2: Open Foundation and Fine-Tuned Chat Models (20 Jul 2023)
- Paper Review: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning (17 Jul 2023)
- Paper Review: UniverSeg: Universal Medical Image Segmentation (13 Jul 2023)
- Paper Review: Recognize Anything: A Strong Image Tagging Model (10 Jul 2023)
- Paper Review: Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles (06 Jul 2023)
- Paper Review: Multilingual End to End Entity Linking (03 Jul 2023)
- Paper Review: Fast Segment Anything (29 Jun 2023)
- Paper Review: Tracking Everything Everywhere All at Once (26 Jun 2023)
- Paper Review: Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale (23 Jun 2023)
- Paper Review: Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision (19 Jun 2023)
- Paper Review: Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (15 Jun 2023)
- Paper Review: BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks (12 Jun 2023)
- Paper Review: StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners (08 Jun 2023)
- Paper Review: The effectiveness of MAE pre-pretraining for billion-scale pretraining (05 Jun 2023)
- Paper Review: QLoRA: Efficient Finetuning of Quantized LLMs (01 Jun 2023)
- Paper Review: Chain of Hindsight Aligns Language Models with Feedback (30 May 2023)
- Paper Review: MMS: Scaling Speech Technology to 1000+ languages (25 May 2023)
- Paper Review: Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold (22 May 2023)
- Paper Review: DarkBERT: A Language Model for the Dark Side of the Internet (18 May 2023)
- Paper Review: NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers (15 May 2023)
- Paper Review: ImageBind: One Embedding Space To Bind Them All (10 May 2023)
- Paper Review: Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes (08 May 2023)
- Paper Review: Phoenix: Democratizing ChatGPT across Languages (04 May 2023)
- Paper Review: Scaling Transformer to 1M tokens and beyond with RMT (01 May 2023)
- Paper Review: Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations (27 Apr 2023)
- Paper Review: Generative Agents: Interactive Simulacra of Human Behavior (24 Apr 2023)
- Paper Review: DINOv2: Learning Robust Visual Features without Supervision (20 Apr 2023)
- Paper Review: InceptionNeXt: When Inception Meets ConvNeXt (17 Apr 2023)
- Paper Review: Segment Anything (08 Apr 2023)
- Paper Review: BloombergGPT: A Large Language Model for Finance (02 Apr 2023)
- Paper Review: ReBotNet: Fast Real-time Video Enhancement (27 Mar 2023)
- Paper Review: Hyena Hierarchy: Towards Larger Convolutional Language Models (20 Mar 2023)
- Paper Review: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models (13 Mar 2023)
- Paper Review: PaLM-E: An Embodied Multimodal Language Model (09 Mar 2023)
- Paper Review: In-Context Instruction Learning (06 Mar 2023)
- Paper Review: LLaMA: Open and Efficient Foundation Language Models (26 Feb 2023)
- Paper Review: Scaling Vision Transformers to 22 Billion Parameters (20 Feb 2023)
- Paper Review: Dual PatchNorm (13 Feb 2023)
- Paper Review: Cut and Learn for Unsupervised Object Detection and Instance Segmentation (06 Feb 2023)
- Paper Review: StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis (29 Jan 2023)
- Paper Review: Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios (24 Jul 2022)
- Paper Review: NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation (10 Dec 2021)
- Paper Review: NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion (25 Nov 2021)
- Paper Review: Swin Transformer V2: Scaling Up Capacity and Resolution (19 Nov 2021)
- Paper Review: A Recipe For Arbitrary Text Style Transfer with Large Language Models (10 Oct 2021)
- Paper Review: SwinIR: Image Restoration Using Swin Transformer (13 Sep 2021)
- Paper Review: Efficient Visual Pretraining with Contrastive Detection (01 Sep 2021)
- Paper Review: Domain-Aware Universal Style Transfer (15 Aug 2021)
- Paper Review: YOLOX: Exceeding YOLO Series in 2021 (23 Jul 2021)
- Paper Review: Long-Short Transformer: Efficient Transformers for Language and Vision (12 Jul 2021)
- Paper Review: Semi-Autoregressive Transformer for Image Captioning (18 Jun 2021)
- Paper Review: CoAtNet: Marrying Convolution and Attention for All Data Sizes (10 Jun 2021)
- Paper Review: ByT5: Towards a token-free future with pre-trained byte-to-byte models (02 Jun 2021)
- Paper Review: Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence (21 May 2021)
- Paper Review: Are Pre-trained Convolutions Better than Pre-trained Transformers? (10 May 2021)
- Paper Review: MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding (04 May 2021)
- Paper Review: Generating Furry Cars: Disentangling Object Shape and Appearance across Multiple Domains (07 Apr 2021)
- Paper Review: EfficientNetV2: Smaller Models and Faster Training (02 Apr 2021)
- Paper Review: Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning (29 Mar 2021)
- Paper Review: LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval (21 Mar 2021)
- Paper Review: Real-World Super-Resolution of Face-Images from Surveillance Cameras (22 Feb 2021)
- Paper Review: ObjectAug: Object-level Data Augmentation for Semantic Image Segmentation (07 Feb 2021)
- Paper Review: JigsawGAN: Self-supervised Learning for Solving Jigsaw Puzzles with Generative Adversarial Networks (31 Jan 2021)
- Paper Review: Language-agnostic BERT Sentence Embedding (19 Aug 2020)
- Paper Review: Funnel Activation for Visual Recognition (28 Jul 2020)
- Paper Review: ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network (04 Jul 2020)
- Paper Review: VirTex: Learning Visual Representations from Textual Annotations (14 Jun 2020)
- Paper Review: Linformer: Self-Attention with Linear Complexity (10 Jun 2020)
- Paper Review: End-to-End Object Detection with Transformers (28 May 2020)
- Paper Review: SpERT: Span-based Joint Entity and Relation Extraction with Transformer Pre-training (23 May 2020)