Tag: multimodal
- Paper Review: PaperBanana: Automating Academic Illustration for AI Scientists (09 Feb 2026)
- Paper Review: SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features (24 Feb 2025)
- Paper Review: Wolf: Captioning Everything with a World Summarization Framework (12 Aug 2024)
- Paper Review: Diffusion Feedback Helps CLIP See Better (05 Aug 2024)
- Paper Review: Unveiling Encoder-Free Vision-Language Models (15 Jul 2024)
- Paper Review: Chameleon: Mixed-Modal Early-Fusion Foundation Models (20 May 2024)
- Paper Review: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models (13 May 2024)
- Paper Review: Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models (15 Apr 2024)
- Paper Review: Ferret: Refer and Ground Anything Anywhere at Any Granularity (15 Jan 2024)
- Paper Review: DocLLM: A layout-aware generative language model for multimodal document understanding (08 Jan 2024)
- Paper Review: PaLI-3 Vision Language Models: Smaller, Faster, Stronger (19 Oct 2023)
- Paper Review: DreamLLM: Synergistic Multimodal Comprehension and Creation (28 Sep 2023)
- Paper Review: Meta-Transformer: A Unified Framework for Multimodal Learning (27 Jul 2023)
- Paper Review: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning (17 Jul 2023)
- Paper Review: Recognize Anything: A Strong Image Tagging Model (10 Jul 2023)
- Paper Review: BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks (12 Jun 2023)
- Paper Review: ImageBind: One Embedding Space To Bind Them All (10 May 2023)
- Paper Review: PaLM-E: An Embodied Multimodal Language Model (09 Mar 2023)
- Paper Review: Semi-Autoregressive Transformer for Image Captioning (18 Jun 2021)
- Paper Review: MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding (04 May 2021)
- Paper Review: VirTex: Learning Visual Representations from Textual Annotations (14 Jun 2020)
- Paper Review: Transformer Reasoning Network for Image-Text Matching and Retrieval (17 May 2020)