Tag: vlm
6 posts
Kimi K2.5 Review: Native Multimodality and Agent Swarms at 1 Trillion Parameters
A deep-dive review of Kimi K2.5, a next-generation open multimodal model that combines native vision-language trainin...
Paper Review: PaperBanana: Automating Academic Illustration for AI Scientists
My review of the paper PaperBanana: Automating Academic Illustration for AI Scientists
Paper Review: SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
Google's upgraded vision-language encoders that add self-supervised learning and online data curation to SigLIP, deli...
Paper Review: Wolf: Captioning Everything with a World Summarization Framework
My review of the paper Wolf: Captioning Everything with a World Summarization Framework
Paper Review: Unveiling Encoder-Free Vision-Language Models
My review of the paper Unveiling Encoder-Free Vision-Language Models
Paper Review: PaLI-3 Vision Language Models: Smaller, Faster, Stronger
My review of the paper PaLI-3 Vision Language Models: Smaller, Faster, Stronger