Tag: mllm
5 posts
Kimi K2.5 Review: Native Multimodality and Agent Swarms at 1 Trillion Parameters
A deep-dive review of Kimi K2.5, a next-generation open multimodal model that combines native vision-language trainin...
Paper Review: Chameleon: Mixed-Modal Early-Fusion Foundation Models
My review of the paper Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper Review: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
My review of the paper Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
Paper Review: Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
My review of the paper Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper Review: Ferret: Refer and Ground Anything Anywhere at Any Granularity
My review of the paper Ferret: Refer and Ground Anything Anywhere at Any Granularity