Tag: cv
76 posts
Paper Review: NitroGen: A Foundation Model for Generalist Gaming Agents
My review of the paper NitroGen A Foundation Model for Generalist Gaming Agents
Paper Review: LongLive: Real-time Interactive Long Video Generation
My review of the paper LongLive Real-time Interactive Long Video Generation
Paper Review: Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
My review of the paper Pref-GRPO Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
Paper Review: DINOv3
Meta's self-supervised vision model trained on 17 billion images, introducing Gram anchoring to prevent feature degra...
Paper Review: V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
A self-supervised video model trained on 1M+ hours of video that understands motion, anticipates actions, and — with ...
Paper Review: TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes
My review of the paper TextCrafter Accurately Rendering Multiple Texts in Complex Visual Scenes
Paper Review: Video-T1: Test-Time Scaling for Video Generation
My review of the paper Video-T1 Test-Time Scaling for Video Generation
Paper Review: SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
Google's upgraded vision-language encoders that add self-supervised learning and online data curation to SigLIP, deli...
Paper Review: Goku: Flow Based Video Generative Foundation Models
My review of the paper Goku Flow Based Video Generative Foundation Models
Paper Review: STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
My review of the paper STAR Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolu...
Paper Review: Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
My review of the paper Depth Pro Sharp Monocular Metric Depth in Less Than a Second
Paper Review: Diffusion Feedback Helps CLIP See Better
My review of the paper Diffusion Feedback Helps CLIP See Better
Paper Review: LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models
My review of the paper LiteVAE Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models
Paper Review: YOLOv10: Real-Time End-to-End Object Detection
My review of the paper YOLOv10 Real-Time End-to-End Object Detection
Paper Review: Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
My review of the paper Ferret-v2 An Improved Baseline for Referring and Grounding with Large Language Models
Paper Review: Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
My review of the paper Vision-RWKV Efficient and Scalable Visual Perception with RWKV-Like Architectures
Paper Review: YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
My review of the paper YOLOv9 Learning What You Want to Learn Using Programmable Gradient Information
Paper Review: Lumiere: A Space-Time Diffusion Model for Video Generation
My review of the paper Lumiere A Space-Time Diffusion Model for Video Generation
Paper Review: Scalable Pre-training of Large Autoregressive Image Models
My review of the paper Scalable Pre-training of Large Autoregressive Image Models
Paper Review: Ferret: Refer and Ground Anything Anywhere at Any Granularity
My review of the paper Ferret Refer and Ground Anything Anywhere at Any Granularity
Paper Review: StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
My review of the paper StreamDiffusion A Pipeline-Level Solution for Real-Time Interactive Generation
Paper Review: Pixel Aligned Language Models
My review of the paper Pixel Aligned Language Models
Paper Review: EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
My review of the paper EfficientSAM Leveraged Masked Image Pretraining for Efficient Segment Anything
Paper Review: Adversarial Diffusion Distillation
My review of the paper Adversarial Diffusion Distillation
Paper Review: Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion
My review of the paper Diffuse, Attend, and Segment Unsupervised Zero-Shot Segmentation using Stable Diffusion
Paper Review: Diffusion Model Alignment Using Direct Preference Optimization
Adapting DPO from language models to image generation — training Stable Diffusion XL on 851K human preferences to sig...
Paper Review: CogVLM: Visual Expert for Pretrained Language Models
My review of the paper CogVLM Visual Expert for Pretrained Language Models
Paper Review: SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
My review of the paper SAM-CLIP Merging Vision Foundation Models towards Semantic and Spatial Understanding
Paper Review: Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
My review of the paper Monarch Mixer A Simple Sub-Quadratic GEMM-Based Architecture
Paper Review: PaLI-3 Vision Language Models: Smaller, Faster, Stronger
My review of the paper PaLI-3 Vision Language Models Smaller, Faster, Stronger
Paper Review: LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models
My review of the paper LAVIE High-Quality Video Generation with Cascaded Latent Diffusion Models
Paper Review: DreamLLM: Synergistic Multimodal Comprehension and Creation
My review of the paper DreamLLM Synergistic Multimodal Comprehension and Creation
Paper Review: FreeU: Free Lunch in Diffusion U-Net
My review of the paper FreeU Free Lunch in Diffusion U-Net
Paper Review: SLiMe: Segment Like Me
My review of the paper SLiMe Segment Like Me
Paper Review: Contrastive Feature Masking Open-Vocabulary Vision Transformer
My review of the paper Contrastive Feature Masking Open-Vocabulary Vision Transformer
Paper Review: CoTracker: It is Better to Track Together
My review of the paper CoTracker It is Better to Track Together
Paper Review: LISA: Reasoning Segmentation via Large Language Model
My review of the paper LISA Reasoning Segmentation via Large Language Model
Paper Review: FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization
My review of the paper FastViT A Fast Hybrid Vision Transformer using Structural Reparameterization
Paper Review: Meta-Transformer: A Unified Framework for Multimodal Learning
My review of the paper Meta-Transformer A Unified Framework for Multimodal Learning
Paper Review: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
My review of the paper Scaling Autoregressive Multi-Modal Models Pretraining and Instruction Tuning
Paper Review: UniverSeg: Universal Medical Image Segmentation
My review of the paper UniverSeg Universal Medical Image Segmentation
Paper Review: Recognize Anything: A Strong Image Tagging Model
My review of the paper Recognize Anything A Strong Image Tagging Model
Paper Review: Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
My review of the paper Hiera A Hierarchical Vision Transformer without the Bells-and-Whistles
Paper Review: Fast Segment Anything
My review of the paper Fast Segment Anything
Paper Review: Tracking Everything Everywhere All at Once
My review of the paper Tracking Everything Everywhere All at Once
Paper Review: StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
My review of the paper StableRep Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
Paper Review: The effectiveness of MAE pre-pretraining for billion-scale pretraining
My review of the paper The effectiveness of MAE pre-pretraining for billion-scale pretraining
Paper Review: Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
My review of the paper Drag Your GAN Interactive Point-based Manipulation on the Generative Image Manifold
Paper Review: ImageBind: One Embedding Space To Bind Them All
My review of the paper ImageBind One Embedding Space To Bind Them All
Paper Review: DINOv2: Learning Robust Visual Features without Supervision
How Meta built all-purpose visual features by scaling self-supervised pretraining to a curated 142M-image dataset, pr...
Paper Review: InceptionNeXt: When Inception Meets ConvNeXt
My review of the paper InceptionNeXt When Inception Meets ConvNeXt
Paper Review: Segment Anything
My review of the paper Segment Anything
Paper Review: ReBotNet: Fast Real-time Video Enhancement
My review of the paper ReBotNet Fast Real-time Video Enhancement
Paper Review: Hyena Hierarchy: Towards Larger Convolutional Language Models
My review of the paper Hyena Hierarchy Towards Larger Convolutional Language Models
Paper Review: Scaling Vision Transformers to 22 Billion Parameters
My review of the paper Scaling Vision Transformers to 22 Billion Parameters
Paper Review: Dual PatchNorm
My review of the paper Dual PatchNorm
Paper Review: Cut and Learn for Unsupervised Object Detection and Instance Segmentation
My review of the paper Cut and Learn for Unsupervised Object Detection and Instance Segmentation
Paper Review: StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
My review of the paper StyleGAN-T Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
A third life of a personal pet-project for handwritten digit recognition
A pet-project for handwritten digit recognition using YOLOv3 and Streamlit
Paper Review: Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios
My review of the paper Next-ViT Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial S...
Paper Review: NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
My review of the paper NÜWA Visual Synthesis Pre-training for Neural visUal World creAtion
Paper Review: Swin Transformer V2: Scaling Up Capacity and Resolution
My review of the paper Swin Transformer V2 Scaling Up Capacity and Resolution
Paper Review: SwinIR: Image Restoration Using Swin Transformer
My review of the paper SwinIR Image Restoration Using Swin Transformer
Paper Review: Efficient Visual Pretraining with Contrastive Detection
My review of the paper Efficient Visual Pretraining with Contrastive Detection
Paper Review: Domain-Aware Universal Style Transfer
My review of the paper Domain-Aware Universal Style Transfer
Paper Review: YOLOX: Exceeding YOLO Series in 2021
My review of the paper YOLOX Exceeding YOLO Series in 2021
Paper Review: Long-Short Transformer: Efficient Transformers for Language and Vision
My review of the paper Long-Short Transformer Efficient Transformers for Language and Vision
Paper Review: CoAtNet: Marrying Convolution and Attention for All Data Sizes
My review of the paper CoAtNet Marrying Convolution and Attention for All Data Sizes
Paper Review: Generating Furry Cars: Disentangling Object Shape and Appearance across Multiple Domains
My review of the paper Generating Furry Cars Disentangling Object Shape and Appearance across Multiple Domains.
Paper Review: EfficientNetV2: Smaller Models and Faster Training
My review of the paper EfficientNetV2 Smaller Models and Faster Training.
Paper Review: Revisiting ResNets: Improved Training and Scaling Strategies
My review of the paper Revisiting ResNets Improved Training and Scaling Strategies.
Paper Review: Real-World Super-Resolution of Face-Images from Surveillance Cameras
My review of the paper Real-World Super-Resolution of Face-Images from Surveillance Cameras.
Paper Review: Funnel Activation for Visual Recognition
My review of the paper Funnel Activation for Visual Recognition.
Paper Review: ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network
My review of the paper ReXNet Diminishing Representational Bottleneck on Convolutional Neural Network.
Paper Review: VirTex: Learning Visual Representations from Textual Annotations
My review of the paper VirTex Learning Visual Representations from Textual Annotations.
Paper Review: Transformer Reasoning Network for Image-Text Matching and Retrieval
My review of the paper Transformer Reasoning Network for Image-Text Matching and Retrieval.