Tag: cv

76 posts

Dec 22, 2025
Paper Review: NitroGen: A Foundation Model for Generalist Gaming Agents
My review of the paper NitroGen A Foundation Model for Generalist Gaming Agents
paperreview deeplearning flowmatching cv
Oct 06, 2025
Paper Review: LongLive: Real-time Interactive Long Video Generation
My review of the paper LongLive Real-time Interactive Long Video Generation
paperreview deeplearning imagegeneration videogeneration
Sep 01, 2025
Paper Review: Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
My review of the paper Pref-GRPO Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
paperreview deeplearning imagegeneration cv
Aug 25, 2025
Paper Review: DINOv3
Meta's self-supervised vision model trained on 17 billion images, introducing Gram anchoring to prevent feature degra...
paperreview deeplearning cv pytorch
Jun 23, 2025
Paper Review: V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
A self-supervised video model trained on 1M+ hours of video that understands motion, anticipates actions, and — with ...
paperreview deeplearning cv selfsupervised
Apr 07, 2025
Paper Review: TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes
My review of the paper TextCrafter Accurately Rendering Multiple Texts in Complex Visual Scenes
paperreview deeplearning cv imagegeneration
Mar 24, 2025
Paper Review: Video-T1: Test-Time Scaling for Video Generation
My review of the paper Video-T1 Test-Time Scaling for Video Generation
paperreview deeplearning cv
Feb 24, 2025
Paper Review: SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
Google's upgraded vision-language encoders that add self-supervised learning and online data curation to SigLIP, deli...
paperreview deeplearning transformer cv
Feb 17, 2025
Paper Review: Goku: Flow Based Video Generative Foundation Models
My review of the paper Goku Flow Based Video Generative Foundation Models
paperreview deeplearning transformer imagegeneration
Jan 13, 2025
Paper Review: STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
My review of the paper STAR Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolu...
paperreview deeplearning cv video
Oct 07, 2024
Paper Review: Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
My review of the paper Depth Pro Sharp Monocular Metric Depth in Less Than a Second
paperreview deeplearning cv depthestimation
Aug 05, 2024
Paper Review: Diffusion Feedback Helps CLIP See Better
My review of the paper Diffusion Feedback Helps CLIP See Better
paperreview deeplearning clip diffusion
Jun 03, 2024
Paper Review: LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models
My review of the paper LiteVAE Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models
paperreview deeplearning cv autoencoder
May 27, 2024
Paper Review: YOLOv10: Real-Time End-to-End Object Detection
My review of the paper YOLOv10 Real-Time End-to-End Object Detection
paperreview deeplearning objectdetection yolo
Apr 15, 2024
Paper Review: Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
My review of the paper Ferret-v2 An Improved Baseline for Referring and Grounding with Large Language Models
paperreview deeplearning llm cv
Apr 01, 2024
Paper Review: Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
My review of the paper Vision-RWKV Efficient and Scalable Visual Perception with RWKV-Like Architectures
paperreview deeplearning cv rnn
Feb 26, 2024
Paper Review: YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
My review of the paper YOLOv9 Learning What You Want to Learn Using Programmable Gradient Information
paperreview deeplearning cv objectdetection
Jan 29, 2024
Paper Review: Lumiere: A Space-Time Diffusion Model for Video Generation
My review of the paper Lumiere A Space-Time Diffusion Model for Video Generation
paperreview deeplearning cv stablediffusion
Jan 22, 2024
Paper Review: Scalable Pre-training of Large Autoregressive Image Models
My review of the paper Scalable Pre-training of Large Autoregressive Image Models
paperreview deeplearning cv
Jan 15, 2024
Paper Review: Ferret: Refer and Ground Anything Anywhere at Any Granularity
My review of the paper Ferret Refer and Ground Anything Anywhere at Any Granularity
paperreview deeplearning llm cv
Dec 25, 2023
Paper Review: StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
My review of the paper StreamDiffusion A Pipeline-Level Solution for Real-Time Interactive Generation
paperreview deeplearning stablediffusion cv
Dec 18, 2023
Paper Review: Pixel Aligned Language Models
My review of the paper Pixel Aligned Language Models
paperreview deeplearning llm cv
Dec 12, 2023
Paper Review: EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
My review of the paper EfficientSAM Leveraged Masked Image Pretraining for Efficient Segment Anything
paperreview deeplearning imagesegmentation cv
Dec 04, 2023
Paper Review: Adversarial Diffusion Distillation
My review of the paper Adversarial Diffusion Distillation
paperreview deeplearning cv stablediffusion
Nov 30, 2023
Paper Review: Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion
My review of the paper Diffuse, Attend, and Segment Unsupervised Zero-Shot Segmentation using Stable Diffusion
paperreview deeplearning cv stablediffusion
Nov 27, 2023
Paper Review: Diffusion Model Alignment Using Direct Preference Optimization
Adapting DPO from language models to image generation — training Stable Diffusion XL on 851K human preferences to sig...
paperreview deeplearning cv stablediffusion
Nov 09, 2023
Paper Review: CogVLM: Visual Expert for Pretrained Language Models
My review of the paper CogVLM Visual Expert for Pretrained Language Models
paperreview deeplearning cv pretraining
Nov 02, 2023
Paper Review: SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
My review of the paper SAM-CLIP Merging Vision Foundation Models towards Semantic and Spatial Understanding
paperreview deeplearning cv imagesegmentation
Oct 26, 2023
Paper Review: Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
My review of the paper Monarch Mixer A Simple Sub-Quadratic GEMM-Based Architecture
paperreview deeplearning nlp cv
Oct 19, 2023
Paper Review: PaLI-3 Vision Language Models: Smaller, Faster, Stronger
My review of the paper PaLI-3 Vision Language Models Smaller, Faster, Stronger
paperreview deeplearning llm vlm
Oct 02, 2023
Paper Review: LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models
My review of the paper LAVIE High-Quality Video Generation with Cascaded Latent Diffusion Models
paperreview deeplearning cv diffusion
Sep 28, 2023
Paper Review: DreamLLM: Synergistic Multimodal Comprehension and Creation
My review of the paper DreamLLM Synergistic Multimodal Comprehension and Creation
paperreview deeplearning llm cv
Sep 25, 2023
Paper Review: FreeU: Free Lunch in Diffusion U-Net
My review of the paper FreeU Free Lunch in Diffusion U-Net
paperreview deeplearning stablediffusion unet
Sep 18, 2023
Paper Review: SLiMe: Segment Like Me
My review of the paper SLiMe Segment Like Me
paperreview deeplearning cv imagesegmentation
Sep 07, 2023
Paper Review: Contrastive Feature Masking Open-Vocabulary Vision Transformer
My review of the paper Contrastive Feature Masking Open-Vocabulary Vision Transformer
paperreview deeplearning cv objectdetection
Aug 31, 2023
Paper Review: CoTracker: It is Better to Track Together
My review of the paper CoTracker It is Better to Track Together
paperreview deeplearning cv objecttracking
Aug 21, 2023
Paper Review: LISA: Reasoning Segmentation via Large Language Model
My review of the paper LISA Reasoning Segmentation via Large Language Model
paperreview deeplearning cv imagesegmentation
Aug 17, 2023
Paper Review: FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization
My review of the paper FastViT A Fast Hybrid Vision Transformer using Structural Reparameterization
paperreview deeplearning cv sota
Jul 27, 2023
Paper Review: Meta-Transformer: A Unified Framework for Multimodal Learning
My review of the paper Meta-Transformer A Unified Framework for Multimodal Learning
paperreview deeplearning nlp transformer
Jul 17, 2023
Paper Review: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
My review of the paper Scaling Autoregressive Multi-Modal Models Pretraining and Instruction Tuning
paperreview deeplearning cv nlp
Jul 13, 2023
Paper Review: UniverSeg: Universal Medical Image Segmentation
My review of the paper UniverSeg Universal Medical Image Segmentation
paperreview deeplearning cv imagesegmentation
Jul 10, 2023
Paper Review: Recognize Anything: A Strong Image Tagging Model
My review of the paper Recognize Anything A Strong Image Tagging Model
paperreview deeplearning cv imagecaptioning
Jul 06, 2023
Paper Review: Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
My review of the paper Hiera A Hierarchical Vision Transformer without the Bells-and-Whistles
paperreview deeplearning cv transformer
Jun 29, 2023
Paper Review: Fast Segment Anything
My review of the paper Fast Segment Anything
paperreview deeplearning cv imagesegmentation
Jun 26, 2023
Paper Review: Tracking Everything Everywhere All at Once
My review of the paper Tracking Everything Everywhere All at Once
paperreview deeplearning cv motiontracking
Jun 08, 2023
Paper Review: StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
My review of the paper StableRep Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
paperreview deeplearning stablediffusion nlp
Jun 05, 2023
Paper Review: The effectiveness of MAE pre-pretraining for billion-scale pretraining
My review of the paper The effectiveness of MAE pre-pretraining for billion-scale pretraining
paperreview deeplearning cv pretraining
May 22, 2023
Paper Review: Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
My review of the paper Drag Your GAN Interactive Point-based Manipulation on the Generative Image Manifold
paperreview deeplearning cv gan
May 10, 2023
Paper Review: ImageBind: One Embedding Space To Bind Them All
My review of the paper ImageBind One Embedding Space To Bind Them All
paperreview deeplearning nlp cv
Apr 20, 2023
Paper Review: DINOv2: Learning Robust Visual Features without Supervision
How Meta built all-purpose visual features by scaling self-supervised pretraining to a curated 142M-image dataset, pr...
paperreview deeplearning cv pytorch
Apr 17, 2023
Paper Review: InceptionNeXt: When Inception Meets ConvNeXt
My review of the paper InceptionNeXt When Inception Meets ConvNeXt
paperreview deeplearning cv pytorch
Apr 08, 2023
Paper Review: Segment Anything
My review of the paper Segment Anything
paperreview deeplearning cv pytorch
Mar 27, 2023
Paper Review: ReBotNet: Fast Real-time Video Enhancement
My review of the paper ReBotNet Fast Real-time Video Enhancement
paperreview deeplearning cv video
Mar 20, 2023
Paper Review: Hyena Hierarchy: Towards Larger Convolutional Language Models
My review of the paper Hyena Hierarchy Towards Larger Convolutional Language Models
paperreview deeplearning nlp cv
Feb 20, 2023
Paper Review: Scaling Vision Transformers to 22 Billion Parameters
My review of the paper Scaling Vision Transformers to 22 Billion Parameters
paperreview deeplearning cv transformer
Feb 13, 2023
Paper Review: Dual PatchNorm
My review of the paper Dual PatchNorm
paperreview deeplearning transformer cv
Feb 06, 2023
Paper Review: Cut and Learn for Unsupervised Object Detection and Instance Segmentation
My review of the paper Cut and Learn for Unsupervised Object Detection and Instance Segmentation
paperreview deeplearning cv objectdetection
Jan 29, 2023
Paper Review: StyleGAN-T Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
My review of the paper StyleGAN-T Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
paperreview deeplearning cv gan
Dec 22, 2022
A third life of a personal pet-project for handwritten digit recognition
A pet-project for handwritten digit recognition using YOLOv3 and Streamlit
blogpost cv objectdetection
Jul 24, 2022
Paper Review: Next-ViT Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios
My review of the paper Next-ViT Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial S...
paperreview deeplearning cv transformer
Nov 25, 2021
Paper Review: NÜWA Visual Synthesis Pre-training for Neural visUal World creAtion
My review of the paper NÜWA Visual Synthesis Pre-training for Neural visUal World creAtion
paperreview deeplearning cv transformer
Nov 19, 2021
Paper Review: Swin Transformer V2 Scaling Up Capacity and Resolution
My review of the paper Swin Transformer V2 Scaling Up Capacity and Resolution
paperreview deeplearning cv transformer
Sep 13, 2021
Paper Review: SwinIR Image Restoration Using Swin Transformer
My review of the paper SwinIR Image Restoration Using Swin Transformer
paperreview deeplearning cv transformer
Sep 01, 2021
Paper Review: Efficient Visual Pretraining with Contrastive Detection
My review of the paper Efficient Visual Pretraining with Contrastive Detection
paperreview deeplearning cv pretraining
Aug 15, 2021
Paper Review: Domain-Aware Universal Style Transfer
My review of the paper Domain-Aware Universal Style Transfer
paperreview deeplearning cv styletransfer
Jul 23, 2021
Paper Review: YOLOX Exceeding YOLO Series in 2021
My review of the paper YOLOX Exceeding YOLO Series in 2021
paperreview deeplearning cv objectdetection
Jul 12, 2021
Paper Review: Long-Short Transformer Efficient Transformers for Language and Vision
My review of the paper Long-Short Transformer Efficient Transformers for Language and Vision
paperreview deeplearning cv nlp
Jun 10, 2021
Paper Review: CoAtNet Marrying Convolution and Attention for All Data Sizes
My review of the paper CoAtNet Marrying Convolution and Attention for All Data Sizes
paperreview deeplearning cv pretraining
Apr 07, 2021
Paper Review: Generating Furry Cars: Disentangling Object Shape and Appearance across Multiple Domains
My review of the paper Generating Furry Cars Disentangling Object Shape and Appearance across Multiple Domains.
paperreview cv gan deeplearning
Apr 02, 2021
Paper Review: EfficientNetV2: Smaller Models and Faster Training
My review of the paper EfficientNetV2 Smaller Models and Faster Training.
paperreview cv sota nas
Mar 16, 2021
Paper Review: Revisiting ResNets: Improved Training and Scaling Strategies
My review of the paper Revisiting ResNets Improved Training and Scaling Strategies.
paperreview cv sota
Feb 22, 2021
Paper Review: Real-World Super-Resolution of Face-Images from Surveillance Cameras
My review of the paper Real-World Super-Resolution of Face-Images from Surveillance Cameras.
paperreview deeplearning superresolution cv
Jul 28, 2020
Paper Review: Funnel Activation for Visual Recognition
My review of the paper Funnel Activation for Visual Recognition.
paperreview deeplearning activationfunction cv
Jul 04, 2020
Paper Review: ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network
My review of the paper ReXNet Diminishing Representational Bottleneck on Convolutional Neural Network.
paperreview deeplearning pretraining transferlearning
Jun 14, 2020
Paper Review: VirTex: Learning Visual Representations from Textual Annotations
My review of the paper VirTex Learning Visual Representations from Textual Annotations.
paperreview imagecaptioning cv visual
May 17, 2020
Paper Review: Transformer Reasoning Network for Image-Text Matching and Retrieval
My review of the paper Transformer Reasoning Network for Image-Text Matching and Retrieval.
paperreview transformer cv imagetextmatching
