Tag: cv – Andrey Lukyanenko

Dec 22, 2025

Paper Review: NitroGen: A Foundation Model for Generalist Gaming Agents

My review of the paper NitroGen A Foundation Model for Generalist Gaming Agents

paperreview deeplearning flowmatching cv

Oct 06, 2025

Paper Review: LongLive: Real-time Interactive Long Video Generation

My review of the paper LongLive Real-time Interactive Long Video Generation

paperreview deeplearning imagegeneration videogeneration

Sep 01, 2025

Paper Review: Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

My review of the paper Pref-GRPO Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

paperreview deeplearning imagegeneration cv

Aug 25, 2025

Paper Review: DINOv3

Meta's self-supervised vision model trained on 17 billion images, introducing Gram anchoring to prevent feature degra...

paperreview deeplearning cv pytorch

Jun 23, 2025

Paper Review: V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

A self-supervised video model trained on 1M+ hours of video that understands motion, anticipates actions, and — with ...

paperreview deeplearning cv selfsupervised

Apr 07, 2025

Paper Review: TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes

My review of the paper TextCrafter Accurately Rendering Multiple Texts in Complex Visual Scenes

paperreview deeplearning cv imagegeneration

Mar 24, 2025

Paper Review: Video-T1: Test-Time Scaling for Video Generation

My review of the paper Video-T1 Test-Time Scaling for Video Generation

paperreview deeplearning cv

Feb 24, 2025

Paper Review: SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Google's upgraded vision-language encoders that add self-supervised learning and online data curation to SigLIP, deli...

paperreview deeplearning transformer cv

Feb 17, 2025

Paper Review: Goku: Flow Based Video Generative Foundation Models

My review of the paper Goku Flow Based Video Generative Foundation Models

paperreview deeplearning transformer imagegeneration

Jan 13, 2025

Paper Review: STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

My review of the paper STAR Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolu...

paperreview deeplearning cv video

Oct 07, 2024

Paper Review: Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

My review of the paper Depth Pro Sharp Monocular Metric Depth in Less Than a Second

paperreview deeplearning cv depthestimation

Aug 05, 2024

Paper Review: Diffusion Feedback Helps CLIP See Better

My review of the paper Diffusion Feedback Helps CLIP See Better

paperreview deeplearning clip diffusion

Jun 03, 2024

Paper Review: LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models

My review of the paper LiteVAE Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models

paperreview deeplearning cv autoencoder

May 27, 2024

Paper Review: YOLOv10: Real-Time End-to-End Object Detection

My review of the paper YOLOv10 Real-Time End-to-End Object Detection

paperreview deeplearning objectdetection yolo

Apr 15, 2024

Paper Review: Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

My review of the paper Ferret-v2 An Improved Baseline for Referring and Grounding with Large Language Models

paperreview deeplearning llm cv

Apr 01, 2024

Paper Review: Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

My review of the paper Vision-RWKV Efficient and Scalable Visual Perception with RWKV-Like Architectures

paperreview deeplearning cv rnn

Feb 26, 2024

Paper Review: YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

My review of the paper YOLOv9 Learning What You Want to Learn Using Programmable Gradient Information

paperreview deeplearning cv objectdetection

Jan 29, 2024

Paper Review: Lumiere: A Space-Time Diffusion Model for Video Generation

My review of the paper Lumiere A Space-Time Diffusion Model for Video Generation

paperreview deeplearning cv stablediffusion

Jan 22, 2024

Paper Review: Scalable Pre-training of Large Autoregressive Image Models

My review of the paper Scalable Pre-training of Large Autoregressive Image Models

paperreview deeplearning cv

Jan 15, 2024

Paper Review: Ferret: Refer and Ground Anything Anywhere at Any Granularity

My review of the paper Ferret Refer and Ground Anything Anywhere at Any Granularity

paperreview deeplearning llm cv

Dec 25, 2023

Paper Review: StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation

My review of the paper StreamDiffusion A Pipeline-Level Solution for Real-Time Interactive Generation

paperreview deeplearning stablediffusion cv

Dec 18, 2023

Paper Review: Pixel Aligned Language Models

My review of the paper Pixel Aligned Language Models

paperreview deeplearning llm cv

Dec 12, 2023

Paper Review: EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

My review of the paper EfficientSAM Leveraged Masked Image Pretraining for Efficient Segment Anything

paperreview deeplearning imagesegmentation cv

Dec 04, 2023

Paper Review: Adversarial Diffusion Distillation

My review of the paper Adversarial Diffusion Distillation

paperreview deeplearning cv stablediffusion

Nov 30, 2023

Paper Review: Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion

My review of the paper Diffuse, Attend, and Segment Unsupervised Zero-Shot Segmentation using Stable Diffusion

paperreview deeplearning cv stablediffusion

Nov 27, 2023

Paper Review: Diffusion Model Alignment Using Direct Preference Optimization

Adapting DPO from language models to image generation — training Stable Diffusion XL on 851K human preferences to sig...

paperreview deeplearning cv stablediffusion

Nov 09, 2023

Paper Review: CogVLM: Visual Expert for Pretrained Language Models

My review of the paper CogVLM Visual Expert for Pretrained Language Models

paperreview deeplearning cv pretraining

Nov 02, 2023

Paper Review: SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding

My review of the paper SAM-CLIP Merging Vision Foundation Models towards Semantic and Spatial Understanding

paperreview deeplearning cv imagesegmentation

Oct 26, 2023

Paper Review: Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture

My review of the paper Monarch Mixer A Simple Sub-Quadratic GEMM-Based Architecture

paperreview deeplearning nlp cv

Oct 19, 2023

Paper Review: PaLI-3 Vision Language Models: Smaller, Faster, Stronger

My review of the paper PaLI-3 Vision Language Models Smaller, Faster, Stronger

paperreview deeplearning llm vlm

Oct 02, 2023

Paper Review: LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models

My review of the paper LAVIE High-Quality Video Generation with Cascaded Latent Diffusion Models

paperreview deeplearning cv diffusion

Sep 28, 2023

Paper Review: DreamLLM: Synergistic Multimodal Comprehension and Creation

My review of the paper DreamLLM Synergistic Multimodal Comprehension and Creation

paperreview deeplearning llm cv

Sep 25, 2023

Paper Review: FreeU: Free Lunch in Diffusion U-Net

My review of the paper FreeU Free Lunch in Diffusion U-Net

paperreview deeplearning stablediffusion unet

Sep 18, 2023

Paper Review: SLiMe: Segment Like Me

My review of the paper SLiMe Segment Like Me

paperreview deeplearning cv imagesegmentation

Sep 07, 2023

Paper Review: Contrastive Feature Masking Open-Vocabulary Vision Transformer

My review of the paper Contrastive Feature Masking Open-Vocabulary Vision Transformer

paperreview deeplearning cv objectdetection

Aug 31, 2023

Paper Review: CoTracker: It is Better to Track Together

My review of the paper CoTracker It is Better to Track Together

paperreview deeplearning cv objecttracking

Aug 21, 2023

Paper Review: LISA: Reasoning Segmentation via Large Language Model

My review of the paper LISA Reasoning Segmentation via Large Language Model

paperreview deeplearning cv imagesegmentation

Aug 17, 2023

Paper Review: FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization

My review of the paper FastViT A Fast Hybrid Vision Transformer using Structural Reparameterization

paperreview deeplearning cv sota

Jul 27, 2023

Paper Review: Meta-Transformer: A Unified Framework for Multimodal Learning

My review of the paper Meta-Transformer A Unified Framework for Multimodal Learning

paperreview deeplearning nlp transformer

Jul 17, 2023

Paper Review: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning

My review of the paper Scaling Autoregressive Multi-Modal Models Pretraining and Instruction Tuning

paperreview deeplearning cv nlp

Jul 13, 2023

Paper Review: UniverSeg: Universal Medical Image Segmentation

My review of the paper UniverSeg Universal Medical Image Segmentation

paperreview deeplearning cv imagesegmentation

Jul 10, 2023

Paper Review: Recognize Anything: A Strong Image Tagging Model

My review of the paper Recognize Anything A Strong Image Tagging Model

paperreview deeplearning cv imagecaptioning

Jul 06, 2023

Paper Review: Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

My review of the paper Hiera A Hierarchical Vision Transformer without the Bells-and-Whistles

paperreview deeplearning cv transformer

Jun 29, 2023

Paper Review: Fast Segment Anything

My review of the paper Fast Segment Anything

paperreview deeplearning cv imagesegmentation

Jun 26, 2023

Paper Review: Tracking Everything Everywhere All at Once

My review of the paper Tracking Everything Everywhere All at Once

paperreview deeplearning cv motiontracking

Jun 08, 2023

Paper Review: StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners

My review of the paper StableRep Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners

paperreview deeplearning stablediffusion nlp

Jun 05, 2023

Paper Review: The effectiveness of MAE pre-pretraining for billion-scale pretraining

My review of the paper The effectiveness of MAE pre-pretraining for billion-scale pretraining

paperreview deeplearning cv pretraining

May 22, 2023

Paper Review: Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

My review of the paper Drag Your GAN Interactive Point-based Manipulation on the Generative Image Manifold

paperreview deeplearning cv gan

May 10, 2023

Paper Review: ImageBind: One Embedding Space To Bind Them All

My review of the paper ImageBind One Embedding Space To Bind Them All

paperreview deeplearning nlp cv

Apr 20, 2023

Paper Review: DINOv2: Learning Robust Visual Features without Supervision

How Meta built all-purpose visual features by scaling self-supervised pretraining to a curated 142M-image dataset, pr...

paperreview deeplearning cv pytorch

Apr 17, 2023

Paper Review: InceptionNeXt: When Inception Meets ConvNeXt

My review of the paper InceptionNeXt When Inception Meets ConvNeXt

paperreview deeplearning cv pytorch

Apr 08, 2023

Paper Review: Segment Anything

My review of the paper Segment Anything

paperreview deeplearning cv pytorch

Mar 27, 2023

Paper Review: ReBotNet: Fast Real-time Video Enhancement

My review of the paper ReBotNet Fast Real-time Video Enhancement

paperreview deeplearning cv video

Mar 20, 2023

Paper Review: Hyena Hierarchy: Towards Larger Convolutional Language Models

My review of the paper Hyena Hierarchy Towards Larger Convolutional Language Models

paperreview deeplearning nlp cv

Feb 20, 2023

Paper Review: Scaling Vision Transformers to 22 Billion Parameters

My review of the paper Scaling Vision Transformers to 22 Billion Parameters

paperreview deeplearning cv transformer

Feb 13, 2023

Paper Review: Dual PatchNorm

My review of the paper Dual PatchNorm

paperreview deeplearning transformer cv

Feb 06, 2023

Paper Review: Cut and Learn for Unsupervised Object Detection and Instance Segmentation

My review of the paper Cut and Learn for Unsupervised Object Detection and Instance Segmentation

paperreview deeplearning cv objectdetection

Jan 29, 2023

Paper Review: StyleGAN-T Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis

My review of the paper StyleGAN-T Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis

paperreview deeplearning cv gan

Dec 22, 2022

A third life of a personal pet-project for handwritten digit recognition

A pet-project for handwritten digit recognition using YOLOv3 and Streamlit

blogpost cv objectdetection

Jul 24, 2022

Paper Review: Next-ViT Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios

My review of the paper Next-ViT Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial S...

paperreview deeplearning cv transformer

Nov 25, 2021

Paper Review: NÜWA Visual Synthesis Pre-training for Neural visUal World creAtion

My review of the paper NÜWA Visual Synthesis Pre-training for Neural visUal World creAtion

paperreview deeplearning cv transformer

Nov 19, 2021

Paper Review: Swin Transformer V2 Scaling Up Capacity and Resolution

My review of the paper Swin Transformer V2 Scaling Up Capacity and Resolution

paperreview deeplearning cv transformer

Sep 13, 2021

Paper Review: SwinIR Image Restoration Using Swin Transformer

My review of the paper SwinIR Image Restoration Using Swin Transformer

paperreview deeplearning cv transformer

Sep 01, 2021

Paper Review: Efficient Visual Pretraining with Contrastive Detection

My review of the paper Efficient Visual Pretraining with Contrastive Detection

paperreview deeplearning cv pretraining

Aug 15, 2021

Paper Review: Domain-Aware Universal Style Transfer

My review of the paper Domain-Aware Universal Style Transfer

paperreview deeplearning cv styletransfer

Jul 23, 2021

Paper Review: YOLOX Exceeding YOLO Series in 2021

My review of the paper YOLOX Exceeding YOLO Series in 2021

paperreview deeplearning cv objectdetection

Jul 12, 2021

Paper Review: Long-Short Transformer Efficient Transformers for Language and Vision

My review of the paper Long-Short Transformer Efficient Transformers for Language and Vision

paperreview deeplearning cv nlp

Jun 10, 2021

Paper Review: CoAtNet Marrying Convolution and Attention for All Data Sizes

My review of the paper CoAtNet Marrying Convolution and Attention for All Data Sizes

paperreview deeplearning cv pretraining

Apr 07, 2021

Paper Review: Generating Furry Cars: Disentangling Object Shape and Appearance across Multiple Domains

My review of the paper Generating Furry Cars Disentangling Object Shape and Appearance across Multiple Domains.

paperreview cv gan deeplearning

Apr 02, 2021

Paper Review: EfficientNetV2: Smaller Models and Faster Training

My review of the paper EfficientNetV2 Smaller Models and Faster Training.

paperreview cv sota nas

Mar 16, 2021

Paper Review: Revisiting ResNets: Improved Training and Scaling Strategies

My review of the paper Revisiting ResNets: Improved Training and Scaling Strategies.

paperreview cv sota

Feb 22, 2021

Paper Review: Real-World Super-Resolution of Face-Images from Surveillance Cameras

My review of the paper Real-World Super-Resolution of Face-Images from Surveillance Cameras.

paperreview deeplearning superresolution cv

Jul 28, 2020

Paper Review: Funnel Activation for Visual Recognition

My review of the paper Funnel Activation for Visual Recognition.

paperreview deeplearning activationfunction cv

Jul 04, 2020

Paper Review: ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network

My review of the paper ReXNet Diminishing Representational Bottleneck on Convolutional Neural Network.

paperreview deeplearning pretraining transferlearning

Jun 14, 2020

Paper Review: VirTex: Learning Visual Representations from Textual Annotations

My review of the paper VirTex Learning Visual Representations from Textual Annotations.

paperreview imagecaptioning cv visual

May 17, 2020

Paper Review: Transformer Reasoning Network for Image-Text Matching and Retrieval

My review of the paper Transformer Reasoning Network for Image-Text Matching and Retrieval.

paperreview transformer cv imagetextmatching