Tag: cv
- Paper Review: Depth Pro: Sharp Monocular Metric Depth in Less Than a Second (07 Oct 2024)
- Paper Review: Diffusion Feedback Helps CLIP See Better (05 Aug 2024)
- Paper Review: LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models (03 Jun 2024)
- Paper Review: YOLOv10: Real-Time End-to-End Object Detection (27 May 2024)
- Paper Review: Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models (15 Apr 2024)
- Paper Review: Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures (01 Apr 2024)
- Paper Review: YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information (26 Feb 2024)
- Paper Review: Lumiere: A Space-Time Diffusion Model for Video Generation (29 Jan 2024)
- Paper Review: Scalable Pre-training of Large Autoregressive Image Models (22 Jan 2024)
- Paper Review: Ferret: Refer and Ground Anything Anywhere at Any Granularity (15 Jan 2024)
- Paper Review: StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation (25 Dec 2023)
- Paper Review: Pixel Aligned Language Models (18 Dec 2023)
- Paper Review: EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything (12 Dec 2023)
- Paper Review: Adversarial Diffusion Distillation (04 Dec 2023)
- Paper Review: Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion (30 Nov 2023)
- Paper Review: Diffusion Model Alignment Using Direct Preference Optimization (27 Nov 2023)
- Paper Review: CogVLM: Visual Expert for Pretrained Language Models (09 Nov 2023)
- Paper Review: SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding (02 Nov 2023)
- Paper Review: Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture (26 Oct 2023)
- Paper Review: PaLI-3 Vision Language Models: Smaller, Faster, Stronger (19 Oct 2023)
- Paper Review: LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models (02 Oct 2023)
- Paper Review: DreamLLM: Synergistic Multimodal Comprehension and Creation (28 Sep 2023)
- Paper Review: FreeU: Free Lunch in Diffusion U-Net (25 Sep 2023)
- Paper Review: SLiMe: Segment Like Me (18 Sep 2023)
- Paper Review: Contrastive Feature Masking Open-Vocabulary Vision Transformer (07 Sep 2023)
- Paper Review: CoTracker: It is Better to Track Together (31 Aug 2023)
- Paper Review: LISA: Reasoning Segmentation via Large Language Model (21 Aug 2023)
- Paper Review: FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization (17 Aug 2023)
- Paper Review: Meta-Transformer: A Unified Framework for Multimodal Learning (27 Jul 2023)
- Paper Review: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning (17 Jul 2023)
- Paper Review: UniverSeg: Universal Medical Image Segmentation (13 Jul 2023)
- Paper Review: Recognize Anything: A Strong Image Tagging Model (10 Jul 2023)
- Paper Review: Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles (06 Jul 2023)
- Paper Review: Fast Segment Anything (29 Jun 2023)
- Paper Review: Tracking Everything Everywhere All at Once (26 Jun 2023)
- Paper Review: StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners (08 Jun 2023)
- Paper Review: The effectiveness of MAE pre-pretraining for billion-scale pretraining (05 Jun 2023)
- Paper Review: Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold (22 May 2023)
- Paper Review: ImageBind: One Embedding Space To Bind Them All (10 May 2023)
- Paper Review: DINOv2: Learning Robust Visual Features without Supervision (20 Apr 2023)
- Paper Review: InceptionNeXt: When Inception Meets ConvNeXt (17 Apr 2023)
- Paper Review: Segment Anything (08 Apr 2023)
- Paper Review: ReBotNet: Fast Real-time Video Enhancement (27 Mar 2023)
- Paper Review: Hyena Hierarchy: Towards Larger Convolutional Language Models (20 Mar 2023)
- Paper Review: Scaling Vision Transformers to 22 Billion Parameters (20 Feb 2023)
- Paper Review: Dual PatchNorm (13 Feb 2023)
- Paper Review: Cut and Learn for Unsupervised Object Detection and Instance Segmentation (06 Feb 2023)
- Paper Review: StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis (29 Jan 2023)
- A third life of a personal pet-project for handwritten digit recognition (22 Dec 2022)
- Paper Review: Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios (24 Jul 2022)
- Paper Review: NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion (25 Nov 2021)
- Paper Review: Swin Transformer V2: Scaling Up Capacity and Resolution (19 Nov 2021)
- Paper Review: SwinIR: Image Restoration Using Swin Transformer (13 Sep 2021)
- Paper Review: Efficient Visual Pretraining with Contrastive Detection (01 Sep 2021)
- Paper Review: Domain-Aware Universal Style Transfer (15 Aug 2021)
- Paper Review: YOLOX: Exceeding YOLO Series in 2021 (23 Jul 2021)
- Paper Review: Long-Short Transformer: Efficient Transformers for Language and Vision (12 Jul 2021)
- Paper Review: CoAtNet: Marrying Convolution and Attention for All Data Sizes (10 Jun 2021)
- Paper Review: Generating Furry Cars: Disentangling Object Shape and Appearance across Multiple Domains (07 Apr 2021)
- Paper Review: EfficientNetV2: Smaller Models and Faster Training (02 Apr 2021)
- Paper Review: Revisiting ResNets: Improved Training and Scaling Strategies (16 Mar 2021)
- Paper Review: Funnel Activation for Visual Recognition (28 Jul 2020)
- Paper Review: ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network (04 Jul 2020)
- Paper Review: VirTex: Learning Visual Representations from Textual Annotations (14 Jun 2020)
- Paper Review: Transformer Reasoning Network for Image-Text Matching and Retrieval (17 May 2020)