Tag: deeplearning
- Paper Review: Depth Pro: Sharp Monocular Metric Depth in Less Than a Second (07 Oct 2024)
- Paper Review: Training Language Models to Self-Correct via Reinforcement Learning (23 Sep 2024)
- Paper Review: Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency (16 Sep 2024)
- Paper Review: Agentic Retrieval-Augmented Generation for Time Series Analysis (04 Sep 2024)
- Paper Review: Winning Amazon KDD Cup24 (19 Aug 2024)
- Paper Review: Wolf: Captioning Everything with a World Summarization Framework (12 Aug 2024)
- Paper Review: Diffusion Feedback Helps CLIP See Better (05 Aug 2024)
- Paper Review: Masked Attention is All You Need for Graphs (29 Jul 2024)
- Paper Review: RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs (22 Jul 2024)
- Paper Review: Unveiling Encoder-Free Vision-Language Models (15 Jul 2024)
- Paper Review: Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning (01 Jul 2024)
- Paper Review: Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling (17 Jun 2024)
- Paper Review: σ-GPTs: A New Approach to Autoregressive Models (10 Jun 2024)
- Paper Review: LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models (03 Jun 2024)
- Paper Review: YOLOv10: Real-Time End-to-End Object Detection (27 May 2024)
- Paper Review: Chameleon: Mixed-Modal Early-Fusion Foundation Models (20 May 2024)
- Paper Review: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models (13 May 2024)
- Paper Review: FlowMind: Automatic Workflow Generation with LLMs (06 May 2024)
- Paper Review: Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models (15 Apr 2024)
- Paper Review: Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction (08 Apr 2024)
- Paper Review: Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures (01 Apr 2024)
- Paper Review: Chronos: Learning the Language of Time Series (25 Mar 2024)
- Paper Review: Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks (19 Mar 2024)
- Paper Review: NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models (11 Mar 2024)
- Paper Review: Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (04 Mar 2024)
- Paper Review: YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information (26 Feb 2024)
- Paper Review: LiRank: Industrial Large Scale Ranking Models at LinkedIn (19 Feb 2024)
- Paper Review: Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting (12 Feb 2024)
- Paper Review: Lumiere: A Space-Time Diffusion Model for Video Generation (29 Jan 2024)
- Paper Review: Scalable Pre-training of Large Autoregressive Image Models (22 Jan 2024)
- Paper Review: Ferret: Refer and Ground Anything Anywhere at Any Granularity (15 Jan 2024)
- Paper Review: DocLLM: A layout-aware generative language model for multimodal document understanding (08 Jan 2024)
- Paper Review: StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation (25 Dec 2023)
- Paper Review: Pixel Aligned Language Models (18 Dec 2023)
- Paper Review: EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything (12 Dec 2023)
- Paper Review: Translatotron 3: Speech to Speech Translation with Monolingual Data (07 Dec 2023)
- Paper Review: Adversarial Diffusion Distillation (04 Dec 2023)
- Paper Review: Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion (30 Nov 2023)
- Paper Review: Diffusion Model Alignment Using Direct Preference Optimization (27 Nov 2023)
- Paper Review: Orca 2: Teaching Small Language Models How to Reason (23 Nov 2023)
- Paper Review: Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models (20 Nov 2023)
- Paper Review: Deep Learning for Day Forecasts from Sparse Observations (16 Nov 2023)
- Paper Review: Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM (13 Nov 2023)
- Paper Review: CogVLM: Visual Expert for Pretrained Language Models (09 Nov 2023)
- Paper Review: Collaborative Large Language Model for Recommender Systems (06 Nov 2023)
- Paper Review: SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding (02 Nov 2023)
- Paper Review: Zephyr: Direct Distillation of LM Alignment (30 Oct 2023)
- Paper Review: Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture (26 Oct 2023)
- Paper Review: Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (23 Oct 2023)
- Paper Review: PaLI-3 Vision Language Models: Smaller, Faster, Stronger (19 Oct 2023)
- Paper Review: InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining (16 Oct 2023)
- Paper Review: Mistral 7B (12 Oct 2023)
- Paper Review: Think before you speak: Training Language Models With Pause Tokens (09 Oct 2023)
- Paper Review: QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models (05 Oct 2023)
- Paper Review: LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models (02 Oct 2023)
- Paper Review: DreamLLM: Synergistic Multimodal Comprehension and Creation (28 Sep 2023)
- Paper Review: FreeU: Free Lunch in Diffusion U-Net (25 Sep 2023)
- Paper Review: Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers (21 Sep 2023)
- Paper Review: SLiMe: Segment Like Me (18 Sep 2023)
- Paper Review: TSMixer: An All-MLP Architecture for Time Series Forecasting (14 Sep 2023)
- Paper Review: Explaining grokking through circuit efficiency (11 Sep 2023)
- Paper Review: Contrastive Feature Masking Open-Vocabulary Vision Transformer (07 Sep 2023)
- Paper Review: RecMind: Large Language Model Powered Agent For Recommendation (04 Sep 2023)
- Paper Review: CoTracker: It is Better to Track Together (31 Aug 2023)
- Paper Review: Giraffe: Adventures in Expanding Context Lengths in LLMs (28 Aug 2023)
- Paper Review: OBELISC: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents (24 Aug 2023)
- Paper Review: LISA: Reasoning Segmentation via Large Language Model (21 Aug 2023)
- Paper Review: FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization (17 Aug 2023)
- Paper Review: Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback (10 Aug 2023)
- Paper Review: UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition (10 Aug 2023)
- Paper Review: Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding (07 Aug 2023)
- Paper Review: Tracking Anything in High Quality (03 Aug 2023)
- Paper Review: TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning (31 Jul 2023)
- Paper Review: Meta-Transformer: A Unified Framework for Multimodal Learning (27 Jul 2023)
- Paper Review: Retentive Network: A Successor to Transformer for Large Language Models (24 Jul 2023)
- Paper Review: Llama 2: Open Foundation and Fine-Tuned Chat Models (20 Jul 2023)
- Paper Review: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning (17 Jul 2023)
- Paper Review: UniverSeg: Universal Medical Image Segmentation (13 Jul 2023)
- Paper Review: Recognize Anything: A Strong Image Tagging Model (10 Jul 2023)
- Paper Review: Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles (06 Jul 2023)
- Paper Review: Multilingual End to End Entity Linking (03 Jul 2023)
- Paper Review: Fast Segment Anything (29 Jun 2023)
- Paper Review: Tracking Everything Everywhere All at Once (26 Jun 2023)
- Paper Review: Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale (23 Jun 2023)
- Paper Review: Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision (19 Jun 2023)
- Paper Review: Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (15 Jun 2023)
- Paper Review: BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks (12 Jun 2023)
- Paper Review: StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners (08 Jun 2023)
- Paper Review: The effectiveness of MAE pre-pretraining for billion-scale pretraining (05 Jun 2023)
- Paper Review: QLoRA: Efficient Finetuning of Quantized LLMs (01 Jun 2023)
- Paper Review: Chain of Hindsight Aligns Language Models with Feedback (30 May 2023)
- Paper Review: MMS: Scaling Speech Technology to 1000+ languages (25 May 2023)
- Paper Review: Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold (22 May 2023)
- Paper Review: DarkBERT: A Language Model for the Dark Side of the Internet (18 May 2023)
- Paper Review: NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers (15 May 2023)
- Paper Review: ImageBind: One Embedding Space To Bind Them All (10 May 2023)
- Paper Review: Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes (08 May 2023)
- Paper Review: Phoenix: Democratizing ChatGPT across Languages (04 May 2023)
- Paper Review: Scaling Transformer to 1M tokens and beyond with RMT (01 May 2023)
- Paper Review: Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations (27 Apr 2023)
- Paper Review: Generative Agents: Interactive Simulacra of Human Behavior (24 Apr 2023)
- Paper Review: DINOv2: Learning Robust Visual Features without Supervision (20 Apr 2023)
- Paper Review: InceptionNeXt: When Inception Meets ConvNeXt (17 Apr 2023)
- Paper Review: Segment Anything (08 Apr 2023)
- Paper Review: BloombergGPT: A Large Language Model for Finance (02 Apr 2023)
- Paper Review: ReBotNet: Fast Real-time Video Enhancement (27 Mar 2023)
- Paper Review: Hyena Hierarchy: Towards Larger Convolutional Language Models (20 Mar 2023)
- Paper Review: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models (13 Mar 2023)
- Paper Review: PaLM-E: An Embodied Multimodal Language Model (09 Mar 2023)
- Paper Review: In-Context Instruction Learning (06 Mar 2023)
- Paper Review: LLaMA: Open and Efficient Foundation Language Models (26 Feb 2023)
- Paper Review: Scaling Vision Transformers to 22 Billion Parameters (20 Feb 2023)
- Paper Review: Dual PatchNorm (13 Feb 2023)
- Paper Review: Cut and Learn for Unsupervised Object Detection and Instance Segmentation (06 Feb 2023)
- Paper Review: StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis (29 Jan 2023)
- Paper Review: Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios (24 Jul 2022)
- Paper Review: NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation (10 Dec 2021)
- Paper Review: NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion (25 Nov 2021)
- Paper Review: Swin Transformer V2: Scaling Up Capacity and Resolution (19 Nov 2021)
- Paper Review: A Recipe For Arbitrary Text Style Transfer with Large Language Models (10 Oct 2021)
- Paper Review: SwinIR: Image Restoration Using Swin Transformer (13 Sep 2021)
- Paper Review: Efficient Visual Pretraining with Contrastive Detection (01 Sep 2021)
- Paper Review: Domain-Aware Universal Style Transfer (15 Aug 2021)
- Paper Review: YOLOX: Exceeding YOLO Series in 2021 (23 Jul 2021)
- Paper Review: Long-Short Transformer: Efficient Transformers for Language and Vision (12 Jul 2021)
- Paper Review: Semi-Autoregressive Transformer for Image Captioning (18 Jun 2021)
- Paper Review: CoAtNet: Marrying Convolution and Attention for All Data Sizes (10 Jun 2021)
- Paper Review: ByT5: Towards a token-free future with pre-trained byte-to-byte models (02 Jun 2021)
- Paper Review: Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence (21 May 2021)
- Paper Review: Are Pre-trained Convolutions Better than Pre-trained Transformers? (10 May 2021)
- Paper Review: MDETR: Modulated Detection for End-to-End Multi-Modal Understanding (04 May 2021)
- Paper Review: Generating Furry Cars: Disentangling Object Shape and Appearance across Multiple Domains (07 Apr 2021)
- Paper Review: EfficientNetV2: Smaller Models and Faster Training (02 Apr 2021)
- Paper Review: Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning (29 Mar 2021)
- Paper Review: LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval (21 Mar 2021)
- Paper Review: Real-World Super-Resolution of Face-Images from Surveillance Cameras (22 Feb 2021)
- Paper Review: ObjectAug: Object-level Data Augmentation for Semantic Image Segmentation (07 Feb 2021)
- Paper Review: JigsawGAN: Self-supervised Learning for Solving Jigsaw Puzzles with Generative Adversarial Networks (31 Jan 2021)
- Paper Review: Language-agnostic BERT Sentence Embedding (19 Aug 2020)
- Paper Review: Funnel Activation for Visual Recognition (28 Jul 2020)
- Paper Review: ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network (04 Jul 2020)
- Paper Review: VirTex: Learning Visual Representations from Textual Annotations (14 Jun 2020)
- Paper Review: Linformer: Self-Attention with Linear Complexity (10 Jun 2020)
- Paper Review: End-to-End Object Detection with Transformers (28 May 2020)
- Paper Review: SpERT: Span-based Joint Entity and Relation Extraction with Transformer Pre-training (23 May 2020)
All tags
paperreview (149)
deeplearning (145)
cv (65)
nlp (51)
llm (39)
transformer (24)
sota (14)
pretraining (14)
blogpost (14)
imagesegmentation (12)
pytorch (9)
objectdetection (9)
career (7)
attention (7)
video (6)
datascience (6)
stablediffusion (5)
diffusion (5)
timeseries (4)
selfsupervised (4)
ner (4)
mllm (4)
gan (4)
yolo (3)
vlm (3)
styletransfer (3)
life (3)
kaggle (3)
imagegeneration (3)
imagecaptioning (3)
augmentation (3)
audio (3)
visual (2)
tts (2)
transferlearning (2)
superresolution (2)
sd (2)
recommender (2)
ranking (2)
qa (2)
languages (2)
languagemodel (2)
gpt (2)
gnn (2)
generation (2)
fewshotlearning (2)
dpo (2)
competition (2)
cnn (2)
classification (2)
agent (2)
weaksupervision (1)
unet (1)
textgeneration (1)
tensorflow (1)
tabular (1)
swa (1)
summarization (1)
speechtranslation (1)
speechtospeech (1)
speechrecognition (1)
speechgeneration (1)
sentenceembeddings (1)
semisupervised (1)
robustness (1)
robotics (1)
rnn (1)
relationextrction (1)
relationextraction (1)
reinforcementlearning (1)
recurrent (1)
recommendation (1)
realtime (1)
ragn (1)
rag (1)
quantization (1)
promptengineering (1)
objecttracking (1)
objectdetecion (1)
nlg (1)
nas (1)
multimodal (1)
motivation (1)
motiontracking (1)
mlp (1)
mentoring (1)
memoryoptimization (1)
mamba (1)
languagetranslation (1)
jigsaw (1)
instructlearning (1)
inferencespeed (1)
imagetextmatching (1)
imagerestoration (1)
imageinpainting (1)
graphneuralnets (1)
graph (1)
forecasting (1)
fail (1)
entitylinking (1)
endtoend (1)
efficiency (1)
distillation (1)
diffusionmodels (1)
depthestimation (1)
curriculumlreaning (1)
contrastivelearning (1)
coco (1)
clip (1)
chatbot (1)
captioning (1)
books (1)
bert (1)
autoencoder (1)
asr (1)
annotation (1)
anchorfree (1)
alignment (1)
advice (1)
adversarial (1)
activationfunction (1)