Tag: pretraining
17 posts
Beyond Positional Bias: How DroPE Unlocks Zero-Shot Long Context in LLMs
A review of DroPE, a simple but counterintuitive method that extends LLM context length by dropping positional embeddings
Paper Review: DINOv3
Meta's self-supervised vision model trained on 17 billion images, introducing Gram anchoring to prevent feature degradation
Paper Review: SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
Google's upgraded vision-language encoders that add self-supervised learning and online data curation to SigLIP, delivering gains in semantic understanding, localization, and dense features
Paper Review: CogVLM: Visual Expert for Pretrained Language Models
My review of the paper CogVLM: Visual Expert for Pretrained Language Models
Paper Review: Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
Yann LeCun's I-JEPA learns semantic image representations by predicting masked patch features — no hand-crafted data augmentations required
Paper Review: The effectiveness of MAE pre-pretraining for billion-scale pretraining
My review of the paper The effectiveness of MAE pre-pretraining for billion-scale pretraining
Paper Review: DarkBERT: A Language Model for the Dark Side of the Internet
My review of the paper DarkBERT: A Language Model for the Dark Side of the Internet
Paper Review: DINOv2: Learning Robust Visual Features without Supervision
How Meta built all-purpose visual features by scaling self-supervised pretraining to a curated 142M-image dataset, producing features that work across tasks without fine-tuning
Paper Review: NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
My review of the paper NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
Paper Review: Efficient Visual Pretraining with Contrastive Detection
My review of the paper Efficient Visual Pretraining with Contrastive Detection
Paper Review: CoAtNet: Marrying Convolution and Attention for All Data Sizes
My review of the paper CoAtNet: Marrying Convolution and Attention for All Data Sizes
Paper Review: ByT5: Towards a token-free future with pre-trained byte-to-byte models
My review of the paper ByT5: Towards a token-free future with pre-trained byte-to-byte models
Paper Review: Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence
My review of the paper Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence
Paper Review: Are Pre-trained Convolutions Better than Pre-trained Transformers?
My review of the paper Are Pre-trained Convolutions Better than Pre-trained Transformers?
Paper Review: LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval
My review of the paper LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval
Paper Review: ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network
My review of the paper ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network
Paper Review: VirTex: Learning Visual Representations from Textual Annotations
My review of the paper VirTex: Learning Visual Representations from Textual Annotations