Tag: finetuning

Mar 16, 2026

Collaborative Reinforcement Learning: Why HACRL Trains Models in Teams Instead of Isolation

HACRL proposes a new paradigm for reinforcement learning: instead of training models in isolation, multiple agents c...

paperreview deeplearning rl llm

Dec 23, 2024

Paper Review: Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

BERT rebuilt with modern tricks — 2 trillion training tokens, 8192 context length, Flash Attention, and rotary embedd...

paperreview deeplearning nlp transformer

Nov 27, 2023

Paper Review: Diffusion Model Alignment Using Direct Preference Optimization

Adapting DPO from language models to image generation — training Stable Diffusion XL on 851K human preferences to sig...

paperreview deeplearning cv stablediffusion

Oct 30, 2023

Paper Review: Zephyr: Direct Distillation of LM Alignment

My review of the paper Zephyr: Direct Distillation of LM Alignment

paperreview deeplearning nlp llm

Oct 16, 2023

Paper Review: InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining

My review of the paper InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining

paperreview deeplearning llm nlp

Oct 05, 2023

Paper Review: QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

My review of the paper QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

paperreview deeplearning llm nlp

Aug 10, 2023

Paper Review: Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

A systematic survey of what's broken in RLHF — from reward hacking to evaluation gaps — and what techniques can fix, ...

paperreview deeplearning nlp llm

Jul 20, 2023

Paper Review: Llama 2: Open Foundation and Fine-Tuned Chat Models

Meta's open-source LLM family (7B–70B parameters) with chat fine-tuning that matched or beat closed-source models on ...

paperreview deeplearning nlp finetuning

Jul 17, 2023

Paper Review: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning

My review of the paper Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning

paperreview deeplearning cv nlp

Jun 19, 2023

Paper Review: Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision

My review of the paper Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision

paperreview deeplearning nlp llm

Jun 01, 2023

Paper Review: QLoRA: Efficient Finetuning of Quantized LLMs

My review of the paper QLoRA: Efficient Finetuning of Quantized LLMs

paperreview deeplearning finetuning optimization

May 30, 2023

Paper Review: Chain of Hindsight Aligns Language Models with Feedback

My review of the paper Chain of Hindsight Aligns Language Models with Feedback

paperreview deeplearning nlp llm