Tag: finetuning

12 posts

Mar 16, 2026
Collaborative Reinforcement Learning: Why HACRL Trains Models in Teams Instead of Isolation
HACRL proposes a new paradigm for reinforcement learning - instead of training models in isolation, multiple agents c...
paperreview deeplearning rl llm
Dec 23, 2024
Paper Review: Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
BERT rebuilt with modern tricks — 2 trillion training tokens, 8192 context length, Flash Attention, and rotary embedd...
paperreview deeplearning nlp transformer
Nov 27, 2023
Paper Review: Diffusion Model Alignment Using Direct Preference Optimization
Adapting DPO from language models to image generation — training Stable Diffusion XL on 851K human preferences to sig...
paperreview deeplearning cv stablediffusion
Oct 30, 2023
Paper Review: Zephyr: Direct Distillation of LM Alignment
My review of the paper Zephyr: Direct Distillation of LM Alignment
paperreview deeplearning nlp llm
Oct 16, 2023
Paper Review: InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
My review of the paper InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
paperreview deeplearning llm nlp
Oct 05, 2023
Paper Review: QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
My review of the paper QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
paperreview deeplearning llm nlp
Aug 10, 2023
Paper Review: Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
A systematic survey of what's broken in RLHF — from reward hacking to evaluation gaps — and what techniques can fix, ...
paperreview deeplearning nlp llm
Jul 20, 2023
Paper Review: Llama 2: Open Foundation and Fine-Tuned Chat Models
Meta's open-source LLM family (7B–70B parameters) with chat fine-tuning that matched or beat closed-source models on ...
paperreview deeplearning nlp finetuning
Jul 17, 2023
Paper Review: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
My review of the paper Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
paperreview deeplearning cv nlp
Jun 19, 2023
Paper Review: Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
My review of the paper Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
paperreview deeplearning nlp llm
Jun 01, 2023
Paper Review: QLoRA: Efficient Finetuning of Quantized LLMs
My review of the paper QLoRA: Efficient Finetuning of Quantized LLMs
paperreview deeplearning finetuning optimization
May 30, 2023
Paper Review: Chain of Hindsight Aligns Language Models with Feedback
My review of the paper Chain of Hindsight Aligns Language Models with Feedback
paperreview deeplearning nlp llm
