Tag: finetuning
12 posts
Collaborative Reinforcement Learning: Why HACRL Trains Models in Teams Instead of Isolation
HACRL proposes a new paradigm for reinforcement learning: instead of training models in isolation, multiple agents c...
Paper Review: Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
BERT rebuilt with modern tricks — 2 trillion training tokens, 8192 context length, Flash Attention, and rotary embedd...
Paper Review: Diffusion Model Alignment Using Direct Preference Optimization
Adapting DPO from language models to image generation — training Stable Diffusion XL on 851K human preferences to sig...
Paper Review: Zephyr: Direct Distillation of LM Alignment
My review of the paper Zephyr: Direct Distillation of LM Alignment
Paper Review: InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
My review of the paper InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
Paper Review: QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
My review of the paper QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper Review: Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
A systematic survey of what's broken in RLHF — from reward hacking to evaluation gaps — and what techniques can fix, ...
Paper Review: Llama 2: Open Foundation and Fine-Tuned Chat Models
Meta's open-source LLM family (7B–70B parameters) with chat fine-tuning that matched or beat closed-source models on ...
Paper Review: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
My review of the paper Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
Paper Review: Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
My review of the paper Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Paper Review: QLoRA: Efficient Finetuning of Quantized LLMs
My review of the paper QLoRA: Efficient Finetuning of Quantized LLMs
Paper Review: Chain of Hindsight Aligns Language Models with Feedback
My review of the paper Chain of Hindsight Aligns Language Models with Feedback