Tag: nlp
75 posts
Beyond Positional Bias: How DroPE Unlocks Zero-Shot Long Context in LLMs
A review of DroPE, a simple but counterintuitive method that extends LLM context length by dropping positional embedd...
Paper Review: mHC: Manifold-Constrained Hyper-Connections
My review of the paper mHC Manifold-Constrained Hyper-Connections
Paper Review: The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
A biologically inspired LLM built as a graph of spiking neurons with Hebbian learning — it matches GPT-2 scaling whil...
Paper Review: LongLive: Real-time Interactive Long Video Generation
My review of the paper LongLive Real-time Interactive Long Video Generation
Paper Review: Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
My review of the paper Sharing is Caring Efficient LM Post-Training with Collective RL Experience Sharing
Paper Review: Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Only ~20% of tokens actually matter when training LLMs to reason with RL. Updating the low-entropy majority actively ...
Paper Review: AlphaEvolve: A coding agent for scientific and algorithmic discovery
DeepMind's autonomous coding agent that evolves algorithms through LLM-driven iteration — it discovered the first imp...
Paper Review: AgentA/B: Automated and Scalable Web A/BTesting with Interactive LLM Agents
My review of the paper AgentA/B Automated and Scalable Web A/BTesting with Interactive LLM Agents
Paper Review: M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
My review of the paper M1 Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper Review: RWKV-7 Goose with Expressive Dynamic State Evolution
My review of the paper RWKV-7 Goose with Expressive Dynamic State Evolution
Paper Review: Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
My review of the paper Audio Flamingo 2 An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Ab...
Paper Review: Large Language Diffusion Models
LLaDA replaces autoregressive token generation with diffusion-based masked prediction, rivaling LLaMA3 8B while natur...
Paper Review: NeoBERT: A Next-Generation BERT
A compact 250M-parameter bidirectional encoder that incorporates RoPE, SwiGLU, and modern pretraining to outperform m...
Paper Review: Titans: Learning to Memorize at Test Time
A new architecture that pairs attention with a learnable long-term memory module, scaling to 2M+ tokens and outperfor...
Paper Review: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
How pure reinforcement learning (without supervised fine-tuning) can teach LLMs to reason, producing open-source mode...
Paper Review: Training Large Language Models to Reason in a Continuous Latent Space
Coconut lets LLMs reason in latent space instead of generating text tokens, enabling breadth-first exploration of rea...
Paper Review: Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
BERT rebuilt with modern tricks — 2 trillion training tokens, 8192 context length, Flash Attention, and rotary embedd...
Paper Review: Byte Latent Transformer: Patches Scale Better Than Tokens
My review of the paper Byte Latent Transformer Patches Scale Better Than Tokens
Paper Review: Reverse Thinking Makes LLMs Stronger Reasoners
My review of the paper Reverse Thinking Makes LLMs Stronger Reasoners
Paper Review: Project Sid: Many-agent simulations toward AI civilization
What happens when you put 1k AI agents in Minecraft and let them self-organize? They form governments, transmit cultu...
Paper Review: Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
My review of the paper Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper Review: Unbounded: A Generative Infinite Game of Character Life Simulation
My review of the paper Unbounded A Generative Infinite Game of Character Life Simulation
Paper Review: Contextual Document Embeddings
My review of the paper Contextual Document Embeddings
Paper Review: Differential Transformer
My review of the paper Differential Transformer
Paper Review: Training Language Models to Self-Correct via Reinforcement Learning
My review of the paper Training Language Models to Self-Correct via Reinforcement Learning
Paper Review: Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
My review of the paper Samba Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Paper Review: σ-GPTs: A New Approach to Autoregressive Models
My review of the paper σ-GPTs A New Approach to Autoregressive Models
Paper Review: Orca 2: Teaching Small Language Models How to Reason
My review of the paper Orca 2 Teaching Small Language Models How to Reason
Paper Review: Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models
My review of the paper Chain-of-Note Enhancing Robustness in Retrieval-Augmented Language Models
Paper Review: Zephyr: Direct Distillation of LM Alignment
My review of the paper Zephyr Direct Distillation of LM Alignment
Paper Review: Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
My review of the paper Monarch Mixer A Simple Sub-Quadratic GEMM-Based Architecture
Paper Review: Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
My review of the paper Self-RAG Learning to Retrieve, Generate, and Critique through Self-Reflection
Paper Review: PaLI-3 Vision Language Models: Smaller, Faster, Stronger
My review of the paper PaLI-3 Vision Language Models Smaller, Faster, Stronger
Paper Review: InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
My review of the paper InstructRetro Instruction Tuning post Retrieval-Augmented Pretraining
Paper Review: Mistral 7B
My review of the paper Mistral 7B
Paper Review: Think before you speak: Training Language Models With Pause Tokens
My review of the paper Think before you speak Training Language Models With Pause Tokens
Paper Review: QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
My review of the paper QA-LoRA Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper Review: DreamLLM: Synergistic Multimodal Comprehension and Creation
My review of the paper DreamLLM Synergistic Multimodal Comprehension and Creation
Paper Review: Giraffe: Adventures in Expanding Context Lengths in LLMs
My review of the paper Giraffe Adventures in Expanding Context Lengths in LLMs
Paper Review: OBELISC: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
My review of the paper OBELISC An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
Paper Review: Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
A systematic survey of what's broken in RLHF — from reward hacking to evaluation gaps — and what techniques can fix, ...
Paper Review: UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition
My review of the paper UniversalNER Targeted Distillation from Large Language Models for Open Named Entity Recognition
Paper Review: Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
My review of the paper Skeleton-of-Thought Large Language Models Can Do Parallel Decoding
Paper Review: Meta-Transformer: A Unified Framework for Multimodal Learning
My review of the paper Meta-Transformer A Unified Framework for Multimodal Learning
Paper Review: Retentive Network: A Successor to Transformer for Large Language Models
My review of the paper Retentive Network A Successor to Transformer for Large Language Models
Paper Review: Llama 2: Open Foundation and Fine-Tuned Chat Models
Meta's open-source LLM family (7B–70B parameters) with chat fine-tuning that matched or beat closed-source models on ...
Paper Review: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
My review of the paper Scaling Autoregressive Multi-Modal Models Pretraining and Instruction Tuning
Paper Review: Multilingual End to End Entity Linking
My review of the paper Multilingual End to End Entity Linking
Paper Review: Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
My review of the paper Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Paper Review: BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks
My review of the paper BiomedGPT A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, L...
Paper Review: StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
My review of the paper StableRep Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
Paper Review: Chain of Hindsight Aligns Language Models with Feedback
My review of the paper Chain of Hindsight Aligns Language Models with Feedback
Paper Review: DarkBERT: A Language Model for the Dark Side of the Internet
My review of the paper DarkBERT A Language Model for the Dark Side of the Internet
Paper Review: ImageBind: One Embedding Space To Bind Them All
My review of the paper ImageBind One Embedding Space To Bind Them All
Paper Review: Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
My review of the paper Distilling Step-by-Step Outperforming Larger Language Models with Less Training Data and Small...
Paper Review: Phoenix: Democratizing ChatGPT across Languages
My review of the paper Phoenix Democratizing ChatGPT across Languages
Paper Review: Generative Agents: Interactive Simulacra of Human Behavior
My review of the paper Generative Agents Interactive Simulacra of Human Behavior
Paper Review: BloombergGPT: A Large Language Model for Finance
Bloomberg trained a 50B-parameter LLM on 363B tokens of proprietary financial data. It crushes existing models on fin...
Paper Review: Hyena Hierarchy: Towards Larger Convolutional Language Models
My review of the paper Hyena Hierarchy Towards Larger Convolutional Language Models
Paper Review: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
My review of the paper Visual ChatGPT Talking, Drawing and Editing with Visual Foundation Models
Paper Review: PaLM-E: An Embodied Multimodal Language Model
My review of the paper PaLM-E An Embodied Multimodal Language Model
Paper Review: In-Context Instruction Learning
My review of the paper In-Context Instruction Learning
Paper Review: LLaMA: Open and Efficient Foundation Language Models
My review of the paper LLaMA Open and Efficient Foundation Language Models
Medical chatbot: the story of our attempt to build one
The story of how our project to develop a medical chatbot was shut down after a lot of effort had been spent on it
Paper Review: NL-Augmenter A Framework for Task-Sensitive Natural Language Augmentation
My review of the paper NL-Augmenter A Framework for Task-Sensitive Natural Language Augmentation and my contribution ...
Paper Review: A Recipe For Arbitrary Text Style Transfer with Large Language Models
My review of the paper A Recipe For Arbitrary Text Style Transfer with Large Language Models
Paper Review: Long-Short Transformer Efficient Transformers for Language and Vision
My review of the paper Long-Short Transformer Efficient Transformers for Language and Vision
Paper Review: ByT5 Towards a token-free future with pre-trained byte-to-byte models
My review of the paper ByT5 Towards a token-free future with pre-trained byte-to-byte models
Paper Review: Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence
My review of the paper Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence
Paper Review: Are Pre-trained Convolutions Better than Pre-trained Transformers?
My review of the paper Are Pre-trained Convolutions Better than Pre-trained Transformers?
Paper Review: Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning
My review of the paper Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning.
Paper Review: Language-agnostic BERT Sentence Embedding
My review of the paper Language-agnostic BERT Sentence Embedding.
Paper Review: SpERT Span-based Joint Entity and Relation Extraction with Transformer Pre-training
My review of the paper SpERT Span-based Joint Entity and Relation Extraction with Transformer Pre-training.
Paper Review: Named Entity Recognition without Labelled Data A Weak Supervision Approach
My review of the paper Named Entity Recognition without Labelled Data A Weak Supervision Approach.
Approaches to sentiment analysis on a small imbalanced dataset without Deep Learning
Let’s make logreg great again!