Tag: nlp

75 posts

Feb 23, 2026
Beyond Positional Bias: How DroPE Unlocks Zero-Shot Long Context in LLMs
A review of DroPE, a simple but counterintuitive method that extends LLM context length by dropping positional embedd...
paperreview deeplearning llm attention
Jan 26, 2026
Paper Review: mHC: Manifold-Constrained Hyper-Connections
My review of the paper mHC Manifold-Constrained Hyper-Connections
paperreview deeplearning architecture llm
Oct 27, 2025
Paper Review: The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
A biologically inspired LLM built as a graph of spiking neurons with Hebbian learning — it matches GPT-2 scaling whil...
paperreview deeplearning nlp llm
Oct 06, 2025
Paper Review: LongLive: Real-time Interactive Long Video Generation
My review of the paper LongLive Real-time Interactive Long Video Generation
paperreview deeplearning imagegeneration videogeneration
Sep 15, 2025
Paper Review: Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
My review of the paper Sharing is Caring Efficient LM Post-Training with Collective RL Experience Sharing
paperreview deeplearning nlp llm
Jun 09, 2025
Paper Review: Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Only ~20% of tokens actually matter when training LLMs to reason with RL. Updating the low-entropy majority actively ...
paperreview deeplearning llm rl
May 15, 2025
Paper Review: AlphaEvolve: A coding agent for scientific and algorithmic discovery
DeepMind's autonomous coding agent that evolves algorithms through LLM-driven iteration — it discovered the first imp...
paperreview deeplearning agent nlp
Apr 28, 2025
Paper Review: AgentA/B: Automated and Scalable Web A/BTesting with Interactive LLM Agents
My review of the paper AgentA/B Automated and Scalable Web A/BTesting with Interactive LLM Agents
paperreview deeplearning agent nlp
Apr 21, 2025
Paper Review: M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
My review of the paper M1 Towards Scalable Test-Time Compute with Mamba Reasoning Models
paperreview deeplearning rnn distillation
Mar 24, 2025
Paper Review: RWKV-7 Goose with Expressive Dynamic State Evolution
My review of the paper RWKV-7 Goose with Expressive Dynamic State Evolution
paperreview deeplearning nlp rnn
Mar 17, 2025
Paper Review: Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
My review of the paper Audio Flamingo 2 An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Ab...
paperreview deeplearning transformer nlp
Mar 10, 2025
Paper Review: Large Language Diffusion Models
LLaDA replaces autoregressive token generation with diffusion-based masked prediction, rivaling LLaMA3 8B while natur...
paperreview deeplearning nlp transformer
Mar 03, 2025
Paper Review: NeoBERT: A Next-Generation BERT
A compact 250M-parameter bidirectional encoder that incorporates RoPE, SwiGLU, and modern pretraining to outperform m...
paperreview deeplearning nlp transformer
Feb 03, 2025
Paper Review: Titans: Learning to Memorize at Test Time
A new architecture that pairs attention with a learnable long-term memory module, scaling to 2M+ tokens and outperfor...
paperreview deeplearning llm nlp
Jan 27, 2025
Paper Review: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
How pure reinforcement learning (without supervised fine-tuning) can teach LLMs to reason, producing open-source mode...
paperreview deeplearning llm rl
Jan 06, 2025
Paper Review: Training Large Language Models to Reason in a Continuous Latent Space
Coconut lets LLMs reason in latent space instead of generating text tokens, enabling breadth-first exploration of rea...
paperreview deeplearning nlp llm
Dec 23, 2024
Paper Review: Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
BERT rebuilt with modern tricks — 2 trillion training tokens, 8192 context length, Flash Attention, and rotary embedd...
paperreview deeplearning nlp transformer
Dec 16, 2024
Paper Review: Byte Latent Transformer: Patches Scale Better Than Tokens
My review of the paper Byte Latent Transformer Patches Scale Better Than Tokens
paperreview deeplearning nlp llm
Dec 09, 2024
Paper Review: Reverse Thinking Makes LLMs Stronger Reasoners
My review of the paper Reverse Thinking Makes LLMs Stronger Reasoners
paperreview deeplearning nlp llm
Nov 25, 2024
Paper Review: Project Sid: Many-agent simulations toward AI civilization
What happens when you put 1k AI agents in Minecraft and let them self-organize? They form governments, transmit cultu...
paperreview deeplearning nlp llm
Nov 11, 2024
Paper Review: Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
My review of the paper Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
paperreview deeplearning nlp llm
Oct 29, 2024
Paper Review: Unbounded: A Generative Infinite Game of Character Life Simulation
My review of the paper Unbounded A Generative Infinite Game of Character Life Simulation
paperreview deeplearning nlp llm
Oct 21, 2024
Paper Review: Contextual Document Embeddings
My review of the paper Contextual Document Embeddings
paperreview deeplearning transformer embedding
Oct 14, 2024
Paper Review: Differential Transformer
My review of the paper Differential Transformer
paperreview deeplearning transformer attention
Sep 23, 2024
Paper Review: Training Language Models to Self-Correct via Reinforcement Learning
My review of the paper Training Language Models to Self-Correct via Reinforcement Learning
paperreview deeplearning rl llm
Jun 17, 2024
Paper Review: Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
My review of the paper Samba Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
paperreview deeplearning swa nlp
Jun 10, 2024
Paper Review: σ-GPTs: A New Approach to Autoregressive Models
My review of the paper σ-GPTs A New Approach to Autoregressive Models
paperreview deeplearning nlp gpt
Nov 23, 2023
Paper Review: Orca 2: Teaching Small Language Models How to Reason
My review of the paper Orca 2 Teaching Small Language Models How to Reason
paperreview deeplearning nlp llm
Nov 20, 2023
Paper Review: Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models
My review of the paper Chain-of-Note Enhancing Robustness in Retrieval-Augmented Language Models
paperreview deeplearning nlp llm
Oct 30, 2023
Paper Review: Zephyr: Direct Distillation of LM Alignment
My review of the paper Zephyr Direct Distillation of LM Alignment
paperreview deeplearning nlp llm
Oct 26, 2023
Paper Review: Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
My review of the paper Monarch Mixer A Simple Sub-Quadratic GEMM-Based Architecture
paperreview deeplearning nlp cv
Oct 23, 2023
Paper Review: Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
My review of the paper Self-RAG Learning to Retrieve, Generate, and Critique through Self-Reflection
paperreview deeplearning llm nlp
Oct 19, 2023
Paper Review: PaLI-3 Vision Language Models: Smaller, Faster, Stronger
My review of the paper PaLI-3 Vision Language Models Smaller, Faster, Stronger
paperreview deeplearning llm vlm
Oct 16, 2023
Paper Review: InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
My review of the paper InstructRetro Instruction Tuning post Retrieval-Augmented Pretraining
paperreview deeplearning llm nlp
Oct 12, 2023
Paper Review: Mistral 7B
My review of the paper Mistral 7B
paperreview deeplearning llm nlp
Oct 09, 2023
Paper Review: Think before you speak: Training Language Models With Pause Tokens
My review of the paper Think before you speak Training Language Models With Pause Tokens
paperreview deeplearning llm nlp
Oct 05, 2023
Paper Review: QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
My review of the paper QA-LoRA Quantization-Aware Low-Rank Adaptation of Large Language Models
paperreview deeplearning llm nlp
Sep 28, 2023
Paper Review: DreamLLM: Synergistic Multimodal Comprehension and Creation
My review of the paper DreamLLM Synergistic Multimodal Comprehension and Creation
paperreview deeplearning llm cv
Aug 28, 2023
Paper Review: Giraffe: Adventures in Expanding Context Lengths in LLMs
My review of the paper Giraffe Adventures in Expanding Context Lengths in LLMs
paperreview deeplearning nlp llm
Aug 24, 2023
Paper Review: OBELISC: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
My review of the paper OBELISC An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
paperreview deeplearning nlp llm
Aug 10, 2023
Paper Review: Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
A systematic survey of what's broken in RLHF — from reward hacking to evaluation gaps — and what techniques can fix, ...
paperreview deeplearning nlp llm
Aug 10, 2023
Paper Review: UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition
My review of the paper UniversalNER Targeted Distillation from Large Language Models for Open Named Entity Recognition
paperreview deeplearning nlp llm
Aug 07, 2023
Paper Review: Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
My review of the paper Skeleton-of-Thought Large Language Models Can Do Parallel Decoding
paperreview deeplearning nlp llm
Jul 27, 2023
Paper Review: Meta-Transformer: A Unified Framework for Multimodal Learning
My review of the paper Meta-Transformer A Unified Framework for Multimodal Learning
paperreview deeplearning nlp transformer
Jul 24, 2023
Paper Review: Retentive Network: A Successor to Transformer for Large Language Models
My review of the paper Retentive Network A Successor to Transformer for Large Language Models
paperreview deeplearning nlp transformer
Jul 20, 2023
Paper Review: Llama 2: Open Foundation and Fine-Tuned Chat Models
Meta's open-source LLM family (7B–70B parameters) with chat fine-tuning that matched or beat closed-source models on ...
paperreview deeplearning nlp finetuning
Jul 17, 2023
Paper Review: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
My review of the paper Scaling Autoregressive Multi-Modal Models Pretraining and Instruction Tuning
paperreview deeplearning cv nlp
Jul 03, 2023
Paper Review: Multilingual End to End Entity Linking
My review of the paper Multilingual End to End Entity Linking
paperreview deeplearning nlp llm
Jun 19, 2023
Paper Review: Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
My review of the paper Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
paperreview deeplearning nlp llm
Jun 12, 2023
Paper Review: BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks
My review of the paper BiomedGPT A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, L...
paperreview deeplearning nlp gpt
Jun 08, 2023
Paper Review: StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
My review of the paper StableRep Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
paperreview deeplearning stablediffusion nlp
May 30, 2023
Paper Review: Chain of Hindsight Aligns Language Models with Feedback
My review of the paper Chain of Hindsight Aligns Language Models with Feedback
paperreview deeplearning nlp llm
May 18, 2023
Paper Review: DarkBERT: A Language Model for the Dark Side of the Internet
My review of the paper DarkBERT A Language Model for the Dark Side of the Internet
paperreview deeplearning nlp pretraining
May 10, 2023
Paper Review: ImageBind: One Embedding Space To Bind Them All
My review of the paper ImageBind One Embedding Space To Bind Them All
paperreview deeplearning nlp cv
May 08, 2023
Paper Review: Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
My review of the paper Distilling Step-by-Step Outperforming Larger Language Models with Less Training Data and Small...
paperreview deeplearning nlp distillation
May 04, 2023
Paper Review: Phoenix: Democratizing ChatGPT across Languages
My review of the paper Phoenix Democratizing ChatGPT across Languages
paperreview deeplearning nlp
Apr 24, 2023
Paper Review: Generative Agents: Interactive Simulacra of Human Behavior
My review of the paper Generative Agents Interactive Simulacra of Human Behavior
paperreview deeplearning nlp
Apr 02, 2023
Paper Review: BloombergGPT: A Large Language Model for Finance
Bloomberg trained a 50B-parameter LLM on 363B tokens of proprietary financial data. It crushes existing models on fin...
paperreview deeplearning nlp
Mar 20, 2023
Paper Review: Hyena Hierarchy: Towards Larger Convolutional Language Models
My review of the paper Hyena Hierarchy Towards Larger Convolutional Language Models
paperreview deeplearning nlp cv
Mar 13, 2023
Paper Review: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
My review of the paper Visual ChatGPT Talking, Drawing and Editing with Visual Foundation Models
paperreview deeplearning nlp transformer
Mar 09, 2023
Paper Review: PaLM-E: An Embodied Multimodal Language Model
My review of the paper PaLM-E An Embodied Multimodal Language Model
paperreview deeplearning nlp transformer
Mar 06, 2023
Paper Review: In-Context Instruction Learning
My review of the paper In-Context Instruction Learning
paperreview deeplearning nlp transformer
Feb 26, 2023
Paper Review: LLaMA: Open and Efficient Foundation Language Models
My review of the paper LLaMA Open and Efficient Foundation Language Models
paperreview deeplearning nlp transformer
Sep 09, 2022
Medical-chat bot: the history of our attempt to do it
The story of how our medical chatbot project was shut down after a lot of effort had been spent on it
blogpost nlp ner relationextraction
Dec 10, 2021
Paper Review: NL-Augmenter A Framework for Task-Sensitive Natural Language Augmentation
My review of the paper NL-Augmenter A Framework for Task-Sensitive Natural Language Augmentation and my contribution ...
paperreview deeplearning nlp augmentation
Oct 10, 2021
Paper Review: A Recipe For Arbitrary Text Style Transfer with Large Language Models
My review of the paper A Recipe For Arbitrary Text Style Transfer with Large Language Models
paperreview deeplearning nlp styletransfer
Jul 12, 2021
Paper Review: Long-Short Transformer Efficient Transformers for Language and Vision
My review of the paper Long-Short Transformer Efficient Transformers for Language and Vision
paperreview deeplearning cv nlp
Jun 02, 2021
Paper Review: ByT5 Towards a token-free future with pre-trained byte-to-byte models
My review of the paper ByT5 Towards a token-free future with pre-trained byte-to-byte models
paperreview deeplearning nlp pretraining
May 21, 2021
Paper Review: Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence
My review of the paper Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence
paperreview deeplearning nlp nlg
May 10, 2021
Paper Review: Are Pre-trained Convolutions Better than Pre-trained Transformers?
My review of the paper Are Pre-trained Convolutions Better than Pre-trained Transformers?
paperreview deeplearning nlp cnn
Mar 29, 2021
Paper Review: Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning
My review of the paper Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning.
paperreview nlp fewshotlearning augmentation
Aug 19, 2020
Paper Review: Language-agnostic BERT Sentence Embedding
My review of the paper Language-agnostic BERT Sentence Embedding.
paperreview deeplearning transformer nlp
May 23, 2020
Paper Review: SpERT Span-based Joint Entity and Relation Extraction with Transformer Pre-training
My review of the paper SpERT Span-based Joint Entity and Relation Extraction with Transformer Pre-training.
paperreview nlp deeplearning transformer
May 10, 2020
Paper Review: Named Entity Recognition without Labelled Data A Weak Supervision Approach
My review of the paper Named Entity Recognition without Labelled Data A Weak Supervision Approach.
paperreview nlp ner weaksupervision
Aug 09, 2019
Approaches to sentimental analysis on a small imbalanced dataset without Deep Learning
Let’s make logreg great again!
blogpost datascience nlp classification
