Tag: nlp – Andrey Lukyanenko

Apr 24, 2026

DeepSeek-V4 Review: Why Million-Token Context Needs Efficient Attention, Not Just Larger Windows

DeepSeek V4 pairs a hybrid sparse-attention stack with on-policy distillation across domain specialists to bring 1M-t...

paperreview deeplearning llm moe

Feb 23, 2026

Beyond Positional Bias: How DroPE Unlocks Zero-Shot Long Context in LLMs

A review of DroPE, a simple but counterintuitive method that extends LLM context length by dropping positional embedd...

paperreview deeplearning llm attention

Jan 26, 2026

Paper Review: mHC: Manifold-Constrained Hyper-Connections

My review of the paper mHC Manifold-Constrained Hyper-Connections

paperreview deeplearning architecture llm

Oct 27, 2025

Paper Review: The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

A biologically inspired LLM built as a graph of spiking neurons with Hebbian learning — it matches GPT-2 scaling whil...

paperreview deeplearning nlp llm

Oct 06, 2025

Paper Review: LongLive: Real-time Interactive Long Video Generation

My review of the paper LongLive Real-time Interactive Long Video Generation

paperreview deeplearning imagegeneration videogeneration

Sep 15, 2025

Paper Review: Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing

My review of the paper Sharing is Caring Efficient LM Post-Training with Collective RL Experience Sharing

paperreview deeplearning nlp llm

Jun 09, 2025

Paper Review: Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Only ~20% of tokens actually matter when training LLMs to reason with RL. Updating the low-entropy majority actively ...

paperreview deeplearning llm rl

May 15, 2025

Paper Review: AlphaEvolve: A coding agent for scientific and algorithmic discovery

DeepMind's autonomous coding agent that evolves algorithms through LLM-driven iteration — it discovered the first imp...

paperreview deeplearning agent nlp

Apr 28, 2025

Paper Review: AgentA/B: Automated and Scalable Web A/B Testing with Interactive LLM Agents

My review of the paper AgentA/B Automated and Scalable Web A/B Testing with Interactive LLM Agents

paperreview deeplearning agent nlp

Apr 21, 2025

Paper Review: M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

My review of the paper M1 Towards Scalable Test-Time Compute with Mamba Reasoning Models

paperreview deeplearning rnn distillation

Mar 24, 2025

Paper Review: RWKV-7 Goose with Expressive Dynamic State Evolution

My review of the paper RWKV-7 Goose with Expressive Dynamic State Evolution

paperreview deeplearning nlp rnn

Mar 17, 2025

Paper Review: Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities

My review of the paper Audio Flamingo 2 An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Ab...

paperreview deeplearning transformer nlp

Mar 10, 2025

Paper Review: Large Language Diffusion Models

LLaDA replaces autoregressive token generation with diffusion-based masked prediction, rivaling LLaMA3 8B while natur...

paperreview deeplearning nlp transformer

Mar 03, 2025

Paper Review: NeoBERT: A Next-Generation BERT

A compact 250M-parameter bidirectional encoder that incorporates RoPE, SwiGLU, and modern pretraining to outperform m...

paperreview deeplearning nlp transformer

Feb 03, 2025

Paper Review: Titans: Learning to Memorize at Test Time

A new architecture that pairs attention with a learnable long-term memory module, scaling to 2M+ tokens and outperfor...

paperreview deeplearning llm nlp

Jan 27, 2025

Paper Review: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

How pure reinforcement learning (without supervised fine-tuning) can teach LLMs to reason, producing open-source mode...

paperreview deeplearning llm rl

Jan 06, 2025

Paper Review: Training Large Language Models to Reason in a Continuous Latent Space

Coconut lets LLMs reason in latent space instead of generating text tokens, enabling breadth-first exploration of rea...

paperreview deeplearning nlp llm

Dec 23, 2024

Paper Review: Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

BERT rebuilt with modern tricks — 2 trillion training tokens, 8192 context length, Flash Attention, and rotary embedd...

paperreview deeplearning nlp transformer

Dec 16, 2024

Paper Review: Byte Latent Transformer: Patches Scale Better Than Tokens

My review of the paper Byte Latent Transformer Patches Scale Better Than Tokens

paperreview deeplearning nlp llm

Dec 09, 2024

Paper Review: Reverse Thinking Makes LLMs Stronger Reasoners

My review of the paper Reverse Thinking Makes LLMs Stronger Reasoners

paperreview deeplearning nlp llm

Nov 25, 2024

Paper Review: Project Sid: Many-agent simulations toward AI civilization

What happens when you put 1k AI agents in Minecraft and let them self-organize? They form governments, transmit cultu...

paperreview deeplearning nlp llm

Nov 11, 2024

Paper Review: Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

My review of the paper Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

paperreview deeplearning nlp llm

Oct 29, 2024

Paper Review: Unbounded: A Generative Infinite Game of Character Life Simulation

My review of the paper Unbounded A Generative Infinite Game of Character Life Simulation

paperreview deeplearning nlp llm

Oct 21, 2024

Paper Review: Contextual Document Embeddings

My review of the paper Contextual Document Embeddings

paperreview deeplearning transformer embedding

Oct 14, 2024

Paper Review: Differential Transformer

My review of the paper Differential Transformer

paperreview deeplearning transformer attention

Sep 23, 2024

Paper Review: Training Language Models to Self-Correct via Reinforcement Learning

My review of the paper Training Language Models to Self-Correct via Reinforcement Learning

paperreview deeplearning rl llm

Jun 17, 2024

Paper Review: Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

My review of the paper Samba Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

paperreview deeplearning swa nlp

Jun 10, 2024

Paper Review: σ-GPTs: A New Approach to Autoregressive Models

My review of the paper σ-GPTs A New Approach to Autoregressive Models

paperreview deeplearning nlp gpt

Nov 23, 2023

Paper Review: Orca 2: Teaching Small Language Models How to Reason

My review of the paper Orca 2 Teaching Small Language Models How to Reason

paperreview deeplearning nlp llm

Nov 20, 2023

Paper Review: Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models

My review of the paper Chain-of-Note Enhancing Robustness in Retrieval-Augmented Language Models

paperreview deeplearning nlp llm

Oct 30, 2023

Paper Review: Zephyr: Direct Distillation of LM Alignment

My review of the paper Zephyr Direct Distillation of LM Alignment

paperreview deeplearning nlp llm

Oct 26, 2023

Paper Review: Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture

My review of the paper Monarch Mixer A Simple Sub-Quadratic GEMM-Based Architecture

paperreview deeplearning nlp cv

Oct 23, 2023

Paper Review: Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

My review of the paper Self-RAG Learning to Retrieve, Generate, and Critique through Self-Reflection

paperreview deeplearning llm nlp

Oct 19, 2023

Paper Review: PaLI-3 Vision Language Models: Smaller, Faster, Stronger

My review of the paper PaLI-3 Vision Language Models Smaller, Faster, Stronger

paperreview deeplearning llm vlm

Oct 16, 2023

Paper Review: InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining

My review of the paper InstructRetro Instruction Tuning post Retrieval-Augmented Pretraining

paperreview deeplearning llm nlp

Oct 12, 2023

Paper Review: Mistral 7B

My review of the paper Mistral 7B

paperreview deeplearning llm nlp

Oct 09, 2023

Paper Review: Think before you speak: Training Language Models With Pause Tokens

My review of the paper Think before you speak Training Language Models With Pause Tokens

paperreview deeplearning llm nlp

Oct 05, 2023

Paper Review: QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

My review of the paper QA-LoRA Quantization-Aware Low-Rank Adaptation of Large Language Models

paperreview deeplearning llm nlp

Sep 28, 2023

Paper Review: DreamLLM: Synergistic Multimodal Comprehension and Creation

My review of the paper DreamLLM Synergistic Multimodal Comprehension and Creation

paperreview deeplearning llm cv

Aug 28, 2023

Paper Review: Giraffe: Adventures in Expanding Context Lengths in LLMs

My review of the paper Giraffe Adventures in Expanding Context Lengths in LLMs

paperreview deeplearning nlp llm

Aug 24, 2023

Paper Review: OBELISC: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents

My review of the paper OBELISC An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents

paperreview deeplearning nlp llm

Aug 10, 2023

Paper Review: Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

A systematic survey of what's broken in RLHF — from reward hacking to evaluation gaps — and what techniques can fix, ...

paperreview deeplearning nlp llm

Aug 10, 2023

Paper Review: UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition

My review of the paper UniversalNER Targeted Distillation from Large Language Models for Open Named Entity Recognition

paperreview deeplearning nlp llm

Aug 07, 2023

Paper Review: Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding

My review of the paper Skeleton-of-Thought Large Language Models Can Do Parallel Decoding

paperreview deeplearning nlp llm

Jul 27, 2023

Paper Review: Meta-Transformer: A Unified Framework for Multimodal Learning

My review of the paper Meta-Transformer A Unified Framework for Multimodal Learning

paperreview deeplearning nlp transformer

Jul 24, 2023

Paper Review: Retentive Network: A Successor to Transformer for Large Language Models

My review of the paper Retentive Network A Successor to Transformer for Large Language Models

paperreview deeplearning nlp transformer

Jul 20, 2023

Paper Review: Llama 2: Open Foundation and Fine-Tuned Chat Models

Meta's open-source LLM family (7B–70B parameters) with chat fine-tuning that matched or beat closed-source models on ...

paperreview deeplearning nlp finetuning

Jul 17, 2023

Paper Review: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning

My review of the paper Scaling Autoregressive Multi-Modal Models Pretraining and Instruction Tuning

paperreview deeplearning cv nlp

Jul 03, 2023

Paper Review: Multilingual End to End Entity Linking

My review of the paper Multilingual End to End Entity Linking

paperreview deeplearning nlp llm

Jun 19, 2023

Paper Review: Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision

My review of the paper Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision

paperreview deeplearning nlp llm

Jun 12, 2023

Paper Review: BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks

My review of the paper BiomedGPT A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, L...

paperreview deeplearning nlp gpt

Jun 08, 2023

Paper Review: StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners

My review of the paper StableRep Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners

paperreview deeplearning stablediffusion nlp

May 30, 2023

Paper Review: Chain of Hindsight Aligns Language Models with Feedback

My review of the paper Chain of Hindsight Aligns Language Models with Feedback

paperreview deeplearning nlp llm

May 18, 2023

Paper Review: DarkBERT: A Language Model for the Dark Side of the Internet

My review of the paper DarkBERT A Language Model for the Dark Side of the Internet

paperreview deeplearning nlp pretraining

May 10, 2023

Paper Review: ImageBind: One Embedding Space To Bind Them All

My review of the paper ImageBind One Embedding Space To Bind Them All

paperreview deeplearning nlp cv

May 08, 2023

Paper Review: Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

My review of the paper Distilling Step-by-Step Outperforming Larger Language Models with Less Training Data and Small...

paperreview deeplearning nlp distillation

May 04, 2023

Paper Review: Phoenix: Democratizing ChatGPT across Languages

My review of the paper Phoenix Democratizing ChatGPT across Languages

paperreview deeplearning nlp

Apr 24, 2023

Paper Review: Generative Agents: Interactive Simulacra of Human Behavior

My review of the paper Generative Agents Interactive Simulacra of Human Behavior

paperreview deeplearning nlp

Apr 02, 2023

Paper Review: BloombergGPT: A Large Language Model for Finance

Bloomberg trained a 50B-parameter LLM on 363B tokens of proprietary financial data. It crushes existing models on fin...

paperreview deeplearning nlp

Mar 20, 2023

Paper Review: Hyena Hierarchy: Towards Larger Convolutional Language Models

My review of the paper Hyena Hierarchy Towards Larger Convolutional Language Models

paperreview deeplearning nlp cv

Mar 13, 2023

Paper Review: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

My review of the paper Visual ChatGPT Talking, Drawing and Editing with Visual Foundation Models

paperreview deeplearning nlp transformer

Mar 09, 2023

Paper Review: PaLM-E: An Embodied Multimodal Language Model

My review of the paper PaLM-E An Embodied Multimodal Language Model

paperreview deeplearning nlp transformer

Mar 06, 2023

Paper Review: In-Context Instruction Learning

My review of the paper In-Context Instruction Learning

paperreview deeplearning nlp transformer

Feb 26, 2023

Paper Review: LLaMA: Open and Efficient Foundation Language Models

My review of the paper LLaMA Open and Efficient Foundation Language Models

paperreview deeplearning nlp transformer

Sep 09, 2022

Medical-chat bot: the history of our attempt to do it

The story of how our medical chatbot project was shut down after a lot of effort had been spent on it

blogpost nlp ner relationextraction

Dec 10, 2021

Paper Review: NL-Augmenter A Framework for Task-Sensitive Natural Language Augmentation

My review of the paper NL-Augmenter A Framework for Task-Sensitive Natural Language Augmentation and my contribution ...

paperreview deeplearning nlp augmentation

Oct 10, 2021

Paper Review: A Recipe For Arbitrary Text Style Transfer with Large Language Models

My review of the paper A Recipe For Arbitrary Text Style Transfer with Large Language Models

paperreview deeplearning nlp styletransfer

Jul 12, 2021

Paper Review: Long-Short Transformer Efficient Transformers for Language and Vision

My review of the paper Long-Short Transformer Efficient Transformers for Language and Vision

paperreview deeplearning cv nlp

Jun 02, 2021

Paper Review: ByT5 Towards a token-free future with pre-trained byte-to-byte models

My review of the paper ByT5 Towards a token-free future with pre-trained byte-to-byte models

paperreview deeplearning nlp pretraining

May 21, 2021

Paper Review: Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence

My review of the paper Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence

paperreview deeplearning nlp nlg

May 10, 2021

Paper Review: Are Pre-trained Convolutions Better than Pre-trained Transformers?

My review of the paper Are Pre-trained Convolutions Better than Pre-trained Transformers?

paperreview deeplearning nlp cnn

Mar 29, 2021

Paper Review: Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning

My review of the paper Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning.

paperreview nlp fewshotlearning augmentation

Aug 19, 2020

Paper Review: Language-agnostic BERT Sentence Embedding

My review of the paper Language-agnostic BERT Sentence Embedding.

paperreview deeplearning transformer nlp

May 23, 2020

Paper Review: SpERT Span-based Joint Entity and Relation Extraction with Transformer Pre-training

My review of the paper SpERT Span-based Joint Entity and Relation Extraction with Transformer Pre-training.

paperreview nlp deeplearning transformer

May 10, 2020

Paper Review: Named Entity Recognition without Labelled Data A Weak Supervision Approach

My review of the paper Named Entity Recognition without Labelled Data A Weak Supervision Approach.

paperreview nlp ner weaksupervision

Aug 09, 2019

Approaches to sentiment analysis on a small imbalanced dataset without Deep Learning

Let’s make logreg great again!

blogpost datascience nlp classification