Paper Review: Large Language Diffusion Models
10 March 2025
LLaDA replaces autoregressive token generation with diffusion-based masked prediction, rivaling LLaMA3 8B while addressing the reversal curse that affects standard autoregressive LLMs.
Data science, career and other topics
05 March 2025
Two Years of Studying and Practicing Foreign Languages: Spanish, German, and Japanese
03 March 2025
A compact 250M-parameter bidirectional encoder that incorporates RoPE, SwiGLU, and modern pretraining to outperform much larger models on the MTEB benchmark.
24 February 2025
Google's upgraded vision-language encoders that add self-supervised learning and online data curation to SigLIP, delivering major gains in localization, dense prediction, and multilingual retrieval.
17 February 2025
My review of the paper "Goku: Flow Based Video Generative Foundation Models"
03 February 2025
A new architecture that pairs attention with a learnable long-term memory module, scaling to 2M+ tokens and outperforming Transformers on language modeling, reasoning, genomics, and time series.