Paper Review: NeoBERT: A Next-Generation BERT
03 March 2025
A compact 250M-parameter bidirectional encoder that incorporates RoPE, SwiGLU, and modern pretraining to outperform much larger models on the MTEB benchmark.
Data science, career and other topics
24 February 2025
Google's upgraded vision-language encoders that add self-supervised learning and online data curation to SigLIP, delivering major gains in localization, dense prediction, and multilingual retrieval.
17 February 2025
My review of the paper "Goku: Flow Based Video Generative Foundation Models"
03 February 2025
A new architecture that pairs attention with a learnable long-term memory module, scaling to 2M+ tokens and outperforming Transformers on language modeling, reasoning, genomics, and time series.
27 January 2025
How pure reinforcement learning (without supervised fine-tuning) can teach LLMs to reason, producing open-source models that rival OpenAI-o1 on math and coding benchmarks.
13 January 2025
My review of the paper "STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution"