Tag: bert
3 posts
Paper Review: NeoBERT: A Next-Generation BERT
A compact 250M-parameter bidirectional encoder that incorporates RoPE, SwiGLU, and modern pretraining to outperform m...
Paper Review: Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
BERT rebuilt with modern tricks — 2 trillion training tokens, 8192 context length, Flash Attention, and rotary embedd...
Paper Review: SpERT: Span-based Joint Entity and Relation Extraction with Transformer Pre-training
My review of the paper SpERT: Span-based Joint Entity and Relation Extraction with Transformer Pre-training.