- Attention Is All You Need
  Paper • 1706.03762 • Published • 125
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 29
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 10
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 23
Taufiq Dwi Purnomo
taufiqdp
AI & ML interests
SLM, VLM
Recent Activity
upvoted a collection 1 day ago
Gemma 4 QAT Q4_0
liked a model 2 days ago
nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16
liked a model 4 days ago
google/gemma-4-12B-it