Collections
Discover the best community collections!
Collections including paper arxiv:2205.13147
-
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
Paper • 2404.15420 • Published • 11 -
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Paper • 2404.14619 • Published • 126 -
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 262 -
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
Paper • 2404.14047 • Published • 45
-
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
Paper • 2504.07128 • Published • 87 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 108 -
BitNet b1.58 2B4T Technical Report
Paper • 2504.12285 • Published • 86 -
FAST: Efficient Action Tokenization for Vision-Language-Action Models
Paper • 2501.09747 • Published • 29
-
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
Paper • 2306.00989 • Published • 1 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 66 -
Scalable Diffusion Models with Transformers
Paper • 2212.09748 • Published • 17 -
Matryoshka Representation Learning
Paper • 2205.13147 • Published • 27
-
All you need is a good init
Paper • 1511.06422 • Published • 1 -
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Paper • 2404.14507 • Published • 23 -
Efficient Transformer Encoders for Mask2Former-style models
Paper • 2404.15244 • Published • 1 -
Deep Residual Learning for Image Recognition
Paper • 1512.03385 • Published • 16
-
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
Paper • 2504.07128 • Published • 87 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 108 -
BitNet b1.58 2B4T Technical Report
Paper • 2504.12285 • Published • 86 -
FAST: Efficient Action Tokenization for Vision-Language-Action Models
Paper • 2501.09747 • Published • 29
-
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
Paper • 2306.00989 • Published • 1 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 66 -
Scalable Diffusion Models with Transformers
Paper • 2212.09748 • Published • 17 -
Matryoshka Representation Learning
Paper • 2205.13147 • Published • 27
-
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
Paper • 2404.15420 • Published • 11 -
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Paper • 2404.14619 • Published • 126 -
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 262 -
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
Paper • 2404.14047 • Published • 45
-
All you need is a good init
Paper • 1511.06422 • Published • 1 -
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Paper • 2404.14507 • Published • 23 -
Efficient Transformer Encoders for Mask2Former-style models
Paper • 2404.15244 • Published • 1 -
Deep Residual Learning for Image Recognition
Paper • 1512.03385 • Published • 16