-
microsoft/Phi-4-mini-flash-reasoning
Text Generation • 4B • Updated • 1.05k • 279 -
microsoft/Phi-4-mini-reasoning
Text Generation • 4B • Updated • 57.7k • • 232 -
microsoft/Phi-4-reasoning
Text Generation • 15B • Updated • 9.15k • 227 -
microsoft/Phi-4-reasoning-plus
Text Generation • 15B • Updated • 26.4k • 343
Collections
Discover the best community collections!
Collections including paper arxiv:2504.21318
-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 24 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 86 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 156 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25
-
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper • 2508.06471 • Published • 212 -
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Paper • 2508.14444 • Published • 51 -
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Paper • 2507.06261 • Published • 67 -
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
Paper • 2506.13585 • Published • 278
-
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
Paper • 2501.09751 • Published • 46 -
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper • 2501.09686 • Published • 41 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 452 -
s1: Simple test-time scaling
Paper • 2501.19393 • Published • 126
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 31 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 125 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 31 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 125 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
EuroBERT: Scaling Multilingual Encoders for European Languages
Paper • 2503.05500 • Published • 81 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 452 -
Qwen2.5-1M Technical Report
Paper • 2501.15383 • Published • 73 -
Baichuan-Omni-1.5 Technical Report
Paper • 2501.15368 • Published • 61
-
microsoft/Phi-4-mini-flash-reasoning
Text Generation • 4B • Updated • 1.05k • 279 -
microsoft/Phi-4-mini-reasoning
Text Generation • 4B • Updated • 57.7k • • 232 -
microsoft/Phi-4-reasoning
Text Generation • 15B • Updated • 9.15k • 227 -
microsoft/Phi-4-reasoning-plus
Text Generation • 15B • Updated • 26.4k • 343
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 31 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 125 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 24 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 86 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 156 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 31 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 125 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper • 2508.06471 • Published • 212 -
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Paper • 2508.14444 • Published • 51 -
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Paper • 2507.06261 • Published • 67 -
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
Paper • 2506.13585 • Published • 278
-
EuroBERT: Scaling Multilingual Encoders for European Languages
Paper • 2503.05500 • Published • 81 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 452 -
Qwen2.5-1M Technical Report
Paper • 2501.15383 • Published • 73 -
Baichuan-Omni-1.5 Technical Report
Paper • 2501.15368 • Published • 61
-
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
Paper • 2501.09751 • Published • 46 -
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper • 2501.09686 • Published • 41 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 452 -
s1: Simple test-time scaling
Paper • 2501.19393 • Published • 126