Hugging Face
Deqing Fu
PRO
deqing
12
21
10
14 followers · 19 following
https://deqingfu.github.io
DeqingFu
DeqingFu
AI & ML interests
None yet
Recent Activity
authored a paper about 20 hours ago: Value-Aware Stochastic KV Cache Eviction for Reasoning Models
liked a model 1 day ago: google/tabfm-1.0.0-pytorch
updated a model 27 days ago: deqing/convergent-llama-300M-muon-6digit-addition_6digit_custom6
Organizations
deqing's models (158)
deqing/convergent-llama-300M-adamw-swap_numbers • Text Generation • 0.3B • Updated Mar 29 • 8
deqing/convergent-llama-300M-adamw-isolate • Text Generation • 0.3B • Updated Mar 29 • 7
deqing/convergent-llama-300M-adamw-unigram • Text Generation • 0.3B • Updated Mar 29 • 9
deqing/convergent-mamba2-300M-muon-original • Text Generation • 0.3B • Updated Mar 29 • 11
deqing/llama-window-4-old • Text Generation • 0.3B • Updated Mar 29 • 8
deqing/llama-window-2-old • Text Generation • 0.3B • Updated Mar 29 • 9
deqing/convergent-llama-300M-muon-unk_number • Text Generation • 0.3B • Updated Mar 29 • 6
deqing/convergent-llama-300M-muon-swap_numbers • Text Generation • 0.3B • Updated Mar 29 • 8
deqing/llama-isolate-old • Text Generation • 0.3B • Updated Mar 29 • 8
deqing/convergent-llama-300M-muon-fivegram • Text Generation • 0.3B • Updated Mar 29 • 8
deqing/convergent-llama-300M-muon-permute • Text Generation • 0.3B • Updated Mar 29 • 6
deqing/convergent-llama-300M-muon-bigram • Text Generation • 0.3B • Updated Mar 29 • 8
deqing/convergent-llama-300M-muon-unigram • Text Generation • 0.3B • Updated Mar 29 • 9
deqing/mamba2-300M-v5-mamba2 • Text Generation • 0.3B • Updated Mar 29 • 20
deqing/lstm-12layer-v5 • 0.2B • Updated Mar 29 • 6
deqing/llama-300M-v5-original • Text Generation • 0.3B • Updated Mar 27 • 8
deqing/llama-300M-v5-unk_number • Text Generation • 0.3B • Updated Mar 26 • 9
deqing/llama-300M-v5-addition_3digit_adamw • 0.3B • Updated Mar 25 • 3
deqing/llama-300M-v5-addition_3digit • 0.3B • Updated Mar 25 • 3
deqing/llama-300M-v5-addition • Text Generation • 0.3B • Updated Mar 25 • 10
deqing/llama-300M-v5-addition_adamw • Text Generation • 0.3B • Updated Mar 24 • 9
deqing/llama-300M-v5-addition_adamw-old • 0.3B • Updated Mar 22 • 2
deqing/llama-300M-v5-addition_3digit-old • 0.3B • Updated Mar 22 • 2
deqing/llama-300M-v5-adamw-addition_3digit_adamw-old • 0.3B • Updated Mar 22 • 2
deqing/llama-300M-v5-original-random_init_sft • Updated Mar 21
deqing/llama-300M-v5-isolate_sft • Updated Mar 21
deqing/llama-300M-v5-swap_numbers_sft • Updated Mar 21
deqing/llama-300M-v5-addition-old • 0.3B • Updated Mar 21 • 2
deqing/llama-300M-v5-original_sft • Updated Mar 20
deqing/llama-300M-v5-bigram • Text Generation • 0.3B • Updated Mar 20 • 8