Deqing Fu PRO

deqing


https://deqingfu.github.io

AI & ML interests

None yet

Recent Activity

authored a paper about 20 hours ago

Value-Aware Stochastic KV Cache Eviction for Reasoning Models

liked a model 1 day ago

google/tabfm-1.0.0-pytorch

updated a model 27 days ago

deqing/convergent-llama-300M-muon-6digit-addition_6digit_custom6



deqing's models (158)

deqing/convergent-llama-300M-adamw-swap_numbers

Text Generation • 0.3B • Updated Mar 29 • 8

deqing/convergent-llama-300M-adamw-isolate

Text Generation • 0.3B • Updated Mar 29 • 7

deqing/convergent-llama-300M-adamw-unigram

Text Generation • 0.3B • Updated Mar 29 • 9

deqing/convergent-mamba2-300M-muon-original

Text Generation • 0.3B • Updated Mar 29 • 11

deqing/llama-window-4-old

Text Generation • 0.3B • Updated Mar 29 • 8

deqing/llama-window-2-old

Text Generation • 0.3B • Updated Mar 29 • 9

deqing/convergent-llama-300M-muon-unk_number

Text Generation • 0.3B • Updated Mar 29 • 6

deqing/convergent-llama-300M-muon-swap_numbers

Text Generation • 0.3B • Updated Mar 29 • 8

deqing/llama-isolate-old

Text Generation • 0.3B • Updated Mar 29 • 8

deqing/convergent-llama-300M-muon-fivegram

Text Generation • 0.3B • Updated Mar 29 • 8

deqing/convergent-llama-300M-muon-permute

Text Generation • 0.3B • Updated Mar 29 • 6

deqing/convergent-llama-300M-muon-bigram

Text Generation • 0.3B • Updated Mar 29 • 8

deqing/convergent-llama-300M-muon-unigram

Text Generation • 0.3B • Updated Mar 29 • 9

deqing/mamba2-300M-v5-mamba2

Text Generation • 0.3B • Updated Mar 29 • 20

deqing/lstm-12layer-v5

0.2B • Updated Mar 29 • 6

deqing/llama-300M-v5-original

Text Generation • 0.3B • Updated Mar 27 • 8

deqing/llama-300M-v5-unk_number

Text Generation • 0.3B • Updated Mar 26 • 9

deqing/llama-300M-v5-addition_3digit_adamw

0.3B • Updated Mar 25 • 3

deqing/llama-300M-v5-addition_3digit

0.3B • Updated Mar 25 • 3

deqing/llama-300M-v5-addition

Text Generation • 0.3B • Updated Mar 25 • 10

deqing/llama-300M-v5-addition_adamw

Text Generation • 0.3B • Updated Mar 24 • 9

deqing/llama-300M-v5-addition_adamw-old

0.3B • Updated Mar 22 • 2

deqing/llama-300M-v5-addition_3digit-old

0.3B • Updated Mar 22 • 2

deqing/llama-300M-v5-adamw-addition_3digit_adamw-old

0.3B • Updated Mar 22 • 2

deqing/llama-300M-v5-original-random_init_sft

deqing/llama-300M-v5-isolate_sft

deqing/llama-300M-v5-swap_numbers_sft

deqing/llama-300M-v5-addition-old

0.3B • Updated Mar 21 • 2

deqing/llama-300M-v5-original_sft

deqing/llama-300M-v5-bigram

Text Generation • 0.3B • Updated Mar 20 • 8