Text Generation
MLX
Safetensors
qwen3_5_moe
mixture-of-experts
reap
pruned
expert-pruning
qwen3.6
quantized
oq4
conversational
arxiv:2510.13999
4-bit precision
Instructions to use stamsam/Qwen3.6-28B-REAP-oQ4 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use stamsam/Qwen3.6-28B-REAP-oQ4 with MLX:
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("stamsam/Qwen3.6-28B-REAP-oQ4")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use stamsam/Qwen3.6-28B-REAP-oQ4 with Pi:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "stamsam/Qwen3.6-28B-REAP-oQ4"
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "stamsam/Qwen3.6-28B-REAP-oQ4" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use stamsam/Qwen3.6-28B-REAP-oQ4 with Hermes Agent:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "stamsam/Qwen3.6-28B-REAP-oQ4"
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default stamsam/Qwen3.6-28B-REAP-oQ4
Run Hermes
hermes
- MLX LM
How to use stamsam/Qwen3.6-28B-REAP-oQ4 with MLX LM:
Generate or start a chat session
# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "stamsam/Qwen3.6-28B-REAP-oQ4"
Run an OpenAI-compatible server
# Install MLX LM
uv tool install mlx-lm
# Start the server (listens on port 8080 by default)
mlx_lm.server --model "stamsam/Qwen3.6-28B-REAP-oQ4"
# Call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "stamsam/Qwen3.6-28B-REAP-oQ4",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
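The same request can also be sent from Python. The following is a minimal sketch using only the standard library, not an official client. It assumes the server is reachable at `http://localhost:8080/v1` (the address used in the Pi and Hermes configurations above; adjust `BASE_URL` if you start the server on another port) and that the response follows the usual OpenAI chat-completions shape. The helper names `build_chat_payload` and `chat`, and the `max_tokens` default of 256, are illustrative choices rather than anything defined by this repo.

```python
import json
import urllib.request

MODEL_ID = "stamsam/Qwen3.6-28B-REAP-oQ4"
BASE_URL = "http://localhost:8080/v1"  # assumption: local mlx_lm.server on port 8080


def build_chat_payload(user_text, model=MODEL_ID, max_tokens=256):
    """Build the JSON body for a /chat/completions request (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,
    }


def chat(user_text, base_url=BASE_URL, timeout=300):
    """POST one user message and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(user_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumes the OpenAI-style response layout: choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello"))` should print the model's reply. Keeping the payload builder separate from the network call makes it easy to inspect or log exactly what is sent.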
Qwen3.6-28B-REAP-oQ4
An oQ4 / MLX quant of 0xSero/Qwen3.6-28B-REAP, made to be easier to run locally on Apple Silicon.
This repo is just the quantized version. The model work, source release, and credit belong to 0xSero.
Go check out the original repo here: 0xSero/Qwen3.6-28B-REAP
Big shout out to 0xSero and Sybil Solutions for putting the original model out there. If this version is useful, please like and support the source model too.
Notes
- Quantized for MLX-compatible local use.
- No extra fine-tuning was done here.
- No benchmark claims yet. Test it on your own prompts.
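Since there are no benchmark numbers, one way to try the model on your own prompts is a small smoke-test loop. The sketch below is only an illustration: `run_prompts` and `make_mlx_generate` are hypothetical helper names, the `max_tokens` value is arbitrary, and the mlx-lm calls mirror the `load` / `generate` usage shown earlier on this page. The timing is plain wall-clock time per prompt, not a benchmark.

```python
import time


def run_prompts(generate_fn, prompts):
    """Run each prompt through generate_fn, recording the output and wall time."""
    results = []
    for prompt in prompts:
        start = time.perf_counter()
        output = generate_fn(prompt)
        elapsed = time.perf_counter() - start
        results.append({"prompt": prompt, "output": output, "seconds": elapsed})
    return results


def make_mlx_generate(model_id="stamsam/Qwen3.6-28B-REAP-oQ4", max_tokens=256):
    """Return a generate_fn backed by mlx-lm (requires Apple Silicon and mlx-lm)."""
    # Imported lazily so run_prompts itself has no third-party dependencies.
    from mlx_lm import load, generate

    model, tokenizer = load(model_id)

    def generate_fn(user_text):
        messages = [{"role": "user", "content": user_text}]
        prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
        return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)

    return generate_fn
```

A typical run would call `run_prompts(make_mlx_generate(), my_prompts)` with a list of prompts taken from your own workload and then read through the outputs by hand. Passing the generator in as a function also lets you swap in a different backend, such as a call to the local server, without changing the loop.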
License
Apache 2.0, following the source checkpoint.
- Downloads last month: 344
- Model size: 5B params
- Tensor type: BF16 · U32