SKT-ST-X-0-3B-V1

SKT AI LABS

Small Language Model: Mixture of Experts (3B Parameters), built by SKT AI LABS, India



1. Model Overview

SKT 3B-MoE is a compact Small Language Model (SLM) built on the ST-X-0 architecture, which adopts the Mixtral MoE design for better training stability. It delivers efficient and intelligent responses while keeping a small footprint.

Property              Value
Architecture          Mixture of Experts (MoE)
Total Parameters      ~3B
Active Parameters     ~1.1B (2 experts per token)
Hidden Size           2048
Number of Experts     4
Context Length        8K tokens
Training Tokens       40B
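The "2 experts per token" entry is what separates total from active parameters. The sketch below illustrates top-2 routing for a single token in plain Python, assuming ST-X-0 follows the Mixtral-style scheme: pick the two highest-scoring experts, renormalize their gate weights with a softmax, and sum their weighted outputs. It is not taken from the model's code; the scaling "experts" and the router logits are hypothetical stand-ins for the real feed-forward blocks and the learned router.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top2_moe(
    x: Vector,
    router_logits: Sequence[float],
    experts: Sequence[Callable[[Vector], Vector]],
    k: int = 2,
) -> Vector:
    """Combine the outputs of the k highest-scoring experts for one token.

    Only the selected experts are evaluated, which is why the number of
    parameters active per token is smaller than the total.
    """
    # Indices of the k largest router logits (ties keep the lower index first)
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    # Gate weights: softmax over the selected experts' logits only
    gates = softmax([router_logits[i] for i in top])
    out = [0.0] * len(x)
    for gate, i in zip(gates, top):
        y = experts[i](x)  # only the chosen experts run
        out = [o + gate * v for o, v in zip(out, y)]
    return out


# Hypothetical toy experts that just scale the input (stand-ins for FFN blocks)
experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]

# Experts 2 and 3 tie on the router, so each gets weight 0.5: 0.5*3 + 0.5*4 = 3.5
y = top2_moe([1.0, 1.0], router_logits=[0.0, 0.0, 5.0, 5.0], experts=experts)
print(y)
```

Taking a softmax over only the two selected logits gives the same gate weights as Mixtral's softmax over all experts followed by renormalization of the top two, and the two unselected experts are never evaluated for that token.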

2. Key Capabilities

Capability          Description
Bilingual           English & Hindi
Basic Coding        Python, logic, algorithms
Reasoning           Logical thinking, problem solving
Creative Writing    Stories, poems, roleplay
Knowledge QA        General knowledge, facts
Personality         Friendly, helpful, cute

3. Quick Start


Installation

pip install transformers accelerate torch
pip install peft bitsandbytes    # optional, only needed for Section 5

Basic Usage


from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model in half precision; device_map="auto" places it on the available devices
model = AutoModelForCausalLM.from_pretrained(
    "sKT-Ai-Labs/SKT-ST-X-0-3B",
    device_map="auto",
    torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained("sKT-Ai-Labs/SKT-ST-X-0-3B")

# Chat: wrap the prompt in the <|user|> / <|assistant|> template
prompt = "What is Quantum Physics?"
formatted = f"<|user|>\n{prompt}\n<|assistant|>\n"
inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)

# Decode only the newly generated tokens so the prompt is not echoed back
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response.strip())
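If you decode the full output without `skip_special_tokens=True`, or if the chat tags are tokenized as plain text, the reply has to be cut out of the decoded string instead. The helpers below are a minimal sketch of that, assuming the single-turn `<|user|>` / `<|assistant|>` template used in the example above. The function names are hypothetical, and stopping at a new `<|user|>` tag is an assumption about how a small model might run past the end of its turn.

```python
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"


def build_prompt(message: str) -> str:
    """Wrap one user message in the single-turn chat template."""
    return f"{USER_TAG}\n{message}\n{ASSISTANT_TAG}\n"


def extract_reply(decoded: str) -> str:
    """Return the assistant's reply from a fully decoded generation.

    Keeps the text after the last assistant tag and (assumption) drops
    anything from a new user tag onward, in case the model starts
    writing the next turn itself.
    """
    reply = decoded.split(ASSISTANT_TAG)[-1]
    reply = reply.split(USER_TAG)[0]
    return reply.strip()


prompt = build_prompt("What is Quantum Physics?")
decoded = prompt + "Quantum physics studies matter at tiny scales.\n<|user|>\nMore?"
print(extract_reply(decoded))  # -> Quantum physics studies matter at tiny scales.
```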

4. Sample Outputs

Q: Write a short story about a cat
A: Once upon a time, there was a little brown cat named Jake.
    He was very small but very brave. One day, Jake saw a beautiful bird...
Q: Explain quantum computing
A: Quantum computing is a type of computing that uses quantum mechanics
    to process information. Unlike classical computers that use bits (0 or 1),
    quantum computers use qubits that can be both 0 and 1 simultaneously...

5. Advanced Configuration

LoRA Fine-tuning (for custom tasks)

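from peft import LoraConfig, get_peft_model

# Assumes `model` has already been loaded as in Quick Start
lora_config = LoraConfig(
    r=8,                 # rank of the low-rank update matrices
    lora_alpha=32,       # the update is scaled by lora_alpha / r
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],  # attention projections
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM"
)

# Wrap the base model; only the LoRA matrices remain trainable
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only ~0.06% of parameters are trainable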
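To see where a figure like the one printed by `print_trainable_parameters()` comes from: LoRA freezes each adapted weight W of shape d_out x d_in and trains two factors, B (d_out x r) and A (r x d_in), so every adapted matrix adds r * (d_in + d_out) parameters. The sketch below does that arithmetic. The layer count and the assumption that all four attention projections are 2048 x 2048 are hypothetical, since the card does not list them.

```python
from typing import Iterable, Tuple


def lora_params(r: int, shapes: Iterable[Tuple[int, int]], num_layers: int) -> int:
    """Trainable parameters added by LoRA.

    Each adapted weight of shape (d_out, d_in) gets B (d_out x r) and
    A (r x d_in), i.e. r * (d_in + d_out) parameters.
    """
    per_layer = sum(r * (d_in + d_out) for d_out, d_in in shapes)
    return per_layer * num_layers


hidden = 2048
# Assumption: q/k/v/o are all hidden x hidden (no grouped-query attention)
attn_shapes = [(hidden, hidden)] * 4
num_layers = 16  # hypothetical; the card does not state the layer count

trainable = lora_params(r=8, shapes=attn_shapes, num_layers=num_layers)
print(trainable)
```

With these placeholder values each layer adds 131,072 trainable parameters (2,097,152 over 16 layers). The percentage reported for the real model depends on its actual layer count and on whether `k_proj` and `v_proj` are narrower, as they are with grouped-query attention.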

4-bit Quantization (for low VRAM)

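from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit weights via bitsandbytes (pip install bitsandbytes)
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16  # compute in fp16 while weights are stored in 4-bit
)

# Load the model with the quantization config applied
model = AutoModelForCausalLM.from_pretrained(
    "sKT-Ai-Labs/SKT-ST-X-0-3B",
    quantization_config=quant_config,
    device_map="auto"
)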
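A rough rule for choosing a precision: weight memory is the parameter count times the bits per parameter, divided by 8. The helper below applies it to the ~3B total parameters. It is a back-of-the-envelope sketch that counts weights only, ignoring activations, the KV cache, quantization metadata, and any layers left unquantized, so real usage will be somewhat higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in decimal GB (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Compare common precisions for a ~3B-parameter model
for label, bits in [("float32", 32), ("float16", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(3e9, bits):.1f} GB")
```

This gives about 12 GB in float32, 6 GB in float16 and 1.5 GB in 4-bit. All four experts must be resident in memory even though only two run per token, so the total parameter count, not the active one, is what matters here.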

6. License

Both the code repository and the model weights are released under the Apache-2.0 License. See License and THIRD PARTY NOTICES for details.


7. Contact Us

If you have any questions, please reach out at support@sktailabs.in.


8. Citation

@misc{SKT-ST-X-0-3B,
  author = {{SKT AI LABS, India}},
  title = {SKT-ST-X-0-3B: A Compact Mixture of Experts Model},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/sKT-Ai-Labs/SKT-ST-X-0-3B}
}
