🇯🇵 LFM2.5-1.2B-JP-202606-ONNX
LiquidAI/LFM2.5-1.2B-JP-202606 is our general-purpose Japanese chat model. This repository is its ONNX export for cross-platform deployment via ONNX Runtime, Transformers.js, and the WebGPU stack. The weights and chat template are identical to the original; the model is compiled into ONNX graphs at multiple precisions, including a WebGPU-friendly INT4 + FP16 mix.
📦 Files
| File | Format | Embedding | Weights | Cache / activations | Approx. size |
|---|---|---|---|---|---|
| `onnx/model.onnx` | FP32 | FP32 | FP32 | FP32 | 4.7 GB |
| `onnx/model_fp16.onnx` | FP16 | FP16 | FP16 | FP16 | 2.4 GB |
| `onnx/model_q4.onnx` | INT4 | INT4 (GatherBlockQuantized) | INT4 (MatMulNBits) | FP32 | 834 MB |
| `onnx/model_q4f16.onnx` | INT4 + FP16 | INT4 + FP16 scales | INT4 + FP16 scales | FP16 | 744 MB |
| `onnx/model_q4f32.onnx` | INT4 (MatMul-only) | FP32 (kept) | INT4 (MatMulNBits) | FP32 | 1.2 GB |
| `onnx/model_q8.onnx` | INT8 (MatMul-only) | FP32 (kept) | INT8 (MatMulNBits) | FP32 | 1.8 GB |
`model_q4f16.onnx` is the recommended variant for WebGPU: INT4 weights with FP16 scales, FP16 KV cache and conv state I/O, and FP32 logits via an inserted Cast. This is the format Transformers.js targets for browser inference.
Each .onnx file ships its weights in one or more .onnx_data chunks (≤ 2 GB each, per the ONNX external-data convention).
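As a rough sketch of how a row of the table maps to files on disk, the helper below builds the graph path and a glob pattern for its external-data chunks from a precision suffix. The names `variant_patterns` and `select_files`, the example listing, and the exact chunk names (`.onnx_data`, `.onnx_data_1`) are assumptions for illustration; check the repository's file listing for the real names.

```python
from fnmatch import fnmatch


def variant_patterns(suffix: str) -> list[str]:
    """Return glob patterns for one variant's graph and its data chunks.

    `suffix` is the part after `model_` in the file table ("fp16", "q4", ...);
    pass "" for the FP32 graph, which is stored as plain `model.onnx`.
    """
    stem = f"model_{suffix}" if suffix else "model"
    graph = f"onnx/{stem}.onnx"
    # The trailing * is meant to cover one or more data chunks (naming assumed).
    return [graph, f"{graph}_data*"]


def select_files(listing: list[str], suffix: str) -> list[str]:
    """Filter a repository file listing down to one variant."""
    patterns = variant_patterns(suffix)
    return [name for name in listing if any(fnmatch(name, p) for p in patterns)]


# Hypothetical listing, for illustration only.
example_listing = [
    "onnx/model.onnx",
    "onnx/model.onnx_data",
    "onnx/model.onnx_data_1",
    "onnx/model_q4.onnx",
    "onnx/model_q4.onnx_data",
    "onnx/model_q4f16.onnx",
    "onnx/model_q4f16.onnx_data",
]
print(select_files(example_listing, "q4"))
```

The same patterns could be passed as `allow_patterns` to `huggingface_hub.snapshot_download` to fetch a single variant instead of the whole repository.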
🏃 Inference
Transformers.js (browser / Node.js, WebGPU)
```js
// npm i @huggingface/transformers
import { pipeline } from "@huggingface/transformers";

// Load the recommended WebGPU variant (INT4 weights, FP16 scales).
const generator = await pipeline(
  "text-generation",
  "LiquidAI/LFM2.5-1.2B-JP-202606-ONNX",
  { dtype: "q4f16", device: "webgpu" }
);

const messages = [
  { role: "system", content: "You are a helpful assistant trained by Liquid AI." },
  { role: "user", content: "日本の首都は?" },
];

const output = await generator(messages, {
  max_new_tokens: 256,
  do_sample: true,
  temperature: 0.1,
  top_k: 50,
  repetition_penalty: 1.05,
});
console.log(output[0].generated_text);
```
ONNX Runtime (Python)
```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

REPO = "LiquidAI/LFM2.5-1.2B-JP-202606-ONNX"

tokenizer = AutoTokenizer.from_pretrained(REPO)
# The path is relative to a local copy of this repository; the .onnx_data
# chunks must sit next to the .onnx graph.
session = ort.InferenceSession("onnx/model_q4.onnx", providers=["CPUExecutionProvider"])

# Map ORT type names to numpy dtypes so fp16 / q4f16 variants work too.
ORT_DTYPE = {"tensor(float)": np.float32, "tensor(float16)": np.float16, "tensor(int64)": np.int64}

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "日本の首都は?"}],
    tokenize=False,
    add_generation_prompt=True,
)
input_ids = np.array([tokenizer.encode(prompt, add_special_tokens=False)], dtype=np.int64)
seq_len = input_ids.shape[1]

feed = {
    "input_ids": input_ids,
    "attention_mask": np.ones((1, seq_len), dtype=np.int64),
    "position_ids": np.arange(seq_len, dtype=np.int64).reshape(1, -1),
}
# Zero-fill every remaining graph input (such as cache and conv state),
# using 1 for each symbolic dimension.
for inp in session.get_inputs():
    if inp.name not in feed:
        shape = [d if isinstance(d, int) else 1 for d in inp.shape]
        feed[inp.name] = np.zeros(shape, dtype=ORT_DTYPE[inp.type])

# Single forward pass: greedily pick the first generated token.
logits = session.run(None, feed)[0]
next_id = int(np.argmax(logits[0, -1]))
print(tokenizer.decode([next_id]))
```
For full multi-turn generation with stateful KV cache feedback, see the LiquidONNX inference example (works against this repo unchanged).
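The single-step example stops after one token. The loop below is a minimal sketch of what "stateful cache feedback" means in general: the first call sees the whole prompt, and every later call sees only the newest token plus the state returned by the previous call. `greedy_decode`, `StepFn`, and `toy_step` are hypothetical names; `toy_step` is a stand-in so the sketch runs without a model, and a real `step` would wrap `session.run` and map the graph's state outputs back to its state inputs (those input and output names are not specified here).

```python
from typing import Callable, Dict, List, Sequence, Tuple

Cache = Dict[str, int]
StepFn = Callable[[Sequence[int], Cache], Tuple[List[float], Cache]]


def greedy_decode(step: StepFn, prompt_ids: Sequence[int], eos_id: int,
                  max_new_tokens: int = 32) -> List[int]:
    """Greedy generation that feeds the returned state into the next step."""
    cache: Cache = {}
    step_input = list(prompt_ids)  # prefill: the first call sees the full prompt
    generated: List[int] = []
    for _ in range(max_new_tokens):
        logits, cache = step(step_input, cache)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        if next_id == eos_id:
            break
        generated.append(next_id)
        step_input = [next_id]  # decode: later calls see only the new token
    return generated


VOCAB = 8


def toy_step(ids: Sequence[int], cache: Cache) -> Tuple[List[float], Cache]:
    """Stand-in for a model call: predicts (last token + 1) mod VOCAB."""
    seen = cache.get("seen", 0) + len(ids)  # state carried between calls
    logits = [0.0] * VOCAB
    logits[(ids[-1] + 1) % VOCAB] = 1.0
    return logits, {"seen": seen}


print(greedy_decode(toy_step, [1, 2, 3], eos_id=7))  # → [4, 5, 6]
```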
🗒️ Model Details
LFM2.5-1.2B-JP-202606 is a general-purpose Japanese-capable chat model:
- Number of parameters: 1.17B
- Number of layers: 16 (10 double-gated LIV convolution blocks + 6 GQA blocks)
- Context length: 32,768 tokens
- Vocabulary size: 65,536
- Knowledge cutoff: Mid-2024
- Languages: English, Japanese
- Recommended generation parameters:
  - temperature: 0.1
  - top_k: 50
  - repetition_penalty: 1.05
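To make the recommended generation parameters concrete, here is a small sketch of how a sampler might combine them for one step. It uses one common ordering (repetition penalty, then temperature, then top-k); the function names are hypothetical, and the runtime you use may apply these steps differently.

```python
import math
import random
from typing import List, Sequence


def next_token_probs(logits: Sequence[float], prev_ids: Sequence[int],
                     temperature: float = 0.1, top_k: int = 50,
                     repetition_penalty: float = 1.05) -> List[float]:
    """Turn raw logits into a sampling distribution (assumed ordering)."""
    scores = list(logits)
    # Repetition penalty: push down tokens that were already generated.
    for tok in set(prev_ids):
        s = scores[tok]
        scores[tok] = s / repetition_penalty if s > 0 else s * repetition_penalty
    # Temperature: values below 1 sharpen the distribution.
    scores = [s / temperature for s in scores]
    # Top-k: only the k highest-scoring tokens stay eligible.
    k = min(top_k, len(scores))
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    keep = set(order[:k])
    top = max(scores[i] for i in keep)
    # Softmax over the kept tokens, shifted by the max for numerical stability.
    weights = [math.exp(scores[i] - top) if i in keep else 0.0
               for i in range(len(scores))]
    total = sum(weights)
    return [w / total for w in weights]


def sample(probs: Sequence[float], rng: random.Random) -> int:
    """Draw one token id from the distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = next_token_probs([2.0, 1.0, 0.5, -1.0], prev_ids=[0], top_k=2)
print([round(p, 4) for p in probs])
print(sample(probs, random.Random(0)))
```

With a temperature as low as 0.1, almost all probability mass lands on the top-scoring token, so sampling behaves close to greedy decoding.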
Refer to the base model card for benchmark scores, training details, and use-case recommendations.
| Model | Description |
|---|---|
| LFM2.5-1.2B-JP-202606 | Original checkpoint in native format. Best for fine-tuning or inference with Transformers and vLLM. |
| LFM2.5-1.2B-JP-202606-GGUF | Quantized format for llama.cpp and compatible tools. |
| LFM2.5-1.2B-JP-202606-ONNX | ONNX Runtime format for cross-platform deployment (ORT, Transformers.js, WebGPU). |
| LFM2.5-1.2B-JP-202606-MLX-8bit | MLX format for Apple Silicon. |
We recommend this model for agentic workflows, tool use, structured outputs, bilingual English–Japanese assistants, and on-device personal-assistant applications. It is not recommended for knowledge-intensive tasks. It performs best when given clear, explicit instructions that define the task, expected behavior, and output format.
Chat Template
LFM2.5 uses a ChatML-like format. See the Chat Template documentation for details.

```
<|startoftext|><|im_start|>system
You are a helpful assistant trained by Liquid AI.<|im_end|>
<|im_start|>user
日本の首都は?<|im_end|>
<|im_start|>assistant
```
Use tokenizer.apply_chat_template() to format messages automatically — the included tokenizer.json and chat_template.jinja work unchanged across Transformers, Transformers.js, and ORT.
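For intuition only, the format shown above can be approximated by hand. The sketch below is a simplified stand-in for the bundled `chat_template.jinja`, not a replacement for it: `render_chat` is a hypothetical helper, it ignores tools and any other template features, and the exact whitespace (including the newline after the final `assistant` tag) is an assumption.

```python
from typing import Dict, List


def render_chat(messages: List[Dict[str, str]],
                add_generation_prompt: bool = True) -> str:
    """Render messages in the ChatML-like layout shown in this section."""
    parts = ["<|startoftext|>"]
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant trained by Liquid AI."},
    {"role": "user", "content": "日本の首都は?"},
]
print(render_chat(messages))
```

In real code, prefer `tokenizer.apply_chat_template()` so the prompt always matches the template shipped with the model.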
Tool Use
Tool use follows the same Pythonic function-call protocol as the base model: <|tool_call_start|>[fn(...)]<|tool_call_end|>. See the Tool Use documentation for the full guide.
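Because the calls are written as Python expressions, one way to read them back is with the standard `ast` module rather than `eval`. The sketch below is an assumption-laden illustration: `parse_tool_calls` and the `get_weather` tool are hypothetical, and it only accepts simple calls with keyword arguments whose values are literals. The Tool Use documentation remains the reference for the full protocol.

```python
import ast
import re
from typing import Any, Dict, List, Tuple

TOOL_CALL_RE = re.compile(
    r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL
)


def parse_tool_calls(text: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Extract (function name, keyword arguments) pairs from model output."""
    calls: List[Tuple[str, Dict[str, Any]]] = []
    for payload in TOOL_CALL_RE.findall(text):
        tree = ast.parse(payload.strip(), mode="eval")
        if not isinstance(tree.body, ast.List):
            raise ValueError("expected a list of calls, e.g. [fn(...)]")
        for node in tree.body.elts:
            if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
                raise ValueError("expected plain function calls")
            if node.args:
                raise ValueError("this sketch only handles keyword arguments")
            # literal_eval evaluates literals only, so no code is executed.
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            calls.append((node.func.id, kwargs))
    return calls


# Hypothetical model output, for illustration only.
reply = '<|tool_call_start|>[get_weather(city="Tokyo", days=3)]<|tool_call_end|>'
print(parse_tool_calls(reply))
```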
🛠️ How this export was produced
These ONNX artifacts are produced by the Liquid4All/onnx-export toolchain:
```bash
uv run lfm2-export LiquidAI/LFM2.5-1.2B-JP-202606 --precision
# (plus a one-shot Q4 → Q4F16 conversion using lfm2_moe.export.convert_q4_to_fp16)
```
Each variant is verified against the PyTorch reference on a coherence-test prompt suite before publication.
📬 Contact
- Got questions or want to connect? Join our Discord community
- If you are interested in custom solutions with edge deployment, please contact our sales team.
Citation
```bibtex
@article{liquidai2025lfm2,
  title={LFM2 Technical Report},
  author={Liquid AI},
  journal={arXiv preprint arXiv:2511.23404},
  year={2025}
}
```