SigLIP SoViT-400M/14 Vision Encoder 384px (GGUF)
GGUF conversion of google/siglip-so400m-patch14-384 for use with CrispEmbed.
- Architecture: SigLIP SoViT-400M/14 vision encoder (shape-optimized)
- Parameters: 428M
- Output: 1152-dimensional L2-normalized embeddings
- Input: 384x384 RGB image with SigLIP normalization (mean=0.5, std=0.5)
- Patch size: 14
- File size: ~1.6 GB
- Source: google/siglip-so400m-patch14-384
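
The input spec above (384x384 RGB, mean=0.5, std=0.5, patch size 14) can be sketched in NumPy as follows. This is only an illustration of the arithmetic, not CrispEmbed's actual preprocessing code: the helper names `siglip_normalize` and `patch_grid` are hypothetical, resizing is left out, and the patch count assumes non-overlapping patches with any remainder pixels dropped.

```python
import numpy as np

IMAGE_SIZE = 384  # input resolution from the spec above
PATCH_SIZE = 14   # patch size from the spec above


def siglip_normalize(image_u8: np.ndarray) -> np.ndarray:
    """Map an HxWx3 uint8 RGB array to float32 in [-1, 1] (mean=0.5, std=0.5).

    Hypothetical helper for illustration; the image is assumed to be
    already resized to 384x384.
    """
    if image_u8.dtype != np.uint8 or image_u8.ndim != 3 or image_u8.shape[2] != 3:
        raise ValueError("expected an HxWx3 uint8 RGB array")
    scaled = image_u8.astype(np.float32) / 255.0  # [0, 255] -> [0, 1]
    return (scaled - 0.5) / 0.5                   # [0, 1]   -> [-1, 1]


def patch_grid(image_size: int = IMAGE_SIZE, patch_size: int = PATCH_SIZE):
    """Return (patches per side, total patches), assuming floor division."""
    side = image_size // patch_size
    return side, side * side


if __name__ == "__main__":
    dummy = np.zeros((IMAGE_SIZE, IMAGE_SIZE, 3), dtype=np.uint8)  # placeholder image
    pixels = siglip_normalize(dummy)
    print(pixels.shape, pixels.min(), pixels.max())
    print(patch_grid())
```

With mean and std both 0.5, normalization is simply a rescale of pixel values from [0, 1] to [-1, 1].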
Usage
# Embed a single image
crispembed -m siglip-so400m-patch14-384 --image photo.jpg
# Batch processing
crispembed -m siglip-so400m-patch14-384 --image-dir ./photos/ --output embeddings.bin
About SigLIP SoViT-400M
SoViT-400M is a shape-optimized SigLIP variant: its width, depth, and MLP size are tuned jointly rather than scaled uniformly, so compute is spent more efficiently. With 384px input and patch size 14, each image is split into a 27x27 grid (729 patches) before encoding.
Notes
- All output embeddings are L2-normalized.
- This is a GGUF conversion; weights are numerically equivalent to the original HuggingFace model.
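
Because the embeddings are L2-normalized, the cosine similarity between two images reduces to a plain dot product. The sketch below shows this with NumPy. The random vectors are placeholders standing in for real model output, and the helper names `l2_normalize` and `cosine_similarity_matrix` are hypothetical, not part of CrispEmbed.

```python
import numpy as np

EMBED_DIM = 1152  # embedding size listed above


def l2_normalize(vectors: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm (shown only to build placeholder data)."""
    norms = np.linalg.norm(vectors, axis=-1, keepdims=True)
    return vectors / np.maximum(norms, eps)


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """For unit-norm rows, pairwise cosine similarity is just a matrix product."""
    return embeddings @ embeddings.T


# Placeholder data: four random unit vectors, NOT real model embeddings.
rng = np.random.default_rng(0)
fake_embeddings = l2_normalize(rng.standard_normal((4, EMBED_DIM)).astype(np.float32))

sims = cosine_similarity_matrix(fake_embeddings)
print(sims.shape)     # one row and one column per image
print(np.diag(sims))  # self-similarity of each vector
```

If embeddings are loaded from another source, checking that each row has a norm close to 1 before relying on this shortcut is a cheap safeguard.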