Quickstart - Deploy Hugging Face Models with SageMaker JumpStart

Why use SageMaker JumpStart for Hugging Face models?

Amazon SageMaker JumpStart lets you deploy the most popular open Hugging Face models with one click, inside your own AWS account. JumpStart offers a curated selection of model checkpoints for tasks including text generation, embeddings, vision, audio, and more. Most models are deployed with the official Hugging Face Deep Learning Containers on a sensible default instance type, so you can move from idea to production in minutes.

In this quickstart guide, we will deploy Qwen/Qwen2.5-14B-Instruct.

1. Prerequisites

| Requirement | Description |
| --- | --- |
| AWS account with SageMaker enabled | An AWS account that will contain all your AWS resources. |
| An IAM role to access SageMaker AI | Learn more about how IAM works with SageMaker AI in this guide. |
| SageMaker Studio domain and user profile | We recommend using SageMaker Studio for straightforward deployment and inference. Follow this guide. |
| Service quotas | Most LLMs need GPU instances (e.g. ml.g5). Verify that you have quota for ml.g5.24xlarge, or request it. |

These docs and examples use the SageMaker Python SDK v3, which introduces a new framework-agnostic API built around ModelBuilder (inference) and ModelTrainer (training), replacing the v2 HuggingFaceModel and HuggingFace classes. Install it with pip install "sagemaker>=3.0.0".
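If you are unsure which SDK major version a notebook kernel has, a quick check like the following can help. It is a minimal sketch: parse_release and has_sagemaker_v3 are hypothetical helper names, and the comparison is deliberately simplified (it keeps only the leading digits of the first three version components, so pre-release suffixes such as rc1 are ignored). The packaging.version module would be the more rigorous choice.

```python
# Hypothetical helpers: check whether the installed `sagemaker` package meets the
# v3 minimum used in these docs. Only package metadata is read; sagemaker is not imported.
from importlib.metadata import PackageNotFoundError, version


def parse_release(ver: str) -> tuple:
    """Turn '3.1.0' (or '3.0.0rc1') into (3, 1, 0) using the leading digits of each part."""
    parts = []
    for piece in ver.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # simplification: drop suffixes like "rc1" or "post2"
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad "3" to (3, 0, 0)
    return tuple(parts)


def has_sagemaker_v3(minimum: str = "3.0.0") -> bool:
    """Return True if `sagemaker` is installed at or above `minimum`."""
    try:
        installed = version("sagemaker")
    except PackageNotFoundError:
        return False
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    print("SageMaker SDK v3 available:", has_sagemaker_v3())
```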

2. Endpoint deployment

To deploy a Hugging Face model by browsing the SageMaker JumpStart catalog:

  1. Open SageMaker → JumpStart.
  2. Filter by “Hugging Face” or search for your model (e.g. Qwen2.5-14B).
  3. Click Deploy → (optional) adjust instance size / count → Deploy.
  4. Wait until the endpoint's status under Endpoints shows In service.
  5. Copy the Endpoint name (or ARN) for later use.
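
Endpoint creation runs asynchronously, so scripts that follow a console deployment often poll until the endpoint is ready. This is a minimal sketch, assuming you supply get_status, a hypothetical callable that returns the endpoint's current status string. "InService" and "Failed" are SageMaker EndpointStatus values; the timeout and polling interval defaults are arbitrary choices.

```python
# Minimal polling sketch. `get_status` is a hypothetical callable you provide that
# returns the endpoint's status string (e.g. the "EndpointStatus" field of a
# DescribeEndpoint response). `sleep` is injectable so the helper can be tested.
import time
from typing import Callable


def wait_until_in_service(
    get_status: Callable[[], str],
    timeout_s: float = 1800.0,
    poll_every_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns "InService"; raise on "Failed" or timeout."""
    waited = 0.0
    while True:
        status = get_status()
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError("Endpoint creation failed")
        if waited >= timeout_s:
            raise TimeoutError(f"Endpoint still {status!r} after {waited:.0f}s")
        sleep(poll_every_s)
        waited += poll_every_s
```

For example, get_status could wrap boto3: lambda: boto3.client("sagemaker").describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]. If you already depend on boto3, its built-in endpoint_in_service waiter for the SageMaker client may be simpler.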

Alternatively, you can start from the Hugging Face Model Hub:

  1. Open the model page and click Deploy → SageMaker, then select the JumpStart tab (if the model is available on JumpStart).
  2. Copy the code snippet and use it from a SageMaker Notebook instance.
# SageMaker JumpStart models can be deployed with ModelBuilder by passing the
# JumpStart model ID as `model`. ModelBuilder resolves the JumpStart artifacts and
# container, and runs the deployment in network isolation.
# Set `instance_type` to one the model supports (see the model card): ModelBuilder's
# auto-detection otherwise picks a CPU instance, which LLMs don't support.
import json
from sagemaker.serve import ModelBuilder

# use the `role_arn` parameter to use a different role
model_builder = ModelBuilder(
    model="huggingface-llm-qwen2-5-14b-instruct",
    instance_type="ml.g5.24xlarge",
)
model_builder.build()

predictor = model_builder.deploy(accept_eula=True)

payload = {
    "inputs": "what is machine learning?",
    "parameters": {"max_new_tokens": 256},
}
response = predictor.invoke(body=json.dumps(payload), content_type="application/json")
print(json.loads(response.body.read()))
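
If you send many prompts, it can help to wrap the request body and the response parsing in small helpers. The sketch below assumes the container answers with either a list like [{"generated_text": "..."}] or a single {"generated_text": "..."} object. That shape is typical of Hugging Face text-generation containers but is not guaranteed for every JumpStart model. build_generation_payload and extract_generated_text are hypothetical names.

```python
import json


def build_generation_payload(prompt: str, max_new_tokens: int = 256, **parameters) -> str:
    """Serialize a prompt in the {"inputs": ..., "parameters": {...}} shape used above."""
    params = {"max_new_tokens": max_new_tokens, **parameters}
    return json.dumps({"inputs": prompt, "parameters": params})


def extract_generated_text(raw_body: bytes) -> str:
    """Pull "generated_text" out of a response body.

    Assumption: the body is either [{"generated_text": ...}] or {"generated_text": ...};
    any other shape raises ValueError.
    """
    data = json.loads(raw_body)
    if isinstance(data, list) and data:
        data = data[0]  # list responses: take the first generation
    if isinstance(data, dict) and "generated_text" in data:
        return data["generated_text"]
    raise ValueError(f"Unexpected response shape: {data!r}")
```

With the predictor from above, you would call predictor.invoke(body=build_generation_payload("what is machine learning?"), content_type="application/json") and pass response.body.read() to extract_generated_text.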

The endpoint creation can take several minutes, depending on the size of the model.

3. Test interactively

If you deployed through the console, copy the endpoint name from the Endpoints page and use it in your code:

import json
from sagemaker.core.resources import Endpoint

endpoint_name = "MY ENDPOINT NAME"
predictor = Endpoint.get(endpoint_name=endpoint_name)
payload = {
    "messages": [
        {
            "role": "system",
            "content": "You are a passionate data scientist."
        },
        {
            "role": "user",
            "content": "what is machine learning?"
        }
    ],
    "max_tokens": 2048,
    "temperature": 0.7,
    "top_p": 0.9,
    "stream": False
}

response = predictor.invoke(body=json.dumps(payload), content_type="application/json")
print(json.loads(response.body.read()))
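
To print only the assistant's answer instead of the whole JSON document, read it from choices[0].message.content, which is where OpenAI-style chat-completion responses place it. This is a minimal sketch, assuming the endpoint returns that shape. build_chat_payload and extract_reply are hypothetical helpers, and build_chat_payload reproduces the payload above.

```python
import json


def build_chat_payload(system: str, user: str, max_tokens: int = 2048,
                       temperature: float = 0.7, top_p: float = 0.9) -> str:
    """Serialize a two-message, non-streaming chat request mirroring the payload above."""
    return json.dumps({
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": False,
    })


def extract_reply(raw_body: bytes) -> str:
    """Return choices[0].message.content from an OpenAI-style chat-completion body."""
    data = json.loads(raw_body)
    try:
        return data["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError) as err:
        raise ValueError(f"Not a chat-completion response: {data!r}") from err
```

Used with the Endpoint object above: print(extract_reply(predictor.invoke(body=build_chat_payload("You are a passionate data scientist.", "what is machine learning?"), content_type="application/json").body.read())).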

The endpoint accepts requests in the OpenAI Chat Completions format, such as the messages payload above.
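
Setting "stream": True in an OpenAI-compatible payload typically returns server-sent events: data: {...} lines whose choices[].delta.content fields carry text fragments, ending with data: [DONE]. The sketch below assumes you already have the streamed response as an iterable of byte chunks; how you obtain them from SageMaker is outside its scope. iter_sse_deltas is a hypothetical helper. It buffers bytes until a full line arrives, because a chunk boundary can fall in the middle of a JSON event or a multi-byte character.

```python
import json
from typing import Iterable, Iterator


def iter_sse_deltas(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style SSE bytes, buffering across chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            line = line.strip()
            if not line.startswith(b"data:"):
                continue  # skip blank event separators and other SSE fields
            data = line[len(b"data:"):].strip()
            if data == b"[DONE]":
                return  # end-of-stream sentinel
            event = json.loads(data)
            for choice in event.get("choices", []):
                piece = choice.get("delta", {}).get("content")
                if piece:
                    yield piece
```

A trailing line without a newline is never parsed. In well-formed SSE every event ends with a blank line, so this should not drop data in practice.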

4. Clean‑up

To avoid unnecessary costs, delete the SageMaker endpoint when you’re done, either from the Deployments → Endpoints page of the console or with the following code:

predictor.delete()
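
If you deploy from a script, you can make deletion automatic so that an exception does not leave a billable endpoint running. This is a minimal sketch: deleted_on_exit is a hypothetical context manager that works with any object exposing delete(), such as the predictor used above.

```python
from contextlib import contextmanager
from typing import Iterator, Protocol, TypeVar


class SupportsDelete(Protocol):
    def delete(self) -> None: ...


T = TypeVar("T", bound=SupportsDelete)


@contextmanager
def deleted_on_exit(resource: T) -> Iterator[T]:
    """Yield `resource`, then call resource.delete() even if the with-body raises."""
    try:
        yield resource
    finally:
        resource.delete()
```

For example: with deleted_on_exit(model_builder.deploy(accept_eula=True)) as predictor: followed by your invoke calls inside the block. Deletion runs when the block exits, whether normally or through an error.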