Service 53

Amazon SageMaker

AI/ML, Platform, ML

Amazon SageMaker is AWS's family of services for building, training, and deploying your own machine learning models: notebooks for experimentation, training jobs for fitting models, a model registry for tracking artifacts, and endpoints for serving models behind an HTTPS API. The original service launched in 2017. In late 2024 AWS renamed it SageMaker AI and made SageMaker an umbrella brand, with a new Unified Studio that brings analytics and ML together.

The framing: SageMaker is for building and serving your own models. For consuming pre-built large language models, use Bedrock; for ready-made vision/speech/text tasks, use the dedicated AI APIs.

Notebooks, Training, and Endpoints

The core workflow has three pieces. Notebooks (SageMaker Studio is the default) provide a managed IDE for exploration. Training jobs take a script and data in S3, spin up a cluster, train, ship the model artifact to S3, and tear the cluster down — billed per second. Inference endpoints serve a trained model on managed instances with auto-scaling, A/B testing, multi-model endpoints, and serverless inference.
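
As a rough illustration of the training-job piece, the sketch below assembles the request that boto3's SageMaker `create_training_job` call expects: a container image, input data in S3, an output location in S3, and the size of the temporary cluster. The helper names, the `train` channel name, the default instance type, and the 30 GB volume are assumptions chosen for the example, not recommendations.

```python
from typing import Any, Dict


def build_training_job_request(
    job_name: str,
    image_uri: str,
    role_arn: str,
    train_s3_uri: str,
    output_s3_uri: str,
    instance_type: str = "ml.m5.xlarge",  # assumed default for illustration
    instance_count: int = 1,
    max_runtime_seconds: int = 3600,
) -> Dict[str, Any]:
    """Assemble keyword arguments for the SageMaker CreateTrainingJob API."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,   # container holding the training code
            "TrainingInputMode": "File",  # copy the S3 data onto the instance first
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",  # hypothetical channel name
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        # The model artifact is written under this S3 prefix when training ends.
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # The cluster exists only for the duration of the job.
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 30,  # assumed size for illustration
        },
        # Hard cap on runtime, which also caps the bill for the job.
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
    }


def start_training_job(sm_client: Any, request: Dict[str, Any]) -> str:
    """Submit the job through a boto3 SageMaker client and return its ARN."""
    return sm_client.create_training_job(**request)["TrainingJobArn"]
```

In real use, `sm_client` would be `boto3.client("sagemaker")`, and the image URI, role ARN, and S3 paths would come from your own account.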

The typical path: experiment in Studio, train with a job once the code is right, deploy to a serverless or real-time endpoint, and call it from your application.
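
The last step, calling the endpoint from an application, might look like the minimal sketch below. It uses the `invoke_endpoint` operation of the boto3 `sagemaker-runtime` client. The JSON payload shape (`{"instances": ...}`) is an assumption: the format an endpoint accepts depends on the model container behind it.

```python
import json
from typing import Any, List


def predict(runtime_client: Any, endpoint_name: str, rows: List[List[float]]) -> Any:
    """Send feature rows to a SageMaker endpoint and decode the JSON reply."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        # Assumed payload shape; adjust to whatever your container expects.
        Body=json.dumps({"instances": rows}).encode("utf-8"),
    )
    # The response body is a stream, so it has to be read before decoding.
    return json.loads(response["Body"].read())
```

In an application, `runtime_client` would be `boto3.client("sagemaker-runtime")`. The same call works for real-time and serverless endpoints, so switching between them does not change the calling code.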

Built-In Algorithms, JumpStart, and MLOps

Built-in algorithms (XGBoost, Linear Learner, K-Means, and others) train without writing model code. JumpStart is a library of pre-trained models you can fine-tune and deploy with a few clicks. JumpStart hosts the model on your endpoint (you manage the compute); Bedrock is fully managed with no instance to pick.

MLOps features cover production: Pipelines orchestrates ML workflows, Model Registry versions artifacts, Model Monitor watches endpoints for drift, Feature Store serves features online and offline, and Clarify measures bias.
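
To make the Model Registry part concrete, here is a hedged sketch of registering a trained artifact as a new version in a model package group through boto3's `create_model_package`. The helper names are hypothetical, the `text/csv` content types are assumptions, and the group is assumed to exist already (for example, created with `create_model_package_group`).

```python
from typing import Any, Dict

APPROVAL_STATUSES = ("Approved", "Rejected", "PendingManualApproval")


def build_model_package_request(
    group_name: str,
    image_uri: str,
    model_data_url: str,
    approval_status: str = "PendingManualApproval",
) -> Dict[str, Any]:
    """Assemble keyword arguments for the SageMaker CreateModelPackage API."""
    if approval_status not in APPROVAL_STATUSES:
        raise ValueError(f"approval_status must be one of {APPROVAL_STATUSES}")
    return {
        # Registering into a group creates the next version in that group.
        "ModelPackageGroupName": group_name,
        "ModelApprovalStatus": approval_status,
        "InferenceSpecification": {
            "Containers": [
                {
                    "Image": image_uri,              # inference container
                    "ModelDataUrl": model_data_url,  # artifact produced by training
                }
            ],
            # Assumed formats for illustration; match them to your model.
            "SupportedContentTypes": ["text/csv"],
            "SupportedResponseMIMETypes": ["text/csv"],
        },
    }


def register_model_version(sm_client: Any, request: Dict[str, Any]) -> str:
    """Register the version and return the new model package ARN."""
    return sm_client.create_model_package(**request)["ModelPackageArn"]
```

Defaulting to `PendingManualApproval` is a design choice for the sketch: a deployment step can then refuse any version that has not been explicitly approved.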

Inference Cost

SageMaker has many billing lines, but the cost trap most teams need to watch is a real-time endpoint kept up 24/7: it bills for every provisioned instance-hour and does not scale to zero. Serverless inference, which bills only while requests are being processed, is the right choice for low-traffic or bursty endpoints. Asynchronous inference queues requests and can auto-scale to zero instances when the queue is empty, which suits models that need full compute only occasionally.
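
The trade-off can be checked with back-of-the-envelope arithmetic. The sketch below compares an always-on endpoint with a deliberately simplified serverless model in which cost is requests times seconds per request times a per-second rate. Every price used with it is a hypothetical placeholder, not a real AWS rate, and real serverless bills include other components as well.

```python
HOURS_PER_MONTH = 730  # common approximation: 365 * 24 / 12


def realtime_monthly_cost(hourly_rate: float, instance_count: int = 1) -> float:
    """An always-on endpoint bills for every hour, whether or not it is used."""
    return hourly_rate * HOURS_PER_MONTH * instance_count


def serverless_monthly_cost(
    requests_per_month: float,
    seconds_per_request: float,
    rate_per_second: float,
) -> float:
    """Simplified serverless model: pay only for time spent serving requests."""
    return requests_per_month * seconds_per_request * rate_per_second


def break_even_requests(
    hourly_rate: float,
    seconds_per_request: float,
    rate_per_second: float,
) -> float:
    """Monthly request volume at which the two options cost the same."""
    return realtime_monthly_cost(hourly_rate) / (seconds_per_request * rate_per_second)


if __name__ == "__main__":
    # Hypothetical prices, chosen only to show the shape of the comparison.
    hourly, per_second, latency = 0.25, 0.0001, 0.2
    print("real-time :", realtime_monthly_cost(hourly))
    print("serverless:", serverless_monthly_cost(100_000, latency, per_second))
    print("break-even requests:", break_even_requests(hourly, latency, per_second))
```

With placeholder numbers like these, a low-traffic model sits far below the break-even volume. The value of the exercise is in plugging in your own region's prices and your measured latency.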

SageMaker vs Bedrock vs the AI APIs

SageMaker: building and serving your own custom models, with full control over training and deployment.

Bedrock: consuming pre-built foundation models through one API, with optional fine-tuning and no instance to manage.

Rekognition, Comprehend, and similar services: ready-made task APIs (vision, NLP, OCR, speech) that require no model of your own.

Common Mistakes

Leaving a real-time endpoint running 24/7 for a low-traffic model instead of using serverless inference, which scales to zero.
Training models on a laptop, which caps iteration speed, instead of running parallel SageMaker training jobs that bill only for the seconds they use.
Tracking model artifacts with ad-hoc S3 paths instead of the Model Registry from day one.
Retrofitting reproducibility later instead of using SageMaker Pipelines for production workflows.
Reaching for SageMaker to consume a foundation model with no fine-tuning, where Bedrock is simpler.
Deploying without Model Monitor, so data and quality drift go unnoticed until predictions degrade.
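
The first mistake in the list is easy to audit for. The sketch below lists endpoints that are in service and flags those with at least one production variant that is not serverless, using the boto3 SageMaker operations `list_endpoints`, `describe_endpoint`, and `describe_endpoint_config`. The function names are hypothetical. The result is only a list of candidates for review, because some always-on endpoints are justified by their traffic.

```python
from typing import Any, Dict, List


def list_in_service_endpoints(sm_client: Any) -> List[str]:
    """Return the names of all InService endpoints, following pagination."""
    names: List[str] = []
    kwargs: Dict[str, Any] = {"StatusEquals": "InService"}
    while True:
        page = sm_client.list_endpoints(**kwargs)
        names.extend(e["EndpointName"] for e in page.get("Endpoints", []))
        token = page.get("NextToken")
        if not token:
            return names
        kwargs["NextToken"] = token


def is_always_on(sm_client: Any, endpoint_name: str) -> bool:
    """True if any production variant runs on provisioned (non-serverless) instances."""
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config = sm_client.describe_endpoint_config(
        EndpointConfigName=endpoint["EndpointConfigName"]
    )
    return any("ServerlessConfig" not in v for v in config["ProductionVariants"])


def find_always_on_endpoints(sm_client: Any) -> List[str]:
    """Endpoints worth reviewing: in service and backed by provisioned instances."""
    return [n for n in list_in_service_endpoints(sm_client) if is_always_on(sm_client, n)]
```

Pass `boto3.client("sagemaker")` as `sm_client`. Comparing the flagged names against each endpoint's invocation metrics is a reasonable next step, but that is outside this sketch.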

Best Practices

Start in SageMaker Studio or Unified Studio; skip classic notebook instances.
Train in the cloud with training jobs, not locally.
Use the Model Registry and SageMaker Pipelines from the start for reproducibility.
Use serverless inference for low-traffic models and asynchronous inference for rare GPU-heavy calls.
Watch deployed models with Model Monitor for drift.
For pre-trained models, evaluate JumpStart and Bedrock together — Bedrock if it covers your model, JumpStart for fine-tuning control.
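
The two inference recommendations above translate into small configuration differences. As a sketch: a serverless endpoint replaces the instance type and count with a `ServerlessConfig` in `create_endpoint_config`, and an asynchronous endpoint can scale to zero by registering its variant with Application Auto Scaling at a minimum capacity of 0. The helper names, the `AllTraffic` variant name, and the default sizes are assumptions. The list of memory sizes reflects the values accepted at the time of writing and should be checked against current documentation.

```python
from typing import Any, Dict

# Memory sizes accepted for serverless endpoints at the time of writing (assumption).
SERVERLESS_MEMORY_SIZES_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def build_serverless_endpoint_config(
    config_name: str,
    model_name: str,
    memory_mb: int = 2048,
    max_concurrency: int = 5,
) -> Dict[str, Any]:
    """Keyword arguments for SageMaker CreateEndpointConfig with a serverless variant."""
    if memory_mb not in SERVERLESS_MEMORY_SIZES_MB:
        raise ValueError(f"memory_mb must be one of {SERVERLESS_MEMORY_SIZES_MB}")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",  # hypothetical variant name
                "ModelName": model_name,
                # No InstanceType or InitialInstanceCount: capacity is on demand.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


def build_async_scale_to_zero_target(
    endpoint_name: str,
    variant_name: str = "AllTraffic",
    max_instances: int = 2,
) -> Dict[str, Any]:
    """Keyword arguments for Application Auto Scaling RegisterScalableTarget."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": 0,  # allows the endpoint to drop to zero instances when idle
        "MaxCapacity": max_instances,
    }
```

The first dictionary would be passed to `boto3.client("sagemaker").create_endpoint_config(**...)` and the second to `boto3.client("application-autoscaling").register_scalable_target(**...)`. Registering the target only permits scaling to zero. A scaling policy, typically driven by the request backlog, is still needed to move capacity up and down.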

Comparable services: Vertex AI (GCP), Azure Machine Learning (Azure)

Knowledge Check

What is SageMaker primarily for, versus Bedrock?

Building and serving your own ML models; Bedrock consumes pre-built foundation models
Consuming pre-built foundation models; Bedrock is the one for training your own from scratch
OCR of scanned documents; Bedrock handles computer vision
They are the same service under two brand names

What is the biggest cost trap in SageMaker?

Real-time endpoints kept up 24/7, which do not scale to zero
Training jobs, which run forever once you start them
The per-version Model Registry artifact storage fee
Studio notebook instances left running idle overnight and on weekends

For a model that receives low, bursty traffic, which inference option fits best?

Serverless inference — pay per request and scale to zero
A real-time endpoint on a large instance left provisioned and always on
Batch transform run continuously against the dataset
A classic notebook instance left up to serve requests

What do SageMaker Pipelines and Model Registry provide?

Reproducible ML workflows and versioned, tracked model artifacts for production
Pre-trained foundation models hosted and ready to call straight from a managed API
Ready-made real-time computer-vision detection APIs for objects, faces, and scenes
Managed neural translation between dozens of source and target languages
