Amazon SageMaker
Amazon SageMaker is AWS's family of services for building, training, and deploying your own machine learning models: notebooks for experimentation, training jobs for fitting models, a model registry for tracking artifacts, and endpoints for serving models behind an HTTPS API. Launched in 2017, the product was reorganized in late 2024 into a broader umbrella: the original ML service continues as SageMaker AI, and a new Unified Studio brings analytics and ML together.
The framing: SageMaker is for building and serving your own models. For consuming pre-built large language models, use Bedrock; for ready-made vision/speech/text tasks, use the dedicated AI APIs.
Notebooks, Training, and Endpoints
The core workflow has three pieces. Notebooks (SageMaker Studio is the default) provide a managed IDE for exploration. Training jobs take a script and data in S3, spin up a cluster, train, write the model artifact to S3, and tear the cluster down, billed per second of instance time. Inference endpoints serve a trained model, either on managed instances (with auto-scaling, A/B testing across variants, and multi-model hosting) or through serverless inference.
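As a rough sketch of what submitting a training job involves, the request below uses the parameter names of the boto3 `create_training_job` call. The job name, container image URI, role ARN, bucket paths, instance type, and volume size are placeholders assumed for illustration, and the helper function names are hypothetical.

```python
def build_training_job_request(job_name, image_uri, role_arn,
                               train_s3_uri, output_s3_uri,
                               instance_type="ml.m5.xlarge",
                               instance_count=1,
                               max_runtime_seconds=3600):
    """Assemble keyword arguments for boto3's SageMaker create_training_job."""
    return {
        "TrainingJobName": job_name,
        # The container that holds the training code (or a built-in algorithm).
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        # Training data is read from S3.
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        # The model artifact is written back to S3 when training ends.
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # The cluster that is created for the job and torn down afterwards.
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
    }


def submit_training_job(request):
    """Send the request to SageMaker; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above runs without it
    return boto3.client("sagemaker").create_training_job(**request)


# Placeholder values, not real resources.
example_request = build_training_job_request(
    job_name="demo-training-001",
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/<training-image>:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
)
```

Keeping the request builder separate from the submit call makes the job definition easy to review and version before anything is billed.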
The typical path: experiment in Studio, train with a job once the code is right, deploy to a serverless or real-time endpoint, and call it from your application.
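The last step of that path, calling the endpoint from an application, might look like the sketch below. It uses the boto3 `sagemaker-runtime` client's `invoke_endpoint` call; the endpoint name and the JSON payload shape are assumptions (the real payload format depends on the model container), and `StubRuntime` is a hypothetical offline stand-in so the function can be exercised without AWS access.

```python
import io
import json


def predict(endpoint_name, features, runtime_client=None):
    """Send one feature vector to an endpoint and return the parsed JSON reply."""
    if runtime_client is None:
        import boto3  # needs AWS credentials and a deployed endpoint
        runtime_client = boto3.client("sagemaker-runtime")
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        # Assumed payload shape; check what your container expects.
        Body=json.dumps({"instances": [features]}),
    )
    # The response body is a stream that must be read before parsing.
    return json.loads(response["Body"].read())


class StubRuntime:
    """Offline stand-in for the runtime client, used only to exercise predict()."""

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        reply = {"echo": json.loads(Body)["instances"]}
        return {"Body": io.BytesIO(json.dumps(reply).encode("utf-8"))}


# Runs locally: the stub simply echoes the instances it received.
result = predict("demo-endpoint", [1.0, 2.0], runtime_client=StubRuntime())
```

Passing the client in as a parameter keeps the application code testable; in production the default branch creates a real client.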
Built-In Algorithms, JumpStart, and MLOps
Built-in algorithms (XGBoost, Linear Learner, K-Means, and others) train without writing model code. JumpStart is a library of pre-trained models you can fine-tune and deploy with a few clicks. JumpStart hosts the model on your endpoint (you manage the compute); Bedrock is fully managed with no instance to pick.
MLOps features cover production: Pipelines orchestrates ML workflows, Model Registry versions artifacts, Model Monitor watches endpoints for drift, Feature Store serves features online and offline, and Clarify measures bias.
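To make the Model Registry piece concrete, here is a minimal sketch of registering a new model version with the boto3 `create_model_package` call. It assumes a model package group already exists; the group name, image URI, artifact path, and content types are placeholders, and the helper names are hypothetical.

```python
ALLOWED_APPROVAL_STATUSES = {"Approved", "Rejected", "PendingManualApproval"}


def build_model_package_request(group_name, image_uri, model_data_url,
                                approval_status="PendingManualApproval"):
    """Assemble keyword arguments for boto3's SageMaker create_model_package."""
    if approval_status not in ALLOWED_APPROVAL_STATUSES:
        raise ValueError(f"unknown approval status: {approval_status}")
    return {
        # Each call adds a new version to this group.
        "ModelPackageGroupName": group_name,
        "InferenceSpecification": {
            "Containers": [{
                "Image": image_uri,              # serving container
                "ModelDataUrl": model_data_url,  # artifact from a training job
            }],
            "SupportedContentTypes": ["text/csv"],
            "SupportedResponseMIMETypes": ["text/csv"],
        },
        # Versions can be gated behind a manual approval step.
        "ModelApprovalStatus": approval_status,
    }


def register_model_version(request):
    """Register the version; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above runs without it
    response = boto3.client("sagemaker").create_model_package(**request)
    return response["ModelPackageArn"]


# Placeholder values, not real resources.
package_request = build_model_package_request(
    group_name="example-churn-models",
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/<inference-image>:latest",
    model_data_url="s3://example-bucket/output/model.tar.gz",
)
```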
Inference Cost
SageMaker has many billing lines, but real-time endpoints kept up 24/7 are the cost trap most teams need to watch: they bill for every provisioned instance-hour and do not scale to zero. Serverless inference (pay per request) is the right choice for low-traffic or bursty endpoints, and asynchronous inference can be configured to auto-scale to zero instances for models that need full compute only occasionally.
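A back-of-the-envelope comparison shows why the always-on endpoint is the trap. The sketch below is simplified arithmetic, not AWS's pricing model: every price is a hypothetical placeholder, serverless cost is approximated as requests times compute seconds times a per-second rate, and data-processing charges are ignored.

```python
HOURS_PER_MONTH = 730  # common approximation for an average month


def realtime_monthly_cost(hourly_rate, instance_count=1):
    """Always-on endpoint: billed for every hour, regardless of traffic."""
    return hourly_rate * HOURS_PER_MONTH * instance_count


def serverless_monthly_cost(requests_per_month, seconds_per_request,
                            price_per_second):
    """Serverless endpoint: billed only for compute time spent on requests."""
    return requests_per_month * seconds_per_request * price_per_second


def break_even_requests(hourly_rate, seconds_per_request, price_per_second,
                        instance_count=1):
    """Monthly request volume at which both options cost the same."""
    always_on = realtime_monthly_cost(hourly_rate, instance_count)
    return always_on / (seconds_per_request * price_per_second)


# Hypothetical prices for illustration only; look up current rates for
# your region, instance type, and memory size.
always_on = realtime_monthly_cost(hourly_rate=0.25)
bursty = serverless_monthly_cost(requests_per_month=50_000,
                                 seconds_per_request=0.2,
                                 price_per_second=0.00004)
threshold = break_even_requests(hourly_rate=0.25,
                                seconds_per_request=0.2,
                                price_per_second=0.00004)
```

With these made-up rates, the always-on endpoint costs the same amount every month whether it serves one request or a million, while the serverless bill tracks traffic; `threshold` estimates the volume above which the real-time endpoint becomes the cheaper option.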
SageMaker vs. Bedrock vs. the AI APIs
SageMaker: building and serving your own custom models, with full control over training and deployment.
Bedrock: consuming pre-built foundation models through one API, with optional fine-tuning and no instance to manage.
Rekognition / Comprehend / etc.: ready-made task APIs (vision, NLP, OCR, speech) that need no model at all.
Common Mistakes
- Leaving a real-time endpoint running 24/7 for a low-traffic model instead of using serverless inference, which scales to zero.
- Training models on a laptop instead of cheap, parallel SageMaker training jobs, capping iteration speed.
- Tracking model artifacts with ad-hoc S3 paths instead of the Model Registry from day one.
- Retrofitting reproducibility later instead of using SageMaker Pipelines for production workflows.
- Reaching for SageMaker to consume a foundation model with no fine-tuning, where Bedrock is simpler.
- Deploying without Model Monitor, so data and quality drift go unnoticed until predictions degrade.
Best Practices
- Start in SageMaker Studio or Unified Studio; skip classic notebook instances.
- Train in the cloud with training jobs, not locally.
- Use the Model Registry and SageMaker Pipelines from the start for reproducibility.
- Use serverless inference for low-traffic models and asynchronous inference for rare GPU-heavy calls.
- Watch deployed models with Model Monitor for drift.
- For pre-trained models, evaluate JumpStart and Bedrock together — Bedrock if it covers your model, JumpStart for fine-tuning control.
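The two inference recommendations in the list above can be sketched as request payloads. The first uses the `ServerlessConfig` block of the boto3 `create_endpoint_config` call; the second registers an endpoint variant with Application Auto Scaling at a minimum capacity of zero, which is how an asynchronous endpoint is allowed to scale in completely. Endpoint, model, and variant names and the memory and concurrency values are placeholders, and the helper names are hypothetical.

```python
def build_serverless_endpoint_config(config_name, model_name,
                                     memory_mb=2048, max_concurrency=5):
    """Keyword arguments for boto3's SageMaker create_endpoint_config."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            # No instance type or count: capacity is managed per request.
            "ServerlessConfig": {
                "MemorySizeInMB": memory_mb,
                "MaxConcurrency": max_concurrency,
            },
        }],
    }


def build_scale_to_zero_target(endpoint_name, variant_name="AllTraffic",
                               max_instances=2):
    """Keyword arguments for Application Auto Scaling's register_scalable_target."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": 0,  # lets an idle asynchronous endpoint drop to zero
        "MaxCapacity": max_instances,
    }


def apply_configs(endpoint_config, scalable_target):
    """Send both requests; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builders above run without it
    boto3.client("sagemaker").create_endpoint_config(**endpoint_config)
    boto3.client("application-autoscaling").register_scalable_target(
        **scalable_target)


# Placeholder names, not real resources.
serverless_config = build_serverless_endpoint_config(
    "demo-serverless-config", "demo-model")
async_target = build_scale_to_zero_target("demo-async-endpoint")
```

Registering the scalable target only sets the allowed range; a scaling policy is still needed to decide when instances are added or removed.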
Knowledge Check
What is SageMaker primarily for, versus Bedrock?
- Building and serving your own ML models; Bedrock consumes pre-built foundation models
- Consuming pre-built foundation models; Bedrock is the one for training your own from scratch
- OCR of scanned documents; Bedrock handles computer vision
- They are the same service under two brand names
What is the biggest cost trap in SageMaker?
- Real-time endpoints kept up 24/7, which do not scale to zero
- Training jobs, which run forever once you start them
- The per-version Model Registry artifact storage fee
- Studio notebook instances left running idle overnight and on weekends
For a model that receives low, bursty traffic, which inference option fits best?
- Serverless inference — pay per request and scale to zero
- A real-time endpoint on a large instance left provisioned and always on
- Batch transform run continuously against the dataset
- A classic notebook instance left up to serve requests
What do SageMaker Pipelines and Model Registry provide?
- Reproducible ML workflows and versioned, tracked model artifacts for production
- Pre-trained foundation models hosted and ready to call straight from a managed API
- Ready-made real-time computer-vision detection APIs for objects, faces, and scenes
- Managed neural translation between dozens of source and target languages