watsonx.ai Runtime

(Formerly known as Watson Machine Learning) Quickly build, run and manage generative AI and machine learning applications with built-in performance and scalability.

  • Date of last update: 05/06/2025
  • Service
  • IBM
  • AI / Machine Learning
  • HIPAA Enabled
  • IAM-enabled
  • Service Endpoint Supported
  • London (eu-gb)
  • Dallas (us-south)
  • Sydney (au-syd)
  • Toronto (ca-tor)
  • Frankfurt (eu-de)
  • Tokyo (jp-tok)

Pricing plans

Prices shown are for country or location: United States
Lite
  Features and capabilities:
  • Service instance
  • Instance includes:
    • 20 capacity unit-hours (CUH) per month
    • 50,000 tokens/data points per month
  • Foundation models:
    • Inferencing for text generation consumes tokens (as Resource Units)
    • Token usage is the sum of input and output tokens
    • Time series forecasting consumes data points (as Resource Units)
    • Data point usage is the sum of input and output data points
  • Machine learning training tools:
    • Compute usage counted as CUH
    • CUH rate based on training tool, hardware specification, and runtime environment
  • Machine learning deployments:
    • Compute usage counted as CUH
    • CUH rate based on deployment type, hardware specification, and other factors
    • Maximum parallel Decision Optimization batch jobs per deployment: 2
    • Default deployment jobs retained per deployment space: 100
  Pricing:
  • Free

CUH billing is based on CUH rate multiplied by the hours of consumption. Resource Units billing is based on the rate of the foundation model class multiplied by the number of tokens or data points used. For details on CUH rates, go to https://ibm.biz/wx-plans-ml
For details on foundation model classes, go to https://ibm.biz/wx-plans-gen-ai

1 Resource Unit = 1000 tokens/data points
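The two billing rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not IBM's metering logic: the function names and the 0.5 CUH-per-hour environment rate are hypothetical, and any rounding that IBM applies to partial Resource Units is not stated in the plan text, so none is applied here. The Lite allowances (20 CUH and 50,000 tokens/data points per month) are taken from the plan description.

```python
from decimal import Decimal

# From the plan text: 1 Resource Unit = 1000 tokens/data points.
UNITS_PER_RESOURCE_UNIT = 1000

# Lite plan monthly allowances, as listed in the plan description.
LITE_CUH_PER_MONTH = Decimal("20")
LITE_TOKENS_PER_MONTH = 50_000


def cuh_consumed(cuh_rate: Decimal, hours: Decimal) -> Decimal:
    """Capacity unit-hours used: CUH rate multiplied by hours of consumption."""
    return cuh_rate * hours


def resource_units(input_count: int, output_count: int) -> Decimal:
    """Resource Units used: (input + output tokens or data points) / 1000."""
    return Decimal(input_count + output_count) / UNITS_PER_RESOURCE_UNIT


def within_lite_allowance(cuh: Decimal, input_count: int, output_count: int) -> bool:
    """True if a month's usage fits inside both Lite allowances."""
    tokens = input_count + output_count
    return cuh <= LITE_CUH_PER_MONTH and tokens <= LITE_TOKENS_PER_MONTH


# Hypothetical month: an environment rated at 0.5 CUH per hour (assumed value)
# running for 10 hours, plus 12,000 input and 3,000 output tokens.
used_cuh = cuh_consumed(Decimal("0.5"), Decimal("10"))
used_ru = resource_units(12_000, 3_000)
print(used_cuh, used_ru, within_lite_allowance(used_cuh, 12_000, 3_000))
```

On a paid plan, the charge would be the consumed CUH or Resource Units multiplied by the plan's listed price per unit.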


Lite plan services are deleted after 30 days of inactivity.

Essentials
  Features and capabilities:
  • Service instance
  • Foundation models:
    • Inferencing for text generation consumes tokens (as Resource Units)
    • Token usage is the sum of input and output tokens
    • Time series forecasting consumes data points (as Resource Units)
    • Data point usage is the sum of input and output data points
    • Compute usage for Tuning Studio is 43 CUH per hour
  • Text extraction:
    • Each document page or image file or .tiff frame is considered 1 Page
    • Each Page is charged at a flat rate (text extraction category 1 rate)
    • Text extraction category 2 rates are not applicable at this time
  • Machine learning training tools:
    • Compute usage counted as CUH
    • CUH rate based on training tool, hardware specification, and runtime environment
  • Machine learning deployments:
    • Compute usage counted as CUH
    • CUH rate based on deployment type, hardware specification, and other factors
    • Maximum parallel Decision Optimization batch jobs per deployment: 5
    • Default deployment jobs retained per deployment space: 1000
  Pricing:
  • $0.52 USD/Capacity Unit-Hour
  • $0.0001 USD/Resource Unit (IBM models)
  • $0.0001 USD/Resource Unit (Third party models)
  • $0.003 USD/Mistral Large Input Resource Unit
  • $0.01 USD/Mistral Large Output Resource Unit
  • $0.038 USD/Page - Text Extraction Category 1
  • $0.019 USD/Page - Text Extraction Category 2
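As a rough worked example, the Essentials prices listed above can be combined into a monthly estimate. The sketch below is an assumption-laden illustration: the function name and the usage figures are made up, the Resource Unit price used for IBM models is the single listed figure even though rates depend on the foundation model class, and taxes, rounding, and any charges not shown in the listing are ignored.

```python
from decimal import Decimal

# Essentials plan prices in USD, copied from the plan listing.
ESSENTIALS = {
    "cuh": Decimal("0.52"),
    "ru_ibm": Decimal("0.0001"),
    "ru_mistral_large_input": Decimal("0.003"),
    "ru_mistral_large_output": Decimal("0.01"),
    "page_category_1": Decimal("0.038"),
}
TUNING_STUDIO_CUH_PER_HOUR = Decimal("43")  # from the plan listing
TOKENS_PER_RU = 1000                        # 1 Resource Unit = 1000 tokens


def essentials_estimate(tuning_hours, ibm_tokens, mistral_in_tokens,
                        mistral_out_tokens, pages):
    """Return a per-item cost breakdown (USD) plus a total. Hypothetical helper."""
    p = ESSENTIALS
    lines = {
        # Tuning Studio: hours x 43 CUH per hour x price per CUH
        "tuning_studio": Decimal(tuning_hours) * TUNING_STUDIO_CUH_PER_HOUR * p["cuh"],
        # Inference: tokens / 1000 gives Resource Units, then x price per RU
        "ibm_models": Decimal(ibm_tokens) / TOKENS_PER_RU * p["ru_ibm"],
        "mistral_large_input": Decimal(mistral_in_tokens) / TOKENS_PER_RU
                               * p["ru_mistral_large_input"],
        "mistral_large_output": Decimal(mistral_out_tokens) / TOKENS_PER_RU
                                * p["ru_mistral_large_output"],
        # Text extraction: flat category 1 rate per Page
        "text_extraction": Decimal(pages) * p["page_category_1"],
    }
    lines["total"] = sum(lines.values(), Decimal("0"))
    return lines


# Hypothetical usage: 2 hours of tuning, 1,000,000 tokens on IBM models,
# 500,000 input and 100,000 output tokens on Mistral Large, 200 extracted pages.
for item, cost in essentials_estimate(2, 1_000_000, 500_000, 100_000, 200).items():
    print(f"{item}: ${cost:.2f}")
```

With these made-up figures, two hours of Tuning Studio (86 CUH) accounts for most of the total, which shows how quickly compute-heavy tasks can outweigh token charges.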
Standard
  Features and capabilities:
  • Service instance
  • Instance includes:
    • 2500 capacity unit-hours (CUH) monthly for compute usage
  • Foundation models:
    • Inferencing for text generation consumes tokens (as Resource Units)
    • Token usage is the sum of input and output tokens
    • Time series forecasting consumes data points (as Resource Units)
    • Data point usage is the sum of input and output data points
    • Compute usage for Tuning Studio is 43 CUH per hour
  • Text extraction:
    • Each document page or image file or .tiff frame is considered 1 Page
    • Each Page is charged at a flat rate (text extraction category 1 rate)
    • Text extraction category 2 rates are not applicable at this time
  • Custom Foundation Models (Dallas only):
    • Hourly rates vary based on the hardware configuration, and include model hosting and inferencing costs
  • Machine learning training tools:
    • Compute usage counted as CUH
    • CUH rate based on training tool, hardware specification, and runtime environment
  • Machine learning deployments:
    • Compute usage counted as CUH
    • CUH rate based on deployment type, hardware specification, and other factors
    • Maximum parallel Decision Optimization batch jobs per deployment: 100
    • Default deployment jobs retained per deployment space: 3000
  • HIPAA readiness option available only in Dallas on IBM Cloud
  Pricing:
  • $1,050.00 USD/Instance per month
  • $0.42 USD/Capacity Unit-Hour
  • $0.0001 USD/Resource Unit (IBM models)
  • $0.0001 USD/Resource Unit (Third party models)
  • $0.003 USD/Mistral Large Input Resource Unit
  • $0.01 USD/Mistral Large Output Resource Unit
  • $0.00625 USD/Resource Unit - InstructLab Data
  • $0.007 USD/Resource Unit - InstructLab Tuning
  • $0.03 USD/Page - Text Extraction Category 1
  • $0.015 USD/Page - Text Extraction Category 2
  • $4.43 USD/Hour - Extra Small Model Hosting
  • $5.22 USD/Hour - Small Model Hosting
  • $10.40 USD/Hour - Medium Model Hosting
  • $20.85 USD/Hour - Large Model Hosting
  • $41.70 USD/Hour - Extra Large Model Hosting
  • $83.50 USD/Hour - Very Large Model Hosting
  • $8.60 USD/Hour - Mistral 1 GPU Model Hosting Access
  • $17.20 USD/Hour - Mistral 2 GPU Model Hosting Access
  • $34.30 USD/Hour - Mistral Large Model Hosting Access
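To compare Standard with Essentials on compute alone, the listed prices can be put into a small calculator. One point is an assumption rather than something the listing states: the sketch treats the 2500 included CUH as free and bills only the CUH beyond that at $0.42. The function names are hypothetical, and token, text extraction, and InstructLab charges are left out to keep the comparison short.

```python
from decimal import Decimal

# Prices in USD, copied from the plan listings.
STANDARD_INSTANCE_FEE = Decimal("1050.00")   # per instance per month
STANDARD_INCLUDED_CUH = Decimal("2500")      # CUH included monthly
STANDARD_CUH_PRICE = Decimal("0.42")
ESSENTIALS_CUH_PRICE = Decimal("0.52")

# Custom foundation model hosting (Dallas only), USD per hour.
HOSTING_PER_HOUR = {
    "extra_small": Decimal("4.43"),
    "small": Decimal("5.22"),
    "medium": Decimal("10.40"),
    "large": Decimal("20.85"),
    "extra_large": Decimal("41.70"),
    "very_large": Decimal("83.50"),
}


def standard_monthly(cuh_used, hosting_hours=None):
    """Standard plan compute and hosting cost for one month.

    ASSUMPTION: only CUH beyond the included 2500 are billed, at $0.42 each.
    hosting_hours maps a hosting size (key of HOSTING_PER_HOUR) to hours used.
    """
    overage = max(Decimal("0"), Decimal(cuh_used) - STANDARD_INCLUDED_CUH)
    hosting = sum(
        (HOSTING_PER_HOUR[size] * Decimal(hours)
         for size, hours in (hosting_hours or {}).items()),
        Decimal("0"),
    )
    return STANDARD_INSTANCE_FEE + overage * STANDARD_CUH_PRICE + hosting


def essentials_compute(cuh_used):
    """Essentials plan compute cost: every CUH is billed at $0.52."""
    return Decimal(cuh_used) * ESSENTIALS_CUH_PRICE


# Hypothetical month: 3000 CUH plus 10 hours of Small model hosting.
print(standard_monthly(3000, {"small": 10}))
print(essentials_compute(3000))
```

Under that assumption, Essentials is cheaper for compute up to roughly 2019 CUH per month (1050 / 0.52), and Standard is cheaper beyond that point.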

Summary

IBM watsonx.ai Runtime provides the resources that power the AI lifecycle for IBM watsonx.ai and Cloud Pak for Data as a Service. IBM watsonx.ai Runtime powers IBM watsonx.ai Studio. With resources for all AI solution tasks, you can build, test, and deploy machine learning models and, on watsonx.ai, build, tune, and deploy generative AI solutions.

Features and capabilities

Manage generative AI solutions

On watsonx.ai, inference foundation models with prompts that you customize for your solution. Build retrieval-augmented generation patterns that include extracting text, chunking and embedding documents for vector indexes, and reranking search results. Host the foundation models that you deploy and the endpoints for prompts and AI services.

Manage machine learning models

Take advantage of machine learning model management (continuous learning system) and deployment (online, batch, streaming). Select any of the widely supported machine learning frameworks: TensorFlow, Keras, Caffe, PyTorch, Spark MLlib, scikit-learn, XGBoost, and SPSS.

Embed in applications

Use the command line interface and Python client to manage your artifacts. Extend your application with artificial intelligence through the watsonx.ai REST API.
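A call to the REST API from an application might be prepared along the following lines. This is a sketch under stated assumptions, not a reference implementation: the regional host pattern, the `/ml/v1/text/generation` path, the `version` date, and the request body fields reflect the publicly documented API as best understood here and should be checked against the API docs. The function name is hypothetical, and all IDs and the token in the usage example are placeholders. The request is only built, never sent.

```python
import json
import urllib.request

# Region codes taken from the service's listed locations.
REGIONS = {"eu-gb", "us-south", "au-syd", "ca-tor", "eu-de", "jp-tok"}


def build_generation_request(region, project_id, model_id, prompt,
                             bearer_token, version="2023-05-29"):
    """Build (but do not send) a text generation request.

    ASSUMPTIONS: host pattern, path, version date, and body field names
    should be verified against the API docs before use.
    """
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region}")
    url = f"https://{region}.ml.cloud.ibm.com/ml/v1/text/generation?version={version}"
    body = {"model_id": model_id, "input": prompt, "project_id": project_id}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {bearer_token}",  # IAM access token
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


# Placeholder values only; replace them with real IDs and a valid IAM token.
req = build_generation_request(
    "us-south", "YOUR_PROJECT_ID", "YOUR_MODEL_ID",
    "Summarize the Lite plan in one sentence.", "YOUR_IAM_TOKEN",
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req), then parse the JSON response.
```

Because inference consumes tokens billed as Resource Units, each request sent this way counts toward the plan's token usage (input plus output).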

Integration with watsonx.ai Studio

Create and train machine learning models with the best tools and the latest expertise in a social environment built by and for data scientists.

Getting support


If you're experiencing issues with this product, go to the IBM Cloud Support Center and create a case. Use the All products option to search for this product, then continue creating the case or find more information about getting support. Third-party and community-supported products might direct you to a support process outside of IBM Cloud.
