Speech to Text

Low-latency, streaming transcription

  • Date of last update: 12/12/2024
  • Service
  • IBM
  • AI / Machine Learning
  • EU Supported
  • HIPAA Enabled
  • IAM-enabled
  • Sydney (au-syd)
  • Frankfurt (eu-de)
  • London (eu-gb)
  • Tokyo (jp-tok)
  • Washington DC (us-east)
  • Dallas (us-south)

Pricing plans

Prices shown are for country or location: United States
Plan | Features and capabilities | Pricing
Lite
  • 500 Minutes per Month
  • Free

The Lite plan gets you started with 500 minutes per month at no cost. When you upgrade to a paid plan, you will get access to Customization capabilities.


Lite plan services are deleted after 30 days of inactivity.

Plus - NEW!
  • Minutes Per Month
  • Simple Volume Tiers
Premium - NEW!
  • Everything in Plus Plan, plus...
  • The Premium plan provides the same features and benefits as the Plus plan, with significantly greater capacity for concurrent transcriptions, enhanced security features that keep your data isolated and encrypted end-to-end in transit and at rest, and HIPAA readiness. It supports up to 500 concurrent transcriptions, with the option to add more, and includes 150,000 free minutes to start.
  • HIPAA Enabled in Washington DC (US-East) & Dallas (US-South)
  • EU Supported available in Frankfurt (EU-DE)

    Summary

    The Speech to Text service converts the human voice into the written word. The service uses deep-learning AI to apply knowledge of grammar, language structure, and the composition of audio and voice signals to accurately transcribe human speech. It can be used in applications such as voice-automated chatbots, analytic tools for customer-service call centers, and multi-media transcription, among many others.

    Features and capabilities

    Available languages

    Brazilian Portuguese, Chinese (Mandarin dialect), Dutch, English (US and UK dialects), French, German, Italian, Japanese, Korean, Spanish (Argentinian, Castilian, Chilean, Colombian, Mexican, and Peruvian dialects), and Modern Standard Arabic (broadband model only). Base models are available for audio sampled at 16 kHz (broadband) and 8 kHz (narrowband) in a wide range of audio formats.
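The broadband and narrowband split can be illustrated with a small sketch that reads the sample rate of a WAV file and picks a matching base model name. The `<language>_BroadbandModel` and `<language>_NarrowbandModel` naming pattern, the 16 kHz cutoff, and the helper names (`pick_model`, `wav_sample_rate`, `make_silent_wav`) are assumptions made for illustration; the model names actually available to an instance should be checked against the service's model list.

```python
import io
import wave

# Assumed cutoff: audio sampled at 16 kHz or higher is treated as broadband,
# anything below that as narrowband.
BROADBAND_MIN_RATE_HZ = 16_000


def pick_model(sample_rate_hz: int, language: str = "en-US") -> str:
    """Return an assumed base model name for the given sample rate."""
    band = "Broadband" if sample_rate_hz >= BROADBAND_MIN_RATE_HZ else "Narrowband"
    return f"{language}_{band}Model"


def wav_sample_rate(wav_bytes: bytes) -> int:
    """Read the sample rate (in Hz) from the header of in-memory WAV data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getframerate()


def make_silent_wav(sample_rate_hz: int, seconds: float = 0.1) -> bytes:
    """Create a short mono, 16-bit, silent WAV clip for demonstration."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00\x00" * int(sample_rate_hz * seconds))
    return buf.getvalue()


if __name__ == "__main__":
    clip = make_silent_wav(8_000)
    print(pick_model(wav_sample_rate(clip)))
```

Telephony recordings are typically sampled at 8 kHz, so a check like this may help avoid sending narrowband audio to a broadband model.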

    Interfaces and SDKs

    Request transcription with synchronous or asynchronous HTTP REST APIs, or use WebSockets for efficient, low-latency, high-throughput requests over a full-duplex connection. Send all audio at once or stream continuous audio for live speech recognition. Use SDKs for simplified rapid development in Node, Java, Python, Swift, and many other languages.
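As a rough illustration of the synchronous HTTP interface, the sketch below builds (but does not send) a `POST` request to the `/v1/recognize` endpoint using only the Python standard library. The service URL, API key, and the helper name `build_recognize_request` are placeholders invented for this example; it assumes basic authentication with the literal username `apikey`, which should be confirmed against the API docs for your instance.

```python
import base64
import urllib.parse
import urllib.request


def build_recognize_request(service_url, api_key, audio,
                            content_type="audio/flac", **params):
    """Build a POST request for synchronous recognition (not sent here).

    service_url and api_key are placeholders for the values shown in the
    credentials of a service instance; params become query parameters.
    """
    url = f"{service_url.rstrip('/')}/v1/recognize"
    query = urllib.parse.urlencode(params)
    if query:
        url = f"{url}?{query}"
    # Assumption: basic auth with the fixed username "apikey".
    token = base64.b64encode(f"apikey:{api_key}".encode()).decode()
    return urllib.request.Request(
        url,
        data=audio,
        method="POST",
        headers={
            "Content-Type": content_type,
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    req = build_recognize_request(
        "https://example.invalid/instances/demo",  # hypothetical URL
        "my-api-key",                              # hypothetical key
        b"...audio bytes...",
        model="en-US_BroadbandModel",
    )
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send all audio at once. Streaming audio over WebSockets follows a different flow and is not covered by this sketch.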

    Language and Acoustic Customization

    Use language model customization to define domain-specific words that expand the service's base vocabulary; acoustic model customization to enhance recognition for the acoustic characteristics of your audio; and grammars to limit recognition to specific strings and phrases only. Create multiple models and grammars for different purposes, and combine all three capabilities to adapt recognition for your application's requirements.

    Keyword spotting and speaker labels

    Identify specific keyword strings from the audio with a user-defined level of confidence. Identify different speakers from a multi-participant conversation.
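One way to use speaker labels is to merge them with word timestamps into per-speaker turns. The sketch below assumes a response shaped like the service's JSON output, with word `timestamps` as `[word, start, end]` triples and a top-level `speaker_labels` list whose entries carry `from` and `speaker` fields. The sample data is hand-written for illustration, and `speaker_turns` is a hypothetical helper, not part of any SDK.

```python
def speaker_turns(response):
    """Group words into consecutive turns per speaker.

    Words are matched to speaker labels by their start time, which is
    assumed to be identical in both parts of the response.
    """
    words_by_start = {}
    for result in response.get("results", []):
        best = result["alternatives"][0]
        for word, start, _end in best.get("timestamps", []):
            words_by_start[start] = word

    turns = []
    for label in response.get("speaker_labels", []):
        word = words_by_start.get(label["from"])
        if word is None:
            continue  # no word starts at this label's time
        if turns and turns[-1][0] == label["speaker"]:
            turns[-1][1].append(word)
        else:
            turns.append((label["speaker"], [word]))
    return [(speaker, " ".join(words)) for speaker, words in turns]


# Hand-written sample, not real service output.
SAMPLE = {
    "results": [{
        "final": True,
        "alternatives": [{
            "transcript": "hello there hi ",
            "timestamps": [["hello", 0.0, 0.4], ["there", 0.4, 0.8],
                           ["hi", 1.2, 1.5]],
        }],
    }],
    "speaker_labels": [
        {"from": 0.0, "to": 0.4, "speaker": 0, "confidence": 0.6},
        {"from": 0.4, "to": 0.8, "speaker": 0, "confidence": 0.6},
        {"from": 1.2, "to": 1.5, "speaker": 1, "confidence": 0.5},
    ],
}

if __name__ == "__main__":
    for speaker, text in speaker_turns(SAMPLE):
        print(f"Speaker {speaker}: {text}")
```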

    Transcript metadata

    Receive a JSON response that includes confidence scores, start and end times, and multiple possible alternatives. Split a transcript into multiple results based on semantic features such as sentences.
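A minimal sketch of reading that metadata follows. It assumes the response contains a `results` list in which each entry has a `final` flag and an `alternatives` list, and that the first alternative carries the `transcript` and `confidence` fields. The sample JSON is hand-written for illustration, and `best_transcripts` is a hypothetical helper.

```python
import json


def best_transcripts(response_json, min_confidence=0.0):
    """Return (transcript, confidence) for each final result whose best
    alternative meets the confidence threshold."""
    response = json.loads(response_json)
    selected = []
    for result in response.get("results", []):
        if not result.get("final", False):
            continue  # skip interim results
        best = result["alternatives"][0]
        confidence = best.get("confidence", 0.0)
        if confidence >= min_confidence:
            selected.append((best["transcript"].strip(), confidence))
    return selected


# Hand-written sample, not real service output.
SAMPLE_RESPONSE = """
{
  "result_index": 0,
  "results": [
    {"final": true, "alternatives": [
      {"transcript": "several tornadoes touched down ", "confidence": 0.94},
      {"transcript": "several tornadoes touch down "}
    ]},
    {"final": true, "alternatives": [
      {"transcript": "in colorado ", "confidence": 0.41}
    ]}
  ]
}
"""

if __name__ == "__main__":
    for text, confidence in best_transcripts(SAMPLE_RESPONSE, 0.5):
        print(f"{confidence:.2f}  {text}")
```

Filtering on confidence in this way is a design choice for the example; an application might instead route low-confidence results to human review.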

    Transcript refinement

    Apply smart formatting to convert dates, times, numbers, currency values, phone numbers, and more to conventional written forms in final transcripts. Redact sensitive personal information such as credit card numbers from transcripts. Censor profanity from US English transcripts and metadata.
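These refinement options are typically switched on through request parameters. The sketch below builds a query string from three flags, assuming the parameter names `smart_formatting`, `redaction`, and `profanity_filter` and lowercase `true`/`false` values. The defaults shown are assumptions for illustration, and `refinement_query` is a hypothetical helper.

```python
from urllib.parse import urlencode


def refinement_query(smart_formatting=False, redaction=False,
                     profanity_filter=True):
    """Build a query string for transcript refinement options.

    Python booleans are converted to lowercase "true"/"false" strings,
    since urlencode would otherwise emit "True"/"False".
    """
    flags = {
        "smart_formatting": smart_formatting,
        "redaction": redaction,
        "profanity_filter": profanity_filter,
    }
    return urlencode({name: str(value).lower() for name, value in flags.items()})


if __name__ == "__main__":
    print(refinement_query(smart_formatting=True, redaction=True))
```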

    Processing and audio metrics

    Request processing metrics for detailed information about the service's analysis of your audio, or audio metrics for details about the precise signal characteristics of your audio.

    Getting support


    If you're experiencing issues with this product, go to the IBM Cloud Support Center and create a case. Use the All products option to search for this product, then continue creating the case or find more information about getting support. Third-party and community-supported products might direct you to a support process outside of IBM Cloud.
