Text to Speech

Synthesizes natural-sounding speech from text.

  • Date of last update: 09/01/2021
  • Docs
  • API docs
Type
  • Service
Provider
  • IBM
Updated on
  • 09/01/2021
Category
  • AI / Machine Learning
Compliance
  • EU Supported
  • HIPAA Enabled
  • IAM-enabled
Related links
  • API docs
  • Docs
  • Terms

    Summary

    The Text to Speech service converts written text to natural-sounding speech. The service streams the synthesized audio back with minimal delay. The audio uses cadence and intonation appropriate to its language and dialect, so the voices sound smooth and natural. Typical uses include voice-automated chatbots and other voice-driven or screenless applications, such as assistive tools for people with disabilities or visual impairments, video narration and voice-over, and educational and home-automation solutions.

    Features

    Available languages

    Arabic, Brazilian Portuguese, Chinese (Mandarin dialect), Dutch, English (US and UK dialects), French, German, Italian, Japanese, Korean, and Spanish (Castilian, Latin American, and North American dialects).

    Available voices

    Choose from a variety of male and female voices for different languages. Most languages provide both Neural and Standard voices, although some provide only one type. Neural voices generate audio by using deep neural networks to predict the acoustic features of the requested speech. Standard voices assemble audio by concatenating segments of recorded speech.

    Interfaces and SDKs

    Request synthesis with the HTTP REST or WebSocket APIs. For languages other than Japanese, the WebSocket interface can also return timing information for the words in the resulting audio. Use SDKs for simplified rapid development in Node.js, Java, Python, Swift, and many other languages.
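
As a rough illustration of the HTTP REST interface, the sketch below builds (but does not send) a synthesis request using only the Python standard library. It assumes the `/v1/synthesize` path, the `voice` query parameter, basic authentication with the user name `apikey`, and the voice name `en-US_MichaelV3Voice`; none of these appear on this page, so check them against the API docs. The function name and the example service URL are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_synthesize_request(service_url, api_key, text,
                             voice="en-US_MichaelV3Voice",
                             accept="audio/wav"):
    """Build a POST request for speech synthesis without sending it.

    Assumptions (not stated on this page): the endpoint path, the
    `voice` query parameter, and basic auth with user name "apikey".
    """
    query = urllib.parse.urlencode({"voice": voice})
    url = f"{service_url.rstrip('/')}/v1/synthesize?{query}"

    # Basic auth: base64("apikey:<key>")
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
        "Accept": accept,  # requested audio format
    }
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    # Hypothetical placeholder URL and key; substitute your instance values.
    req = build_synthesize_request(
        "https://example.invalid/instances/my-instance", "MY_API_KEY", "Hello world")
    print(req.get_method(), req.full_url)
    # To actually call the service and save the audio:
    # with urllib.request.urlopen(req) as resp, open("hello.wav", "wb") as out:
    #     out.write(resp.read())
```

Building the request separately from sending it keeps the example testable offline. In practice the official SDKs wrap these details.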

    SSML

    Annotate input text with the Speech Synthesis Markup Language (SSML), a standard XML-based notation for speech-synthesis applications. Use SSML to control aspects of speech synthesis such as pronunciation, volume, pitch, speed, and other attributes.
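
To make the SSML description concrete, here is a minimal sketch that assembles an SSML string with the standard `<speak>`, `<prosody>`, and `<break>` elements. The helper names are hypothetical, and which attributes and values the service honors for a given voice is an assumption to verify in the docs.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody(text, **attrs):
    """Wrap text in <prosody>, e.g. rate="slow", pitch="high", volume="loud"."""
    attr_text = "".join(
        f" {name}={quoteattr(str(value))}" for name, value in sorted(attrs.items())
    )
    # escape() protects characters such as & and < inside the spoken text
    return f"<prosody{attr_text}>{escape(text)}</prosody>"


def pause(duration):
    """Insert a pause, e.g. pause("500ms")."""
    return f"<break time={quoteattr(duration)}/>"


def speak(*fragments):
    """Join already-built SSML fragments under a single <speak> root."""
    return "<speak>" + "".join(fragments) + "</speak>"


if __name__ == "__main__":
    ssml = speak(prosody("Tom & Jerry", rate="slow"), pause("500ms"))
    print(ssml)
    # <speak><prosody rate="slow">Tom &amp; Jerry</prosody><break time="500ms"/></speak>
```

The resulting string would be passed as the input text of a synthesis request in place of plain text.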

    Voice Customization

    Use voice customization to refine the service's language-dependent rules for pronunciation. Define custom dictionaries for domain-specific terms, words with foreign origins, personal or geographic names, and abbreviations or acronyms in your application's lexicon. Define pronunciations based on other words, or create pronunciations based on phoneme symbols in the International Phonetic Alphabet (IPA) or IBM Symbolic Phonetic Representation (SPR).
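
The two kinds of pronunciation described above (based on other words, or based on phoneme symbols) can be sketched as entries in a custom dictionary. The JSON shape below, with `words`, `word`, and `translation` fields and a `<phoneme>` element carrying an `alphabet` of `ipa` or `ibm`, is an assumption recalled from the service's API reference rather than something stated on this page; the helper names are hypothetical.

```python
import json
from xml.sax.saxutils import quoteattr


def sounds_like(word, spoken_as):
    """Pronunciation defined in terms of other words."""
    return {"word": word, "translation": spoken_as}


def phonetic(word, symbols, alphabet="ipa"):
    """Pronunciation defined with phoneme symbols (IPA, or "ibm" for SPR)."""
    if alphabet not in ("ipa", "ibm"):
        raise ValueError("alphabet must be 'ipa' or 'ibm'")
    translation = (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(symbols)}></phoneme>"
    )
    return {"word": word, "translation": translation}


def words_payload(entries):
    """Serialize dictionary entries as a JSON request body (assumed shape)."""
    return json.dumps({"words": list(entries)}, ensure_ascii=False)


if __name__ == "__main__":
    payload = words_payload([
        sounds_like("IEEE", "I triple E"),           # acronym spelled out
        phonetic("tomato", "təˈmeɪtoʊ"),             # IPA symbols
    ])
    print(payload)
```

Such a payload would be sent to the customization interface for a specific custom dictionary; the exact endpoint is documented in the API docs.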

    Custom Voices

    Work with IBM to train a new voice for your specific use case and target market. IBM can train a new voice with as little as one hour of training data. This feature is currently available only to Premium customers.