Large speech languages and models
The IBM Watson® Speech to Text service supports a growing collection of large speech models (LSMs) that improve on the speech recognition capabilities of the service's previous-generation models. Each model name is a locale, which consists of a language code and a region or country code separated by a hyphen. For example, en-US is the model for English as spoken in the United States.
LSMs have a large number of trainable parameters and are trained on large amounts of audio. Because of their size, the volume of material they are trained on, and the state-of-the-art architecture and training recipe that are used to build them, these models deliver higher transcription accuracy than the previously available models.
You can use these models for both telephony and broadband use cases.
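The model-name convention (a language code and a region or country code joined by a hyphen) can be illustrated with a small helper. This is only a sketch: `ModelName` and `parse_model_name` are hypothetical names introduced for this example and are not part of the service or its SDKs.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    """Hypothetical container for the two parts of a model name."""
    language: str  # language code, for example "en"
    region: str    # region or country code, for example "US"


def parse_model_name(name: str) -> ModelName:
    """Split a model name such as "en-US" into its language and region parts."""
    language, separator, region = name.partition("-")
    if not separator or not language or not region:
        raise ValueError(f"Expected <language>-<region>, got {name!r}")
    return ModelName(language, region)


parsed = parse_model_name("en-US")
print(parsed.language, parsed.region)
```

A helper like this is mainly useful for grouping models by language or for validating a name before it is sent to the service.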
Supported large speech model languages
The following table lists the large speech models that are available for each language.
Language | Model name | Status |
---|---|---|
English (Australian) | en-AU | IBM Cloud 20 May 2024; IBM Cloud Pak for Data 12 June 2024 |
English (Indian) | en-IN | IBM Cloud 20 May 2024; IBM Cloud Pak for Data 12 June 2024 |
English (United Kingdom) | en-GB | IBM Cloud 20 May 2024; IBM Cloud Pak for Data 12 June 2024 |
English (United States) | en-US | IBM Cloud 20 May 2024; IBM Cloud Pak for Data 12 June 2024 |
French (Canadian) | fr-CA | IBM Cloud 20 May 2024; IBM Cloud Pak for Data 12 June 2024 |
French (France) | fr-FR | IBM Cloud 20 May 2024; IBM Cloud Pak for Data 12 June 2024 |
German | de-DE | IBM Cloud 19 November 2024 |
Japanese | ja-JP | IBM Cloud 20 May 2024; IBM Cloud Pak for Data 12 June 2024 |
Portuguese (Brazilian) | pt-BR | IBM Cloud 18 June 2024; IBM Cloud Pak for Data 23 August 2024 |
Portuguese (Portugal) | pt-PT | IBM Cloud 23 August 2024; IBM Cloud Pak for Data 23 August 2024 |
Spanish (Castilian) | es-ES | IBM Cloud 18 June 2024; IBM Cloud Pak for Data 23 August 2024 |
Spanish (Argentinian) | es-AR | IBM Cloud 18 June 2024; IBM Cloud Pak for Data 23 August 2024 |
Spanish (Chilean) | es-CL | IBM Cloud 18 June 2024; IBM Cloud Pak for Data 23 August 2024 |
Spanish (Colombian) | es-CO | IBM Cloud 18 June 2024; IBM Cloud Pak for Data 23 August 2024 |
Spanish (Mexican) | es-MX | IBM Cloud 18 June 2024; IBM Cloud Pak for Data 23 August 2024 |
Spanish (Peruvian) | es-PE | IBM Cloud 18 June 2024; IBM Cloud Pak for Data 23 August 2024 |
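For client-side checks, the table can be mirrored in a simple lookup. The following is a minimal sketch that copies the model names listed above at the time of writing; the constant and function names are hypothetical, and the list would need to be updated by hand as new models are released.

```python
# Model names and languages copied from the table above.
LARGE_SPEECH_MODELS = {
    "en-AU": "English (Australian)",
    "en-IN": "English (Indian)",
    "en-GB": "English (United Kingdom)",
    "en-US": "English (United States)",
    "fr-CA": "French (Canadian)",
    "fr-FR": "French (France)",
    "de-DE": "German",
    "ja-JP": "Japanese",
    "pt-BR": "Portuguese (Brazilian)",
    "pt-PT": "Portuguese (Portugal)",
    "es-ES": "Spanish (Castilian)",
    "es-AR": "Spanish (Argentinian)",
    "es-CL": "Spanish (Chilean)",
    "es-CO": "Spanish (Colombian)",
    "es-MX": "Spanish (Mexican)",
    "es-PE": "Spanish (Peruvian)",
}


def is_large_speech_model(name: str) -> bool:
    """Return True if the name appears in the table of large speech models."""
    return name in LARGE_SPEECH_MODELS


def models_for_language(language_code: str) -> list:
    """Return the sorted model names whose language code matches, e.g. "es"."""
    return sorted(
        name for name in LARGE_SPEECH_MODELS
        if name.split("-")[0] == language_code
    )


print(is_large_speech_model("de-DE"))
print(models_for_language("es"))
```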
Supported features for large speech models
Large speech models are supported for use with a large subset of the service's speech recognition features. Where a supported feature is restricted to certain languages, the same language restrictions usually apply to large speech models, previous-generation models, and next-generation models.
- For more information about the parameters that you can use with large speech models, including their language support and whether the parameters are GA or beta, see the Parameter summary.
- For more information about large speech models' support for customization, see Customization support for large speech models.
Large speech models support all speech recognition parameters and headers except:
- acoustic_customization_id (Large speech models do not support acoustic model customization.)
- keywords and keywords_threshold
- word_alternatives_threshold
- grammar_name (Large speech models do not support grammar customization.)
- low_latency (Large speech models natively support low latency out of the box.)
- character_insertion_bias
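One way to guard against sending an unsupported parameter with a large speech model is to check a request's parameters against the exclusions listed above before the request is made. The sketch below assumes the parameters are held in a plain dictionary; `split_parameters` is a hypothetical helper, not a service or SDK function, and the `timestamps` entry in the usage example is included only as an illustrative parameter that is not on the exclusion list.

```python
from typing import Any, Dict, List, Tuple

# Parameters that the list above excludes for large speech models.
UNSUPPORTED_PARAMETERS = frozenset({
    "acoustic_customization_id",
    "keywords",
    "keywords_threshold",
    "word_alternatives_threshold",
    "grammar_name",
    "low_latency",
    "character_insertion_bias",
})


def split_parameters(params: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Separate request parameters into those to keep and the names to drop."""
    supported = {
        key: value for key, value in params.items()
        if key not in UNSUPPORTED_PARAMETERS
    }
    rejected = sorted(key for key in params if key in UNSUPPORTED_PARAMETERS)
    return supported, rejected


# Illustrative request parameters (assumed values, not taken from the source).
requested = {
    "model": "en-US",
    "timestamps": True,
    "keywords": ["colorado", "tornado"],
    "low_latency": True,
}
supported, rejected = split_parameters(requested)
print(supported)
print(rejected)
```

Returning the rejected names, rather than silently dropping them, lets the caller log or report which settings have no effect with a large speech model.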
Large speech models also differ from previous-generation models with respect to the following additional feature:
- Large speech models do not produce hesitation markers. They instead include the actual hesitations in transcription results. For more information, see Speech hesitations and hesitation markers.