Large speech languages and models

The IBM Watson® Speech to Text service supports a growing collection of Large Speech Models (LSMs) that improve on the speech recognition capabilities of the service's previous-generation models. The model name is the locale, which consists of the language code and the region or country code separated by a dash. For example, `en-US` is the model for English as spoken in the United States. LSMs have a large number of trainable parameters and are trained on large amounts of audio. Because of their size, the volume of training material they are built on, and the state-of-the-art architecture and training recipe used to build them, these models deliver higher transcription accuracy than the service's previous models.
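Because a model name is simply a locale, it can be checked and split mechanically. The following Python sketch separates a model name into its language and region codes. The `parse_model_name` helper and the `Locale` type are hypothetical, and the sketch assumes that every code is exactly two letters (lowercase language, uppercase region), which holds for all models listed on this page but is not a documented rule of the service.

```python
import re
from typing import NamedTuple

# Assumption: language and region codes are two letters each, as in "en-US".
# This matches every model listed on this page but is not a service rule.
_LOCALE_PATTERN = re.compile(r"^([a-z]{2})-([A-Z]{2})$")


class Locale(NamedTuple):
    language: str  # for example "en"
    region: str    # for example "US"


def parse_model_name(model: str) -> Locale:
    """Hypothetical helper: split a model name such as "en-US" at the dash."""
    match = _LOCALE_PATTERN.match(model)
    if match is None:
        raise ValueError(f"not a language-region locale: {model!r}")
    return Locale(language=match.group(1), region=match.group(2))


print(parse_model_name("en-US"))  # Locale(language='en', region='US')
print(parse_model_name("pt-BR"))  # Locale(language='pt', region='BR')
```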

You can use these models for both telephony and broadband use cases.

Supported large speech model languages

The following table lists the large speech models that are available for each language.

Large speech models
Language	Model name	IBM Cloud	IBM Cloud Pak for Data
English (Australian)	`en-AU`	20 May 2024	12 June 2024
English (Indian)	`en-IN`	20 May 2024	12 June 2024
English (United Kingdom)	`en-GB`	20 May 2024	12 June 2024
English (United States)	`en-US`	20 May 2024	12 June 2024
French (Canadian)	`fr-CA`	20 May 2024	12 June 2024
French (France)	`fr-FR`	20 May 2024	12 June 2024
German	`de-DE`	19 November 2024	Not listed
Japanese	`ja-JP`	20 May 2024	12 June 2024
Portuguese (Brazilian)	`pt-BR`	18 June 2024	23 August 2024
Portuguese (Portugal)	`pt-PT`	23 August 2024	23 August 2024
Spanish (Castilian)	`es-ES`	18 June 2024	23 August 2024
Spanish (Argentinian)	`es-AR`	18 June 2024	23 August 2024
Spanish (Chilean)	`es-CL`	18 June 2024	23 August 2024
Spanish (Colombian)	`es-CO`	18 June 2024	23 August 2024
Spanish (Mexican)	`es-MX`	18 June 2024	23 August 2024
Spanish (Peruvian)	`es-PE`	18 June 2024	23 August 2024
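To use one of these models, you pass its name as the `model` query parameter of a recognition request. The following Python sketch maps each language in the table to its model name and builds a `/v1/recognize` URL. The `LARGE_SPEECH_MODELS` dictionary, the `recognize_url` helper, and the service URL are illustrative assumptions: substitute the URL of your own service instance. The sketch only builds the URL; it does not send audio or handle authentication.

```python
from urllib.parse import urlencode

# Illustrative mapping copied from the table of large speech models.
LARGE_SPEECH_MODELS = {
    "English (Australian)": "en-AU",
    "English (Indian)": "en-IN",
    "English (United Kingdom)": "en-GB",
    "English (United States)": "en-US",
    "French (Canadian)": "fr-CA",
    "French (France)": "fr-FR",
    "German": "de-DE",
    "Japanese": "ja-JP",
    "Portuguese (Brazilian)": "pt-BR",
    "Portuguese (Portugal)": "pt-PT",
    "Spanish (Castilian)": "es-ES",
    "Spanish (Argentinian)": "es-AR",
    "Spanish (Chilean)": "es-CL",
    "Spanish (Colombian)": "es-CO",
    "Spanish (Mexican)": "es-MX",
    "Spanish (Peruvian)": "es-PE",
}


def recognize_url(service_url: str, language: str) -> str:
    """Hypothetical helper: build a /v1/recognize URL for a language's LSM.

    Raises KeyError if the table lists no large speech model for the language.
    """
    model = LARGE_SPEECH_MODELS[language]
    query = urlencode({"model": model})
    return f"{service_url.rstrip('/')}/v1/recognize?{query}"


# Hypothetical service URL; replace it with the URL of your own instance.
print(recognize_url("https://stt.example.com/instances/1234", "French (Canadian)"))
# https://stt.example.com/instances/1234/v1/recognize?model=fr-CA
```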

Supported features for large speech models

The large speech models support a large subset of the service's speech recognition features. Where a supported feature is restricted to certain languages, the same language restrictions usually apply to large speech models as to previous- and next-generation models.

For more information about the parameters that you can use with large speech models, including their language support and whether the parameters are GA or beta, see the Parameter summary.
For more information about large speech models' support for customization, see Customization support for large speech models.

Large speech models support all speech recognition parameters and headers except the following:

`acoustic_customization_id` (Large speech models do not support acoustic model customization.)
`keywords` and `keywords_threshold`
`word_alternatives_threshold`
`grammar_name` (Large speech models do not support grammar customization.)
`low_latency` (Large speech models support low latency natively.)
`character_insertion_bias`
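A client can catch these parameters before it sends a request to a large speech model. The following Python sketch rejects any request parameter from the list of exceptions. The `check_lsm_parameters` helper is hypothetical, and treating `low_latency` as an error is a deliberately strict choice of this sketch; the list above says only that the parameter is not supported, because large speech models already provide low latency.

```python
# Parameters that the documentation lists as unsupported for large speech models.
UNSUPPORTED_LSM_PARAMETERS = frozenset({
    "acoustic_customization_id",
    "keywords",
    "keywords_threshold",
    "word_alternatives_threshold",
    "grammar_name",
    "low_latency",
    "character_insertion_bias",
})


def check_lsm_parameters(params: dict) -> None:
    """Hypothetical client-side check: raise ValueError on unsupported parameters."""
    # Intersecting with a dict compares against its keys (the parameter names).
    rejected = sorted(UNSUPPORTED_LSM_PARAMETERS.intersection(params))
    if rejected:
        raise ValueError("not supported by large speech models: " + ", ".join(rejected))


check_lsm_parameters({"model": "en-US", "timestamps": True})  # passes silently
```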

Large speech models also differ from previous-generation models in how they handle hesitations:

Large speech models do not produce hesitation markers. Instead, they include the actual hesitations in transcription results. For more information, see Speech hesitations and hesitation markers.
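If your application post-processes transcripts to remove hesitations, filtering only a marker token (previous-generation models report hesitations with a marker such as `%HESITATION`) no longer catches them with large speech models. The following Python sketch removes both the marker and spoken fillers. The `strip_hesitations` helper and the `FILLER_WORDS` set are hypothetical; the filler list is an assumption that you would need to adapt to your language and data.

```python
# Marker token that previous-generation models emit for a hesitation.
HESITATION_MARKER = "%HESITATION"
# Hypothetical list of spoken fillers; adapt it to your language and data.
FILLER_WORDS = {"uh", "um", "hmm"}


def strip_hesitations(transcript: str) -> str:
    """Hypothetical helper: drop hesitation markers and filler words."""
    kept = [
        word
        for word in transcript.split()
        # Compare fillers case-insensitively and ignore trailing punctuation.
        if word != HESITATION_MARKER and word.lower().strip(",.?!") not in FILLER_WORDS
    ]
    return " ".join(kept)


print(strip_hesitations("so %HESITATION I think"))  # so I think
print(strip_hesitations("so um I think"))           # so I think
```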