Using a custom acoustic model for speech recognition
Acoustic model customization is available only for previous-generation models. It is not available for next-generation and large speech models.
Once you create and train your custom acoustic model, you can use it in speech recognition requests by using the acoustic_customization_id
query parameter. By default, no custom acoustic model is used with a request. You can create
multiple custom acoustic models for the same or different domains. But you can specify only one custom acoustic model at a time for a speech recognition request. You must issue the request with credentials for the instance of the service that
owns the custom model.
A custom model can be used only with the base model for which it is created. If your custom model is based on a model other than the default, you must also specify that base model with the model
query parameter. For more information,
see Using the default model.
You can also specify a custom language model to be used with the request, which can increase transcription accuracy. For more information, see Using custom language and custom acoustic models for speech recognition.
Examples of using a custom acoustic model
The following examples show the use of a custom acoustic model with each speech recognition interface:
-
For the WebSocket interface, use the
/v1/recognize
method. The specified custom model is used for all requests that are sent over the connection.var access_token = {access_token}; var wsURI = '{ws_url}/v1/recognize' + '?access_token=' + access_token + '&model=en-US_NarrowbandModel' + '&acoustic_customization_id={customization_id}'; var websocket = new WebSocket(wsURI);
-
For the synchronous HTTP interface, use the
POST /v1/recognize
method. The specified custom model is used for that request.IBM Cloud
curl -X POST -u "apikey:{apikey}" \ --header "Content-Type: audio/flac" \ --data-binary @audio-file1.flac \ "{url}/v1/recognize?acoustic_customization_id={customization_id}"
IBM Cloud Pak for Data
curl -X POST \ --header "Authorization: Bearer {token}" \ --header "Content-Type: audio/flac" \ --data-binary @audio-file1.flac \ "{url}/v1/recognize?acoustic_customization_id={customization_id}"
-
For the asynchronous HTTP interface, use the
POST /v1/recognitions
method. The specified custom model is used for that request.IBM Cloud
curl -X POST -u "apikey:{apikey}" \ --header "Content-Type: audio/flac" \ --data-binary @audio-file.flac \ "{url}/v1/recognitions?acoustic_customization_id={customization_id}"
IBM Cloud Pak for Data
curl -X POST \ --header "Authorization: Bearer {token}" \ --header "Content-Type: audio/flac" \ --data-binary @audio-file.flac \ "{url}/v1/recognitions?acoustic_customization_id={customization_id}"
You can omit the language model from the request if the custom model is based on the default model, en-US_BroadbandModel
. Otherwise, you must use the model
parameter to specify the base model, as shown for the WebSocket
example. A custom model can be used only with the base model for which it is created.
Troubleshooting the use of custom acoustic models
If you apply a custom acoustic model to speech recognition but find that the quality of speech recognition does not improve, check for the following possible problems:
- Make sure that you are correctly passing the customization ID to the recognition request as shown in the previous examples.
- Make sure that the status of the custom model is
available
, meaning that it is fully trained and ready to use. For more information, see Listing custom acoustic models.