Interim results and low latency

With the WebSocket interface, the IBM Watson® Speech to Text service supports interim results, which are intermediate transcription hypotheses that arrive before final results. For the WebSocket and HTTP interfaces, most next-generation models also offer low latency to return results even more quickly than they already do, though transcription accuracy might be reduced. Interim results are available for all large speech models. Accuracy may be slightly lower, and low latency is not yet supported for these models.

Interim results

The interim results feature is available only with the WebSocket interface.

Interim results are intermediate transcription hypotheses that are likely to change before the service returns its final results. The service returns interim results as soon as it generates them. Interim results are useful for interactive applications and real-time transcription, and for long audio streams, which can take a while to transcribe.

Interim results evolve as the service's processing of an utterance progresses. They arrive more often and more quickly than final results. You can use them to enable your application to respond more quickly or to gauge the progress of transcription. When the processing of an utterance is complete, the service sends final results that represent its best transcription of the audio for that utterance.

Interim results enable feature parity between old and new models. It also reduces the latency by the users as the intermediate results start coming sooner than they used to.

Interim results are identified in a transcript with the field "final": false. The service can update interim results with more accurate transcriptions as it processes further audio. The service delivers one or more interim results for each final result.
Final results are identified with the field "final": true. The service makes no further updates to final results.

How you request interim results depend on the type of model that you are using:

For interim results, set end_of_phrase_silence_time parameter to a value other than None.
For a previous-generation model, set the interim_results parameter to true in the JSON start message. Interim results are available for all previous-generation models.
For a next-generation model, set the interim_results parameter to true in the JSON start message. You can also set low_latency to true to enable both interim results and low latency together for the models.
For a large speech model, set the interim_results to true in the JSON start message. low_latency is not currently supported by large speech models.

To disable interim results for any model, omit the interim_results parameter or set it to false. Disable interim results if you are doing offline or batch transcription.

Interim results example

The following abbreviated WebSocket example requests interim results. The service sends multiple response objects. It sets the final attribute to true only for the final results. The request implicitly uses the default previous-generation en-US_BroadbandModel.

var access_token = {access_token};
var wsURI = '{ws_url}/v1/recognize'
  + '?access_token=' + access_token;
var websocket = new WebSocket(wsURI);

websocket.onopen = function(evt) { onOpen(evt) };
function onOpen(evt) {
  var message = {
    action: 'start',
    content-type: 'audio/l16;rate=22050',
    interim_results: true
    end_of_phrase_silence_time:1.3
  };
  websocket.send(JSON.stringify(message));
  websocket.send(blob);
}

websocket.onmessage = function(evt) { onMessage(evt) };
function onMessage(evt) {
  console.log(evt.data);
}

The response includes a single utterance with no pauses.

{
  "result_index": 0,
  "results": [
    {
      "alternatives": [
        {
          "transcript": "several to "
        }
      ],
      "final": false
    }
  ]
}{
  "result_index": 0,
  "results": [
    {
      "alternatives": [
        {
          "transcript": "several tornadoes "
        }
      ],
      "final": false
    }
  ]
}{
  "result_index": 0,
  "results": [
    {
      "alternatives": [
        {
          "transcript": "several tornadoes swept through "
        }
      ],
      "final": false
    }
  ]
}{
  . . .
}{
  "result_index": 0,
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.96,
          "transcript": "several tornadoes swept through Colorado on Sunday "
        }
      ],
      "final": true
    }
  ]
}

Low latency

The low_latency parameter is available for most next-generation models. The parameter is not available with large speech models and previous-generation models.

The next-generation multimedia and telephony models have generally faster response times than the previous-generation models. But in some situations, you might want to receive results more quickly. With next-generation models that support low latency, you can set the low_latency parameter to true to receive results more quickly. For more information about the next-generation models that support low latency, see Supported next-generation language models.

With low latency, the service achieves faster results at the possible expense of transcription accuracy. When low latency is enabled, the service segments the audio into smaller chunks to optimize for speed over accuracy. This tradeoff might be acceptable if your application needs less response time than the highest possible accuracy. For example, low latency is ideal for use cases such as closed captioning, conversational applications, and live customer service in the voice channel of the IBM® watsonx™ Assistant service.

Omitting the low_latency parameter can produce more accurate results and is the recommended approach for most use cases. The low_latency parameter is false by default.

The nature of the response to a request that includes low latency depends on the interface that is used:

With the HTTP interfaces, the service waits until it receives the entire audio input and then sends a response. It then sends a single stream of bytes in response. The response can include multiple transcription elements with multiple final results, and it can be sent incrementally. But it is a single stream of data.
With the WebSocket interface, the service sends final results as they become available. It can send multiple independent responses in the form of different streams of bytes. The connection is bidirectional and full duplex, requests, and responses can continue to flow back and forth over a single connection while it remains active.

Low-latency restrictions

The low-latency feature has the following usage restrictions:

Low latency is available only for some next-generation models. For next-generation models that do not support low-latency, if you include the low_latency parameter with a request, the service fails with status code 400:
```
{
  "code": 400,
  "code_description": "Bad Request",
  "error": "low_latency is not a supported feature for model {model_id}"
}
```
Low latency is not available for previous-generation models. For previous-generation models, if you include the low_latency parameter with a request, the service generates a warning:
```
"warnings": [
  "Unknown arguments: low_latency."
]
```

Low-latency example

The following synchronous HTTP example requests low latency with the en-US_Telephony model. The example sets the low_latency query parameter to true.

IBM Cloud

curl -X POST -u "apikey:{apikey}" \
--header "Content-Type: audio/wav" \
--data-binary @{path}audio-file.wav \
"{url}/v1/recognize?model=en-US_Telephony&low_latency=true"

IBM Cloud Pak for Data IBM Software Hub

curl -X POST \
--header "Authorization: Bearer {token}" \
--header "Content-Type: audio/wav" \
--data-binary @{path}audio-file.wav \
"{url}/v1/recognize?model=en-US_Telephony&low_latency=true"

Interim results and low-latency examples

The following abbreviated WebSocket code shows how to request interim results, low-latency results, or both with the WebSocket interface. It specifies the following parameters:

The model query parameter of the /v1/recognize request passes the next-generation en-US_Telephony model, which supports low latency.
The inactivity_timeout parameter of the JSON start message sets the inactivity timeout to -1 (infinity), which prevents the request from timing out.
Both the interim_results and low_latency parameters are specified with the JSON start message of the request.

To show examples of results with all possible combinations of the parameters, the arguments for interim_results and low_latency are set to true or false in the examples in the following sections.

var access_token = {access_token};
var wsURI = '{ws_url}/v1/recognize'
  + '?access_token=' + access_token
  + '&model=en-US_Telephony';
var websocket = new WebSocket(wsURI);

websocket.onopen = function(evt) { onOpen(evt) };
function onOpen(evt) {
  var message = {
    action: 'start',
    content-type: 'audio/wav',
    inactivity_timeout: -1,
    interim_results: {true | false},
    low_latency: {true | false}
  };
  websocket.send(JSON.stringify(message));
  websocket.send(blob);
}

websocket.onmessage = function(evt) { onMessage(evt) };
function onMessage(evt) {
  console.log(evt.data);
}

The WAV file that is passed to the service includes a single sentence with two embedded multi-second pauses: "Thunderstorms could produce ...pause... large hail ...pause... and heavy rain." The pauses are long enough to represent separate utterances and thus generate multiple final results. Because each response includes multiple final results, one per utterance, you must assemble the final results into a single string to see the full transcript.

Example 1: Interim results and low latency are both false

This example sets both interim_results and low_latency to false. The service returns only final results as a single JSON object.

  var message = {
    action: 'start',
    content-type: 'audio/wav',
    inactivity_timeout: -1,
    interim_results: false,
    low_latency: false,
    end_of_phrase_silence_time: 1.4
  };

{
  "result_index": 0,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "thunderstorms could produce ",
          "confidence": 0.94
        }
      ]
    },
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "large hail ",
          "confidence": 0.91
        }
      ]
    },
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "and heavy rain ",
          "confidence": 0.86
        }
      ]
    }
  ]
}

Example 2: Interim results is false and low latency is true

This example sets interim_results to false and low_latency to true. The service returns only final results as a single JSON object.

  var message = {
    action: 'start',
    content-type: 'audio/wav',
    inactivity_timeout: -1,
    interim_results: false,
    low_latency: true,
    end_of_phrase_silence_time: 1.4
  };

{
  "result_index": 0,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "thunderstorms could produce ",
          "confidence": 0.94
        }
      ]
    },
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "large hail ",
          "confidence": 0.91
        }
      ]
    },
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "and heavy rain ",
          "confidence": 0.86
        }
      ]
    }
  ]
}

Example 3: Interim results is true and low latency is false

This example sets interim_results to true and low_latency to false. This returns the interim results using non low-latency mode.

  var message = {
    action: 'start',
    content-type: 'audio/wav',
    inactivity_timeout: -1,
    interim_results: true,
    low_latency: false,
    end_of_phrase_silence_time: 1.4
  };

{
  "result_index": 0,
  "results": [
    {
      "final": false,
      "alternatives": [
        {
          "transcript": "thunderstorms "
        }
      ]
    }
  ]
}{
  "result_index": 0,
  "results": [
    {
      "final": false,
      "alternatives": [
        {
          "transcript": "thunderstorms could produce "
        }
      ]
    }
  ]
}{
  "result_index": 0,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "thunderstorms could produce ",
          "confidence": 0.94
        }
      ]
    }
  ]
}{
  "result_index": 1,
  "results": [
    {
      "final": false,
      "alternatives": [
        {
          "transcript": "large "
        }
      ]
    }
  ]
}{
  "result_index": 1,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "large hail ",
          "confidence": 0.91
        }
      ]
    }
  ]
}{
  "result_index": 2,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "and heavy rain ",
          "confidence": 0.86
        }
      ]
    }
  ]
}

Example 4: Interim results and low latency are both true

This example sets both interim_results and low_latency to true. The service returns both interim and final results, and it returns each result as a separate JSON object.

  var message = {
    action: 'start',
    content-type: 'audio/wav',
    inactivity_timeout: -1,
    interim_results: true,
    low_latency: true,
    end_of_phrase_silence_time: 1.4
  };

{
  "result_index": 0,
  "results": [
    {
      "final": false,
      "alternatives": [
        {
          "transcript": "th "
        }
      ]
    }
  ]
}{
  "result_index": 0,
  "results": [
    {
      "final": false,
      "alternatives": [
        {
          "transcript": "thunderstorms "
        }
      ]
    }
  ]
}{
  "result_index": 0,
  "results": [
    {
      "final": false,
      "alternatives": [
        {
          "transcript": "thunderstorms could produc "
        }
      ]
    }
  ]
}{
  "result_index": 0,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "thunderstorms could produce ",
          "confidence": 0.94
        }
      ]
    }
  ]
}{
  "result_index": 1,
  "results": [
    {
      "final": false,
      "alternatives": [
        {
          "transcript": "large "
        }
      ]
    }
  ]
}{
  "result_index": 1,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "large hail ",
          "confidence": 0.91
        }
      ]
    }
  ]
}{
  "result_index": 2,
  "results": [
    {
      "final": false,
      "alternatives": [
        {
          "transcript": "and hea "
        }
      ]
    }
  ]
}{
  "result_index": 2,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "and heavy rain ",
          "confidence": 0.86
        }
      ]
    }
  ]
}