Getting started with Speech to Text
The IBM Watson® Speech to Text service transcribes audio to text to enable speech transcription capabilities for applications. This curl-based tutorial can help you get started quickly with the service. The examples show you how to call the service's POST /v1/recognize method to request a transcript.
The tutorial uses the curl command-line utility to demonstrate REST API calls. For more information about curl, see Using curl with Watson examples.
Before you begin
IBM Cloud
- Copy the credentials to authenticate to your service instance:
  - View the Manage page for the service instance:
    - If you are on the Getting started page for your service instance, click the Manage entry in the list of topics.
    - If you are on the Resource list page, expand the AI / Machine Learning grouping in the Name column, and click the name of your service instance.
  - On the Manage page, click Show Credentials in the Credentials box.
  - Copy the API Key and URL values for the service instance.

This tutorial uses an API key to authenticate. In production, use an IAM token. For more information, see Authenticating to IBM Cloud.
IBM Cloud Pak for Data
Speech to Text for IBM Cloud Pak for Data must be installed and configured before you begin this tutorial. For more information, see Watson Speech services on Cloud Pak for Data.
- Create an instance of the service by using the web client, the API, or the command-line interface. For more information about creating a service instance, see Creating a Watson Speech services instance.
- Follow the instructions in Creating a Watson Speech services instance to obtain a Bearer token for the instance. This tutorial uses a Bearer token to authenticate to the service.
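To avoid pasting credentials into every command, you might keep them in shell variables. The following is a minimal sketch, not something the service defines: the variable names STT_URL, STT_APIKEY, and STT_TOKEN and the helper stt_auth_mode are hypothetical, chosen only for illustration.

```shell
# Hypothetical variable names. Fill in STT_URL and exactly one of
# STT_APIKEY (IBM Cloud) or STT_TOKEN (IBM Cloud Pak for Data).
STT_URL="{url}"
STT_APIKEY=""
STT_TOKEN=""

# Prints which authentication style the curl commands should use:
# "apikey" (the -u "apikey:..." option) or "bearer" (the Authorization
# header). Returns 1 when neither credential is set.
stt_auth_mode() {
  if [ -n "$STT_APIKEY" ]; then
    echo "apikey"
  elif [ -n "$STT_TOKEN" ]; then
    echo "bearer"
  else
    echo "no credentials set" >&2
    return 1
  fi
}
```

With the variables set, the placeholders in the curl commands of this tutorial can be written as "$STT_URL", "apikey:$STT_APIKEY", or "Authorization: Bearer $STT_TOKEN".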
Transcribe audio with no options
Call the POST /v1/recognize method to request a basic transcript of a FLAC audio file with no additional request parameters.
- Download the sample audio file audio-file.flac.
- Issue the following command to call the service's /v1/recognize method for basic transcription with no parameters. The example uses the Content-Type header to indicate the type of the audio, audio/flac. The example uses the default language model, en-US_BroadbandModel, for transcription.

IBM Cloud
- Replace {apikey} and {url} with the API key and URL for your service instance.
- Modify {path_to_file} to specify the location of the audio-file.flac file.

curl -X POST -u "apikey:{apikey}" \
  --header "Content-Type: audio/flac" \
  --data-binary @{path_to_file}audio-file.flac \
  "{url}/v1/recognize"

IBM Cloud Pak for Data
- Replace {token} and {url} with the access token and URL for your service instance.
- Modify {path_to_file} to specify the location of the audio-file.flac file.

curl -X POST \
  --header "Authorization: Bearer {token}" \
  --header "Content-Type: audio/flac" \
  --data-binary @{path_to_file}audio-file.flac \
  "{url}/v1/recognize"

The service returns the following transcription results:
{
  "result_index": 0,
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.96,
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "final": true
    }
  ]
}
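If you only need the transcribed text, you can filter the response on the command line. The following is a rough sketch that relies on grep and sed rather than a JSON parser. The function name stt_transcripts is hypothetical, and the approach assumes that transcripts contain no escaped double quotes; a JSON-aware tool would be more robust.

```shell
# Reads a recognition response (JSON) on standard input and prints each
# "transcript" value on its own line. Plain-text matching only: this
# assumes that no transcript contains an escaped double quote.
stt_transcripts() {
  grep -o '"transcript": *"[^"]*"' |
    sed -e 's/^"transcript": *"//' -e 's/"$//'
}
```

For example, piping the output of the preceding curl command into stt_transcripts prints the transcript text without the surrounding JSON.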
Transcribe audio with options
Call the POST /v1/recognize method to transcribe the same FLAC audio file, but specify two transcription parameters.
- If necessary, download the sample audio file audio-file.flac.
- Issue the following command to call the service's /v1/recognize method with two extra parameters. Set the timestamps parameter to true to indicate the beginning and end of each word in the audio stream. Set the max_alternatives parameter to 3 to receive the three most likely alternatives for the transcription. The example uses the Content-Type header to indicate the type of the audio, audio/flac, and the request uses the default model, en-US_BroadbandModel.

IBM Cloud
- Replace {apikey} and {url} with the API key and URL for your service instance.
- Modify {path_to_file} to specify the location of the audio-file.flac file.

curl -X POST -u "apikey:{apikey}" \
  --header "Content-Type: audio/flac" \
  --data-binary @{path_to_file}audio-file.flac \
  "{url}/v1/recognize?timestamps=true&max_alternatives=3"

IBM Cloud Pak for Data
- Replace {token} and {url} with the access token and URL for your service instance.
- Modify {path_to_file} to specify the location of the audio-file.flac file.

curl -X POST \
  --header "Authorization: Bearer {token}" \
  --header "Content-Type: audio/flac" \
  --data-binary @{path_to_file}audio-file.flac \
  "{url}/v1/recognize?timestamps=true&max_alternatives=3"

The service returns the following results, which include timestamps and three alternative transcriptions:
{
  "result_index": 0,
  "results": [
    {
      "alternatives": [
        {
          "timestamps": [
            ["several", 1.0, 1.51],
            ["tornadoes", 1.51, 2.15],
            ["touch", 2.15, 2.5],
            . . .
          ],
          "confidence": 0.96,
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado and Sunday "
        }
      ],
      "final": true
    }
  ]
}
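The request in this section differs from the basic request only in its query string. As a sketch of how the URL might be assembled when you experiment with different parameters, the following hypothetical helper, stt_recognize_url, joins name=value pairs onto the /v1/recognize path. It does no URL-encoding, so it is suitable only for simple values such as true or 3.

```shell
# Builds a recognize URL from a base URL plus any number of name=value
# query parameters. Values are used as-is (no URL-encoding).
stt_recognize_url() {
  base=$1
  shift
  query=""
  for pair in "$@"; do
    if [ -z "$query" ]; then
      query="?$pair"      # first parameter starts the query string
    else
      query="$query&$pair" # later parameters are joined with &
    fi
  done
  printf '%s/v1/recognize%s\n' "$base" "$query"
}
```

Calling stt_recognize_url "{url}" timestamps=true max_alternatives=3 prints the same URL that the curl commands in this section use, and calling it with only the base URL prints the URL of the basic request.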
Next steps
- To try an example application that transcribes text from streaming audio input or from a file that you upload, see the Speech to Text demo.
- For more information about the service's interfaces and features, see Service features.
- For more information about all methods of the service's interfaces, see the API & SDK reference.