Speech transcript enrichment
Use the speech transcript enrichment feature to improve the readability and usability of raw Automatic Speech Recognition (ASR) transcripts. This post-processing service automatically adds punctuation and applies intelligent capitalization to enhance the structure and clarity of spoken content.
The service inserts punctuation marks such as periods, commas, question marks, and exclamation points. It also capitalizes sentence beginnings, proper nouns, acronyms, and brand names based on context. Confidence scoring helps ensure accurate and reliable enrichment.
By using speech transcript enrichment, you can produce clean, professional transcripts that are ready for review, publication, or integration into downstream applications.
Integrate speech transcript enrichment through the API
To enable enrichment in API calls:
- Add the enrichments=punctuation parameter to the recognition request. For more details, see Update the recognition request.
- Process the enhanced response from the enriched_results object. For more details, see Process the enriched response.
- Access the confidence metrics for quality assessment.
Update the recognition request
To enable enrichment, add the enrichments=punctuation parameter to your recognition request.
Example request:
curl -X POST -u "apikey:<API-KEY>" \
-H "Content-Type: audio/wav" \
--data-binary @data/<your_wav_audio_file>.wav \
"https://api.jp-tok.speech-to-text.watson.cloud.ibm.com/instances/<INSTANCE-ID>/v1/recognize?model=fr-FR_Telephony&enrichments=punctuation"
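The same request can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper names build_recognize_url and recognize_wav are hypothetical, and the service URL, API key, and file path are placeholders that you supply. The Basic authentication header mirrors what curl sends for -u "apikey:<API-KEY>".

```python
import base64
import urllib.request
from urllib.parse import urlencode


def build_recognize_url(service_url: str, model: str,
                        enrichments: str = "punctuation") -> str:
    """Return the /v1/recognize URL with the model and enrichments parameters."""
    query = urlencode({"model": model, "enrichments": enrichments})
    return f"{service_url.rstrip('/')}/v1/recognize?{query}"


def recognize_wav(service_url: str, api_key: str, wav_path: str, model: str) -> bytes:
    """POST a WAV file with enrichment enabled and return the raw response body."""
    with open(wav_path, "rb") as audio_file:
        audio = audio_file.read()

    # Same credentials as curl's -u "apikey:<API-KEY>": user name is the literal "apikey".
    token = base64.b64encode(f"apikey:{api_key}".encode()).decode()
    request = urllib.request.Request(
        build_recognize_url(service_url, model),
        data=audio,
        headers={
            "Content-Type": "audio/wav",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Building the query string with urlencode keeps the enrichments parameter separate from the base URL, so the same helper can be reused with a different model.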
Process the enriched response
When enrichment is enabled, the response includes an enriched_results object. This object contains the post-processed transcript with punctuation and capitalization applied, along with metadata for quality and traceability.
Example response:
{
  "speaker_labels": [
    {
      "from": 52.52,
      "to": 52.7,
      "speaker": 0,
      "confidence": 1.0,
      "final": false
    }
  ],
  "enriched_results": {
    "transcript": {
      "text": "Oh là, et bon, il faudrait me payer ma complation, là. Et vous avez ajouté à le fournir, me payer ma complatigne.",
      "timestamp": {
        "from": 0.22,
        "to": 59.68
      }
    },
    "status": "success"
  }
}
For more information about speaker labels and timestamps, see Speaker Labels and Word timestamps.
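Reading the enriched transcript out of a parsed response might look like the following sketch. It relies only on the fields that appear in the example response (enriched_results, transcript, text, timestamp, status). The function name extract_enriched_transcript is hypothetical, and treating any status other than "success" as "no usable transcript" is an assumption, because other status values are not described here.

```python
import json
from typing import Optional, Tuple


def extract_enriched_transcript(
    response: dict,
) -> Optional[Tuple[str, Optional[float], Optional[float]]]:
    """Return (text, start, end) from enriched_results, or None if unavailable."""
    enriched = response.get("enriched_results")
    # Assumption: only a "success" status indicates a usable enriched transcript.
    if not enriched or enriched.get("status") != "success":
        return None

    transcript = enriched.get("transcript", {})
    text = transcript.get("text")
    if text is None:
        return None

    timestamp = transcript.get("timestamp", {})
    return text, timestamp.get("from"), timestamp.get("to")


# Usage with a trimmed version of the example response.
sample = json.loads("""
{
  "enriched_results": {
    "transcript": {
      "text": "Oh là, et bon, il faudrait me payer ma complation, là.",
      "timestamp": {"from": 0.22, "to": 59.68}
    },
    "status": "success"
  }
}
""")
result = extract_enriched_transcript(sample)
if result is not None:
    text, start, end = result
    print(f"[{start}s to {end}s] {text}")
```

Returning None instead of raising keeps the caller free to fall back to the unenriched results when enrichment is missing from the response.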
Supported languages
Speech transcript enrichment currently supports the following languages:
- English (US, UK, Australia, India)
- French (France, Canada)
- German
- Italian
- Portuguese (Brazil, Portugal)
- Spanish (Spain, Latin America, Argentina, Chile, Colombia, Mexico, Peru)
- Japanese