
Hiding Low Confidence Sections of a Transcript

Even if you customize the API with a Corpus, you still might get transcripts that have poor accuracy. Fortunately, things are not binary.

Usually only portions of a transcript are bad, not the whole thing. The API returns a confidence score for each segment of the transcript, so you can easily tell the low-confidence portions from the high-confidence ones.

Apple does this in their visual voicemail app (pictured below). In the areas circled in blue, Apple marks low-confidence portions of the transcript with a ___ instead of showing the text.

[Image: visual_voicemail]

Step 1: Transcribe audio

Let's take a 7-minute TED Talk from Al Gore and get a transcript for it.

# make the API request to submit the audio for transcription
curl --request POST \
    --url 'https://api.assemblyai.com/v1/transcript' \
    --header 'authorization: your-secret-api-token' \
    --data '
    {
      "audio_src_url": "https://s3-us-west-2.amazonaws.com/blog.assemblyai.com/audio/AlGore_2009.wav",
      "corpus_id": 448
    }'
    
# API response JSON
{
	"transcript": {
		"status": "queued",
		"confidence": null,
		"text": null,
		"segments": null,
		"audio_src_url": "https://s3-us-west-2.amazonaws.com/blog.assemblyai.com/audio/AlGore_2009.wav",
		"corpus_id": 448,
		"id": 676938
	}
}

Then let's get the transcript using the id returned in the JSON response above:

# make the API request to get a transcript
curl --request GET \
  --url https://api.assemblyai.com/v1/transcript/676938 \
  --header 'authorization: your-secret-api-token'
  
# API response JSON (truncated for readability)
{
    "transcript": {
        "status": "completed",
        "confidence": 0.850864197530864,
        "created": "2017-11-27T18:46:53.358142Z",
        "text": "the collide so that demonstrate that the arctic ice cap which for most of the last three million years has been the side of the lower forty eight states shrunk by forty percent...",
        "segments": [
            {
                "start": 0.0,
                "confidence": 0.64,
                "end": 14220.0,
                "transcript": "the"
            },
            {
                "start": 14220.0,
                "confidence": 0.92,
                "end": 17400.0,
                "transcript": "collide so that demonstrate that the arctic ice cap"
            },
            {
                "start": 17400.0,
                "confidence": 0.88,
                "end": 20430.0,
                "transcript": "which for most of the last three million years has been the side of"
            },
            {
                "start": 20430.0,
                "confidence": 0.87,
                "end": 23460.0,
                "transcript": "the lower forty eight states shrunk by forty percent"
            },
            ...
        ],
        "audio_src_url": "https://s3-us-west-2.amazonaws.com/blog.assemblyai.com/audio/AlGore_2009.wav",
        "corpus_id": 448,
        "id": 676938
    }
}
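
A transcript starts out as "queued", so the GET request usually has to be repeated until the status is "completed". Here is a minimal Python sketch of that loop using only the standard library. The helper names (fetch_transcript, wait_for_transcript), the 5 second interval, and the attempt limit are illustrative assumptions, not part of the API, and the sketch only knows the two status values shown above.

```python
import json
import time
import urllib.request

API_URL = "https://api.assemblyai.com/v1/transcript"


def fetch_transcript(transcript_id, token):
    """GET one transcript and return the parsed `transcript` object."""
    request = urllib.request.Request(
        f"{API_URL}/{transcript_id}",
        headers={"authorization": token},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["transcript"]


def wait_for_transcript(fetch, interval=5.0, max_attempts=60):
    """Call `fetch()` until the transcript's status is "completed".

    `fetch` is any zero-argument callable returning the transcript dict,
    which keeps the polling loop independent of the HTTP details.
    """
    for _ in range(max_attempts):
        transcript = fetch()
        if transcript["status"] == "completed":
            return transcript
        time.sleep(interval)
    raise TimeoutError("transcript was not completed in time")
```

With the id from the example above, this could be called as wait_for_transcript(lambda: fetch_transcript(676938, "your-secret-api-token")).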

Step 2: Parsing low confidence segments

The segments key in the above JSON response shows the transcript for every few seconds of speech (timestamped in milliseconds). It's an array of objects like this one:

{
    "start": 20430.0,
    "confidence": 0.87,
    "end": 23460.0,
    "transcript": "the lower forty eight states shrunk by forty percent"
}

This "segment" shows the transcript for what was spoken between 20430.0 milliseconds and 23460.0 milliseconds. It also shows the confidence score for this specific segment.

For example, the first segment in the segments array is:

{
    "start": 0.0,
    "confidence": 0.64,
    "end": 14220.0,
    "transcript": "the"
}

This means that for the audio between 0.0 milliseconds and 14220.0 milliseconds the API was not confident. If you listen to the audio, this is when there is music playing.
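
To find every stretch of audio like this one, we can filter the segments array by confidence. The sketch below is one way to do it in Python. The function name low_confidence_spans is hypothetical, and the 0.75 default is the rule-of-thumb cutoff given at the end of this tutorial; the sample data is the four segments from the response above.

```python
def low_confidence_spans(segments, threshold=0.75):
    """Return (start_seconds, end_seconds, confidence) for weak segments."""
    return [
        (seg["start"] / 1000, seg["end"] / 1000, seg["confidence"])
        for seg in segments
        if seg["confidence"] < threshold
    ]


# The four segments shown in the API response above.
segments = [
    {"start": 0.0, "confidence": 0.64, "end": 14220.0,
     "transcript": "the"},
    {"start": 14220.0, "confidence": 0.92, "end": 17400.0,
     "transcript": "collide so that demonstrate that the arctic ice cap"},
    {"start": 17400.0, "confidence": 0.88, "end": 20430.0,
     "transcript": "which for most of the last three million years has been the side of"},
    {"start": 20430.0, "confidence": 0.87, "end": 23460.0,
     "transcript": "the lower forty eight states shrunk by forty percent"},
]

for start, end, confidence in low_confidence_spans(segments):
    print(f"{start:.2f}s to {end:.2f}s (confidence {confidence})")
# prints: 0.00s to 14.22s (confidence 0.64)
```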

Using this information, we can assume that this part of the transcript is not accurate and hide it from our users. Alternatively, we can use some special CSS to mark this part of the transcript as low confidence in our application, so our users know not to trust it.
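
As a rough illustration of the CSS approach, the Python sketch below wraps each segment in a span and adds a class to the weak ones, so a stylesheet can grey them out or underline them. The function name segments_to_html, the low-confidence class name, and the 0.75 cutoff are all assumptions made for this example.

```python
from html import escape


def segments_to_html(segments, threshold=0.75):
    """Wrap each segment in a <span>, tagging weak ones with a CSS class."""
    spans = []
    for seg in segments:
        text = escape(seg["transcript"])  # avoid injecting raw text into HTML
        if seg["confidence"] < threshold:
            spans.append(f'<span class="low-confidence">{text}</span>')
        else:
            spans.append(f"<span>{text}</span>")
    return " ".join(spans)


# The first two segments from the API response above.
segments = [
    {"start": 0.0, "confidence": 0.64, "end": 14220.0,
     "transcript": "the"},
    {"start": 14220.0, "confidence": 0.92, "end": 17400.0,
     "transcript": "collide so that demonstrate that the arctic ice cap"},
]

print(segments_to_html(segments))
```

A rule such as .low-confidence { opacity: 0.4; } in the application's stylesheet would then dim the segments the API was unsure about.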

Step 3: Generating a high confidence transcript

Here is a short Python-style sketch of how we might use only the high confidence segments to generate a transcript for our users:

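# `response` is assumed to hold the parsed JSON from the GET request in Step 1
segments = response["transcript"]["segments"]

parts = []

# iterate over the `segments` array in the JSON response
for segment in segments:

    # if the segment has a high confidence score, use it
    if segment["confidence"] > 0.85:
        parts.append(segment["transcript"])

    # otherwise, replace this section with a placeholder
    else:
        parts.append("---")

# join with spaces so words from adjacent segments don't run together
transcript = " ".join(parts)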

And that's it! As a rule of thumb, anything with a confidence score below 0.75 is low accuracy, anything above 0.90 is very good, and anything between 0.75 and 0.90 is OK.
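
These rule-of-thumb ranges can be written as a small helper. This is only a sketch: the function name confidence_label and the label strings are made up for illustration, and scores of exactly 0.75 or 0.90 are treated as falling in the middle range.

```python
def confidence_label(score):
    """Map a confidence score to the rule-of-thumb ranges above."""
    if score < 0.75:
        return "low"
    if score > 0.90:
        return "very good"
    return "ok"


# Confidence scores taken from the segments in Step 1.
for score in (0.64, 0.87, 0.92):
    print(score, confidence_label(score))
# prints:
# 0.64 low
# 0.87 ok
# 0.92 very good
```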