Use Google Speech-to-Text API

Speech-to-Text API responses

A synchronous Speech-to-Text API request may take some time to return results, roughly in proportion to the length of the supplied audio. Once the audio is processed, the API returns a response such as the following:

{
    "results": [
        {
            "alternatives": [
                {
                    "confidence": 0.982567895,
                    "transcript": "how old is the Brooklyn Bridge"
                }
            ]
        }
    ]
}

These fields are explained below:

  • results contains the list of results (of type SpeechRecognitionResult), where each result corresponds to a segment of audio (segments are separated by pauses). Each result consists of one or more alternatives; in the example above, the single alternative carries a transcript and a confidence value.

If no speech in the supplied audio could be recognized, the returned results list will contain no items. Unrecognized speech is commonly caused by very poor-quality audio, or by language code, encoding, or sample rate values that do not match the supplied audio. Each synchronous Speech-to-Text API response returns a list of results, rather than a single result containing all recognized audio. The recognized audio (within the transcript elements) appears in the list in contiguous order.
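The empty-result case described above can be handled defensively. The following is a minimal sketch, assuming the response has already been decoded into a Python dictionary shaped like the JSON example; the function name extract_transcripts is hypothetical and not part of the API, and treating a missing "results" key the same as an empty list is an assumption made for safety.

```python
import json


def extract_transcripts(response):
    """Return the first transcript of each result, in order.

    Hypothetical helper: returns an empty list when nothing was recognized.
    """
    transcripts = []
    # Assumption: treat a missing "results" key like an empty results list.
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            transcripts.append(alternatives[0].get("transcript", ""))
    return transcripts


# The sample response shown earlier in this section.
sample = json.loads("""
{"results": [{"alternatives": [{"confidence": 0.982567895,
  "transcript": "how old is the Brooklyn Bridge"}]}]}
""")

transcripts = extract_transcripts(sample)
if transcripts:
    print(" ".join(transcripts))
else:
    print("No speech recognized; check audio quality, language code, encoding, and sample rate.")
```

Returning a list rather than a joined string leaves it to the caller to decide how segments should be separated.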

Selecting alternatives

Each result within a successful synchronous recognition response contains at least one alternative, and can contain more than one if the maxAlternatives value for the request is greater than 1.
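To inspect every alternative rather than only the first, a response can be walked as in the sketch below. It assumes the same dictionary shape as the JSON example; the helper name list_alternatives and the second alternative in the sample data are made up for illustration and are not real API output. The sketch also assumes that a confidence value may be missing on some alternatives and reports None in that case.

```python
def list_alternatives(response):
    """Yield (result_index, transcript, confidence) for every alternative.

    Hypothetical helper; confidence is None when the field is absent.
    """
    for index, result in enumerate(response.get("results", [])):
        for alternative in result.get("alternatives", []):
            yield (
                index,
                alternative.get("transcript", ""),
                alternative.get("confidence"),
            )


# Illustrative data only: the second alternative is invented for this example.
sample = {
    "results": [
        {
            "alternatives": [
                {"transcript": "how old is the Brooklyn Bridge", "confidence": 0.98},
                {"transcript": "how old is the Brooklyn brim"},
            ]
        }
    ]
}

for index, transcript, confidence in list_alternatives(sample):
    print(index, repr(transcript), confidence)
```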

The following Python code iterates over a result list and concatenates the transcriptions together. Note that we take the first alternative (the zeroth) in all cases.

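response = service_request.execute()
recognized_text = 'Transcribed Text: \n'
# The response key is 'results' (plural), as in the JSON example above.
for result in response['results']:
    # Take the first (zeroth) alternative of each result.
    recognized_text += result['alternatives'][0]['transcript']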
