Use the Google Speech-to-Text API
Speech-to-Text API responses
A synchronous Speech-to-Text API request may take some time to return results, roughly proportional to the length of the supplied audio. Once processing is complete, the API returns a response like the following:
{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.982567895,
          "transcript": "how old is the Brooklyn Bridge"
        }
      ]
    }
  ]
}
These fields are explained below:
results
contains the list of results (of type SpeechRecognitionResult), where each result corresponds to a segment of audio (segments of audio are separated by pauses). Each result contains an alternatives list, and each alternative consists of one or more of the following fields:
transcript
contains the transcribed text. See Handling Transcriptions below.
confidence
contains a value between 0 and 1 indicating how confident Speech-to-Text is in the given transcription. See Interpreting Confidence Values below.
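To make the nesting concrete, here is a minimal Python sketch. It parses the sample response shown earlier with the standard json module and walks the path results → alternatives → transcript/confidence. It reads raw JSON text only and does not call any Google client library. The variable names are illustrative.

```python
import json

# The sample response from above, as raw JSON text.
SAMPLE_RESPONSE = """
{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.982567895,
          "transcript": "how old is the Brooklyn Bridge"
        }
      ]
    }
  ]
}
"""

response = json.loads(SAMPLE_RESPONSE)

# "results" holds one entry per audio segment; each entry holds candidate "alternatives".
first_result = response["results"][0]
top_alternative = first_result["alternatives"][0]

transcript = top_alternative["transcript"]
confidence = top_alternative["confidence"]

# confidence is described as a value between 0 and 1.
if not 0.0 <= confidence <= 1.0:
    raise ValueError(f"unexpected confidence value: {confidence}")

print(f"{transcript} (confidence {confidence:.2f})")
# prints: how old is the Brooklyn Bridge (confidence 0.98)
```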
If no speech in the supplied audio could be recognized, the returned results list contains no items. Unrecognized speech is commonly the result of very poor-quality audio, or of language code, encoding, or sample rate values that do not match the supplied audio.

Each synchronous Speech-to-Text API response returns a list of results rather than a single result containing all of the recognized audio. The recognized audio (within the transcript elements) appears in contiguous order.
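Both cases above (an empty result list and an ordered list of per-segment results) can be handled in one short Python sketch. It assumes the response has already been parsed into a dict, for example with json.loads. The function names recognized_segments and describe_response are hypothetical helpers. Treating a missing "results" key like an empty list is a defensive assumption, not documented behavior.

```python
def recognized_segments(response):
    """Return the first transcript of each result, in the order the API returned them.

    A missing "results" key is treated like an empty list (a defensive assumption).
    """
    return [
        result["alternatives"][0].get("transcript", "")
        for result in response.get("results", [])
        if result.get("alternatives")
    ]


def describe_response(response):
    """Summarize a parsed response, with a hint when no speech was recognized."""
    segments = recognized_segments(response)
    if not segments:
        # Common causes: poor audio, or config values that do not match the audio.
        return ("No speech recognized: check the audio quality and whether the "
                "request's languageCode, encoding, and sampleRateHertz match the audio.")
    # One entry per audio segment, kept in the order the results arrived.
    return f"{len(segments)} segment(s): " + " | ".join(segments)


print(describe_response({}))
```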
Selecting alternatives
Each result within a successful synchronous recognition response can contain one or more alternatives. More than one alternative is possible only if the maxAlternatives value for the request is greater than 1.
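The following Python sketch does two things. It builds a JSON request body that asks for up to three alternatives, then lists every alternative of a parsed response by rank. The body follows the general shape of a synchronous recognize request: a config object plus an audio object. The encoding, sample rate, language code, and gs:// URI are placeholder assumptions and must be replaced with values that match your audio. build_request_body and list_alternatives are hypothetical helper names.

```python
import json


def build_request_body(gcs_uri, max_alternatives=3):
    """Build a request body asking for several alternatives.

    encoding, sampleRateHertz, and languageCode are placeholder assumptions;
    they must match the supplied audio or recognition may fail.
    """
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": 16000,
            "languageCode": "en-US",
            "maxAlternatives": max_alternatives,
        },
        "audio": {"uri": gcs_uri},
    }


def list_alternatives(response):
    """Yield (result_index, rank, transcript, confidence) for every alternative.

    Rank 0 is the first alternative, which the API lists as the most probable.
    confidence is read with .get() because it may be absent for lower-ranked
    alternatives.
    """
    for result_index, result in enumerate(response.get("results", [])):
        for rank, alternative in enumerate(result.get("alternatives", [])):
            yield (result_index, rank,
                   alternative.get("transcript", ""), alternative.get("confidence"))


body = build_request_body("gs://example-bucket/example.raw")  # hypothetical URI
print(json.dumps(body, indent=2))
```

Reading confidence with .get() keeps the loop from failing on alternatives that carry only a transcript.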
The following Python code iterates over a result list and concatenates the transcriptions together. Note that we take the first alternative (the zeroth) in all cases.
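response = service_request.execute()  # service_request: a recognize request built earlier (not shown)
recognized_text = 'Transcribed Text: \n'
# Each result covers one segment of audio; results arrive in order.
for result in response.get('results', []):
    # Take the first (most probable) alternative of each result.
    recognized_text += result['alternatives'][0]['transcript']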