Transcribing Long Audio Files
https://cloud.google.com/speech-to-text/docs/async-recognize
Here, a long audio file means audio longer than 1 minute.
Asynchronous speech recognition starts a long-running audio processing operation.
Use it to recognize audio that is longer than a minute.
The audio file can be stored in Google Cloud Storage.
For shorter audio, including audio stored locally (inline), synchronous speech recognition is faster and simpler.
The results of the operation can be retrieved via the google.longrunning.Operations interface.
Audio content can be sent directly to Cloud Speech-to-Text, or Cloud Speech-to-Text can process audio content that already resides in Google Cloud Storage.
The following Python code requires that you have created and activated a service account. (The "Quickstart" explains how to set up gcloud and how to create and activate a service account.)
def transcribe_gcs(gcs_uri):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    # Written for google-cloud-speech 2.0 and later, where the separate
    # enums and types modules are gone and the types live on `speech`.
    from google.cloud import speech

    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=16000,
        language_code='en-US')

    # Start the long-running operation, then block until it completes
    # or the timeout (in seconds) expires.
    operation = client.long_running_recognize(config=config, audio=audio)
    print('Waiting for operation to complete...')
    response = operation.result(timeout=90)

    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print('Transcript: {}'.format(result.alternatives[0].transcript))
        print('Confidence: {}'.format(result.alternatives[0].confidence))
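Waiting on the operation amounts to checking it repeatedly until it reports that it is done. The sketch below illustrates that polling pattern in plain Python; it is not the client library's implementation. The `fetch_operation` callable is a hypothetical stand-in for whatever retrieves the operation (for example through the google.longrunning.Operations interface), and it is assumed to return a dict with the `done`, `response`, and `error` fields of a long-running operation.

```python
import time


def wait_for_operation(fetch_operation, name, poll_interval=5.0, timeout=90.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll a long-running operation until it is done or the timeout expires.

    fetch_operation is a hypothetical callable (an assumption of this sketch):
    it takes the operation name and returns the operation as a dict with a
    "done" flag and, once done, either a "response" or an "error" entry.
    """
    deadline = clock() + timeout
    while True:
        operation = fetch_operation(name)
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(
                    "Operation failed: {}".format(operation["error"]))
            return operation.get("response", {})
        if clock() >= deadline:
            raise TimeoutError(
                "Operation {} did not finish within {}s".format(name, timeout))
        # Not done yet: wait before asking again.
        sleep(poll_interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting; in normal use the defaults are enough.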
The audio limits for asynchronous speech recognition requests are listed in [1].
The maximum quotas can be edited in the Google Cloud Platform dashboard.
Content Limits
Audio data can be provided either directly within the content field of the request or referenced by a Google Cloud Storage URI in the uri field of the request.
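As an illustration of the two options, the sketch below builds the audio part of a request as a JSON-ready dict. The helper name `build_recognition_audio` is hypothetical. It assumes a JSON request body, in which inline bytes are base64-encoded, and a Cloud Storage URI of the form `gs://bucket/object`.

```python
import base64


def build_recognition_audio(content=None, gcs_uri=None):
    """Return the audio part of a request as a JSON-ready dict.

    Exactly one of content (raw audio bytes) or gcs_uri must be given.
    This helper is illustrative and not part of any Google library.
    """
    if (content is None) == (gcs_uri is None):
        raise ValueError("Provide exactly one of content or gcs_uri")
    if gcs_uri is not None:
        if not gcs_uri.startswith("gs://"):
            raise ValueError("Expected a URI of the form gs://bucket/object")
        return {"uri": gcs_uri}
    # JSON cannot carry raw bytes, so inline audio is base64-encoded.
    return {"content": base64.b64encode(content).decode("ascii")}
```

For example, `build_recognition_audio(gcs_uri="gs://my-bucket/audio.flac")` fills only the uri field, where the bucket and object names are placeholders.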
The API places the following limits on the length of this content:
Content Limit | Audio length |
---|---|
synchronous request | 1 minute |
asynchronous request | 180 minutes |
streaming request | 1 minute |
Audio longer than 1 minute must use the uri field to reference an audio file in Google Cloud Storage.
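The content limits above can be turned into a small pre-flight check. The sketch below is only an illustration: the function name and the constants are hypothetical, the numbers are copied from the table in these notes, and the returned strings are the names of the corresponding Python client methods.

```python
# Limits copied from the content-limit table in these notes.
SYNC_LIMIT_SECONDS = 60          # synchronous request: 1 minute
ASYNC_LIMIT_SECONDS = 180 * 60   # asynchronous request: 180 minutes


def choose_recognition_method(duration_seconds, in_gcs):
    """Pick a client method name for audio of the given length.

    in_gcs says whether the audio is already stored in Google Cloud Storage.
    Raises ValueError when the audio cannot be sent as described.
    """
    if duration_seconds <= 0:
        raise ValueError("Duration must be positive")
    if duration_seconds > ASYNC_LIMIT_SECONDS:
        raise ValueError("Audio exceeds the asynchronous limit of 180 minutes")
    if duration_seconds <= SYNC_LIMIT_SECONDS:
        return "recognize"
    if not in_gcs:
        # Longer audio has to be referenced through the uri field.
        raise ValueError("Audio longer than 1 minute must be in Cloud Storage")
    return "long_running_recognize"
```

A ten-minute recording that is still on local disk fails this check, which signals that it needs to be uploaded to Cloud Storage before it can be transcribed.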
The current API usage limits for Cloud Speech-to-Text are as follows:
type of limit | usage limit |
---|---|
requests per 100 seconds | 500 |
processing per 100 seconds | 10,800 seconds of audio |
processing per day | 480 hours of audio |
Each StreamingRecognize session is considered a single request even though it includes multiple frames of StreamingRecognizeRequest audio within the stream.
Requests and/or attempts at audio processing in excess of these limits will produce an error.
These limits apply to each Cloud Speech-to-Text developer project, and are shared across all applications and IP addresses using a given developer project.
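One way to avoid quota errors is to track usage on the client before sending a request. The sketch below is a minimal sliding-window tracker for the two per-100-second limits in the table. The class name and its methods are hypothetical, and treating the limit as a sliding window is an assumption, since these notes do not say how the service measures it. Because the limits are shared across a whole project, a tracker inside one process can only approximate the real remaining quota.

```python
import collections
import time


class QuotaWindow:
    """Client-side tracker for the per-100-second limits (illustrative)."""

    def __init__(self, max_requests=500, max_audio_seconds=10800,
                 window_seconds=100, clock=time.monotonic):
        self.max_requests = max_requests
        self.max_audio_seconds = max_audio_seconds
        self.window_seconds = window_seconds
        self._clock = clock
        self._events = collections.deque()  # (timestamp, audio_seconds)
        self._audio_total = 0.0

    def _expire(self, now):
        # Drop requests that have left the window.
        while self._events and now - self._events[0][0] >= self.window_seconds:
            _, audio_seconds = self._events.popleft()
            self._audio_total -= audio_seconds

    def try_acquire(self, audio_seconds):
        """Record a request if it fits in the window; return True or False."""
        now = self._clock()
        self._expire(now)
        if len(self._events) + 1 > self.max_requests:
            return False
        if self._audio_total + audio_seconds > self.max_audio_seconds:
            return False
        self._events.append((now, audio_seconds))
        self._audio_total += audio_seconds
        return True
```

A caller would check `try_acquire(duration)` before each request and wait or queue the work when it returns False. A whole streaming session would be recorded as one request, in line with the rule above.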
[0]
https://cloud.google.com/speech-to-text/docs/async-recognize
[1]