Parse Recognition Response and Save Result to Excel File

092     _, filename = parse_filepath(speech_file)
093     n = re.match(r'C(.+)\.wav', filename)
094     audio_index = n.group(1)
095     results = []
096     for i, result in enumerate(response.results):
097         alternative = result.alternatives[0]
098         print(u'Transcript {}_{}: {}'.format(audio_index, i+1, alternative.transcript))
099         results.append(alternative.transcript)
100 
101         word_index = []
102         start_time_column = []
103         end_time_column   = []
104         confidences_list  = []
105         for word_info in alternative.words:
106             word = word_info.word
107             start_time = word_info.start_time
108             end_time = word_info.end_time
109             word_index.append(word)
110             start_time_column.append(start_time.seconds + start_time.nanos * 1e-9)
111             end_time_column.append(end_time.seconds + end_time.nanos * 1e-9)
112             if hasattr(word_info, 'confidence'):
113                 confidences_list.append(word_info.confidence)
114             else:
115                 confidences_list.append(float('nan'))
116                 
117         df = pd.DataFrame({'start_time':start_time_column, 'end_time':end_time_column, 'confidence':confidences_list}, index=word_index)
118         #print(df)
119         df.index.name = 'word'
120         df.to_excel(timestamp_writer, sheet_name='{}_{}'.format(filename, i+1))

Lines 92-94 extract the audio index from the original file path (e.g., 0000001 is extracted from /Users/petertsai/Google 雲端硬碟/gitbook_figure_python_code/choice(entire)/C0000001.wav).
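The extraction step can be sketched on its own as follows. This is a minimal sketch, not the author's helper: `os.path.split` stands in for `parse_filepath` (whose implementation is not shown here), `extract_audio_index` is a hypothetical name, and the path is made up. It also returns `None` when the filename does not match, a case the listing above does not handle.

```python
import os
import re


def extract_audio_index(speech_file):
    """Return the text between the leading 'C' and '.wav', or None if no match."""
    # Assumption: os.path.split replaces the author's parse_filepath helper.
    _, filename = os.path.split(speech_file)
    m = re.match(r'C(.+)\.wav', filename)
    return m.group(1) if m else None


# Hypothetical path used only for illustration.
audio_index = extract_audio_index('/data/choice(entire)/C0000001.wav')
print(audio_index)
```

Checking for `None` before calling `group(1)` avoids an `AttributeError` when a file that does not follow the `C<index>.wav` naming pattern is passed in.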

Line 95 initializes an empty list, results, which stores the transcribed sentences.

Lines 96-120 form a for loop that iterates over response.results returned by the Google Speech-to-Text API.

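response.results holds the transcription results. Because the Google Speech-to-Text API automatically splits the audio wherever there is a pause longer than 1 second and treats what follows as a new sentence, response.results may contain more than one entry.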

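In the Kaggle competition's speech data, the pause between answer options varies from speaker to speaker and is not always longer than 1 second, so the Google Speech-to-Text API cannot be used directly for sentence segmentation.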

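Line 97 takes alternatives[0] from each result. Google may produce several candidate transcriptions for each sentence, and alternatives[0] is the one it considers most likely. We use only this candidate and discard the others.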

Line 98 prints alternative.transcript to the screen, labeled with the audio index and the segment number.

Line 99 appends alternative.transcript to the results list.
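To try the loop without calling the API, the response can be imitated with plain objects. The sketch below assumes only the attribute names used in the listing (`results`, `alternatives`, `transcript`); the `SimpleNamespace` objects and the transcripts are invented stand-ins, not real API output.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the API response: two segments, the first
# with two candidate transcriptions, the second with one.
response = SimpleNamespace(results=[
    SimpleNamespace(alternatives=[
        SimpleNamespace(transcript='option one'),
        SimpleNamespace(transcript='option won'),
    ]),
    SimpleNamespace(alternatives=[
        SimpleNamespace(transcript='option two'),
    ]),
])

audio_index = '0000001'
results = []
for i, result in enumerate(response.results):
    # Keep only the first (most likely) candidate of each segment.
    alternative = result.alternatives[0]
    print('Transcript {}_{}: {}'.format(audio_index, i + 1, alternative.transcript))
    results.append(alternative.transcript)
```

The second candidate of the first segment ('option won') never reaches `results`, which mirrors how the listing discards everything except alternatives[0].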

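In lines 101-115, the program reads each word_info entry in alternative.words, which contains:

word: the recognized word
start_time: the time at which the word starts
end_time: the time at which the word ends
confidence: the confidence score of the recognized word (0 to 1)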

These values are appended to the lists word_index, start_time_column, end_time_column, and confidences_list, respectively, so that they match the format expected by pandas.DataFrame.
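The time conversion in lines 110-111 combines a whole-second part and a nanosecond part into one float. A minimal sketch, assuming time objects that expose integer `seconds` and `nanos` fields as in the listing; `to_seconds` is a hypothetical helper name and the word entries are invented.

```python
from types import SimpleNamespace


def to_seconds(offset):
    """Convert an object with integer `seconds` and `nanos` fields to float seconds."""
    return offset.seconds + offset.nanos * 1e-9


# Hypothetical word entries imitating alternative.words.
words = [
    SimpleNamespace(word='option',
                    start_time=SimpleNamespace(seconds=0, nanos=0),
                    end_time=SimpleNamespace(seconds=0, nanos=400000000)),
    SimpleNamespace(word='one',
                    start_time=SimpleNamespace(seconds=0, nanos=400000000),
                    end_time=SimpleNamespace(seconds=1, nanos=500000000)),
]

word_index, start_time_column, end_time_column = [], [], []
for word_info in words:
    word_index.append(word_info.word)
    start_time_column.append(to_seconds(word_info.start_time))
    end_time_column.append(to_seconds(word_info.end_time))

print(list(zip(word_index, start_time_column, end_time_column)))
```

Keeping one list per field means every list has the same length and order, so each can later become a column (or the index) of a single table.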

Because word confidence values are available only in the beta version, lines 112-115 append nan to confidences_list whenever the confidence property does not exist.
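The same fallback can be written more compactly with `getattr` and a default value. This is an alternative sketch rather than the author's code; `word_confidence` is a hypothetical helper name and the two objects are invented stand-ins for word_info entries with and without a confidence attribute.

```python
import math
from types import SimpleNamespace


def word_confidence(word_info):
    """Return word_info.confidence, or NaN when the attribute is missing."""
    return getattr(word_info, 'confidence', float('nan'))


# Hypothetical entries: one with a confidence value, one without.
with_conf = SimpleNamespace(word='apple', confidence=0.93)
without_conf = SimpleNamespace(word='apple')

confidences_list = [word_confidence(w) for w in (with_conf, without_conf)]
print(confidences_list)
```

Using NaN rather than 0 for a missing value keeps "no score available" distinguishable from "very low score" when the column is later analyzed.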

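Figure 1. Layout of transcribe_timestampbeta_01.xlsx. The file contains 25 sheets, each named <audio filename>+<segment number>. In each sheet, the first column is the transcribed word (word), the second is the word's start time (start_time), the third is the word's end time (end_time), and the fourth is the word's confidence score; the closer the score is to 1, the more accurate the recognition.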

Line 117 builds a pandas.DataFrame named df, in which start_time_column, end_time_column, and confidences_list each become a column and word_index becomes the index, as shown in Figure 1.

Line 119 names the index 'word' so that the table is easier to read.
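The table construction can be reproduced with a few made-up values. The sketch below uses the same variable names as the listing, but the words, times, and confidence scores are invented for illustration.

```python
import pandas as pd

# Hypothetical values for one transcribed segment.
word_index = ['option', 'one']
start_time_column = [0.0, 0.4]
end_time_column = [0.4, 0.9]
confidences_list = [0.98, float('nan')]

df = pd.DataFrame(
    {
        'start_time': start_time_column,
        'end_time': end_time_column,
        'confidence': confidences_list,
    },
    index=word_index,
)
# Label the index so the first column of the saved sheet has a header.
df.index.name = 'word'
print(df)
```

Because the words form the index rather than a regular column, a row can be looked up directly by word, for example `df.loc['one', 'start_time']`.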

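Line 120 writes df to the xlsx file through the to_excel function, as a sheet named after the audio filename and the segment number.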
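Writing one sheet per segment into a single workbook might look like the sketch below. It assumes that timestamp_writer in the listing is a `pandas.ExcelWriter` (the listing does not show where it is created) and that an Excel engine such as openpyxl is installed. The output path, sheet names, and data are invented.

```python
import os
import tempfile

import pandas as pd

# Hypothetical per-segment tables keyed by sheet name.
segments = {
    'C0000001.wav_1': pd.DataFrame(
        {'start_time': [0.0], 'end_time': [0.4], 'confidence': [0.98]},
        index=pd.Index(['option'], name='word'),
    ),
    'C0000001.wav_2': pd.DataFrame(
        {'start_time': [1.6], 'end_time': [2.1], 'confidence': [0.91]},
        index=pd.Index(['two'], name='word'),
    ),
}

# Write to a temporary directory so the example leaves no file behind
# in the working directory.
out_path = os.path.join(tempfile.mkdtemp(), 'transcribe_demo.xlsx')
with pd.ExcelWriter(out_path) as timestamp_writer:
    for sheet_name, df in segments.items():
        df.to_excel(timestamp_writer, sheet_name=sheet_name)

# Read the sheet names back to confirm one sheet per segment.
with pd.ExcelFile(out_path) as xls:
    sheet_names = xls.sheet_names
print(sheet_names)
```

Opening the writer in a `with` block ensures the workbook is saved and closed once all sheets have been written.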
