Chinese Speech Sentence Boundary Detection

Fig. 0: Schematic of the inference pipeline for the AI NLP Great Challenge.

Fig. 0 shows the inference flowchart of the AI NLP Great Challenge. The acoustic signal is converted to text by speech recognition. It contains a paragraph ($P_{\textrm{as}}$), a query ($Q_{\textrm{as}}$), and choices ($C_{\textrm{as}}$), each of which is converted to text:
$P_{\textrm{as}} \to P_{\textrm{raw}}$
$Q_{\textrm{as}} \to Q_{\textrm{raw}}$
$C_{\textrm{as}} \to C_{\textrm{raw}}$

Chinese Speech Sentence Boundary Detection for Choices


Fig. 1: Flowchart of Chinese speech sentence boundary detection for choices

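Fig. 1 shows how the choice acoustic signal ($C_{\textrm{as}}$) is converted into four text files ($C_{\textrm{raw},1}, C_{\textrm{raw},2}, C_{\textrm{raw},3}, C_{\textrm{raw},4}$). In addition to speech recognition, this step therefore also requires sentence boundary detection to determine where each choice begins and ends.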


Fig. 2: Flowcharts of sentence boundary detection in three different domains

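Fig. 2 shows three approaches to sentence boundary detection, each working in a different domain: the text domain, the signal domain, and the acoustic domain. The goal in every case is the four recognized choice texts $C_{\textrm{raw},1}, C_{\textrm{raw},2}, C_{\textrm{raw},3}, C_{\textrm{raw},4}$; the approaches differ in where the boundaries are detected.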

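Fig. 2(a) shows the text-domain approach. The choice audio ($C_{\textrm{as}}$) is first passed through speech recognition to produce a single transcript $C_{\textrm{raw}}$; sentence boundary detection is then performed on this text, yielding the four choices $C_{\textrm{raw},1}, C_{\textrm{raw},2}, C_{\textrm{raw},3}, C_{\textrm{raw},4}$.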
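The text-domain step can be sketched as a split on the spoken choice labels. This is a minimal illustration, not the author's method: it assumes, based on the Fig. 3 example, that each choice is read with a leading Chinese ordinal (一, 二, 三, 四) and that the ASR transcript keeps these characters. The function name `split_choices_text` is hypothetical.

```python
# Hypothetical assumption: each choice is spoken as a Chinese ordinal label
# (一, 二, 三, 四) followed by its content, as in the Fig. 3 example.
LABELS = ["一", "二", "三", "四"]


def split_choices_text(c_raw: str, labels=LABELS) -> list[str]:
    """Split one ASR transcript C_raw into per-choice texts C_raw,1..C_raw,4.

    Each label is searched for only after the previous label's match,
    so boundaries are found in left-to-right order.
    """
    starts = []
    pos = 0
    for label in labels:
        idx = c_raw.find(label, pos)
        if idx < 0:
            raise ValueError(f"label {label!r} not found after position {pos}")
        starts.append(idx)
        pos = idx + len(label)
    ends = starts[1:] + [len(c_raw)]
    # Drop the label itself and the surrounding whitespace from each segment.
    return [c_raw[s + len(lab):e].strip() for s, e, lab in zip(starts, ends, labels)]


print(split_choices_text("一 1.5公里 二 1.4公里 三 1.3公里 四 1.6公里"))
```

Because labels are matched in order, a later label character appearing inside an earlier choice's content (for example 二 in 十二) would still produce a wrong cut; a real system would need a stronger boundary model than string search.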

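Fig. 2(b) shows the signal-domain approach. Sentence boundary detection is applied directly to the choice audio ($C_{\textrm{as}}$), splitting it into four choice clips $C_{\textrm{as},1}, C_{\textrm{as},2}, C_{\textrm{as},3}, C_{\textrm{as},4}$. Each clip is then transcribed by speech recognition in turn, giving $C_{\textrm{raw},1}, C_{\textrm{raw},2}, C_{\textrm{raw},3}, C_{\textrm{raw},4}$.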
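The source does not state how boundaries are found in the signal domain. One common heuristic, assumed here purely for illustration, is that the choices are separated by pauses: the sketch computes frame-level RMS energy, treats runs of low-energy frames as pauses, and cuts at the midpoints of the three longest interior pauses. `split_on_pauses`, the 20 ms frame size, and the 0.01 RMS threshold are hypothetical.

```python
import math


def split_on_pauses(samples, sample_rate, n_parts=4, frame_ms=20, silence_rms=0.01):
    """Cut a mono signal C_as into n_parts clips at its longest interior pauses.

    samples: sequence of floats in [-1, 1]. Returns (start, end) sample ranges.
    Frame size and silence threshold are illustrative assumptions.
    """
    frame = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(samples) // frame
    silent = []
    for i in range(n_frames):
        chunk = samples[i * frame:(i + 1) * frame]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        silent.append(rms < silence_rms)

    # Collect runs of consecutive silent frames as (length, start_frame, end_frame).
    runs = []
    i = 0
    while i < n_frames:
        if silent[i]:
            j = i
            while j < n_frames and silent[j]:
                j += 1
            runs.append((j - i, i, j))
            i = j
        else:
            i += 1

    # Leading/trailing silence is not a boundary between two choices.
    inner = [r for r in runs if r[1] > 0 and r[2] < n_frames]
    if len(inner) < n_parts - 1:
        raise ValueError("not enough pauses to split into the requested parts")

    # Keep the (n_parts - 1) longest pauses, then restore time order.
    cuts = sorted(inner, key=lambda r: r[0], reverse=True)[: n_parts - 1]
    cuts.sort(key=lambda r: r[1])

    # Cut in the middle of each selected pause.
    bounds = [0] + [((s + e) // 2) * frame for _, s, e in cuts] + [len(samples)]
    return list(zip(bounds[:-1], bounds[1:]))
```

The returned sample ranges define $C_{\textrm{as},1}$ to $C_{\textrm{as},4}$, each of which would then go to speech recognition. Pure Python keeps the sketch dependency-free; for real recordings an array library would be the natural choice.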

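Fig. 2(c) shows the acoustic-domain approach. The choice audio ($C_{\textrm{as}}$) is first converted by syllable recognition into a syllable sequence $C_{\textrm{sb}}$. Both $C_{\textrm{as}}$ and $C_{\textrm{sb}}$ are fed into sentence boundary detection, which outputs the four choice clips $C_{\textrm{as},1}, C_{\textrm{as},2}, C_{\textrm{as},3}, C_{\textrm{as},4}$.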
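In the acoustic domain, the syllable sequence $C_{\textrm{sb}}$ can supply timestamps for cutting $C_{\textrm{as}}$. The sketch below assumes, hypothetically, that the syllable recognizer returns `(syllable, start_s, end_s)` tuples in toneless pinyin and that each choice starts with the spoken label yi, er, san, si (一, 二, 三, 四). It searches for the labels in order and slices the audio at their start times; `boundaries_from_syllables` and `cut_audio` are illustrative names.

```python
# Hypothetical: toneless pinyin of the spoken choice labels 一, 二, 三, 四.
LABEL_SYLLABLES = ["yi", "er", "san", "si"]


def boundaries_from_syllables(c_sb, audio_len_s, labels=LABEL_SYLLABLES):
    """Return (start_s, end_s) time spans for C_as,1..C_as,4.

    c_sb: time-ordered list of (syllable, start_s, end_s) tuples, assumed
    to be the syllable recognizer's output for C_as.
    """
    starts = []
    k = 0
    for label in labels:
        # Scan forward from just after the previous label's match.
        while k < len(c_sb) and c_sb[k][0] != label:
            k += 1
        if k == len(c_sb):
            raise ValueError(f"label syllable {label!r} not found")
        starts.append(c_sb[k][1])
        k += 1
    ends = starts[1:] + [audio_len_s]
    return list(zip(starts, ends))


def cut_audio(samples, sample_rate, spans):
    """Slice C_as into per-choice clips using (start_s, end_s) spans."""
    return [samples[int(round(s * sample_rate)):int(round(e * sample_rate))]
            for s, e in spans]
```

Occurrences of yi inside the choices (as in yi dian wu, 1.5) are harmless because yi is matched only once, at the start; a later label syllable appearing early, such as er inside the first choice, would still cause a premature cut.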

Fig. 3: Schematic of Chinese speech sentence segmentation for choices.

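Fig. 3 shows an example:
Input choice acoustic signal $C_{\textrm{as}}$: C0000001.wav
The four choices are:
$C_{\textrm{raw},1}$: 一 1.5公里 (one: 1.5 km)
$C_{\textrm{raw},2}$: 二 1.4公里 (two: 1.4 km)
$C_{\textrm{raw},3}$: 三 1.3公里 (three: 1.3 km)
$C_{\textrm{raw},4}$: 四 1.6公里 (four: 1.6 km)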
