README.md

MM Chiou 05/22/2018


Welcome to the "Taiwanese Speech in the Wild (TSW)" Project

Yuan-Fu Liao, Taipei University of Technology, [email protected]

Corpus Status

This public project gives an overview of the current status of the entire TSW corpus.

Announcements

The first wave of the TSW corpora consists of 5 subsets (all beta versions except MATBN) and was officially released on April 11, 2018!

| Corpus | Abbreviation | Source | Hours | Remarks |
|---|---|---|---|---|
| Mandarin Chinese Broadcast News corpus | MATBN | PTS | 198 | story and speaker boundaries |
| NER Phonetic Annotation corpus Vol. 1 | NER-PhA-Vol1 | NER | 6.5 | phone, syllable, speaker and code-switching annotations |
| NER Manual Transcription corpus Vol. 1 | NER-Trs-Vol1 | NER | 107.4 | manual transcription, word sequences |
| NER Automatic Transcription corpus Vol. 1 | NER-Auto-Vol1 | NER | 309.6 | automatic transcription, word sequences with recognition error rate prediction (QE) and confidence measures (CM) |
| PTS Manual Subtitling corpus Vol. 1 | PTS-MSub-Vol1 | PTS | 264 | manual subtitling with time codes |
| Total | | | 879 | excludes NER-PhA-Vol1 |

(Note: PTS refers to Taiwan Public Television Service; NER refers to National Education Radio)
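The 879-hour total in the table is the sum of the four larger subsets, with NER-PhA-Vol1 left out. The short sketch below reproduces that arithmetic from the table's figures. The dictionary `SUBSET_HOURS` and the helper `total_hours` are hypothetical names used only for illustration; they are not part of any TSW release or tooling.

```python
# Hours per subset, copied from the table above (hypothetical illustration only).
SUBSET_HOURS = {
    "MATBN": 198.0,
    "NER-PhA-Vol1": 6.5,
    "NER-Trs-Vol1": 107.4,
    "NER-Auto-Vol1": 309.6,
    "PTS-MSub-Vol1": 264.0,
}


def total_hours(hours, exclude=("NER-PhA-Vol1",)):
    """Sum subset hours, skipping any abbreviation listed in `exclude`.

    The result is rounded to one decimal place to hide floating-point noise.
    """
    return round(sum(h for name, h in hours.items() if name not in exclude), 1)


print(total_hours(SUBSET_HOURS))              # 879.0, matching the table's total
print(total_hours(SUBSET_HOURS, exclude=()))  # 885.5 when NER-PhA-Vol1 is included
```

Passing an empty `exclude` shows that counting the phonetic annotation subset as well would add its 6.5 hours to the total.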
