MM Chiou 05/22/2018
Welcome to the "Taiwanese Speech in the Wild (TSW)" Project
Yuan-Fu Liao, Taipei University of Technology, [email protected]
Corpus Status
This is the public project that gives an overview of the current status of the entire TSW corpus.
Announcements
The first wave of TSW corpora consists of 5 subsets (beta versions, except MATBN) and was officially released on April 11, 2018!
corpus | abbreviation | source | hours | remark |
---|---|---|---|---|
Mandarin Chinese Broadcast News corpus | MATBN | PTS | 198 | story and speaker boundaries |
NER Phonetic Annotation corpus Vol.1 | NER-PhA-Vol1 | NER | 6.5 | phone, syllable, speaker and code-switching annotations |
NER Manual Transcription corpus Vol.1 | NER-Trs-Vol1 | NER | 107.4 | manual, word sequences |
NER Automatic Transcription corpus Vol.1 | NER-Auto-Vol1 | NER | 309.6 | auto, word sequences with recognition error rate prediction (QE) and confidence measure (CM) |
PTS Manual Subtitling corpus Vol.1 | PTS-MSub-Vol1 | PTS | 264 | manual subtitling with time codes |
Total | | | 879 | excludes NER-PhA-Vol1 |
(Note: PTS refers to Taiwan Public Television Service; NER refers to National Education Radio)
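
As a quick sanity check on the table, the total can be reproduced by summing the hours of every subset except NER-PhA-Vol1. The snippet below is only a sketch: the dictionary and variable names are illustrative and not part of any TSW tooling, and the hour values are copied from the table.

```python
# Hours per subset, copied from the table above.
# The names `corpus_hours` and `total_hours` are illustrative only.
corpus_hours = {
    "MATBN": 198.0,
    "NER-PhA-Vol1": 6.5,
    "NER-Trs-Vol1": 107.4,
    "NER-Auto-Vol1": 309.6,
    "PTS-MSub-Vol1": 264.0,
}

# The table's total leaves out NER-PhA-Vol1.
excluded = {"NER-PhA-Vol1"}

total_hours = sum(
    hours for name, hours in corpus_hours.items() if name not in excluded
)

# Format to one decimal place to hide floating-point rounding noise.
print(f"Total: {total_hours:.1f} hours")
```

Adding the 6.5 hours of NER-PhA-Vol1 would give 885.5 hours instead; the table reports the total without it.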