Phonetic Decision Trees
The goal of building the phonetic decision tree code is to make it efficient for arbitrary context sizes and general enough to support a wide range of approaches.
To stay efficient, for example, the code avoids enumerating contexts.
The conventional approach is to have, in each HMM state of each monophone, a decision tree that asks questions about, say, the left and right phones.
In the Kaldi framework, the decision-tree roots can be shared among the phones and among the states of the phones, and questions can be asked about any phone in the context window, and about the HMM state.
Phonetic questions can be supplied based on linguistic knowledge, but in the Kaldi recipes the questions are generated automatically based on a tree-clustering of the phones.
Questions about things like phonetic stress (if marked in the dictionary) and word start/end information are supported via an extended phone set; in this case, the decision-tree roots are shared among the different versions of the same phone.
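The root sharing and the questions about arbitrary context positions and the HMM state can be illustrated with a small sketch. This is a toy written in Python, not Kaldi's C++ implementation: the names `Leaf`, `Split`, `HMM_STATE`, `roots`, and `pdf_for`, the integer phone IDs, and the tree itself are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple, Union

# Hypothetical key meaning "which HMM state"; keys 0..N-1 are context positions.
HMM_STATE = -1


@dataclass(frozen=True)
class Leaf:
    pdf_id: int  # index of the tied acoustic-model distribution


@dataclass(frozen=True)
class Split:
    key: int                 # context position to inspect, or HMM_STATE
    yes_set: FrozenSet[int]  # values for which the question answers "yes"
    yes: "Node"
    no: "Node"


Node = Union[Leaf, Split]


def lookup(node: Node, event: Dict[int, int]) -> int:
    """Walk the tree by answering one question per split until a leaf is hit."""
    while isinstance(node, Split):
        node = node.yes if event[node.key] in node.yes_set else node.no
    return node.pdf_id


# Toy tree: the first question is about the HMM state, the next ones are
# about the left (position 0) or right (position 2) phone of a triphone.
shared_tree = Split(
    key=HMM_STATE, yes_set=frozenset({0}),
    yes=Split(key=0, yes_set=frozenset({1, 2}), yes=Leaf(0), no=Leaf(1)),
    no=Split(key=2, yes_set=frozenset({3}), yes=Leaf(2), no=Leaf(3)),
)

# Phones 1 and 2 share one root (and all of their HMM states share it too);
# phone 3 has its own trivial tree.
roots: Dict[int, Node] = {1: shared_tree, 2: shared_tree, 3: Leaf(4)}


def pdf_for(context: Tuple[int, ...], hmm_state: int) -> int:
    """Map a phone context window plus an HMM state to a tied pdf index."""
    event = dict(enumerate(context))
    event[HMM_STATE] = hmm_state
    central = context[len(context) // 2]
    return lookup(roots[central], event)
```

Because a lookup only follows the questions on one path from the root, nothing here requires listing every possible context in advance, and the same code works for wider context windows by passing a longer `context` tuple.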
Language Modeling
Since Kaldi uses an FST-based framework, it is possible, in principle, to use any language model that can be represented as an FST.
Tools for converting LMs in the standard ARPA format to FSTs are provided.
In Kaldi recipes, the IRSTLM toolkit is used for purposes like LM pruning.
For building LMs from raw text, users may use the IRSTLM toolkit, for which Kaldi provides installation help, or a fully featured toolkit such as SRILM (http://www.speech.sri.com/projects/srilm).
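To make the idea of representing an ARPA-format LM as an FST more concrete, the following is a minimal sketch of the common textbook construction for a backoff bigram model, in which each history is a state and backing off is an epsilon arc to a shared unigram state. It is not Kaldi's converter and makes no claim about how that tool handles details; the function names, the toy LM, and the choice to turn `</s>` into final costs are assumptions made for illustration.

```python
import math
from collections import defaultdict

LN10 = math.log(10.0)  # ARPA stores log10 values; costs below use natural logs
EPS = "<eps>"


def parse_arpa(text):
    """Return {order: [(log10_prob, words, log10_backoff)]} for a small ARPA LM."""
    ngrams = defaultdict(list)
    order = 0
    for line in text.splitlines():
        line = line.strip()
        if not line or line in ("\\data\\", "\\end\\") or line.startswith("ngram "):
            continue
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])
            continue
        fields = line.split()
        prob = float(fields[0])
        words = tuple(fields[1:1 + order])
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        ngrams[order].append((prob, words, backoff))
    return ngrams


def bigram_arpa_to_fst(text):
    """Return (start_state, arcs, finals); arcs are (src, dst, label, cost).

    States are named by their one-word history; "" is the backoff state.
    """
    ngrams = parse_arpa(text)
    arcs, finals = [], {}
    for prob, (w,), backoff in ngrams[1]:
        if w == "</s>":
            finals[""] = -prob * LN10        # end of sentence from backoff state
            continue
        if w != "<s>":                       # <s> is never predicted
            arcs.append(("", w, w, -prob * LN10))
        arcs.append((w, "", EPS, -backoff * LN10))  # backoff arc
    for prob, (h, w), _ in ngrams[2]:
        if w == "</s>":
            finals[h] = -prob * LN10
        else:
            arcs.append((h, w, w, -prob * LN10))
    return "<s>", arcs, finals


TOY_ARPA = r"""
\data\
ngram 1=4
ngram 2=3

\1-grams:
-1.0 <s> -0.5
-0.7 </s>
-0.5 a -0.3
-0.6 b -0.2

\2-grams:
-0.2 <s> a
-0.4 a b
-0.3 b </s>

\end\
"""

start, arcs, finals = bigram_arpa_to_fst(TOY_ARPA)
```

In this sketch the cost of a sentence is the sum of the arc costs along its path plus the final cost of the state where it ends. Treating backoff as an ordinary epsilon arc is an approximation, since it also allows backoff paths for bigrams that are listed explicitly.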
The Kaldi Speech Recognition Toolkit
Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukáš Burget, Ondřej Glembek, Nagendra Goel, Mirko Hannemann, Petr Motlíček, Yanmin Qian, Petr Schwarz, Jan Silovský, Georg Stemmer, Karel Veselý