Overview of the Toolkit
Fig. 1: Components of Kaldi.
Fig. 1 shows a schematic overview of the Kaldi toolkit.
The toolkit depends on two kinds of freely available external libraries: OpenFst and numerical algebra libraries.
OpenFst provides the finite-state framework.
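OpenFst represents recognition networks as weighted finite-state transducers (WFSTs), usually over the tropical semiring, in which weights along a path combine by addition and alternative paths combine by taking the minimum. The toy sketch below is not the OpenFst API: `ToyFst` and `BestPathCost` are hypothetical names, and only the arc field names (`ilabel`, `olabel`, `weight`, `nextstate`) mirror OpenFst's standard arc. It computes the cost of the best complete path, the quantity a Viterbi-style search looks for.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical toy WFST; field names mirror OpenFst's arc, but this is NOT OpenFst.
struct Arc {
  int ilabel;     // input label (e.g., an acoustic unit)
  int olabel;     // output label (e.g., a word)
  float weight;   // cost, typically a negated log-probability
  int nextstate;  // destination state
};

struct ToyFst {
  std::vector<std::vector<Arc>> arcs;  // arcs[s] = arcs leaving state s
  std::vector<float> final_weight;     // +infinity means "not a final state"
  int start = 0;
};

constexpr float kInf = std::numeric_limits<float>::infinity();

// Tropical semiring: "plus" picks the better path, "times" extends a path.
float TropicalPlus(float a, float b) { return std::min(a, b); }
float TropicalTimes(float a, float b) { return a + b; }

// Cost of the lowest-cost path from the start state to any final state,
// using Bellman-Ford-style relaxation (assumes no negative-cost cycles).
float BestPathCost(const ToyFst& fst) {
  const size_t n = fst.arcs.size();
  assert(fst.final_weight.size() == n);
  std::vector<float> dist(n, kInf);  // tropical "zero" = unreachable
  dist[fst.start] = 0.0f;            // tropical "one" = empty path
  for (size_t round = 0; round + 1 < n; ++round) {  // at most n-1 rounds
    bool changed = false;
    for (size_t s = 0; s < n; ++s) {
      if (dist[s] == kInf) continue;
      for (const Arc& a : fst.arcs[s]) {
        const float cand = TropicalTimes(dist[s], a.weight);
        if (cand < dist[a.nextstate]) {
          dist[a.nextstate] = cand;
          changed = true;
        }
      }
    }
    if (!changed) break;
  }
  float best = kInf;
  for (size_t s = 0; s < n; ++s)
    best = TropicalPlus(best, TropicalTimes(dist[s], fst.final_weight[s]));
  return best;
}
```

For a three-state machine with one path of cost 0.5 + 1.0 and another of cost 2.5 into the single final state, `BestPathCost` returns 1.5. OpenFst itself provides far more (composition, determinization, minimization) on top of this representation.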
For numerical algebra, the toolkit uses the standard Basic Linear Algebra Subroutines (BLAS) and Linear Algebra PACKage (LAPACK) routines.
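Much of the GMM-side computation reduces to BLAS-style vector and matrix operations. For example, the log-likelihood of a diagonal-covariance Gaussian, log N(x; mu, var) = -0.5 * sum_d [log(2*pi*var_d) + (x_d - mu_d)^2 / var_d], can be rearranged into a precomputed constant plus two dot products, (mu/var) . x and -0.5 * (1/var) . x^2. The sketch below is hypothetical: `DiagGaussian`, `Precompute`, and `GmmLogLikelihood` are illustrative names, and a plain loop stands in for a BLAS dot product so the code compiles without external libraries.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

using Vec = std::vector<double>;

// Plain-loop stand-in for a BLAS level-1 dot product (e.g., cblas_ddot).
double Dot(const Vec& a, const Vec& b) {
  double s = 0.0;
  for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// One mixture component stored in "precomputed" form.
struct DiagGaussian {
  double gconst;      // log weight plus all terms that do not depend on x
  Vec means_invvars;  // mu_d / var_d
  Vec inv_vars;       // 1 / var_d
};

DiagGaussian Precompute(double weight, const Vec& mean, const Vec& var) {
  const double kLog2Pi = std::log(2.0 * std::acos(-1.0));
  DiagGaussian g{std::log(weight), Vec(mean.size()), Vec(mean.size())};
  for (size_t d = 0; d < mean.size(); ++d) {
    g.inv_vars[d] = 1.0 / var[d];
    g.means_invvars[d] = mean[d] / var[d];
    g.gconst -= 0.5 * (kLog2Pi + std::log(var[d]) + mean[d] * mean[d] / var[d]);
  }
  return g;
}

// log p(x) for the mixture: log-sum-exp over components of
// gconst + (mu/var).x - 0.5 * (1/var).(x*x).
double GmmLogLikelihood(const std::vector<DiagGaussian>& gmm, const Vec& x) {
  assert(!gmm.empty());
  Vec x2(x.size());
  for (size_t d = 0; d < x.size(); ++d) x2[d] = x[d] * x[d];
  Vec comp(gmm.size());
  for (size_t i = 0; i < gmm.size(); ++i)
    comp[i] = gmm[i].gconst + Dot(gmm[i].means_invvars, x) -
              0.5 * Dot(gmm[i].inv_vars, x2);
  const double m = *std::max_element(comp.begin(), comp.end());
  double s = 0.0;
  for (double c : comp) s += std::exp(c - m);  // subtract max for stability
  return m + std::log(s);
}
```

Evaluating many components at once turns the per-component dot products into a matrix-vector product, which is where an optimized BLAS routine pays off.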
The library modules can be grouped into those that depend on linear algebra libraries and those that depend on OpenFst. The decodable class (DecodableInterface) bridges these two halves.
Modules that are lower down in the schematic depend on one or more modules that are higher up.
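The bridging role of the decodable class can be illustrated with a simplified sketch. The `Decodable` base class below is a hypothetical reduction of the idea: its method names are modeled on Kaldi's `DecodableInterface` (`LogLikelihood`, `IsLastFrame`, `NumIndices`), but it is not the actual declaration. Acoustic-model code (the linear-algebra side) implements the interface; search code (the FST side) calls it without knowing how the scores were produced. `DecodableMatrix` and `ScoreAlignment` are illustrative names.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, simplified interface between acoustic scoring and search.
class Decodable {
 public:
  virtual ~Decodable() = default;
  // Log-likelihood of frame `frame` for the 1-based index `index`.
  virtual float LogLikelihood(int frame, int index) = 0;
  virtual bool IsLastFrame(int frame) const = 0;
  virtual int NumIndices() const = 0;
};

// One possible implementation: scores precomputed into a matrix
// (rows = frames, columns = indices), e.g., by a GMM or a neural network.
class DecodableMatrix : public Decodable {
 public:
  explicit DecodableMatrix(std::vector<std::vector<float>> loglikes)
      : loglikes_(std::move(loglikes)) {}
  float LogLikelihood(int frame, int index) override {
    return loglikes_.at(frame).at(index - 1);  // convert 1-based index
  }
  bool IsLastFrame(int frame) const override {
    return frame == static_cast<int>(loglikes_.size()) - 1;
  }
  int NumIndices() const override {
    return loglikes_.empty() ? 0 : static_cast<int>(loglikes_[0].size());
  }

 private:
  std::vector<std::vector<float>> loglikes_;
};

// Search-side code sees only the interface: total acoustic score of a
// frame-by-frame alignment (one index per frame).
float ScoreAlignment(Decodable& dec, const std::vector<int>& alignment) {
  float total = 0.0f;
  for (int t = 0; t < static_cast<int>(alignment.size()); ++t) {
    assert(alignment[t] >= 1 && alignment[t] <= dec.NumIndices());
    total += dec.LogLikelihood(t, alignment[t]);
    if (dec.IsLastFrame(t)) break;
  }
  return total;
}
```

Because the search code depends only on the abstract class, a new acoustic model can be plugged in by adding a new subclass rather than editing the decoder.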
Library functionality is exposed through command-line tools written in C++, which are in turn called from a scripting language to build and run a speech recognizer.
Each tool has a specific function and a small set of command-line arguments: for example, there are separate executables for accumulating statistics, summing accumulators, and updating a GMM-based acoustic model using maximum likelihood estimation.
Moreover, all the tools can read from and write to pipes, which makes it easy to chain different tools together.
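The split into accumulate, sum, and update steps, and the use of pipes between them, can be sketched with three stream-to-stream functions for maximum-likelihood estimation of a single one-dimensional Gaussian. This is an illustration under simplifying assumptions, not Kaldi code: the text statistics format (`count sum sumsq` on one line) and the function names are hypothetical. Because each stage reads an `std::istream` and writes an `std::ostream`, each could be wrapped in its own executable around `std::cin` and `std::cout` and chained with shell pipes.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Sufficient statistics for ML estimation of a 1-D Gaussian.
struct GaussStats {
  double count = 0.0, sum = 0.0, sumsq = 0.0;
  void Add(const GaussStats& o) {
    count += o.count;
    sum += o.sum;
    sumsq += o.sumsq;
  }
};

// Stage 1 (accumulate): read feature values, write one line of statistics.
void AccStats(std::istream& in, std::ostream& out) {
  GaussStats s;
  double x;
  while (in >> x) {
    s.count += 1.0;
    s.sum += x;
    s.sumsq += x * x;
  }
  out << s.count << ' ' << s.sum << ' ' << s.sumsq << '\n';
}

// Stage 2 (sum accumulators): read any number of stats lines, write their sum.
void SumStats(std::istream& in, std::ostream& out) {
  GaussStats total, s;
  while (in >> s.count >> s.sum >> s.sumsq) total.Add(s);
  out << total.count << ' ' << total.sum << ' ' << total.sumsq << '\n';
}

// Stage 3 (update): ML estimates mean = sum/count, var = sumsq/count - mean^2.
void UpdateModel(std::istream& in, std::ostream& out) {
  GaussStats s;
  in >> s.count >> s.sum >> s.sumsq;
  assert(in && s.count > 0.0);  // a real tool would report an error instead
  const double mean = s.sum / s.count;
  const double var = s.sumsq / s.count - mean * mean;
  out << mean << ' ' << var << '\n';
}
```

Summing is a separate stage because statistics accumulated by parallel jobs on different data splits can be added before a single update. With two jobs reading `1 2 3` and `4 5`, the summed statistics are `5 15 55` and the update yields mean 3 and variance 2.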
To avoid code rot, the toolkit is structured so that implementing a new feature generally involves adding new code and command-line tools rather than modifying existing ones.
Conclusions
The design of Kaldi, a free and open-source speech recognition toolkit, has been described.
The toolkit currently supports modeling of context-dependent phones of arbitrary context lengths, and all commonly used techniques that can be estimated using maximum likelihood.
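Arbitrary context lengths can be described by a context width N and the position P of the central phone within the window; Kaldi's tree-building code uses this N/P parameterization, with N=3, P=1 for triphones and N=1, P=0 for monophones. The hypothetical `ContextWindows` function below enumerates the window seen by each phone in a sequence. Using 0 as a padding symbol outside the utterance is an assumption of this sketch.

```cpp
#include <cassert>
#include <vector>

// For each position in `phones`, return the N-phone context window whose
// central phone sits at offset P inside the window.
// Positions before the start or after the end are filled with 0 (padding).
std::vector<std::vector<int>> ContextWindows(const std::vector<int>& phones,
                                             int N, int P) {
  assert(N >= 1 && P >= 0 && P < N);
  const int len = static_cast<int>(phones.size());
  std::vector<std::vector<int>> windows;
  for (int i = 0; i < len; ++i) {
    std::vector<int> w(N, 0);
    for (int k = 0; k < N; ++k) {
      const int j = i - P + k;  // sequence position that fills slot k
      if (j >= 0 && j < len) w[k] = phones[j];
    }
    windows.push_back(w);
  }
  return windows;
}
```

For phones {5, 7, 9}, N=3 and P=1 give the triphone windows {0,5,7}, {5,7,9}, {7,9,0}; N=2 and P=1 give left-context biphones. Longer contexts only change N and P, not the code.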
It also supports the recently proposed subspace Gaussian mixture models (SGMMs).
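In an SGMM, each state j has one or more substates m, each described by a low-dimensional vector v_jm. The means and mixture weights of its Gaussians are derived from parameters shared across all states: mu_jmi = M_i v_jm and w_jmi = exp(w_i^T v_jm) / sum_i' exp(w_i'^T v_jm), with the covariances Sigma_i also shared. The sketch below computes these two quantities for one substate; `SharedGaussian`, `SubstateMean`, and `SubstateWeights` are illustrative names, and the covariances and speaker-adaptation terms are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major: m[row][col]

// Globally shared parameters for SGMM Gaussian index i.
struct SharedGaussian {
  Mat M;  // D x S mean-projection matrix M_i
  Vec w;  // S-dimensional weight-projection vector w_i
};

// Substate mean for Gaussian i: mu_i = M_i v.
Vec SubstateMean(const SharedGaussian& g, const Vec& v) {
  Vec mu(g.M.size(), 0.0);
  for (size_t d = 0; d < g.M.size(); ++d)
    for (size_t s = 0; s < v.size(); ++s) mu[d] += g.M[d][s] * v[s];
  return mu;
}

// Substate mixture weights: softmax over i of w_i^T v.
Vec SubstateWeights(const std::vector<SharedGaussian>& gs, const Vec& v) {
  assert(!gs.empty());
  Vec logits(gs.size(), 0.0);
  for (size_t i = 0; i < gs.size(); ++i)
    for (size_t s = 0; s < v.size(); ++s) logits[i] += gs[i].w[s] * v[s];
  const double mx = *std::max_element(logits.begin(), logits.end());
  Vec weights(gs.size());
  double z = 0.0;
  for (size_t i = 0; i < gs.size(); ++i) {
    weights[i] = std::exp(logits[i] - mx);  // subtract max for stability
    z += weights[i];
  }
  for (double& x : weights) x /= z;
  return weights;
}
```

Because only v_jm is state-specific, the number of per-state parameters is much smaller than in a conventional GMM system with the same number of Gaussians.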
Development of Kaldi is continuing: support for large language models within the FST framework is being considered, and lattice generation and discriminative training are under development.
The Kaldi Speech Recognition Toolkit
Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukáš Burget, Ondřej Glembek, Nagendra Goel, Mirko Hannemann, Petr Motlíček, Yanmin Qian, Petr Schwarz, Jan Silovský, Georg Stemmer, Karel Veselý