Mozilla Deep Speech
yingting 2018/05/01
Mozilla DeepSpeech is an open source Speech-To-Text engine based on Baidu's Deep Speech research paper. DeepSpeech is implemented with TensorFlow.
Pre-built binaries for performing inference with a trained model can be installed with pip3. Using a virtual environment is recommended.
A pre-trained English model is available for use, and can be downloaded using the instructions below.
The deepspeech binary can do speech-to-text on short audio files of approximately 5 seconds (currently only 16-bit, 16 kHz, mono WAVE files are supported by the Python client):
pip3 install deepspeech
deepspeech models/output_graph.pb models/alphabet.txt my_audio_file.wav
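Since only 16-bit, 16 kHz, mono WAVE files are accepted, it can be useful to check a file before running inference. Below is a minimal sketch using Python's standard wave module (the helper name is made up for illustration; it is not part of DeepSpeech):

```python
import wave

def is_deepspeech_ready(path):
    """Return True if the WAV file is 16-bit PCM, 16 kHz, mono --
    the only format the Python client currently accepts."""
    with wave.open(path, 'rb') as w:
        return (w.getsampwidth() == 2        # 16-bit samples (2 bytes each)
                and w.getframerate() == 16000
                and w.getnchannels() == 1)
```

Files in other formats can be converted with a tool such as SoX, e.g. sox input.wav -r 16000 -b 16 -c 1 my_audio_file.wav.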
Quicker inference can be performed using a supported NVIDIA GPU on Linux (the real-time factor on a GeForce GTX 1070 is about 0.44). Install the GPU-specific package:
pip3 install deepspeech-gpu
deepspeech models/output_graph.pb models/alphabet.txt my_audio_file.wav
For more information on the use of deepspeech, see the output of deepspeech -h. Also check the required runtime dependencies.
Prerequisites
Getting the code
Install Git Large File Storage, either manually or through a package such as git-lfs if available on the system. Then clone the DeepSpeech repository normally:
git clone https://github.com/mozilla/DeepSpeech
Getting the pre-trained model
A pre-trained English model for performing speech-to-text can be downloaded from the DeepSpeech releases page, or run the following command to download and unzip the files into the current directory:
wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-models.tar.gz | tar xvfz -
Using the model
Using Python package
Pre-built binaries for performing inference with a trained model can be installed with pip3. Use the deepspeech binary to do speech-to-text on an audio file.
For the Python bindings, use a Python 3.5 (or later) virtual environment. Find more information in this documentation.
Installing DeepSpeech Python bindings
Use pip3 to manage packages locally. Install deepspeech with the command below:
pip3 install deepspeech
If deepspeech is already installed, update it:
pip3 install --upgrade deepspeech
If the environment has a supported NVIDIA GPU on Linux, run pip3 install deepspeech-gpu instead of pip3 install deepspeech:
pip3 install deepspeech-gpu
To update the GPU version of deepspeech:
pip3 install --upgrade deepspeech-gpu
Now the sample binary can be called using deepspeech on the command line.
Note: download the pre-trained model before executing the command below:
deepspeech models/output_graph.pbmm models/alphabet.txt models/lm.binary models/trie my_audio_file.wav
The models/lm.binary and models/trie arguments are optional; together they specify a language model that improves transcription accuracy. The final argument, my_audio_file.wav, is the audio file to transcribe.
See client.py for an example of how to use the package programmatically.
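As a rough illustration of the programmatic API, the sketch below mirrors the structure of the 0.1.x client.py. The import path, the Model constructor signature, and the constants N_FEATURES, N_CONTEXT and BEAM_WIDTH are taken from that release and may differ in other versions; treat this as an illustration, not a reference:

```python
import wave

def transcribe(model_path, alphabet_path, audio_path):
    """Sketch of client.py-style programmatic inference.  Returns the
    transcript, or None when the deepspeech package is not installed."""
    try:
        from deepspeech.model import Model  # API as of the 0.1.x releases
        import numpy as np
    except ImportError:
        return None
    # Constants used by the 0.1.1 client (assumed defaults).
    N_FEATURES, N_CONTEXT, BEAM_WIDTH = 26, 9, 500
    ds = Model(model_path, N_FEATURES, N_CONTEXT, alphabet_path, BEAM_WIDTH)
    with wave.open(audio_path, 'rb') as fin:
        fs = fin.getframerate()
        audio = np.frombuffer(fin.readframes(fin.getnframes()), np.int16)
    return ds.stt(audio, fs)
```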
Installing bindings from source
If pre-built binaries aren't available for the system, install them from scratch by following these instructions.
Third party bindings
In addition to the binding above, third party developers have started to provide bindings to other languages:
- Asticode provides Golang bindings in the go-astideepspeech repo.
- RustAudio provides a Rust binding; its installation and use are described in the deepspeech-rs repo.
- stes provides preliminary PKGBUILDs to install the client and Python bindings on Arch Linux in arch-deepspeech.
- gst-deepspeech provides a GStreamer plugin which can be used from any language with GStreamer bindings.
Training
Installing prerequisites for training
Install the required dependencies using pip:
cd DeepSpeech
pip3 install -r requirements.txt
Also download native_client.tar.xz, or build the native client files, to get the custom TensorFlow op needed for decoding the outputs of the neural network. Download the files for the architecture using util/taskcluster.py:
python3 util/taskcluster.py --target .
The command above downloads the native client files for the x86_64 architecture without CUDA support and extracts them into the current folder. Binaries with CUDA enabled ("--arch gpu") and for ARM7 ("--arch arm") are also available. (Note: the following command, for CUDA support, has not been verified to be correct.)
python util/taskcluster.py --arch gpu --target .
Common Voice training data
The Common Voice corpus consists of voice samples that were donated through Common Voice. After downloading the Common Voice corpus (70 GB), run the import script "bin/import_cv.py" on the directory where the corpus is located. The importer will unpackage and import the data. To start the import process, type:
bin/import_cv.py PATH_TO_TARGET_DIRECTORY
The process above creates a huge number of small files, so using an SSD drive is recommended.
Note: If the import script gets interrupted, it will try to continue from where it stopped the last time. Unfortunately, there are some cases where it will need to start over. Once the import is done, the directory will contain a bunch of CSV files.
The following files are official user-validated sets for training, validating and testing:
- cv-valid-train.csv
- cv-valid-dev.csv
- cv-valid-test.csv
The following files are the non-validated unofficial sets for training, validating and testing:
- cv-other-train.csv
- cv-other-dev.csv
- cv-other-test.csv
"cv-invalid.csv" contains all samples that users flagged as invalid.
A sub-directory called cv_corpus_{version} contains the mp3 and wav files that were extracted from an archive named cv_corpus_{version}.tar.gz. All entries in the CSV files refer to their samples by absolute paths, so moving this sub-directory requires another import or tweaking the CSV files accordingly.
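Because the CSVs use absolute paths, moving the corpus can alternatively be handled by rewriting the CSVs instead of running a second import. A sketch of such a tweak (the retarget_csv helper is hypothetical, and the wav_filename column name is assumed from the importer's output; verify it against your own CSV headers):

```python
import csv

def retarget_csv(csv_path, old_prefix, new_prefix, out_path):
    """Rewrite the absolute sample paths in an import CSV after the
    corpus directory has been moved from old_prefix to new_prefix."""
    with open(csv_path, newline='') as fin, open(out_path, 'w', newline='') as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # 'wav_filename' is the path column assumed to be produced by
            # bin/import_cv.py; check the header of your own CSVs.
            if row['wav_filename'].startswith(old_prefix):
                row['wav_filename'] = new_prefix + row['wav_filename'][len(old_prefix):]
            writer.writerow(row)
```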
To use Common Voice data during training, validation and testing, pass (comma-separated combinations of) their filenames into the --train_files, --dev_files and --test_files parameters of DeepSpeech.py.
For example, if Common Voice was imported into ../data/CV, DeepSpeech.py could be called like this:
./DeepSpeech.py --train_files ../data/CV/cv-valid-train.csv,../data/CV/cv-other-train.csv --dev_files ../data/CV/cv-valid-dev.csv --test_files ../data/CV/cv-valid-test.csv
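Note that each comma-separated list must contain no spaces; a space would make the shell split it into separate arguments. When launching DeepSpeech.py from Python, the argument lists can be built like this (the file_list helper is just illustrative):

```python
import subprocess  # used only if the command is actually launched

def file_list(paths):
    # Join CSV paths with commas and no spaces, as DeepSpeech.py expects.
    return ",".join(paths)

train = file_list(["../data/CV/cv-valid-train.csv",
                   "../data/CV/cv-other-train.csv"])
cmd = ["./DeepSpeech.py",
       "--train_files", train,
       "--dev_files", "../data/CV/cv-valid-dev.csv",
       "--test_files", "../data/CV/cv-valid-test.csv"]
# subprocess.run(cmd) would start training (requires the repo checkout
# and its dependencies, so it is not executed here).
```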
Training a model
The central (Python) script is DeepSpeech.py in the project's root directory. For a listing of command line options, type:
./DeepSpeech.py --help