Project DeepSpeech
Project DeepSpeech is an open source speech-to-text engine based on Baidu's Deep Speech research paper [1].
The project is built on Google's TensorFlow.
Pre-built binaries for performing inference with a trained model can be installed with pip3.
A pre-trained English model is available for use.
Once everything is installed, the deepspeech binary can be used to do speech-to-text on short (approximately 5-second) audio files (currently only 16-bit, 16 kHz, mono WAVE files are supported in the Python client).
pip3 install deepspeech
deepspeech models/output_graph.pb models/alphabet.txt my_audio_file.wav
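Since only 16-bit, 16 kHz, mono WAVE files are supported, it can be useful to check an input file before passing it to the binary. A minimal sketch using Python's standard-library wave module (the helper name is illustrative):

```python
import wave

def is_supported_wav(path):
    """Return True if the WAV file is 16-bit, 16 kHz, mono."""
    with wave.open(path, "rb") as w:
        return (w.getsampwidth() == 2       # 2 bytes per sample = 16-bit
                and w.getframerate() == 16000
                and w.getnchannels() == 1)
```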
Alternatively, quicker inference (the realtime factor on a GeForce GTX 1070 is about 0.44) can be performed using a supported NVIDIA GPU on Linux. This is done by instead installing the GPU-specific package:
pip3 install deepspeech-gpu
deepspeech models/output_graph.pb models/alphabet.txt my_audio_file.wav
Prerequisites
Python 3.6
Git Large File Storage [2]
Install Git Large File Storage, either manually or through your package manager (the package is typically named git-lfs).
Getting the Code
Clone the DeepSpeech repository
git clone https://github.com/mozilla/DeepSpeech
Getting the pre-trained model
If you want to use the pre-trained English model for performing speech-to-text, you can download it from the DeepSpeech releases page [3].
Alternatively, you can run the following command to download and unzip the files in your current directory.
wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-models.tar.gz | tar xvfz -
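The download-and-untar step can also be done programmatically; a sketch of the extraction half using Python's standard-library tarfile module (the helper name is illustrative, and downloading the release archive is left to the caller):

```python
import tarfile

def extract_model_archive(archive_path, dest="."):
    """Extract a .tar.gz model archive into dest (what `tar xvfz` does)."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(path=dest)
```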
Using the model
There are three ways to use DeepSpeech inference:
1. The Python package
2. The command-line client
3. The Node.js package
Using the Python package
Pre-built binaries for performing inference with a trained model can be installed with pip3. The deepspeech binary can then be used to do speech-to-text on an audio file.
For the Python bindings, it is highly recommended that you perform the installation within a Python 3.5 or later virtual environment [4]. The rest of this section assumes your system is properly set up to create new virtual environments.
Create a DeepSpeech virtual environment
Creating a virtual environment creates a directory containing a python3 binary and everything needed to run deepspeech. For the purpose of this documentation, we will rely on $HOME/tmp/deepspeech-venv. You can create it using this command:
$ virtualenv -p python3 $HOME/tmp/deepspeech-venv/
Once this command completes successfully, the environment will be ready to be activated.
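If virtualenv is not available, Python 3's built-in venv module does the same job; a minimal sketch (the wrapper name is illustrative, and the target path matches the one used above):

```python
import venv

def create_deepspeech_venv(path, with_pip=True):
    """Create a virtual environment at `path`, equivalent to
    `virtualenv -p python3 <path>`."""
    venv.create(path, with_pip=with_pip)

# e.g. create_deepspeech_venv(os.path.expanduser("~/tmp/deepspeech-venv"))
```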
Activating the environment
Each time you need to work with DeepSpeech, you have to activate (load) this virtual environment. This is done with the command:
$ source $HOME/tmp/deepspeech-venv/bin/activate
Installing DeepSpeech Python bindings
Once your environment has been set up and loaded, you can use pip3 to manage packages locally. On a fresh setup of the virtualenv, you will have to install the DeepSpeech wheel. You can check whether it is already installed by looking at the output of pip3 list. To perform the installation, just issue:
$ pip3 install deepspeech
If it is already installed, you can also update it:
$ pip3 install --upgrade deepspeech
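The pip3 list check can also be done from within Python; a sketch using the standard-library importlib.metadata module (Python 3.8+; the helper name is illustrative):

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# e.g. installed_version("deepspeech") returns a version string once
# `pip3 install deepspeech` has succeeded, and None beforehand.
```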
Alternatively, if you have a supported NVIDIA GPU on Linux (see the release notes to find which GPUs are supported), you can install the GPU-specific package as follows:
$ pip3 install deepspeech-gpu
or update it as follows:
$ pip3 install --upgrade deepspeech-gpu
In both cases, it should take care of installing all the required dependencies. Once it is done, you should be able to call the sample binary using deepspeech on your command line.
Note: the following command assumes you downloaded the pre-trained model.
deepspeech models/output_graph.pbmm models/alphabet.txt models/lm.binary models/trie my_audio_file.wav
The last two arguments are optional, and represent a language model.
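For scripting, the invocation above, with its optional language-model arguments, can be assembled as an argument list; a sketch (the helper name is illustrative, and running the command, e.g. via subprocess, is left to the caller):

```python
def deepspeech_cmd(model, alphabet, audio, lm=None, trie=None):
    """Build the deepspeech CLI argument list; lm and trie are the
    optional language-model arguments and must be given together."""
    cmd = ["deepspeech", model, alphabet]
    if lm is not None and trie is not None:
        cmd += [lm, trie]       # inserted before the audio file, as above
    cmd.append(audio)
    return cmd
```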
See client.py for an example of how to use the package programmatically.
Using the command-line client
The util/taskcluster.py script can be used to download the pre-built binaries:
python3 util/taskcluster.py --target .
or, if you are on macOS:
python3 util/taskcluster.py --arch osx --target .
This will download native_client.tar.xz, which includes the deepspeech binary and associated libraries, and extract it into the current folder. taskcluster.py will download binaries for Linux/x86_64 by default; you can override that behavior with the --arch parameter. See the help info with python util/taskcluster.py -h for more details.
The following command assumes you downloaded the pre-trained model:
./deepspeech models/output_graph.pbmm models/alphabet.txt models/lm.binary models/trie audio_input.wav
See the help output with ./deepspeech -h
and the native client README for more details.
[0] https://github.com/mozilla/DeepSpeech
[1] https://arxiv.org/abs/1412.5567
[2]
[3] https://github.com/mozilla/DeepSpeech/releases
[4]