Project DeepSpeech

Project DeepSpeech is an open source speech-to-text engine based on Baidu's Deep Speech research paper [1].

The project uses Google's TensorFlow to make the implementation easier.

Pre-built binaries can be used for performing inference with a trained model.

The pre-built binaries can be installed with pip3.

A pre-trained English model is available for use.

Once everything is installed, the deepspeech binary can be used to do speech-to-text on short (approximately 5-second) audio files (currently only 16-bit, 16 kHz, mono WAVE files are supported by the Python client):

pip3 install deepspeech
deepspeech models/output_graph.pb models/alphabet.txt my_audio_file.wav
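Since the Python client currently accepts only 16-bit, 16 kHz, mono WAVE files, it can help to check a file before transcribing it. A minimal sketch using Python's standard wave module (the helper name is ours, not part of DeepSpeech):

```python
import wave

def is_supported_wav(path):
    """Return True if the WAV file matches the format the Python
    client expects: 16-bit samples, 16 kHz sample rate, mono."""
    with wave.open(path, "rb") as w:
        return (
            w.getsampwidth() == 2        # 2 bytes per sample = 16-bit
            and w.getframerate() == 16000
            and w.getnchannels() == 1
        )
```
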

Alternatively, quicker inference (the real-time factor on a GeForce GTX 1070 is about 0.44) can be performed using a supported NVIDIA GPU on Linux. This is done by instead installing the GPU-specific package:

pip3 install deepspeech-gpu
deepspeech models/output_graph.pb models/alphabet.txt my_audio_file.wav

Prerequisites

Python 3.6

Git Large File Storage [2]

Install Git Large File Storage, either manually or through a package like git-lfs.

Getting the Code

Clone the DeepSpeech repository

git clone https://github.com/mozilla/DeepSpeech

Getting the pre-trained model

If you want to use the pre-trained English model for performing speech-to-text, you can download it from the DeepSpeech releases page [3].

Alternatively, you can run the following command to download and extract the files into your current directory:

wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-models.tar.gz | tar xvfz -
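The same download-and-extract step can also be scripted with only the Python standard library. This is an illustrative equivalent of the wget | tar pipeline above (the function name is ours, not part of DeepSpeech):

```python
import tarfile
import urllib.request

def fetch_and_extract(url, dest="."):
    """Download a .tar.gz archive and unpack it into dest, mirroring
    `wget -O - <url> | tar xvfz -` (the archive is streamed rather
    than saved to disk first)."""
    with urllib.request.urlopen(url) as resp:
        with tarfile.open(fileobj=resp, mode="r|gz") as tar:
            tar.extractall(path=dest)
```
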

Using the model

There are three ways to use DeepSpeech inference:

1. The Python package

2. The command-line client

3. The Node.JS package

Using the Python package

Pre-built binaries can be used to perform inference with a trained model.

The pre-built binaries can be installed with pip3.

The deepspeech binary can be used to do speech-to-text on an audio file.

For the Python bindings, it is highly recommended that you install them in a Python 3.5 or later virtual environment [4].

It is assumed that your system is properly set up to create new virtual environments.

Create a DeepSpeech virtual environment

Creating a virtual environment produces a directory containing a python3 binary and everything needed to run deepspeech.

For the purpose of this documentation, we will rely on $HOME/tmp/deepspeech-venv. You can create it using this command:

$ virtualenv -p python3 $HOME/tmp/deepspeech-venv/

Once this command completes successfully, the environment will be ready to be activated.
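If you prefer to avoid the external virtualenv tool, the standard-library venv module can create an equivalent environment. A minimal sketch (the helper function is ours; the returned path assumes a POSIX layout):

```python
import os
import venv

def create_deepspeech_venv(env_dir):
    """Create a virtual environment with pip available, a stdlib
    equivalent of `virtualenv -p python3 <env_dir>`. Returns the
    path to the environment's python3 interpreter (POSIX layout)."""
    venv.create(env_dir, with_pip=True)
    return os.path.join(env_dir, "bin", "python3")
```
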

Activating the environment

Each time you need to work with DeepSpeech, you have to activate (load) this virtual environment. This is done with the command:

$ source $HOME/tmp/deepspeech-venv/bin/activate

Installing DeepSpeech Python bindings

Once your environment has been set up and loaded, you can use pip3 to manage packages locally. On a fresh setup of the virtualenv, you will have to install the DeepSpeech wheel. You can check if it is already installed by taking a look at the output of pip3 list. To perform the installation, just issue:

$ pip3 install deepspeech

If it is already installed, you can also update it:

$ pip3 install --upgrade deepspeech

Alternatively, if you have a supported NVIDIA GPU on Linux (see the release notes to find which GPUs are supported), you can install the GPU-specific package as follows:

$ pip3 install deepspeech-gpu

or update it as follows:

$ pip3 install --upgrade deepspeech-gpu

In both cases, it should take care of installing all the required dependencies. Once it is done, you should be able to call the sample binary using deepspeech on your command line.

Note: the following command assumes you downloaded the pre-trained model.

deepspeech models/output_graph.pbmm models/alphabet.txt models/lm.binary models/trie my_audio_file.wav

The lm.binary and trie arguments are optional, and represent a language model.

See client.py for an example of how to use the package programmatically.
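As a rough illustration, a script can assemble the argument list for the command line shown above before handing it to a subprocess. The helper name and the default model_dir are ours; the argument order follows the command as documented:

```python
def build_deepspeech_cmd(audio_path, model_dir="models", use_lm=True):
    """Assemble the argument list for the deepspeech CLI invocation
    shown above. The lm.binary/trie pair is optional and enables
    the language model."""
    cmd = [
        "deepspeech",
        f"{model_dir}/output_graph.pbmm",
        f"{model_dir}/alphabet.txt",
    ]
    if use_lm:
        cmd += [f"{model_dir}/lm.binary", f"{model_dir}/trie"]
    cmd.append(audio_path)
    return cmd
```

Once the deepspeech package is installed, the result can be run with, for example, subprocess.run(build_deepspeech_cmd("my_audio_file.wav"), check=True).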

Using the command-line client

The util/taskcluster.py script can be used to download the pre-built binaries:

python3 util/taskcluster.py --target .

or, if you are on macOS:

python3 util/taskcluster.py --arch osx --target .

This will download native_client.tar.xz, which includes the deepspeech binary and associated libraries, and extract it into the current folder. taskcluster.py downloads binaries for Linux/x86_64 by default, but you can override that behavior with the --arch parameter.

See the help info with python3 util/taskcluster.py -h for more details.

The following command assumes you downloaded the pre-trained model:

./deepspeech models/output_graph.pbmm models/alphabet.txt models/lm.binary models/trie audio_input.wav

See the help output with ./deepspeech -h and the native client README for more details.

[0] https://github.com/mozilla/DeepSpeech

[1] https://arxiv.org/abs/1412.5567

[2] https://git-lfs.github.com/

[3] https://github.com/mozilla/DeepSpeech/releases

[4] http://docs.python-guide.org/en/latest/dev/virtualenvs/
