Mozilla Deep Speech

yingting 2018/05/01

Mozilla DeepSpeech is an open-source Speech-To-Text engine, based on Baidu's Deep Speech research paper. TensorFlow is used to implement DeepSpeech.

Pre-built binaries can be used for performing inference with a trained model, and can be installed with pip3. Using a virtual environment is recommended for execution.

A pre-trained English model is available for use, and can be downloaded using the instructions below.

The deepspeech binary can do speech-to-text on short (approximately 5-second) audio files. Currently only 16-bit, 16kHz, mono WAVE files are supported in the Python client:

pip3 install deepspeech
deepspeech models/output_graph.pb models/alphabet.txt my_audio_file.wav
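Because the Python client only accepts 16-bit, 16kHz, mono WAVE files, it can be useful to verify an input file before running inference. A minimal stdlib-only sketch (the helper name and the demo file are illustrative, not part of DeepSpeech):

```python
import struct
import wave

def is_supported_wav(path):
    """True if the WAV file is mono, 16-bit, 16 kHz -- the format
    the DeepSpeech Python client expects."""
    with wave.open(path, "rb") as w:
        return (w.getnchannels() == 1
                and w.getsampwidth() == 2      # bytes per sample -> 16-bit
                and w.getframerate() == 16000)

# Demo: write 0.1 s of 16 kHz mono silence and check it.
with wave.open("demo.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(struct.pack("<h", 0) * 1600)

ok = is_supported_wav("demo.wav")
```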

Quicker inference can be performed using a supported NVIDIA GPU on Linux (the realtime factor on a GeForce GTX 1070 is about 0.44). Install the GPU-specific package:

pip3 install deepspeech-gpu
deepspeech models/output_graph.pb models/alphabet.txt my_audio_file.wav

For more information on the use of deepspeech, see the output of deepspeech -h. Check the required runtime dependencies.


Prerequisites

Python 3.6

Git Large File Storage


Getting the code

Install Git Large File Storage, either manually or through a package like git-lfs if available on the system. Then clone the DeepSpeech repository normally:

git clone https://github.com/mozilla/DeepSpeech

Getting the pre-trained model

A pre-trained English model for performing speech-to-text can be downloaded from theDeepSpeech releases page. Or run the following command to download and unzip the files in the current directory:

wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-models.tar.gz | tar xvfz -

Using the model

Using Python package

Pre-built binaries installed with pip3 can be used for performing inference with a trained model. Use the deepspeech binary to do speech-to-text on an audio file.

For the Python bindings, use a virtual environment with Python 3.5 or later. Find more information in this documentation.

Installing DeepSpeech Python bindings

Use pip3 to manage packages locally. Install deepspeech with the command below:

pip3 install deepspeech

If deepspeech is already installed, update it:

pip3 install --upgrade deepspeech

If the environment has a supported NVIDIA GPU on Linux, run pip3 install deepspeech-gpu instead of pip3 install deepspeech.

pip3 install deepspeech-gpu

and update the GPU version of deepspeech with the command below:

pip3 install --upgrade deepspeech-gpu

The sample binary can now be invoked with the deepspeech command.

Note: download the pre-trained model before executing the command below:

deepspeech models/output_graph.pb models/alphabet.txt models/lm.binary models/trie my_audio_file.wav

The arguments models/lm.binary and models/trie are optional, and represent a language model.

See client.py for an example of how to use the package programmatically.
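For a sense of what programmatic use looks like, here is a hedged sketch against the 0.1.x Python API. The Model constructor and enableDecoderWithLM argument order follow the project's client.py at that version and should be verified against it; the feature/context/beam-width constants are the client's defaults, not requirements:

```python
import struct
import wave

def transcribe(wav_path):
    """Sketch: run DeepSpeech inference on a 16 kHz mono WAV file.
    Requires `pip3 install deepspeech` (or deepspeech-gpu) and numpy."""
    import numpy as np
    from deepspeech.model import Model  # 0.1.x import path

    N_FEATURES, N_CONTEXT, BEAM_WIDTH = 26, 9, 500  # client.py defaults
    ds = Model('models/output_graph.pb', N_FEATURES, N_CONTEXT,
               'models/alphabet.txt', BEAM_WIDTH)
    # Optional language model -- the lm.binary/trie pair described above:
    ds.enableDecoderWithLM('models/alphabet.txt', 'models/lm.binary',
                           'models/trie', 1.75, 1.00, 1.00)
    with wave.open(wav_path, 'rb') as w:
        frames, rate = w.readframes(w.getnframes()), w.getframerate()
    audio = np.frombuffer(frames, dtype=np.int16)
    return ds.stt(audio, rate)

# stt() consumes 16-bit samples; raw WAV frames unpack like this
# (stdlib-only illustration):
raw = struct.pack("<4h", 0, 100, -100, 32767)
samples = list(struct.unpack("<4h", raw))
```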

Installing bindings from source

If pre-built binaries aren't available for the system, install them from scratch by following these instructions.

Third party bindings

In addition to the bindings above, third-party developers have started to provide bindings to other languages:

  • Asticode provides Golang bindings in go-astideepspeech repo.
  • RustAudio provides a Rust binding, the installation and use of which is described in the deepspeech-rs repo.
  • stes provides preliminary PKGBUILDs to install the client and python bindings on Arch Linux in the arch-deepspeech repo.
  • gst-deepspeech provides a GStreamer plugin which can be used from any language with GStreamer bindings.

Training

Installing prerequisites for training

Install the required dependencies using pip:

cd DeepSpeech
pip3 install -r requirements.txt

You also need to download native_client.tar.xz or build the native client files yourself to get the custom TensorFlow OP needed for decoding the outputs of the neural network. Download the files for your architecture using util/taskcluster.py:

python3 util/taskcluster.py --target .

The command above will download the native client files for the x86_64 architecture without CUDA support and extract them into the current folder. Binaries with CUDA enabled ("--arch gpu") and for ARM7 ("--arch arm") are also available. (The following command is not verified to be correct; it is intended for CUDA support:)

python3 util/taskcluster.py --arch gpu --target .

Common Voice training data

The Common Voice corpus consists of voice samples that were donated through Common Voice. After downloading the Common Voice corpus (70GB), run the import script "bin/import_cv.py" on the directory where the corpus is located. The importer will unpack and import the data. To start the import process, type:

bin/import_cv.py PATH_TO_TARGET_DIRECTORY

The process above creates a huge number of small files, so using an SSD drive is recommended.

Note: If the import script gets interrupted, it will try to continue from where it stopped the last time. Unfortunately, there are some cases where it will need to start over. Once the import is done, the directory will contain a bunch of CSV files.

The following files are official user-validated sets for training, validating and testing:

  • cv-valid-train.csv
  • cv-valid-dev.csv
  • cv-valid-test.csv

The following files are the non-validated unofficial sets for training, validating and testing:

  • cv-other-train.csv
  • cv-other-dev.csv
  • cv-other-test.csv

"cv-invalid.csv" contains all samples that users flagged as invalid.

A sub-directory called cv_corpus_{version} contains the mp3 and wav files that were extracted from an archive named cv_corpus_{version}.tar.gz. All entries in the CSV files refer to their samples by absolute paths, so moving this sub-directory would require another import or tweaking the CSV files accordingly.
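As a quick sanity check of the imported data, the CSV files can be inspected with the standard csv module. The column names below (wav_filename, wav_filesize, transcript) are the ones DeepSpeech's importers emit; the row content here is an illustrative stand-in, not real corpus data:

```python
import csv
import io

# Stand-in for one of the importer's CSV files, e.g. cv-valid-train.csv.
sample = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "/data/CV/cv_corpus_v1/sample-000000.wav,171234,hello world\n"
)

# DictReader exposes each row keyed by the header names.
rows = list(csv.DictReader(sample))
transcripts = [r["transcript"] for r in rows]
```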

To use Common Voice data during training, validation and testing, pass (comma-separated combinations of) their filenames into the --train_files, --dev_files and --test_files parameters of DeepSpeech.py.

For example, if Common Voice was imported into ../data/CV, DeepSpeech.py could be called like this:

./DeepSpeech.py --train_files ../data/CV/cv-valid-train.csv,../data/CV/cv-other-train.csv --dev_files ../data/CV/cv-valid-dev.csv --test_files ../data/CV/cv-valid-test.csv

Training a model

The central (Python) script is DeepSpeech.py in the project's root directory. For a listing of command-line options, type:

./DeepSpeech.py --help

Reference

[0]https://github.com/mozilla/DeepSpeech
