How can I train using my own data?
The easiest way to train on a custom dataset is to write your own importer that knows the structure of your audio and text files. All you have to do is generate CSV files for your splits with three columns, wav_filename, wav_filesize and transcript, that specify the path to the WAV file, its size, and the corresponding transcript text for each of your train, validation and test splits.
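As a rough illustration of the CSV format described above, an importer could be sketched as follows. The function name write_split_csv and the (wav_path, transcript) input shape are hypothetical and not part of DeepSpeech, and the sketch assumes the size column is the file size in bytes.

```python
import csv
import os


def write_split_csv(samples, csv_path):
    """Write one split's CSV with the three columns named above.

    `samples` is an iterable of (wav_path, transcript) pairs. Both this
    helper and its input shape are assumptions made for illustration.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for wav_path, transcript in samples:
            # Assumption: wav_filesize is the size of the file in bytes.
            writer.writerow([wav_path, os.path.getsize(wav_path), transcript])
```

Calling it once per split (train, validation, test) with a different csv_path yields the three files; how the pairs are collected depends entirely on how your own dataset is laid out.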
To start writing your own importer, run bin/run-ldc93s1.sh, then look at the CSV file in data/ldc93s1 that's generated by bin/import_ldc93s1.sh, and also the other more complex bin/import_* scripts for inspiration. There's no requirement to use Python for the importer, as long as the generated CSV conforms to the format specified above.
DeepSpeech's requirements for the data are that the transcripts match the [a-z ]+ regex, and that the audio is stored in WAV (PCM) files.
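The two requirements above could be checked before training with something like the following sketch. The helper names are hypothetical, and it assumes "match" means the whole transcript must match the regex. It relies only on Python's standard re and wave modules; the wave module reads uncompressed PCM files and raises wave.Error for other formats.

```python
import re
import wave

# The transcript pattern quoted in the text above.
TRANSCRIPT_RE = re.compile(r"[a-z ]+")


def transcript_ok(text):
    """Return True if the entire transcript matches [a-z ]+ (assumed full match)."""
    return TRANSCRIPT_RE.fullmatch(text) is not None


def is_pcm_wav(path):
    """Return True if `path` can be opened as an uncompressed PCM WAV file."""
    try:
        with wave.open(path, "rb") as w:
            # The wave module reports "NONE" for uncompressed PCM data.
            return w.getcomptype() == "NONE"
    except (wave.Error, EOFError):
        # Not a RIFF/WAVE file, a non-PCM encoding, or a truncated file.
        return False
```

Running both checks over every row of a generated CSV is a cheap way to catch uppercase letters, punctuation, or compressed audio before a training run starts.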
How can I train on Amazon AWS/Google CloudML/my favorite platform?
Currently we train on our own hardware with NVIDIA Titan X GPUs, so we don't have answers to those questions. Contributions are welcome!
[0] https://github.com/mozilla/DeepSpeech/wiki#frequently-asked-questions