TensorFlow Speech Recognition Challenge

MM 0409/2018

Can you build an algorithm that understands simple speech commands?

This is a Featured Prediction Competition.

The competition is hosted by Google Brain, and the total prize pool is $25,000.

1,315 teams joined this competition.

Overview

Description

We might be on the verge of too many screens. A promising antidote to our screen addiction is the voice interface. But for independent makers and entrepreneurs, it is hard to build a speech detector using free, open data and code. Many voice recognition datasets require preprocessing before a neural network model can be built on them.

TensorFlow recently released the Speech Commands Dataset. It includes 65,000 one-second-long utterances of 30 short words, spoken by thousands of different people.

In this competition, you're challenged to use the Speech Commands Dataset to build an algorithm that understands spoken commands. Improving the recognition accuracy of open-source voice interface tools makes products built on them more effective and more accessible.

Evaluation

Submissions are evaluated on Multiclass Accuracy, which is the fraction of observations that are given the correct label.
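As a minimal sketch of this metric, assuming it means the share of clips whose predicted label equals the true label. The function name `multiclass_accuracy` and the sample labels are illustrative and do not come from the competition page.

```python
def multiclass_accuracy(y_true, y_pred):
    """Return the fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")
    # Count positions where the predicted label equals the true label.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Illustrative labels only: three of the four predictions are correct.
truth = ["yes", "no", "silence", "unknown"]
preds = ["yes", "no", "unknown", "unknown"]
print(multiclass_accuracy(truth, preds))
```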

There are 12 possible labels for the Test set: yes, no, up, down, left, right, on, off, stop, go, silence, unknown.

The unknown label should be used for a command that is neither one of the first 10 labels nor silence.
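The labeling rule can be sketched as a small mapping function. This is an illustration only: the names `COMMANDS` and `to_test_label` are made up for this note, and "bed" is just an example of a word outside the ten commands.

```python
# The ten command words listed for the test set.
COMMANDS = ("yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go")


def to_test_label(word):
    """Map a spoken word (or 'silence') to one of the 12 test labels."""
    if word in COMMANDS or word == "silence":
        return word
    # Anything else collapses into the catch-all class.
    return "unknown"


for word in ("left", "silence", "bed"):
    print(word, "->", to_test_label(word))
```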

For each audio clip in the test set, you must predict the correct label.

The submission file should contain a header and have the following format:

fname,label (fname is the file name)

clip_000044442.wav,silence

clip_000adecb.wav,left

clip_0000d4322.wav,unknown
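A file in this format can be produced with Python's standard csv module. The sketch below is one possible way to do it, not an official tool: `write_submission` and `VALID_LABELS` are hypothetical names, and the rows are the sample file names from these notes.

```python
import csv
import io

# The 12 labels allowed in the test set.
VALID_LABELS = {"yes", "no", "up", "down", "left", "right",
                "on", "off", "stop", "go", "silence", "unknown"}


def write_submission(rows, out):
    """Write (fname, label) pairs to a file-like object as CSV with a header."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["fname", "label"])
    for fname, label in rows:
        if label not in VALID_LABELS:
            raise ValueError("unexpected label: " + label)
        writer.writerow([fname, label])


# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
write_submission([("clip_000044442.wav", "silence"),
                  ("clip_0000d4322.wav", "unknown")], buffer)
print(buffer.getvalue())
```

To write a real file, pass a handle opened with `open(path, "w", newline="")` instead of the buffer.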

Prize

The leaderboard prizes:

1st place - $8,000

2nd place - $6,000

3rd place - $3,000

Special TensorFlow Prize - $8,000

The goal of the special prize is to encourage contestants to create a model that is useful in practice for recognizing commands on a Raspberry Pi 3. To qualify, a model must meet several criteria:

1 The model must be runnable as frozen TensorFlow GraphDef files with no additional dependencies beyond TensorFlow 1.4.

2 The model must be small in size (below 5 million bytes).

3 The model must have a standard set of inputs and outputs:

4 The model must run in less than 200ms on a stock Raspberry Pi 3 running Raspbian GNU/Linux 8 (Jessie), with no overclocking.

5 The model must come with code to train the model, which must be license-compatible with TensorFlow (Apache), and be submittable through Google's CLA to the TensorFlow project.

...
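The size and latency criteria can be pre-checked locally before submitting. The sketch below is a rough self-check under stated assumptions, not the official evaluation: it reads the size limit as 5,000,000 bytes, uses a throwaway file and a dummy function in place of a real frozen graph and inference call, and all names are hypothetical. Timing on a desktop machine says little about a Raspberry Pi 3, so the latency check is only meaningful when run on the target device.

```python
import os
import tempfile
import time

MAX_MODEL_BYTES = 5_000_000   # assumption: the limit read as 5,000,000 bytes
MAX_LATENCY_SECONDS = 0.200   # the 200 ms limit from the criteria


def model_small_enough(path, limit=MAX_MODEL_BYTES):
    """Return True if the file at `path` is below the size limit."""
    return os.path.getsize(path) < limit


def mean_latency(run_once, repeats=10):
    """Return the average wall-clock time in seconds of calling `run_once`."""
    start = time.perf_counter()
    for _ in range(repeats):
        run_once()
    return (time.perf_counter() - start) / repeats


# Stand-ins: a 1 KB dummy file instead of a frozen GraphDef,
# and a no-op instead of a real inference call.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"\0" * 1024)
    dummy_model_path = handle.name

size_ok = model_small_enough(dummy_model_path)
latency = mean_latency(lambda: None, repeats=3)
os.remove(dummy_model_path)

print("size ok:", size_ok)
print("latency ok:", latency < MAX_LATENCY_SECONDS)
```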

Timeline

January 9, 2018 - Entry deadline. You must accept the competition rules before this date in order to compete.
January 9, 2018 - Team merger deadline. This is the last day participants may join or merge teams.
January 16, 2018 - Final submission deadline.

Tutorial & More Info

Google Research Blog Post announcing the Speech Commands Dataset. Note that much of what is provided as part of the training set is already public. However, the test set is not.

TensorFlow Audio Recognition Tutorial
Link to purchase Raspberry Pi 3 on Amazon. This will be at your own expense.
Also review the Prizes tab for details and tools for how the special prize will be evaluated.

[0] https://www.kaggle.com/c/tensorflow-speech-recognition-challenge

antidote: a remedy that counteracts something harmful

utterances: spoken words or statements

Kaggle_platform