Neural Machine Translation (seq2seq)

180402 CCL

Required Environment

This tutorial requires TensorFlow Nightly [1]. For stable TensorFlow releases, consider a matching branch of the tutorial code, such as tf-1.4 [2].

Introduction

Sequence-to-sequence (seq2seq) models are used for machine translation, speech recognition, and text summarization.

This tutorial shows how to build a competitive seq2seq model from scratch, using Neural Machine Translation (NMT) as the testbed.

The first part covers background knowledge on seq2seq models for NMT and explains how to build and train a simple NMT model.

The second part goes into the details of building a competitive NMT model with an attention mechanism.

Finally, the tutorial discusses how to build the best possible NMT models (in both speed and translation quality), including TensorFlow best practices (batching, bucketing), bidirectional RNNs, beam search, and scaling up to multiple GPUs with GNMT attention.

Basic

Background on Neural Machine Translation

Traditional phrase-based translation systems performed their task by breaking up source sentences into multiple chunks and then translating them phrase by phrase. In contrast, Neural Machine Translation (NMT) mimics how humans translate: reading the whole sentence, understanding its meaning, and then producing a translation.

Figure 1. An example of the general approach for NMT (often referred to as the encoder-decoder architecture). An encoder converts a source sentence into a "meaning" vector, which is passed through a decoder to produce a translation.

As shown in Figure 1, an NMT system first reads the source sentence with an encoder to build a "thought" vector, a sequence of numbers that represents the sentence meaning; a decoder then processes the sentence vector to emit a translation.

In this manner, NMT can capture long-range dependencies in languages, e.g., gender agreements, syntax structures, etc.

NMT models vary in terms of their exact architectures. A natural choice for sequential data is the recurrent neural network (RNN), which is used by most NMT models.

An RNN is usually used for both the encoder and decoder. The RNN models differ in terms of: (a) directionality – unidirectional or bidirectional; (b) depth – single- or multi-layer; and (c) type – often either a vanilla RNN, a Long Short-term Memory (LSTM), or a gated recurrent unit (GRU).

In this tutorial, a deep multi-layer RNN that is unidirectional and uses LSTM as its recurrent unit is considered. This model is shown in Figure 2, in which the source sentence "I am a student" is translated into the target sentence "Je suis étudiant".

Figure 2. An example of a deep recurrent architecture. Here, "<s>" marks the start of the decoding process while "</s>" tells the decoder to stop. (The explanation here is unclear; to be revisited and expanded after further study.)
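As a rough illustration of such a recurrent unit, the sketch below builds a deep, unidirectional LSTM cell with the TF 1.x API; the layer count and hidden size are placeholder values chosen for the example, not settings prescribed by the tutorial.

```python
import tensorflow as tf  # TF 1.x API, as used by the tf-1.4 branch [2]

num_layers = 2   # illustrative depth (placeholder value)
num_units = 512  # illustrative hidden size per layer (placeholder value)

# A deep, unidirectional recurrent unit built by stacking LSTM cells.
# GRUCell or BasicRNNCell could be substituted for the other cell types.
cells = [tf.nn.rnn_cell.BasicLSTMCell(num_units) for _ in range(num_layers)]
multi_cell = tf.nn.rnn_cell.MultiRNNCell(cells)
```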

At a high level, the NMT model consists of two recurrent neural networks: the encoder RNN simply consumes the input source words without making any prediction; the decoder, on the other hand, processes the target sentence while predicting the next words.
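The sketch below illustrates this two-RNN structure with the TF 1.x seq2seq API (tf.contrib.seq2seq); the placeholder tensors, vocabulary sizes, and single-layer cells are assumptions made for the example and do not reproduce the tutorial's actual model code.

```python
import tensorflow as tf  # TF 1.x API, as used by the tf-1.4 branch [2]

# Illustrative sizes (placeholder values, not from the tutorial).
src_vocab_size, tgt_vocab_size, embed_size, num_units = 10000, 10000, 512, 512

# Integer word ids, shape [batch, time]; feeds assumed for this sketch.
encoder_inputs = tf.placeholder(tf.int32, [None, None])  # source sentence
decoder_inputs = tf.placeholder(tf.int32, [None, None])  # target, starting with <s>
source_lengths = tf.placeholder(tf.int32, [None])
target_lengths = tf.placeholder(tf.int32, [None])

# Embedding lookups turn word ids into dense vectors.
src_embedding = tf.get_variable("src_embedding", [src_vocab_size, embed_size])
tgt_embedding = tf.get_variable("tgt_embedding", [tgt_vocab_size, embed_size])
encoder_emb = tf.nn.embedding_lookup(src_embedding, encoder_inputs)
decoder_emb = tf.nn.embedding_lookup(tgt_embedding, decoder_inputs)

# Encoder RNN: consumes the source words without making any prediction.
encoder_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
encoder_outputs, encoder_state = tf.nn.dynamic_rnn(
    encoder_cell, encoder_emb, sequence_length=source_lengths, dtype=tf.float32)

# Decoder RNN: initialized with the encoder's final ("thought") state, it
# processes the target sentence while predicting the next words.
decoder_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
helper = tf.contrib.seq2seq.TrainingHelper(decoder_emb, target_lengths)
projection_layer = tf.layers.Dense(tgt_vocab_size, use_bias=False)
decoder = tf.contrib.seq2seq.BasicDecoder(
    decoder_cell, helper, encoder_state, output_layer=projection_layer)
outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder)
logits = outputs.rnn_output  # per-step scores over the target vocabulary
```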

Installing the Tutorial
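The tutorial code itself can be obtained by cloning the tensorflow/nmt repository (see [2] for the tf-1.4 branch, which targets stable TensorFlow releases).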

Reference

[0] https://www.tensorflow.org/tutorials/seq2seq

[1] https://github.com/tensorflow/tensorflow/#installation

[2] https://github.com/tensorflow/nmt/tree/tf-1.4
