bi-directional GRU

MM 05/30/2018


People find that LSTMs work well but are unnecessarily complicated, so gated recurrent units (GRUs) were introduced [0-a].

tf.nn.dynamic_rnn: uses a tf.while_loop to dynamically construct the graph when it is executed. Graph creation is faster, and batches of variable size can be fed.

tf.nn.bidirectional_dynamic_rnn: the bidirectional version of dynamic_rnn [0-a][0-b].
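Since the page is about bi-directional GRUs and the slides give no snippet for it, here is a minimal sketch (input_size, hidden_size, and the placeholder names seq and length are assumptions for illustration):

import tensorflow as tf

input_size, hidden_size = 50, 64   # assumed sizes for illustration
seq = tf.placeholder(tf.float32, [None, None, input_size])   # [batch, time, feature]
length = tf.placeholder(tf.int32, [None])                     # real length per example

cell_fw = tf.nn.rnn_cell.GRUCell(hidden_size)   # forward GRU
cell_bw = tf.nn.rnn_cell.GRUCell(hidden_size)   # backward GRU
(out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(
    cell_fw, cell_bw, seq, sequence_length=length, dtype=tf.float32)
# concatenate the two directions along the feature axis: [batch, time, 2 * hidden_size]
output = tf.concat([out_fw, out_bw], axis=2)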

Stack multiple cells

Dealing with sequences of the same length.

cell = tf.nn.rnn_cell.GRUCell(hidden_size)
rnn_cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers)
output, out_state = tf.nn.dynamic_rnn(rnn_cell, seq, sequence_length=length,
                                      initial_state=initial_state)
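Note on this snippet: in some TensorFlow 1.x releases, reusing one cell object for every layer of MultiRNNCell triggers a variable-reuse error, so a common variant (a sketch, not from the slides) builds one GRUCell per layer:

# one independent GRUCell per layer avoids variable-reuse errors
layers = [tf.nn.rnn_cell.GRUCell(hidden_size) for _ in range(num_layers)]
rnn_cell = tf.nn.rnn_cell.MultiRNNCell(layers)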

Dealing with variable sequence length.

All sequences can be padded with zero vectors and all labels with a zero label so that every example has the same length. Most models cannot handle sequences longer than about 120 tokens, so sequences are truncated to a fixed max_length.
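A minimal padding sketch in plain NumPy (the toy token ids, the max_length value, and the pad value 0 are assumptions for illustration):

import numpy as np

def pad_batch(sequences, max_length):
    """Truncate each sequence to max_length and pad the rest with zeros."""
    batch = np.zeros((len(sequences), max_length), dtype=np.int32)
    for i, seq in enumerate(sequences):
        trimmed = seq[:max_length]
        batch[i, :len(trimmed)] = trimmed
    return batch

padded = pad_batch([[4, 7, 2], [9, 1]], max_length=5)
# padded -> [[4 7 2 0 0]
#            [9 1 0 0 0]]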

However, the padded labels contribute to the total loss and therefore affect the gradients.

There are two approaches for handling padded/truncated sequences.

Approach 1

Step 1: maintain a mask (True for real tokens, False for padded tokens).

Step 2: run the model on both the real and the padded tokens (the model will predict labels for the padded tokens as well).

Step 3: only take the loss caused by the real elements into account.

full_loss = tf.nn.softmax_cross_entropy_with_logits(logits=preds, labels=labels)
loss = tf.reduce_mean(tf.boolean_mask(full_loss, mask))
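One way to build the mask (a sketch; length and max_length are assumed to be the vector of real lengths and the padded length) is tf.sequence_mask:

# mask[i, t] is True for the first length[i] steps and False for padding
mask = tf.sequence_mask(length, maxlen=max_length)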

Approach 2

Step 1: let the model know the real sequence length so that it only predicts labels for the real tokens.

cell = tf.nn.rnn_cell.GRUCell(hidden_size)
rnn_cells = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers)
# a padded time step is an all-zero vector, so this counts the real steps
length = tf.reduce_sum(tf.reduce_max(tf.sign(seq), 2), 1)
output, out_state = tf.nn.dynamic_rnn(rnn_cells, seq, sequence_length=length,
                                      initial_state=initial_state)
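If only the output at the last real time step is needed (e.g., for sequence classification), one possible sketch, reusing the names above, gathers one row per example with tf.gather_nd:

# output: [batch, max_time, hidden_size]; length: [batch] real lengths from above
batch_size = tf.shape(output)[0]
last_index = tf.cast(length, tf.int32) - 1
indices = tf.stack([tf.range(batch_size), last_index], axis=1)
last_relevant = tf.gather_nd(output, indices)   # [batch, hidden_size]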

Methods to Deal with Several Common Problems When Training RNNs

Vanishing Gradients

Use different activation units:

tf.nn.relu, tf.nn.relu6, tf.nn.crelu, tf.nn.elu

in addition to

tf.nn.softplus, tf.nn.softsign, tf.nn.bias_add, tf.sigmoid, tf.tanh.
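For example, a different activation can be passed when the cell is constructed (a sketch; GRUCell defaults to tanh):

# GRU cell using ReLU instead of the default tanh activation
cell = tf.nn.rnn_cell.GRUCell(hidden_size, activation=tf.nn.relu)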

[0-a] Stanford CS20SI, lecture 11 slides: https://web.stanford.edu/class/cs20si/2017/lectures/slides_11.pdf

[0-b] TensorFlow API documentation, tf.nn.bidirectional_dynamic_rnn: https://www.tensorflow.org/api_docs/python/tf/nn/bidirectional_dynamic_rnn
