Supervised and Unsupervised Transfer Learning for Question Answering

2018/06/09 MM

Abstract

Transfer learning has been shown to be successful for tasks like object recognition and speech recognition, but its applicability to question answering (QA) has yet to be well-studied. Extensive experiments were conducted to investigate the transferability of knowledge learned from a large source QA dataset to smaller target datasets, using two models: an end-to-end memory network (MemN2N) and a QACNN.

The performance of both models on a TOEFL listening comprehension test [1] and MCTest (Richardson et al., 2013) is significantly improved via a simple transfer learning technique from MovieQA (Tapaswi et al., 2016).

In particular, one of the models achieves the state of the art on all target datasets; on the TOEFL listening comprehension test, it outperforms the previous best model by 7%. Finally, transfer learning is shown to be helpful even in unsupervised scenarios, when correct answers for the target QA dataset's examples are not available.

Introduction

Question Answering

One of the most important characteristics of an intelligent system is the ability to understand stories as humans do. A story is a sequence of sentences and can be in the form of plain text (Trischler et al., 2017; Rajpurkar et al., 2016; Weston et al., 2016; Yang et al., 2015) or spoken content (Tseng et al., 2016); the latter usually requires the spoken content to first be transcribed into text by automatic speech recognition (ASR), after which the model processes the ASR output. To evaluate the extent of the model's understanding of the story, it is asked to answer questions about the story. This task is referred to as question answering (QA) and has been a long-standing yet challenging problem in natural language processing (NLP).

QA scenarios differ from each other in various ways, including the length of the story, the format of the answer, and the size of the training set. This work focuses on context-aware multi-choice QA, where the answer to each question can be obtained by referring to its accompanying story, and each question comes with a set of answer choices of which only one is correct. The answer choices are in the form of open, natural language sentences. To answer correctly, the model must understand and reason about the relationship between the sentences in the story.
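
For concreteness, a single (hypothetical) instance of this task could be represented as follows; the field names are illustrative and not tied to any particular dataset.

```python
# A minimal, hypothetical multi-choice QA instance. A model reads the
# story and question, then scores the answer choices.
example = {
    "story": [
        "John walked into the kitchen.",
        "He picked up an apple and left.",
    ],
    "question": "What did John pick up?",
    "choices": ["A banana", "An apple", "A book", "Nothing"],
    "answer": 1,  # index of the single correct choice
}
```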

Transfer Learning

Transfer learning (Pan and Yang, 2010) is a vital machine learning technique that aims to use the knowledge learned from one task and apply it to a different but related task, in order to either reduce the amount of fine-tuning data needed or improve performance. Transfer learning, also known as domain adaptation, has achieved success in numerous domains such as computer vision (Sharif Razavian et al., 2014), ASR (Doulaty et al., 2015; Huang et al., 2013), and NLP (Zhang et al., 2017; You et al., 2016).

In computer vision, deep neural networks trained on a large-scale image classification dataset such as ImageNet (Russakovsky et al., 2015) have proven to be excellent feature extractors for a broad range of visual tasks such as image captioning (Lu et al., 2017; Karpathy and Fei-Fei, 2015; Fang et al., 2015) and visual question answering (Xu and Saenko, 2016; Fukui et al., 2016; Yang et al., 2016; Antol et al., 2015).

In NLP, transfer learning has also been successfully applied to tasks like sequence tagging (Yang et al., 2017), syntactic parsing (McClosky et al., 2010), and named entity recognition (Chiticariu et al., 2010).

Transfer Learning for QA

Although transfer learning has been successfully applied to various applications, its applicability to QA has yet to be well-studied. In this paper, the TOEFL listening comprehension test (Tseng et al., 2016) and MCTest (Richardson et al., 2013) are considered as target tasks with transfer learning from MovieQA (Tapaswi et al., 2016), using two existing QA models: an end-to-end memory network (MemN2N) and a QACNN. The performance on the two target datasets is significantly improved.

In particular, one of the models achieves the state of the art on all target datasets; on the TOEFL listening comprehension test, it outperforms the previous best model by 7%.
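
Concretely, the simple transfer learning technique amounts to pre-training on the large source dataset and then fine-tuning on the small target dataset. Below is a minimal PyTorch-style sketch of that recipe, assuming a placeholder `qa_model` that returns one score per answer choice and placeholder data loaders; the hyperparameters are illustrative, not the authors' actual settings.

```python
import torch
import torch.nn.functional as F

def train(model, loader, epochs, lr):
    """Generic training loop. Each batch yields (story, question,
    choices, answer_idx); the model returns scores of shape
    (batch, n_choices)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for story, question, choices, answer_idx in loader:
            scores = model(story, question, choices)
            loss = F.cross_entropy(scores, answer_idx)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# `qa_model`, `movieqa_loader`, and `target_loader` are placeholders.
# Stage 1: pre-train on the large labeled source dataset (MovieQA).
train(qa_model, movieqa_loader, epochs=10, lr=1e-3)

# Stage 2: fine-tune the same weights on the small target dataset
# (TOEFL or MCTest), typically with a smaller learning rate.
train(qa_model, target_loader, epochs=5, lr=1e-4)
```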

Transfer learning without any labeled data from the target domain is referred to as unsupervised transfer learning.

Motivated by the success of unsupervised transfer learning for speaker adaptation (Chen et al., 2011; Wallace et al., 2009) and spoken document summarization (Lee et al., 2013), whether unsupervised transfer learning is feasible for QA is also investigated.

Transfer learning for QA has been explored recently. Kadlec et al. (2016) were the first to attempt to apply transfer learning to machine comprehension. They showed only limited transfer between two QA tasks, but the transferred system was still significantly better than a random baseline. Wiese et al. (2017) tackled the more specific task of biomedical QA with transfer learning from a large-scale dataset.

Min et al. (2017) used a simple transfer learning technique to achieve better performance.

None of these works study unsupervised transfer learning, which is especially crucial when the target dataset is small.

Golub et al. (2017) proposed a two-stage synthesis network that can generate synthetic questions and answers to augment insufficient training data without annotations.

Conclusion and Future Work

In this paper, a simple transfer learning technique has been demonstrated to be useful for the task of multi-choice question answering. A QACNN and a MemN2N are used as QA models, with MovieQA serving as the source task and the TOEFL listening comprehension test and MCTest serving as the target tasks.

By pre-training on MovieQA, the performance of both models on the target datasets improves significantly. The models also require much less target training data to achieve performance comparable to models trained without pre-training.

Experiments have been conducted to study the influence of transfer learning on different types of questions, and they show that the effectiveness of transfer learning is not limited to specific question types.

An iterative self-labeling technique has been applied to transfer learning, verifying that transfer learning is useful even when the correct answers for the target QA dataset's examples are not available.
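
A minimal sketch of that iterative self-labeling loop, under the same placeholder-model assumptions (not the authors' exact procedure): the source-pre-trained model labels each unlabeled target example with its highest-scoring choice, is fine-tuned on those pseudo-labels, and the two steps repeat for a few rounds.

```python
import torch
import torch.nn.functional as F

def self_label_transfer(model, target_data, rounds=3, lr=1e-4):
    """Iterative self-labeling on an unlabeled target set. `model` is a
    placeholder QA model assumed to be already pre-trained on the
    labeled source dataset (MovieQA); `target_data` is a list of
    batches (story, question, choices) with no ground-truth answers."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(rounds):
        # 1. Label each batch with the model's current best guess.
        model.eval()
        pseudo_labels = []
        with torch.no_grad():
            for story, question, choices in target_data:
                scores = model(story, question, choices)
                pseudo_labels.append(scores.argmax(dim=-1))
        # 2. Fine-tune the model on its own pseudo-labels.
        model.train()
        for (story, question, choices), labels in zip(target_data, pseudo_labels):
            scores = model(story, question, choices)
            loss = F.cross_entropy(scores, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```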

Since the original data format of the TOEFL listening comprehension test is audio rather than text, it is worth trying to initialize the embedding layer of the QACNN with semantic or acoustic word embeddings learned from speech (Chung and Glass, 2018, 2017; Chung et al., 2016) instead of embeddings learned from text (Mikolov et al., 2013; Pennington et al., 2014).
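
As a rough sketch of how such an initialization could be wired in (with `pretrained_vectors` standing in for whichever embedding table is used, textual or acoustic, and a toy vocabulary), the QACNN's embedding layer could be initialized like this:

```python
import torch
import torch.nn as nn

# `pretrained_vectors` is a placeholder dict mapping each word to a
# vector, e.g. word2vec/GloVe vectors or acoustic word embeddings
# learned from speech.
vocab = ["<unk>", "machine", "comprehension"]  # toy vocabulary
dim = 300
weights = torch.randn(len(vocab), dim) * 0.01  # random fallback init
for i, word in enumerate(vocab):
    if word in pretrained_vectors:
        weights[i] = torch.tensor(pretrained_vectors[word])

# freeze=False lets the embeddings be fine-tuned with the rest of the model.
embedding = nn.Embedding.from_pretrained(weights, freeze=False)
```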

References

[0] Yu-An Chung, Hung-Yi Lee, and James Glass. 2017. Supervised and unsupervised transfer learning for question answering. arXiv preprint arXiv:1711.05345.

[1] Bo-Hsiang Tseng, Sheng-Syun Shen, Hung-Yi Lee, and Lin-Shan Lee. 2016. Towards machine comprehension of spoken content: Initial TOEFL listening comprehension test by machine. In INTERSPEECH.
