Supervised and Unsupervised Transfer Learning for Question Answering
2018/05/06 Tsung-Yung Tsai
Abstract
Transfer learning has been shown to be successful for tasks like object and speech recognition, but its applicability to question answering (QA) has yet to be well studied. In this paper, we conduct extensive experiments to investigate the transferability of knowledge learned from a source QA dataset to a target dataset using two models. The performance of both models on a TOEFL listening comprehension test [1] and MCTest (Richardson et al., 2013) is significantly improved via a simple transfer learning technique from MovieQA (Tapaswi et al., 2016). In particular, one of the models achieves the state of the art on all target datasets; for the TOEFL listening comprehension test, it outperforms the previous best model by 7%. Finally, we show that transfer learning is helpful even in unsupervised scenarios, when correct answers for the target QA dataset examples are not available.
Introduction
Question Answering
One of the most important characteristics of an intelligent system is the ability to understand stories as humans do. A story is a sequence of sentences, and can be in the form of plain text (Trischler et al., 2017; Rajpurkar et al., 2016; Weston et al., 2016; Yang et al., 2015) or spoken content (Tseng et al., 2016). The latter usually requires the spoken content to be first transcribed into text by automatic speech recognition (ASR), after which the model processes the ASR output. To evaluate the extent of the model's understanding of the story, it is asked to answer questions about the story. Such a task is referred to as question answering (QA), and has been a long-standing yet challenging problem in natural language processing (NLP).
Several QA scenarios and datasets have been introduced over the past few years. These scenarios differ from each other in various ways, including the length of the story, the format of the answer, and the size of the training set. In this work, we focus on context-aware multi-choice QA, where the answer to each question can be obtained by referring to its accompanying story, and each question comes with a set of answer choices with only one correct answer. The answer choices are in the form of open, natural language sentences. To correctly answer the question, the model is required to understand and reason about the relationship between the sentences in the story.
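To make the task format concrete, here is a minimal Python sketch of one context-aware multi-choice example and an accuracy metric. The class name, field names, and the word-overlap baseline are all hypothetical illustrations and are not the QA models studied in the paper; the tiny story is invented purely for demonstration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MultiChoiceExample:
    story: List[str]    # the story, one sentence per item
    question: str
    choices: List[str]  # natural-language answer choices
    answer: int         # index of the single correct choice


def tokens(text: str) -> set:
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap_baseline(ex: MultiChoiceExample) -> int:
    """Hypothetical baseline: pick the choice sharing the most words with the story."""
    story_words = tokens(" ".join(ex.story))
    scores = [len(tokens(choice) & story_words) for choice in ex.choices]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(predict: Callable[[MultiChoiceExample], int],
             examples: List[MultiChoiceExample]) -> float:
    """Fraction of examples where the predicted index equals the correct one."""
    return sum(predict(ex) == ex.answer for ex in examples) / len(examples)


# Invented toy example, not taken from any dataset.
example = MultiChoiceExample(
    story=["Tom went to the park.", "He played football with Ann."],
    question="What did Tom play?",
    choices=["football", "chess", "piano", "cards"],
    answer=0,
)
toy_accuracy = accuracy(overlap_baseline, [example])
```

A real model replaces `overlap_baseline` with something that reasons over the sentences of the story; the interface (example in, choice index out) stays the same.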
Transfer Learning
Transfer learning (Pan and Yang, 2010) is a vital machine learning technique that aims to use knowledge learned from one task and apply it to a different, but related, task in order to either reduce the amount of fine-tuning data needed or improve performance. Transfer learning, also known as domain adaptation [In this paper, we do not distinguish conceptually between transfer learning and domain adaptation. A 'domain' in the sense we use throughout this paper is defined by datasets.], has achieved success in numerous domains such as computer vision (Sharif Razavian et al., 2014), ASR (Doulaty et al., 2015; Huang et al., 2013), and NLP (Zhang et al., 2017; You et al., 2016). In computer vision, deep neural networks trained on a large-scale image classification dataset such as ImageNet (Russakovsky et al., 2015) have proven to be excellent feature extractors for a broad range of visual tasks such as image captioning (Lu et al., 2017; Karpathy and Fei-Fei, 2015; Fang et al., 2015) and visual question answering (Xu and Saenko, 2016; Fukui et al., 2016; Yang et al., 2016; Antol et al., 2015). In NLP, transfer learning has also been successfully applied to tasks like sequence tagging (Yang et al., 2017), syntactic parsing (McClosky et al., 2010) and named entity recognition (Chiticariu et al., 2010).
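The pre-train-then-fine-tune idea can be illustrated with a deliberately tiny Python sketch. Everything here is an assumption made for illustration: a one-parameter linear model stands in for a QA network, the two toy datasets stand in for the source and target tasks, and a small number of gradient steps stands in for limited target data.

```python
from typing import List, Tuple

Data = List[Tuple[float, float]]  # (input x, target y) pairs


def train(w: float, data: Data, lr: float, steps: int) -> float:
    """Run `steps` of gradient descent on mean squared error for y ~ w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


# Hypothetical toy tasks: related but not identical (slopes 2.0 and 2.2).
source_data = [(1.0, 2.0), (2.0, 4.0)]  # stands in for the source task
target_data = [(1.0, 2.2), (2.0, 4.4)]  # stands in for the target task

# Pre-train on the source task, then fine-tune briefly on the target task.
w_pretrained = train(0.0, source_data, lr=0.1, steps=20)
w_transfer = train(w_pretrained, target_data, lr=0.1, steps=2)

# Same small training budget on the target task, but no pre-training.
w_scratch = train(0.0, target_data, lr=0.1, steps=2)

err_transfer = abs(w_transfer - 2.2)  # distance from the target task's true slope
err_scratch = abs(w_scratch - 2.2)
```

Because the source and target slopes are close, the pre-trained weight starts near the target solution, so the same two fine-tuning steps leave `err_transfer` well below `err_scratch`. When the tasks are unrelated, this advantage is not guaranteed.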
Conclusion and Future Work
In this paper, we demonstrate that a simple transfer learning technique can be very useful for the task of multi-choice question answering. We use a QACNN and a MemN2N as QA models, with MovieQA as the source task and a TOEFL listening comprehension test and MCTest as the target tasks. By pre-training on MovieQA, the performance of both models on the target datasets improves significantly. The models also require much less training data from the target to achieve performance similar to that of models without pre-training. We also conduct experiments to study the influence of transfer learning on different types of questions, and show that the effectiveness of transfer learning is not limited to specific types of questions. Finally, we show through quantitative results and visual analysis that, with a simple iterative self-labeling technique, transfer learning is still useful even when the correct answers for the target QA dataset examples are not available.
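The iterative self-labeling idea mentioned above can be sketched in Python as a generic self-training loop: the pre-trained model labels the unlabeled target examples, is fine-tuned on its own predictions, and repeats until the labels stop changing. This is a sketch under stated assumptions, not the paper's exact procedure: `ThresholdModel`, the one-dimensional inputs, the stopping rule, and `max_rounds` are hypothetical stand-ins chosen to keep the example runnable.

```python
from typing import List


class ThresholdModel:
    """Hypothetical stand-in for a QA model: picks choice 1 if x > threshold, else 0."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # plays the role of pre-trained weights

    def predict(self, x: float) -> int:
        return 1 if x > self.threshold else 0

    def fine_tune(self, xs: List[float], labels: List[int]) -> None:
        # Move the threshold to the midpoint of the two pseudo-labeled class means.
        zeros = [x for x, y in zip(xs, labels) if y == 0]
        ones = [x for x, y in zip(xs, labels) if y == 1]
        if zeros and ones:  # leave the model unchanged if one class is empty
            mean0 = sum(zeros) / len(zeros)
            mean1 = sum(ones) / len(ones)
            self.threshold = (mean0 + mean1) / 2


def self_label(model: ThresholdModel, xs: List[float], max_rounds: int = 10) -> List[int]:
    """Fine-tune on the model's own predictions until they stop changing."""
    labels = [model.predict(x) for x in xs]  # initial pseudo-labels
    for _ in range(max_rounds):
        model.fine_tune(xs, labels)
        new_labels = [model.predict(x) for x in xs]
        if new_labels == labels:  # assumed stopping rule: labels have converged
            break
        labels = new_labels
    return labels


model = ThresholdModel(threshold=0.3)      # pretend this came from the source task
target_inputs = [0.2, 0.4, 1.6, 1.8, 2.0]  # unlabeled target examples
pseudo_labels = self_label(model, target_inputs)
```

In this toy run the initial threshold mislabels the second input, and the self-labeling rounds move the threshold between the two clusters without ever seeing a true label. With a real QA model, `predict` would return the index of the chosen answer and `fine_tune` would run a few training epochs on the pseudo-labeled target set.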
One area of future research will be generalizing the transfer results presented in this paper to other QA models and datasets. In addition, since the original data format of the TOEFL listening comprehension test is audio instead of text, it is worth trying to initialize the embedding layer of the QACNN with semantic or acoustic word embeddings learned from speech (Chung and Glass, 2018, 2017; Chung et al., 2016) instead of those learned from text (Mikolov et al., 2013; Pennington et al., 2014).
References
[0] Yu-An Chung, Hung-Yi Lee, and James Glass. 2017. Supervised and unsupervised transfer learning for question answering. arXiv preprint arXiv:1711.05345.
[1] Bo-Hsiang Tseng, Sheng-Syun Shen, Hung-Yi Lee, and Lin-Shan Lee. 2016. Towards machine comprehension of spoken content: Initial TOEFL listening comprehension test by machine. In INTERSPEECH.