Query-based Attention CNN for Text Similarity Map
MM 0602/2018
Abstract
Query-based Attention CNN (QACNN), an end-to-end neural network for question answering over text similarity maps, is introduced.
The QACNN is composed of a compare mechanism, a two-staged CNN architecture with an attention mechanism, and a prediction layer.
The compare mechanism compares the given passage, query, and multiple answer choices against each other to build similarity maps.
The two-staged CNN architecture extracts features at the word level and then at the sentence level.
The attention mechanism helps the CNN focus on the important parts of the passage based on the query.
The prediction layer selects the most probable answer choice.
The model is evaluated on the MovieQA [1] dataset using plot synopses and achieves 79.99% accuracy.
Introduction
Measuring text similarity with cosine similarity is generally done in two steps.
First, encode the text into word vectors, sentence vectors, or paragraph vectors.
Second, calculate the cosine similarity between the target vectors.
This method performs well for word-level matching; however, for matching between sentences or paragraphs, a single vector is not sufficient to encode all the important information.
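To make the two steps concrete, the following is a minimal Python sketch of this encode-then-compare baseline. The averaging of word vectors and the toy vocabulary are illustrative assumptions; in practice the vectors would come from pre-trained embeddings such as GloVe [3].

    import numpy as np

    # Toy word vectors (hypothetical values; real ones would come from
    # pre-trained embeddings such as GloVe [3]).
    word_vectors = {
        "the":   np.array([0.1, 0.3, 0.2]),
        "movie": np.array([0.7, 0.1, 0.5]),
        "film":  np.array([0.6, 0.2, 0.5]),
        "was":   np.array([0.2, 0.2, 0.1]),
        "great": np.array([0.4, 0.8, 0.3]),
        "good":  np.array([0.5, 0.7, 0.2]),
    }

    def encode(sentence):
        # Step 1: encode the sentence as the average of its word vectors.
        vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
        return np.mean(vecs, axis=0)

    def cosine(u, v):
        # Step 2: cosine similarity between the two encoded vectors.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine(encode("the movie was great"), encode("the film was good")))

A single averaged vector like this is exactly what becomes insufficient once whole sentences or paragraphs must be matched.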
To address this problem, a compare-aggregate framework was proposed, which performs word-level matching using multiple comparison techniques followed by aggregation with a convolutional neural network [2].
The compare-aggregate framework has been shown to match two sequences effectively across a wide range of tasks.
Although the compare-aggregate matching mechanism performs well on multiple question answering tasks, it has two deficiencies.
First, it tends to aggregate uniformly over the sequence rather than take the importance of each element into account.
That is, the compare-aggregate model treats all of the sequential content equally.
Second, compare-aggregate can only take a few neighboring elements into account at a time because of the limited CNN kernel size.
QACNN is proposed to address the deficiencies above.
First, a query-based attention mechanism is added to the original compare-aggregate model.
Second, the aggregation mechanism is redesigned as a two-staged CNN architecture comprising word-level aggregation and sentence-level aggregation. In this way, QACNN can efficiently extract features across sentences.
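The following is a minimal PyTorch sketch of the two-staged aggregation idea: a first convolution pools each sentence's word-level scores into a sentence vector, and a second convolution pools the sentence vectors into a passage-level feature. The channel size, kernel sizes, and max pooling are illustrative assumptions, not the paper's exact hyper-parameters.

    import torch
    import torch.nn as nn

    class TwoStageCNN(nn.Module):
        # A sketch of two-staged aggregation over a word-level similarity map
        # of shape (batch, num_sentences, words_per_sentence).
        def __init__(self, channels=64):
            super().__init__()
            # Stage 1: word-level aggregation inside each sentence.
            self.word_conv = nn.Sequential(
                nn.Conv1d(1, channels, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveMaxPool1d(1),   # one vector per sentence
            )
            # Stage 2: sentence-level aggregation across the passage.
            self.sent_conv = nn.Sequential(
                nn.Conv1d(channels, channels, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveMaxPool1d(1),   # one vector per passage
            )

        def forward(self, sim_map):
            b, s, w = sim_map.shape
            x = sim_map.reshape(b * s, 1, w)          # treat each sentence separately
            x = self.word_conv(x).reshape(b, s, -1)   # (batch, sentences, channels)
            x = self.sent_conv(x.transpose(1, 2))     # (batch, channels, 1)
            return x.squeeze(-1)                      # passage-level feature

    sim_map = torch.rand(2, 10, 30)        # 2 passages, 10 sentences, 30 words each
    print(TwoStageCNN()(sim_map).shape)    # torch.Size([2, 64])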
The QACNN model consists of three components.
1) The similarity mapping layer, which converts the input passage, query, and choice into feature representations and performs a similarity operation between each pair (see the sketch after this list).
2) The attention-based CNN matching network, composed of a two-staged CNN focusing on word-level and sentence-level matching, respectively.
3) The prediction layer, which makes the final decision.
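As a concrete illustration of the first component, the sketch below builds word-by-word similarity maps with cosine similarity over word embeddings. The use of plain cosine similarity and the shapes shown are assumptions for illustration; the actual layer may use a different comparison function.

    import numpy as np

    def similarity_map(passage_emb, other_emb):
        # passage_emb: (num_passage_words, dim) word embeddings of the passage
        # other_emb:   (num_other_words, dim) embeddings of the query or a choice
        # returns:     (num_passage_words, num_other_words) cosine similarity map
        p = passage_emb / np.linalg.norm(passage_emb, axis=1, keepdims=True)
        o = other_emb / np.linalg.norm(other_emb, axis=1, keepdims=True)
        return p @ o.T

    # Hypothetical shapes: a 12-word passage, a 5-word query, a 4-word choice.
    rng = np.random.default_rng(0)
    passage, query, choice = (rng.normal(size=(n, 50)) for n in (12, 5, 4))

    pq_map = similarity_map(passage, query)    # passage-query map, shape (12, 5)
    pc_map = similarity_map(passage, choice)   # passage-choice map, shape (12, 4)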
The main contributions are three-fold.
First, a two-staged CNN architecture is introduced to integrate information from the word level to the sentence level, and then from the sentence level to the passage level.
Second, an attention mechanism is introduced into the network. The CNN structure and the attention mechanism are used together to recognize patterns in the similarity maps and to identify the specific syntactic structure of queries.
By transforming the passage-query features into attention maps and applying them to the passage-choice matching results, a weight is assigned to every word in the passage (a sketch of this weighting step is given below).
Third, the model reaches 79.99% accuracy on the MovieQA dataset, the top result on that dataset.
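A minimal sketch of that weighting step follows: the passage-query similarity map is reduced to one attention weight per passage word and applied row-wise to the passage-choice map. The max-then-softmax reduction is an assumption made for illustration, not necessarily the exact operation in QACNN.

    import numpy as np

    def query_attention(pq_map):
        # Reduce a (passage_words, query_words) map to one weight per passage word.
        scores = pq_map.max(axis=1)            # best query match for each passage word
        exp = np.exp(scores - scores.max())    # numerically stable softmax
        return exp / exp.sum()

    def attend(pc_map, attention):
        # Scale every row (passage word) of the passage-choice map by its weight.
        return pc_map * attention[:, None]

    # Hypothetical maps: 12 passage words, 5 query words, 4 choice words.
    rng = np.random.default_rng(0)
    pq_map = rng.normal(size=(12, 5))          # passage-query similarity map
    pc_map = rng.normal(size=(12, 4))          # passage-choice similarity map

    weighted = attend(pc_map, query_attention(pq_map))
    print(weighted.shape)                      # (12, 4)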
Conclusion
An efficient matching mechanism for multiple-choice question answering tasks is presented.
A two-staged CNN is introduced to match the passage and the choices at the word level and the sentence level.
In addition, query-based attention is used in the CNN to enhance the matching.
The model is verified on the MovieQA dataset, where it yields the state-of-the-art result.
In the future, QACNN will be trained on pre-trained embeddings with TF-IDF weighting.
Furthermore, the QACNN model will be tested on open-answer tasks such as SQuAD by treating the whole corpus as an answer pool and solving the task as a multiple-choice question.
[0]
T.-C. Liu, Y.-H. Wu, and H.-Y. Lee, "Query-based Attention CNN for Text Similarity Map," 2017.
[1]
M. Tapaswi, Y. Zhu, and R. Stiefelhagen, "MovieQA: Understanding stories in movies through question-answering," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[2]
S. Wang and J. Jiang, "A compare-aggregate model for matching text sequences," arXiv:1611.01747, 2016.
[3]
J. Pennington, R. Socher, and C. Manning, "GloVe: Global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532-1543.