# CNN 1

CNN1 is the "first-stage CNN" part of the QACNN model.

Fig.1 Overall data flow of QACNN model

CNN1 projects word-level features into sentence-level features.

The outputs of CNN1 are $\bar{\bar{r}}^{PQ}$ and $\bar{\bar{r}}^{PC}$.

$\bar{\bar{r}}^{PQ}$ contains the paragraph's sentence features based on the query.

$\bar{\bar{r}}^{PC}$ contains the paragraph's sentence features based on the choice.

$\bar{\bar{r}}^{PQ} = [\bar{r}^{PQ}_1, \cdots, \bar{r}^{PQ}_n, \cdots, \bar{r}^{PQ}_N]$ and $\bar{\bar{r}}^{PC} = [\bar{r}^{PC}_1, \cdots, \bar{r}^{PC}_n, \cdots, \bar{r}^{PC}_N]$

$n$ is the sentence index, and $N$ is the total number of sentences in the paragraph.


CNN1 has two parts. The first part generates the attention map $\bar{a}_n$; the second part generates the paragraph's sentence features based on the query, $\bar{\bar{r}}^{PQ}$, and the paragraph's sentence features based on the choices, $\bar{\bar{r}}^{PC}$.

#CNN1# Part 1: Generate Attention Map

The first part of CNN1 computes the attention map $\bar{a}_n$ from $PQ$.

Fig.2 # CNN1 # attention part, from $PQ$ to $P_nQ$ to the word-level attention map $\bar{a}_n$.

Fig.2 shows the derivation of the word-level attention map $\bar{a}_n$ from $PQ$.

$P_nQ \in R^{J \times I}$ is the $n$-th sentence slice of $PQ$, and a CNN is applied to $P_nQ$ with the convolution kernel $\bar{\bar{\bar{W}}}_1^A \in R^{J \times l \times d}$. $d$ and $l$ denote the kernel width and the number of kernels, respectively.

In the convolution kernel $\bar{\bar{\bar{W}}}_1^A$, the superscript $A$ denotes the attention map, and the subscript $1$ denotes the first-stage CNN.

The generated feature $\bar{\bar{q}}_n^A \in R^{l \times (I-d+1)}$ is expressed as

$\bar{\bar{q}}_n^A = \textrm{sigmoid}(\bar{\bar{\bar{W}}}_1^A \otimes P_nQ + \bar{b}_1^A)$

where $\bar{b}_1^A \in R^l$ is the bias.

The query's syntactic structure, including its location information within the paragraph, can be learned with $\bar{\bar{\bar{W}}}_1^A$.

The sigmoid function is chosen as the activation function in this stage.

Max pooling is performed on $\bar{\bar{q}}_n^A$ to take the largest element across kernels at each location, generating the word-level attention map $\bar{a}_n \in R^{I-d+1}$ for each sentence.

$\bar{a}_n = \textrm{max pool}(\bar{\bar{q}}_n^A)$, where $\bar{\bar{q}}_n^A \in R^{l \times (I-d+1)}$ and $\bar{a}_n \in R^{I-d+1}$.
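As a sanity check on these two equations, here is a minimal NumPy sketch of the attention-map computation for one sentence; all sizes and the random inputs are placeholder assumptions for illustration, not values from the actual model.

```python
import numpy as np

# Placeholder sizes: I words per paragraph sentence, J words per query,
# kernel width d, and l kernels (not the values used in the real model).
I, J, d, l = 20, 10, 3, 4
PnQ = np.random.rand(J, I)        # similarity map between sentence n and the query
W_A = np.random.randn(l, J, d)    # l kernels, each covering the full query dimension
b_A = np.zeros(l)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "VALID" convolution along the word dimension: output length is I - d + 1
q_A = np.zeros((l, I - d + 1))
for k in range(l):
    for i in range(I - d + 1):
        q_A[k, i] = np.sum(W_A[k] * PnQ[:, i:i + d]) + b_A[k]
q_A = sigmoid(q_A)                # q_A has shape (l, I - d + 1)

# Max over kernels at each word position gives the word-level attention map
a_n = q_A.max(axis=0)             # shape (I - d + 1,)
print(q_A.shape, a_n.shape)
```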

Fig.3 # CNN1 # attention part. Each paragraph sentence is used to generate its corresponding word-level attention map $\bar{a}_n$, $n = 1, 2, \cdots, N$.

Fig.3 shows the architecture of the attention-map part of the first-stage CNN; all sentences in the paragraph are used.

#CNN1# Part 2: Generate paragraph sentence features based on query and choices

The second part of CNN1 computes the paragraph's sentence features based on the query, $\bar{\bar{r}}^{PQ}$, and the paragraph's sentence features based on the choice, $\bar{\bar{r}}^{PC}$.

Fig.4 Derivation of the paragraph's sentence feature based on the query, $\bar{r}_n^{PQ}$.

Fig.4 shows the derivation of the paragraph's sentence feature based on the query, $\bar{r}_n^{PQ}$.

Fig.5 All of the paragraph's sentence features based on the query, $\bar{\bar{r}}^{PQ} = [\bar{r}_1^{PQ}, \cdots, \bar{r}_n^{PQ}, \cdots, \bar{r}_N^{PQ}]$.

Fig.5 shows all of the paragraph's sentence features based on the query, $\bar{\bar{r}}^{PQ} = [\bar{r}_1^{PQ}, \cdots, \bar{r}_n^{PQ}, \cdots, \bar{r}_N^{PQ}]$.

Fig.6 Derivation of the paragraph's sentence feature based on the choice, $\bar{r}_n^{PC}$.

Fig.6 shows the derivation of the paragraph's sentence feature based on the choice, $\bar{r}_n^{PC}$.

Fig.7 All of the paragraph's sentence features based on the choice, $\bar{\bar{r}}^{PC} = [\bar{r}_1^{PC}, \cdots, \bar{r}_n^{PC}, \cdots, \bar{r}_N^{PC}]$.

Fig.7 shows all of the paragraph's sentence features based on the choice, $\bar{\bar{r}}^{PC} = [\bar{r}_1^{PC}, \cdots, \bar{r}_n^{PC}, \cdots, \bar{r}_N^{PC}]$.

$\bar{\bar{r}}^{PQ}$ and $\bar{\bar{r}}^{PC}$ are the two output representations of the first-stage CNN architecture.

Kernels $\bar{\bar{\bar{W}}}_1^R \in R^{l \times K \times d}$ and bias $\bar{b}_1^R \in R^l$ are applied to $P_nQ$ to obtain the query-based sentence features.

$\bar{\bar{q}}_n^R = \textrm{ReLU}(\bar{\bar{\bar{W}}}_1^R \otimes P_nQ + \bar{b}_1^R) \in R^{l \times (I-d+1)}$

The same kernels $\bar{\bar{\bar{W}}}_1^R \in R^{l \times K \times d}$ and bias $\bar{b}_1^R \in R^l$ are applied to $P_nC$ to aggregate the pattern of location relationships and obtain the choice-based sentence features. The superscript $R$ denotes the output representation.

$\bar{\bar{c}}_n^R = \textrm{ReLU}(\bar{\bar{\bar{W}}}_1^R \otimes P_nC + \bar{b}_1^R) = \left[ \begin{matrix} (\bar{c}_{n,1}^R)^t \\ \vdots \\ (\bar{c}_{n,l}^R)^t \end{matrix} \right] \in R^{l \times (I-d+1)}$

$\bar{\bar{c}}_n^R$ is then multiplied by the word-level attention map $\bar{a}_n \in R^{I-d+1}$ along its first dimension:

$\bar{\bar{c}}_n^R = \left[ \begin{matrix} (\bar{c}_{n,1}^R)^t \odot \bar{a}_n \\ \vdots \\ (\bar{c}_{n,l}^R)^t \odot \bar{a}_n \end{matrix} \right]$

Max pooling is applied to $\bar{\bar{q}}_n^R \in R^{l \times (I-d+1)}$ and $\bar{\bar{c}}_n^R \in R^{l \times (I-d+1)}$ horizontally with kernel size $(I-d+1)$ to obtain the query-based sentence features $\bar{r}_n^{PQ} \in R^l$ and the choice-based sentence features $\bar{r}_n^{PC} \in R^l$.

$\bar{r}_n^{PQ} = \textrm{max pool}(\bar{\bar{q}}_n^R)$

$\bar{r}_n^{PC} = \textrm{max pool}(\bar{\bar{c}}_n^R)$
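Below is a minimal NumPy sketch of this second-part computation for one sentence, under the simplifying assumption that the query and choice are padded to the same length $K$; the sizes and random inputs are illustrative only.

```python
import numpy as np

# Placeholder sizes; K is the padded query/choice length (assumed equal here).
I, K, d, l = 20, 10, 3, 4
PnQ = np.random.rand(K, I)        # sentence-query similarity map
PnC = np.random.rand(K, I)        # sentence-choice similarity map
W_R = np.random.randn(l, K, d)    # shared representation kernels
b_R = np.zeros(l)
a_n = np.random.rand(I - d + 1)   # word-level attention map from Part 1

def conv_valid(X, W, b):
    """VALID convolution along the word dimension; returns shape (l, I-d+1)."""
    out = np.zeros((W.shape[0], X.shape[1] - W.shape[2] + 1))
    for k in range(W.shape[0]):
        for i in range(out.shape[1]):
            out[k, i] = np.sum(W[k] * X[:, i:i + W.shape[2]]) + b[k]
    return out

q_R = np.maximum(conv_valid(PnQ, W_R, b_R), 0)        # ReLU
c_R = np.maximum(conv_valid(PnC, W_R, b_R), 0) * a_n  # attention applied row-wise

r_PQ_n = q_R.max(axis=1)          # query-based sentence feature, shape (l,)
r_PC_n = c_R.max(axis=1)          # choice-based sentence feature, shape (l,)
print(r_PQ_n.shape, r_PC_n.shape)
```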

#CNN1# Overall flowchart

Fig.1 # CNN1 # flow in the __init__() function

Fig.1 shows the # CNN1 # flow in the __init__() function.

WQ1 is $\bar{\bar{\bar{W}}}_1^A$, the convolution kernel for the attention map $\bar{a}_n$ in CNN1.

bQ1 is $\bar{b}_1^A$, the bias for the attention map $\bar{a}_n$ in CNN1.

W1 is $\bar{\bar{\bar{W}}}_1^R$, the convolution kernel for the paragraph's sentence features based on the query, $\bar{\bar{r}}^{PQ}$, and the paragraph's sentence features based on the choice, $\bar{\bar{r}}^{PC}$.

b1 is $\bar{b}_1^R$, the bias for the paragraph's sentence features based on the query, $\bar{\bar{r}}^{PQ}$, and the paragraph's sentence features based on the choice, $\bar{\bar{r}}^{PC}$.

# CNN1 # lines 1~15

1   ### CNN 1 ###
2   pooled_outputs_PQ_1 = []
3   pooled_outputs_PA_1 = []
4   pooled_outputs_PB_1 = []
5   pooled_outputs_PC_1 = []
6   pooled_outputs_PD_1 = []
7   pooled_outputs_PE_1 = []
8   for i, filter_size in enumerate(self.filter_size):
9       with tf.name_scope("conv1-maxpool-%s" % (filter_size)):
10          filter_shape = [filter_size,max_len[2], 1, self.filter_num]
11          W1 = tf.get_variable(name="W1-%s"%(filter_size), shape=filter_shape,initializer=tf.contrib.layers.xavier_initializer())
12          b1 = tf.Variable(tf.constant(0.1, shape=[self.filter_num]), name="b1")   
13
14          WQ1 = tf.get_variable(name="WQ1-%s"%(filter_size), shape=filter_shape,initializer=tf.contrib.layers.xavier_initializer())
15          bQ1 = tf.Variable(tf.constant(0.1, shape=[self.filter_num]), name="bQ1")

Lines 2~7 define the variables pooled_outputs_PQ_1, pooled_outputs_PA_1, ..., pooled_outputs_PE_1.

2   pooled_outputs_PQ_1 = []
3   pooled_outputs_PA_1 = []
4   pooled_outputs_PB_1 = []
5   pooled_outputs_PC_1 = []
6   pooled_outputs_PD_1 = []
7   pooled_outputs_PE_1 = []

pooled_outputs_PQ_1, pooled_outputs_PA_1, ..., pooled_outputs_PE_1 are initialized as empty lists.

The suffix "_1" refers to CNN1.

pooled_outputs_PQ_1 holds $\bar{r}_n^{PQ}$, the paragraph's sentence features based on the query.

pooled_outputs_PA_1, ..., pooled_outputs_PE_1 are merged into $\bar{r}_n^{PC}$, the paragraph's sentence features based on the choices.

$\bar{r}_n^{PC} = [\textrm{pooled\_outputs\_PA\_1}, \textrm{pooled\_outputs\_PB\_1}, \cdots, \textrm{pooled\_outputs\_PE\_1}]$

Lines 8~15 define W1, b1, WQ1, and bQ1.

8   for i, filter_size in enumerate(self.filter_size):
9       with tf.name_scope("conv1-maxpool-%s" % (filter_size)):
10          filter_shape = [filter_size,max_len[2], 1, self.filter_num]
11          W1 = tf.get_variable(name="W1-%s"%(filter_size), shape=filter_shape, initializer=tf.contrib.layers.xavier_initializer())
12          b1 = tf.Variable(tf.constant(0.1, shape=[self.filter_num]), name="b1")   
13
14          WQ1 = tf.get_variable(name="WQ1-%s"%(filter_size), shape=filter_shape, initializer=tf.contrib.layers.xavier_initializer())
15          bQ1 = tf.Variable(tf.constant(0.1, shape=[self.filter_num]), name="bQ1")

Line 8: self.filter_size ($d$) is the kernel width; in the code it is set to [1, 3, 5], so the for loop runs three times.

In the first iteration $d = 1$, in the second $d = 3$, and in the third $d = 5$.

filter_shape is the shape of the filter:

$\textrm{filter\_shape} = [\textrm{filter\_size}, \textrm{max\_len}[2], 1, \textrm{self.filter\_num}] = [d, K, 1, l]$

$K$ is the total number of words in the choice sentence (set to 50 in the code).

self.filter_num ($l$) is the number of kernels in CNN1 (set to 128 in the code).
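As a quick illustration of the loop, a small sketch of the filter shapes it would produce, assuming the $K = 50$ and $l = 128$ values quoted above:

```python
# Filter shapes produced by the loop, assuming K = 50 words per choice
# and l = 128 kernels as stated above (illustrative, not read from the code).
K, filter_num = 50, 128
for d in [1, 3, 5]:
    filter_shape = [d, K, 1, filter_num]
    print(filter_shape)   # [1, 50, 1, 128], [3, 50, 1, 128], [5, 50, 1, 128]
```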

Fig.2 The shapes of W1 and WQ1 are specified by filter_shape.

Fig.2 shows lines 11 and 14 of the code, which define W1 and WQ1.

11          W1 = tf.get_variable(name="W1-%s"%(filter_size), shape=filter_shape, initializer=tf.contrib.layers.xavier_initializer())

Line 11 gives W1 the shape filter_shape; its name will be W1-d, where d is the kernel width. W1 is initialized with tf.contrib.layers.xavier_initializer().

WQ1 is $\bar{\bar{\bar{W}}}_1^A$, the convolution kernel applied to $P_nQ$.

bQ1 is $\bar{b}_1^A$, the bias applied to $P_nQ$.

W1 is $\bar{\bar{\bar{W}}}_1^R$, the convolution kernel for generating the paragraph's sentence features based on the query and the choice.

b1 is $\bar{b}_1^R$, the bias for generating the paragraph's sentence features based on the query and the choice.


#CNN1# Code lines 16-158

16            hiddenPQ_1 = []
17            hiddenPA_1 = []
18            hiddenPB_1 = []
19            hiddenPC_1 = []
20            hiddenPD_1 = []
21            hiddenPE_1 = []
22        for sentence_ind in range(len(PQAttention)):
23            convPQ_attention = tf.nn.conv2d(
24                PQAttention[sentence_ind],
25                WQ1,
26                strides=[1, 1, 1, 1],
27                padding="VALID",
28                name="conv")## [batch,wordNumberP- filter_size + 1, 1,self.filter_num]
29            convPQ_1 = tf.nn.conv2d(
30                PQAttention[sentence_ind],
31                W1,
32                strides=[1, 1, 1, 1],
33                padding="VALID",
34                name="conv")## [batch,wordNumberP- filter_size + 1, 1,self.filter_num]
35            convPA_1 = tf.nn.conv2d(
36                PAAttention[sentence_ind],
37                W1,
38                strides=[1, 1, 1, 1],
39                padding="VALID",
40                name="conv")## [batch,wordNumberP- filter_size + 1, 1,self.filter_num]
41            convPB_1 = tf.nn.conv2d(
42                PBAttention[sentence_ind],
43                W1,
44                strides=[1, 1, 1, 1],
45                padding="VALID",
46                name="conv")## [batch,wordNumberP- filter_size + 1, 1,self.filter_num]
47            convPC_1 = tf.nn.conv2d(
48                PCAttention[sentence_ind],
49                W1,
50                strides=[1, 1, 1, 1],
51                padding="VALID",
52                name="conv")## [batch,wordNumberP- filter_size + 1, 1,self.filter_num]
53            convPD_1 = tf.nn.conv2d(
54                PDAttention[sentence_ind],
55                W1,
56                strides=[1, 1, 1, 1],
57                padding="VALID",
58                name="conv")## [batch,wordNumberP- filter_size + 1, 1,self.filter_num]
59            convPE_1 = tf.nn.conv2d(
60                PEAttention[sentence_ind],
61                W1,
62                strides=[1, 1, 1, 1],
63                padding="VALID",
64                name="conv")## [batch,wordNumberP- filter_size + 1, 1,self.filter_num]
65    
66            wPQ_1 = tf.transpose(tf.sigmoid(tf.nn.bias_add(convPQ_attention,bQ1)),[0,3,2,1])
67            wPQ_1 = tf.nn.dropout(wPQ_1,self.dropoutRate)
68            wPQ_1 = tf.nn.max_pool(
69                wPQ_1,
70                ksize=[1,self.filter_num, 1,1],
71                strides=[1, 1, 1, 1],
72                padding='VALID',
73                name="pool_pq")  ##  [batch_size, 1, 1, wordNumberP - filter_size + 1]
74            wPQ_1 = tf.transpose(tf.tile(wPQ_1,[1,self.filter_num,1,1]),[0,3,2,1])
75            onesentence_hiddenPQ_1 = tf.nn.dropout(tf.nn.relu(tf.nn.bias_add(convPQ_1, b1), name="relu"),self.dropoutRate)
76            hiddenPQ_1.append(onesentence_hiddenPQ_1)
77            onesentence_hiddenPA_1 = tf.nn.dropout(tf.nn.relu(tf.nn.bias_add(convPA_1, b1), name="relu"),self.dropoutRate)*wPQ_1
78            hiddenPA_1.append(onesentence_hiddenPA_1)
79            onesentence_hiddenPB_1 = tf.nn.dropout(tf.nn.relu(tf.nn.bias_add(convPB_1, b1), name="relu"),self.dropoutRate)*wPQ_1
80            hiddenPB_1.append(onesentence_hiddenPB_1)
81            onesentence_hiddenPC_1 = tf.nn.dropout(tf.nn.relu(tf.nn.bias_add(convPC_1, b1), name="relu"),self.dropoutRate)*wPQ_1
82            hiddenPC_1.append(onesentence_hiddenPC_1)
83            onesentence_hiddenPD_1 = tf.nn.dropout(tf.nn.relu(tf.nn.bias_add(convPD_1, b1), name="relu"),self.dropoutRate)*wPQ_1
84            hiddenPD_1.append(onesentence_hiddenPD_1)
85            onesentence_hiddenPE_1 = tf.nn.dropout(tf.nn.relu(tf.nn.bias_add(convPE_1, b1), name="relu"),self.dropoutRate)*wPQ_1
86            hiddenPE_1.append(onesentence_hiddenPE_1)
87      hiddenPQ_1 = tf.concat(hiddenPQ_1, 1) ## [batch,max_plot_len*(wordNumberP- filter_size + 1), 1,self.filter_num]                    
88      hiddenPA_1 = tf.concat(hiddenPA_1, 1)
89      hiddenPB_1 = tf.concat(hiddenPB_1, 1)
90      hiddenPC_1 = tf.concat(hiddenPC_1, 1)
91      hiddenPD_1 = tf.concat(hiddenPD_1, 1)
92      hiddenPE_1 = tf.concat(hiddenPE_1, 1)
93
94      hiddenPQ_1 = tf.reshape(tf.squeeze(hiddenPQ_1), [batch_size, max_plot_len, (max_len[0] - filter_size + 1), self.filter_num]) ## [batch,max_plot_len,(wordNumberP- filter_size + 1),self.filter_num]     
95      hiddenPA_1 = tf.reshape(tf.squeeze(hiddenPA_1), [batch_size, max_plot_len, (max_len[0] - filter_size + 1), self.filter_num])
96      hiddenPB_1 = tf.reshape(tf.squeeze(hiddenPB_1), [batch_size, max_plot_len, (max_len[0] - filter_size + 1), self.filter_num])
97      hiddenPC_1 = tf.reshape(tf.squeeze(hiddenPC_1), [batch_size, max_plot_len, (max_len[0] - filter_size + 1), self.filter_num])
98      hiddenPD_1 = tf.reshape(tf.squeeze(hiddenPD_1), [batch_size, max_plot_len, (max_len[0] - filter_size + 1), self.filter_num])
99      hiddenPE_1 = tf.reshape(tf.squeeze(hiddenPE_1), [batch_size, max_plot_len, (max_len[0] - filter_size + 1), self.filter_num])
100
101     pooledPQ_1 = tf.nn.max_pool(
102         hiddenPQ_1,
103         ksize=[1, 1, (max_len[0] - filter_size + 1), 1],
104         strides=[1, 1, 1, 1],
105         padding='VALID',
106         name="pool")  ##  [batch_size, max_plot_len, 1, self.filter_num]
107     pooled_outputs_PQ_1.append(pooledPQ_1)
108
109     pooledPA_1 = tf.nn.max_pool(
110         hiddenPA_1,
111         ksize=[1, 1, (max_len[0] - filter_size + 1), 1],
112         strides=[1, 1, 1, 1],
113         padding='VALID',
114         name="pool")  ##  [batch_size, max_plot_len, 1, self.filter_num]
115     pooled_outputs_PA_1.append(pooledPA_1)
116
117     pooledPB_1 = tf.nn.max_pool(
118         hiddenPB_1,
119         ksize=[1, 1, (max_len[0] - filter_size + 1), 1],
120         strides=[1, 1, 1, 1],
121         padding='VALID',
122         name="pool")  ##[batch_size, max_plot_len, 1, self.filter_num]
123
124     pooled_outputs_PB_1.append(pooledPB_1)
125
126     pooledPC_1 = tf.nn.max_pool(
127         hiddenPC_1,
128         ksize=[1, 1, (max_len[0] - filter_size + 1), 1],
129         strides=[1, 1, 1, 1],
130         padding='VALID',
131         name="pool")  ##[batch_size, max_plot_len, 1, self.filter_num]
132
133     pooled_outputs_PC_1.append(pooledPC_1)
134
135     pooledPD_1 = tf.nn.max_pool(
136         hiddenPD_1,
137         ksize=[1, 1, (max_len[0] - filter_size + 1), 1],
138         strides=[1, 1, 1, 1],
139         padding='VALID',
140         name="pool")  ##[batch_size, max_plot_len, 1, self.filter_num]
141
142     pooled_outputs_PD_1.append(pooledPD_1)
143
144     pooledPE_1 = tf.nn.max_pool(
145         hiddenPE_1,
146         ksize=[1, 1, (max_len[0] - filter_size + 1), 1],
147         strides=[1, 1, 1, 1],
148         padding='VALID',
149         name="pool")  ##[batch_size, max_plot_len, 1, self.filter_num]
150
151     pooled_outputs_PE_1.append(pooledPE_1)
152
153 h_pool_PQ_1 = tf.transpose(tf.concat(pooled_outputs_PQ_1, 3), perm=[0,3,1,2]) ##[batch_size, num_filters_total, max_plot_len, 1]
154 h_pool_PA_1 = tf.transpose(tf.concat(pooled_outputs_PA_1, 3), perm=[0,3,1,2]) ##[batch_size, num_filters_total, max_plot_len, 1]
155 h_pool_PB_1 = tf.transpose(tf.concat(pooled_outputs_PB_1, 3), perm=[0,3,1,2]) ##[batch_size, num_filters_total, max_plot_len, 1]
156 h_pool_PC_1 = tf.transpose(tf.concat(pooled_outputs_PC_1, 3), perm=[0,3,1,2]) ##[batch_size, num_filters_total, max_plot_len, 1]
157 h_pool_PD_1 = tf.transpose(tf.concat(pooled_outputs_PD_1, 3), perm=[0,3,1,2]) ##[batch_size, num_filters_total, max_plot_len, 1]
158 h_pool_PE_1 = tf.transpose(tf.concat(pooled_outputs_PE_1, 3), perm=[0,3,1,2]) ##[batch_size, num_filters_total, max_plot_len, 1]

Lines 16~21 define the variables hiddenPQ_1, hiddenPA_1, ..., hiddenPE_1.

16            hiddenPQ_1 = []
17            hiddenPA_1 = []
18            hiddenPB_1 = []
19            hiddenPC_1 = []
20            hiddenPD_1 = []
21            hiddenPE_1 = []

hiddenPQ_1, hiddenPA_1, ..., hiddenPE_1 are initialized as empty lists.

Line 22: the for loop runs over 0, 1, ..., (N-1).

22        for sentence_ind in range(len(PQAttention)):

Line 22 loops over each sentence in the paragraph, from 0 to N-1.

$\textrm{PQAttention} \in R^{N \times I \times J}$, so len(PQAttention) = N.

range(len(PQAttention)) produces the list [0, 1, ..., N-1].
sentence_ind corresponds to $n$.


CNN1 Part 1: Generate Attention Map (wPQ_1) of the First Stage

fig.xx Flow of computing the attention map (wPQ_1)

fig.xx computes $\bar{a}_n$:

$\bar{a}_n = \textrm{max pool}(\bar{\bar{q}}_n^A) \in R^{I-d+1}$

$\bar{\bar{q}}_n^A = \textrm{sigmoid}(\bar{\bar{\bar{W}}}_1^A \otimes P_nQ + \bar{b}_1^A) \in R^{l \times (I-d+1)}$

Here wPQ_1 is $\bar{a}_n$ with an additional batch-size dimension.

Lines 23~28 compute the convolution of $P_nQ$ with the kernel WQ1.

23            convPQ_attention = tf.nn.conv2d(
24                PQAttention[sentence_ind],
25                WQ1,
26                strides=[1, 1, 1, 1],
27                padding="VALID",
28                name="conv")## [batch,wordNumberP- filter_size + 1, 1,self.filter_num]

Lines 23~28 compute

$\textrm{convPQ\_attention} = P_nQ \otimes \bar{\bar{\bar{W}}}^A_1 \in R^{\textrm{batch\_size} \times (I-d+1) \times 1 \times l}$
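As a rough sanity check on the resulting tensor shape, here is a small shape-bookkeeping sketch with placeholder sizes (batch, I, J, and l are assumptions for illustration, not values from the code):

```python
# Shape bookkeeping for one sentence slice through tf.nn.conv2d with VALID
# padding and stride 1; batch, I, J and l are placeholder sizes.
batch, I, J, l = 2, 20, 10, 4
for d in [1, 3, 5]:
    in_shape = (batch, I, J, 1)            # PQAttention[sentence_ind]
    out_shape = (batch, I - d + 1, 1, l)   # convPQ_attention
    print(d, in_shape, "->", out_shape)
```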

Fig.xx Convolution of PQAttention[sentence_ind] and WQ1

Fig.xx shows the schematic of the convolution of $P_nQ$ and $\bar{\bar{\bar{W}}}_1^A$.

"PQAttention[sentence_ind]" refers to $P_nQ$.

"WQ1" refers to $\bar{\bar{\bar{W}}}_1^A$.

The strides of the convolution are [1, 1, 1, 1].

Line 66 adds the bias to convPQ_attention, applies the sigmoid, and transposes the result.

66            wPQ_1 = tf.transpose(tf.sigmoid(tf.nn.bias_add(convPQ_attention,bQ1)),[0,3,2,1])

fig.xx Schematic of adding the bias and applying the sigmoid to convPQ_attention

fig.xx shows the schematic corresponding to line 66.

Line 66 computes $\bar{\bar{q}}_n^A = \textrm{sigmoid}(P_nQ \otimes \bar{\bar{\bar{W}}}^A_1 + \bar{b}_1^A)$.

"wPQ_1" is $\bar{\bar{q}}_n^A$, and "bQ1" is $\bar{b}_1^A$.

Before the transpose, the shape of $\bar{\bar{q}}_n^A$ is $\textrm{batch\_size} \times (I-d+1) \times 1 \times l$;
after the transpose, it is $\textrm{batch\_size} \times l \times 1 \times (I-d+1)$.

Line 67 applies dropout to wPQ_1.

67            wPQ_1 = tf.nn.dropout(wPQ_1,self.dropoutRate)

Dropout randomly removes a fraction of the connections between layers. Note that in TensorFlow 1.x the second argument of tf.nn.dropout is the keep probability; self.dropoutRate is set to 0.8 in the code, so about 20% of the activations are dropped.
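For reference, a small NumPy sketch of this keep-probability (inverted) dropout behavior, with 0.8 assumed as the keep rate:

```python
import numpy as np

def dropout(x, keep_prob, rng=np.random):
    # Keep each element with probability keep_prob and rescale by 1/keep_prob,
    # which matches the behavior of tf.nn.dropout in TensorFlow 1.x.
    mask = rng.rand(*x.shape) < keep_prob
    return x * mask / keep_prob

x = np.ones((2, 5))
print(dropout(x, 0.8))   # roughly 20% of the entries become 0, the rest become 1.25
```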

Lines 68~73 apply max pooling to wPQ_1.

68            wPQ_1 = tf.nn.max_pool(
69                wPQ_1,
70                ksize=[1,self.filter_num, 1,1],
71                strides=[1, 1, 1, 1],
72                padding='VALID',
73                name="pool_pq")  ##  [batch_size, 1, 1, wordNumberP - filter_size + 1]


fig.xx Schematic of max_pool on $\bar{\bar{q}}_n^A$

fig.xx shows the schematic of max_pool on $\bar{\bar{q}}_n^A$.

wPQ_1 is $\bar{a}_n$, the word-level attention map.

ksize is the size of the window for each dimension of the input tensor.

$\bar{a}_n = \textrm{wPQ\_1} \in R^{\textrm{batch\_size} \times 1 \times 1 \times (I-d+1)}$
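A small NumPy sketch of this pooling step, taking the maximum across the kernel channels at every word position; the sizes are placeholders:

```python
import numpy as np

batch, l, out_len = 2, 4, 18                # placeholder batch size, kernel count, I - d + 1
wPQ = np.random.rand(batch, l, 1, out_len)  # layout after the transpose on line 66
a_n = wPQ.max(axis=1, keepdims=True)        # max over the kernel axis, like ksize=[1, filter_num, 1, 1]
print(a_n.shape)                            # (2, 1, 1, 18)
```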

Line 74 tiles and transposes wPQ_1.

74            wPQ_1 = tf.transpose(tf.tile(wPQ_1,[1,self.filter_num,1,1]),[0,3,2,1])

fig.xx Schematic of tf.tile($\bar{a}_n$, [1, l, 1, 1])

fig.xx The shape of $\bar{a}_n$ is $\textrm{batch\_size} \times 1 \times 1 \times (I-d+1)$; after tiling by $[1, l, 1, 1]$, the shape becomes $\textrm{batch\_size} \times l \times 1 \times (I-d+1)$.

After the transpose, the shape is $\textrm{batch\_size} \times (I-d+1) \times 1 \times l$.
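Continuing with placeholder sizes, a small NumPy sketch of the tile-then-transpose step on line 74:

```python
import numpy as np

batch, l, out_len = 2, 4, 18
a_n = np.random.rand(batch, 1, 1, out_len)   # pooled attention map from lines 68-73
tiled = np.tile(a_n, (1, l, 1, 1))           # like tf.tile(wPQ_1, [1, self.filter_num, 1, 1])
wPQ_1 = np.transpose(tiled, (0, 3, 2, 1))    # back to (batch, I-d+1, 1, l)
print(tiled.shape, wPQ_1.shape)              # (2, 4, 1, 18) (2, 18, 1, 4)
```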


CNN1 Part 2: Generate paragraph sentence features based on query (h_pool_PQ_1)

fig.xx Flow of computing the paragraph sentence features based on the query (h_pool_PQ_1)

fig.xx computes $\bar{\bar{r}}^{PQ}$:

$\bar{\bar{r}}^{PQ} = [\bar{r}_1^{PQ}, \bar{r}_2^{PQ}, \cdots, \bar{r}_n^{PQ}, \cdots, \bar{r}_N^{PQ}]$

$\bar{r}_n^{PQ} = \textrm{max pool}(\bar{\bar{q}}_n^R) \in R^l$

$\bar{\bar{q}}_n^R = \textrm{ReLU}(\bar{\bar{\bar{W}}}_1^R \otimes P_nQ + \bar{b}_1^R) \in R^{l \times (I-d+1)}$

Here h_pool_PQ_1 is $\bar{\bar{r}}^{PQ}$ with an additional batch-size dimension.

Lines 29~34 compute the convolution of $P_nQ$ with the kernel W1, producing convPQ_1.

29            convPQ_1 = tf.nn.conv2d(
30                PQAttention[sentence_ind],
31                W1,
32                strides=[1, 1, 1, 1],
33                padding="VALID",
34                name="conv")## [batch,wordNumberP- filter_size + 1, 1,self.filter_num]

Fig.xx Convolution of PQAttention[sentence_ind] and W1

Fig.xx shows the schematic of the convolution of PQAttention[sentence_ind] and W1.

PQAttention[sentence_ind] is $P_nQ$.

W1 is $\bar{\bar{\bar{W}}}^R_1$.

convPQ_1 is $P_nQ \otimes \bar{\bar{\bar{W}}}^R_1 \in R^{\textrm{batch\_size} \times (I-d+1) \times 1 \times l}$.

The strides of the convolution are [1, 1, 1, 1].

Line 75 computes onesentence_hiddenPQ_1.

75            onesentence_hiddenPQ_1 = tf.nn.dropout(tf.nn.relu(tf.nn.bias_add(convPQ_1, b1), name="relu"),self.dropoutRate)

Line 75 computes

$\bar{\bar{q}}_n^R = \textrm{dropout}(\textrm{ReLU}(P_nQ \otimes \bar{\bar{\bar{W}}}_1^R + \bar{b}_1^R)) \in R^{l \times (I-d+1)}$

fig.xx Schematic of computing $\bar{\bar{q}}_n^R$

fig.xx shows the schematic of computing $\bar{\bar{q}}_n^R$.

b1 is $\bar{b}_1^R$.

onesentence_hiddenPQ_1 is $\bar{\bar{q}}_n^R$.

$\bar{\bar{q}}_n^R \in R^{\textrm{batch\_size} \times (I-d+1) \times 1 \times l}$

Dropout randomly removes a fraction of the connections between layers; as noted above, the second argument of tf.nn.dropout is the keep probability, and self.dropoutRate is set to 0.8 in the code.

Line 76 appends onesentence_hiddenPQ_1 to hiddenPQ_1; after the loop over sentences, hiddenPQ_1 contains N such tensors.

76            hiddenPQ_1.append(onesentence_hiddenPQ_1)

fig.xx Schematic of appending onesentence_hiddenPQ_1 to hiddenPQ_1

fig.xx shows onesentence_hiddenPQ_1 being appended to hiddenPQ_1.

$\textrm{hiddenPQ\_1} = [\textrm{onesentence\_hiddenPQ}_1\_1, \cdots, \textrm{onesentence\_hiddenPQ}_n\_1, \cdots, \textrm{onesentence\_hiddenPQ}_N\_1]$

Line 87 concatenates (tf.concat) hiddenPQ_1.

87      hiddenPQ_1 = tf.concat(hiddenPQ_1, 1) ## [batch,max_plot_len*(wordNumberP- filter_size + 1), 1,self.filter_num]

fig.xx Schematic of tf.concat on hiddenPQ_1

After the concat, hiddenPQ_1 is a 4-D tensor:

$\textrm{hiddenPQ\_1} \in R^{\textrm{batch\_size} \times (N \times (I-d+1)) \times 1 \times l}$

Line 94 squeezes and reshapes hiddenPQ_1.

94       hiddenPQ_1 = tf.reshape(tf.squeeze(hiddenPQ_1), [batch_size, max_plot_len, (max_len[0] - filter_size + 1), self.filter_num]) ## [batch,max_plot_len,(wordNumberP- filter_size + 1),self.filter_num]

tf.squeeze removes dimensions of size 1 from the shape of a tensor.

So tf.squeeze(hiddenPQ_1) is a 3-D tensor of shape $\textrm{batch\_size} \times (N \times (I-d+1)) \times l$.

After the reshape, it is a 4-D tensor of shape $\textrm{batch\_size} \times N \times (I-d+1) \times l$:

$\textrm{hiddenPQ\_1} \in R^{\textrm{batch\_size} \times N \times (I-d+1) \times l}$
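A small NumPy sketch of this squeeze-and-reshape step with placeholder sizes:

```python
import numpy as np

batch, N, out_len, l = 2, 3, 18, 4            # placeholder sizes
h = np.zeros((batch, N * out_len, 1, l))      # hiddenPQ_1 after tf.concat(..., 1)
h = np.squeeze(h)                             # drop the size-1 axis -> (batch, N*(I-d+1), l)
h = np.reshape(h, (batch, N, out_len, l))     # -> (batch, N, I-d+1, l)
print(h.shape)                                # (2, 3, 18, 4)
```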

Lines 101~106 apply max_pool to hiddenPQ_1.

101     pooledPQ_1 = tf.nn.max_pool(
102         hiddenPQ_1,
103         ksize=[1, 1, (max_len[0] - filter_size + 1), 1],
104         strides=[1, 1, 1, 1],
105         padding='VALID',
106         name="pool")  ##  [batch_size, max_plot_len, 1, self.filter_num]

fig.xx pooledPQ_1 is computed by max pooling hiddenPQ_1

fig.xx shows that pooledPQ_1 is computed by max pooling hiddenPQ_1.

$\textrm{pooledPQ\_1} \in R^{\textrm{batch\_size} \times N \times 1 \times l}$

Line 107 appends pooledPQ_1 to pooled_outputs_PQ_1; three such tensors are appended in total, one per filter size.

107     pooled_outputs_PQ_1.append(pooledPQ_1)

fig.xx pooledPQ_1 is appended to pooled_outputs_PQ_1

fig.xx shows that pooledPQ_1 is appended to pooled_outputs_PQ_1 three times, for d=1, d=3, and d=5.

$\textrm{pooled\_outputs\_PQ\_1} \in R^{\textrm{batch\_size} \times N \times 1 \times 3l}$

Line 153 derives the paragraph feature based on the query (h_pool_PQ_1).

153 h_pool_PQ_1 = tf.transpose(tf.concat(pooled_outputs_PQ_1, 3), perm=[0,3,1,2]) ##[batch_size, num_filters_total, max_plot_len, 1]

fig.xx Schematic of tf.concat(pooled_outputs_PQ_1, 3)

fig.xx shows pooled_outputs_PQ_1 being concatenated along its last dimension, giving $3l$ channels.

After tf.transpose, we obtain h_pool_PQ_1:

$\textrm{h\_pool\_PQ\_1} = \bar{\bar{r}}^{PQ} \in R^{\textrm{batch\_size} \times 3l \times N \times 1}$
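A small NumPy sketch of this concat-and-transpose step on line 153, assuming the three pooled tensors from d = 1, 3, 5 and placeholder sizes:

```python
import numpy as np

batch, N, l = 2, 3, 4                                     # placeholder sizes
pooled = [np.zeros((batch, N, 1, l)) for _ in range(3)]   # one tensor per kernel width d = 1, 3, 5
h_pool_PQ_1 = np.transpose(np.concatenate(pooled, axis=3), (0, 3, 1, 2))
print(h_pool_PQ_1.shape)                                  # (2, 12, 3, 1) = (batch, 3*l, N, 1)
```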


CNN1 Part 2: Generate paragraph sentence features based on choice (h_pool_PA_1, ..., h_pool_PE_1)


fig.xx Flow of computing the paragraph sentence features based on a choice (using h_pool_PA_1 as an example)

fig.xx computes $\bar{\bar{r}}^{PC}$. There are five of them: $\bar{\bar{r}}^{PC}_A, \bar{\bar{r}}^{PC}_B, \bar{\bar{r}}^{PC}_C, \bar{\bar{r}}^{PC}_D, \bar{\bar{r}}^{PC}_E$.

$\bar{\bar{r}}^{PC} = [\bar{r}_1^{PC}, \bar{r}_2^{PC}, \cdots, \bar{r}_n^{PC}, \cdots, \bar{r}_N^{PC}]$

$\bar{r}_n^{PC} = \textrm{max pool}(\bar{\bar{c}}_n^R) \in R^l$

$\bar{\bar{c}}_n^R = \textrm{ReLU}(\bar{\bar{\bar{W}}}_1^R \otimes P_nC + \bar{b}_1^R) \odot \bar{a}_n = \left[ \begin{matrix} (\bar{c}_{n,1}^R)^t \odot \bar{a}_n \\ \vdots \\ (\bar{c}_{n,l}^R)^t \odot \bar{a}_n \end{matrix} \right] \in R^{l \times (I-d+1)}$

$\bar{\bar{c}}_n^R$ is multiplied by the word-level attention map $\bar{a}_n \in R^{I-d+1}$ along the first dimension.

Here h_pool_PA_1 is $\bar{\bar{r}}^{PC}$ (for choice A) with an additional batch-size dimension.

Lines 35~40 compute the convolution of PAAttention with the kernel W1.

35            convPA_1 = tf.nn.conv2d(
36                PAAttention[sentence_ind],
37                W1,
38                strides=[1, 1, 1, 1],
39                padding="VALID",
40                name="conv")## [batch,wordNumberP- filter_size + 1, 1,self.filter_num]


Fig.xx Convolution of PAAttention[sentence_ind] and W1

Fig.xx shows the schematic of the convolution of PAAttention[sentence_ind] and W1.

$\textrm{convPA\_1} = \textrm{PAAttention}[\textrm{sentence\_ind}] \otimes \bar{\bar{\bar{W}}}^R_1 \in R^{\textrm{batch\_size} \times (I-d+1) \times 1 \times l}$

Similarly, lines 41~64 compute the convolutions convPB_1, convPC_1, convPD_1, and convPE_1.

Line 77 adds the bias to convPA_1, applies ReLU and dropout, and multiplies by wPQ_1.

77            onesentence_hiddenPA_1 = tf.nn.dropout(tf.nn.relu(tf.nn.bias_add(convPA_1, b1), name="relu"),self.dropoutRate)*wPQ_1

fig.xx Schematic of onesentence_hiddenPA_1

onesentence_hiddenPA_1 is obtained by adding the bias to convPA_1, applying ReLU and dropout, and multiplying by $\bar{a}_n$ (wPQ_1):

$\textrm{onesentence\_hiddenPA\_1} = \textrm{dropout}(\textrm{ReLU}(\textrm{convPA\_1} + \textrm{b1})) \odot \bar{a}_n \in R^{\textrm{batch\_size} \times (I-d+1) \times 1 \times l}$

Line 78 appends onesentence_hiddenPA_1 to hiddenPA_1; after the loop over sentences, hiddenPA_1 contains N such tensors.

78            hiddenPA_1.append(onesentence_hiddenPA_1)

$\textrm{hiddenPA\_1} = [\textrm{onesentence\_hiddenPA}_1\_1, \cdots, \textrm{onesentence\_hiddenPA}_n\_1, \cdots, \textrm{onesentence\_hiddenPA}_N\_1]$

Line 88 concatenates (tf.concat) hiddenPA_1.

88      hiddenPA_1 = tf.concat(hiddenPA_1, 1)

$\textrm{hiddenPA\_1} \in R^{\textrm{batch\_size} \times (N \times (I-d+1)) \times 1 \times l}$

Line 95 squeezes and reshapes hiddenPA_1.

95      hiddenPA_1 = tf.reshape(tf.squeeze(hiddenPA_1), [batch_size, max_plot_len, (max_len[0] - filter_size + 1), self.filter_num])

tf.squeeze removes dimensions of size 1 from the shape of a tensor.

So tf.squeeze(hiddenPA_1) is a 3-D tensor of shape $\textrm{batch\_size} \times (N \times (I-d+1)) \times l$.

After the reshape, it is a 4-D tensor of shape $\textrm{batch\_size} \times N \times (I-d+1) \times l$:

$\textrm{hiddenPA\_1} \in R^{\textrm{batch\_size} \times N \times (I-d+1) \times l}$

Lines 109~114 apply max_pool to hiddenPA_1.

109     pooledPA_1 = tf.nn.max_pool(
110         hiddenPA_1,
111         ksize=[1, 1, (max_len[0] - filter_size + 1), 1],
112         strides=[1, 1, 1, 1],
113         padding='VALID',
114         name="pool")  ##  [batch_size, max_plot_len, 1, self.filter_num]

fig.xx pooledPA_1 is computed by max pooling hiddenPA_1

fig.xx shows that pooledPA_1 is computed by max pooling hiddenPA_1.

$\textrm{pooledPA\_1} \in R^{\textrm{batch\_size} \times N \times 1 \times l}$

Line 115 appends pooledPA_1 to pooled_outputs_PA_1; three such tensors are appended in total, one per filter size.

115     pooled_outputs_PA_1.append(pooledPA_1)

fig.xx pooledPA_1 is appended to pooled_outputs_PA_1

fig.xx shows that pooledPA_1 is appended to pooled_outputs_PA_1 three times, for d=1, d=3, and d=5.

$\textrm{pooled\_outputs\_PA\_1} \in R^{\textrm{batch\_size} \times N \times 1 \times 3l}$

Line 154 derives the paragraph feature based on choice A (h_pool_PA_1).

154 h_pool_PA_1 = tf.transpose(tf.concat(pooled_outputs_PA_1, 3), perm=[0,3,1,2]) ##[batch_size, num_filters_total, max_plot_len, 1]

pooled_outputs_PA_1 is concatenated along its last dimension, giving $3l$ channels.

After tf.transpose, we obtain h_pool_PA_1:

$\textrm{h\_pool\_PA\_1} \in R^{\textrm{batch\_size} \times 3l \times N \times 1}$

Similarly, h_pool_PB_1, h_pool_PC_1, h_pool_PD_1, and h_pool_PE_1 are obtained as the paragraph features based on choices B, C, D, and E.
