# CNN 1

CNN1 is the "first-stage CNN" part of the QACNN model.

Fig.1 Overall data flow of QACNN model

CNN1 projects word-level features into sentence-level features.

The outputs of CNN1 are $\bar{\bar{r}}^{PQ}$ and $\bar{\bar{r}}^{PC}$.

$\bar{\bar{r}}^{PQ}$ contains the paragraph's sentence features based on the query.

$\bar{\bar{r}}^{PC}$ contains the paragraph's sentence features based on the choice.

$\bar{\bar{r}}^{PQ} = [\bar{r}^{PQ}_1, \cdots, \bar{r}^{PQ}_n, \cdots, \bar{r}^{PQ}_N]$ and $\bar{\bar{r}}^{PC} = [\bar{r}^{PC}_1, \cdots, \bar{r}^{PC}_n, \cdots, \bar{r}^{PC}_N]$

$n$ is the sentence index, and $N$ is the total number of sentences in the paragraph.


CNN1 has two parts. The first part generates the attention map $\bar{a}_n$; the second part generates the paragraph's sentence features based on the query, $\bar{\bar{r}}^{PQ}$, and the paragraph's sentence features based on the choices, $\bar{\bar{r}}^{PC}$.

#CNN1# Part 1: Generate Attention Map

The first part of CNN1 computes the attention map $\bar{a}_n$ from $PQ$.

Fig.2 # CNN1 # attention part, from $PQ$ to $P_nQ$ to the word-level attention map $\bar{a}_n$.

Fig.2 shows the derivation of the word-level attention map $\bar{a}_n$ from $PQ$.

$P_nQ \in R^{J \times I}$ is the $n$-th sentence slice of $PQ$, and a CNN is applied to $P_nQ$ with the convolution kernel $\bar{\bar{\bar{W}}}_1^A \in R^{J \times l \times d}$. $d$ and $l$ denote the kernel width and the number of kernels, respectively.

In the convolution kernel $\bar{\bar{\bar{W}}}_1^A$, the superscript $A$ denotes the attention map, and the subscript $1$ denotes the first-stage CNN.

The generated feature $\bar{\bar{q}}_n^A \in R^{l \times (I-d+1)}$ is expressed as

$\bar{\bar{q}}_n^A = \textrm{sigmoid}(\bar{\bar{\bar{W}}}_1^A \otimes P_nQ + \bar{b}_1^A)$

where $\bar{b}_1^A \in R^l$ is the bias.

The query's syntactic structure, including its location information within the paragraph, can be learned with $\bar{\bar{\bar{W}}}_1^A$.

The sigmoid function is chosen as the activation function in this stage.

Max pooling is performed on $\bar{\bar{q}}_n^A$ to take the largest element across kernels at each location, generating the word-level attention map $\bar{a}_n \in R^{I-d+1}$ for each sentence.

$\bar{a}_n = \textrm{max pool}(\bar{\bar{q}}_n^A)$, where $\bar{\bar{q}}_n^A \in R^{l \times (I-d+1)}$ and $\bar{a}_n \in R^{I-d+1}$.
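As a sanity check on these two equations, here is a minimal NumPy sketch of the attention-map computation for one sentence; all sizes and the random inputs are placeholder assumptions for illustration, not values from the actual model.

```python
import numpy as np

# Placeholder sizes: I words per paragraph sentence, J words per query,
# kernel width d, and l kernels (not the values used in the real model).
I, J, d, l = 20, 10, 3, 4
PnQ = np.random.rand(J, I)        # similarity map between sentence n and the query
W_A = np.random.randn(l, J, d)    # l kernels, each covering the full query dimension
b_A = np.zeros(l)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "VALID" convolution along the word dimension: output length is I - d + 1
q_A = np.zeros((l, I - d + 1))
for k in range(l):
    for i in range(I - d + 1):
        q_A[k, i] = np.sum(W_A[k] * PnQ[:, i:i + d]) + b_A[k]
q_A = sigmoid(q_A)                # q_A has shape (l, I - d + 1)

# Max over kernels at each word position gives the word-level attention map
a_n = q_A.max(axis=0)             # shape (I - d + 1,)
print(q_A.shape, a_n.shape)
```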

Fig.3 # CNN1 # attention part. Each paragraph sentence is used to generate its corresponding word-level attention map $\bar{a}_n$, $n = 1, 2, \cdots, N$.

Fig.3 shows the architecture of the attention-map part of the first-stage CNN; all sentences in the paragraph are used.

#CNN1# Part 2: Generate paragraph sentence features based on query and choices

The second part of CNN1 computes the paragraph's sentence features based on the query, $\bar{\bar{r}}^{PQ}$, and the paragraph's sentence features based on the choice, $\bar{\bar{r}}^{PC}$.

Fig.4 Derivation of the paragraph's sentence feature based on the query, $\bar{r}_n^{PQ}$.

Fig.4 shows the derivation of the paragraph's sentence feature based on the query, $\bar{r}_n^{PQ}$.

Fig.5 All of the paragraph's sentence features based on the query, $\bar{\bar{r}}^{PQ} = [\bar{r}_1^{PQ}, \cdots, \bar{r}_n^{PQ}, \cdots, \bar{r}_N^{PQ}]$.

Fig.5 shows all of the paragraph's sentence features based on the query, $\bar{\bar{r}}^{PQ} = [\bar{r}_1^{PQ}, \cdots, \bar{r}_n^{PQ}, \cdots, \bar{r}_N^{PQ}]$.

Fig.6 Derivation of the paragraph's sentence feature based on the choice, $\bar{r}_n^{PC}$.

Fig.6 shows the derivation of the paragraph's sentence feature based on the choice, $\bar{r}_n^{PC}$.

Fig.7 All of the paragraph's sentence features based on the choice, $\bar{\bar{r}}^{PC} = [\bar{r}_1^{PC}, \cdots, \bar{r}_n^{PC}, \cdots, \bar{r}_N^{PC}]$.

Fig.7 shows all of the paragraph's sentence features based on the choice, $\bar{\bar{r}}^{PC} = [\bar{r}_1^{PC}, \cdots, \bar{r}_n^{PC}, \cdots, \bar{r}_N^{PC}]$.

$\bar{\bar{r}}^{PQ}$ and $\bar{\bar{r}}^{PC}$ are the two output representations of the first-stage CNN architecture.

Kernels $\bar{\bar{\bar{W}}}_1^R \in R^{l \times K \times d}$ and bias $\bar{b}_1^R \in R^l$ are applied to $P_nQ$ to obtain the query-based sentence features.

$\bar{\bar{q}}_n^R = \textrm{ReLU}(\bar{\bar{\bar{W}}}_1^R \otimes P_nQ + \bar{b}_1^R) \in R^{l \times (I-d+1)}$

The same kernels $\bar{\bar{\bar{W}}}_1^R \in R^{l \times K \times d}$ and bias $\bar{b}_1^R \in R^l$ are applied to $P_nC$ to aggregate the pattern of location relationships and obtain the choice-based sentence features. The superscript $R$ denotes the output representation.

$\bar{\bar{c}}_n^R = \textrm{ReLU}(\bar{\bar{\bar{W}}}_1^R \otimes P_nC + \bar{b}_1^R) = \left[ \begin{matrix} (\bar{c}_{n,1}^R)^t \\ \vdots \\ (\bar{c}_{n,l}^R)^t \end{matrix} \right] \in R^{l \times (I-d+1)}$

$\bar{\bar{c}}_n^R$ is then multiplied by the word-level attention map $\bar{a}_n \in R^{I-d+1}$ along its first dimension:

$\bar{\bar{c}}_n^R = \left[ \begin{matrix} (\bar{c}_{n,1}^R)^t \odot \bar{a}_n \\ \vdots \\ (\bar{c}_{n,l}^R)^t \odot \bar{a}_n \end{matrix} \right]$

Max pooling is applied to $\bar{\bar{q}}_n^R \in R^{l \times (I-d+1)}$ and $\bar{\bar{c}}_n^R \in R^{l \times (I-d+1)}$ horizontally with kernel size $(I-d+1)$ to obtain the query-based sentence features $\bar{r}_n^{PQ} \in R^l$ and the choice-based sentence features $\bar{r}_n^{PC} \in R^l$.

$\bar{r}_n^{PQ} = \textrm{max pool}(\bar{\bar{q}}_n^R)$

$\bar{r}_n^{PC} = \textrm{max pool}(\bar{\bar{c}}_n^R)$
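Below is a minimal NumPy sketch of this second-part computation for one sentence, under the simplifying assumption that the query and choice are padded to the same length $K$; the sizes and random inputs are illustrative only.

```python
import numpy as np

# Placeholder sizes; K is the padded query/choice length (assumed equal here).
I, K, d, l = 20, 10, 3, 4
PnQ = np.random.rand(K, I)        # sentence-query similarity map
PnC = np.random.rand(K, I)        # sentence-choice similarity map
W_R = np.random.randn(l, K, d)    # shared representation kernels
b_R = np.zeros(l)
a_n = np.random.rand(I - d + 1)   # word-level attention map from Part 1

def conv_valid(X, W, b):
    """VALID convolution along the word dimension; returns shape (l, I-d+1)."""
    out = np.zeros((W.shape[0], X.shape[1] - W.shape[2] + 1))
    for k in range(W.shape[0]):
        for i in range(out.shape[1]):
            out[k, i] = np.sum(W[k] * X[:, i:i + W.shape[2]]) + b[k]
    return out

q_R = np.maximum(conv_valid(PnQ, W_R, b_R), 0)        # ReLU
c_R = np.maximum(conv_valid(PnC, W_R, b_R), 0) * a_n  # attention applied row-wise

r_PQ_n = q_R.max(axis=1)          # query-based sentence feature, shape (l,)
r_PC_n = c_R.max(axis=1)          # choice-based sentence feature, shape (l,)
print(r_PQ_n.shape, r_PC_n.shape)
```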

#CNN1# Overall flowchart

Fig.1 # CNN1 # flow in the __init__() function

Fig.1 shows the # CNN1 # flow in the __init__() function.

WQ1 is $\bar{\bar{\bar{W}}}_1^A$, the convolution kernel for the attention map $\bar{a}_n$ in CNN1.

bQ1 is $\bar{b}_1^A$, the bias for the attention map $\bar{a}_n$ in CNN1.

W1 is $\bar{\bar{\bar{W}}}_1^R$, the convolution kernel for the paragraph's sentence features based on the query, $\bar{\bar{r}}^{PQ}$, and the paragraph's sentence features based on the choice, $\bar{\bar{r}}^{PC}$.

b1 is $\bar{b}_1^R$, the bias for the paragraph's sentence features based on the query, $\bar{\bar{r}}^{PQ}$, and the paragraph's sentence features based on the choice, $\bar{\bar{r}}^{PC}$.

# CNN1 # lines 1~15

1   ### CNN 1 ###
2   pooled_outputs_PQ_1 = []
3   pooled_outputs_PA_1 = []
4   pooled_outputs_PB_1 = []
5   pooled_outputs_PC_1 = []
6   pooled_outputs_PD_1 = []
7   pooled_outputs_PE_1 = []
8   for i, filter_size in enumerate(self.filter_size):
9       with tf.name_scope("conv1-maxpool-%s" % (filter_size)):
10          filter_shape = [filter_size,max_len[2], 1, self.filter_num]
11          W1 = tf.get_variable(name="W1-%s"%(filter_size), shape=filter_shape,initializer=tf.contrib.layers.xavier_initializer())
12          b1 = tf.Variable(tf.constant(0.1, shape=[self.filter_num]), name="b1")   
13
14          WQ1 = tf.get_variable(name="WQ1-%s"%(filter_size), shape=filter_shape,initializer=tf.contrib.layers.xavier_initializer())
15          bQ1 = tf.Variable(tf.constant(0.1, shape=[self.filter_num]), name="bQ1")

Lines 2~7 define the variables pooled_outputs_PQ_1, pooled_outputs_PA_1, ..., pooled_outputs_PE_1.

2   pooled_outputs_PQ_1 = []
3   pooled_outputs_PA_1 = []
4   pooled_outputs_PB_1 = []
5   pooled_outputs_PC_1 = []
6   pooled_outputs_PD_1 = []
7   pooled_outputs_PE_1 = []

pooled_outputs_PQ_1, pooled_outputs_PA_1, ..., pooled_outputs_PE_1 are initialized as empty lists.

The suffix "_1" refers to CNN1.

pooled_outputs_PQ_1 holds $\bar{r}_n^{PQ}$, the paragraph's sentence features based on the query.

pooled_outputs_PA_1, ..., pooled_outputs_PE_1 are merged into $\bar{r}_n^{PC}$, the paragraph's sentence features based on the choices.

$\bar{r}_n^{PC} = [\textrm{pooled\_outputs\_PA\_1}, \textrm{pooled\_outputs\_PB\_1}, \cdots, \textrm{pooled\_outputs\_PE\_1}]$

Lines 8~15 define W1, b1, WQ1, and bQ1.

8   for i, filter_size in enumerate(self.filter_size):
9       with tf.name_scope("conv1-maxpool-%s" % (filter_size)):
10          filter_shape = [filter_size,max_len[2], 1, self.filter_num]
11          W1 = tf.get_variable(name="W1-%s"%(filter_size), shape=filter_shape, initializer=tf.contrib.layers.xavier_initializer())
12          b1 = tf.Variable(tf.constant(0.1, shape=[self.filter_num]), name="b1")   
13
14          WQ1 = tf.get_variable(name="WQ1-%s"%(filter_size), shape=filter_shape, initializer=tf.contrib.layers.xavier_initializer())
15          bQ1 = tf.Variable(tf.constant(0.1, shape=[self.filter_num]), name="bQ1")

Line 8: self.filter_size ($d$) is the kernel width; in the code it is set to [1, 3, 5], so the for loop runs three times.

In the first iteration $d = 1$, in the second $d = 3$, and in the third $d = 5$.

filter_shape is the shape of the filter:

$\textrm{filter\_shape} = [\textrm{filter\_size}, \textrm{max\_len}[2], 1, \textrm{self.filter\_num}] = [d, K, 1, l]$

$K$ is the total number of words in the choice sentence (set to 50 in the code).

self.filter_num ($l$) is the number of kernels in CNN1 (set to 128 in the code).
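As a quick illustration of the loop, a small sketch of the filter shapes it would produce, assuming the $K = 50$ and $l = 128$ values quoted above:

```python
# Filter shapes produced by the loop, assuming K = 50 words per choice
# and l = 128 kernels as stated above (illustrative, not read from the code).
K, filter_num = 50, 128
for d in [1, 3, 5]:
    filter_shape = [d, K, 1, filter_num]
    print(filter_shape)   # [1, 50, 1, 128], [3, 50, 1, 128], [5, 50, 1, 128]
```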

Fig.2 The shapes of W1 and WQ1 are specified by filter_shape.

Fig.2 shows lines 11 and 14 of the code, which define W1 and WQ1.

11          W1 = tf.get_variable(name="W1-%s"%(filter_size), shape=filter_shape, initializer=tf.contrib.layers.xavier_initializer())

Line 11 gives W1 the shape filter_shape; its name will be W1-d, where d is the kernel width. W1 is initialized with tf.contrib.layers.xavier_initializer().

WQ1 is $\bar{\bar{\bar{W}}}_1^A$, the convolution kernel applied to $P_nQ$.

bQ1 is $\bar{b}_1^A$, the bias applied to $P_nQ$.

W1 is $\bar{\bar{\bar{W}}}_1^R$, the convolution kernel for generating the paragraph's sentence features based on the query and the choice.

b1 is $\bar{b}_1^R$, the bias for generating the paragraph's sentence features based on the query and the choice.


#CNN1# Code lines 16-158

16            hiddenPQ_1 = []
17            hiddenPA_1 = []
18            hiddenPB_1 = []
19            hiddenPC_1 = []
20            hiddenPD_1 = []
21            hiddenPE_1 = []
22        for sentence_ind in range(len(PQAttention)):
23            convPQ_attention = tf.nn.conv2d(
24                PQAttention[sentence_ind],
25                WQ1,
26                strides=[1, 1, 1, 1],
27                padding="VALID",
28                name="conv")## [batch,wordNumberP- filter_size + 1, 1,self.filter_num]
29            convPQ_1 = tf.nn.conv2d(
30                PQAttention[sentence_ind],
31                W1,
32                strides=[1, 1, 1, 1],
33                padding="VALID",
34                name="conv")## [batch,wordNumberP- filter_size + 1, 1,self.filter_num]
35            convPA_1 = tf.nn.conv2d(
36                PAAttention[sentence_ind],
37                W1,
38                strides=[1, 1, 1, 1],
39                padding="VALID",
40                name="conv")## [batch,wordNumberP- filter_size + 1, 1,self.filter_num]
41            convPB_1 = tf.nn.conv2d(
42                PBAttention[sentence_ind],
43                W1,
44                strides=[1, 1, 1, 1],
45                padding="VALID",
46                name="conv")## [batch,wordNumberP- filter_size + 1, 1,self.filter_num]
47            convPC_1 = tf.nn.conv2d(
48                PCAttention[sentence_ind],
49                W1,
50                strides=[1, 1, 1, 1],
51                padding="VALID",
52                name="conv")## [batch,wordNumberP- filter_size + 1, 1,self.filter_num]
53            convPD_1 = tf.nn.conv2d(
54                PDAttention[sentence_ind],
55                W1,
56                strides=[1, 1, 1, 1],
57                padding="VALID",
58                name="conv")## [batch,wordNumberP- filter_size + 1, 1,self.filter_num]
59            convPE_1 = tf.nn.conv2d(
60                PEAttention[sentence_ind],
61                W1,
62                strides=[1, 1, 1, 1],
63                padding="VALID",
64                name="conv")## [batch,wordNumberP- filter_size + 1, 1,self.filter_num]
65    
66            wPQ_1 = tf.transpose(tf.sigmoid(tf.nn.bias_add(convPQ_attention,bQ1)),[0,3,2,1])
67            wPQ_1 = tf.nn.dropout(wPQ_1,self.dropoutRate)
68            wPQ_1 = tf.nn.max_pool(
69                wPQ_1,
70                ksize=[1,self.filter_num, 1,1],
71                strides=[1, 1, 1, 1],
72                padding='VALID',
73                name="pool_pq")  ##  [batch_size, 1, 1, wordNumberP - filter_size + 1]
74            wPQ_1 = tf.transpose(tf.tile(wPQ_1,[1,self.filter_num,1,1]),[0,3,2,1])
75            onesentence_hiddenPQ_1 = tf.nn.dropout(tf.nn.relu(tf.nn.bias_add(convPQ_1, b1), name="relu"),self.dropoutRate)
76            hiddenPQ_1.append(onesentence_hiddenPQ_1)
77            onesentence_hiddenPA_1 = tf.nn.dropout(tf.nn.relu(tf.nn.bias_add(convPA_1, b1), name="relu"),self.dropoutRate)*wPQ_1
78            hiddenPA_1.append(onesentence_hiddenPA_1)
79            onesentence_hiddenPB_1 = tf.nn.dropout(tf.nn.relu(tf.nn.bias_add(convPB_1, b1), name="relu"),self.dropoutRate)*wPQ_1
80            hiddenPB_1.append(onesentence_hiddenPB_1)
81            onesentence_hiddenPC_1 = tf.nn.dropout(tf.nn.relu(tf.nn.bias_add(convPC_1, b1), name="relu"),self.dropoutRate)*wPQ_1
82            hiddenPC_1.append(onesentence_hiddenPC_1)
83            onesentence_hiddenPD_1 = tf.nn.dropout(tf.nn.relu(tf.nn.bias_add(convPD_1, b1), name="relu"),self.dropoutRate)*wPQ_1
84            hiddenPD_1.append(onesentence_hiddenPD_1)
85            onesentence_hiddenPE_1 = tf.nn.dropout(tf.nn.relu(tf.nn.bias_add(convPE_1, b1), name="relu"),self.dropoutRate)*wPQ_1
86            hiddenPE_1.append(onesentence_hiddenPE_1)
87      hiddenPQ_1 = tf.concat(hiddenPQ_1, 1) ## [batch,max_plot_len*(wordNumberP- filter_size + 1), 1,self.filter_num]                    
88      hiddenPA_1 = tf.concat(hiddenPA_1, 1)
89      hiddenPB_1 = tf.concat(hiddenPB_1, 1)
90      hiddenPC_1 = tf.concat(hiddenPC_1, 1)
91      hiddenPD_1 = tf.concat(hiddenPD_1, 1)
92      hiddenPE_1 = tf.concat(hiddenPE_1, 1)
93
94      hiddenPQ_1 = tf.reshape(tf.squeeze(hiddenPQ_1), [batch_size, max_plot_len, (max_len[0] - filter_size + 1), self.filter_num]) ## [batch,max_plot_len,(wordNumberP- filter_size + 1),self.filter_num]     
95      hiddenPA_1 = tf.reshape(tf.squeeze(hiddenPA_1), [batch_size, max_plot_len, (max_len[0] - filter_size + 1), self.filter_num])
96      hiddenPB_1 = tf.reshape(tf.squeeze(hiddenPB_1), [batch_size, max_plot_len, (max_len[0] - filter_size + 1), self.filter_num])
97      hiddenPC_1 = tf.reshape(tf.squeeze(hiddenPC_1), [batch_size, max_plot_len, (max_len[0] - filter_size + 1), self.filter_num])
98      hiddenPD_1 = tf.reshape(tf.squeeze(hiddenPD_1), [batch_size, max_plot_len, (max_len[0] - filter_size + 1), self.filter_num])
99      hiddenPE_1 = tf.reshape(tf.squeeze(hiddenPE_1), [batch_size, max_plot_len, (max_len[0] - filter_size + 1), self.filter_num])
100
101     pooledPQ_1 = tf.nn.max_pool(
102         hiddenPQ_1,
103         ksize=[1, 1, (max_len[0] - filter_size + 1), 1],
104         strides=[1, 1, 1, 1],
105         padding='VALID',
106         name="pool")  ##  [batch_size, max_plot_len, 1, self.filter_num]
107     pooled_outputs_PQ_1.append(pooledPQ_1)
108
109     pooledPA_1 = tf.nn.max_pool(
110         hiddenPA_1,
111         ksize=[1, 1, (max_len[0] - filter_size + 1), 1],
112         strides=[1, 1, 1, 1],
113         padding='VALID',
114         name="pool")  ##  [batch_size, max_plot_len, 1, self.filter_num]
115     pooled_outputs_PA_1.append(pooledPA_1)
116
117     pooledPB_1 = tf.nn.max_pool(
118         hiddenPB_1,
119         ksize=[1, 1, (max_len[0] - filter_size + 1), 1],
120         strides=[1, 1, 1, 1],
121         padding='VALID',
122         name="pool")  ##[batch_size, max_plot_len, 1, self.filter_num]
123
124     pooled_outputs_PB_1.append(pooledPB_1)
125
126     pooledPC_1 = tf.nn.max_pool(
127         hiddenPC_1,
128         ksize=[1, 1, (max_len[0] - filter_size + 1), 1],
129         strides=[1, 1, 1, 1],
130         padding='VALID',
131         name="pool")  ##[batch_size, max_plot_len, 1, self.filter_num]
132
133     pooled_outputs_PC_1.append(pooledPC_1)
134
135     pooledPD_1 = tf.nn.max_pool(
136         hiddenPD_1,
137         ksize=[1, 1, (max_len[0] - filter_size + 1), 1],
138         strides=[1, 1, 1, 1],
139         padding='VALID',
140         name="pool")  ##[batch_size, max_plot_len, 1, self.filter_num]
141
142     pooled_outputs_PD_1.append(pooledPD_1)
143
144     pooledPE_1 = tf.nn.max_pool(
145         hiddenPE_1,
146         ksize=[1, 1, (max_len[0] - filter_size + 1), 1],
147         strides=[1, 1, 1, 1],
148         padding='VALID',
149         name="pool")  ##[batch_size, max_plot_len, 1, self.filter_num]
150
151     pooled_outputs_PE_1.append(pooledPE_1)
152
153 h_pool_PQ_1 = tf.transpose(tf.concat(pooled_outputs_PQ_1, 3), perm=[0,3,1,2]) ##[batch_size, num_filters_total, max_plot_len, 1]
154 h_pool_PA_1 = tf.transpose(tf.concat(pooled_outputs_PA_1, 3), perm=[0,3,1,2]) ##[batch_size, num_filters_total, max_plot_len, 1]
155 h_pool_PB_1 = tf.transpose(tf.concat(pooled_outputs_PB_1, 3), perm=[0,3,1,2]) ##[batch_size, num_filters_total, max_plot_len, 1]
156 h_pool_PC_1 = tf.transpose(tf.concat(pooled_outputs_PC_1, 3), perm=[0,3,1,2]) ##[batch_size, num_filters_total, max_plot_len, 1]
157 h_pool_PD_1 = tf.transpose(tf.concat(pooled_outputs_PD_1, 3), perm=[0,3,1,2]) ##[batch_size, num_filters_total, max_plot_len, 1]
158 h_pool_PE_1 = tf.transpose(tf.concat(pooled_outputs_PE_1, 3), perm=[0,3,1,2]) ##[batch_size, num_filters_total, max_plot_len, 1]

Lines 16~21 define the variables hiddenPQ_1, hiddenPA_1, ..., hiddenPE_1.

16            hiddenPQ_1 = []
17            hiddenPA_1 = []
18            hiddenPB_1 = []
19            hiddenPC_1 = []
20            hiddenPD_1 = []
21            hiddenPE_1 = []

hiddenPQ_1, hiddenPA_1, ..., hiddenPE_1 are initialized as empty lists.

Line 22: the for loop runs over 0, 1, ..., (N-1).

22        for sentence_ind in range(len(PQAttention)):

Line 22 loops over each sentence in the paragraph, from 0 to N-1.

$\textrm{PQAttention} \in R^{N \times I \times J}$, so len(PQAttention) = N.

range(len(PQAttention)) produces the list [0, 1, ..., N-1].
sentence_ind corresponds to $n$.


CNN1 Part 1: Generate Attention Map (wPQ_1) of the First Stage

fig.xx Flow of computing the attention map (wPQ_1)

fig.xx computes $\bar{a}_n$:

$\bar{a}_n = \textrm{max pool}(\bar{\bar{q}}_n^A) \in R^{I-d+1}$

$\bar{\bar{q}}_n^A = \textrm{sigmoid}(\bar{\bar{\bar{W}}}_1^A \otimes P_nQ + \bar{b}_1^A) \in R^{l \times (I-d+1)}$

Here wPQ_1 is $\bar{a}_n$ with an additional batch-size dimension.

Lines 23~28 compute the convolution of $P_nQ$ with the kernel WQ1.

23            convPQ_attention = tf.nn.conv2d(
24                PQAttention[sentence_ind],
25                WQ1,
26                strides=[1, 1, 1, 1],
27                padding="VALID",
28                name="conv")## [batch,wordNumberP- filter_size + 1, 1,self.filter_num]

Lines 23~28 compute

$\textrm{convPQ\_attention} = P_nQ \otimes \bar{\bar{\bar{W}}}^A_1 \in R^{\textrm{batch\_size} \times (I-d+1) \times 1 \times l}$
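As a rough sanity check on the resulting tensor shape, here is a small shape-bookkeeping sketch with placeholder sizes (batch, I, J, and l are assumptions for illustration, not values from the code):

```python
# Shape bookkeeping for one sentence slice through tf.nn.conv2d with VALID
# padding and stride 1; batch, I, J and l are placeholder sizes.
batch, I, J, l = 2, 20, 10, 4
for d in [1, 3, 5]:
    in_shape = (batch, I, J, 1)            # PQAttention[sentence_ind]
    out_shape = (batch, I - d + 1, 1, l)   # convPQ_attention
    print(d, in_shape, "->", out_shape)
```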

Fig.xx Convolution of PQAttention[sentence_ind] and WQ1

Fig.xx shows the schematic of the convolution of $P_nQ$ and $\bar{\bar{\bar{W}}}_1^A$.

"PQAttention[sentence_ind]" refers to $P_nQ$.

"WQ1" refers to $\bar{\bar{\bar{W}}}_1^A$.

The strides of the convolution are [1, 1, 1, 1].

Line 66 adds the bias to convPQ_attention, applies the sigmoid, and transposes the result.

66            wPQ_1 = tf.transpose(tf.sigmoid(tf.nn.bias_add(convPQ_attention,bQ1)),[0,3,2,1])

fig.xx Schematic of adding the bias and applying the sigmoid to convPQ_attention

fig.xx shows the schematic corresponding to line 66.

Line 66 computes $\bar{\bar{q}}_n^A = \textrm{sigmoid}(P_nQ \otimes \bar{\bar{\bar{W}}}^A_1 + \bar{b}_1^A)$.

"wPQ_1" is $\bar{\bar{q}}_n^A$, and "bQ1" is $\bar{b}_1^A$.

Before the transpose, the shape of $\bar{\bar{q}}_n^A$ is $\textrm{batch\_size} \times (I-d+1) \times 1 \times l$;
after the transpose, it is $\textrm{batch\_size} \times l \times 1 \times (I-d+1)$.

Line 67 applies dropout to wPQ_1.

67            wPQ_1 = tf.nn.dropout(wPQ_1,self.dropoutRate)

Dropout randomly removes a fraction of the connections between layers. Note that in TensorFlow 1.x the second argument of tf.nn.dropout is the keep probability; self.dropoutRate is set to 0.8 in the code, so about 20% of the activations are dropped.
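For reference, a small NumPy sketch of this keep-probability (inverted) dropout behavior, with 0.8 assumed as the keep rate:

```python
import numpy as np

def dropout(x, keep_prob, rng=np.random):
    # Keep each element with probability keep_prob and rescale by 1/keep_prob,
    # which matches the behavior of tf.nn.dropout in TensorFlow 1.x.
    mask = rng.rand(*x.shape) < keep_prob
    return x * mask / keep_prob

x = np.ones((2, 5))
print(dropout(x, 0.8))   # roughly 20% of the entries become 0, the rest become 1.25
```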

Lines 68~73 apply max pooling to wPQ_1.

68            wPQ_1 = tf.nn.max_pool(
69                wPQ_1,
70                ksize=[1,self.filter_num, 1,1],
71                strides=[1, 1, 1, 1],
72                padding='VALID',
73                name="pool_pq")  ##  [batch_size, 1, 1, wordNumberP - filter_size + 1]


fig.xx Schematic of max_pool on $\bar{\bar{q}}_n^A$

fig.xx shows the schematic of max_pool on $\bar{\bar{q}}_n^A$.

wPQ_1 is $\bar{a}_n$, the word-level attention map.

ksize is the size of the window for each dimension of the input tensor.

$\bar{a}_n = \textrm{wPQ\_1} \in R^{\textrm{batch\_size} \times 1 \times 1 \times (I-d+1)}$
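A small NumPy sketch of this pooling step, taking the maximum across the kernel channels at every word position; the sizes are placeholders:

```python
import numpy as np

batch, l, out_len = 2, 4, 18                # placeholder batch size, kernel count, I - d + 1
wPQ = np.random.rand(batch, l, 1, out_len)  # layout after the transpose on line 66
a_n = wPQ.max(axis=1, keepdims=True)        # max over the kernel axis, like ksize=[1, filter_num, 1, 1]
print(a_n.shape)                            # (2, 1, 1, 18)
```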

Line 74 tiles and transposes wPQ_1.

74            wPQ_1 = tf.transpose(tf.tile(wPQ_1,[1,self.filter_num,1,1]),[0,3,2,1])

fig.xx Schematic of tf.tile($\bar{a}_n$, [1, l, 1, 1])

fig.xx The shape of $\bar{a}_n$ is $\textrm{batch\_size} \times 1 \times 1 \times (I-d+1)$; after tiling by $[1, l, 1, 1]$, the shape becomes $\textrm{batch\_size} \times l \times 1 \times (I-d+1)$.

After the transpose, the shape is $\textrm{batch\_size} \times (I-d+1) \times 1 \times l$.
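Continuing with placeholder sizes, a small NumPy sketch of the tile-then-transpose step on line 74:

```python
import numpy as np

batch, l, out_len = 2, 4, 18
a_n = np.random.rand(batch, 1, 1, out_len)   # pooled attention map from lines 68-73
tiled = np.tile(a_n, (1, l, 1, 1))           # like tf.tile(wPQ_1, [1, self.filter_num, 1, 1])
wPQ_1 = np.transpose(tiled, (0, 3, 2, 1))    # back to (batch, I-d+1, 1, l)
print(tiled.shape, wPQ_1.shape)              # (2, 4, 1, 18) (2, 18, 1, 4)
```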


CNN1 Part 2: Generate paragraph sentence features based on query (h_pool_PQ_1)

fig.xx Flow of computing the paragraph sentence features based on the query (h_pool_PQ_1)

fig.xx computes $\bar{\bar{r}}^{PQ}$:

$\bar{\bar{r}}^{PQ} = [\bar{r}_1^{PQ}, \bar{r}_2^{PQ}, \cdots, \bar{r}_n^{PQ}, \cdots, \bar{r}_N^{PQ}]$

$\bar{r}_n^{PQ} = \textrm{max pool}(\bar{\bar{q}}_n^R) \in R^l$

$\bar{\bar{q}}_n^R = \textrm{ReLU}(\bar{\bar{\bar{W}}}_1^R \otimes P_nQ + \bar{b}_1^R) \in R^{l \times (I-d+1)}$

Here h_pool_PQ_1 is $\bar{\bar{r}}^{PQ}$ with an additional batch-size dimension.

Lines 29~34 compute the convolution of $P_nQ$ with the kernel W1, producing convPQ_1.

29            convPQ_1 = tf.nn.conv2d(
30                PQAttention[sentence_ind],
31                W1,
32                strides=[1, 1, 1, 1],
33                padding="VALID",
34                name="conv")## [batch,wordNumberP- filter_size + 1, 1,self.filter_num]

Fig.xx Convolution of PQAttention[sentence_ind] and W1

Fig.xx shows the schematic of the convolution of PQAttention[sentence_ind] and W1.

PQAttention[sentence_ind] is $P_nQ$.

W1 is $\bar{\bar{\bar{W}}}^R_1$.

convPQ_1 is $P_nQ \otimes \bar{\bar{\bar{W}}}^R_1 \in R^{\textrm{batch\_size} \times (I-d+1) \times 1 \times l}$.

The strides of the convolution are [1, 1, 1, 1].

Line 75 computes onesentence_hiddenPQ_1.

75            onesentence_hiddenPQ_1 = tf.nn.dropout(tf.nn.relu(tf.nn.bias_add(convPQ_1, b1), name="relu"),self.dropoutRate)

Line 75 computes

$\bar{\bar{q}}_n^R = \textrm{dropout}(\textrm{ReLU}(P_nQ \otimes \bar{\bar{\bar{W}}}_1^R + \bar{b}_1^R)) \in R^{l \times (I-d+1)}$

fig.xx Schematic of computing $\bar{\bar{q}}_n^R$

fig.xx shows the schematic of computing $\bar{\bar{q}}_n^R$.

b1 is $\bar{b}_1^R$.

onesentence_hiddenPQ_1 is $\bar{\bar{q}}_n^R$.

$\bar{\bar{q}}_n^R \in R^{\textrm{batch\_size} \times (I-d+1) \times 1 \times l}$

Dropout randomly removes a fraction of the connections between layers; as noted above, the second argument of tf.nn.dropout is the keep probability, and self.dropoutRate is set to 0.8 in the code.

Line 76 appends onesentence_hiddenPQ_1 to hiddenPQ_1; after the loop over sentences, hiddenPQ_1 contains N such tensors.

76            hiddenPQ_1.append(onesentence_hiddenPQ_1)

fig.xx Schematic of appending onesentence_hiddenPQ_1 to hiddenPQ_1

fig.xx shows onesentence_hiddenPQ_1 being appended to hiddenPQ_1.

$\textrm{hiddenPQ\_1} = [\textrm{onesentence\_hiddenPQ}_1\_1, \cdots, \textrm{onesentence\_hiddenPQ}_n\_1, \cdots, \textrm{onesentence\_hiddenPQ}_N\_1]$

Line 87 concatenates (tf.concat) hiddenPQ_1.

87      hiddenPQ_1 = tf.concat(hiddenPQ_1, 1) ## [batch,max_plot_len*(wordNumberP- filter_size + 1), 1,self.filter_num]

fig.xx Schematic of tf.concat on hiddenPQ_1

After the concat, hiddenPQ_1 is a 4-D tensor:

$\textrm{hiddenPQ\_1} \in R^{\textrm{batch\_size} \times (N \times (I-d+1)) \times 1 \times l}$

Line 94 squeezes and reshapes hiddenPQ_1.

94       hiddenPQ_1 = tf.reshape(tf.squeeze(hiddenPQ_1), [batch_size, max_plot_len, (max_len[0] - filter_size + 1), self.filter_num]) ## [batch,max_plot_len,(wordNumberP- filter_size + 1),self.filter_num]

tf.squeeze removes dimensions of size 1 from the shape of a tensor.

So tf.squeeze(hiddenPQ_1) is a 3-D tensor of shape $\textrm{batch\_size} \times (N \times (I-d+1)) \times l$.

After the reshape, it is a 4-D tensor of shape $\textrm{batch\_size} \times N \times (I-d+1) \times l$:

$\textrm{hiddenPQ\_1} \in R^{\textrm{batch\_size} \times N \times (I-d+1) \times l}$
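A small NumPy sketch of this squeeze-and-reshape step with placeholder sizes:

```python
import numpy as np

batch, N, out_len, l = 2, 3, 18, 4            # placeholder sizes
h = np.zeros((batch, N * out_len, 1, l))      # hiddenPQ_1 after tf.concat(..., 1)
h = np.squeeze(h)                             # drop the size-1 axis -> (batch, N*(I-d+1), l)
h = np.reshape(h, (batch, N, out_len, l))     # -> (batch, N, I-d+1, l)
print(h.shape)                                # (2, 3, 18, 4)
```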

Lines 101~106 apply max_pool to hiddenPQ_1.

101     pooledPQ_1 = tf.nn.max_pool(
102         hiddenPQ_1,
103         ksize=[1, 1, (max_len[0] - filter_size + 1), 1],
104         strides=[1, 1, 1, 1],
105         padding='VALID',
106         name="pool")  ##  [batch_size, max_plot_len, 1, self.filter_num]

fig.xx pooledPQ_1 is computed by max pooling hiddenPQ_1

fig.xx shows that pooledPQ_1 is computed by max pooling hiddenPQ_1.

$\textrm{pooledPQ\_1} \in R^{\textrm{batch\_size} \times N \times 1 \times l}$

Line 107 appends pooledPQ_1 to pooled_outputs_PQ_1; three such tensors are appended in total, one per filter size.

107     pooled_outputs_PQ_1.append(pooledPQ_1)

fig.xx pooledPQ_1 is appended to pooled_outputs_PQ_1

fig.xx shows that pooledPQ_1 is appended to pooled_outputs_PQ_1 three times, for d=1, d=3, and d=5.

$\textrm{pooled\_outputs\_PQ\_1} \in R^{\textrm{batch\_size} \times N \times 1 \times 3l}$

Line 153 derives the paragraph feature based on the query (h_pool_PQ_1).

153 h_pool_PQ_1 = tf.transpose(tf.concat(pooled_outputs_PQ_1, 3), perm=[0,3,1,2]) ##[batch_size, num_filters_total, max_plot_len, 1]

fig.xx Schematic of tf.concat(pooled_outputs_PQ_1, 3)

fig.xx shows pooled_outputs_PQ_1 being concatenated along its last dimension, giving $3l$ channels.

After tf.transpose, we obtain h_pool_PQ_1:

$\textrm{h\_pool\_PQ\_1} = \bar{\bar{r}}^{PQ} \in R^{\textrm{batch\_size} \times 3l \times N \times 1}$
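A small NumPy sketch of this concat-and-transpose step on line 153, assuming the three pooled tensors from d = 1, 3, 5 and placeholder sizes:

```python
import numpy as np

batch, N, l = 2, 3, 4                                     # placeholder sizes
pooled = [np.zeros((batch, N, 1, l)) for _ in range(3)]   # one tensor per kernel width d = 1, 3, 5
h_pool_PQ_1 = np.transpose(np.concatenate(pooled, axis=3), (0, 3, 1, 2))
print(h_pool_PQ_1.shape)                                  # (2, 12, 3, 1) = (batch, 3*l, N, 1)
```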


CNN1 Part 2: Generate paragraph sentence features based on choice (h_pool_PA_1, ..., h_pool_PE_1)


fig.xx Flow of computing the paragraph sentence features based on a choice (using h_pool_PA_1 as an example)

fig.xx computes $\bar{\bar{r}}^{PC}$. There are five of them: $\bar{\bar{r}}^{PC}_A, \bar{\bar{r}}^{PC}_B, \bar{\bar{r}}^{PC}_C, \bar{\bar{r}}^{PC}_D, \bar{\bar{r}}^{PC}_E$.

$\bar{\bar{r}}^{PC} = [\bar{r}_1^{PC}, \bar{r}_2^{PC}, \cdots, \bar{r}_n^{PC}, \cdots, \bar{r}_N^{PC}]$

$\bar{r}_n^{PC} = \textrm{max pool}(\bar{\bar{c}}_n^R) \in R^l$

$\bar{\bar{c}}_n^R = \textrm{ReLU}(\bar{\bar{\bar{W}}}_1^R \otimes P_nC + \bar{b}_1^R) \odot \bar{a}_n = \left[ \begin{matrix} (\bar{c}_{n,1}^R)^t \odot \bar{a}_n \\ \vdots \\ (\bar{c}_{n,l}^R)^t \odot \bar{a}_n \end{matrix} \right] \in R^{l \times (I-d+1)}$

$\bar{\bar{c}}_n^R$ is multiplied by the word-level attention map $\bar{a}_n \in R^{I-d+1}$ along the first dimension.

Here h_pool_PA_1 is $\bar{\bar{r}}^{PC}$ (for choice A) with an additional batch-size dimension.

Lines 35~40 compute the convolution of PAAttention with the kernel W1.

35            convPA_1 = tf.nn.conv2d(
36                PAAttention[sentence_ind],
37                W1,
38                strides=[1, 1, 1, 1],
39                padding="VALID",
40                name="conv")## [batch,wordNumberP- filter_size + 1, 1,self.filter_num]


Fig.xx Convolution of PAAttention[sentence_ind] and W1

Fig.xx shows the schematic of the convolution of PAAttention[sentence_ind] and W1.

$\textrm{convPA\_1} = \textrm{PAAttention}[\textrm{sentence\_ind}] \otimes \bar{\bar{\bar{W}}}^R_1 \in R^{\textrm{batch\_size} \times (I-d+1) \times 1 \times l}$

Similarly, lines 41~64 compute the convolutions convPB_1, convPC_1, convPD_1, and convPE_1.

Line 77 adds the bias to convPA_1, applies ReLU and dropout, and multiplies by wPQ_1.

77            onesentence_hiddenPA_1 = tf.nn.dropout(tf.nn.relu(tf.nn.bias_add(convPA_1, b1), name="relu"),self.dropoutRate)*wPQ_1

fig.xx Schematic of onesentence_hiddenPA_1

onesentence_hiddenPA_1 is obtained by adding the bias to convPA_1, applying ReLU and dropout, and multiplying by $\bar{a}_n$ (wPQ_1):

$\textrm{onesentence\_hiddenPA\_1} = \textrm{dropout}(\textrm{ReLU}(\textrm{convPA\_1} + \textrm{b1})) \odot \bar{a}_n \in R^{\textrm{batch\_size} \times (I-d+1) \times 1 \times l}$

Line 78 appends onesentence_hiddenPA_1 to hiddenPA_1; after the loop over sentences, hiddenPA_1 contains N such tensors.

78            hiddenPA_1.append(onesentence_hiddenPA_1)

$\textrm{hiddenPA\_1} = [\textrm{onesentence\_hiddenPA}_1\_1, \cdots, \textrm{onesentence\_hiddenPA}_n\_1, \cdots, \textrm{onesentence\_hiddenPA}_N\_1]$

Line 88 concatenates (tf.concat) hiddenPA_1.

88      hiddenPA_1 = tf.concat(hiddenPA_1, 1)

$\textrm{hiddenPA\_1} \in R^{\textrm{batch\_size} \times (N \times (I-d+1)) \times 1 \times l}$

Line 95 squeezes and reshapes hiddenPA_1.

95      hiddenPA_1 = tf.reshape(tf.squeeze(hiddenPA_1), [batch_size, max_plot_len, (max_len[0] - filter_size + 1), self.filter_num])

tf.squeeze removes dimensions of size 1 from the shape of a tensor.

So tf.squeeze(hiddenPA_1) is a 3-D tensor of shape $\textrm{batch\_size} \times (N \times (I-d+1)) \times l$.

After the reshape, it is a 4-D tensor of shape $\textrm{batch\_size} \times N \times (I-d+1) \times l$:

$\textrm{hiddenPA\_1} \in R^{\textrm{batch\_size} \times N \times (I-d+1) \times l}$

Lines 109~114 apply max_pool to hiddenPA_1.

109     pooledPA_1 = tf.nn.max_pool(
110         hiddenPA_1,
111         ksize=[1, 1, (max_len[0] - filter_size + 1), 1],
112         strides=[1, 1, 1, 1],
113         padding='VALID',
114         name="pool")  ##  [batch_size, max_plot_len, 1, self.filter_num]

fig.xx pooledPA_1 is computed by max pooling hiddenPA_1

fig.xx shows that pooledPA_1 is computed by max pooling hiddenPA_1.

$\textrm{pooledPA\_1} \in R^{\textrm{batch\_size} \times N \times 1 \times l}$

Line 115 appends pooledPA_1 to pooled_outputs_PA_1; three such tensors are appended in total, one per filter size.

115     pooled_outputs_PA_1.append(pooledPA_1)

fig.xx pooledPA_1 is appended to pooled_outputs_PA_1

fig.xx shows that pooledPA_1 is appended to pooled_outputs_PA_1 three times, for d=1, d=3, and d=5.

$\textrm{pooled\_outputs\_PA\_1} \in R^{\textrm{batch\_size} \times N \times 1 \times 3l}$

Line 154 derives the paragraph feature based on choice A (h_pool_PA_1).

154 h_pool_PA_1 = tf.transpose(tf.concat(pooled_outputs_PA_1, 3), perm=[0,3,1,2]) ##[batch_size, num_filters_total, max_plot_len, 1]

pooled_outputs_PA_1 is concatenated along its last dimension, giving $3l$ channels.

After tf.transpose, we obtain h_pool_PA_1:

$\textrm{h\_pool\_PA\_1} \in R^{\textrm{batch\_size} \times 3l \times N \times 1}$

Similarly, h_pool_PB_1, h_pool_PC_1, h_pool_PD_1, and h_pool_PE_1 are obtained as the paragraph features based on choices B, C, D, and E.
