Update equation for the hidden-to-output weights

The input to the $j$-th neuron on the $m$-th panel of the output layer is

$$ u_{j,m} = u_j = \bar{v}_{w_j}' \cdot \bar{h} = \sum_{i = 1}^N w_{ij}' h_i = \sum_{i = 1}^N w_{ij}' w_{ki} \tag{1} $$

and the derivative of $E$ with respect to $u_{j,m}$ is

$$ \frac{\partial E}{\partial u_{j,m}} = y_{j,m} - \delta_{j j_{o,m}} \doteq e_{j,m} \tag{2} $$

where $y_{j,m}$ is the softmax output and $j_{o,m}$ is the index of the actual $m$-th context word.
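As a minimal numeric sketch of $(2)$ (assuming a softmax output layer; the vocabulary size and all values here are illustrative), the prediction error is the predicted distribution minus the one-hot target:

```python
import numpy as np

def softmax(u):
    # Numerically stable softmax over the output scores u_j.
    z = np.exp(u - u.max())
    return z / z.sum()

# Toy scores u_{j,m} for one output panel (vocabulary of V = 4 words).
u = np.array([1.0, 2.0, 0.5, 0.1])
y = softmax(u)              # predicted distribution y_{j,m}
target = 1                  # index j_{o,m} of the actual context word
t = np.zeros_like(y)
t[target] = 1.0             # one-hot vector, i.e. delta_{j j_{o,m}}
e = y - t                   # prediction error e_{j,m} from eq. (2)
```

Since both $y$ and the one-hot target sum to one, the error vector sums to zero: the true word's component is negative and every other component is positive.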

Using $(1)$ and $(2)$, the derivative of $E$ with respect to $w_{ij}'$ is

$$ \frac{\partial E}{\partial w_{ij}'} = \sum_{m = 1}^C \frac{\partial E}{\partial u_{j,m}} \frac{\partial u_{j,m}}{\partial w_{ij}'} = \sum_{m = 1}^C e_{j,m} h_i \tag{3} $$

so the update equation for the hidden-to-output weights is

$$ {w_{ij}'}^{(\text{new})} = {w_{ij}'}^{(\text{old})} - \eta \frac{\partial E}{\partial w_{ij}'} = {w_{ij}'}^{(\text{old})} - \eta \sum_{m = 1}^C e_{j,m} h_i \tag{4} $$

which can also be written in vector form as

$$ {\bar{v}_{w_j}'}^{(\text{new})} = {\bar{v}_{w_j}'}^{(\text{old})} - \eta \sum_{m = 1}^C e_{j,m} \bar{h} = {\bar{v}_{w_j}'}^{(\text{old})} - \eta \sum_{m = 1}^C e_{j,m} \bar{v}_{w_k}, \quad j = 1, 2, \cdots, V \tag{5} $$

where the prediction error $e_{j,m}$ is summed over all $C$ context words in the output layer.
Note that this update must be applied to every hidden-to-output weight vector for every training instance.
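The update $(4)$–$(5)$ can be sketched in NumPy as follows. This is an illustrative sketch, not the original implementation: the sizes, the random initialization, and the target indices are all made up, and a softmax output is assumed. Because the $C$ panels share the same scores $u_j$, the per-panel errors can be summed first and the whole update applied as one outer product:

```python
import numpy as np

rng = np.random.default_rng(0)
N, V, C = 5, 8, 3                    # hidden size, vocabulary size, context size
W_out = rng.normal(0.0, 0.1, (N, V)) # hidden-to-output weights w'_{ij}
h = rng.normal(0.0, 0.1, N)          # hidden-layer vector h
eta = 0.1                            # learning rate

def softmax(u):
    # Numerically stable softmax over the output scores u_j.
    z = np.exp(u - u.max())
    return z / z.sum()

u = h @ W_out                        # eq. (1): scores u_j, shared by all C panels
targets = [2, 5, 2]                  # true context-word indices j_{o,m}
E = np.zeros(V)                      # summed error sum_m e_{j,m}
for jo in targets:
    e = softmax(u)
    e[jo] -= 1.0                     # eq. (2): e_{j,m} = y_{j,m} - delta
    E += e

# eq. (4): w'_{ij} <- w'_{ij} - eta * sum_m e_{j,m} * h_i, for all i, j at once.
W_out_new = W_out - eta * np.outer(h, E)
```

Summing the errors before the outer product is just a vectorized restatement of the sum over $m$ in $(4)$; each entry still receives exactly $-\eta \sum_m e_{j,m} h_i$.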

Update equation for the input-to-hidden weights

Since the input is the one-hot vector of the word $w_k$, the output of the hidden layer is simply the $k$-th row of the input-to-hidden weight matrix:

$$ h_i = v_{w_k, i} = w_{ki} \tag{6} $$

Using $(1)$ and $(2)$, the derivative of $E$ with respect to $h_i$ is

$$ \frac{\partial E}{\partial h_i} = \sum_{j = 1}^V \sum_{m = 1}^C \frac{\partial E}{\partial u_{j,m}} \frac{\partial u_{j,m}}{\partial h_i} = \sum_{j = 1}^V \sum_{m = 1}^C e_{j,m} w_{ij}' \tag{7} $$

Using $(6)$ and $(7)$, the derivative of $E$ with respect to $w_{ki}$ is

$$ \frac{\partial E}{\partial w_{ki}} = \frac{\partial E}{\partial h_i} \frac{\partial h_i}{\partial w_{ki}} = \sum_{j = 1}^V \sum_{m = 1}^C e_{j,m} w_{ij}' \tag{8} $$

Thus, the update equation for the input-to-hidden weights is

$$ w_{ki}^{(\text{new})} = w_{ki}^{(\text{old})} - \eta \frac{\partial E}{\partial w_{ki}} = w_{ki}^{(\text{old})} - \eta \sum_{j = 1}^V \sum_{m = 1}^C e_{j,m} w_{ij}' \tag{9} $$

or, equivalently, in vector form

$$ \bar{v}_{w_k}^{(\text{new})} = \bar{v}_{w_k}^{(\text{old})} - \eta \sum_{j = 1}^V \sum_{m = 1}^C e_{j,m} \bar{v}_{w_j}' \tag{10} $$
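The input-to-hidden update $(9)$–$(10)$ can be sketched the same way. This is again an illustrative sketch with made-up sizes and indices, assuming a softmax output. Because $h$ depends only on the input word's row of the weight matrix (the input is one-hot), only that single row of `W_in` actually moves:

```python
import numpy as np

rng = np.random.default_rng(1)
N, V = 5, 8                          # hidden size, vocabulary size
W_in = rng.normal(0.0, 0.1, (V, N))  # input-to-hidden weights w_{ki}
W_out = rng.normal(0.0, 0.1, (N, V)) # hidden-to-output weights w'_{ij}
eta = 0.1                            # learning rate
k = 3                                # index of the input word w_k

def softmax(u):
    # Numerically stable softmax over the output scores u_j.
    z = np.exp(u - u.max())
    return z / z.sum()

h = W_in[k].copy()                   # eq. (6): h_i = w_{ki}
u = h @ W_out                        # eq. (1): scores u_j, shared by all panels
targets = [1, 6]                     # true context-word indices j_{o,m} (C = 2)
E = np.zeros(V)                      # summed prediction error sum_m e_{j,m}
for jo in targets:
    e = softmax(u)
    e[jo] -= 1.0                     # eq. (2): e_{j,m} = y_{j,m} - delta
    E += e

EH = W_out @ E                       # eq. (8): sum_j sum_m e_{j,m} w'_{ij}
W_in_old = W_in.copy()
W_in[k] -= eta * EH                  # eq. (10): only the input word's vector moves
```

Updating only row $k$ is exactly $(10)$: the gradient with respect to every other row is zero for this training instance, since those rows do not contribute to $\bar{h}$.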
