Update equation for the hidden-to-output weights
The input of the $j$-th neuron on the $m$-th panel of the output layer is obtained as
$$u_{j,m} = u_j = \vec{v}'_{w_j} \cdot \vec{h} = \sum_{i=1}^{N} w'_{ij} h_i = \sum_{i=1}^{N} w'_{ij} w_{ki} \tag{1}$$
and the derivative of $E$ with respect to $u_{j,m}$ is
$$\frac{\partial E}{\partial u_{j,m}} = y_{j,m} - \delta_{j,j_{o,m}} \doteq e_{j,m} \tag{2}$$
where $\delta_{j,j_{o,m}}$ is the Kronecker delta and $j_{o,m}$ is the index of the actual $m$-th output context word.
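To make (1) and (2) concrete, here is a minimal NumPy sketch of the forward pass and error computation for one training instance, assuming (as in the standard skip-gram model) a softmax output layer. The names and sizes (`V`, `N`, `C`, `W`, `W_prime`, `k`, `targets`, `eta`) are illustrative assumptions, not notation fixed by the text.

```python
import numpy as np

# Illustrative sizes: vocabulary V, hidden layer N, C output panels.
V, N, C, eta = 10, 4, 2, 0.1
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, N))        # input-to-hidden weights w_{ki}
W_prime = rng.normal(scale=0.1, size=(N, V))  # hidden-to-output weights w'_{ij}

k = 3             # index of the input word w_k (one-hot input)
targets = [1, 7]  # j_{o,m}: indices of the C actual output context words

h = W[k]                               # eq. (6): h_i = w_{ki}
u = W_prime.T @ h                      # eq. (1): u_j, shared by all C panels
y = np.exp(u - u.max()); y /= y.sum()  # softmax output y_j (same on every panel)

# eq. (2): e_{j,m} = y_{j,m} - delta_{j,j_{o,m}}, one error column per panel
e = np.tile(y[:, None], (1, C))
for m, j_om in enumerate(targets):
    e[j_om, m] -= 1.0
```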
By using (1) and (2), the derivative of $E$ with respect to $w'_{ij}$ is obtained as
$$\frac{\partial E}{\partial w'_{ij}} = \sum_{m=1}^{C} \frac{\partial E}{\partial u_{j,m}} \frac{\partial u_{j,m}}{\partial w'_{ij}} = \sum_{m=1}^{C} e_{j,m} h_i \tag{3}$$
from which the update equation for the hidden-to-output weights is obtained as
$${w'_{ij}}^{(\text{new})} = {w'_{ij}}^{(\text{old})} - \eta \frac{\partial E}{\partial w'_{ij}} = {w'_{ij}}^{(\text{old})} - \eta \sum_{m=1}^{C} e_{j,m} h_i \tag{4}$$
which can also be represented as
$${\vec{v}'_{w_j}}^{(\text{new})} = {\vec{v}'_{w_j}}^{(\text{old})} - \eta \sum_{m=1}^{C} e_{j,m} \vec{h} = {\vec{v}'_{w_j}}^{(\text{old})} - \eta \sum_{m=1}^{C} e_{j,m} \vec{v}_{w_k}, \qquad j = 1, 2, \cdots, V \tag{5}$$
where the prediction error $e_{j,m}$ is summed across all $C$ context words in the output layer.
Note that this update must be applied to every hidden-to-output weight for each training instance.
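Continuing the sketch above, the update (4)/(5) reduces to a single outer-product correction. `W_prime_old` is saved only because equation (7) in the next section needs the pre-update weights; this bookkeeping is an implementation choice, not something prescribed by the derivation.

```python
# eq. (3)-(5): the gradient wrt w'_{ij} is (sum_m e_{j,m}) * h_i, an outer product
e_sum = e.sum(axis=1)                 # sum over the C panels: sum_m e_{j,m}, shape (V,)
W_prime_old = W_prime.copy()          # keep pre-update weights for eq. (7) below
W_prime -= eta * np.outer(h, e_sum)   # eq. (4): w'_{ij} -= eta * sum_m e_{j,m} h_i
```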
Update equation for the input-to-hidden weights

The output of the $i$-th neuron in the hidden layer is
$$h_i = v_{w_k,i} = w_{ki} \tag{6}$$
where $v_{w_k,i}$ is the $i$-th component of the input vector $\vec{v}_{w_k}$ of the input word $w_k$.
By using (1) and (2), the derivative of $E$ with respect to $h_i$ is obtained as
$$\frac{\partial E}{\partial h_i} = \sum_{j=1}^{V} \sum_{m=1}^{C} \frac{\partial E}{\partial u_{j,m}} \frac{\partial u_{j,m}}{\partial h_i} = \sum_{j=1}^{V} \sum_{m=1}^{C} e_{j,m} w'_{ij} \tag{7}$$
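In the sketch, the double sum in (7) collapses to one matrix-vector product. It must use the hidden-to-output weights as they were during the forward pass, which is why `W_prime_old` was saved before the update in (4) was applied.

```python
# eq. (7): EH_i = sum_j sum_m e_{j,m} w'_{ij}, one component per hidden neuron
EH = W_prime_old @ e_sum   # shape (N,)
```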
By using (6) and (7), we can obtain the derivative of $E$ with respect to $w_{ki}$ as
$$\frac{\partial E}{\partial w_{ki}} = \frac{\partial E}{\partial h_i} \frac{\partial h_i}{\partial w_{ki}} = \sum_{j=1}^{V} \sum_{m=1}^{C} e_{j,m} w'_{ij} \tag{8}$$
Thus, the update equation for the input-to-hidden weights is
$$w^{(\text{new})}_{ki} = w^{(\text{old})}_{ki} - \eta \frac{\partial E}{\partial w_{ki}} = w^{(\text{old})}_{ki} - \eta \sum_{j=1}^{V} \sum_{m=1}^{C} e_{j,m} w'_{ij} \tag{9}$$
or equivalently
$${\vec{v}_{w_k}}^{(\text{new})} = {\vec{v}_{w_k}}^{(\text{old})} - \eta \sum_{j=1}^{V} \sum_{m=1}^{C} e_{j,m} \vec{v}'_{w_j} \tag{10}$$
Because the input is one-hot, only the input vector $\vec{v}_{w_k}$ of the input word $w_k$ receives an update; all other rows of the input-to-hidden weight matrix are unchanged.
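In the sketch, the update (9)/(10) therefore touches a single row of `W`. The loss check at the end is only a sanity test of this one gradient step, not part of the algorithm itself.

```python
W[k] -= eta * EH   # eq. (10): update only the input word's vector v_{w_k}

# Sanity check: the loss E = -sum_m log y_{j_{o,m}} should drop after one step
h2 = W[k]
u2 = W_prime.T @ h2
y2 = np.exp(u2 - u2.max()); y2 /= y2.sum()
E_old = -sum(np.log(y[j]) for j in targets)
E_new = -sum(np.log(y2[j]) for j in targets)
print(E_old, E_new)  # with a small eta, E_new < E_old is expected
```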