Skip-gram Model

Figure: architecture of the skip-gram model.

The figure shows the architecture of the skip-gram model. A word is fed into the input layer to predict its context words, which are set as the target words at the output layer.

The output of the hidden layer is

\bar{h} = \bar{v}_{w_k} \pod{\text{1}}

or represented component-wise as

h_i = v_{w_k, i} = w_{ki} \pod{\text{2}}
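To make the lookup concrete, here is a minimal NumPy sketch of Eqs. (1)-(2); the vocabulary size $V$, hidden size $N$, input index, and random weights are illustrative assumptions, not values from the text:

```python
import numpy as np

V, N = 10, 4                        # assumed vocabulary size and hidden-layer size
rng = np.random.default_rng(0)
W = rng.normal(size=(V, N))         # input-to-hidden weights; row k is v_{w_k}

k = 3                               # assumed index of the input word w_k
x = np.zeros(V)
x[k] = 1.0                          # one-hot encoding of w_k

h = x @ W                           # Eqs. (1)-(2): h is simply the k-th row of W
assert np.allclose(h, W[k])
```

Because the input is one-hot, no real matrix multiplication is needed in practice; the hidden layer is just a row lookup into $\bar{\bar{W}}$.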

On the output layer, instead of outputting one multinomial distribution, the model outputs $C$ multinomial distributions, each computed using the same hidden-to-output weight matrix $\bar{\bar{W}}'$. The input of the $j$-th neuron on the $m$-th panel in the output layer is obtained as

u_{j, m} = u_j = \bar{v}_{w_j}' \cdot \bar{h} = \sum_{i = 1}^N w_{ij}' h_i = \sum_{i = 1}^N w_{ij}' w_{ki} \pod{\text{3}}

where the values $u_{j,m}$ are the same on all panels, since the panels share the same weights. The probability of the $j$-th ($j = 1, 2, \cdots, V$) word on the $m$-th panel is

y_{j, m} = p(w_{j, m}| w_k) = \frac{e^{u_{j, m}}}{\displaystyle \sum_{j' = 1}^V e^{u_{j'}}} = \frac{e^{\bar{v}_{w_j}' \cdot \bar{h} }}{\displaystyle \sum_{j' = 1}^V e^{\bar{v}_{w_{j'}}' \cdot \bar{h}}} \pod{\text{4}}
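A sketch of the forward pass through the output layer, continuing the snippet above (the matrix `W_prime` is an assumed random stand-in for $\bar{\bar{W}}'$; since all $C$ panels share it, one softmax computation covers every panel):

```python
W_prime = rng.normal(size=(N, V))   # hidden-to-output weights W'; column j is v'_{w_j}

u = h @ W_prime                     # Eq. (3): u_j = v'_{w_j} . h, identical on every panel
y = np.exp(u) / np.exp(u).sum()     # Eq. (4): softmax over the vocabulary
assert np.isclose(y.sum(), 1.0)
```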

The goal is to maximize the probability of the context words given the input word.

The loss function is defined as

E = -\ln p(w_{j_{o, 1}}, w_{j_{o, 2}}, \cdots, w_{j_{o, C}} | w_k) = -\ln \prod_{m = 1}^C p(w_{j_{o, m}} | w_k) = -\ln \prod_{m = 1}^C y_{j_{o,m}, m} = -\ln \prod_{m = 1}^C \frac{e^{\bar{v}_{w_{j_{o,m}}}' \cdot \bar{h} }}{\displaystyle \sum_{j' = 1}^V e^{\bar{v}_{w_{j'}}' \cdot \bar{h}}} \pod{\text{5}}

where the subscript $j_{o,m}$ denotes the index of the $m$-th target context word in $Cx(w_k)$.
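Continuing the sketch, the loss of Eq. (5) reduces to a sum of negative log-probabilities of the target context words (the context indices below are illustrative assumptions):

```python
context = [1, 5, 7]                 # assumed indices j_{o,m} of the C target context words
E = -np.log(y[context]).sum()       # Eq. (5): -ln prod_m y_{j_{o,m},m} = -sum_m ln y_{j_{o,m},m}
```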

The derivative of $E$ with respect to $u_{j,m}$ is

\frac{\partial E}{\partial u_{j,m}} = y_{j,m} - \delta_{j j_{o,m}} \doteq e_{j,m} \pod{\text{6}}
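The prediction error of Eq. (6) on each panel is then just the predicted distribution minus a one-hot target, as in this continuation of the sketch:

```python
C = len(context)
t = np.zeros((C, V))                # one-hot targets, one row per output panel
t[np.arange(C), context] = 1.0

e = y[None, :] - t                  # Eq. (6): e_{j,m} = y_{j,m} - delta_{j, j_{o,m}}
```

These per-panel errors are what get backpropagated to update $\bar{\bar{W}}'$ and $\bar{\bar{W}}$.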
