Neural Word Segmentation Learning for Chinese

MM Chiou 0627/2018


Abstract

Chinese word segmentation is usually formalized as a character-based sequence labeling task, where only contextual information within fixed size local windows and interactions between adjacent tags can be captured.

A neural framework that thoroughly eliminates context windows was proposed to utilize the complete segmentation history.

The model employs a gated combination neural network over characters to produce distributed representations of word candidates, which are then given to an LSTM language scoring model.

Experiments on benchmark datasets show that the model achieves competitive performance with previous state-of-the-art methods.

Introduction

Chinese is written without explicit word delimiters. Therefore, word segmentation is a preliminary step for processing such languages.

Most methods formalize Chinese word segmentation (CWS) as a sequence labeling problem with character position tags, which can be handled with supervised learning methods such as Maximum Entropy and Conditional Random Fields. However, these methods depend heavily on the choice of handcrafted features.
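To make the labeling view concrete, the snippet below converts an already segmented sentence into per-character position tags using the common BMES scheme (B: word begin, M: middle, E: end, S: single-character word); a MaxEnt or CRF tagger is then trained to predict such tags from windowed character features. The function name and example are illustrative only, not taken from the paper.

```python
# Minimal illustration of the character position tagging scheme (BMES).
def to_bmes(words):
    """Map a segmented sentence (a list of words) to one tag per character."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")                                      # single-character word
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])  # begin .. middle .. end
    return tags

# "北京 / 大学" segmented into two two-character words:
print(to_bmes(["北京", "大学"]))   # ['B', 'E', 'B', 'E']
```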

Neural models have been widely used for NLP tasks. For the task of CWS, the general neural network architecture has been adapted for sequence labeling [3], and character embeddings are used as input to a two-layer network.

The interactions between local context and the previous tag are modeled in [4].

A gated recursive neural network was proposed to model the feature combinations of context characters [5].

An LSTM architecture was used to capture potential long-distance dependencies [6], alleviating the limitation of the context window size. However, another window is still introduced for hidden states.

All these models are designed to solve CWS by assigning labels to the characters in the sequence one by one.

At each time step of inference, these models compute the tag scores of a character based on (i) context features within a fixed-size local window and (ii) the tag of the previous character.

Nevertheless, the tag-tag transition is insufficient to model the complicated influence of previous segmentation decisions, even though that history can sometimes be a crucial clue to later segmentation decisions.

These methods broadly adopt a fixed context window size for feature engineering, which also restricts the flexibility of modeling dependencies at diverse distances.

Moreover, word-level information, the coarser-grained unit suggested in [7], remains unexploited.

To alleviate the drawbacks of previous methods and relax inconvenient constraints such as the fixed-size context window, this work re-formalizes CWS as a direct segmentation learning task.

This method does not make tagging decisions on individual characters; instead, it directly evaluates the relative likelihood of different segmented sentences and searches for the segmentation with the highest score.
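As a rough sketch of what direct segmentation learning means in practice, the following beam search enumerates and scores whole segmentations instead of per-character tags. The `score_fn` callback, the maximum word length, and the beam size are assumptions for illustration, not details confirmed by this summary; a greedy variant simply corresponds to `beam_size=1`.

```python
# Hedged sketch of segmentation-as-search: candidate segmentations are scored
# as wholes and the highest-scoring one is returned.  `score_fn(history, word)`
# stands in for the neural scorer described later; max_word_len and beam_size
# are illustrative assumptions.
def segment(chars, score_fn, max_word_len=4, beam_size=8):
    # Each beam item: (words_so_far, characters_consumed, accumulated_score)
    beam = [([], 0, 0.0)]
    for _ in range(len(chars)):
        candidates = []
        for words, consumed, score in beam:
            if consumed == len(chars):                 # already a full segmentation
                candidates.append((words, consumed, score))
                continue
            for length in range(1, max_word_len + 1):  # append the next word candidate
                if consumed + length > len(chars):
                    break
                word = "".join(chars[consumed:consumed + length])
                candidates.append((words + [word], consumed + length,
                                   score + score_fn(words, word)))
        # Keep only the best partial segmentations.
        beam = sorted(candidates, key=lambda c: c[2], reverse=True)[:beam_size]
    best = max((c for c in beam if c[1] == len(chars)), key=lambda c: c[2])
    return best[0]
```

A toy run with a trivial scorer that prefers two-character words, `segment(list("北京大学"), lambda hist, w: 1.0 if len(w) == 2 else 0.0)`, returns `['北京', '大学']`.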

To represent a segmented sentence, a series of distributed vector representations is generated to characterize the corresponding word candidates.

Such a representation setting makes the decoding different from previous methods and allows more discriminative features to be captured.

Though the vector building is word-centered, this scoring model covers all three processing levels: character, word, and sentence.

First, the distributed representation starts from character embeddings, since in the context of word segmentation the n-gram data sparsity issue makes it impractical to use word vectors directly.

Second, as the word candidate representation is derived from its characters, the internal character structure is also encoded, so it can be used to judge the word likelihood of the candidate on its own.
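A simplified stand-in for this character-to-word composition is sketched below: each character embedding contributes to the word-candidate vector through a learned gate. The module name, shapes, and exact gating form are assumptions; the paper's gated combination network may differ in its details.

```python
import torch
import torch.nn as nn

# Simplified sketch: compose a word-candidate vector from character embeddings
# with a learned, per-character gate (an assumption standing in for the paper's
# gated combination neural network).
class CharToWord(nn.Module):
    def __init__(self, char_dim, word_dim):
        super().__init__()
        self.proj = nn.Linear(char_dim, word_dim)   # per-character feature projection
        self.gate = nn.Linear(char_dim, word_dim)   # per-character contribution gate

    def forward(self, char_embs):
        # char_embs: (word_len, char_dim), embeddings of the candidate's characters
        features = torch.tanh(self.proj(char_embs))
        gates = torch.softmax(self.gate(char_embs), dim=0)   # normalize over characters
        return (gates * features).sum(dim=0)                 # (word_dim,) word-candidate vector
```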

Third, to evaluate how well a segmented sentence makes sense through word interaction, an LSTM is used to chain together word candidates incrementally and construct the representation of the partially segmented sentence at each decoding step, so that the coherence between the next word candidate and the previous segmentation history can be captured.
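The sentence-level part can then be pictured as follows: an LSTM consumes the word-candidate vectors of a segmentation one by one, and the score of appending the next candidate measures its agreement with the state summarizing the segmentation history. This is a minimal sketch under assumed shapes and an assumed dot-product link score, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

# Minimal sketch of the sentence-level scorer (assumed link-scoring form).
class SegmentationScorer(nn.Module):
    def __init__(self, word_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTMCell(word_dim, hidden_dim)
        self.predict = nn.Linear(hidden_dim, word_dim)   # predicts the next word vector

    def forward(self, word_vectors):
        # word_vectors: list of (word_dim,) tensors for one candidate segmentation
        h = torch.zeros(1, self.lstm.hidden_size)
        c = torch.zeros(1, self.lstm.hidden_size)
        total = torch.zeros(())
        for w in word_vectors:
            # Link score: agreement between the predicted next word and the actual one.
            total = total + (self.predict(h).squeeze(0) * w).sum()
            h, c = self.lstm(w.unsqueeze(0), (h, c))     # fold the word into the history
        return total   # higher means the segmentation reads more coherently
```

Summing such link scores over a whole segmentation gives the sentence score that the search in the earlier sketch maximizes.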

This model is the first attempt to model the entire contents of the segmenter's state, including the complete history of both segmentation decisions and input characters.

The comparison of feature windows used in different models is shown in Table 1.

Table 1: Feature windows of different models. i (j) indexes the current character (word) under scoring.

Compared to both sequence labeling schemes and word-based models in the past, this model thoroughly eliminates context windows and can capture the complete history of segmentation decisions, which offers more possibilities to effectively and accurately model segmentation context.

[0] Deng Cai and Hai Zhao. 2016. Neural word segmentation learning for Chinese. arXiv, cs.CL.

[1] https://github.com/jcyk/CWS

[2] https://github.com/jcyk/greedyCWS

[3] Xiaoqing Zheng, Hanyang Chen, and Tianyu Xu. 2013. Deep learning for Chinese word segmentation and POS tagging. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 647–657.

[4] Wenzhe Pei, Tao Ge, and Baobao Chang. 2014. Max-margin tensor neural network for Chinese word segmentation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 293–303.

[5] Xinchi Chen, Xipeng Qiu, Chenxi Zhu, and Xuanjing Huang. 2015a. Gated recursive neural network for Chinese word segmentation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 1744–1753.

[6] Xinchi Chen, Xipeng Qiu, Chenxi Zhu, Pengfei Liu, and Xuanjing Huang. 2015b. Long short-term memory neural networks for Chinese word segmentation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1197–1206.

[7] Chang-Ning Huang and Hai Zhao. 2006. Which is essential for Chinese word segmentation: Character versus word. In The 20th Pacific Asia Conference on Language, Information and Computation, pages 1–12.
