Chinese Word Segmentation

jcyk/CWS

Source code for an ACL 2016 paper on Chinese word segmentation

Fig. 1 Files in the jcyk/CWS GitHub repository.

README

CWS

This code implements the word segmentation algorithm from the paper "Neural Word Segmentation Learning for Chinese" (ACL 2016).

A faster implementation using dynet as the backend is available.

The new dynet-based version can be run with python train.py -d

Usage (theano; also applies to the dynet version)

  • train python train.py -t

First check the hyperparameter settings in train.py.

At the start of training, a config file is written that preserves the hyperparameter settings. The trained model parameters are then saved to a *.npz file after each epoch.
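Because the per-epoch checkpoints are *.npz files, it can be handy to look inside one before handing it to the test script. The following is a minimal sketch, assuming the files are standard NumPy archives (for example, written with numpy.savez). The helper name summarize_npz and the array names in the demo are hypothetical and are not taken from the repository.

```python
import os
import tempfile

import numpy as np


def summarize_npz(path):
    """Return {array_name: shape} for every array stored in an .npz archive."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


# Demo on a stand-in checkpoint. The array names below are hypothetical;
# a real checkpoint from train.py will contain whatever the model defines.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_epoch.npz")
np.savez(demo_path, embeddings=np.zeros((10, 4)), bias=np.zeros(4))

summary = summarize_npz(demo_path)
for name, shape in summary.items():
    print(name, shape)
```

Comparing the printed shapes against the hyperparameters stored in the config file is a quick way to confirm that a checkpoint and a config belong together.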

  • test python test.py params.npz input_file output_path config_file

Pass the file that stores the model parameters as params.npz, together with the corresponding configuration file as config_file.

The test procedure reads data from input_file and writes the result to output_path.
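When several checkpoints need to be tested, the command above can be assembled programmatically. This is only a sketch: the argument order follows the usage line above, while the helper names build_test_command and run_test and the file names in the demo are hypothetical placeholders.

```python
import subprocess
import sys


def build_test_command(params, input_file, output_path, config_file,
                       python=sys.executable):
    """Assemble the argv list in the order given by the usage line."""
    return [python, "test.py", params, input_file, output_path, config_file]


def run_test(params, input_file, output_path, config_file):
    """Run test.py from the repository root; raises if it exits non-zero."""
    cmd = build_test_command(params, input_file, output_path, config_file)
    return subprocess.run(cmd, check=True)


# Build (but do not run) the command for hypothetical file names.
cmd = build_test_command("params.npz", "input.txt", "output.txt",
                         "config_file", python="python")
print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when paths contain spaces.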

  • evaluate

To see the best result (F1-score 95.5) on the PKU dataset, first generate the output file using the trained model (
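The evaluation instructions are cut off in this copy, so the repository's own scoring procedure is not shown here. As an illustration of what a word-level F1-score measures, the sketch below compares gold and predicted segmentations by their character spans. It assumes one sentence per line with words separated by whitespace. The function names are hypothetical, and this is not the scorer used to obtain the figure quoted above.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_prf(gold_sentences, pred_sentences):
    """Word-level precision, recall and F1 over parallel lists of sentences."""
    correct = gold_total = pred_total = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        gold_spans = to_spans(gold.split())
        pred_spans = to_spans(pred.split())
        # A predicted word is correct only if both of its boundaries match.
        correct += len(gold_spans & pred_spans)
        gold_total += len(gold_spans)
        pred_total += len(pred_spans)
    precision = correct / pred_total if pred_total else 0.0
    recall = correct / gold_total if gold_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example: the prediction merges the last two gold words into one.
gold = ["我 喜欢 北京 大学"]
pred = ["我 喜欢 北京大学"]
precision, recall, f1 = segmentation_prf(gold, pred)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

In the toy example two of the three predicted words match gold words exactly, and two of the four gold words are recovered, so precision is 2/3, recall is 1/2, and F1 is 4/7.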
