Chinese Word Segmentation
jcyk/CWS
Source code for an ACL 2016 paper on Chinese word segmentation
Fig. 1: Files in the jcyk/CWS GitHub repository.
README
CWS
This code implements the word segmentation algorithm from the paper "Neural Word Segmentation Learning for Chinese" (ACL 2016).
A faster implementation using dynet as the backend is available.
To use the dynet-based version, run python train.py -d
Usage (theano; also applies to the dynet version)
- train
python train.py -t
First check the hyperparameter settings in train.py.
At the start, the training procedure writes a config file that preserves the hyperparameter settings. It then saves the trained model parameters to *.npz after each epoch.
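To check what a per-epoch *.npz file contains before passing it to the test script, a minimal sketch along these lines may help. It assumes the file is a standard NumPy archive that loads without pickling; the helper name summarize_npz, the file name epoch1.npz, and the array names embeddings and bias are hypothetical stand-ins, not names taken from this repository.

```python
import os
import tempfile

import numpy as np


def summarize_npz(path):
    """Return {array_name: shape} for every array stored in an .npz archive."""
    # np.load on an .npz file returns a lazy archive; .files lists its keys.
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


# Stand-in checkpoint so the sketch runs without a trained model.
# The array names below are hypothetical, not the repository's real ones.
demo_path = os.path.join(tempfile.mkdtemp(), "epoch1.npz")
np.savez(demo_path, embeddings=np.zeros((10, 4)), bias=np.zeros(4))

summary = summarize_npz(demo_path)
for name, shape in sorted(summary.items()):
    print(name, shape)
```

For a real checkpoint, replace demo_path with the path of the *.npz file produced by training. If the parameters were saved positionally, NumPy names them arr_0, arr_1, and so on.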
- test
python test.py params.npz input_file output_path config_file
Specify the file that stores the model parameters as params.npz and the corresponding configuration file as config_file.
The test procedure reads data from input_file and writes the result to output_path.
- evaluate
To see the best result (F1-score 95.5) on the PKU dataset, first generate the output file through the trained model (