This project aims to implement and improve upon the classical Chinese poetry generation system proposed in "Chinese Poetry Generation with Planning based Neural Network".
Training and Predicting:
data: directory for processed data, pre-processed starterkit data, and generated poetry
model: directory for saved neural network models
log: directory for training logs
notebooks: directory for exploratory/experimental IPython notebooks
training_scripts: directory for sample scripts used for training several basic models
model.py: graph definition
train.py: training logic
predict.py: prediction logic
plan.py: keyword planning logic
main.py: user interaction program
To prepare training data:
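The command for this step is not shown in this excerpt; assuming the preprocessing entry point is a single script such as data_utils.py (a hypothetical name -- substitute the actual one), it would be run as:

```shell
# Hypothetical preprocessing entry point -- replace with the project's actual script name.
python data_utils.py
```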
This script does the following, in order:
- Parse corpus
- Build vocab
- Filter quatrains
- Count words
- Rank words
- Generate training data
The TextRank algorithm may take many hours to run.
Instead, you may interrupt the iterations and stop it early
once the progress shown in the terminal has remained stationary for a long time.
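For intuition, the word-ranking step can be sketched as an iterative TextRank over a word co-occurrence graph; the scores stabilize gradually, which is why interrupting a long-stationary run is usually safe. This is a minimal sketch with made-up data, not the project's actual code:

```python
# Minimal TextRank sketch for the word-ranking step (illustrative only;
# the project's actual implementation and data structures may differ).

def text_rank(graph, damping=0.85, max_iter=100, tol=1e-6):
    """Rank words by importance. graph maps word -> set of co-occurring words."""
    words = list(graph)
    score = {w: 1.0 for w in words}
    for _ in range(max_iter):
        new_score = {}
        for w in words:
            # Each neighbour passes on an equal share of its current score.
            rank = sum(score[v] / len(graph[v]) for v in graph[w])
            new_score[w] = (1 - damping) + damping * rank
        delta = max(abs(new_score[w] - score[w]) for w in words)
        score = new_score
        if delta < tol:  # scores have stabilized -- safe to stop early
            break
    return sorted(words, key=score.get, reverse=True)

# Toy co-occurrence graph (real input would come from the parsed corpus).
graph = {
    "moon": {"night", "wine"},
    "night": {"moon", "wine", "wind"},
    "wine": {"moon", "night"},
    "wind": {"night"},
}
ranking = text_rank(graph)  # most important word first
```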
Then, to generate the word embedding:
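The embedding command is likewise not shown here; assuming an entry script such as word2vec.py (a hypothetical name), the step would look like:

```shell
# Hypothetical embedding script -- substitute the actual name used by the project.
python word2vec.py
```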
As an alternative, we have also provided pre-processed data in the
data/starterkit directory. You may simply run
cp data/starterkit/* data/processed
to skip the data processing step.
To train the default model:
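Since train.py holds the training logic, training the default model presumably requires no extra arguments:

```shell
python train.py
```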
To view the full list of configurable training parameters:
python train.py -h
Saved models depend on the parameter values used during training; thus you should almost always train a new model after modifying any of the parameters.
Models are by default saved to model/. To train a new model, you may either remove the existing model from model/ or specify a new model path during training with
python train.py --model_dir <new_model_dir>
To start the user interaction program:
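Since main.py is the user interaction program, starting it presumably requires no arguments:

```shell
python main.py
```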
Similarly, to view the full list of configurable predicting parameters:
python main.py -h
The program currently does not check that prediction parameters match the corresponding training parameters.
The user must ensure, in particular, that the data loading modes correspond to those used during training
(e.g. if the training data is aligned, then the prediction input should also be aligned).
Otherwise, results may range from subtle differences in output to a total crash.
To generate sample poems for evaluation:
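The generation command is not shown in this excerpt; if the sampling, planning, and prediction described below are driven by a single script, e.g. generate.py (a hypothetical name), the step would be:

```shell
# Hypothetical generation script -- substitute the actual script name.
python generate.py
```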
This script by default randomly samples 4000 poems from the training data and saves them as
human poems. It then uses the entire poems as inputs to the planner, to create keywords for the predictor. The predicted poems are saved separately.
To evaluate the generated poems:
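No evaluation command appears in this excerpt either; assuming an evaluation script such as evaluate.py (a hypothetical name):

```shell
# Hypothetical evaluation script -- substitute the actual script name.
python evaluate.py
```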