This is a PyTorch implementation of Tree-LSTM as described in the paper Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks by Kai Sheng Tai, Richard Socher, and Christopher Manning. On the semantic similarity task using the SICK dataset, this implementation reaches:
- `--lr 0.010 --wd 0.0001 --optim adagrad --batchsize 25`
- `--lr 0.025 --wd 0.0001 --optim adagrad --batchsize 25 --freeze_embed`

For comparison, an MSE of 0.2532 is among the numbers reported in the original paper.
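For reference, the Child-Sum Tree-LSTM node update described in the paper can be sketched as follows. This is a minimal illustration, not the code from this repository; the class name and the unbatched, single-node interface are my own assumptions:

```python
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    """Sketch of one Child-Sum Tree-LSTM node update (Tai et al., 2015)."""

    def __init__(self, in_dim, mem_dim):
        super().__init__()
        # i, o, u gates are computed from the input and the SUM of child hidden states
        self.iou = nn.Linear(in_dim + mem_dim, 3 * mem_dim)
        # one forget gate per child, conditioned on that child's own hidden state
        self.fx = nn.Linear(in_dim, mem_dim)
        self.fh = nn.Linear(mem_dim, mem_dim)

    def forward(self, x, child_h, child_c):
        # x: (in_dim,); child_h, child_c: (num_children, mem_dim)
        h_tilde = child_h.sum(dim=0)  # summed child hidden state
        i, o, u = self.iou(torch.cat([x, h_tilde])).chunk(3)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        # per-child forget gates (fx(x) broadcasts over children)
        f = torch.sigmoid(self.fx(x) + self.fh(child_h))
        c = i * u + (f * child_c).sum(dim=0)  # new memory cell
        h = o * torch.tanh(c)                 # new hidden state
        return h, c
```

A tree is then processed bottom-up, feeding each node's children's `(h, c)` pairs into its update; for a leaf, `child_h` and `child_c` are simply empty `(0, mem_dim)` tensors.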
Note: Currently works with PyTorch 0.4.0 (see `requirements.txt`). Switch to the `pytorch-v0.3.1` branch if you want to use PyTorch 0.3.1.
Before delving into how to run the code, here is a quick overview of the contents:
- `fetch_and_preprocess.sh` downloads the SICK dataset, the Stanford Parser and Stanford POS Tagger, and GloVe word vectors (Common Crawl 840B -- warning: this is a 2GB download!), and additionally preprocesses the data, i.e. generates dependency parses using the Stanford Neural Network Dependency Parser.
- `main.py` does the actual heavy lifting of training the model and testing it on the SICK dataset. For a list of all command-line arguments, have a look at the argument parser in the code.
- Trained models are saved in the `checkpoints/` directory under the name specified by the corresponding command-line argument.
Next, here are the different ways to run the code to train a Tree-LSTM model.
If you have a working Python3 environment, simply run the following sequence of steps:
- `bash fetch_and_preprocess.sh`
- `pip install -r requirements.txt`
- `python main.py`
If you want to use a Docker container, simply follow these steps:
- `docker build -t treelstm .`
- `docker run -it treelstm bash`
- `bash fetch_and_preprocess.sh`
- `python main.py`
If you want to use a Docker container, but want to persist data and checkpoints in your local filesystem, simply follow these steps:
- `bash fetch_and_preprocess.sh`
- `docker build -t treelstm .`
- `docker run -it --mount type=bind,source="$(pwd)",target="/root/treelstm.pytorch" treelstm bash`
- `python main.py`
NOTE: Setting the environment variable `OMP_NUM_THREADS=1` usually gives a speedup on the CPU. Use it like `OMP_NUM_THREADS=1 python main.py`. To run on a GPU, set the `CUDA_VISIBLE_DEVICES` environment variable instead. Usually, CUDA does not give much speedup here, since we are operating at a batchsize of 25.
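If you would rather control the thread count from inside Python than via the environment variable, PyTorch exposes an equivalent setting (a small illustrative snippet, not part of this repository):

```python
import torch

# Limit PyTorch's intra-op CPU parallelism to a single thread,
# mirroring the effect of OMP_NUM_THREADS=1 for torch ops.
torch.set_num_threads(1)
print(torch.get_num_threads())  # -> 1
```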
Passing the `--sparse` argument will enable sparse gradient updates for `nn.Embedding`, potentially reducing memory usage.
This is my first PyTorch-based implementation, and it might contain bugs. Please let me know if you find any!