Machine Translation Session
A Convolutional Encoder Model for Neural Machine Translation
Jonas Gehring, Michael Auli, David Grangier and Yann Dauphin
- The encoder is replaced by convolutional networks
- Position embeddings are used
- Two-stack architecture: there are two separate nets predicting the key and value for the attention
- Interestingly, the authors do not use the two-stack architecture in their later research
Deep Neural Machine Translation with Linear Associative Unit
Mingxuan Wang, Zhengdong Lu, Jie Zhou and Qun Liu
- A residual connection is introduced inside the gate function, modified from the GRU
- It's quite interesting that the gain is so large from simply replacing the GRU with the LAU
Alternative Objective Functions for Training MT Evaluation Metrics
Miloš Stanojević and Khalil Sima’an
- A learned (training-based) evaluation metric for machine translation
- The metric itself is evaluated by computing Kendall's tau against human ranking data
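As a refresher, Kendall's tau measures rank agreement between two orderings. A minimal pure-Python sketch (illustrative only, not the paper's implementation; it ignores ties):

```python
# Kendall's tau for two rankings of the same items (no tie handling).
# Counts concordant vs. discordant pairs and normalizes by the pair count.
def kendall_tau(a, b):
    n = len(a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (a[i] - a[j]) * (b[i] - b[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# One swapped pair out of six -> (5 - 1) / 6
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```

Perfect agreement gives 1.0 and a fully reversed ranking gives -1.0, which is why it is a natural target when comparing metric scores against human rankings.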
General review of the conference
The accepted papers tend to include detailed comparisons with related methods, rather than just being interesting. Besides the experiments, the structure of each paper is usually clear and supports the core idea. This point is also reflected by classifying the sentiment (figure below). In the poster session, I found the papers very diverse; there were quite a lot of papers solving problems that I hadn't even heard about.
Many attendees have different backgrounds, and some of them are not in academia. The people I talked to include entrepreneurs, recruiters, and general managers. The nice thing about the hotel is the free coffee and food. Awesome experience.
Intel MKL enables fast math computation on the CPU, and NCCL enables fast multi-GPU communication. Both are desirable for running experiments, so let's install them.
Install Intel MKL
I prefer to install all the packages in $HOME/apps, so first create that directory.
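Creating it (same path as above):

```shell
# Create the directory that will hold locally installed packages
mkdir -p "$HOME/apps"
```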
Get the download link from Intel MKL's page (https://software.intel.com/en-us/mkl), then download the archive on the server.
cd ~/apps
# Put your real link here.
wget http://registrationcenter-download.intel.com/akdlm/irc_nas/tec/xxx/l_mkl_xxx.tgz
tar xzvf l_mkl_*.tgz
rm l_mkl_*.tgz
cd l_mkl_*
./install.sh
Clone the Torch repository and run the installation script.
git clone https://github.com/torch/distro.git ~/apps/torch --recursive
cd ~/apps/torch
bash install-deps
./install.sh
Error: no default constructor exists for class ….
If you get this error, try installing the latest CUDA driver, and make sure the driver version is not a pre-release.
Clone the NCCL repository and compile the project.
cd ~/apps
git clone https://github.com/NVIDIA/nccl
cd nccl
make CUDA_HOME=/usr/local/cuda test
sudo make install
# Copy libnccl files to cuda's folder,
# so you don't have to modify the environment paths
sudo cp /usr/local/lib/libnccl* /usr/local/cuda/lib64/
Restart the terminal to make Torch available.
If you are running tmux, try source ~/.zshrc or source ~/.bashrc, depending on your shell.
It turns out that Torch does not support cuDNN 6.0 yet.
Hands-on Learning to Search for Structured Prediction
Deep Learning and Continuous Representations for NLP
Theano is a fantastic computational graph builder and optimizer. However, the graph optimization can drive you mad when it throws these two errors:
- Out of Memory
- Index out of bounds
Now I'm encountering both of these issues. For the out-of-memory error, the "omnipotent" solution recommended in the Theano user group is to reduce the batch size. Reducing the batch size is a workaround, but it considerably slows down training. A cleverer way to debug is to turn on the exception_verbosity=high option, which prints a storage map where you can see which operation occupies the vast majority of the GPU memory. Another fix that works for me is to use Theano's built-in APIs whenever possible, for example T.nnet.categorical_crossentropy to compute the loss.
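To illustrate why the built-in ops help, here is a plain-Python sketch (an assumption for illustration, not Theano code) of cross-entropy computed directly from logits via the log-sum-exp trick. Theano's built-in cross-entropy ops, which the graph optimizer can fuse with the softmax, achieve a similar memory saving on the GPU by not materializing a full softmax intermediate:

```python
import math

# Illustrative sketch: -log softmax(target) computed directly from logits.
# Shifting by the max keeps the exponentials small and numerically stable,
# and no normalized probability array is ever materialized.
def cross_entropy(logits, target):
    m = max(logits)  # shift for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

print(cross_entropy([2.0, 1.0, 0.1], 0))  # ~0.417
```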
Now, let's talk about the index-out-of-bounds error. Frankly speaking, I don't have a good solution for this one. If the error happens in the forward graph, using test values (tensor.tag.test_value) can help solve the problem easily. The tough situation is when the error happens during backpropagation. I hit this problem when implementing a neural machine translation model. The issue is very hard to solve because the backward graph is basically undebuggable for normal users: a debug print of the graph is unreadable unless you have a good understanding of what the graph optimization engine is doing. In the end, the only practical solution is to run the whole computation in NumPy and hope the same error can be caught there.
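One cheap pre-flight check that can save a lot of pain (a sketch with hypothetical names, not from this post): validate every index on the Python side before it ever reaches the compiled graph, since an out-of-range word id is a common culprit in NMT models.

```python
# Hypothetical helper: find out-of-bounds token ids in a batch before
# feeding it to the compiled graph. Returns (row, col, idx) triples so
# the offending sentence and position can be inspected directly.
def find_bad_indices(batch, vocab_size):
    return [(i, j, idx)
            for i, row in enumerate(batch)
            for j, idx in enumerate(row)
            if not 0 <= idx < vocab_size]

batch = [[5, 42, 29999],
         [7, 30000, 3]]   # 30000 is out of range for a 30k vocabulary
print(find_bad_indices(batch, 30000))  # -> [(1, 1, 30000)]
```

Running this on every batch is far cheaper than a training step, so it can simply stay on during debugging.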