ACL changes its arXiv submission policy

Under the new ACL policy on preprint services such as arXiv, there is a period during which ACL submissions may not be posted to arXiv. The period begins one month before the submission deadline and ends on the notification date. The change has a big impact on NLP researchers. Some people are used to posting their paper on arXiv right after submission; this is now strictly forbidden. If you want to post to arXiv, do it at least one month before the deadline.
As the period lasts about two months, NLP researchers can check arXiv less often during that time, a bit like a holiday. However, there is also a downside. Researchers who also want to submit to ICLR have to check the ACL schedule carefully. The embargo period may overlap with the ICLR submission deadline, in which case they have to choose one of the two conferences.
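To see whether a given day falls inside the embargo, the rule above can be written down as a small date check. This is only a sketch: the dates are made up for illustration, "one month" is approximated as 30 days, and treating both ends of the window as inclusive is my assumption, not something the policy text spells out.

```python
from datetime import date, timedelta


def embargo_window(submission_deadline, notification_date, lead_days=30):
    """Return (start, end) of the period in which arXiv posting is not allowed.

    lead_days=30 approximates "one month before the deadline" (assumption).
    """
    start = submission_deadline - timedelta(days=lead_days)
    return start, notification_date


def is_embargoed(day, submission_deadline, notification_date):
    """True if `day` lies inside the embargo window (both ends inclusive)."""
    start, end = embargo_window(submission_deadline, notification_date)
    return start <= day <= end


# Hypothetical dates, for illustration only.
deadline = date(2018, 2, 22)
notification = date(2018, 4, 20)

print(embargo_window(deadline, notification))
print(is_embargoed(date(2018, 2, 1), deadline, notification))
```

The same check can be run against another conference's deadline to see whether the two schedules collide.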

ACL 2017 first day review

Machine Translation Session

A Convolutional Encoder Model for Neural Machine Translation

Jonas Gehring, Michael Auli, David Grangier and Yann Dauphin

  • The encoder is replaced by convolutional networks
  • Position embeddings are used
  • Two-stack architecture: there are two separate nets predicting the key and value for the attention
  • Interestingly, the authors do not use the two-stack architecture in their later research
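To make the key/value split concrete, here is a toy attention step in plain Python. It is a sketch under my own assumptions, not the paper's implementation: the keys and values are fixed lists standing in for the outputs of the two convolutional stacks, and the dot-product scoring is chosen only for simplicity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Score the query against the keys, then average the values.

    keys:   one vector per source position (stand-in for the first stack)
    values: one vector per source position (stand-in for the second stack)
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Toy inputs, for illustration only.
query = [1.0, 0.0]
keys = [[10.0, 0.0], [0.0, 10.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
weights, context = attend(query, keys, values)
print(weights, context)
```

The point is that the vectors used for scoring and the vectors that get averaged into the context are different, so each can be produced by its own network.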

Deep Neural Machine Translation with Linear Associative Unit

Mingxuan Wang, Zhengdong Lu, Jie Zhou and Qun Liu

  • The residual connection is introduced inside the gate function, modified from GRU
  • It’s quite interesting that simply replacing the GRU with the LAU gives such a large gain

Alternative Objective Functions for Training MT Evaluation Metrics

Miloš Stanojević and Khalil Sima’an

  • A training-based evaluation method for machine translation
  • The evaluation itself is done by looking at Kendall’s tau against the human ranking data
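For readers who have not used Kendall's tau, a minimal version looks like this. It is a simplified sketch that skips tied pairs, and the scores are made up; the exact variant used in the paper may differ.

```python
from itertools import combinations


def kendall_tau(metric_scores, human_scores):
    """(concordant - discordant) / (concordant + discordant) over all pairs.

    A pair is concordant when the metric and the humans order it the same way.
    Tied pairs are skipped (simplifying assumption).
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(metric_scores)), 2):
        product = (metric_scores[i] - metric_scores[j]) * (human_scores[i] - human_scores[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0


# Made-up scores for four translations, for illustration only.
human = [4, 2, 3, 1]
metric = [0.9, 0.7, 0.5, 0.1]  # disagrees with the humans on one pair
print(kendall_tau(metric, human))
```

A value of 1 means the metric ranks every pair the same way as the human judges, and -1 means it ranks every pair the opposite way.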

General review of the conference

The accepted papers tend to include a detailed comparison with related methods, rather than just presenting an interesting idea. Besides the experiments, the structure of the paper is usually clear and supports the core idea. This point is also reflected by classifying the sentiment (figure below). In the poster session I found the papers very diverse; quite a lot of them tackle problems that I hadn’t heard about.

People come from many different backgrounds, and some of them are not in academia. The people I talked to include entrepreneurs, recruiters and general managers. The nice thing about the hotel is the free coffee and food. Awesome experience.

Install torch with Intel MKL and NCCL

Intel MKL allows fast math computation on CPU, and NCCL enables fast multi-GPU communication. Both are desirable for running experiments, so let's get them.

Install Intel MKL

I prefer to install all packages in $HOME/apps, so first create the directory.

mkdir ~/apps

Get the download link from Intel MKL's page, and download the archive on the server.

cd ~/apps
# Put your real link here.
tar xzvf l_mkl*.tgz
rm l_mkl_*.tgz
cd l_mkl_*

Install Torch

Clone torch library and run the installation script.

git clone ~/apps/torch --recursive
cd ~/apps/torch
bash install-deps
./install.sh

Error: no default constructor exists for class ….

If you get this error, try installing the latest CUDA driver. Make sure the driver version is not a pre-release version.

Install NCCL

Clone the NCCL repository and compile the project.

cd ~/apps
git clone
cd nccl
make CUDA_HOME=/usr/local/cuda test
sudo make install
# Copy libnccl files to cuda's folder,
# so you don't have to modify the environment paths
sudo cp /usr/local/lib/libnccl* /usr/local/cuda/lib64/

Restart the terminal to make torch available

If you are running tmux, try source ~/.zshrc or source ~/.bashrc.


It turns out that Torch does not yet support cuDNN 6.0.

NAACL 2016 Tutorials

Hands-on Learning to Search for Structured Prediction

Deep Learning and Continuous Representations for NLP

Getting mad at Theano

Theano is a fantastic computational graph builder and optimizer. However, the graph optimization can drive you mad when it gives these two errors:

  • Out of Memory
  • Index out of bounds

Now I'm encountering both of these issues. For the out of memory error, the “omnipotent” solution recommended in the Theano user group is to reduce the batch size. Well, reducing the batch size is a workaround, but it considerably slows down training. A clever way to debug is to turn on the exception_verbosity=high option, which prints the storage map, where you can see which operations occupy the vast majority of the GPU memory. Another fix that works for me is to use Theano's built-in APIs whenever possible, for example T.nnet.categorical_crossentropy to compute the loss.
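One way to switch the option on is through the THEANO_FLAGS environment variable, which Theano reads when it is imported. The helper below is my own small sketch (add_theano_flag is a hypothetical name, not part of Theano); it only edits the environment variable and keeps whatever flags are already set.

```python
import os


def add_theano_flag(name, value, environ=os.environ):
    """Set name=value in THEANO_FLAGS, keeping the other flags already there.

    Hypothetical helper, not part of Theano itself.
    """
    flags = [f for f in environ.get("THEANO_FLAGS", "").split(",") if f]
    # Drop an earlier setting of the same flag so the new value wins.
    flags = [f for f in flags if not f.startswith(name + "=")]
    flags.append("{}={}".format(name, value))
    environ["THEANO_FLAGS"] = ",".join(flags)
    return environ["THEANO_FLAGS"]


# Call this before `import theano`, since the flags are read at import time.
add_theano_flag("exception_verbosity", "high")
# import theano
```

With the flag set, the next out of memory error should come with the storage map attached.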

Now, let's talk about Index out of bounds. Frankly speaking, I don't have a good solution for this one. If the error happens in the forward graph, using test values (tensor.tag.test_value) makes it easy to track down. The tough situation is when the error happens during backpropagation. I ran into this problem when implementing a Neural Machine Translation model. It is very tough to solve because the backward graph is basically undebuggable for normal users: a debug print of the graph is unreadable unless you know well what the graph optimization engine is doing. In the end, the only solution I can see is to run the whole graph in numpy and hope the same error shows up there.
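Before re-implementing the whole graph, a cheaper check in the same spirit is to validate the integer inputs in plain Python before they reach the compiled function. This is only a sketch under my own assumptions: find_out_of_bounds is a hypothetical helper, the batch is a list of token-id lists, and vocab_size is the number of rows in the embedding table.

```python
def find_out_of_bounds(batch, vocab_size):
    """Return (row, column, index) for every index outside [0, vocab_size).

    Negative ids are reported too: they would wrap around when used as
    indices, but a token id should never be negative.
    """
    bad = []
    for row, sentence in enumerate(batch):
        for col, index in enumerate(sentence):
            if not 0 <= index < vocab_size:
                bad.append((row, col, index))
    return bad


# Made-up batch of token ids, for illustration only.
batch = [[4, 17, 2], [9, 30000, 1]]
print(find_out_of_bounds(batch, vocab_size=30000))  # prints [(1, 1, 30000)]
```

If this list is empty and the error persists, the bad index is produced inside the graph, and the numpy re-implementation is still the fallback.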