Getting mad at Theano

Theano is a fantastic computational graph builder and optimizer. However, the graph optimization can drive you mad when it gives these two errors:

  • Out of Memory
  • Index out of bounds

Now I'm encountering both of these issues. For the out of memory error, the “omnipotent” solution recommended in the Theano user group is to reduce the batch size. Well, reducing the batch size is a workaround, but it considerably slows down training. A clever way to debug is to turn on the exception_verbosity=high option, which prints the storage map when the error is raised, so you can see which operation occupies the vast majority of the GPU memory. Another fix that works for me is to use Theano's built-in APIs whenever possible instead of hand-written expressions. For example, use T.nnet.categorical_crossentropy to compute the loss.
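As a minimal sketch of turning that option on from Python, assuming the flags are passed through the THEANO_FLAGS environment variable (which Theano reads when it is first imported), something like the following should work. The helper name enable_verbose_exceptions is made up for illustration and is not part of Theano.

```python
import os


def enable_verbose_exceptions(env=os.environ):
    """Set exception_verbosity=high in THEANO_FLAGS (hypothetical helper).

    Call this before `import theano`: the flags are read at import time,
    so changing the environment afterwards has no effect.
    """
    existing = env.get("THEANO_FLAGS", "")
    # Keep every other flag, drop any previous exception_verbosity setting.
    flags = [
        f for f in existing.split(",")
        if f and not f.startswith("exception_verbosity=")
    ]
    flags.append("exception_verbosity=high")
    env["THEANO_FLAGS"] = ",".join(flags)
    return env["THEANO_FLAGS"]


enable_verbose_exceptions()
# import theano  # import only after the flag has been set
```

Other flags that are already set, such as the device, are left in place, so the helper can be dropped at the top of an existing training script.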

Now, let's talk about Index out of bounds. Frankly speaking, I don't have a good solution for this one. If the error happens in the forward graph, then using test values (tensor.tag.test_value) makes the problem easy to track down. The tough situation is when the error happens during backpropagation. I ran into this when implementing a Neural Machine Translation model. It is very hard to solve because the backward graph is basically undebuggable for normal users: a debug print of the graph is unreadable unless you have a good knowledge of what the graph optimization engine is doing. In the end, the only solution I found is to run the whole graph in numpy and hope the same error can be caught that way.
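To make the numpy fallback a bit more concrete, here is a minimal sketch, assuming the suspect operation is an embedding lookup that is fed a token id outside the vocabulary. The function name embed, the toy shapes, and the data are all made up for illustration; a real model would need each graph operation mirrored in the same way.

```python
import numpy as np


def embed(token_ids, embedding_matrix):
    """Numpy stand-in for an embedding lookup such as W[ids] in the graph."""
    vocab_size = embedding_matrix.shape[0]
    token_ids = np.asarray(token_ids)
    # Report the offending positions explicitly instead of a bare IndexError.
    # Negative ids are flagged too, since numpy would silently wrap them.
    bad = np.argwhere((token_ids < 0) | (token_ids >= vocab_size))
    if bad.size:
        raise IndexError(
            "token ids out of range [0, %d) at positions %s"
            % (vocab_size, bad.tolist())
        )
    return embedding_matrix[token_ids]


rng = np.random.RandomState(0)
W = rng.randn(5, 3)                        # toy vocabulary: 5 words, 3 dims
batch = np.array([[0, 4, 2], [1, 5, 3]])   # id 5 does not exist

try:
    embed(batch, W)
except IndexError as err:
    print(err)
```

Unlike the optimized graph, the numpy version fails on the exact line and reports which batch entry carries the bad index, which is usually enough to trace it back to the data pipeline.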