This paper introduces the Deep Recurrent Attentive Writer (DRAW) neural network architecture for image generation.
The DRAW network combines a novel spatial attention mechanism with a sequential variational auto-encoder (VAE) framework, allowing the iterative construction of complex images.
1. The architecture is a pair of recurrent neural networks: (1) an encoder network that compresses the images during training, and (2) a decoder that reconstitutes images after receiving codes. The whole system is trained end-to-end by gradient descent.
2. Three major differences from a standard VAE:
(1) The encoder is privy to the decoder's previous outputs, allowing it to tailor the codes it sends according to the decoder's behaviour so far.
(2) The decoder's outputs are successively added to a canvas whose final state parameterises the distribution that ultimately generates the data.
(3) A dynamically updated attention mechanism restricts both the input region observed by the encoder and the output region modified by the decoder.
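The three differences above can be sketched as a toy forward pass. This is a minimal numpy sketch, not the paper's implementation: the LSTM cells are replaced by single tanh layers, attention is omitted (the decoder writes to the whole canvas), and all dimensions and weight initialisations are made-up placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (not from the paper).
x_dim, h_dim, z_dim, T = 28 * 28, 64, 10, 8

def rnn_step(h, inp, W):
    # Stand-in for an LSTM step: a single tanh layer over [h, inp].
    return np.tanh(W @ np.concatenate([h, inp]))

W_enc = rng.normal(0, 0.1, (h_dim, h_dim + 2 * x_dim + h_dim))
W_dec = rng.normal(0, 0.1, (h_dim, h_dim + z_dim))
W_mu = rng.normal(0, 0.1, (z_dim, h_dim))
W_write = rng.normal(0, 0.1, (x_dim, h_dim))

def draw_forward(x):
    canvas = np.zeros(x_dim)  # c_0: accumulates the decoder's writes
    h_enc = np.zeros(h_dim)
    h_dec = np.zeros(h_dim)
    for t in range(T):
        x_hat = x - 1 / (1 + np.exp(-canvas))  # error image
        # Difference (1): the encoder also sees the decoder's previous state.
        h_enc = rnn_step(h_enc, np.concatenate([x, x_hat, h_dec]), W_enc)
        z = W_mu @ h_enc + rng.normal(size=z_dim)  # sample a latent code
        h_dec = rnn_step(h_dec, z, W_dec)
        # Difference (2): writes are added to the canvas, not overwritten.
        canvas = canvas + W_write @ h_dec
    return 1 / (1 + np.exp(-canvas))  # Bernoulli means for the final image
```

After T steps the canvas, passed through a sigmoid, gives the parameters of the data distribution; difference (3), attention, would replace the full-image read and write with patch-based ones.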
3. Selective attention model: an array of 2D Gaussian filters is applied to the image, yielding an image 'patch' of smoothly varying location and zoom, similar to the Gaussian window attention of Graves (2013) and the Neural Turing Machine (Graves et al., 2014). The attention model lets the network focus on only a certain region of interest at each step. It is similar to the recurrent attention model of Mnih et al. (2014), but fully differentiable.
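The Gaussian filter-bank read can be sketched as follows. This is a simplified illustration, assuming made-up parameter values; in the paper the grid centre, stride, and variance are themselves emitted by the decoder at each step, which is what makes the mechanism dynamic and end-to-end trainable.

```python
import numpy as np

def filterbank(g, delta, sigma, N, size):
    # N Gaussian filters whose centres form a grid of stride delta around g.
    mu = g + (np.arange(N) - N / 2 + 0.5) * delta
    F = np.exp(-((np.arange(size) - mu[:, None]) ** 2) / (2 * sigma ** 2))
    # Normalise each filter so its responses sum to one.
    return F / np.maximum(F.sum(axis=1, keepdims=True), 1e-8)

def read_patch(image, gx, gy, delta, sigma, N):
    B, A = image.shape
    Fy = filterbank(gy, delta, sigma, N, B)  # N x B, vertical filters
    Fx = filterbank(gx, delta, sigma, N, A)  # N x A, horizontal filters
    return Fy @ image @ Fx.T  # N x N patch of varying location and zoom
```

Because the patch is just two matrix products of smooth functions of (gx, gy, delta, sigma), gradients flow back to the attention parameters, in contrast to the hard, sample-based glimpses of Mnih et al. (2014).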
1. The use of an LSTM/RNN for images is interesting: it mimics the drawing process. How do humans practically make a film? First shoot every scene (short video clips), then link these clips together to form the film. Could something similar be done building on this paper's insight? A network-in-network (RNN-in-RNN) structure like the one in the Feedback Network paper, combined with this paper, might work.
2. The differentiable attention model deserves attention: it could be useful in many applications.
3. I have seen several papers about VAEs; sometimes the only difference is the input. In this paper, the VAE's input is a combination of several pieces of information. Is simple concatenation enough, or is there a better way to make use of each input?
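To make the question concrete, here is a minimal sketch of the two options, using DRAW's encoder inputs (image, error image, previous decoder state) as the example. The gated variant is a hypothetical alternative, not something the paper does, and all dimensions and weights are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
x, x_hat, h_dec = rng.normal(size=16), rng.normal(size=16), rng.normal(size=8)

# Option 1: plain concatenation (what DRAW does for its encoder input).
enc_in_concat = np.concatenate([x, x_hat, h_dec])  # length 16 + 16 + 8 = 40

# Option 2 (hypothetical): project each input to a shared space and
# combine with per-source gates, letting the model weight each input.
d = 32
Wx, We, Wh = (rng.normal(0, 0.1, (d, n)) for n in (16, 16, 8))
g = 1 / (1 + np.exp(-rng.normal(size=3)))  # placeholder gates; learned in practice
enc_in_gated = g[0] * (Wx @ x) + g[1] * (We @ x_hat) + g[2] * (Wh @ h_dec)
```

Concatenation leaves the mixing entirely to the first weight matrix of the encoder, which is usually sufficient; a gated combination only adds value when the relative importance of the inputs should vary per step.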
Graves, Alex. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.
Graves, Alex, Wayne, Greg, and Danihelka, Ivo. Neural Turing machines. arXiv preprint arXiv:1410.5401, 2014.
Mnih, Volodymyr, Heess, Nicolas, Graves, Alex, et al. Recurrent models of visual attention. In Advances in Neural Information Processing Systems, pp. 2204-2212, 2014.