Markov chain model
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
https://www.tensorflow.org/tutorials/sequences/recurrent
When working with neural networks, it is important to get the data dimensions right. Recurrent Neural Networks generally take their input as a 3D tensor (see the sketch after this list):
- Mini-batch size: how many samples per batch
- Number of columns in our vector per time step: the width of the feature vector at each step
- Number of time steps: the sequence length
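A minimal sketch of these three dimensions, assuming the TensorFlow Keras API from the tutorial linked above (all sizes below are arbitrary illustrations):

```python
import numpy as np
import tensorflow as tf

batch_size = 32     # mini-batch size: samples per batch
num_steps = 20      # number of time steps (sequence length)
num_features = 10   # columns per time step (feature-vector width)

# Keras RNN layers expect input shaped (batch, time, features)
x = np.random.rand(batch_size, num_steps, num_features).astype(np.float32)

lstm = tf.keras.layers.LSTM(units=64)
out = lstm(x)
print(out.shape)  # (32, 64): one 64-wide output vector per sequence
```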
LSTMs
The vanishing gradient problem: plain RNNs struggle to learn long-range dependencies because gradients shrink exponentially as they are propagated back through time. LSTMs were designed to mitigate this with gated, additive updates to a cell state.
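To make the gating concrete, here is a NumPy sketch of a single LSTM step (the weight layout and names are illustrative assumptions, not any particular library's convention). The key point is the additive cell-state update `c = f * c_prev + i * g`, which lets gradients flow backward through time without repeated squashing:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    # W stacks the four gate weight matrices: shape (4*hidden, input_dim + hidden)
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0 * hidden:1 * hidden])   # input gate
    f = sigmoid(z[1 * hidden:2 * hidden])   # forget gate
    o = sigmoid(z[2 * hidden:3 * hidden])   # output gate
    g = np.tanh(z[3 * hidden:4 * hidden])   # candidate cell values
    c = f * c_prev + i * g                  # additive cell-state update
    h = o * np.tanh(c)                      # new hidden state
    return h, c

# Tiny usage example with random weights
rng = np.random.default_rng(0)
input_dim, hidden = 10, 4
W = rng.normal(size=(4 * hidden, input_dim + hidden))
b = np.zeros(4 * hidden)
h, c = lstm_step(rng.normal(size=input_dim), np.zeros(hidden), np.zeros(hidden), W, b)
```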
Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al. Their performance on polyphonic music modeling and speech signal modeling was found to be similar to that of long short-term memory (LSTM). However, GRUs have been shown to perform better on smaller datasets. They have fewer parameters than LSTMs, as they lack an output gate.
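A quick way to see the parameter difference, again assuming the Keras layers from the tutorial above (layer sizes are arbitrary):

```python
import tensorflow as tf

x = tf.zeros((1, 20, 10))  # (batch, time steps, features)
lstm = tf.keras.layers.LSTM(64)
gru = tf.keras.layers.GRU(64)
lstm(x)  # calling the layers builds their weights
gru(x)

# The GRU has three gate-like transforms vs. the LSTM's four,
# so it ends up with roughly 25% fewer parameters here.
print("LSTM params:", lstm.count_params())
print("GRU params: ", gru.count_params())
```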
RMSprop optimizer
https://www.quora.com/Why-is-it-said-that-RMSprop-optimizer-is-recommended-in-training-recurrent-neural-networks-What-is-the-explanation-behind-it
http://ruder.io/optimizing-gradient-descent/
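As the links above explain, RMSprop divides the learning rate by a running average of recent squared gradients, which keeps step sizes stable even when gradient magnitudes vary wildly across time steps, a common situation when training RNNs. A minimal NumPy sketch of one update (function name and defaults are illustrative):

```python
import numpy as np

def rmsprop_step(param, grad, cache, lr=0.001, decay=0.9, eps=1e-8):
    # Exponential moving average of squared gradients
    cache = decay * cache + (1 - decay) * grad ** 2
    # Per-parameter step, scaled down where gradients have been large
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache

# Usage: keep one cache array per parameter array across steps
w = np.array([1.0, -2.0])
cache = np.zeros_like(w)
w, cache = rmsprop_step(w, np.array([0.5, -0.1]), cache)
```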
Hyperparameters
hidden_layer_size
How should we choose it?
https://distill.pub/2016/augmented-rnns/
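There is no closed-form answer; a common approach is to try a few candidate sizes and keep the one with the lowest validation loss. A minimal sketch with Keras (the toy data, candidate sizes, and epoch count are placeholders):

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(256, 20, 10).astype(np.float32)  # toy data for illustration
y = np.random.rand(256, 1).astype(np.float32)

results = {}
for hidden_layer_size in [32, 64, 128]:
    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(hidden_layer_size),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="rmsprop", loss="mse")
    history = model.fit(x, y, epochs=3, validation_split=0.2, verbose=0)
    results[hidden_layer_size] = history.history["val_loss"][-1]

print(results)  # keep the size with the lowest validation loss
```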