Gated recurrent unit

In artificial neural networks, the gated recurrent unit (GRU) is a gating mechanism for recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al. [1] The GRU is like a long short-term memory (LSTM) unit in that it uses gates to control which features are input or forgotten, [2] but it lacks a context vector and an output gate, and therefore has fewer parameters than the LSTM. [3] The GRU's performance on certain tasks of polyphonic music modeling, speech signal modeling and natural language processing was found to be similar to that of the LSTM. [4] [5] GRUs showed that gating is indeed helpful in general, although Bengio's team came to no concrete conclusion on which of the two gating units was better. [6] [7]

Architecture

There are several variations on the full gated unit, with gating done using the previous hidden state and the bias in various combinations, as well as a simplified form called the minimal gated unit. [8]

The operator $\odot$ denotes the Hadamard (element-wise) product in the following.

Fully gated unit

Figure: Gated recurrent unit, fully gated version.

The fully gated unit is defined, for each time step $t$, by

$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\hat{h}_t &= \phi(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \hat{h}_t
\end{aligned}
$$

Initially, for $t = 0$, the output vector is $h_0 = 0$.

Variables ($d$ denotes the number of input features and $e$ the number of output features):

  - $x_t \in \mathbb{R}^d$: input vector
  - $h_t \in \mathbb{R}^e$: output vector
  - $\hat{h}_t \in \mathbb{R}^e$: candidate activation vector
  - $z_t \in (0,1)^e$: update gate vector
  - $r_t \in (0,1)^e$: reset gate vector
  - $W \in \mathbb{R}^{e \times d}$, $U \in \mathbb{R}^{e \times e}$ and $b \in \mathbb{R}^e$: parameter matrices and vector

Activation functions

  - $\sigma$: the original is a sigmoid function.
  - $\phi$: the original is a hyperbolic tangent.

Alternative activation functions are possible, provided that $\sigma(x) \in [0, 1]$.
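As a concrete illustration of the equations above, here is a minimal NumPy sketch of a single step of the fully gated unit; the function and variable names, parameter shapes and toy sequence are assumptions made for the example, not part of the article.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, U_z, b_z, W_r, U_r, b_r, W_h, U_h, b_h):
    """One step of the fully gated GRU: x_t has d features, h_prev has e features."""
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev + b_z)             # update gate
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev + b_r)             # reset gate
    h_hat = np.tanh(W_h @ x_t + U_h @ (r_t * h_prev) + b_h)   # candidate activation
    return (1.0 - z_t) * h_prev + z_t * h_hat                 # new output vector h_t

# Toy usage: d = 4 input features, e = 3 output features, h_0 = 0.
d, e = 4, 3
rng = np.random.default_rng(0)
shapes = [(e, d), (e, e), (e,), (e, d), (e, e), (e,), (e, d), (e, e), (e,)]
params = [0.1 * rng.standard_normal(s) for s in shapes]
h = np.zeros(e)                           # h_0 = 0
for x in rng.standard_normal((5, d)):     # a toy sequence of 5 time steps
    h = gru_step(x, h, *params)
print(h)
```

Running the loop over a sequence simply feeds each new input together with the previous output vector, starting from $h_0 = 0$.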

Figures: gated recurrent unit variants of type 1, type 2 and type 3.

Alternate forms can be created by changing $z_t$ and $r_t$: [9]

  - Type 1, each gate depends only on the previous hidden state and the bias: $z_t = \sigma(U_z h_{t-1} + b_z)$, $r_t = \sigma(U_r h_{t-1} + b_r)$
  - Type 2, each gate depends only on the previous hidden state: $z_t = \sigma(U_z h_{t-1})$, $r_t = \sigma(U_r h_{t-1})$
  - Type 3, each gate is computed using only the bias: $z_t = \sigma(b_z)$, $r_t = \sigma(b_r)$
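The variants only change how the gate vectors are computed; a minimal sketch of the update gate under the three types, assuming the same NumPy conventions as the earlier example (only $z_t$ is shown, $r_t$ is modified the same way), is:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def update_gate(variant, x_t, h_prev, W_z, U_z, b_z):
    if variant == "full":     # standard GRU gate, shown for comparison
        return sigmoid(W_z @ x_t + U_z @ h_prev + b_z)
    if variant == "type1":    # previous hidden state and bias only
        return sigmoid(U_z @ h_prev + b_z)
    if variant == "type2":    # previous hidden state only
        return sigmoid(U_z @ h_prev)
    if variant == "type3":    # bias only
        return sigmoid(b_z)
    raise ValueError(f"unknown variant: {variant}")
```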

Minimal gated unit

The minimal gated unit (MGU) is similar to the fully gated unit, except that the update and reset gate vectors are merged into a single forget gate. This also implies that the equation for the output vector must be changed: [10]

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
\hat{h}_t &= \phi(W_h x_t + U_h (f_t \odot h_{t-1}) + b_h) \\
h_t &= (1 - f_t) \odot h_{t-1} + f_t \odot \hat{h}_t
\end{aligned}
$$

Variables

  - $x_t$: input vector
  - $h_t$: output vector
  - $\hat{h}_t$: candidate activation vector
  - $f_t$: forget vector
  - $W$, $U$ and $b$: parameter matrices and vector
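A minimal sketch of one MGU step, under the same assumptions as the earlier GRU example (helper names and shapes are illustrative only):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mgu_step(x_t, h_prev, W_f, U_f, b_f, W_h, U_h, b_h):
    """One MGU step: a single forget gate replaces the update and reset gates."""
    f_t = sigmoid(W_f @ x_t + U_f @ h_prev + b_f)             # forget gate
    h_hat = np.tanh(W_h @ x_t + U_h @ (f_t * h_prev) + b_h)   # candidate activation
    return (1.0 - f_t) * h_prev + f_t * h_hat                 # new output vector h_t
```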

Light gated recurrent unit

The light gated recurrent unit (LiGRU) [4] removes the reset gate altogether, replaces tanh with the ReLU activation, and applies batch normalization (BN):

$$
\begin{aligned}
z_t &= \sigma(\operatorname{BN}(W_z x_t) + U_z h_{t-1}) \\
\tilde{h}_t &= \operatorname{ReLU}(\operatorname{BN}(W_h x_t) + U_h h_{t-1}) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
$$
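A minimal sketch of one LiGRU step over a mini-batch, assuming a simplified batch-normalization helper without learned gain and shift (the names and shapes are illustrative, not from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def batch_norm(a, eps=1e-5):
    # Normalize each feature over the batch axis; learned gain/shift omitted for brevity.
    return (a - a.mean(axis=0)) / np.sqrt(a.var(axis=0) + eps)

def ligru_step(X_t, H_prev, W_z, U_z, W_h, U_h):
    """X_t: (batch, d) inputs; H_prev: (batch, e) previous output vectors."""
    Z_t = sigmoid(batch_norm(X_t @ W_z.T) + H_prev @ U_z.T)             # update gate
    H_hat = np.maximum(0.0, batch_norm(X_t @ W_h.T) + H_prev @ U_h.T)   # ReLU candidate
    return Z_t * H_prev + (1.0 - Z_t) * H_hat                           # new output vectors
```

Batch normalization is applied only to the input projections, which is why the sketch operates on a whole mini-batch rather than a single vector.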

LiGRU has been studied from a Bayesian perspective. [11] This analysis yielded a variant called light Bayesian recurrent unit (LiBRU), which showed slight improvements over the LiGRU on speech recognition tasks.

References

  1. Cho, Kyunghyun; van Merrienboer, Bart; Bahdanau, Dzmitry; Bengio, Yoshua (2014). "On the Properties of Neural Machine Translation: Encoder–Decoder Approaches". Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation: 103–111. doi:10.3115/v1/W14-4012.
  2. Felix Gers; Jürgen Schmidhuber; Fred Cummins (1999). "Learning to forget: Continual prediction with LSTM". 9th International Conference on Artificial Neural Networks: ICANN '99. Vol. 1999. pp. 850–855. doi:10.1049/cp:19991218. ISBN 0-85296-721-7.
  3. "Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano – WildML". Wildml.com. 2015-10-27. Archived from the original on 2021-11-10. Retrieved May 18, 2016.
  4. Ravanelli, Mirco; Brakel, Philemon; Omologo, Maurizio; Bengio, Yoshua (2018). "Light Gated Recurrent Units for Speech Recognition". IEEE Transactions on Emerging Topics in Computational Intelligence. 2 (2): 92–102. arXiv:1803.10225. doi:10.1109/TETCI.2017.2762739. S2CID 4402991.
  5. Su, Yuanhang; Kuo, Jay (2019). "On extended long short-term memory and dependent bidirectional recurrent neural network". Neurocomputing. 356: 151–161. arXiv:1803.01686. doi:10.1016/j.neucom.2019.04.044. S2CID 3675055.
  6. Chung, Junyoung; Gulcehre, Caglar; Cho, KyungHyun; Bengio, Yoshua (2014). "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling". arXiv:1412.3555 [cs.NE].
  7. Gruber, N.; Jockisch, A. (2020). "Are GRU cells more specific and LSTM cells more sensitive in motive classification of text?". Frontiers in Artificial Intelligence. 3: 40. doi:10.3389/frai.2020.00040. PMC 7861254. PMID 33733157. S2CID 220252321.
  8. Chung, Junyoung; Gulcehre, Caglar; Cho, KyungHyun; Bengio, Yoshua (2014). "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling". arXiv:1412.3555 [cs.NE].
  9. Dey, Rahul; Salem, Fathi M. (2017-01-20). "Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks". arXiv:1701.05923 [cs.NE].
  10. Heck, Joel; Salem, Fathi M. (2017-01-12). "Simplified Minimal Gated Unit Variations for Recurrent Neural Networks". arXiv:1701.03452 [cs.NE].
  11. Bittar, Alexandre; Garner, Philip N. (May 2021). "A Bayesian Interpretation of the Light Gated Recurrent Unit". ICASSP 2021: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, ON, Canada: IEEE. pp. 2965–2969. doi:10.1109/ICASSP39728.2021.9414259.