Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al. [1] The GRU is like a long short-term memory (LSTM) with a gating mechanism to input or forget certain features, [2] but lacks a context vector or output gate, resulting in fewer parameters than the LSTM. [3] The GRU's performance on certain tasks of polyphonic music modeling, speech signal modeling and natural language processing was found to be similar to that of the LSTM. [4] [5] These comparisons showed that gating is indeed helpful in general, but Bengio's team came to no concrete conclusion on which of the two gating units was better. [6] [7]
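As a rough illustration of the parameter saving, the sketch below counts weights for a single layer. It assumes each gate or candidate block has one input matrix of shape e x d, one recurrent matrix of shape e x e and one bias of length e, where d is the number of input features and e the number of output features. The function names are hypothetical, and library implementations may count differently (for example by keeping separate input and recurrent biases, or by adding peephole or projection weights).

```python
def block_param_count(d: int, e: int) -> int:
    """Parameters in one block: W (e x d), U (e x e) and b (e)."""
    return e * d + e * e + e


def gru_param_count(d: int, e: int) -> int:
    """Assumed GRU layout: update gate, reset gate and candidate activation."""
    return 3 * block_param_count(d, e)


def lstm_param_count(d: int, e: int) -> int:
    """Assumed LSTM layout: input, forget and output gates plus cell candidate."""
    return 4 * block_param_count(d, e)
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same size, whatever d and e are.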



There are several variations on the full gated unit, with gating done using the previous hidden state and the bias in various combinations, and a simplified form called the minimal gated unit. [8]

The operator $\odot$ denotes the Hadamard product in the following.

Fully gated unit

Figure: Gated Recurrent Unit, fully gated version.

Initially, for $t = 0$, the output vector is $h_0 = 0$.
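For later time steps, the fully gated unit is commonly written as the recurrence below. The notation is an assumption chosen for this sketch: $x_t$ is the input vector, $h_t$ the output vector, $\hat{h}_t$ the candidate activation, $z_t$ and $r_t$ the update and reset gates, $W$, $U$ and $b$ the learned parameters, $\sigma$ the logistic function, $\phi$ the hyperbolic tangent and $\odot$ the Hadamard product.

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\hat{h}_t &= \phi(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \hat{h}_t
\end{aligned}
```

Some sources swap the roles of $z_t$ and $1 - z_t$ in the last line. The two conventions describe the same model up to a relabeling of the update gate.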

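Variables ($d$ denotes the number of input features and $e$ the number of output features):

$x_t \in \mathbb{R}^{d}$: input vector
$h_t \in \mathbb{R}^{e}$: output vector
$\hat{h}_t \in \mathbb{R}^{e}$: candidate activation vector
$z_t \in (0, 1)^{e}$: update gate vector
$r_t \in (0, 1)^{e}$: reset gate vector
$W \in \mathbb{R}^{e \times d}$, $U \in \mathbb{R}^{e \times e}$ and $b \in \mathbb{R}^{e}$: parameter matrices and bias vector learned during training

Activation functions

$\sigma$: gate activation, originally the logistic function
$\phi$: candidate activation, originally the hyperbolic tangent

Alternative activation functions are possible, provided that $\sigma(x) \in [0, 1]$.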
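A minimal NumPy sketch of one fully gated step may help make the data flow concrete. It assumes the common formulation with a logistic gate activation, a tanh candidate activation and the output rule h_t = (1 - z_t) * h_{t-1} + z_t * h_hat_t. The helper names (`init_params`, `gru_step`, `gru_forward`) and the dictionary layout of the parameters are hypothetical choices for this example, not part of any library.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, used here as the gate activation."""
    return 1.0 / (1.0 + np.exp(-x))


def init_params(d, e, seed=0):
    """Create small random parameters for the z, r and h blocks.

    d is the number of input features, e the number of output features.
    """
    rng = np.random.default_rng(seed)
    params = {}
    for name in ("z", "r", "h"):
        params[f"W_{name}"] = rng.normal(scale=0.1, size=(e, d))
        params[f"U_{name}"] = rng.normal(scale=0.1, size=(e, e))
        params[f"b_{name}"] = np.zeros(e)
    return params


def gru_step(x_t, h_prev, p):
    """Compute h_t from x_t and h_{t-1} for a fully gated unit."""
    # Update gate: how much of the candidate replaces the old state.
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])
    # Reset gate: how much of the old state feeds the candidate.
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])
    # Candidate activation; '*' is the elementwise (Hadamard) product.
    h_hat = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r_t * h_prev) + p["b_h"])
    # Elementwise interpolation between the old state and the candidate.
    return (1.0 - z_t) * h_prev + z_t * h_hat


def gru_forward(xs, p):
    """Run the unit over a sequence xs of shape (T, d), starting from h_0 = 0."""
    h = np.zeros(p["b_z"].shape[0])
    outputs = []
    for x_t in xs:
        h = gru_step(x_t, h, p)
        outputs.append(h)
    return np.stack(outputs)
```

Calling `gru_forward(np.ones((5, 4)), init_params(d=4, e=3))` returns an array of shape (5, 3), one output vector per time step. Because each step interpolates between the previous state and a tanh output, every component stays strictly between -1 and 1 when starting from zero.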

Figures: gated recurrent unit variants Type 1, Type 2 and Type 3.

Alternate forms can be created by changing $z_t$ and $r_t$. [9]
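One way to write these variants, assuming the usual notation ($z_t$ and $r_t$ for the update and reset gates, $h_{t-1}$ for the previous hidden state, $U$ and $b$ for the recurrent matrices and biases, $\sigma$ for the gate activation), is sketched below. Type 1 computes each gate from the previous hidden state and the bias, Type 2 from the previous hidden state only, and Type 3 from the bias only. The pairing of these formulas with the type numbers is taken to follow the cited variants and should be checked against the reference.

```latex
\begin{aligned}
& \text{Type 1:} & z_t &= \sigma(U_z h_{t-1} + b_z), & r_t &= \sigma(U_r h_{t-1} + b_r) \\
& \text{Type 2:} & z_t &= \sigma(U_z h_{t-1}), & r_t &= \sigma(U_r h_{t-1}) \\
& \text{Type 3:} & z_t &= \sigma(b_z), & r_t &= \sigma(b_r)
\end{aligned}
```

In all three cases the gates no longer depend directly on the current input $x_t$, which removes the input weight matrices of the gates and reduces the parameter count further.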

Minimal gated unit

The minimal gated unit (MGU) is similar to the fully gated unit, except that the update and reset gate vectors are merged into a single forget gate. This also implies that the equation for the output vector must be changed. [10]
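The MGU recurrence is commonly written as follows. The notation is assumed for this sketch: $f_t$ is the forget gate vector, $x_t$ the input, $h_t$ the output, $\hat{h}_t$ the candidate activation, $W$, $U$ and $b$ the learned parameters, $\sigma$ the logistic function, $\phi$ the hyperbolic tangent and $\odot$ the Hadamard product.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
\hat{h}_t &= \phi(W_h x_t + U_h (f_t \odot h_{t-1}) + b_h) \\
h_t &= (1 - f_t) \odot h_{t-1} + f_t \odot \hat{h}_t
\end{aligned}
```

The single gate $f_t$ plays both roles: it scales the previous state inside the candidate, as the reset gate would, and it sets the interpolation weight in the output, as the update gate would.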


Light gated recurrent unit

The light gated recurrent unit (LiGRU) [4] removes the reset gate altogether, replaces tanh with the ReLU activation, and applies batch normalization (BN).
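The LiGRU recurrence is usually stated along the following lines. The notation is assumed for this sketch: $z_t$ is the update gate, $\tilde{h}_t$ the candidate activation, $x_t$ the input, $h_t$ the output, $W$ and $U$ the learned matrices, $\sigma$ the logistic function and $\odot$ the Hadamard product. Batch normalization is applied to the feed-forward terms only, and its learned shift is taken to stand in for an explicit bias.

```latex
\begin{aligned}
z_t &= \sigma(\operatorname{BN}(W_z x_t) + U_z h_{t-1}) \\
\tilde{h}_t &= \operatorname{ReLU}(\operatorname{BN}(W_h x_t) + U_h h_{t-1}) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
```

In this form $z_t$ weights the previous state rather than the candidate, which is the opposite labeling from the one often used for the fully gated unit.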

LiGRU has been studied from a Bayesian perspective. [11] This analysis yielded a variant called the light Bayesian recurrent unit (LiBRU), which showed slight improvements over the LiGRU on speech recognition tasks.

