Backpropagation through time

Backpropagation through time (BPTT) is a gradient-based technique for training certain types of recurrent neural networks, such as Elman networks. The algorithm was independently derived by numerous researchers.

The training data for a recurrent neural network is an ordered sequence of k input–output pairs, ⟨a_0, y_0⟩, ⟨a_1, y_1⟩, ⟨a_2, y_2⟩, ..., ⟨a_{k−1}, y_{k−1}⟩. An initial value must be specified for the hidden state x_0; typically, a vector of all zeros is used for this purpose.

In the truncated version of BPTT, the training data contains n input–output pairs, but the network is unfolded for only k time steps at a time; a sketch of this procedure is given below.

BPTT tends to be significantly faster for training recurrent neural networks than general-purpose optimization techniques such as evolutionary optimization.

BPTT has difficulty with local optima. With recurrent neural networks, local optima are a much more significant problem than with feed-forward neural networks, because the recurrent feedback in such networks tends to create chaotic responses in the error surface, causing local optima to occur frequently and in poor locations on the error surface.
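The following NumPy sketch shows one way to realize the truncated procedure described above for a small Elman network. The dimensions, learning rate, helper names (step, train_truncated_bptt), and the choice of a squared-error loss on the final prediction of each k-step window are assumptions chosen for illustration, not part of the algorithm's definition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions and hyperparameters (assumptions for this example).
n_in, n_hidden, n_out = 3, 8, 2
k = 4          # number of unfolded time steps (truncation length)
lr = 0.01      # learning rate

# Elman-network parameters: hidden recurrence and linear readout.
W_in = rng.normal(0, 0.1, (n_hidden, n_in))      # input  -> hidden
W_rec = rng.normal(0, 0.1, (n_hidden, n_hidden)) # hidden -> hidden
b_h = np.zeros(n_hidden)
W_out = rng.normal(0, 0.1, (n_out, n_hidden))    # hidden -> output
b_o = np.zeros(n_out)

def step(x, a):
    """One application of the recurrence: new hidden state from old state and input."""
    return np.tanh(W_in @ a + W_rec @ x + b_h)

def train_truncated_bptt(inputs, targets, epochs=10):
    """Truncated BPTT: unfold the network for k steps, backpropagate the error at the
    end of each window through all k copies, sum the gradients, and update once."""
    global W_in, W_rec, b_h, W_out, b_o
    n = len(inputs)
    for _ in range(epochs):
        x = np.zeros(n_hidden)                    # initial hidden state: all zeros
        for t in range(n - k):
            # Forward pass over the k-step unfolded window, caching hidden states.
            states = [x]
            for j in range(k):
                states.append(step(states[-1], inputs[t + j]))
            p = W_out @ states[-1] + b_o          # prediction at the end of the window
            e = p - targets[t + k - 1]            # gradient of 0.5*||p - y||^2 w.r.t. p

            # Backward pass: accumulate gradients over the k unfolded copies.
            gW_in = np.zeros_like(W_in)
            gW_rec = np.zeros_like(W_rec)
            gb_h = np.zeros_like(b_h)
            gW_out = np.outer(e, states[-1])
            gb_o = e
            dh = W_out.T @ e                      # gradient entering the last hidden state
            for j in reversed(range(k)):
                dz = dh * (1.0 - states[j + 1] ** 2)   # back through tanh
                gW_in += np.outer(dz, inputs[t + j])
                gW_rec += np.outer(dz, states[j])
                gb_h += dz
                dh = W_rec.T @ dz                 # pass gradient to the previous copy

            # One update from the summed gradients, then slide the window by one step.
            W_in -= lr * gW_in; W_rec -= lr * gW_rec; b_h -= lr * gb_h
            W_out -= lr * gW_out; b_o -= lr * gb_o
            x = step(x, inputs[t])                # carry the context forward one time step

# Example usage on a toy sequence (purely illustrative data).
T = 50
seq_in = rng.normal(size=(T, n_in))
seq_out = rng.normal(size=(T, n_out))
train_truncated_bptt(seq_in, seq_out, epochs=10)
```

Because the k unfolded copies share the same weights, their gradient contributions are summed before a single update, and the hidden state is carried forward by one step so that successive windows overlap.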

[ "Recurrent neural network", "Backpropagation" ]