For this question we will attempt to understand why gradients vanish and explode in RNNs.

For this question we will attempt to understand why gradients vanish and explode in RNNs. To make the calculations simpler we will ignore the input vector at each timestep as well as the bias, so the update equation is given by $h_{t+1} = \sigma(W h_t)$.

a) Compute the derivative $\frac{dh_t}{dh_1}$.

b) Explain why the value of this derivative will become very small (vanish) or very large (explode) as the number of timesteps $t$ increases.

c) Explain why vanishing and exploding gradients mean that the model will "forget".

d) In our calculations we ignored the inputs at each step and the bias. If we use the full update equation $h_{t+1} = \sigma(W_1 h_t + W_2 x_{t+1} + b)$, do our conclusions about vanishing and exploding gradients still hold?

e) We have seen that the LSTM and GRU cells can help to avoid exploding and vanishing gradients. Can you think of any other ways to change a simple RNN in order to increase its memory?
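As a sanity check for parts (a) and (b): applying the chain rule to $h_{t+1} = \sigma(W h_t)$ gives $\frac{dh_t}{dh_1} = \prod_{k=1}^{t-1} \mathrm{diag}\!\left(\sigma'(W h_k)\right) W$, a product of $t-1$ Jacobian factors, so its norm tends to shrink or grow roughly geometrically in $t$. Below is a minimal NumPy sketch of that effect, not part of the original question; the $\tanh$ nonlinearity, hidden size, weight scales, and random seed are all illustrative assumptions standing in for $\sigma$ and $W$.

```python
import numpy as np

def jacobian_norm(W, h0, t):
    """Spectral norm of dh_t/dh_1 for the recurrence h_{k+1} = tanh(W h_k).

    By the chain rule, dh_t/dh_1 is the ordered product of the per-step
    Jacobians diag(tanh'(W h_k)) @ W, so every extra timestep multiplies
    in one more factor.
    """
    h = h0
    J = np.eye(len(h0))            # dh_1/dh_1 = identity
    for _ in range(t - 1):
        a = W @ h                  # pre-activation
        d = 1.0 - np.tanh(a) ** 2  # elementwise tanh'(a)
        J = (d[:, None] * W) @ J   # diag(d) @ W @ J: one more chain-rule step
        h = np.tanh(a)
    return np.linalg.norm(J, 2)    # largest singular value of the product

rng = np.random.default_rng(0)
n = 8                              # hypothetical hidden size
h0 = rng.standard_normal(n)
# Hypothetical weight scales: a small g tends to make each Jacobian factor
# a contraction, while a large g tends to make the product grow
# (though tanh saturation can still damp it).
for g in (0.25, 4.0):
    W = g * rng.standard_normal((n, n)) / np.sqrt(n)
    norms = [jacobian_norm(W, h0, t) for t in (5, 25, 50)]
    print(f"g={g}:", ", ".join(f"t={t}: {x:.2e}"
                               for t, x in zip((5, 25, 50), norms)))
```

With the small weight scale the printed norms typically collapse toward zero as $t$ grows, while the large scale typically drives them upward, which is exactly the vanishing/exploding behaviour part (b) asks about.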