For this question we will attempt to understand why gradients vanish and explode in RNNs.

For this question we will attempt to understand why gradients vanish and explode in RNNs. To make the calculations simpler we will ignore the input vector at each timestep as well as the bias, so the update equation is given by $h_{t+1} = \sigma(W h_t)$.

a) Compute the derivative $\frac{dh_t}{dh_1}$.

b) Explain why the value of this derivative will become very small (vanish) or very large (explode) as the number of timesteps $t$ increases.

c) Explain why vanishing and exploding gradients mean that the model will "forget".

d) In our calculations we ignored the inputs at each step and the bias. If we use the full update equation $h_{t+1} = \sigma(W_1 h_t + W_2 x_{t+1} + b)$, do our conclusions about vanishing and exploding gradients still hold?

e) We have seen that the LSTM and GRU cells can help to avoid exploding and vanishing gradients. Can you think of any other ways to change a simple RNN in order to increase its memory?
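As a sanity check for parts (a) and (b): applying the chain rule to $h_{t+1} = \sigma(W h_t)$ gives $\frac{dh_t}{dh_1} = \prod_{k=1}^{t-1} \mathrm{diag}\!\left(\sigma'(W h_k)\right) W$, a product of $t-1$ Jacobian factors, so its norm tends to shrink or grow roughly geometrically in $t$. Below is a minimal NumPy sketch of that effect, not part of the original question; the $\tanh$ nonlinearity, hidden size, weight scales, and random seed are all illustrative assumptions standing in for $\sigma$ and $W$.

```python
import numpy as np

def jacobian_norm(W, h0, t):
    """Spectral norm of dh_t/dh_1 for the recurrence h_{k+1} = tanh(W h_k).

    By the chain rule, dh_t/dh_1 is the ordered product of the per-step
    Jacobians diag(tanh'(W h_k)) @ W, so every extra timestep multiplies
    in one more factor.
    """
    h = h0
    J = np.eye(len(h0))            # dh_1/dh_1 = identity
    for _ in range(t - 1):
        a = W @ h                  # pre-activation
        d = 1.0 - np.tanh(a) ** 2  # elementwise tanh'(a)
        J = (d[:, None] * W) @ J   # diag(d) @ W @ J: one more chain-rule step
        h = np.tanh(a)
    return np.linalg.norm(J, 2)    # largest singular value of the product

rng = np.random.default_rng(0)
n = 8                              # hypothetical hidden size
h0 = rng.standard_normal(n)
# Hypothetical weight scales: a small g tends to make each Jacobian factor
# a contraction, while a large g tends to make the product grow
# (though tanh saturation can still damp it).
for g in (0.25, 4.0):
    W = g * rng.standard_normal((n, n)) / np.sqrt(n)
    norms = [jacobian_norm(W, h0, t) for t in (5, 25, 50)]
    print(f"g={g}:", ", ".join(f"t={t}: {x:.2e}"
                               for t, x in zip((5, 25, 50), norms)))
```

With the small weight scale the printed norms typically collapse toward zero as $t$ grows, while the large scale typically drives them upward, which is exactly the vanishing/exploding behaviour part (b) asks about.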