SGD with momentum is a variation of SGD where we use the gradient at the previous iteration in the current update.
answerhappygod
SGD with momentum is a variation of SGD in which the gradient at the previous iteration also enters the current update. The update with momentum is as follows:

$$w^{(t+1)} = w^{(t)} - \alpha \frac{\partial L}{\partial w}\left(w^{(t)}\right) - \gamma \frac{\partial L}{\partial w}\left(w^{(t-1)}\right)$$

$L(w)$ is the objective function shown in Figure 1, and $\alpha, \gamma > 0$ are the learning rates. For iterations $t = 0$ and $t = -1$, we set $w^{(t)} = w_0$, the initial guess.

[Figure 1: Graph of objective function L]

Answer the following with justification:

a) Assume that $w_0 = -8$ (the flat region). Does basic SGD ($\gamma = 0$) terminate at the minimum? Why or why not?

b) Assume that $w_0 = -6$ (a sloped region). Does basic SGD ($\gamma = 0$) terminate at the minimum? Why or why not?

c) Assume that $w_0 = -8$ (the flat region). Does SGD with momentum ($\gamma > 0$) terminate at the minimum? Why or why not?

d) Assume that $w_0 = -6$ (a sloped region). Does SGD with momentum ($\gamma > 0$) terminate at the minimum? Why or why not?
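To experiment with the four cases, the update rule can be sketched in a few lines of Python. The figure is not available here, so `grad_L` below is the gradient of a hypothetical stand-in objective, L(w) = min(w²/8, 49/8), chosen only because it is flat at w = -8 and sloped at w = -6; the real objective may behave differently. The function name `run` and the values of `alpha`, `gamma`, and `steps` are likewise assumptions made for illustration.

```python
def grad_L(w):
    """Gradient of a HYPOTHETICAL stand-in objective L(w) = min(w**2 / 8, 49 / 8).

    Flat (zero gradient) for |w| >= 7, sloped toward the minimum at w = 0 otherwise.
    This is not the objective from Figure 1, only a shape with the same two regions.
    """
    return w / 4.0 if abs(w) < 7.0 else 0.0


def run(w0, alpha, gamma, steps=200):
    """Iterate w(t+1) = w(t) - alpha * L'(w(t)) - gamma * L'(w(t-1))."""
    w_prev, w = w0, w0  # the problem sets w(-1) = w(0) = w0
    for _ in range(steps):
        w_next = w - alpha * grad_L(w) - gamma * grad_L(w_prev)
        w_prev, w = w, w_next
    return w


if __name__ == "__main__":
    for w0 in (-8.0, -6.0):
        for gamma in (0.0, 0.3):  # gamma = 0 is basic SGD
            print(f"w0={w0}, gamma={gamma}: final w = {run(w0, 0.5, gamma)}")
```

One point holds for any objective, not just this stand-in: if the gradient is zero at w0, then both gradient terms vanish at t = 0 (because w(-1) = w(0) = w0), so the iterate never leaves w0 regardless of γ. Whether a start in the sloped region reaches the minimum depends on the actual shape of L and on the step sizes.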