
Task 5: Update weights

Posted: Mon May 23, 2022 10:14 am
by answerhappygod
Task 5: Update weights

Once you have calculated the gradients you can update the weights using gradient descent. In this case you will use stochastic gradient descent (SGD), because we are estimating the gradients on mini-batches of data. Previously we would calculate the gradients using the entire dataset (an entire batch), although confusingly you will often see the size of the mini-batches set by a hyperparameter described as the batch size, when what is really meant is the mini-batch size. Strictly speaking this should be called mini-batch gradient descent, since stochastic gradient descent is intended for the case where an update is performed on every individual example; however, mini-batch gradient descent is commonly referred to as stochastic gradient descent in most contexts.

The only change here is that we need to average the gradients over the number of examples used for training (the mini-batch size). In the backprop stage we already summed the gradients across the examples, so during the update you can simply divide by the mini-batch size. E.g. for standard SGD:

$W := W - \eta \, \frac{\nabla W}{n_{\text{examples}}}$

where $\eta$ is the learning rate. There are modifications to this learning rule, such as momentum terms, and other, more elaborate, learning rules such as RMSProp, Adagrad and Adam.
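As a rough illustration of the rule above (not part of the assignment template; the function names and the momentum coefficient are made up for this sketch), the averaged SGD step for a single weight matrix, and a simple momentum variant, might look like this in NumPy:

import numpy as np

# Sketch: averaged SGD step for one weight matrix W.
# grad_W is assumed to be the gradient *summed* over the mini-batch,
# as produced in the backprop stage, so we divide by n_examples.
def sgd_step(W, grad_W, eta, n_examples):
    return W - eta * grad_W / n_examples

# Sketch of a momentum variant (one of the modifications mentioned above):
# keep a running velocity and move the weights along it.
def sgd_momentum_step(W, velocity, grad_W, eta, n_examples, mu=0.9):
    velocity = mu * velocity - eta * grad_W / n_examples
    return W + velocity, velocity

Dividing by n_examples inside the step assumes, as stated above, that the gradients were summed rather than averaged over the mini-batch during backprop.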
In [ ]:

def update(grads, eta, params, n_examples):
    """Update parameters

    Arguments:
        grads (dict): The gradients calculated during backprop
        eta (float): The learning rate
        params (dict): The current parameter values
        n_examples (int): The number of examples in the mini-batch

    Returns:
        updated_params (dict): A dictionary of the updated parameters
    """
    updated_params = {
        'W1': None,
        'b1': None,
        'W2': None,
        'b2': None
    }
    # YOUR CODE HERE
    raise NotImplementedError()
    return updated_params

In [ ]:

# free

In [ ]:

# uses numpy (np), numpy.testing (npt) and the init, forward and backprop
# functions defined in the earlier tasks
sizes_grade = [2, 3, 2]
# set seed for repeatable test initialisation
np.random.seed(0)
params_grade = init(sizes_grade)
eta_grade = 0.01
X_grade = np.array([[0.1, -0.2], [-0.1, 0.2]])
y_grade = np.array([[0, 1], [1, 0]])
n_examples_grade = X_grade.shape[0]
outputs_grade = forward(params_grade, X_grade)
grads_grade = backprop(outputs_grade, params_grade, X_grade, y_grade)
updated_params_grade = update(grads_grade, eta_grade, params_grade, n_examples_grade)
updated_params_test = {
    'W1': np.array([[ 1.76399035,  0.4001897,   0.97879742],
                    [ 2.2410172,   1.86749301, -0.97739675]]),
    'b1': np.array([ 0.00061999, -0.0003249,   0.00059436]),
    'W2': np.array([[ 0.95023747, -0.15153479],
                    [-0.10303594,  0.41038059],
                    [ 0.14385645,  1.45441192]]),
    'b2': np.array([-8.94551854e-05, -1.81529826e-04])
}
for param, value in updated_params_test.items():
    npt.assert_allclose(updated_params_grade[param], value, rtol=1e-5)
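For comparison, here is one way the update cell could be completed, assuming that backprop returns its gradients under the same keys as params ('W1', 'b1', 'W2', 'b2') and summed over the mini-batch. This is only a sketch under those assumptions, not the official solution:

def update(grads, eta, params, n_examples):
    # Sketch: apply W := W - eta * grad / n_examples to every parameter,
    # assuming grads shares the keys of params and holds summed gradients.
    updated_params = {
        name: value - eta * grads[name] / n_examples
        for name, value in params.items()
    }
    return updated_params

A dict comprehension keeps the rule identical for weights and biases; if the gradient keys differ from the parameter keys (for example if they are prefixed with 'd'), the lookup would need to change accordingly.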