Deep Learning Assignment 3
1. The diagram below shows the forward computations for a node implementing the Batch Normalization algorithm. Assuming that the gradient flowing into the node from the right is ∂L/∂y and its value is known, derive expressions for the following other gradients, which are needed to do Backprop on the parameters (γ, β):
∂L/∂γ, ∂L/∂β.
Hint: First compute ∂L/∂x̂.
(8 points)
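A sketch of the setup, assuming the standard Batch Norm output stage y_i = γx̂_i + β over a mini-batch of size m (the symbols γ, β, x̂ follow the usual formulation and may differ from the diagram's labels):

```latex
% Batch Norm output stage: y_i = \gamma \hat{x}_i + \beta, mini-batch of size m.
% The parameter gradients follow directly from the chain rule:
\frac{\partial L}{\partial \gamma} = \sum_{i=1}^{m} \frac{\partial L}{\partial y_i}\,\hat{x}_i,
\qquad
\frac{\partial L}{\partial \beta} = \sum_{i=1}^{m} \frac{\partial L}{\partial y_i}
```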
2. Explain the concept of Model Capacity and the relationship between Model Capacity and Data Complexity.
Why is it better to increase Model Capacity by adding layers to a Deep Feed Forward Neural Network, as opposed to adding more nodes per layer?
(5 points)
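As one concrete (purely illustrative, not part of the question) way to see the trade-off, one can compare parameter counts for a deeper versus a wider extension of the same base network; the layer sizes below are arbitrary choices:

```python
# Illustrative only: parameter counts for growing capacity by depth vs width.
# A dense layer mapping n_in inputs to n_out outputs has n_in*n_out weights
# plus n_out biases.
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

def mlp_params(layer_sizes):
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

deep = mlp_params([100, 32, 32, 32, 10])  # three narrow hidden layers
wide = mlp_params([100, 64, 10])          # one wide hidden layer
print(deep, wide)                         # the deeper net here uses fewer parameters
```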
3. The Keras Model for classifying IMDB Movie Reviews was discussed as part of
Lecture 6. Experiment with the following:
(a) Gradient Descent Algorithms
(b) Regularization Techniques
(and combinations of the two) that were discussed in Week 4, in order to improve the Accuracy of this model. Which choice of these algorithms works best?
(12 points)
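A minimal sketch of how such an experiment could be organized, assuming the usual two-hidden-layer Dense IMDB model (the exact lecture architecture may differ); `build_model` and its parameters are illustrative names, not lecture code:

```python
# Sketch: IMDB classifier with a switchable optimizer and optional
# L2 regularization / Dropout, so combinations can be compared.
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_model(optimizer="rmsprop", l2=0.0, dropout=0.0):
    reg = regularizers.l2(l2) if l2 else None
    model = keras.Sequential([
        layers.Dense(16, activation="relu", kernel_regularizer=reg),
        layers.Dropout(dropout),
        layers.Dense(16, activation="relu", kernel_regularizer=reg),
        layers.Dropout(dropout),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=optimizer,
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# e.g. compare build_model("adam") against build_model("sgd", l2=1e-3, dropout=0.5),
# fitting each on the same multi-hot-encoded IMDB data with a validation split.
```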
4. Consider a Dense Feed Forward Network, of the type shown below, composed of an Input with 2 nodes, followed by a Hidden Layer with 3 nodes and finally an Output Layer with 1 node. Assume the activation function for the hidden layer is given by ReLU, the output activation is given by the sigmoid function and the Loss Function is the Binary Cross Entropy.
a. Compute the number of parameters (weights and biases) required to
describe the network.
(5 points)
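The count for part (a) can be checked mechanically: each dense layer contributes (inputs × outputs) weights plus one bias per output node. A quick sketch:

```python
# Parameter count for the dense 2 -> 3 -> 1 network of this question.
def dense_params(n_in, n_out):
    return n_in * n_out + n_out  # weights plus biases

hidden = dense_params(2, 3)   # 2*3 weights + 3 biases = 9
output = dense_params(3, 1)   # 3*1 weights + 1 bias  = 4
print(hidden + output)        # 13 parameters in total
```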
b. Assume that the network is initialized with the weight values as given in the figure and the bias values are initialized to zero. If the input into the network is (1,2), compute the activations z1, z2, z3 at the hidden layer
nodes, and the output y of the network.
(8 points)
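The weight values live in the figure, which is not reproduced here, so the matrices below are placeholders; only the structure of the computation (ReLU hidden layer, sigmoid output, zero biases, input (1, 2)) follows the question:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    a = W1 @ x + b1                            # hidden pre-activations
    z = np.maximum(0.0, a)                     # ReLU gives z1, z2, z3
    y = 1.0 / (1.0 + np.exp(-(W2 @ z + b2)))   # sigmoid output
    return z, y

x = np.array([1.0, 2.0])            # the given input (1, 2)
W1 = np.ones((3, 2))                # placeholder for the figure's weights
W2 = np.ones((1, 3))                # placeholder for the figure's weights
b1, b2 = np.zeros(3), np.zeros(1)   # biases initialized to zero
z, y = forward(x, W1, b1, W2, b2)
```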
c. Assume that the output label t corresponding to the input (1,2) is 1. Recall that the gradient δ = ∂L/∂a at the output node is (y − t). Backpropagate this gradient to compute the gradients δ1, δ2, δ3 at the three hidden nodes (Hint: You can use the gradient propagation rules to do this).
(6 points)
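A sketch of the gradient-propagation rule for part (c), assuming δj denotes the gradient at hidden node j with pre-activation aj; the numeric values below are toy inputs, not the figure's:

```python
import numpy as np

# delta_j = delta_out * w2_j * relu'(a_j): the output gradient flows back
# through the second-layer weight and the ReLU derivative (1 if a_j > 0 else 0).
def hidden_deltas(delta_out, w2, a_hidden):
    relu_grad = (a_hidden > 0).astype(float)
    return delta_out * w2 * relu_grad

# toy values: delta_out = y - t, w2 = second-layer weights, a = pre-activations
deltas = hidden_deltas(-0.5, np.array([1.0, 2.0, 3.0]), np.array([3.0, -1.0, 2.0]))
print(deltas)
```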
d. Based on the activations in Part (b) and the gradients in part (c), compute the gradients of the Loss Function with respect to the
output weights (for the second layer of weights only).
(6 points)
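For part (d), the chain rule gives ∂L/∂w2j = δout · zj and ∂L/∂b2 = δout, where zj are the hidden activations from part (b) and δout is the output gradient from part (c). A sketch with toy values (not the figure's):

```python
import numpy as np

# Gradients of the loss w.r.t. the second-layer weights and bias:
# dL/dw2_j = delta_out * z_j,   dL/db2 = delta_out.
def output_grads(delta_out, z):
    return delta_out * z, delta_out

grad_w, grad_b = output_grads(-0.25, np.array([3.0, 0.0, 2.0]))
print(grad_w, grad_b)
```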
2022-10-31