Deep Learning

Assignment 3

1. The diagram below shows the forward computations for a node implementing the Batch Normalization algorithm. Assuming that the gradient of the loss $L$ with respect to the node's output, flowing into the system from the right, is known, derive expressions for the following other gradients, which are needed to do Backprop on the parameters $(\gamma, \beta)$:

$\frac{\partial L}{\partial \gamma}, \quad \frac{\partial L}{\partial \beta}$.

Hint: First compute .

(8 Points)
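
One way to sanity-check a derived expression is to compare it against finite differences. The sketch below is only an illustration under assumptions not stated in this text: it takes the standard Batch Normalization form y_i = gamma * x_hat_i + beta on a mini-batch of scalars, and the batch values, the upstream gradient `dy`, and all function names are hypothetical.

```python
import math


def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch of scalars, then scale by gamma and shift by beta."""
    m = len(x)
    mean = sum(x) / m
    var = sum((xi - mean) ** 2 for xi in x) / m
    x_hat = [(xi - mean) / math.sqrt(var + eps) for xi in x]
    y = [gamma * xh + beta for xh in x_hat]
    return y, x_hat


def batchnorm_param_grads(dy, x_hat):
    """Gradients of the loss w.r.t. gamma and beta, given dL/dy_i per sample."""
    dgamma = sum(d * xh for d, xh in zip(dy, x_hat))  # chain rule through gamma * x_hat
    dbeta = sum(dy)                                   # beta is added to every sample
    return dgamma, dbeta


def numerical_grads(x, gamma, beta, dy, h=1e-6):
    """Central differences on the surrogate loss L = sum_i dy_i * y_i."""
    def loss(g, b):
        y, _ = batchnorm_forward(x, g, b)
        return sum(d * yi for d, yi in zip(dy, y))

    dgamma = (loss(gamma + h, beta) - loss(gamma - h, beta)) / (2 * h)
    dbeta = (loss(gamma, beta + h) - loss(gamma, beta - h)) / (2 * h)
    return dgamma, dbeta


x = [1.0, 2.0, 4.0, 7.0]       # hypothetical mini-batch
dy = [0.1, -0.2, 0.3, 0.05]    # hypothetical upstream gradient dL/dy
_, x_hat = batchnorm_forward(x, gamma=1.5, beta=0.5)
analytic = batchnorm_param_grads(dy, x_hat)
numeric = numerical_grads(x, 1.5, 0.5, dy)
print(analytic, numeric)
```

The surrogate loss is chosen so that its gradient with respect to each output equals the assumed `dy`, which lets the two results be compared directly.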

2. Explain the concept of Model Capacity and the relationship between Model Capacity and Data complexity.

Why is it better to increase Model Capacity by adding layers to a Deep Feed Forward Neural Network than by adding more nodes per layer?

(5 Points)

3. The Keras Model for classifying IMDB Movie Reviews was discussed as part of Lecture 6. Experiment with the following:

(a) Gradient Descent Algorithms

(b) Regularization Techniques

(and combinations of the two) that were discussed in Week 4, in order to improve the Accuracy of this model. Which choice of these algorithms works best?

(12 points)
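
A possible way to organize these experiments is to enumerate every optimizer and regularizer pairing and build one model per pairing. The sketch below is not the Lecture 6 model, which is not reproduced in this text. It assumes a small dense network over 10,000-dimensional multi-hot review vectors, and the candidate lists, layer sizes, and function names are hypothetical choices for illustration.

```python
from itertools import product

# Assumed candidates; substitute the algorithms actually covered in Week 4.
OPTIMIZERS = ["sgd", "rmsprop", "adam"]
REGULARIZERS = ["none", "l2", "dropout"]


def experiment_grid():
    """Every (optimizer, regularizer) combination to try."""
    return list(product(OPTIMIZERS, REGULARIZERS))


def build_model(optimizer, regularizer, input_dim=10000):
    """Build and compile one candidate binary classifier."""
    # Imported lazily so the grid can be inspected without TensorFlow installed.
    from tensorflow import keras
    from tensorflow.keras import layers, regularizers

    l2 = regularizers.l2(1e-3) if regularizer == "l2" else None
    model = keras.Sequential([keras.Input(shape=(input_dim,))])
    for _ in range(2):
        model.add(layers.Dense(16, activation="relu", kernel_regularizer=l2))
        if regularizer == "dropout":
            model.add(layers.Dropout(0.5))
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer=optimizer,
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


grid = experiment_grid()
for optimizer, regularizer in grid:
    print(optimizer, regularizer)
```

Each model returned by `build_model` would then be trained with `model.fit` on the same training and validation split, so that the validation accuracies are comparable across combinations.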

4. Consider a Dense Feed Forward Network, of the type shown below, composed of an Input with 2 nodes, followed by a Hidden Layer with 3 nodes and finally an Output Layer with 1 node. Assume the activation function for the hidden layer is given by ReLU, the output activation is given by the sigmoid function and the Loss Function is the Binary Cross Entropy.

a. Compute the number of parameters (weights and biases) required to describe the network.

(5 points)
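
The count can be cross-checked with a short helper. This is a minimal sketch that assumes every layer is fully connected and every non-input node has one bias; the function name is hypothetical.

```python
def count_dense_params(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of nodes per layer, input layer first.
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out  # one weight per connection
        total += fan_out           # one bias per node in the receiving layer
    return total


n_params = count_dense_params([2, 3, 1])
print(n_params)  # 13
```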

b. Assume that the network is initialized with the weight values as given in the figure and the bias values are initialized to zero. If the input into the network is (1,2), compute the activations $z_1, z_2, z_3$ at the hidden layer nodes, and the output $y$ of the network.

(8 points)
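
The forward pass can be sketched in a few lines of plain Python. The figure's weight values are not reproduced in this text, so the weights below are made up purely for illustration and should be replaced with the ones from the figure; the function names are likewise hypothetical.

```python
import math

# Hypothetical weights (NOT the values from the figure).
W_HIDDEN = [[0.5, -1.0], [1.0, 0.5], [0.25, 0.5]]  # one row per hidden node
W_OUT = [1.0, -0.5, 0.4]                           # hidden node -> output


def relu(a):
    return max(0.0, a)


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, w_hidden, w_out):
    """Return hidden pre-activations, hidden activations, and the output y.

    All biases are zero, as stated in the problem.
    """
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in w_hidden]
    z = [relu(a) for a in pre]
    a_out = sum(w * zi for w, zi in zip(w_out, z))
    return pre, z, sigmoid(a_out)


pre, z, y = forward((1.0, 2.0), W_HIDDEN, W_OUT)
print(z, y)
```

With these made-up weights the first hidden node has a negative pre-activation, so ReLU sets its activation to zero.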

c. Assume that the output label $t$ corresponding to the input (1,2) is 1. Recall that the gradient $\delta$ of the loss with respect to the pre-activation of the output node is $(y - t)$. Backpropagate this gradient to compute the gradients $\delta_1, \delta_2, \delta_3$ at the three hidden nodes (Hint: You can use the gradient propagation rules to do this).

(6 points)
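
The backward step can be sketched as follows. Two assumptions are made here that are not confirmed by this text: the hidden gradients are taken with respect to the hidden pre-activations, so each one includes the ReLU derivative, and the weights are made-up placeholders rather than the values from the figure.

```python
import math

# Hypothetical weights (NOT the values from the figure).
W_HIDDEN = [[0.5, -1.0], [1.0, 0.5], [0.25, 0.5]]  # one row per hidden node
W_OUT = [1.0, -0.5, 0.4]                           # hidden node -> output


def hidden_deltas(x, t, w_hidden, w_out):
    """Backpropagate the output gradient (y - t) to the hidden nodes."""
    # Forward pass with zero biases.
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in w_hidden]
    z = [max(0.0, a) for a in pre]
    a_out = sum(w * zi for w, zi in zip(w_out, z))
    y = 1.0 / (1.0 + math.exp(-a_out))

    # Sigmoid output with binary cross entropy gives (y - t) at the output.
    delta_out = y - t

    # Each hidden gradient is the output gradient times the connecting weight,
    # gated by the ReLU derivative (1 where the pre-activation is positive).
    deltas = [delta_out * w * (1.0 if a > 0 else 0.0)
              for w, a in zip(w_out, pre)]
    return delta_out, deltas


delta_out, deltas = hidden_deltas((1.0, 2.0), 1.0, W_HIDDEN, W_OUT)
print(delta_out, deltas)
```

A hidden node whose ReLU is inactive receives a zero gradient regardless of its outgoing weight, which is the case for the first node with these placeholder weights.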

d. Based on the activations in Part (b) and the gradients in part (c), compute the gradients of the Loss Function with respect to the