BCS/CSC 229: Computer Models of Human Perception and Cognition

Homework Assignment #3

Instructions: Answer all questions below. Include all requested calculations and graphs. Also include the Python code that you wrote to answer the questions. When writing text or equations, please write NEATLY!

(0) (Part A) At the top of the document that you turn in, place your name and the date. (Part B) Next, please take the honor pledge. That is, write (by hand using a pen): “I affirm that I have not given or received any unauthorized help on this assignment, and that this work is my own.” Then sign your name.

(1) Implement (using Python) the basic model described in:

Tenenbaum, J. B. & Griffiths, T. L. (2001). Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 24, 629-640.

First, re-read the article (several times) to make sure that you understand the model. Given a data set X, the model should produce the probability distribution p(y ∈ C | X), where y is a test point and C is a “consequential” region (e.g., see Figures 1, 2, and 3 of the article). For the sake of simplicity, you can assume:

● Individual data points x are integers between 0 and 100.

● Individual test items y are integers between 0 and 100.

● Individual hypotheses are intervals with integer-valued endpoints (e.g., as illustrated in Figure 1 of the article).

● The prior distribution over hypotheses is a uniform distribution (not an Erlang distribution).

● You’ll need to define hypothesis spaces consisting of hypotheses ranging from a cardinality of 1 up to hypotheses with a cardinality of N. For part (a) below, set N = 6. For the remaining parts, set N = 40.

(a) Let X = {45}. Plot p(y ∈ C | X) for all y in the interval [38, 52].

(b) Let X = {43, 44, 45}. Plot p(y ∈ C | X) for all y in the interval [0, 100].

(c) Let X = {37, 42, 45}. Plot p(y ∈ C | X) for all y in the interval [0, 100].

(d) Let X = {15, 35, 45}. Plot p(y ∈ C | X) for all y in the interval [0, 100].

(e) Let X = {37, 45}. Plot p(y ∈ C | X) for all y in the interval [0, 100].

(f) Let X = {37, 40, 42, 45}. Plot p(y ∈ C | X) for all y in the interval [0, 100].

(g) Let X = {37, 38, 40, 40, 41, 42, 43, 45} (note that the number 40 occurs twice in this set). Plot p(y ∈ C | X) for all y in the interval [0, 100].
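The computation requested in question (1) can be sketched as follows. This is a minimal illustration, not a reference solution. It assumes that the "cardinality" of an interval [a, b] is the number of integers it contains (b − a + 1), and that the likelihood follows the size principle (strong sampling), p(X | h) = 1/|h|^n when every data point lies inside h and 0 otherwise. Check both assumptions against the article. The function name `generalization` and its parameters are hypothetical.

```python
import numpy as np

def generalization(X, max_size, lo=0, hi=100):
    """Return (ys, p) where p[k] approximates p(ys[k] in C | X).

    Assumptions (not stated in the assignment): interval cardinality is
    b - a + 1, and the likelihood is the size principle 1 / |h|**n.
    """
    X = np.asarray(X)
    n = len(X)
    ys = np.arange(lo, hi + 1)
    numer = np.zeros(len(ys))  # posterior mass of hypotheses containing each y
    denom = 0.0                # posterior mass of all hypotheses consistent with X
    for a in range(lo, hi + 1):
        for b in range(a, min(a + max_size - 1, hi) + 1):
            if X.min() < a or X.max() > b:
                continue  # hypothesis does not contain every data point
            size = b - a + 1
            weight = float(size) ** (-n)  # uniform prior cancels in the ratio
            denom += weight
            numer[(ys >= a) & (ys <= b)] += weight
    if denom == 0.0:
        raise ValueError("no hypothesis of the allowed size contains all of X")
    return ys, numer / denom
```

For part (a), one could call `ys, p = generalization([45], max_size=6)` and plot `p[38:53]` against `ys[38:53]` with any plotting library, for example matplotlib.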

(2) For this question, you will implement (using Python) the model described in:

Hemmer, P. & Steyvers, M. (2009). A Bayesian account of reconstructive memory. Topics in Cognitive Science, 1, 189-202.

First, re-read the article (several times) to make sure you understand the model. Then:

● Consider a graph in which the horizontal axis gives the study size of an object and the vertical axis gives the memory error (i.e., remembered size [mean of the posterior distribution, µn] − study size [denoted µs]). The goal of this question is to use the Bayesian model of Hemmer and Steyvers to generate graphs of this type.

● A graph should contain 4 sets of data points corresponding to 4 typical object sizes. This should be implemented using 4 values for the mean of the object prior distribution over size (very small [µi = 0.2], small [µi = 0.4], large [µi = 0.6], very large [µi = 0.8]). The variance of the object prior distribution (denoted σi²) should be set to 0.005.

● The mean of the category prior distribution (denoted µc) should be set to 0.5, and its variance (denoted σc²) should be set to 0.05.

● To generate the prior distribution (mean µ0 and variance σ0²) that makes use of both object prior information and category prior information, use Equations 3 and 4 of Hemmer and Steyvers (2009). Recall that z is a binary variable whose value is sampled from a Bernoulli distribution with familiarity parameter θ.

● To generate the mean of the posterior distribution over size, denoted µn, use Equation 2. Make sure you understand how the value of weight w is calculated.

● On an individual “trial”:

— You need to generate a value of the absolute study size of an object, µs. To do this, sample µs from a normal distribution whose mean is the nominal study size for that trial and whose variance is σs² = 0.005. Across trials, let the nominal study size go from 0.1 to 0.9 in increments of 0.1.

— Given a value of study size µs, you need to generate episodic memories of this value, denoted {yi}. Each memory yi should be sampled from a normal distribution with mean µs and variance σm² = 0.02 (i.e., generate “noisy” memories). Let ȳ denote the mean of the episodic memories {yi}.

— On each trial, you need to sample a value for z (z is used in Equations 3 and 4). The value of z (either 0 or 1) depends on the familiarity θi. See Equation 5.
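The steps of a single trial can be sketched as below. This is an illustration under stated assumptions, not the paper's definitive formulation. It assumes that Equations 3 and 4 select the object prior (µi, σi²) when z = 1 and the category prior (µc, σc²) when z = 0, and that Equation 2 is the standard conjugate-normal posterior mean, a precision-weighted average of the prior mean and the mean of the episodic memories. Verify both against Hemmer and Steyvers (2009). The function `simulate_trial` and its parameter names are hypothetical.

```python
import numpy as np

def simulate_trial(study_mean, mu_obj, theta, n, rng,
                   var_obj=0.005, mu_cat=0.5, var_cat=0.05,
                   var_study=0.005, var_mem=0.02):
    """Simulate one trial and return the memory error (mu_n - mu_s)."""
    # Sample the study size actually presented on this trial.
    mu_s = rng.normal(study_mean, np.sqrt(var_study))
    # Sample n noisy episodic memories and average them.
    y_bar = rng.normal(mu_s, np.sqrt(var_mem), size=n).mean()
    # z ~ Bernoulli(theta): is the object treated as familiar on this trial?
    z = rng.random() < theta
    # ASSUMPTION: Equations 3 and 4 pick the object prior when z = 1
    # and the category prior when z = 0.
    mu_0, var_0 = (mu_obj, var_obj) if z else (mu_cat, var_cat)
    # ASSUMPTION: Equation 2 is the conjugate-normal posterior mean.
    w = (1.0 / var_0) / (1.0 / var_0 + n / var_mem)
    mu_n = w * mu_0 + (1.0 - w) * y_bar
    return mu_n - mu_s
```

One data point in a graph would then be the average of `simulate_trial(...)` over 1,000 calls with a shared `rng = np.random.default_rng()`.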

● Run two sets of simulations, one in which the number of episodic memories used on each trial is 4 (i.e., n = 4) and one in which the number of episodic memories is 40 (i.e., n = 40). Make sure you understand why you get different results when n is 4 versus 40 (hint: think about the variance of the likelihood function [and thus the value of w in Equation 2] when n is 4 versus 40). When running simulations, you’ll notice that there might be a notable amount of variation across simulations. Consequently, you’ll want to run multiple simulations to get a feel for average performance in each set.
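To make the hint about w concrete, the snippet below computes the prior weight for n = 4 and n = 40. It assumes that w in Equation 2 has the standard conjugate-normal form, prior precision divided by the sum of prior precision and likelihood precision (n/σm²); confirm this against the paper. The helper name `prior_weight` is hypothetical.

```python
def prior_weight(var_prior, var_mem, n):
    """Weight on the prior mean in a conjugate-normal posterior mean.

    ASSUMPTION: this matches the definition of w in Equation 2.
    """
    prior_precision = 1.0 / var_prior
    likelihood_precision = n / var_mem  # n memories, each with variance var_mem
    return prior_precision / (prior_precision + likelihood_precision)

# Object prior (variance 0.005) against memory noise (variance 0.02).
w_n4 = prior_weight(0.005, 0.02, 4)    # equals 0.5
w_n40 = prior_weight(0.005, 0.02, 40)  # equals 1/11, about 0.091
```

Under this form, ten times as many memories makes the likelihood ten times more precise, so the posterior mean leans far less on the prior and the memory error shrinks.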

● To generate an individual data point in a graph, simulate each trial many times (and average the values of the memory errors). In your simulations, simulate each trial 1,000 times.

● You’ll need to turn in six graphs: three graphs corresponding to three values for familiarity parameter θ (θ = 0.0, θ = 0.4, and θ = 0.7) × two sets of simulations (number of episodic memories used on each trial n is either 4 or 40).

● Figure 1 illustrates one set of graphs (when the number of episodic memories n was set to 4).

Figure 1: Simulation results when the number of episodic memories n was set to 4. In each graph, the horizontal axis plots the absolute study size µs, and the vertical axis plots the average memory error (µn − µs). The blue, green, red, and cyan lines show the results when the mean of the object prior distribution µi was set to 0.2, 0.4, 0.6, and 0.8, respectively. The top, middle, and bottom graphs show the results when the familiarity θ was set to 0.0, 0.4, and 0.7, respectively.

(3) The goal of this question is to give you practice with the Metropolis-Hastings (MH) algorithm.

(a) Generate ten samples from a univariate normal distribution with a mean of 10.0 and a variance of 1.0. Next, pretend that you don’t know the mean of this distribution. Instead, use the 10 data items to infer a posterior distribution of the mean using the MH algorithm (for now, assume that you know that the data were sampled from a normal distribution with a variance of 1.0). Assume that the prior distribution of the mean is a uniform distribution. Let the proposal distribution be a normal distribution with a standard deviation of 0.05. Initialize your chain to a value of 0.0, and run the chain for 5000 iterations. Create a graph in which the horizontal axis gives the iteration number, and the vertical axis gives the value of the chain.

Hint: Be careful when computing the quantity R. For example, if you compute the likelihood for the proposed value of the mean based on all data items, then this likelihood will go toward zero (because you’ll be multiplying 10 very small numbers). Instead, you’ll need to compute R incrementally, one data item at a time. Using the notation from the lecture slides, let

$$R_i = \frac{p(x_i \mid \theta^{*})\, p(\theta^{*})}{p(x_i \mid \theta_c)\, p(\theta_c)}$$

where xi is the ith data item, θc is the current value of the chain, and θ* denotes the proposed value. Then

$$R = \prod_{i=1}^{10} R_i .$$

Computing R in this way is much more robust.
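A minimal sketch of part (a) along the lines of the hint is shown below. It forms R as a product of per-item ratios, and each ratio is written as a single exponential so that no tiny likelihood is ever computed on its own. The uniform prior contributes a factor of 1 to every ratio and is therefore omitted. The function name `mh_mean`, its parameters, and the random seeds are illustrative choices, not part of the assignment.

```python
import numpy as np

def mh_mean(data, n_iter=5000, init=0.0, proposal_sd=0.05, sd=1.0, seed=0):
    """Metropolis-Hastings chain for the mean of a normal with known sd."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    chain = np.empty(n_iter)
    current = init
    for t in range(n_iter):
        proposed = current + rng.normal(0.0, proposal_sd)
        # Per-item ratio p(x_i | proposed) / p(x_i | current) for a normal
        # likelihood; the normalizing constants and the uniform prior cancel.
        ratios = np.exp(((data - current) ** 2 - (data - proposed) ** 2)
                        / (2.0 * sd ** 2))
        R = np.prod(ratios)
        if rng.random() < R:  # accept with probability min(1, R)
            current = proposed
        chain[t] = current
    return chain

# Ten samples from a normal with mean 10.0 and variance 1.0 (seed is arbitrary).
data = np.random.default_rng(1).normal(10.0, 1.0, size=10)
chain = mh_mean(data)
```

Plotting `chain` against its index gives the requested trace of the chain over iterations.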

(b) Repeat Part (a) except now use the MH algorithm to estimate both the posterior distribution of the mean and the posterior distribution of the standard deviation. Assume that the prior distributions of the mean and standard deviation are uniform distributions. (This is a poor assumption because, of course, the standard deviation must be positive.) As above, let the proposal distributions for the mean and standard deviation be normal distributions with a standard deviation of 0.05 (but don’t propose a value for the standard deviation that is too small [e.g., 0.01]). Initialize your chain so that the mean is initialized to 0.0, and the standard deviation is initialized to 5.0. Run the chain for 250,000 iterations (this might require 10-15 minutes). This chain needs to run for a long time because estimates for the mean and standard deviation are coupled (e.g., a poor estimate for the mean will lead to a poor estimate for the standard deviation). Create two graphs. In both graphs, the horizontal axis gives the iteration number. In one graph, the vertical axis gives the estimate of the mean; in the other graph, the vertical axis gives the estimate of the standard deviation.

(c) Generate twenty samples from a mixture of two normal distributions. Ten samples should come from a normal distribution with a mean of 10 and standard deviation of 1. The remaining samples should come from a normal distribution with a mean of 5 and standard deviation of 1. Next, pretend you don’t know which data items came from which normal distribution. In addition, pretend you don’t know the means of the two normal distributions. Instead, use the 20 data items to infer posterior distributions for these means. (Assume you know the distributions have standard deviations of 1.) Assume the prior distributions for the means are uniform distributions. Let the proposal distributions be normal distributions with standard deviations of 0.05. Initialize your chain so that the mean of one normal distribution is initialized to 0.0, and the mean of the other normal distribution is initialized to 20.0. Run the chain for 5000 iterations. Create two graphs in which the horizontal axes give the iteration number, and the vertical axes give the values of the two means.

Hint: The likelihood function for data item xi is:

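$$p(x_i \mid \mu_1, \mu_2, \sigma^2) = \sum_{j=1}^{2} p(j)\, p(x_i \mid \mu_j, \sigma^2) = \sum_{j=1}^{2} 0.5 \cdot \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x_i - \mu_j)^2}{2\sigma^2}\right]$$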
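The mixture likelihood in the hint, with equal mixing weights p(j) = 0.5, can be sketched in Python as follows, together with a short MH loop for part (c). Two choices here are assumptions rather than requirements of the assignment: both means are proposed jointly on each iteration, and the acceptance test uses log R (the sum of per-item log-likelihood differences), which is equivalent to the incremental product of per-item ratios. The function names and seeds are illustrative.

```python
import numpy as np

def mixture_log_likelihood(data, mu1, mu2, sigma=1.0):
    """Sum over items of log[0.5 N(x | mu1, sigma^2) + 0.5 N(x | mu2, sigma^2)]."""
    data = np.asarray(data, dtype=float)
    norm = 1.0 / np.sqrt(2.0 * np.pi * sigma ** 2)
    comp1 = 0.5 * norm * np.exp(-(data - mu1) ** 2 / (2.0 * sigma ** 2))
    comp2 = 0.5 * norm * np.exp(-(data - mu2) ** 2 / (2.0 * sigma ** 2))
    return np.sum(np.log(comp1 + comp2))

def mh_two_means(data, n_iter=5000, init=(0.0, 20.0), proposal_sd=0.05, seed=0):
    """MH chain over (mu1, mu2); uniform priors cancel in the ratio."""
    rng = np.random.default_rng(seed)
    chain = np.empty((n_iter, 2))
    current = np.array(init, dtype=float)
    current_ll = mixture_log_likelihood(data, current[0], current[1])
    for t in range(n_iter):
        # ASSUMPTION: both means are perturbed in the same proposal.
        proposed = current + rng.normal(0.0, proposal_sd, size=2)
        proposed_ll = mixture_log_likelihood(data, proposed[0], proposed[1])
        # Accept with probability min(1, R), tested in log space.
        if np.log(rng.random()) < proposed_ll - current_ll:
            current, current_ll = proposed, proposed_ll
        chain[t] = current
    return chain

# Ten samples near 10 and ten near 5 (seed is arbitrary).
data_rng = np.random.default_rng(2)
mix_data = np.concatenate([data_rng.normal(10.0, 1.0, 10),
                           data_rng.normal(5.0, 1.0, 10)])
mix_chain = mh_two_means(mix_data)
```

The two columns of `mix_chain` are the traces to plot. Whether both means settle near the generating values within 5000 iterations can vary from run to run, so inspect the traces rather than assuming convergence.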