闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

STAT3006/STAT7305 Assignment 1—Probability and Optimization Theory

2022

Instructions

● The assignment consists of four (4) problems, each problem is worth 25 marks, and each mark is equally weighted.

● The mathematical elements of the assignment can be completed by hand, in LaTeX (prefer- ably), or in Word (or other typesetting software). The mathematical derivations and ma- nipulations should be accompanied by clear explanations in English regarding necessary information required to interpret the mathematical exposition.

● Computation problems should be answered using programs in the R language.

● Computer generated plots and hand drawn graphs should be included together with the text where problems are answered.

● Submission ﬁles should include the following (which ever applies to you):

– Scans of handwritten mathematical exposition.

– Typeset mathematical exposition, outputted as a pdf ﬁle.

– Typeset answers to computational problems, outputted as a pdf ﬁle.

– Program code/scripts that you wish to submit, outputted as a txt ﬁle.

● Mathematical problems should be answered with reference to results presented in the Main Text (refer, page numbers), Remarks, Exercises, and Corollaries/Lemma/Propositions/Theorems from the Lecture Notes, if required. If a mathematical result is used that is not presented in the Lecture Notes, then its common name (e.g., “Bayes’ Theorem”, “Intermediate Value The- orem”, “Borel–Cantelli Lemma”, etc.) should be cited, or else a reference to a text containing the result should be provided (preferably a textbook).

● All submission ﬁles should be labeled with your name and student number and archived together in a zip ﬁle and submitted at the TurnItIn link on Blackboard. We suggest naming using the convention:

[LastName_FirstName/StudentNumber]_STAT3006A1_ [AnythingElse] . [FileExtension] .

● As per my .uq .edu .au/information - and - services/manage -my -program/student - in tegrityand - conduct/academic - integrity - and - student - conduct, what you submit should be your own work. Even where working from sources, you should endeavour to write in your own words. You should use consistent notation throughout your assignment and deﬁne whatever is required.

Problem 1 [25 Marks]

Let (Ω , 于, P) be a probability space, and let (Xi (u))ie[n] be a sequence of independent and iden- tically distribution (IID) random variables from (Ω , 于) to (R, 夕 (R)), each with the same image measure PX , as the random variable X (u).

Assuming that X has expectation uX = E (X) and ﬁnite variance

aX(2) = E [(X 一 uX )2] < o,

prove that the sample mean,

n (u) = Xi (u) ,

is a 。onsistent estimator of uX in the sense that n converges in probability to uX , as n - o. Such a result is usually referred to as a we〉k l〉w of l〉rge numrers.

[10 Marks]

Recall that the cumulative distribution function (CDF) of X is deﬁned as:

F (z) = P (u : X (u) < z) .

Using the sequence (Xi )ie[n] estimate the CDF F (z) using the empiri。〉l !》只 :

i=1

For each ﬁxed z e R, show that

Iα,n (z) = {y e R : n (z) 一 < y < n (z) + }

is a (1 一 a) × 100% conﬁdence interval for F (z) in the sense that

P (u : F (z) e Iα,n (z)) > 1 一 a.

[10 Marks]

The solmogorov*s strong l〉w of l〉rge numrers improves upon the result from (a) stating that if the expectation of X is ﬁnite (i.e. |uX | < o), then n converges almost surely to uX , as n - o.

sup │n (z) 一 F (z)│

xeR

converges to 0, almost surely, as n - o.

[5 Marks]

This result, along with the laws of large numbers, is generally considered to be the fundamental law(s) of statistics in the sense that they provide a mechanism for probability distributions to be realized in reality; i.e., repeated observation of IID random variables and averaging the outcomes yields insight regarding the properties of the underlying probability measure P.

Problem 2

Let (Ω , 于, P) be a probability space, and let (Xi (u))ie[n] be a sequence of IID random variables from (Ω , 于) to (R, 夕 (R)), each with the same image measure PX , as the random variable X (u). Suppose that PX 《 Leb, and has Radon–Nykodym derivative (probability density function;

PDF) pθ0 : R - R>0, where 90 e R, the so-called data generating parameter corresponding to PX , is unknown to us. That is, we only know that PX has a PDF of the form pθ , for some value of 9 e R, but not which particular value of 9 (i.e., 90 ).

We wish to test the simple hypotheses that either the null 员ypot员esis

H0 : 90 = 91(*)

is true, or the 〉ltern〉tive 员ypot员esis

H1 : 90 = 92(*)

is true, for a pair of predetermined values 91(*), 92(*) e R, where 91(*) 92(*) . To construct a test between the two hypotheses, we construct the so-called l!H?l!Voop 小04!o s404!s4!。:

pθ2(*) (Xi (u))

(a) Under the null hypothesis (i.e., we assume that 90 = 91(*) is true, and subsequently pθ0 = pθ 1(*)), show that

E [5n (u)] = 1.

You may use the fact that if (yi (u))ie[n] is a sequence of independent random variables, then so (fi О yi )ie[n] is also a sequence of independent random variables, where fi : R - R is a function, for each i e [n].

[5 Marks]

We say that a random variable E (u) is an e-value if

E [E] < 1,

and we say that a random variable P (u) is a p-value if for any a e [0, 1]:

P (u : P (u) < a) < a.

(b) Prove that P = max {1, 1/E} is a p-value.

[5 Marks]

We say that 图n , (Xi )ie[n] -1 {0, 1}, is a 小?!?。4!ou 小Nl? (or 4?s4) for the hypotheses H0 and H1 , where we say that we 小?!?。4 H0 if 图n = 1 and we say that we 0。。?d4 H0 if 图n = 0. We further say that 图n has s!X? (or s!6u!y。0u。? l?a?l ) a e (0, 1) if

P (u : 图n = 1) < a,

under the assumption that H0 is true.

(c) Using the conclusions from (a) and (b), for any a e (0, 1), construct a rejection rule with size a for the test of H0 : 90 = 91(*) versus H1 : 90 = 92(*) .

[5 Marks]

In the R programing language, we can generate an sequence (yi )ie[n] of IID random variables, each with the same image probability measures PY , and PDF

pθ (y) = oθ (y) = exp │ 一 (y 一 9)2 ← ,

using the command rnorm (n, 9). We can also compute oθ (y) using the command dnorm (y, 9).

(d) Assuming that X has image measure PX has a PDF of the form pθ0 (z) = oθ0 (z), use the commands above to program an R script that implements the rejection rule from

(c) to test the hypotheses H0 : 90 = 0 versus H1 : 90 = 92(*), and assess how often the test rejects H0 , as 92(*) increases. We say that a test is powerful if H0 is rejected with high probability when H1 is false. Via computational experiments or otherwise, comment on the power of the test as 92(*) increases away from 0.

[10 Marks]

Problem 3

(a) For non-negative numbers u, w e R>0 and positive coeﬃcients p, g e R>0, such that

1/p + 1/g = 1, show that

up wq

p g .

You may use the fact that z -1 log (z) is concave on R>0 .

[5 Marks]

Let X (u) and y (u) be functions from measure space (Ω , 于, M) to (R, 夕 (R)). (b) Use the result from (a) to prove M6lder*s ineau〉lity. That is, show that if

│Ω |X|p dM < o and │Ω |y |q dM < o,

for 1/p + 1/g = 1, then

│Ω |Xy | dM < ││Ω |X|p dM← 1/p ││Ω |y |q dM← 1/q .

[10 Marks]

Use Hölder’s inequality (or otherwise) to prove tinkowski*s ineau〉lity. That is, show

that if

│Ω |X|p dM < o and │Ω |y |p dM < o,

for p > 1, then

││Ω |X + y |p dM← 1/p < ││Ω |X|p dM← 1/p + ││Ω |y |p dM← 1/p .

[5 Marks]

Let (Xi )ie[n] be a sequence of functions from (Ω , 于, M) to (R, 夕 (R)).

(d)

Consider a generalization of Hölder’s inequality of the form:

│Ω │i1 Xi │ dM < i1 ││Ω |X|qi dM← 1/qi ,

(1)

where (gi )ie[n] is a sequence of constants, such that gi e R for each i e [n]. Propose conditions on the values that (gi )ie[n] can take so that (1) is true, and argue (formally or informally) why you believe that your suggested conditions are correct.

[5 Marks]

Problem 4

Let f : T - R be a function on the domain T c Rd , where T is said to be convex. We say that

f is u-strongly convex on T (with respect to the Euclidean norm |．|) if there exists a u > 0 such

f (A9 + (1 一 A) r) < Af (9) + (1 一 A) f (r) + uA (1 一 A) |9 一 r |2 . (2)

This more structured notion of convexity allows for proofs of convergence and rates for many modern optimization algorithms, and problems relating to strongly convex functions are mathe- matically “easier” to solve that others.

(a) Characterization (2) is one of many characterizations of u-strong convexity. Another useful characterization is that

g (9) = f (9) 一 u |9|2 (3)

is convex on T. That is, f is so convex that even when we remove a quadratic function from f (i.e., g) we still have a convex function. Prove that these two characterizations ((2) and (3)) of u-strong convexity are equivalent.

[5 Marks]

Using either (2) or (3), prove that if f is u-strongly convex on T then it is also strictly convex on T, and propose a counterexample of a strictly convex function that is not u-strongly convex for any u > 0.

[5 Marks]

We now assume that f is diﬀerentiable on a an open set 5 T, and thus is diﬀerentiable on T. Then, we can further characterize u-strong convexity on T by the inequality

f (r) > f (9) + (r 一 9)T Vf (9) + u |r 一 9| . (4)

Vf (9* ) = 0,

then for all 9 e T we have

f (9) > f (9* ) + u |9 一 9* |2

and

f (9* ) > f (9) 一 |Vf (9)|2 .

[10 Marks]

For a diﬀerentiable function f , we say that it is 8-smooth on T if

|Vf (9) 一 Vf (r)| < 8 |9 一 r | ,

for every 9, r e T.

(d) A typical algorithm for computing

9* = arg min f (9)

θeT

is the gr〉dient des。ent met员od, where we generate a sequence (9t )te☆ such that 9t+1 = 9t 一 7Vf (9t ) ,

for some positive constant 7 > 0, starting from some initial value 91 e T. When f is convex and 8-smooth on T, it is provable that after T e ☆ steps of the gradient descent algorithm, we have the inequality:

f (9T ) 一 f (9* ) < T 1~~一 1~~ 28 |91 一 9* |2 , (5)

by taking 7 = 1/8 . On the other hand, if f is also u-strongly convex on T, then after

T + 1 steps of gradient descent, we have

f (9T ) 一 f (9* ) < exp │ 一 ~~4 (~~8/(T)u(一) 1) ← |91 一 9* |2 ,

when taking 7 = 2/ (u + 8). Discuss the meanings of inequalities (5) and (6), and argue whether or not gradient descent performs better when f is strongly convex.

[5 Marks]