STAT3006/STAT7305 Assignment 1—Probability and Optimization Theory 2022
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
STAT3006/STAT7305 Assignment 1—Probability and Optimization Theory
2022
Instructions
● The assignment consists of four (4) problems, each problem is worth 25 marks, and each mark is equally weighted.
● The mathematical elements of the assignment can be completed by hand, in LaTeX (prefer- ably), or in Word (or other typesetting software). The mathematical derivations and ma- nipulations should be accompanied by clear explanations in English regarding necessary information required to interpret the mathematical exposition.
● Computation problems should be answered using programs in the R language.
● Computer generated plots and hand drawn graphs should be included together with the text where problems are answered.
● Submission files should include the following (which ever applies to you):
– Scans of handwritten mathematical exposition.
– Typeset mathematical exposition, outputted as a pdf file.
– Typeset answers to computational problems, outputted as a pdf file.
– Program code/scripts that you wish to submit, outputted as a txt file.
● Mathematical problems should be answered with reference to results presented in the Main Text (refer, page numbers), Remarks, Exercises, and Corollaries/Lemma/Propositions/Theorems from the Lecture Notes, if required. If a mathematical result is used that is not presented in the Lecture Notes, then its common name (e.g., “Bayes’ Theorem”, “Intermediate Value The- orem”, “Borel–Cantelli Lemma”, etc.) should be cited, or else a reference to a text containing the result should be provided (preferably a textbook).
● All submission files should be labeled with your name and student number and archived together in a zip file and submitted at the TurnItIn link on Blackboard. We suggest naming using the convention:
[LastName_FirstName/StudentNumber]_STAT3006A1_ [AnythingElse] . [FileExtension] .
● As per my .uq .edu .au/information - and - services/manage -my -program/student - in tegrityand - conduct/academic - integrity - and - student - conduct, what you submit should be your own work. Even where working from sources, you should endeavour to write in your own words. You should use consistent notation throughout your assignment and define whatever is required.
Problem 1 [25 Marks]
Let (Ω , 于, P) be a probability space, and let (Xi (u))ie[n] be a sequence of independent and iden- tically distribution (IID) random variables from (Ω , 于) to (R, 夕 (R)), each with the same image measure PX , as the random variable X (u).
Assuming that X has expectation uX = E (X) and finite variance
aX(2) = E [(X 一 uX )2] < o,
prove that the sample mean,
n (u) = Xi (u) ,
is a 。onsistent estimator of uX in the sense that n converges in probability to uX , as n - o. Such a result is usually referred to as a we〉k l〉w of l〉rge numrers.
[10 Marks]
Recall that the cumulative distribution function (CDF) of X is defined as:
F (z) = P (u : X (u) < z) .
Using the sequence (Xi )ie[n] estimate the CDF F (z) using the empiri。〉l !》只 :
n
i=1
For each fixed z e R, show that
Iα,n (z) = {y e R : n (z) 一 < y < n (z) + }
is a (1 一 a) × 100% confidence interval for F (z) in the sense that
P (u : F (z) e Iα,n (z)) > 1 一 a.
[10 Marks]
The solmogorov*s strong l〉w of l〉rge numrers improves upon the result from (a) stating that if the expectation of X is finite (i.e. |uX | < o), then n converges almost surely to uX , as n - o.
(c) Assuming that P 《 Leb, use the strong law of large numbers to prove the 2livenko- !〉ntelli M员eorem ; i.e., show that
sup │n (z) 一 F (z)│
xeR
converges to 0, almost surely, as n - o.
[5 Marks]
This result, along with the laws of large numbers, is generally considered to be the fundamental law(s) of statistics in the sense that they provide a mechanism for probability distributions to be realized in reality; i.e., repeated observation of IID random variables and averaging the outcomes yields insight regarding the properties of the underlying probability measure P.
Problem 2
Let (Ω , 于, P) be a probability space, and let (Xi (u))ie[n] be a sequence of IID random variables from (Ω , 于) to (R, 夕 (R)), each with the same image measure PX , as the random variable X (u). Suppose that PX 《 Leb, and has Radon–Nykodym derivative (probability density function;
PDF) pθ0 : R - R>0, where 90 e R, the so-called data generating parameter corresponding to PX , is unknown to us. That is, we only know that PX has a PDF of the form pθ , for some value of 9 e R, but not which particular value of 9 (i.e., 90 ).
We wish to test the simple hypotheses that either the null 员ypot员esis
H0 : 90 = 91(*)
is true, or the 〉ltern〉tive 员ypot员esis
H1 : 90 = 92(*)
is true, for a pair of predetermined values 91(*), 92(*) e R, where 91(*) 92(*) . To construct a test between the two hypotheses, we construct the so-called l!H?l!Voop 小04!o s404!s4!。:
pθ2(*) (Xi (u))
(a) Under the null hypothesis (i.e., we assume that 90 = 91(*) is true, and subsequently pθ0 = pθ 1(*)), show that
E [5n (u)] = 1.
You may use the fact that if (yi (u))ie[n] is a sequence of independent random variables, then so (fi О yi )ie[n] is also a sequence of independent random variables, where fi : R - R is a function, for each i e [n].
[5 Marks]
We say that a random variable E (u) is an e-value if
E [E] < 1,
and we say that a random variable P (u) is a p-value if for any a e [0, 1]:
P (u : P (u) < a) < a.
(b) Prove that P = max {1, 1/E} is a p-value.
[5 Marks]
We say that 图n , (Xi )ie[n] -1 {0, 1}, is a 小?!?。4!ou 小Nl? (or 4?s4) for the hypotheses H0 and H1 , where we say that we 小?!?。4 H0 if 图n = 1 and we say that we 0。。?d4 H0 if 图n = 0. We further say that 图n has s!X? (or s!6u!y。0u。? l?a?l ) a e (0, 1) if
P (u : 图n = 1) < a,
under the assumption that H0 is true.
(c) Using the conclusions from (a) and (b), for any a e (0, 1), construct a rejection rule with size a for the test of H0 : 90 = 91(*) versus H1 : 90 = 92(*) .
[5 Marks]
In the R programing language, we can generate an sequence (yi )ie[n] of IID random variables, each with the same image probability measures PY , and PDF
pθ (y) = oθ (y) = exp │ 一 (y 一 9)2 ← ,
using the command rnorm (n, 9). We can also compute oθ (y) using the command dnorm (y, 9).
(d) Assuming that X has image measure PX has a PDF of the form pθ0 (z) = oθ0 (z), use the commands above to program an R script that implements the rejection rule from
(c) to test the hypotheses H0 : 90 = 0 versus H1 : 90 = 92(*), and assess how often the test rejects H0 , as 92(*) increases. We say that a test is powerful if H0 is rejected with high probability when H1 is false. Via computational experiments or otherwise, comment on the power of the test as 92(*) increases away from 0.
[10 Marks]
Problem 3
(a) For non-negative numbers u, w e R>0 and positive coefficients p, g e R>0, such that
1/p + 1/g = 1, show that
up wq
p g .
You may use the fact that z -1 log (z) is concave on R>0 .
[5 Marks]
Let X (u) and y (u) be functions from measure space (Ω , 于, M) to (R, 夕 (R)). (b) Use the result from (a) to prove M6lder*s ineau〉lity. That is, show that if
│Ω |X|p dM < o and │Ω |y |q dM < o,
for 1/p + 1/g = 1, then
│Ω |Xy | dM < ││Ω |X|p dM← 1/p ││Ω |y |q dM← 1/q .
[10 Marks]
Use Hölder’s inequality (or otherwise) to prove tinkowski*s ineau〉lity. That is, show
that if
│Ω |X|p dM < o and │Ω |y |p dM < o,
for p > 1, then
││Ω |X + y |p dM← 1/p < ││Ω |X|p dM← 1/p + ││Ω |y |p dM← 1/p .
[5 Marks]
Let (Xi )ie[n] be a sequence of functions from (Ω , 于, M) to (R, 夕 (R)).
(d)
Consider a generalization of Hölder’s inequality of the form:
│Ω │i1 Xi │ dM < i1 ││Ω |X|qi dM← 1/qi ,
(1)
where (gi )ie[n] is a sequence of constants, such that gi e R for each i e [n]. Propose conditions on the values that (gi )ie[n] can take so that (1) is true, and argue (formally or informally) why you believe that your suggested conditions are correct.
[5 Marks]
Problem 4
Let f : T - R be a function on the domain T c Rd , where T is said to be convex. We say that
f is u-strongly convex on T (with respect to the Euclidean norm |.|) if there exists a u > 0 such
f (A9 + (1 一 A) r) < Af (9) + (1 一 A) f (r) + uA (1 一 A) |9 一 r |2 . (2)
This more structured notion of convexity allows for proofs of convergence and rates for many modern optimization algorithms, and problems relating to strongly convex functions are mathe- matically “easier” to solve that others.
(a) Characterization (2) is one of many characterizations of u-strong convexity. Another useful characterization is that
g (9) = f (9) 一 u |9|2 (3)
is convex on T. That is, f is so convex that even when we remove a quadratic function from f (i.e., g) we still have a convex function. Prove that these two characterizations ((2) and (3)) of u-strong convexity are equivalent.
[5 Marks]
Using either (2) or (3), prove that if f is u-strongly convex on T then it is also strictly convex on T, and propose a counterexample of a strictly convex function that is not u-strongly convex for any u > 0.
[5 Marks]
We now assume that f is differentiable on a an open set 5 T, and thus is differentiable on T. Then, we can further characterize u-strong convexity on T by the inequality
f (r) > f (9) + (r 一 9)T Vf (9) + u |r 一 9| . (4)
(c) Using (4), show that if f is u-strongly convex on T and 9* e T is a critical point of f in the sense that
Vf (9* ) = 0,
then for all 9 e T we have
f (9) > f (9* ) + u |9 一 9* |2
and
f (9* ) > f (9) 一 |Vf (9)|2 .
[10 Marks]
For a differentiable function f , we say that it is 8-smooth on T if
|Vf (9) 一 Vf (r)| < 8 |9 一 r | ,
for every 9, r e T.
(d) A typical algorithm for computing
9* = arg min f (9)
θeT
is the gr〉dient des。ent met员od, where we generate a sequence (9t )te☆ such that 9t+1 = 9t 一 7Vf (9t ) ,
for some positive constant 7 > 0, starting from some initial value 91 e T. When f is convex and 8-smooth on T, it is provable that after T e ☆ steps of the gradient descent algorithm, we have the inequality:
f (9T ) 一 f (9* ) < T 1一 1 28 |91 一 9* |2 , (5)
by taking 7 = 1/8 . On the other hand, if f is also u-strongly convex on T, then after
T + 1 steps of gradient descent, we have
f (9T ) 一 f (9* ) < exp │ 一 4 (8/(T)u(一) 1) ← |91 一 9* |2 ,
when taking 7 = 2/ (u + 8). Discuss the meanings of inequalities (5) and (6), and argue whether or not gradient descent performs better when f is strongly convex.
[5 Marks]
2022-08-12