MATH2070 Optimization and Financial Mathematics Week 7-8
PART I Nonlinear optimization without constraints
PART 1.1 Taylor’s theorem
Taylor’s theorem for one variable
If f is (n + 1)-times differentiable near x₀, then
f(x) = f(x₀) + f′(x₀)(x − x₀) + f″(x₀)(x − x₀)²/2! + ⋯ + f⁽ⁿ⁾(x₀)(x − x₀)ⁿ/n! + Rₙ(x)
Taylor’s theorem for multivariable functions
Using the multi-index notation of PART 1.2 below,
f(x₀ + h) = ∑_{|α|≤k} (∂ᵅf(x₀)/α!) hᵅ + o(|h|ᵏ)
In particular, to second order: f(x₀ + h) = f(x₀) + ∇f(x₀)ᵀh + ½ hᵀH(x₀)h + o(|h|²), where H is the Hessian.
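The multivariable expansion can be checked symbolically; a quick sketch (not from the original notes) with the hypothetical choice f(x, y) = exp(x)·cos(y):

```python
import sympy as sp

# Second-order Taylor expansion of f(x, y) = exp(x)*cos(y) about (0, 0),
# built term by term from the multivariable formula above; for a
# two-variable multi-index alpha = (i, j), alpha! = i! * j!.
x, y = sp.symbols('x y')
f = sp.exp(x) * sp.cos(y)
taylor = sp.expand(sum(
    sp.diff(f, x, i, y, j).subs({x: 0, y: 0})
    / (sp.factorial(i) * sp.factorial(j)) * x**i * y**j
    for i in range(3) for j in range(3) if i + j <= 2))
print(taylor)  # x**2/2 - y**2/2 + x + 1
```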
PART 1.2 Hessian matrix
The Hessian of f at x is the n × n matrix H(x) of second partial derivatives, Hᵢⱼ(x) = ∂²f/∂xᵢ∂xⱼ; when the second partials are continuous, H is symmetric.
Two-dimensional example
H(x, y) = [ ∂²f/∂x²   ∂²f/∂x∂y
            ∂²f/∂y∂x  ∂²f/∂y² ]
Multi-index notation
For α = (α₁, α₂, …, αₙ), αᵢ ≥ 0:
a) α! = α₁! α₂! ⋯ αₙ!
b) |α| = α₁ + α₂ + ⋯ + αₙ
c) xᵅ = x₁^{α₁} x₂^{α₂} ⋯ xₙ^{αₙ}
d) ∂ᵅf = ∂^{|α|}f / (∂x₁^{α₁} ∂x₂^{α₂} ⋯ ∂xₙ^{αₙ})
Multinomial theorem
For any x = (x₁, x₂, …, xₙ) ∈ ℝⁿ and any positive integer k:
(x₁ + x₂ + ⋯ + xₙ)ᵏ = ∑_{|α|=k} (k!/α!) xᵅ
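A brute-force numerical check of the theorem (a sketch, not part of the notes; the values of x and k are arbitrary):

```python
from itertools import product
from math import factorial, prod

# Verify (x1 + ... + xn)**k == sum over |alpha| = k of (k!/alpha!) * x**alpha.
x = (2.0, 3.0, 5.0)
k = 4

lhs = sum(x) ** k
rhs = 0.0
# Enumerate every multi-index alpha with |alpha| = k.
for alpha in product(range(k + 1), repeat=len(x)):
    if sum(alpha) == k:
        coeff = factorial(k) // prod(factorial(a) for a in alpha)  # k!/alpha!
        rhs += coeff * prod(xi ** ai for xi, ai in zip(x, alpha))  # x**alpha
print(lhs, rhs)  # both 10000.0
```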
Characterization of extrema
Critical point: a point x₀ with ∇f(x₀) = 0. Let Q(h, h) = hᵀH(x₀)h be the quadratic form of the Hessian at a critical point x₀. Then:
a) x₀ is a local minimum if Q(h, h) > 0, ∀h ≠ 0
b) x₀ is a local maximum if Q(h, h) < 0, ∀h ≠ 0
c) x₀ is a saddle point if Q(h, h) > 0 for some h and Q(h, h) < 0 for some other h
d) No conclusion can be drawn if:
- Q(h, h) ≥ 0 for all h and Q(h, h) = 0 for at least one h ≠ 0
- Q(h, h) ≤ 0 for all h and Q(h, h) = 0 for at least one h ≠ 0
If f is convex, any local minimizer is a global minimizer of f. If additionally f is differentiable, then any critical point is a global minimizer of f.
- The function f is a convex function if its domain D is a convex set and if, for any two points x and y in D:
f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y), ∀λ ∈ [0, 1]
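The defining inequality is easy to probe numerically; a minimal sketch (with the hypothetical choice f(x) = x²) sampling random pairs:

```python
import numpy as np

# Randomly test the convexity inequality for f(x) = x**2 on [-10, 10].
f = lambda t: t**2
rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.uniform(-10, 10, size=2)
    lam = rng.uniform()
    # f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y)
    assert f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y) + 1e-12
print("no violations found")
```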
Let H = H(X0) :
a) H is positive definite if and only if all of its eigenvalues are positive
b) H is negative definite if and only if all of its eigenvalues are negative
c) H is indefinite if and only if it has both positive and negative eigenvalues
d) H is positive semidefinite (but not positive definite) if and only if all of its eigenvalues are non-negative and at least one eigenvalue is zero
e) H is negative semidefinite (but not negative definite) if and only if all of its eigenvalues are non-positive and at least one eigenvalue is zero
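The eigenvalue test translates directly into code; a sketch using numpy, with a hypothetical Hessian diag(2, −2) (the saddle of f(x, y) = x² − y² at the origin):

```python
import numpy as np

# Classify a critical point from the eigenvalues of its (symmetric) Hessian.
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])
eig = np.linalg.eigvalsh(H)  # eigvalsh: real eigenvalues of a symmetric matrix

if np.all(eig > 0):
    print("positive definite: local minimum")
elif np.all(eig < 0):
    print("negative definite: local maximum")
elif np.any(eig > 0) and np.any(eig < 0):
    print("indefinite: saddle point")          # printed for this H
else:
    print("semidefinite: second-order test is inconclusive")
```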
PART II Nonlinear optimization with constraints
PART 2.1 Equality constraints
Lagrangian function
L(x, λ) = f(x) + ∑ᵢ₌₁ᵐ λᵢgᵢ(x)
Constrained optimization problem
Min f(x)
Subject to gᵢ(x) = 0, i = 1, 2, …, m
or, equivalently, in vector form g(x) = 0 with g = (g₁, g₂, …, gₘ)ᵀ
Solution: find candidate points by setting the gradient of L(x, λ) with respect to both x and λ to zero.
First-order necessary conditions
Given a local extremum point x*, there exists a λ* such that:
∇f(x*) + ∑ᵢ₌₁ᵐ λᵢ*∇gᵢ(x*) = 0 and gᵢ(x*) = 0, i = 1, …, m (i.e. ∇L(x*, λ*) = 0)
Example 4: Min f(x, y) = (x − 1)² + (y − 2)² + 1
Subject to : x + y = 1
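A symbolic check of Example 4 (a sketch using sympy, not part of the original notes):

```python
import sympy as sp

# Example 4: min (x-1)**2 + (y-2)**2 + 1  subject to  x + y = 1.
x, y, lam = sp.symbols('x y lam', real=True)
L = (x - 1)**2 + (y - 2)**2 + 1 + lam * (x + y - 1)

# First-order conditions: grad L = 0 with respect to x, y and lam.
sols = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(sols)  # [{x: 0, y: 1, lam: 2}] -> minimum value f(0, 1) = 3
```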
PART 2.2 Inequality constraints
Constrained optimization problem
Min f(x)
Subject to: hᵢ(x) ≤ 0, i = 1, 2, …, m
Solution: introduce slack variables sᵢ and rewrite the problem in the equivalent equality-constrained form:
Min f(x)
Subject to: h₁(x) + s₁² = 0
h₂(x) + s₂² = 0
…
hₘ(x) + sₘ² = 0
Introduce the Lagrangian function:
L(x, λ, s) = f(x) + ∑ᵢ₌₁ᵐ λᵢ(hᵢ(x) + sᵢ²)
Find candidate points by setting the gradient of L(x, λ, s) with respect to x, λ and s to zero.
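A sketch of the slack-variable method in sympy, applied to a small hypothetical problem (Example 4's objective with the constraint relaxed to an inequality):

```python
import sympy as sp

# min (x-1)**2 + (y-2)**2  subject to  x + y <= 1,
# rewritten with a slack variable as  x + y - 1 + s**2 = 0.
x, y, lam, s = sp.symbols('x y lam s', real=True)
L = (x - 1)**2 + (y - 2)**2 + lam * (x + y - 1 + s**2)

# Set all partial derivatives of L to zero; dL/ds = 2*lam*s = 0 forces
# either lam = 0 (constraint inactive) or s = 0 (constraint active).
eqs = [sp.diff(L, v) for v in (x, y, lam, s)]
print(sp.solve(eqs, [x, y, lam, s], dict=True))
# [{x: 0, y: 1, lam: 2, s: 0}] -> the constraint is active at the optimum
```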
PART 2.3 Karush-Kuhn-Tucker conditions
Necessary conditions for a local minimizer of the problem Min f(x) subject to hᵢ(x) ≤ 0, i = 1, …, m.
At a local minimizer x*, there exists λ* ∈ ℝᵐ such that:
a) ∇f(x*) + ∑ᵢ₌₁ᵐ λᵢ*∇hᵢ(x*) = 0 (stationarity)
b) hᵢ(x*) ≤ 0, i = 1, …, m (primal feasibility)
c) λᵢ* ≥ 0, i = 1, …, m (dual feasibility)
d) λᵢ*hᵢ(x*) = 0, i = 1, …, m (complementary slackness)
Example 1: A company is planning to spend up to $10,000 on advertising. It costs $3,000 per minute to advertise on television and $1,000 per minute to advertise on radio. If the firm buys x minutes of television advertising and y minutes of radio advertising, its revenue in thousands of dollars is given by f(x, y) = −2x² − y² + xy + 8x + 3y. The firm wishes to maximize its revenue.
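Formulated as max f(x, y) subject to 3x + y ≤ 10, x ≥ 0, y ≥ 0, the example can be checked numerically; a sketch using scipy.optimize (an assumption — the notes solve it by hand via the KKT conditions):

```python
from scipy.optimize import minimize

# Revenue in thousands of dollars; minimize the negative to maximize.
def neg_revenue(v):
    x, y = v
    return -(-2*x**2 - y**2 + x*y + 8*x + 3*y)

# Budget constraint: 3x + y <= 10 (TV $3k/min, radio $1k/min, $10k budget).
cons = [{'type': 'ineq', 'fun': lambda v: 10 - 3*v[0] - v[1]}]
res = minimize(neg_revenue, x0=[1.0, 1.0],
               bounds=[(0, None), (0, None)], constraints=cons)
print(res.x, -res.fun)  # x ~ 2.464, y ~ 2.607, revenue ~ 15.02 (~$15,018)
```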
Example 6: A company can purchase up to 12 units of a raw material at a cost of $0.25/unit. Four units of the raw material can be processed into one unit of product X at a processing cost of $3.00 for each unit of X produced, and two units of the raw material can be processed into one unit of product Y at a processing cost of $1.50 for each unit of Y produced. If x units of product X are produced, product X sells for $(10 − x) per unit. If y units of product Y are produced, product Y sells for $(8 − 2y) per unit. The company wishes to maximize its profit.
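Writing out the profit: revenue is x(10 − x) + y(8 − 2y), processing costs are 3x + 1.5y, and the raw material costs 0.25(4x + 2y), so profit = −x² − 2y² + 6x + 6y subject to 4x + 2y ≤ 12. A numerical sketch (again assuming scipy, not part of the notes):

```python
from scipy.optimize import minimize

# Profit = x*(10-x) + y*(8-2y) - 3x - 1.5y - 0.25*(4x + 2y)
#        = -x**2 - 2*y**2 + 6*x + 6*y   (dollars)
def neg_profit(v):
    x, y = v
    return -(-x**2 - 2*y**2 + 6*x + 6*y)

# Raw-material limit: 4x + 2y <= 12.
cons = [{'type': 'ineq', 'fun': lambda v: 12 - 4*v[0] - 2*v[1]}]
res = minimize(neg_profit, x0=[1.0, 1.0],
               bounds=[(0, None), (0, None)], constraints=cons)
print(res.x, -res.fun)  # x ~ 2.333, y ~ 1.333, profit ~ $13.00
```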
PART III Probability review
PART 3.1 Basic definitions
Axiomatic definition of probability
a) P(A) ≥ 0 for each event A
b) P(Ω) = 1
c) For pairwise disjoint events A₁, …, Aₙ for some n ∈ ℕ (Aᵢ ∩ Aⱼ = ∅ for i ≠ j): P(A₁ ∪ ⋯ ∪ Aₙ) = P(A₁) + ⋯ + P(Aₙ)
Consequences:
- P(∅) = 0
- P(Aᶜ) = 1 − P(A)
- A ⊂ B ⇒ P(B∖A) = P(B) − P(A)
- A ⊂ B ⇒ P(A) ≤ P(B)
- P(A) ≤ 1
- P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Conditional probability
P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0
Independence
Events A and B are independent if P(A ∩ B) = P(A)P(B); equivalently, P(A|B) = P(A) and P(B|A) = P(B) (when the conditioning events have positive probability).
a) Mutually independent: P(⋂ᵢ∈I Aᵢ) = ∏ᵢ∈I P(Aᵢ) for every finite index set I
b) Conditionally independent : P(A ∩ B|C) = P(A|C)P(B|C)
Bayes rule
P(A|B) = P(B|A)P(A) / P(B) = P(B|A)P(A) / (P(B|A)P(A) + P(B|Aᶜ)P(Aᶜ))
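A numerical illustration of both forms (hypothetical numbers: a test with 99% sensitivity, a 5% false-positive rate, and 1% prevalence):

```python
# Bayes' rule: P(A|B) = P(B|A)P(A) / (P(B|A)P(A) + P(B|A^c)P(A^c)).
p_A = 0.01               # P(A): prevalence (hypothetical)
p_B_given_A = 0.99       # P(B|A): sensitivity (hypothetical)
p_B_given_Ac = 0.05      # P(B|A^c): false-positive rate (hypothetical)

p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)  # law of total probability
p_A_given_B = p_B_given_A * p_A / p_B               # Bayes' rule
print(p_A_given_B)  # ~0.1667: most positive tests are false positives here
```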
PART 3.2 Expected value
Definition
For a discrete random variable X taking values x₁, …, xₙ with probabilities p₁, …, pₙ:
µ = E[X] = p₁x₁ + p₂x₂ + ⋯ + pₙxₙ
Theorem
a) E is a linear operator, that is, if X and Y are two discrete random variables and a, b ∈ R, then E[aX + bY] = aE[X] + bE[Y]
b) If X and Y are independent, then
- E[XY] = E[X]E[Y]
- E[f(X)g(Y)] = E[f(X)]E[g(Y)]
PART 3.3 Variance
Definition
V[X] = σ² = E[(X − µ)²]
Theorem
a) V[X] = E[X²] − (E[X])²
b) V[aX] = a²V[X]
c) If X and Y are independent, then V[X + Y] = V[X] + V[Y]
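A direct computation of E[X] and V[X] for a small discrete distribution (hypothetical values and probabilities):

```python
import numpy as np

# Discrete random variable: values x_i with probabilities p_i.
x = np.array([1.0, 2.0, 3.0])
p = np.array([0.2, 0.5, 0.3])

mu = np.sum(p * x)               # E[X] = p1*x1 + ... + pn*xn
var = np.sum(p * x**2) - mu**2   # V[X] = E[X**2] - (E[X])**2
print(mu, var)                   # 2.1 0.49
```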
PART 3.4 Covariance
Definition
Let X₁, X₂ and X be random variables with means µ₁, µ₂ and µ respectively.
a) Cov(X, X) = V[X]
b) Cov(X1, X2) = E[X1X2] − E[X1]E[X2]
c) Cov(X1, X2) = Cov(X2, X1)
d) Cov(aX1 + bX2, X) = aCov(X1, X) + bCov(X2, X)
If the covariance of two random variables is zero, they are uncorrelated.
- Independence implies uncorrelatedness
- Uncorrelatedness does not imply independence
PART 3.5 Correlation coefficient
Definition
Let X₁ and X₂ be two random variables with standard deviations σ₁ and σ₂. Then
ρ = corr(X₁, X₂) = Cov(X₁, X₂) / (σ₁σ₂)
Theorem
For two random variables X₁ and X₂:
a) If X1 and X2 are independent, then Cov(X1, X2) = 0 and ρ = corr(X1, X2) = 0
b) If X2 is a linear function of X1 such that X2 = aX1 + b where a > 0, then ρ = 1
c) If X2 is a linear function of X1 such that X2 = − aX1 + b where a > 0, then ρ = − 1
PART 3.6 Covariance and correlation matrices
Definition
Let Xᵢ, i = 1, 2, …, n, be random variables with means µᵢ and variances σᵢ². The covariance matrix S has entries Sᵢⱼ = σᵢⱼ = Cov(Xᵢ, Xⱼ), so its diagonal entries are the variances, Sᵢᵢ = σᵢ².
The corresponding correlation matrix C has entries Cᵢⱼ = σᵢⱼ/(σᵢσⱼ), with Cᵢᵢ = 1.
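Both matrices are one-liners on sample data; a sketch with numpy on a hypothetical sample:

```python
import numpy as np

# Estimate covariance and correlation matrices from a (hypothetical) sample.
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 3))   # 500 observations of 3 variables
data[:, 1] += 0.8 * data[:, 0]     # make the first two variables correlated

S = np.cov(data, rowvar=False)       # S_ij = Cov(X_i, X_j); diagonal = variances
C = np.corrcoef(data, rowvar=False)  # C_ij = S_ij / (sigma_i * sigma_j)
print(np.round(S, 2))
print(np.round(C, 2))                # diagonal entries equal 1
```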