STA 141C - Big Data & High Performance Statistical Computing Spring 2022 Homework 4
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
STA 141C - Big Data & High Performance Statistical Computing
Spring 2022
Homework 4
1 Handwriting recognition
In this homework, we work on a model-based method for handwritten digit recognition. Following figure shows example bitmaps of handwritten digits from U.S. postal envelopes.
Each digit is represented by a 32 x 32 bitmap in which each element indicates one pixel with a value of white or black. Each 32 x 32 bitmap is divided into blocks of 4 x 4, and the number of white pixels are counted in each block. Therefore each handwritten digit is summarized by a vector x = (x1 , . . . , x64 ) of length 64 where each element is a count between 0 and 16.
By a model-based method, we mean to impose a distribution on the count vector and carry out classification using probabilities. The goal is to predict handwritten digit. We separate the dataset into training data and test data. The training set contains 3823 handwritten digits and the test set contains 1797 digits.
A common distribution for count vectors is the multinomial distribution. However, it is not a good model for handwritten digits. Let’s work on a more flexible model for count vectors, the Dirichlet-multinomial model.
For a multivariate count vector x = (x1 , . . . , xd ) with batch size lxl= j(d)=1 xj , the probability mass function
for Dirichlet-multinomial distribution is
f (xlα) = ╱ lx(x)l、 己 ,
where (a)亿 = 乞(亿)(a + i).
Given independent data points x1 , . . . , xn , the log-likelihood is given by
L(α) = 乞1 log ╱ lx(x)乞(乞) l、+ 乞 [log(Γ(αj + x乞j ) - log(Γ(αj ))] - 乞1 [log Γ(lαl+lx乞 l) - log Γ(lαl)] .
How do you calculate the MLE?
In this exercise, we use Newton’s method. First, the score function is given by
n n
L(α) = [Ψ(x乞j + αj ) - Ψ(αj )] - [Ψ(lx乞 l+lαl) - Ψ(lαl)] ,
where Ψ(x) = d(log Γ(x)) = Γ— (x)/Γ(x).
Next, the observed information matrix is given by
-d2 L(α) = D - c✶d ✶d(—) ,
where D is a diagonal matrix,
Djj = [Ψ— (αj ) - Ψ— (x乞j + αj )] = 1
and
(lαl+k)2 .
Then given an initial value for α (0) = (α, . . . , α), the Newton’s method keep updating
α (芒) = α (芒 ó 1) + [-d2 L(α(芒 ó 1) )] ó 1 dL(α(芒 ó 1) )
for t = 1, . . . , T. We stop when lL(α(芒)) - L(α(芒 ó 1) )l< c, and then take = α (芒) .
Note that D is a diagonal matrix, we should use the Sherman-Morrison formula to write
[-d2 L(α)] ó 1 = D ó 1 + D ó 1 ✶✶— D ó 1
In the folder uploaded on Piazza, you will find
● Data containing the training data and the testing data
● ‘ddirmult.R’, which evaluates the likelihood function (if log = FALSE) or the log-likelihood function (if log = TRUE) of the Dirichlet-multinomial density
● ‘ddirmult.fit.R‘, which estimates the maximum likelihood estimator (MLE) by Newton’s method
● ‘trainingMLE.R‘, which estimates the MLE based on the training data
Question 1. Open ‘trainingMLE.R’ and obtain MLE estimators for each of the 10 handwriting digits (0, 1, 2, . . . , 9). (You may need to change the path when loading the data)
Question 2. Read in the testing data. Use the estimated MLE for each digit from training data to predict handwriting digits for the testing data.
● Hint 1: To predict the handwriting digit, you should use the ‘ddirmult.R’ function. The following code can be helpful
1 # t est Di g it Pr o b stores posterior probability of each digit being 0 ,1 ,... ,9 2 t estD i gi t Pr ob <- matrix (0 , dim ( testdata ) [1] , 10) 3 for ( dig in 0:9) { 4 t estD i gi t Pr ob [ , dig + 1] <- 5 ddirmult ( testdata [ , -65] , alphahat [ dig + 1 , ] , log = TRUE ) 6 } 7 t estD i gi t Pr ob <- t est Di g it Pr o b + 8 rep ( log ( digitCount / sum ( digitCount ) ) , each = nrow ( testdata ) ) 9 d ig it P r e d i c t ed <- max . col ( te st D ig i tP ro b ) - 1
● Hint 2: To summarize the result, you can construct a confusion table using the code
1 table ( testdata [ , 65] , d i g i t P r e di c t e d )
The output should look like this:
Question 3. Comment on using gradient descent to obtain the MLE (instead of Newton’s method)? (You do not need to implement this.)
Question 4. What is the advantage and disadvantage of using gradient descent instead of Newton’s method?
Question 5. Do you think the current method is satisfactory for predicting handwriting digits? Do you know any other method(s) that can achieve a higher accuracy?
2022-05-27