ST421 Exercise 1 — Solutions
1. Note that P(u_j ≤ X_j ≤ v_j) = e^{−λu_j} − e^{−λv_j}, and f_{X_j}(x | u_j ≤ X_j ≤ v_j) = λe^{−λx}/{e^{−λu_j} − e^{−λv_j}}. Hence,
\[
X_j(\lambda) \equiv E_\lambda(X_j \mid X_j \in [u_j, v_j]) = \frac{1}{\lambda} + \frac{u_j \exp(-\lambda u_j) - v_j \exp(-\lambda v_j)}{\exp(-\lambda u_j) - \exp(-\lambda v_j)}.
\]

The log-likelihood function based on the full sample is
\[
l(\theta) \equiv l(\theta; X_1, \cdots, X_n) = n \log \lambda - \lambda \sum_{j=1}^{n} X_j,
\]
which yields the MLE based on the full sample θ̂(X_1, · · · , X_n) = n / ∑_{1 ≤ j ≤ n} X_j.

Now the E-step is
\[
Q(\lambda) = E_{\lambda_0}\{\, l(\theta) \mid X_j \in [u_j, v_j] \text{ for } m < j \le n \,\}
= n \log \lambda - \lambda \sum_{i=1}^{m} X_i - \lambda \sum_{j=m+1}^{n} X_j(\lambda_0),
\]
and the M-step is simply
\[
\lambda_1 = n \Big/ \Big\{ \sum_{i=1}^{m} X_i + \sum_{j=m+1}^{n} X_j(\lambda_0) \Big\}.
\]
The EM-algorithm iterates E-step and M-step with, for example, initial value λ0 = m/(X1 + · · · + Xm ).
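A minimal numerical sketch of this EM iteration, assuming the first m observations are fully observed and the remaining n − m are only known to lie in their intervals [u_j, v_j] (the function and variable names below are illustrative, not part of the exercise):

```python
import numpy as np

def conditional_mean(lam, u, v):
    """E_lam(X_j | X_j in [u_j, v_j]) for an Exp(lam) variable, as derived above."""
    return 1.0 / lam + (u * np.exp(-lam * u) - v * np.exp(-lam * v)) / (
        np.exp(-lam * u) - np.exp(-lam * v)
    )

def em_exponential(x_obs, u, v, tol=1e-8, max_iter=500):
    """EM for the exponential rate: x_obs holds the m fully observed values;
    u, v hold the interval end points of the remaining n - m observations."""
    x_obs, u, v = map(np.asarray, (x_obs, u, v))
    n = len(x_obs) + len(u)
    lam = len(x_obs) / np.sum(x_obs)              # suggested starting value lam_0
    for _ in range(max_iter):
        # E-step: replace each interval-censored X_j by its conditional mean X_j(lam)
        imputed = conditional_mean(lam, u, v)
        # M-step: lam_1 = n / (sum of observed values + sum of imputed values)
        lam_new = n / (np.sum(x_obs) + np.sum(imputed))
        if abs(lam_new - lam) < tol:
            return lam_new
        lam = lam_new
    return lam
```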
2. (a) Note l(p) = X log p + (n − X) log(1 − p), s(p) = X/p − (n − X)/(1 − p), and ṡ(p) = −X/p^2 − (n − X)/(1 − p)^2, where X = ∑_{j=1}^n X_j. Hence the Fisher information is
\[
I(p) = -E_p\{\dot{s}(p)\} = n/p + n/(1 - p) = n/\{p(1 - p)\}.
\]
The C-R lower bound for the variance of an unbiased estimator of θ(= p^2) is (dθ/dp)^2/I(p) = 4p^3(1 − p)/n.
(b) Note L(p) = ∏_{j=1}^n p^{X_j}(1 − p)^{1−X_j}. This yields p̂ = X/n. Hence θ̂ = (p̂)^2 = X^2/n^2.

(c) Note
\[
E_p(X^2) = \sum_{i=1}^{n} E_p(X_i^2) + \sum_{1 \le i \ne j \le n} E_p(X_i X_j) = n E_p(X_1) + (n^2 - n) E_p(X_1 X_2) = np + (n^2 - n)p^2,
\]
using X_1^2 = X_1 and the independence of X_1 and X_2. Hence E_p(θ̂) = E_p(X^2)/n^2 = p^2 + p(1 − p)/n ≠ p^2, i.e. θ̂ is a biased estimator for θ with bias p(1 − p)/n.


(d) We draw a bootstrap sample X_1^*, · · · , X_n^* from the Bernoulli distribution with probability p̂. Define the bootstrap estimator θ^* = (X_1^* + · · · + X_n^*)^2/n^2. The bootstrap estimator for the bias of θ̂ is Bias^* ≡ E_{p̂}(θ^*) − θ̂. In practice, E_{p̂}(θ^*) may be estimated via repeated bootstrap samplings.

Note. For this simple example, the bias estimator admits a simple analytic formula Bias^* = p̂(1 − p̂)/n, which is the simple plug-in estimator.
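A sketch of the bootstrap bias estimate in part (d), using B repeated bootstrap samplings (the value of B and the function name are arbitrary choices for illustration):

```python
import numpy as np

def bootstrap_bias(x, B=2000, seed=None):
    """Bootstrap estimate of the bias of theta_hat = (X/n)^2 for Bernoulli data x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    p_hat = x.mean()
    theta_hat = p_hat ** 2
    # draw B bootstrap samples X_1*, ..., X_n* from Bernoulli(p_hat)
    boot = rng.binomial(1, p_hat, size=(B, n))
    theta_star = boot.mean(axis=1) ** 2
    # Bias* = E_{p_hat}(theta*) - theta_hat, approximated by the bootstrap average
    return theta_star.mean() - theta_hat
```

For large B, the returned value should be close to the analytic plug-in value p̂(1 − p̂)/n noted above.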

3. Let x⋆ = F^{−1}(p). Then x⋆ ≥ x for any x with F(x) = G((x − µ)/σ) ≤ p. Put y⋆ = (x⋆ − µ)/σ. Then y⋆ ≥ (x − µ)/σ for any x with G((x − µ)/σ) ≤ p, i.e. y⋆ ≥ y for any y with G(y) ≤ p. Hence y⋆ = G^{−1}(p), i.e. {F^{−1}(p) − µ}/σ = G^{−1}(p).
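A quick numerical illustration of this quantile relation, taking G to be the standard normal distribution function so that F is the N(µ, σ^2) distribution function (the values of µ, σ and p below are arbitrary):

```python
from scipy.stats import norm

mu, sigma, p = 2.0, 3.0, 0.9                # arbitrary illustrative values
x_star = norm.ppf(p, loc=mu, scale=sigma)   # F^{-1}(p) for F = N(mu, sigma^2)
y_star = norm.ppf(p)                        # G^{-1}(p) for the standard normal G
print((x_star - mu) / sigma, y_star)        # both approximately 1.2816
```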
4. Since − log x is convex, it follows from Jensen's inequality that
\[
D(g, f) = \sum_i g_i \log\frac{g_i}{f_i} = -\sum_i g_i \log\frac{f_i}{g_i} \ge -\log\Big(\sum_i g_i \cdot \frac{f_i}{g_i}\Big) = -\log\Big(\sum_i f_i\Big) = 0.
\]

5. Since ∑_{i≥1} i g_i = µ,
\[
D(g, f) = \sum_{i=1}^{\infty} g_i \log g_i - \sum_{i=1}^{\infty} g_i \log f_i
= C - \sum_{i=1}^{\infty} g_i \{\log p + (i - 1)\log(1 - p)\}
= C - \log p - (\mu - 1)\log(1 - p),
\]
which attains its minimum at p = 1/µ. Thus the geometric distribution which minimises D(g, f) is f_i = µ^{−1}(1 − µ^{−1})^{i−1}, i ≥ 1.
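A quick numerical check of the minimiser (purely illustrative; up to the additive constant C, D(g, f) depends on g only through µ, so a single arbitrary value of µ suffices):

```python
import numpy as np
from scipy.optimize import minimize_scalar

mu = 3.5                               # arbitrary mean of g on {1, 2, ...}
# up to the additive constant C, D(g, f_p) = -log(p) - (mu - 1) * log(1 - p)
objective = lambda p: -np.log(p) - (mu - 1) * np.log(1 - p)
res = minimize_scalar(objective, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, 1 / mu)                   # both approximately 0.2857
```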

6. From the course work, the normal distribution f which minimises D(g, f) should have the mean and variance equal to the mean and the variance of g. So the answers are (a) N(θ^{−1}, θ^{−2}), (b) N(r + c, r), and (c) N(r/β, r/β^2).
7. The exponential distribution on [0, ∞) with mean µ > 0 has the density function f(x) = µ^{−1} e^{−x/µ} I(x > 0). Hence
\[
D(g, f) = C - \int g(x)\log f(x)\,dx = C + \log\mu + \mu^{-1}\int_0^\infty x\,g(x)\,dx = C + \log\mu + \mu_0/\mu,
\]
which attains its minimum at µ = µ_0.
8. The log quasi-likelihood under the exponential distribution is
\[
l(\mu; X_1, \cdots, X_n) = -n\log\mu - \frac{1}{\mu}\sum_{i=1}^{n} X_i.
\]
Maximising it leads to the MQLE µ̂ = X̄ = n^{−1} ∑_i X_i. Write l(µ) = l(µ; X_1). Then
\[
\dot{l}(\mu) = -\frac{1}{\mu} + \frac{X_1}{\mu^2}, \qquad \ddot{l}(\mu) = \frac{1}{\mu^2} - \frac{2X_1}{\mu^3}.
\]
Hence
\[
I = -E_g\,\ddot{l}(\mu) = -\frac{1}{\mu^2} + \frac{2E_g X_1}{\mu^3} = \frac{1}{\mu^2},
\qquad
J = E_g\{\dot{l}(\mu)\}^2 = \frac{1}{\mu^2} - \frac{2E_g X_1}{\mu^3} + \frac{E_g X_1^2}{\mu^4} = \frac{E_g X_1^2}{\mu^4} - \frac{1}{\mu^2}.
\]
Hence by the limit theorem for MQLEs, √n(X̄ − µ) converges in distribution to a normal distribution with mean 0 and variance
\[
I^{-1} J I^{-1} = J/I^2 = E_g X_1^2 - \mu^2 = \operatorname{Var}_g(X_1),
\]
which is the standard CLT for the sample mean. In fact, as long as an MQLE is the sample mean, including the MQLE under the normal distribution, its asymptotic distribution is effectively determined by the CLT.

Note. MQLEs for µ = E X_1 are not always equal to X̄. For example, the MQLE under the uniform distribution U(a, b) would be µ̂ = 0.5(â + b̂) = 0.5(min_i X_i + max_j X_j).
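An illustrative Monte Carlo check of the sandwich variance I^{−1}JI^{−1} = Var_g(X_1), taking g to be a Gamma distribution so that the exponential quasi-likelihood is deliberately misspecified (the parameter choices below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
shape, scale = 2.0, 1.5                       # g = Gamma(2, 1.5), an arbitrary choice
mu, var = shape * scale, shape * scale ** 2   # E_g X_1 = 3.0, Var_g(X_1) = 4.5
n, n_rep = 200, 5000

# the MQLE under the exponential quasi-likelihood is the sample mean
x_bar = rng.gamma(shape, scale, size=(n_rep, n)).mean(axis=1)
print(np.var(np.sqrt(n) * (x_bar - mu)), var)  # both close to Var_g(X_1)
```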

9. Let Y = (Y_1, · · · , Y_n)^τ, ε = (ε_1, · · · , ε_n)^τ, and X be the n × d matrix with x_{ij} as its (i, j)-th element. Then Y = Xβ + ε. The log-likelihood is
\[
l(\beta, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\,\|Y - X\beta\|^2.
\]
Hence the MLE for β is the LSE β̂ = (X^τ X)^{−1} X^τ Y, and σ̂_d^2 = n^{−1}‖Y − Xβ̂‖^2, with l(β̂, σ̂_d^2) = −(n/2) log(σ̂_d^2) + C. Ignoring a constant, we may define
\[
\mathrm{AIC}(d) = n \log(\hat\sigma_d^2) + 2d.
\]
This is the AIC with all d explanatory variables x_{i1}, · · · , x_{id} included.
For model selection, we may apply, for example, a backward-deletion algorithm with AIC as follows. Deleting one explanatory variable from the complete model, we obtain a regression model with d − 1 explanatory variables, and there are d such models. We choose the one with the minimum AIC value, denoted AIC(d − 1). Starting from this new model, we may find the optimum model with d − 2 explanatory variables, with AIC value denoted AIC(d − 2). In the same manner, we may obtain the optimum model with k explanatory variables, for k = d − 3, · · · , 2, 1. The overall optimum model is the one with the overall minimum AIC value.
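A minimal sketch of this backward-deletion search (plain least squares via numpy; the columns of X are taken to be the candidate explanatory variables, and all names are illustrative):

```python
import numpy as np

def aic(X, Y):
    """AIC(k) = n log(sigma_hat^2) + 2k for the Gaussian linear model Y = X beta + eps."""
    n, k = X.shape
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]
    sigma2 = np.sum((Y - X @ beta) ** 2) / n
    return n * np.log(sigma2) + 2 * k

def backward_delete(X, Y):
    """Delete one explanatory variable at a time, keeping the smallest-AIC model of each size."""
    current = list(range(X.shape[1]))          # start from the complete model
    best_cols, best_aic = current[:], aic(X, Y)
    while len(current) > 1:
        # among the models with one fewer variable, keep the one with minimum AIC
        aic_k, drop = min((aic(X[:, [c for c in current if c != j]], Y), j) for j in current)
        current.remove(drop)
        if aic_k < best_aic:
            best_aic, best_cols = aic_k, current[:]
    return best_cols, best_aic
```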

