13 nonnormality
TRANSCRIPT
-
8/3/2019 13 Nonnormality
1/32
Non-Normality
presented by: Dudi Barmana, M.Si.
-
Agenda
• Consequences to be faced
• Identification/detection (examining residual patterns and formal tests)
• Some alternative solutions: transformation
-
Today's Quote
People often throw stones in our path. It is up to us whether we turn those stones into a wall or a bridge.
---Chinese book of wisdom---
-
Consequences to Be Faced
-
• If the errors come from a distribution with thicker (heavier) tails than the normal, the least-squares (LS) fit may be sensitive to a small subset of the data.
• Heavy-tailed error distributions often generate outliers that pull the LS fit too much in their direction.
• Predictions could be invalid.
• Individual t-tests and the model F-test could be misleading.
-
Identification/Detection
-
Residual Plot
• Graphical analysis is a very effective way to investigate the adequacy of the fit of a regression model and to check the underlying assumptions.
• Normal probability plot: a simple way to check the normality assumption.
-
• A straight-line plot is indicative of normality.
• A [shape shown in figure] is indicative of right-skewed errors (e.g., gamma).
• A [shape shown in figure] is indicative of a symmetric, short-tailed distribution.
• A [shape shown in figure] is indicative of a symmetric, long-tailed distribution.
-
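• Ranked residuals: $e_{[1]} < e_{[2]} < \cdots < e_{[n]}$
• Plot $e_{[i]}$ against the cumulative probability $P_i = (i - 1/2)/n$.
• Sometimes $e_{[i]}$ is plotted against $\Phi^{-1}[(i - 1/2)/n]$, where $\Phi$ is the standard normal cdf.
• The plot is nearly a straight line for large samples (n > 32) if the $e_{[i]}$ are normal.
• Small sample (n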
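The plotting positions described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's own code: the function name `normal_plot_points` and the residual values are hypothetical, and only the standard library's `statistics.NormalDist` is assumed.

```python
from statistics import NormalDist

def normal_plot_points(residuals):
    """Return (theoretical quantile, ranked residual) pairs for a normal probability plot."""
    ranked = sorted(residuals)          # e[1] <= e[2] <= ... <= e[n]
    n = len(ranked)
    std_normal = NormalDist()           # standard normal, mean 0 and sd 1
    points = []
    for i, e in enumerate(ranked, start=1):
        p = (i - 0.5) / n               # cumulative probability P_i = (i - 1/2)/n
        points.append((std_normal.inv_cdf(p), e))
    return points

# Hypothetical residuals, used only to show the mechanics.
residuals = [0.3, -1.2, 0.8, -0.1, 1.5, -0.6, 0.2]
points = normal_plot_points(residuals)
for quantile, e in points:
    print(f"{quantile:8.4f} {e:6.2f}")
```

If the residuals are close to normal, the printed pairs fall near a straight line when plotted.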
-
Kolmogorov-Smirnov Test
Let $F(x) = P(X_i \le x)$ be the cdf for the distribution.
In the uniform(0,1) case: $F(x) = x$ for $0 \le x \le 1$.
Compare this to the empirical distribution function:
$$\hat{F}_n(x) = \frac{1}{n}\,\#\{X_i \text{ in the sample} \le x\}$$
-
Kolmogorov-Smirnov Test
If $X_1, X_2, \ldots, X_n$ really come from the distribution with cdf $F$, the distance
$$D = D_n = \max_x \left| \hat{F}_n(x) - F(x) \right|$$
should be small.
-
Kolmogorov-Smirnov Test
Computing the test statistic:
Suppose we simulate 7 uniform(0,1)s and get:
0.6 0.2 0.5 0.9 0.1 0.4 0.2
(obviously simplified)
-
Kolmogorov-Smirnov Test
0.6 0.2 0.5 0.9 0.1 0.4 0.2
Put them in order:
0.1 0.2 0.2 0.4 0.5 0.6 0.9
Now the empirical cdf is:
$\hat{F}_7(x) = 0$ for $x < 0.1$
$\hat{F}_7(x) = 1/7$ for $0.1 \le x < 0.2$
-
Kolmogorov-Smirnov Test
0.6 0.2 0.5 0.9 0.1 0.4 0.2
$$\hat{F}_7(x) = \begin{cases}
0 & x < 0.1 \\
1/7 & 0.1 \le x < 0.2 \\
3/7 & 0.2 \le x < 0.4 \\
4/7 & 0.4 \le x < 0.5 \\
5/7 & 0.5 \le x < 0.6 \\
6/7 & 0.6 \le x < 0.9 \\
1 & x \ge 0.9
\end{cases}$$
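The step function above can be checked with a short Python sketch. The helper name `ecdf` is an assumption made for illustration; the sample is the one from the slide.

```python
def ecdf(sample, x):
    """Empirical cdf: fraction of sample values less than or equal to x."""
    return sum(1 for value in sample if value <= x) / len(sample)

sample = [0.6, 0.2, 0.5, 0.9, 0.1, 0.4, 0.2]

# Evaluate just below the smallest value and at each distinct sample value.
for x in (0.05, 0.1, 0.2, 0.4, 0.5, 0.6, 0.9):
    print(x, ecdf(sample, x))
```

The jump at 0.2 is 2/7 rather than 1/7 because the value 0.2 appears twice in the sample.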
-
Kolmogorov-Smirnov Test
0.6 0.2 0.5 0.9 0.1 0.4 0.2
$D_7 = 9/35 \approx 0.2571429$
-
Kolmogorov-Smirnov Test
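Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the ordered sample.
Then $D_n$ can be estimated by
$$D_n = \max\{D_n^+, D_n^-\}$$
where
$$D_n^+ = \max_{1 \le i \le n}\left[\frac{i}{n} - F(X_{(i)})\right]$$
$$D_n^- = \max_{1 \le i \le n}\left[F(X_{(i)}) - \frac{i-1}{n}\right]$$
(assuming non-repeating values)
This is exact for the uniform distribution!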
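As a rough check of these formulas, the sketch below computes the statistic for the seven-value sample against the uniform(0,1) cdf. The function names `ks_statistic` and `uniform_cdf` are illustrative choices, not part of the slides. The sample contains a repeated 0.2, so the non-repeating assumption does not strictly hold, but for this sample the formulas still return the same maximum distance.

```python
def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance D_n = max(D_n^+, D_n^-) for a proposed cdf."""
    ordered = sorted(sample)
    n = len(ordered)
    # D_n^+ : largest amount by which the empirical cdf exceeds the proposed cdf
    d_plus = max(i / n - cdf(x) for i, x in enumerate(ordered, start=1))
    # D_n^- : largest amount by which the proposed cdf exceeds the empirical cdf
    d_minus = max(cdf(x) - (i - 1) / n for i, x in enumerate(ordered, start=1))
    return max(d_plus, d_minus)

def uniform_cdf(x):
    """cdf of the uniform(0,1) distribution."""
    return min(max(x, 0.0), 1.0)

sample = [0.6, 0.2, 0.5, 0.9, 0.1, 0.4, 0.2]
d7 = ks_statistic(sample, uniform_cdf)
print(round(d7, 7))  # 0.2571429, i.e. 9/35
```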
-
Kolmogorov-Smirnov Test
We reject the hypothesis that this sample came from the proposed distribution if the empirical cdf is too far from the true cdf of the proposed distribution,
i.e., we reject if $D_n$ is too large.
How large is large?
-
Kolmogorov-Smirnov Test
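In the 1930s, Kolmogorov and Smirnov showed that
$$\lim_{n \to \infty} P\left(n^{1/2} D_n \le t\right) = 1 - 2\sum_{i=1}^{\infty} (-1)^{i-1} e^{-2 i^2 t^2}$$
So, for large sample sizes, you could assume
$$P\left(n^{1/2} D_n \le t\right) \approx 1 - 2\sum_{i=1}^{\infty} (-1)^{i-1} e^{-2 i^2 t^2}$$
and find the value of $t$ that makes the right-hand side equal to $1 - \alpha$ for an $\alpha$ level test.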
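One way to find that value of $t$ numerically is sketched below. It truncates the series after a fixed number of terms and solves for $t$ by bisection; the truncation length, the bracketing interval, and the function names are assumptions made for this illustration.

```python
import math

def kolmogorov_cdf(t, terms=100):
    """Limiting cdf of sqrt(n) * D_n, with the infinite series truncated at `terms`."""
    if t <= 0:
        return 0.0
    total = 0.0
    for i in range(1, terms + 1):
        total += (-1) ** (i - 1) * math.exp(-2 * i * i * t * t)
    return 1 - 2 * total

def asymptotic_critical_value(alpha, lo=0.2, hi=3.0, tol=1e-10):
    """Find t with kolmogorov_cdf(t) = 1 - alpha by bisection on [lo, hi]."""
    target = 1 - alpha
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kolmogorov_cdf(mid) < target:
            lo = mid          # cdf too small, so t must be larger
        else:
            hi = mid
    return (lo + hi) / 2

for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(asymptotic_critical_value(alpha), 4))
```

Dividing the resulting $t$ by $\sqrt{n}$ gives an approximate critical value for $D_n$ itself.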
-
Kolmogorov-Smirnov Test
For small samples, people have worked out and tabulated critical values, but there is no nice closed form solution.
J. Pomeranz (1973)
J. Durbin (1968)
Good approximations for n > 40:
α     0.20        0.10        0.05        0.02        0.01
cv    1.0730/√n   1.2239/√n   1.3581/√n   1.5174/√n   1.6276/√n
-
Kolmogorov-Smirnov Test
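For our small sample of size 7,
$D_7 = 9/35 \approx 0.2571429$
From a table, the critical value for a 0.05 level test for n = 7 is 0.483.
Since 0.2571 < 0.483, we do not reject the hypothesis that the sample is uniform(0,1).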
-
Kolmogorov-Smirnov Test
For our large sample of size 100,000,
$D_{100000} = 0.001523922$
The approximate critical value for a 0.05 level test for n = 100,000 is
$1.3581/\sqrt{100000} \approx 0.004294689$
Since $D_{100000}$ is below this critical value, we again do not reject.
-
Bera-Jarque Test
• It can be proved that the coefficients of skewness and kurtosis can be expressed respectively as:
$$b_1 = \frac{E[u^3]}{(\sigma^2)^{3/2}} \quad \text{and} \quad b_2 = \frac{E[u^4]}{(\sigma^2)^2}$$
• The Bera-Jarque test statistic is given by
$$W = T\left[\frac{b_1^2}{6} + \frac{(b_2 - 3)^2}{24}\right] \sim \chi^2(2)$$
where $T$ is the sample size.
• We estimate $b_1$ and $b_2$ using the residuals from the OLS regression, $\hat{u}$.
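A minimal sketch of this statistic follows. It assumes the moments are estimated by plain sample averages of the centered residuals (dividing by $T$), and it uses the fact that the upper-tail probability of a chi-square variable with 2 degrees of freedom is $e^{-W/2}$. The function name and the residual values are hypothetical.

```python
import math

def bera_jarque(residuals):
    """Return (W, p_value) for the Bera-Jarque normality test."""
    T = len(residuals)
    mean = sum(residuals) / T
    centered = [u - mean for u in residuals]
    m2 = sum(c ** 2 for c in centered) / T   # estimate of sigma^2
    m3 = sum(c ** 3 for c in centered) / T   # estimate of E[u^3]
    m4 = sum(c ** 4 for c in centered) / T   # estimate of E[u^4]
    b1 = m3 / m2 ** 1.5                      # skewness coefficient
    b2 = m4 / m2 ** 2                        # kurtosis coefficient
    w = T * (b1 ** 2 / 6 + (b2 - 3) ** 2 / 24)
    p_value = math.exp(-w / 2)               # P(chi-square with 2 df > w)
    return w, p_value

# Hypothetical residuals: symmetric, so the skewness term is zero.
w, p_value = bera_jarque([-2, -1, 0, 1, 2])
print(round(w, 4), round(p_value, 4))
```

A large $W$ (small p-value) is evidence against normality of the errors.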
-
Some Alternative Solutions:
Transformation
-
Transformation on y: The Box-Cox Method
(Power transformation: $y^{\lambda}$)
-
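• Box and Cox (1964) show how the parameters of the regression model and $\lambda$ can be estimated simultaneously using the method of maximum likelihood.
• Use
$$y^{(\lambda)} = \begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda\,\dot{y}^{\lambda - 1}}, & \lambda \ne 0 \\
\dot{y}\,\ln y, & \lambda = 0
\end{cases}$$
where $\dot{y} = \ln^{-1}\left[\frac{1}{n}\sum_{i=1}^{n} \ln y_i\right]$ is the geometric mean of the observations, and fit the model $y^{(\lambda)} = X\beta + \varepsilon$.
• $\dot{y}^{\lambda - 1}$ is related to the Jacobian of the transformation converting the response variable $y$ into $y^{(\lambda)}$.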
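The scaled transformation can be written as a small Python function. This is a sketch under the assumption that all responses are strictly positive; the names `geometric_mean` and `box_cox` and the example values are illustrative.

```python
import math

def geometric_mean(y):
    """exp of the average log, i.e. ln^{-1}[(1/n) * sum(ln y_i)]."""
    return math.exp(sum(math.log(v) for v in y) / len(y))

def box_cox(y, lam):
    """Scaled Box-Cox transform y^(lambda); requires every y_i > 0."""
    g = geometric_mean(y)
    if lam == 0:
        return [g * math.log(v) for v in y]
    scale = lam * g ** (lam - 1)     # lambda times the Jacobian-related factor
    return [(v ** lam - 1) / scale for v in y]

y = [1.0, 2.0, 4.0]                  # hypothetical responses, geometric mean 2
print(box_cox(y, 1))                 # lambda = 1 only shifts the data: y - 1
print(box_cox(y, 0))                 # lambda = 0 gives 2 * ln(y)
```

Because of the division by $\dot{y}^{\lambda - 1}$, residual sums of squares computed from these transformed responses are comparable across different values of $\lambda$.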
-
• Computation procedure:
• Choose $\lambda$ to minimize $SS_{Res}(\lambda)$.
• Use 10-20 values of $\lambda$ to compute $SS_{Res}(\lambda)$. Then plot $SS_{Res}(\lambda)$ versus $\lambda$. Finally, read the value of $\lambda$ that minimizes $SS_{Res}(\lambda)$ from the graph.
• A second iteration can be performed using a finer mesh of values if desired.
• We cannot select $\lambda$ by directly comparing residual sums of squares from the regressions of the raw power $y^{\lambda}$ on $x$, because each value of $\lambda$ puts the response on a different scale.
• Once $\lambda$ is selected, the analyst is free to fit the model using $y^{\lambda}$ ($\lambda \ne 0$) or $\ln y$ ($\lambda = 0$).
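The grid search in this procedure might look like the sketch below, written for a single regressor so that the least-squares fit has a closed form. The data are hypothetical and constructed so that $\ln y$ is exactly linear in $x$, which means the search should land on $\lambda = 0$. All function names are assumptions for illustration.

```python
import math

def box_cox(y, lam):
    """Scaled Box-Cox transform using the geometric mean of y (all y_i > 0)."""
    g = math.exp(sum(math.log(v) for v in y) / len(y))
    if lam == 0:
        return [g * math.log(v) for v in y]
    return [(v ** lam - 1) / (lam * g ** (lam - 1)) for v in y]

def ss_res(x, z):
    """Residual sum of squares from the simple linear regression of z on x."""
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxz = sum((a - x_bar) * (b - z_bar) for a, b in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, z))

def ss_res_profile(x, y, lambdas):
    """SS_Res(lambda) for each candidate lambda on the grid."""
    return [(lam, ss_res(x, box_cox(y, lam))) for lam in lambdas]

# Hypothetical data: ln(y) is an exact linear function of x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [math.exp(0.5 + 0.3 * a) for a in x]

lambdas = [k / 4 for k in range(-8, 9)]      # 17 values from -2 to 2
profile = ss_res_profile(x, y, lambdas)
best_lambda, best_ss = min(profile, key=lambda row: row[1])
print(best_lambda)
```

Plotting the `profile` pairs gives the $SS_{Res}(\lambda)$ versus $\lambda$ graph; a finer grid around `best_lambda` corresponds to the second iteration.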
-
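• An Approximate Confidence Interval for $\lambda$
• The C.I. can be useful in selecting the final value for $\lambda$.
• For example, if $\hat{\lambda} = 0.596$ is the minimizing value of $SS_{Res}(\lambda)$ but 0.5 is in the C.I., then we would prefer to choose $\lambda = 0.5$. If 1 is in the C.I., then no transformation may be necessary.
• Maximize $L(\lambda) = -\frac{1}{2}\, n \ln[SS_{Res}(\lambda)]$
• An approximate $100(1-\alpha)\%$ C.I. for $\lambda$ is the set of $\lambda$ values satisfying $L(\hat{\lambda}) - L(\lambda) \le \frac{1}{2}\chi^2_{\alpha,1}$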
-
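• Let $SS^* = SS_{Res}(\hat{\lambda})\,\exp(\chi^2_{\alpha,1}/n)$; the interval contains the values of $\lambda$ with $SS_{Res}(\lambda) \le SS^*$.
• $\exp(\chi^2_{\alpha,1}/n)$ can be approximated by $1 + z^2_{\alpha/2}/n$, $1 + \chi^2_{\alpha,1}/n$, or $1 + t^2_{\alpha/2,\nu}/n$, where $\nu$ is the number of residual degrees of freedom.
• This is based on
• $\exp(x) = 1 + x + x^2/2! + \cdots \approx 1 + x$
• $\chi^2_1 = z^2 \approx t^2_{\nu}$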
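Reading the interval off a grid of $SS_{Res}(\lambda)$ values can be sketched as follows. The cutoff is taken as the smallest $SS_{Res}$ on the grid times $\exp(\chi^2_{\alpha,1}/n)$, with $\chi^2_{\alpha,1}$ computed as $z^2_{\alpha/2}$. The profile values, the sample size, and the function name are hypothetical, and the sketch assumes the profile has a single minimum so that the retained grid points form one interval.

```python
import math
from statistics import NormalDist

def box_cox_interval(profile, n, alpha=0.05):
    """Approximate C.I. for lambda from (lambda, SS_Res) pairs on a grid."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    chi2_crit = z * z                       # chi-square(1) upper point = z_{alpha/2}^2
    ss_min = min(ss for _, ss in profile)   # SS_Res at the minimizing lambda
    ss_star = ss_min * math.exp(chi2_crit / n)
    inside = [lam for lam, ss in profile if ss <= ss_star]
    return min(inside), max(inside), ss_star

# Hypothetical SS_Res profile for n = 20 observations.
profile = [(0.0, 1.30), (0.25, 1.10), (0.5, 1.00), (0.75, 1.05), (1.0, 1.40)]
lower, upper, ss_star = box_cox_interval(profile, n=20)
print(lower, upper, round(ss_star, 4))
```

In this made-up example the interval excludes $\lambda = 1$, which would suggest that some transformation is worthwhile. The resolution of the interval is limited by the spacing of the grid.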
-
Questions