Least squares method (metode kuadrat terkecil)

LEAST SQUARES METHOD (METODE KUADRAT TERKECIL), Budi Waluyo, Faculty of Agriculture, Universitas Brawijaya, 2009

Upload: sonia-saja

Post on 10-Apr-2015


TRANSCRIPT

Page 1: METODE KUADRAT TERKECIL

LEAST SQUARES METHOD (METODE KUADRAT TERKECIL)

Budi Waluyo

Faculty of Agriculture

Universitas Brawijaya, 2009

Page 2: METODE KUADRAT TERKECIL

The least squares method

• is used to obtain estimators of the coefficients of a linear regression

Page 3: METODE KUADRAT TERKECIL

Simple regression model

• The simple linear regression model is expressed by the equations:

– Y = β0 + β1X + ε, the general model

– Yi = β0 + β1Xi + εi, the model for each observation

• The error, ε or εi, is obtained as

– ε = Y – Ŷ = Y – b0 – b1X, or

– εi = Yi – Ŷi = Yi – b0 – b1Xi

Page 4: METODE KUADRAT TERKECIL

Graphical - Judgmental Solution

• The red points are the values obtained from the experiment, denoted Yi, which are presumed to lie along a straight line.

This line is the model to be estimated. We estimate its coefficients, b0 and b1, to obtain the equation Ŷi = b0 + b1Xi.

The vertical segment (perpendicular to the horizontal axis) that connects an experimental point to the fitted straight line is called the error.

Page 5: METODE KUADRAT TERKECIL

Graphical - Judgmental Solution

[Figure: fitted straight line with intercept b0 and slope b1 per 1 unit of X]

Page 6: METODE KUADRAT TERKECIL

The Least Square Method

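$$
\begin{array}{ccc}
x_i & y_i & \hat{y}_i \\
x_1 & y_1 & b_0 + b_1 x_1 \\
x_2 & y_2 & b_0 + b_1 x_2 \\
x_3 & y_3 & b_0 + b_1 x_3 \\
\vdots & \vdots & \vdots \\
x_n & y_n & b_0 + b_1 x_n
\end{array}
$$

$$
\min Z = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

$$
\min Z = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2
$$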

Page 7: METODE KUADRAT TERKECIL

Classic Minimization

$$
\min Z = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2
$$

We want to minimize this function with respect to b0 and b1.

This is a classic optimization problem.

Recall from calculus that, to find the minimum of a function, we take its derivative and set it equal to zero.

Page 8: METODE KUADRAT TERKECIL

The Least Square Method

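$$
\begin{array}{ccc}
x_i & y_i & \hat{y}_i \\
x_1 & y_1 & b_0 + b_1 x_1 \\
x_2 & y_2 & b_0 + b_1 x_2 \\
x_3 & y_3 & b_0 + b_1 x_3 \\
\vdots & \vdots & \vdots \\
x_n & y_n & b_0 + b_1 x_n
\end{array}
$$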

Note: Our unknowns are b0 and b1.

xi and yi are known. They are our data.

$$
Z = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2
$$

Find the partial derivatives of Z with respect to b0 and b1 and set them equal to zero.

Page 9: METODE KUADRAT TERKECIL

Derivatives

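$$
Z = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2
$$

$$
\frac{\partial Z}{\partial b_0} = \sum_{i=1}^{n} 2(-1)(y_i - b_0 - b_1 x_i) = 0
$$

$$
\frac{\partial Z}{\partial b_1} = \sum_{i=1}^{n} 2(-x_i)(y_i - b_0 - b_1 x_i) = 0
$$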
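The following worked step is not on the original slides. It is a sketch of how setting the two partial derivatives of Z = Σ (yi − b0 − b1xi)² to zero leads to the closed-form estimates. Dividing each condition by −2 and expanding the sums gives the so-called normal equations, which are then solved for b0 and b1:

```latex
\begin{aligned}
\sum y_i &= n\,b_0 + b_1 \sum x_i \\
\sum x_i y_i &= b_0 \sum x_i + b_1 \sum x_i^2 \\[6pt]
\text{From the first equation:}\quad
b_0 &= \bar{y} - b_1 \bar{x} \\[6pt]
\text{Substituting into the second:}\quad
\sum x_i y_i &= \frac{\sum x_i \sum y_i}{n} - b_1 \frac{(\sum x_i)^2}{n} + b_1 \sum x_i^2 \\[6pt]
b_1 &= \frac{\sum x_i y_i - \dfrac{\sum x_i \sum y_i}{n}}{\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}}
\end{aligned}
```

The last line requires that the xi are not all equal, otherwise the denominator is zero and the slope is undefined.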

Page 10: METODE KUADRAT TERKECIL

b0 and b1

$$
b_1 = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}
$$

$$
b_0 = \bar{y} - b_1 \bar{x}
$$
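As an illustration, the formulas for b1 and b0 can be written as a small Python function. This is a minimal sketch, not part of the original slides: the function name `fit_line` and the four check points (which lie exactly on the hypothetical line y = 1 + 2x) are assumptions made for the example.

```python
def fit_line(x, y):
    """Return (b0, b1) that minimize sum((y_i - b0 - b1*x_i)**2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    # The sums that appear in the closed-form formulas
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)

    denom = sum_x2 - sum_x ** 2 / n
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b1 = (sum_xy - sum_x * sum_y / n) / denom   # slope
    b0 = sum_y / n - b1 * sum_x / n             # intercept: y_bar - b1 * x_bar
    return b0, b1


# Hypothetical check data: points lying exactly on y = 1 + 2x
b0, b1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)
```

Because the check points lie exactly on a line, the function should recover that line's intercept and slope.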

Page 11: METODE KUADRAT TERKECIL

Pizza Restaurant Example

We collect a set of data from a random sample of stores of our pizza restaurant chain.

Restaurant i, student population xi (1000s), quarterly sales yi ($1000s)

i     xi    yi
1     2     58
2     6     105
3     8     88
4     8     118
5     12    117
6     16    137
7     20    157
8     20    169
9     22    149
10    26    202

Page 12: METODE KUADRAT TERKECIL

Example

Restaurant i    Xi     Yi      XiYi     Xi²
1               2      58      116      4
2               6      105     630      36
3               8      88      704      64
4               8      118     944      64
5               12     117     1404     144
6               16     137     2192     256
7               20     157     3140     400
8               20     169     3380     400
9               22     149     3278     484
10              26     202     5252     676
Total           140    1300    21040    2528

Page 13: METODE KUADRAT TERKECIL

b1

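$$
b_1 = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}
$$

$$
b_1 = \frac{21040 - \dfrac{(140)(1300)}{10}}{2528 - \dfrac{(140)^2}{10}}
$$

$$
b_1 = \frac{2840}{568} = 5
$$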

Page 14: METODE KUADRAT TERKECIL

b0

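$$
\bar{Y} = b_0 + b_1 \bar{X}
$$

$$
\bar{Y} = \frac{1300}{10} = 130
$$

$$
\bar{X} = \frac{140}{10} = 14
$$

$$
130 = b_0 + 5(14)
$$

$$
b_0 = 60
$$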

Page 15: METODE KUADRAT TERKECIL

Estimated Regression Equation

$$
\hat{Y} = 60 + 5X
$$

Now we can predict. For example, suppose one of the restaurants of this pizza chain is close to a campus with 16,000 students. We predict that its mean quarterly sales are

$$
\hat{Y} = 60 + 5(16) = 140 \text{ thousand dollars}
$$
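The pizza calculation can be reproduced in a few lines of Python. This is a sketch for checking the arithmetic, not part of the original slides: the variable names and the helper `predict` are assumptions, while the data are the ten (xi, yi) pairs from the pizza example.

```python
# Pizza data from the slides
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]               # student population (1000s)
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]   # quarterly sales ($1000s)
n = len(x)

# Column totals, matching the "Total" row of the table
sum_x = sum(x)                                # 140
sum_y = sum(y)                                # 1300
sum_xy = sum(a * b for a, b in zip(x, y))     # 21040
sum_x2 = sum(a * a for a in x)                # 2528

# Closed-form least squares estimates
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n


def predict(x_new):
    """Predicted mean quarterly sales ($1000s) for a student population in 1000s."""
    return b0 + b1 * x_new


print(b0, b1)        # intercept and slope
print(predict(16))   # campus with 16,000 students
```

The printed values should agree with the slides: b0 = 60, b1 = 5, and a prediction of 140 thousand dollars.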

Page 16: METODE KUADRAT TERKECIL

• Simple Linear Regression Model

Y = β0 + β1X + ε

• Simple Linear Regression Equation

E(Y) = β0 + β1X

• Estimated Simple Linear Regression Equation

Ŷ = b0 + b1X

Summary ; The Simple Linear Regression Model

Page 17: METODE KUADRAT TERKECIL

• Least Squares Criterion

min Σ (Yi – Ŷi)²

where

Yi = observed value of the dependent variable for the i th observation

Ŷi = estimated value of the dependent variable for the i th observation

Summary ; The Least Square Method

Page 18: METODE KUADRAT TERKECIL

• Slope for the Estimated Regression Equation

$$
b_1 = \frac{\sum X_i Y_i - \dfrac{\sum X_i \sum Y_i}{n}}{\sum X_i^2 - \dfrac{(\sum X_i)^2}{n}}
$$

• Y-Intercept for the Estimated Regression Equation

$$
b_0 = \bar{Y} - b_1 \bar{X}
$$

Xi = value of independent variable for i th observation

Yi = value of dependent variable for i th observation

X̄ = mean value for independent variable

Ȳ = mean value for dependent variable

n = total number of observations

Summary ; The Least Square Method

Page 19: METODE KUADRAT TERKECIL

Coefficient of Determination

Question: How well does the estimated regression line fit the data?

The coefficient of determination is a measure of the goodness of fit of the estimated regression line to the data.

Given an observation with values Yi and Xi, we put Xi into the equation and get Ŷi = b0 + b1Xi.

(Yi – Ŷi) is called the residual.

It is the error in using Ŷi to estimate Yi.

SSE = Σ (Yi – Ŷi)²

Page 20: METODE KUADRAT TERKECIL

SSE : Pictorial Representation

[Figure: data points with the fitted line Ŷ = 60 + 5x; the residual y10 – ŷ10 is marked]

Page 21: METODE KUADRAT TERKECIL

SSE Computations

i     Xi    Yi
1     2     58
2     6     105
3     8     88
4     8     118
5     12    117
6     16    137
7     20    157
8     20    169
9     22    149
10    26    202

Page 22: METODE KUADRAT TERKECIL

SSE Computations

i     Xi    Yi     Ŷi = 60 + 5Xi
1     2     58     70
2     6     105    90
3     8     88     100
4     8     118    100
5     12    117    120
6     16    137    140
7     20    157    160
8     20    169    160
9     22    149    170
10    26    202    190

Page 23: METODE KUADRAT TERKECIL

SSE Computations

i     Xi    Yi     Ŷi = 60 + 5Xi    (Yi – Ŷi)    (Yi – Ŷi)²
1     2     58     70               -12          144
2     6     105    90               15           225
3     8     88     100              -12          144
4     8     118    100              18           324
5     12    117    120              -3           9
6     16    137    140              -3           9
7     20    157    160              -3           9
8     20    169    160              9            81
9     22    149    170              -21          441
10    26    202    190              12           144

Page 24: METODE KUADRAT TERKECIL

SSE Computations

i     Xi    Yi     Ŷi = 60 + 5Xi    (Yi – Ŷi)    (Yi – Ŷi)²
1     2     58     70               -12          144
2     6     105    90               15           225
3     8     88     100              -12          144
4     8     118    100              18           324
5     12    117    120              -3           9
6     16    137    140              -3           9
7     20    157    160              -3           9
8     20    169    160              9            81
9     22    149    170              -21          441
10    26    202    190              12           144

Total SSE = 1530

SSE = 1530 measures the error in using the estimated regression equation to predict sales.
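The SSE table can be checked with a short Python sketch. It is not part of the original slides. It takes the fitted coefficients b0 = 60 and b1 = 5 from the pizza example as given, and the variable names are assumptions.

```python
# Pizza data from the slides
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]               # student population (1000s)
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]   # quarterly sales ($1000s)

b0, b1 = 60, 5   # estimated coefficients from the pizza example

# Fitted values, residuals, and the sum of squared errors
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]
sse = sum(r ** 2 for r in residuals)

print(residuals)
print(sse)
```

The residuals of a least squares line with an intercept sum to zero, which gives a quick sanity check on a hand-computed table like this one.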

Page 25: METODE KUADRAT TERKECIL

SST Computations

Now suppose we want to estimate sales without using the student population. In other words, we want to estimate Y without using X.

ȳ = (Σ yi) / n = 1300/10 = 130

This is our estimate for the next value of y. Given an observation with values yi and xi, we ignore xi and use ȳ as the estimate.

If Y does not depend on X, then b1 = 0. Therefore ȳ = b0 + b1x̄ ===> b0 = ȳ. Here we do not take x into account; we simply use the average of y as our sales forecast.

(yi – ȳ) is the error in using ȳ to estimate yi.

SST = Σ (yi – ȳ)²

Page 26: METODE KUADRAT TERKECIL

SST : Pictorial Representation

[Figure: data points with the horizontal line ȳ = 130; the deviation y10 – ȳ is marked]

Page 27: METODE KUADRAT TERKECIL

SST Computations

i     Xi    Yi     (Yi – Ȳ)    (Yi – Ȳ)²
1     2     58     -72         5184
2     6     105    -25         625
3     8     88     -42         1764
4     8     118    -12         144
5     12    117    -13         169
6     16    137    7           49
7     20    157    27          729
8     20    169    39          1521
9     22    149    19          361
10    26    202    72          5184

Total SST = 15730

SST = 15730 measures the error in using the mean of the y values to predict sales.

Page 28: METODE KUADRAT TERKECIL

SSE , SST and SSR

SST : A measure of how well the observations cluster around ȳ

SSE : A measure of how well the observations cluster around ŷ

If x played no role in the value of y, then we would have SST = SSE.

If x fully determines the value of y, then SSE = 0.

SST = SSE + SSR

SSR : Sum of squares due to regression

SSR is the explained portion of SST.

SSE is the unexplained portion of SST.
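The decomposition SST = SSE + SSR can be verified numerically on the pizza data. The sketch below is not part of the original slides. It computes SSR directly from its usual definition, Σ (ŷi − ȳ)², which the slides do not write out, so that definition and the variable names are assumptions of the example.

```python
# Pizza data and fitted line (b0 = 60, b1 = 5) from the pizza example
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
b0, b1 = 60, 5

n = len(y)
y_bar = sum(y) / n                      # mean of y
y_hat = [b0 + b1 * xi for xi in x]      # fitted values

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # unexplained part
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained part

print(sst, sse, ssr)
print(sst == sse + ssr)
```

The identity holds for a least squares line that includes an intercept. It is not guaranteed for an arbitrary line drawn through the data.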

Page 29: METODE KUADRAT TERKECIL

Coefficient of Determination for Goodness of Fit

SSE = SST - SSR

The largest value for SSE is

SSE = SST

SSE = SST =======> SSR = 0

SSR/SST = 0 =====> the worst fit

SSR/SST = 1 =====> the best fit

Page 30: METODE KUADRAT TERKECIL

Coefficient of Determination for Pizza example

In the pizza example,

SST = 15730

SSE = 1530

SSR = 15730 - 1530 = 14200

r² = SSR/SST : Coefficient of Determination

0 ≤ r² ≤ 1

r² = 14200/15730 = .9027

In other words, about 90% of the variation in y can be explained by the regression line.
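For reuse on other data sets, the whole computation of r² can be wrapped in one function. This is a sketch, not part of the original slides: the name `r_squared` is hypothetical, and the function uses the centered sums Σ(x − x̄)(y − ȳ) and Σ(x − x̄)², which are algebraically equivalent to the formulas on the slides. It returns 1 − SSE/SST, which equals SSR/SST because SST = SSE + SSR.

```python
def r_squared(x, y):
    """Coefficient of determination of the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Centered sums (equivalent to the sum formulas on the slides)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)

    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


# Pizza data from the slides
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
r2 = r_squared(x, y)
print(round(r2, 4))
```

The function assumes that neither x nor y is constant. Otherwise one of the divisions fails.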

Page 31: METODE KUADRAT TERKECIL

SST Calculations

$$
SST = \sum (Y - \bar{Y})^2
$$

$$
SST = \sum Y^2 - \frac{(\sum Y)^2}{n}
$$

Page 32: METODE KUADRAT TERKECIL

SST Calculations

$$
SST = \sum Y^2 - \frac{(\sum Y)^2}{n}
$$

$$
SST = 184730 - \frac{(1300)^2}{10} = 15730
$$

Observation    Xi    Yi      Yi²
1              2     58      3364
2              6     105     11025
3              8     88      7744
4              8     118     13924
5              12    117     13689
6              16    137     18769
7              20    157     24649
8              20    169     28561
9              22    149     22201
10             26    202     40804
Total                1300    184730

Page 33: METODE KUADRAT TERKECIL

SSR Calculations

Observation    X      Y       XY       Y²        X²
1              2      58      116      3364      4
2              6      105     630      11025     36
3              8      88      704      7744      64
4              8      118     944      13924     64
5              12     117     1404     13689     144
6              16     137     2192     18769     256
7              20     157     3140     24649     400
8              20     169     3380     28561     400
9              22     149     3278     22201     484
10             26     202     5252     40804     676
Total (n = 10) 140    1300    21040    184730    2528

$$
SSR = \frac{\left[\sum XY - \dfrac{\sum X \sum Y}{n}\right]^2}{\sum X^2 - \dfrac{(\sum X)^2}{n}}
$$

$$
SSR = \frac{\left[21040 - (140)(1300)/10\right]^2}{2528 - (140)^2/10} = 14200
$$
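The shortcut formulas need only the column totals, not the individual observations. The sketch below is not part of the original slides and its variable names are assumptions. It plugs the pizza totals into the shortcut formulas for SST and SSR and recovers SSE by subtraction.

```python
# Column totals from the pizza table
n = 10
sum_x, sum_y = 140, 1300
sum_xy, sum_x2, sum_y2 = 21040, 2528, 184730

# SST = sum(Y^2) - (sum(Y))^2 / n
sst = sum_y2 - sum_y ** 2 / n

# SSR = [sum(XY) - sum(X) sum(Y) / n]^2 / [sum(X^2) - (sum(X))^2 / n]
ssr = (sum_xy - sum_x * sum_y / n) ** 2 / (sum_x2 - sum_x ** 2 / n)

# SSE follows from SST = SSE + SSR
sse = sst - ssr

print(sst, ssr, sse)
```

One caution about this design: the shortcut forms subtract two large numbers, so with floating-point data of large magnitude they can lose precision, and the deviation forms Σ(Y − Ȳ)² are then the safer choice.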

Page 34: METODE KUADRAT TERKECIL

SSR Calculations

$$
r^2 = SSR/SST = 14200/15730 = .9027
$$

$$
SSE = SST - SSR = 15730 - 14200 = 1530
$$

Page 35: METODE KUADRAT TERKECIL

Reed Auto periodically has a special week-long sale. As part of the advertising campaign Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales showing the number of TV ads run and the number of cars sold in each sale are shown below.

Number of TV Ads Number of Cars Sold

1 14

3 24

2 18

1 17

3 27

Example : Reed Auto Sales

Page 36: METODE KUADRAT TERKECIL

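Example : Reed Auto Sales

We need to calculate ΣX, ΣY, ΣXY, ΣX², ΣY²

$$
SST = \sum Y^2 - \frac{(\sum Y)^2}{n}
$$

$$
SSR = \frac{\left[\sum XY - \dfrac{\sum X \sum Y}{n}\right]^2}{\sum X^2 - \dfrac{(\sum X)^2}{n}}
$$

         X     Y      XY     X²    Y²
         1     14     14     1     196
         3     24     72     9     576
         2     18     36     4     324
         1     17     17     1     289
         3     27     81     9     729
Total    10    100    220    24    2114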

Page 37: METODE KUADRAT TERKECIL

Σx = 10

Σy = 100

Σxy = 220

Σx² = 24

Σy² = 2114

Example : Reed Auto Sales

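$$
SST = \sum Y^2 - \frac{(\sum Y)^2}{n} = 2114 - \frac{(100)^2}{5} = 114
$$

$$
SSR = \frac{\left[\sum XY - \dfrac{\sum X \sum Y}{n}\right]^2}{\sum X^2 - \dfrac{(\sum X)^2}{n}} = \frac{\left[220 - (10)(100)/5\right]^2}{24 - (10)^2/5} = \frac{(220 - 200)^2}{24 - 20} = 100
$$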

Page 38: METODE KUADRAT TERKECIL

Example : Reed Auto Sales

Alternatively, we could compute SSE and SST and then find SSR = SST - SSE.

         y      x     y²      ŷ = 10 + 5x    y - ŷ    (y - ŷ)²
         14     1     196     15             -1       1
         24     3     576     25             -1       1
         18     2     324     20             -2       4
         17     1     289     15             2        4
         27     3     729     25             2        4
Total    100    10    2114                            14

(Totals: Σy = 100, Σx = 10, Σy² = 2114, SSE = 14)

SST = Σy² - (Σy)²/n = 114

SSR = SST - SSE = 100

r² = 100/114 = 0.877193
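The Reed Auto numbers can be reproduced end to end in Python. This is a sketch for checking the arithmetic, not part of the original slides: the variable names are assumptions, while the data are the five (TV ads, cars sold) pairs from the example.

```python
# Reed Auto data from the slides
ads = [1, 3, 2, 1, 3]          # number of TV ads (x)
cars = [14, 24, 18, 17, 27]    # number of cars sold (y)
n = len(ads)

sum_x = sum(ads)                                  # 10
sum_y = sum(cars)                                 # 100
sum_xy = sum(a * c for a, c in zip(ads, cars))    # 220
sum_x2 = sum(a * a for a in ads)                  # 24
sum_y2 = sum(c * c for c in cars)                 # 2114

# Least squares estimates
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

# Sums of squares and coefficient of determination
sse = sum((c - (b0 + b1 * a)) ** 2 for a, c in zip(ads, cars))
sst = sum_y2 - sum_y ** 2 / n
ssr = sst - sse
r2 = ssr / sst

print(b0, b1)
print(sse, sst, ssr)
print(round(r2, 6))
```

The fitted line should come out as ŷ = 10 + 5x, with SSE = 14, SST = 114, SSR = 100, and r² ≈ 0.877193.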

Page 39: METODE KUADRAT TERKECIL

• Coefficient of Determination

r² = SSR/SST = 100/114 = .88

The regression relationship is very strong, since 88% of the variation in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.

Example : Reed Auto Sales

Page 40: METODE KUADRAT TERKECIL

Correlation Coefficient = (Sign of b1) times (Square Root of the Coefficient of Determination)

The Correlation Coefficient

$$
r_{xy} = (\text{sign of } b_1)\sqrt{r^2}
$$

The correlation coefficient is a measure of the strength of a linear association between two variables. It has a value between -1 and +1.

rxy = +1 : two variables are perfectly related through a line with positive slope.

rxy = -1 : two variables are perfectly related through a line with negative slope.

rxy = 0 : two variables are not linearly related.

Page 41: METODE KUADRAT TERKECIL

In our pizza example, r² = .9027 and the sign of b1 is positive.

The Correlation Coefficient : example

$$
r_{xy} = (\text{sign of } b_1)\sqrt{r^2}
$$

$$
r_{xy} = +\sqrt{.9027}
$$

$$
r_{xy} = +.9501
$$

There is a strong positive relationship between x and y.
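As a cross-check, the correlation coefficient obtained from r² and the sign of b1 can be compared with the direct (Pearson) formula r = Sxy / √(Sxx · Syy). The direct formula is not shown on the slides, so it is an assumption brought in for this sketch, as are the variable names. The data are the pizza observations.

```python
import math

# Pizza data from the slides
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Centered sums of squares and cross-products
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)   # this is SST

b1 = s_xy / s_xx                  # slope; its sign gives the sign of r
r2 = s_xy ** 2 / (s_xx * s_yy)    # equals SSR / SST

# Route 1: sign of b1 times the square root of r^2
r_from_r2 = math.copysign(math.sqrt(r2), b1)

# Route 2: direct Pearson formula
r_pearson = s_xy / math.sqrt(s_xx * s_yy)

print(round(r_from_r2, 4), round(r_pearson, 4))
```

Both routes should give about 0.9501 for the pizza data. The sign has to be taken from b1 because the square root alone is never negative.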

Page 42: METODE KUADRAT TERKECIL

The coefficient of determination and the correlation coefficient are both measures of association between variables.

The correlation coefficient applies to a linear relationship between two variables.

The coefficient of determination applies to linear and nonlinear relationships between two or more variables.

Correlation Coefficient and Coefficient of Determination

Page 43: METODE KUADRAT TERKECIL

Exercise

• Given the following experimental data on rice yield (t/ha), plant height (cm) and tiller number, determine the relationships of these variables with each other using correlation and regression analysis. Obtain a model relating YIELD to the variables PLTHT and TILLER# and interpret results. Test for the significance of the parameter estimates and the regression equation. Evaluate the adequacy of the model obtained.

Page 44: METODE KUADRAT TERKECIL
Page 45: METODE KUADRAT TERKECIL

• HAPPY STUDYING (SELAMAT BELAJAR)