METODE KUADRAT TERKECIL (LEAST SQUARE METHOD)
Budi Waluyo
FAKULTAS PERTANIAN
UNIVERSITAS BRAWIJAYA 2009
The Least Square Method
• used to obtain estimators of the linear regression coefficients
Simple Regression Model
• The simple linear regression model is written as:
  – Y = β0 + β1X + ε  (general model)
  – Yi = β0 + β1Xi + εi  (model for each observation)
• The error (residual) is e, or ei for observation i:
  – e = Y − Ŷ = Y − b0 − b1X, or
  – ei = Yi − Ŷi = Yi − b0 − b1Xi
Graphical - Judgmental Solution
• The red points are the experimental values, denoted Yi, which are presumed to form a straight line.
• This line is the model to be estimated, by estimating its coefficients b0 and b1, giving the equation b0 + b1Xi.
• The vertical segment connecting each experimental point to the fitted line is called the error.
Graphical - Judgmental Solution
[Figure: fitted line; b0 is the intercept, b1 the slope (rise b1 per unit run)]
The Least Square Method
For each observation (xi, yi), the fitted value is ŷi = b0 + b1xi:
  ŷ1 = b0 + b1x1
  ŷ2 = b0 + b1x2
  ŷ3 = b0 + b1x3
  ...
  ŷn = b0 + b1xn
Choose b0 and b1 to minimize the sum of squared errors:
  Min Z = Σi=1..n (yi − ŷi)²
  Min Z = Σi=1..n (yi − b0 − b1xi)²
Classic Minimization
  Min Z = Σi=1..n (yi − b0 − b1xi)²
We want to minimize this function with respect to b0 and b1.
This is a classic optimization problem.
We may remember from calculus that to find the minimum we take the derivatives and set them equal to zero.
The Least Square Method
  ŷ1 = b0 + b1x1
  ŷ2 = b0 + b1x2
  ŷ3 = b0 + b1x3
  ...
  ŷn = b0 + b1xn
Note : Our unknowns are b0 and b1.
xi and yi are known; they are our data.
  Z = Σi=1..n (yi − b0 − b1xi)²
Find the derivatives of Z with respect to b0 and b1 and set them equal to zero.
Derivatives
  Z = Σi=1..n (yi − b0 − b1xi)²
  ∂Z/∂b0 = 2 Σi=1..n (yi − b0 − b1xi)(−1) = 0
  ∂Z/∂b1 = 2 Σi=1..n (yi − b0 − b1xi)(−xi) = 0
b0 and b1
  b1 = [Σxiyi − (Σxi)(Σyi)/n] / [Σxi² − (Σxi)²/n]
  b0 = ȳ − b1x̄
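As a quick numerical check, the closed-form formulas above can be sketched in Python. This is a minimal sketch; the function name `least_squares_fit` is illustrative, not from the source.

```python
def least_squares_fit(x, y):
    """Estimate intercept b0 and slope b1 with the closed-form
    least-squares formulas (plain sums, no libraries)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi ** 2 for xi in x)
    b1 = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)  # slope
    b0 = sy / n - b1 * sx / n                       # intercept: ybar - b1*xbar
    return b0, b1

# Tiny check on points that lie exactly on y = 2 + 3x:
b0, b1 = least_squares_fit([1, 2, 3], [5, 8, 11])
print(b0, b1)  # → 2.0 3.0
```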
Pizza Restaurant Example
We collect a set of data from random stores of our Pizza restaurant chain:

Restaurant i | Student population xi (1000s) | Quarterly sales yi ($1000s)
 1 |  2 |  58
 2 |  6 | 105
 3 |  8 |  88
 4 |  8 | 118
 5 | 12 | 117
 6 | 16 | 137
 7 | 20 | 157
 8 | 20 | 169
 9 | 22 | 149
10 | 26 | 202
Example

Restaurant i | Xi | Yi | XiYi | Xi²
 1 |  2 |  58 |  116 |   4
 2 |  6 | 105 |  630 |  36
 3 |  8 |  88 |  704 |  64
 4 |  8 | 118 |  944 |  64
 5 | 12 | 117 | 1404 | 144
 6 | 16 | 137 | 2192 | 256
 7 | 20 | 157 | 3140 | 400
 8 | 20 | 169 | 3380 | 400
 9 | 22 | 149 | 3278 | 484
10 | 26 | 202 | 5252 | 676
Total | 140 | 1300 | 21040 | 2528
b1
  b1 = [ΣXiYi − (ΣXi)(ΣYi)/n] / [ΣXi² − (ΣXi)²/n]
  b1 = [21040 − (140)(1300)/10] / [2528 − (140)²/10]
  b1 = 2840 / 568 = 5
b0
  b0 = Ȳ − b1X̄
  Ȳ = 1300/10 = 130
  X̄ = 140/10 = 14
  b0 = 130 − 5(14) = 60
Estimated Regression Equation
  Ŷ = 60 + 5X
Now we can predict. For example, if one of the restaurants of this Pizza chain is close to a campus with 16,000 students, we predict its mean quarterly sales as
  Ŷ = 60 + 5(16) = 140 thousand dollars
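The Pizza fit and the prediction for X = 16 can be reproduced with a short Python sketch, using only the sums computed by hand above:

```python
# Pizza example: Y = quarterly sales ($1000s), X = student population (1000s)
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))   # 21040
sxx = sum(a * a for a in x)              # 2528
b1 = (sxy - sx * sy / n) / (sxx - sx * sx / n)  # slope
b0 = sy / n - b1 * sx / n                       # intercept
pred = b0 + b1 * 16                             # campus of 16,000 students
print(b0, b1, pred)  # → 60.0 5.0 140.0
```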
• Simple Linear Regression Model
  Y = β0 + β1X + ε
• Simple Linear Regression Equation
  E(Y) = β0 + β1X
• Estimated Simple Linear Regression Equation
  Ŷ = b0 + b1X
Summary ; The Simple Linear Regression Model
• Least Squares Criterion
  min Σ(Yi − Ŷi)²
where
Yi = observed value of the dependent variable
for the i th observation
Ŷi = estimated value of the dependent variable
for the i th observation
Summary ; The Least Square Method
• Slope for the Estimated Regression Equation
  b1 = [ΣXiYi − (ΣXi)(ΣYi)/n] / [ΣXi² − (ΣXi)²/n]
• Y-Intercept for the Estimated Regression Equation
  b0 = Ȳ − b1X̄
where
  Xi = value of independent variable for the i th observation
  Yi = value of dependent variable for the i th observation
  X̄ = mean value of the independent variable
  Ȳ = mean value of the dependent variable
  n = total number of observations
Coefficient of Determination
Question : How well does the estimated regression line fit the data?
The coefficient of determination is a measure of goodness of fit: how well the estimated regression line fits the data. Given an observation with values Yi and Xi, we put Xi into the equation and get Ŷi = b0 + b1Xi.
(Yi − Ŷi) is called the residual.
It is the error in using Ŷi to estimate Yi.
  SSE = Σ(Yi − Ŷi)²
SSE : Pictorial Representation
[Figure: residuals around the line Ŷ = 60 + 5X; e.g. the vertical distance y10 − ŷ10]
SSE Computations

 i | Xi | Yi | Ŷi = 60 + 5Xi | (Yi − Ŷi) | (Yi − Ŷi)²
 1 |  2 |  58 |  70 | −12 | 144
 2 |  6 | 105 |  90 |  15 | 225
 3 |  8 |  88 | 100 | −12 | 144
 4 |  8 | 118 | 100 |  18 | 324
 5 | 12 | 117 | 120 |  −3 |   9
 6 | 16 | 137 | 140 |  −3 |   9
 7 | 20 | 157 | 160 |  −3 |   9
 8 | 20 | 169 | 160 |   9 |  81
 9 | 22 | 149 | 170 | −21 | 441
10 | 26 | 202 | 190 |  12 | 144

Total SSE = 1530
SSE = 1530 measures the error in using the estimated equation to predict sales.
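A minimal Python sketch of this SSE computation, using the fitted line Ŷ = 60 + 5X:

```python
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
yhat = [60 + 5 * xi for xi in x]           # fitted values from Yhat = 60 + 5X
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
print(sse)  # → 1530
```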
SST Computations
Now suppose we want to estimate sales without using the student population. In other words, we want to estimate Y without using X.
  ȳ = (Σyi)/n = 1300/10 = 130
This is our estimate for the next value of y, regardless of x.
If Y does not depend on X, then b1 = 0, and ŷ = b0 + b1x reduces to b0 = ȳ. Here we do not take x into account; we simply use the average of y as our sales forecast.
(yi − ȳ) is the error in using ȳ to estimate yi.
  SST = Σ(yi − ȳ)²
SST : Pictorial Representation
[Figure: deviations around the mean line ȳ = 130; e.g. the vertical distance y10 − ȳ]
SST Computations

 i | Xi | Yi | (Yi − Ȳ) | (Yi − Ȳ)²
 1 |  2 |  58 | −72 | 5184
 2 |  6 | 105 | −25 |  625
 3 |  8 |  88 | −42 | 1764
 4 |  8 | 118 | −12 |  144
 5 | 12 | 117 | −13 |  169
 6 | 16 | 137 |   7 |   49
 7 | 20 | 157 |  27 |  729
 8 | 20 | 169 |  39 | 1521
 9 | 22 | 149 |  19 |  361
10 | 26 | 202 |  72 | 5184

Total SST = 15730
SST = 15730 measures the error in using the mean of the y values to predict sales.
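The same SST total can be checked in Python:

```python
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
ybar = sum(y) / len(y)                    # mean of y: 130.0
sst = sum((yi - ybar) ** 2 for yi in y)   # total sum of squares
print(ybar, sst)  # → 130.0 15730.0
```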
SSE , SST and SSR
SST : a measure of how well the observations cluster around ȳ.
SSE : a measure of how well the observations cluster around ŷ.
If x did not play any role in the value of y, then we should have SST = SSE.
If x plays the full role in the value of y, then SSE = 0.
  SST = SSE + SSR
SSR : sum of squares due to regression.
SSR is the explained portion of SST; SSE is the unexplained portion of SST.
Coefficient of Determination for Goodness of Fit
  SSE = SST − SSR
The largest value for SSE is SSE = SST.
SSE = SST =====> SSR = 0
SSR/SST = 0 =====> the worst fit
SSR/SST = 1 =====> the best fit
Coefficient of Determination for Pizza example
In the Pizza example:
  SST = 15730
  SSE = 1530
  SSR = 15730 − 1530 = 14200
r² = SSR/SST : Coefficient of Determination
  0 ≤ r² ≤ 1
r² = 14200/15730 = .9027
In other words, 90% of the variation in y can be explained by the regression line.
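A one-line check of this decomposition in Python, using the SST and SSE totals computed earlier:

```python
sst, sse = 15730, 1530     # totals from the Pizza example
ssr = sst - sse            # explained sum of squares
r2 = ssr / sst             # coefficient of determination
print(ssr, round(r2, 4))   # → 14200 0.9027
```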
SST Calculations
  SST = Σ(Y − Ȳ)²
  SST = ΣY² − (ΣY)²/n

Observation i | Xi | Yi | Yi²
 1 |  2 |  58 |  3364
 2 |  6 | 105 | 11025
 3 |  8 |  88 |  7744
 4 |  8 | 118 | 13924
 5 | 12 | 117 | 13689
 6 | 16 | 137 | 18769
 7 | 20 | 157 | 24649
 8 | 20 | 169 | 28561
 9 | 22 | 149 | 22201
10 | 26 | 202 | 40804
Total |  | 1300 | 184730

  SST = 184730 − (1300)²/10 = 15730
SSR Calculations

Observation i | X | Y | XY | Y² | X²
 1 |  2 |  58 |  116 |  3364 |   4
 2 |  6 | 105 |  630 | 11025 |  36
 3 |  8 |  88 |  704 |  7744 |  64
 4 |  8 | 118 |  944 | 13924 |  64
 5 | 12 | 117 | 1404 | 13689 | 144
 6 | 16 | 137 | 2192 | 18769 | 256
 7 | 20 | 157 | 3140 | 24649 | 400
 8 | 20 | 169 | 3380 | 28561 | 400
 9 | 22 | 149 | 3278 | 22201 | 484
10 | 26 | 202 | 5252 | 40804 | 676
Total | 140 | 1300 | 21040 | 184730 | 2528

  SSR = [ΣXY − (ΣX)(ΣY)/n]² / [ΣX² − (ΣX)²/n]
  SSR = [21040 − (140)(1300)/10]² / [2528 − (140)²/10]
  SSR = (2840)² / 568 = 14200
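The computational formula for SSR can be verified in Python from the raw data:

```python
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))   # 21040
sxx = sum(a * a for a in x)              # 2528
# SSR = [Sxy - Sx*Sy/n]^2 / [Sxx - Sx^2/n]
ssr = (sxy - sx * sy / n) ** 2 / (sxx - sx * sx / n)
print(ssr)  # → 14200.0
```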
SSR Calculations
  SSE = SST − SSR
  SSE = 15730 − 14200 = 1530
  r² = SSR/SST = 14200/15730 = .9027
Reed Auto periodically has a special week-long sale. As part of the advertising campaign Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales showing the number of TV ads run and the number of cars sold in each sale are shown below.
Number of TV Ads Number of Cars Sold
1 14
3 24
2 18
1 17
3 27
Example : Reed Auto Sales
We need to calculate ΣX, ΣY, ΣXY, ΣX², ΣY²:
  SST = ΣY² − (ΣY)²/n
  SSR = [ΣXY − (ΣX)(ΣY)/n]² / [ΣX² − (ΣX)²/n]

 X | Y | XY | X² | Y²
 1 |  14 | 14 |  1 | 196
 3 |  24 | 72 |  9 | 576
 2 |  18 | 36 |  4 | 324
 1 |  17 | 17 |  1 | 289
 3 |  27 | 81 |  9 | 729
10 | 100 | 220 | 24 | 2114

  Σx = 10, Σy = 100, Σxy = 220, Σx² = 24, Σy² = 2114
Example : Reed Auto Sales
  SST = ΣY² − (ΣY)²/n
  SST = 2114 − (100)²/5
  SST = 114
  SSR = [ΣXY − (ΣX)(ΣY)/n]² / [ΣX² − (ΣX)²/n]
  SSR = [220 − (10)(100)/5]² / [24 − (10)²/5]
  SSR = (220 − 200)² / (24 − 20) = 400/4 = 100
Example : Reed Auto Sales
Alternatively, we could compute SSE and SST and then find SSR = SST − SSE.

 y | x | y² | ŷ = 10 + 5x | y − ŷ | (y − ŷ)²
14 | 1 | 196 | 15 | −1 | 1
24 | 3 | 576 | 25 | −1 | 1
18 | 2 | 324 | 20 | −2 | 4
17 | 1 | 289 | 15 |  2 | 4
27 | 3 | 729 | 25 |  2 | 4
Σy = 100, Σx = 10, Σy² = 2114, SSE = 14

  SST = Σy² − (Σy)²/n = 2114 − (100)²/5 = 114
  SSR = SST − SSE = 114 − 14 = 100
  r² = 100/114 = 0.877193
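Both routes to r² for Reed Auto can be checked in one Python sketch:

```python
x = [1, 3, 2, 1, 3]       # number of TV ads
y = [14, 24, 18, 17, 27]  # number of cars sold
n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))   # 220
sxx = sum(a * a for a in x)              # 24
syy = sum(b * b for b in y)              # 2114
sst = syy - sy ** 2 / n                                  # total sum of squares
ssr = (sxy - sx * sy / n) ** 2 / (sxx - sx ** 2 / n)     # explained
sse = sst - ssr                                          # unexplained
print(sst, ssr, sse, round(ssr / sst, 6))  # → 114.0 100.0 14.0 0.877193
```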
• Coefficient of Determination
r² = SSR/SST = 100/114 = .88
The regression relationship is very strong since
88% of the variation in number of cars sold can be
explained by the linear relationship between the
number of TV ads and the number of cars sold.
Example : Reed Auto Sales
Correlation Coefficient = (sign of b1) times the square root of the Coefficient of Determination
The Correlation Coefficient
  rxy = (sign of b1)√r²
Correlation coefficient is a measure of the strength of a linear association between two variables. It has a value between -1 and +1
rxy = +1 : two variables are perfectly related through a line with positive slope.
rxy = -1 : two variables are perfectly related through a line with negative slope.
rxy = 0 : two variables are not linearly related.
In our Pizza example, r² = .9027 and the sign of b1 is positive.
The Correlation Coefficient : example
  rxy = (sign of b1)√r²
  rxy = +√.9027
  rxy = .9501
There is a strong positive relationship between x and y.
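A small Python sketch of the sign rule, with values taken from the Pizza example:

```python
import math

b1 = 5        # slope from the Pizza fit (positive)
r2 = 0.9027   # coefficient of determination
sign = 1 if b1 > 0 else -1
r_xy = sign * math.sqrt(r2)   # correlation coefficient
print(round(r_xy, 4))  # → 0.9501
```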
The Coefficient of Determination and the Correlation Coefficient are both measures of association between variables.
The Correlation Coefficient applies to a linear relationship between two variables.
The Coefficient of Determination applies to linear and nonlinear relationships, between two or more variables.
Correlation Coefficient and Coefficient of Determination
Exercise
• Given the following experimental data on rice yield (t/ha), plant height (cm) and tiller number, determine the relationships of these variables with each other using correlation and regression analysis. Obtain a model relating YIELD to the variables PLTHT and TILLER# and interpret results. Test for the significance of the parameter estimates and the regression equation. Evaluate the adequacy of the model obtained.
• SELAMAT BELAJAR (HAPPY STUDYING)