Standardized residuals and outliers

The data for the problem are in the files housePrM.mtp and housePrM.txt.

Column 2 is the price of the house in thousands of dollars

and column 3 is the size in hundreds of square feet.

(a)

Plot price vs size.

Notice that there is a house whose price is unusually low

given its size.

Click on the little brush in the Minitab menu and then

click on the unusual point.

Which observation is it ?

(b)

Run the regression of price on size and obtain the

residuals.

Plot the residuals vs size.

Overall, do you see any pattern in the plot of residuals vs size ?

Do you see an unusual point ?

Which observation does it correspond to ?
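
For readers working outside Minitab, the fit-then-inspect-residuals step in part (b) can be sketched in a few lines of Python. This is only an illustration: the numbers below are made up (they are NOT the housePrM data), and the helper name `fit_line` is hypothetical.

```python
# Made-up illustration data (NOT the housePrM data):
# size in hundreds of square feet, price in thousands of dollars.
size = [8.0, 10.0, 12.0, 15.0, 18.0, 20.0, 22.0, 25.0]
price = [62.0, 75.0, 88.0, 110.0, 70.0, 142.0, 155.0, 176.0]


def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(size, price)

# Residual = observed price minus fitted price.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(size, price)]

# Row (1-based, like a Minitab worksheet) with the largest residual in absolute value.
worst = max(range(len(residuals)), key=lambda i: abs(residuals[i])) + 1
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, most unusual observation: row {worst}")
```

Plotting `residuals` against `size` would show the same thing the Minitab plot does: most points scatter around zero, and one point sits far below the rest.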

(c)

In part (b) we found an unusual point.

This happens quite frequently in practice.

Maybe the y value for this observation is in error ?

Maybe there is something special about this house

which makes it different from the rest ?

In practice we would have to check into it.

Points that have unusually large (or small) residuals are

called *outliers*.

This means the y value is larger (or smaller) than you expect

*given* the x values.

How can we quantify how unusual an outlier is ?

We standardize it.

In Minitab we can obtain the standardized residuals by using

the storage option in the regression dialogue.

Check "standardized residuals" and Minitab will create a new column

containing the standardized residuals.

*Under the assumptions of the model the standardized residuals should look like iid draws from the standard normal distribution.*

If an observation has an unusually large standardized residual, some

"special cause" may have affected that one in particular.

In practice it is often well worth the time to investigate and find out

why an observation is different from the rest.

Plot the standardized residuals vs size.

How unusual is our outlier ?

(d)

What are the standardized residuals ?

Well, it is a long story, but I can give you a simple version.

If we knew the true parameters we could calculate the

true errors:

$$\epsilon_i = y_i - \beta_0 - \beta_1 x_i$$

Since we don't know the $\beta$'s, we plug in estimates

giving the residuals:

$$e_i = y_i - b_0 - b_1 x_i$$

Thus, we can think of the residuals (the $e_i$) as

estimates of the true errors (the $\epsilon_i$).

Under the assumptions of our model we have

$$\epsilon_i \sim N(0, \sigma^2)$$

so that

$$(\epsilon_i / \sigma) \sim N(0, 1)$$

It turns out it can be more complicated than this

but approximately the standardized residuals are

$$e_i / s$$

For the data in this question obtain the values $e_i / s$

(that is, get a new column of numbers by dividing each

residual by the s value on the Minitab regression output).

Plot these values vs the standardized residuals given

by Minitab. How do they compare ?
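
The "more complicated" version can be sketched as follows. As far as I know, the standardized residual Minitab stores divides each residual by its own estimated standard deviation, $s \sqrt{1 - h_i}$, where $h_i$ is the leverage of observation $i$; treat that as an assumption to check against the Minitab documentation. The data below are made up (NOT the housePrM data), and the leverage formula used is the one for a regression with a single x.

```python
import math

# Made-up illustration data (NOT the housePrM data).
size = [8.0, 10.0, 12.0, 15.0, 18.0, 20.0, 22.0, 25.0]
price = [62.0, 75.0, 88.0, 110.0, 70.0, 142.0, 155.0, 176.0]

n = len(size)
x_bar = sum(size) / n
y_bar = sum(price) / n
sxx = sum((x - x_bar) ** 2 for x in size)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(size, price))

# Least-squares estimates and residuals.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
resid = [y - (b0 + b1 * x) for x, y in zip(size, price)]

# s estimates sigma; n - 2 because two coefficients were estimated.
s = math.sqrt(sum(e * e for e in resid) / (n - 2))

# Leverage of each observation in simple linear regression.
leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in size]

# Simple approximation e_i / s versus the leverage-adjusted version.
simple = [e / s for e in resid]
standardized = [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, leverage)]

for row, (a, b) in enumerate(zip(simple, standardized), start=1):
    print(f"row {row}: e/s = {a:6.2f}   e/(s*sqrt(1-h)) = {b:6.2f}")
```

Because $1 - h_i$ is never more than 1, the adjusted value is never smaller in absolute value than $e_i / s$, and the two are close whenever the leverage is small.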

(e)

Note that Minitab routinely prints out a list of "unusual observations".

Any observation which has a standardized residual bigger than 2

(in absolute value) is listed.

If the regression model is correct and there is nothing really unusual

about any of the observations, and we have 1000 observations,

how many observations would you expect Minitab to print out

because the standardized residual is bigger than 2 (in absolute value) ?
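
One way to check your answer to part (e) is to compute the two-sided tail probability of a standard normal directly. The sketch below assumes the standardized residuals behave exactly like iid standard normal draws; the function name `two_sided_tail` is hypothetical.

```python
import math


def two_sided_tail(z):
    """P(|Z| > z) for a standard normal Z, via the complementary error function."""
    return math.erfc(z / math.sqrt(2))


n_obs = 1000
p = two_sided_tail(2.0)

# Expected number of observations flagged purely by chance.
expected = n_obs * p
print(f"P(|Z| > 2) = {p:.4f}, expected flagged out of {n_obs}: {expected:.1f}")
```

The same function with a cutoff of 3 shows how much rarer a standardized residual beyond 3 would be under the model.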

(f)

The data for this problem are in the files zagat.mtp and zagat.txt.

For each of 114 restaurants in New York we have ratings on

food, decor, and service.

We also have a value for price of a meal.

Our goal is to see how the three characteristics of the restaurant

are related to the price.

Regress price on food, decor, and service.

Are there any observations with large standardized residuals ?

Can you find this observation in any of the plots of price

vs the explanatory variables ?
