Zagat multiple regression (solution)

(a)

```MTB > Plot 'price' 'food';
SUBC>   Symbol 'x'.

Plot

-
-                               x                   x
60+
-                                       2   2
price   -                           x           x   x           x
-                       x   x   x   x       x
-                           3   3   5
40+                   x   2   4   3   x       x
-   x       x   x   2   5   2   x       x
-           x       4   2   x   4       2   x
-   2       x   4   x   3   2   4
-           3   2   3   3   2   2
20+   x           3   4               x
-           x   x       x       x
-       x                       x
-
-
------+---------+---------+---------+---------+---------+food
15.0      17.5      20.0      22.5      25.0      27.5```

```Plot

-
-                                                     x x
60+
-                                       x x x x
price   -                           x               2   x
-                               x   x   2           x
-                             x x x x 2 3   x x
40+                           2 x   3 x 2   x 2
-                     x x   x 2 x 2 2 x   x 2
-                 x       x 2 x 2 2 x 2   3
-                       x x 2 2 4 3 2       x x
-   x           x       2 2 2 2 3 x     x
20+               x 2   x   x 2 x x
-             x x   x   x
-           x                               x
-
-
--------+---------+---------+---------+---------+--------decor
5.0      10.0      15.0      20.0      25.0```

```Plot

-
-                                              x         x
60+
-                                       2  x   x
price   -                                       3                x
-                             x      2  2
-                      x   x  x  4   2  2
40+                          x  5  2   3         x
-                x     4   x  3  3   x  x
-                x  x  2      2  5   3  x
-            x   x  3  2   3  2  2   2  x
-         x      x  4  3   3  2      x
20+  x         x   3     3      x
-         x  x   x  x
-            x      x
-
-
--------+---------+---------+---------+---------+--------service
12.0      15.0      18.0      21.0      24.0```

All three plots suggest a plausible linear relationship.
We can use correlation to summarize what we see:

```           food    decor  service
decor     0.213
service   0.707    0.590
price     0.599    0.669    0.753```

Lots of correlation, as you would expect.
In general, I would guess that a better restaurant
has better food, decor, and service and a higher price
so they all tend to go up together.
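The Pearson correlations above are easy to reproduce from the defining formula. Here is a minimal pure-Python sketch; the ratings and prices below are made-up toy numbers, not the actual Zagat data:

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance over the product of SDs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

food  = [17, 21, 19, 23, 16, 20]   # hypothetical food ratings
price = [28, 45, 33, 51, 24, 40]   # hypothetical prices

print(round(pearson(food, price), 3))
```

With real data you would compute this for each pair of columns to get the table above.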

(b)

Minitab puts the fits in a column labeled FITS1 and
the resids in a column labeled RESI1.
I'll add them together and put the results in c7:

```MTB > let c7 = 'fits1' + 'resi1'
```

Here are the first 4 rows of
price, fits, resids, and fits + resids.
We can see that price = fits + resids.

```
price	FITS1	RESI1	F+E
41	36.2788	4.7212	41
54	49.7961	4.2039	54
32	27.6132	4.3868	32
20	17.7306	2.2694	20
...
```
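The identity price = fits + resids holds for any least-squares fit. A quick pure-Python check, shown with one predictor and made-up numbers for brevity (the same identity holds for the three-x model above):

```python
# Toy data: x is a single predictor, y the response.
x = [15, 18, 20, 22, 25, 27]
y = [22, 30, 35, 41, 50, 55]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
# Least-squares slope and intercept.
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b0 = my - b1 * mx

fits   = [b0 + b1 * a for a in x]
resids = [b - f for b, f in zip(y, fits)]

# fits + resids reproduces y exactly (up to floating point).
print(all(abs(f + e - b) < 1e-9 for f, e, b in zip(fits, resids, y)))
```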

Here is the plot of price (y) vs fits:

```
Plot

-
-                                            x        x
60+
-                                     xx  xx
price   -                               x  x   x            x
-                          x  x     xx x
-                       xx    2x2 x 3
40+                        32xx  2  x xx
-                  x2 3 4  2    x x
-                 xx3x x   xx  xx3  x
-             xx x x4x 2 xx 2 x    x
-     x  x      xx2222x x x
20+    x   xx  x   2xxx
-     x   xx     x
-          x          x
-
-
+---------+---------+---------+---------+---------+------FITS1
10        20        30        40        50        60```

Does it look like fits is more strongly related to
price than any of the three things that went into it?
I think so.
Let's look at the correlations as a way of quantifying this.

Correlations (Pearson)

```
price     food    decor  service    FITS1
food      0.599
decor     0.669    0.213
service   0.753    0.707    0.590
FITS1     0.829    0.723    0.807    0.908
RESI1     0.559   -0.000    0.000    0.000    0.000```

The fits are more highly correlated with price than any
one of our three x's by itself.
Regression actually chooses the combination of the x's to
make the fits as highly correlated with y as possible.

Here is the regression output.
R-squared is 68.7%.

```The regression equation is
price = - 30.7 + 1.38 food + 1.10 decor + 1.05 service

Predictor        Coef       StDev          T        P
Constant      -30.664       4.787      -6.41    0.000
food           1.3795      0.3533       3.90    0.000
decor          1.1043      0.1761       6.27    0.000
service        1.0480      0.3811       2.75    0.007

S = 6.298       R-Sq = 68.7%     R-Sq(adj) = 67.9%
```

The square of the correlation between price and fits is:

```MTB > let k1 = .829*.829
MTB > print k1

Data Display

K1    0.687241```

They are the same!!
This is true in general.
In multiple regression, R-squared is the square of the
correlation between the fits and y.
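We can verify this fact numerically. A pure-Python sketch with toy single-predictor data (the fact holds for multiple regression too; any least-squares fit will do):

```python
import math

x = [15, 18, 20, 22, 25, 27]   # toy predictor
y = [22, 30, 35, 41, 50, 55]   # toy response
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b0 = my - b1 * mx
fits = [b0 + b1 * a for a in x]

# R-squared as 1 - SSE/SST.
sse = sum((b - f) ** 2 for b, f in zip(y, fits))
sst = sum((b - my) ** 2 for b in y)
r_sq = 1 - sse / sst

# Squared correlation between y and the fits.
mf = sum(fits) / n
corr = (sum((b - my) * (f - mf) for b, f in zip(y, fits))
        / math.sqrt(sst * sum((f - mf) ** 2 for f in fits)))

print(abs(r_sq - corr ** 2) < 1e-9)   # the two agree
```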

(c)

Here is the plot of resids vs fits.
There is no pattern, which matches the correlation table above:
the resids are uncorrelated with each of the x's and with the fitted values.

```Plot

-
-
12+                       x  x    x
-                   x    x    x       xx     x
RESI1   -     x            xx 2  3x        x
-    x   2    xx xxxx   x xxx 2xx         xx
-     x   x         2 x 3  x    x   x  x              x
0+                x xx2 x   x   2  x 2x x
-         x  x  x 2 4x 2   x      x 2
-          x     x 2xxx  xx x   x    x
-          x     xxxx   x   x  xx x                 x
-                           x    3
-12+                x        x   x     x
-                                  x
-
-                     x
-
+---------+---------+---------+---------+---------+------FITS1
10        20        30        40        50        60```

(d)

First we need the t multiplier.
We have 114 observations and k = 3 predictors, so n - k - 1 = 114 - 3 - 1 = 110.
With 110 degrees of freedom the t distribution is very close
to the standard normal, so the t multiplier for a 95% confidence
interval should be very close to 2.
Let's check:

```MTB > invcdf .025;
SUBC> t 110.

Inverse Cumulative Distribution Function

Student's t distribution with 110 DF

P( X <= x)          x
0.0250       -1.9818```

Yup.
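For comparison, the standard normal 2.5% point is available in Python's stdlib (used here just as a calculator); the t(110) cutoff of -1.9818 is only slightly larger in magnitude:

```python
from statistics import NormalDist

# Standard normal 2.5% point, for comparison with the t(110) value -1.9818.
z = NormalDist().inv_cdf(0.025)
print(round(z, 2))  # -1.96
```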
So each of the ci's is our basic estimate +/- 2 standard errors:

```MTB > let k1 = -30.664 - 2*4.787
MTB > let k2 = -30.664 + 2*4.787
MTB > let k3 = 1.38 - 2*.3533
MTB > let k4 = 1.38 + 2*.3533
MTB > let k5 = 1.1043 - 2*.1761
MTB > let k6 = 1.1043 + 2*.1761
MTB > let k7 = 1.048 - 2*.3811
MTB > let k8 = 1.048 + 2*.3811
MTB > print k1 k2

Data Display

K1    -40.2380
K2    -21.0900
MTB > print k3 k4

Data Display

K3    0.673400
K4    2.08660
MTB > print k5 k6

Data Display

K5    0.752100
K6    1.45650
MTB > print k7 k8

Data Display

K7    0.285800
K8    1.81020```

Notice that the intercept is a little funny in this model.
We can't have a negative price, and you can't have ratings of 0 on
all three.
The ci's for the first (food) and third (service) slopes are pretty big.
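The same interval arithmetic is easy to reproduce. A sketch using the estimates and standard errors from the regression output (the Minitab session above used the rounded 1.38 for food, so that interval differs from this one in the third decimal):

```python
# Coefficient estimates and standard errors from the regression output.
coefs = [
    ("Constant", -30.664, 4.787),
    ("food",       1.3795, 0.3533),
    ("decor",      1.1043, 0.1761),
    ("service",    1.0480, 0.3811),
]

# 95% CI: estimate +/- 2 standard errors.
for name, b, se in coefs:
    lo, hi = b - 2 * se, b + 2 * se
    print(f"{name:8s} ({lo:8.4f}, {hi:8.4f})")
```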

(e)

This is the test we get directly from the Minitab output.
From the output, the largest p-value is .007, so for every
coefficient we reject the null hypothesis that it is 0.

(f)

For the food coefficient we have the t value:

```MTB > let k1 = (1.3795-1)/.3533
MTB > print k1

Data Display

K1    1.07416```

The p-value should be about 32% since our
t-dist is like the standard normal and the value is close
to 1.
Let's get it exactly:

```MTB > cdf -1.07;
SUBC> t 110.

Cumulative Distribution Function

Student's t distribution with 110 DF

x     P( X <= x)
-1.0700        0.1435

MTB > let k1 = 2*.1435
MTB > print k1

Data Display

K1    0.287000```

The p-value is .287, a bit smaller than 32% but in the same ballpark.

For the decor coefficient:

```MTB > let k1 = (1.1043-1)/.1761
MTB > print k1

Data Display

K1    0.592277

MTB > cdf -.6

Cumulative Distribution Function

Normal with mean = 0 and standard deviation = 1.00000

x     P( X <= x)
-0.6000        0.2743

MTB > let k1 = 2*.2743
MTB > print k1

Data Display

K1    0.548600```

For the service coefficient:

```MTB > let k1 = (1.048-1)/.3811
MTB > print k1

Data Display

K1    0.125951
MTB > cdf -.1259

Cumulative Distribution Function

Normal with mean = 0 and standard deviation = 1.00000

x     P( X <= x)
-0.1259        0.4499

MTB > let k1 = 2*.4499
MTB > print k1

Data Display

K1    0.899800```

So none of the three p-values are small.
Does this mean we know that the true coefficients are 1?
No. Remember what the confidence intervals were like.
It just means there is not sufficient evidence to reject the null
hypothesis that the slope is 1.
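All three tests follow the same pattern, so we can do them in one loop. A sketch using the estimates and standard errors from the output; like the decor and service calculations above, it uses the normal approximation for the p-values, which is fine at 110 df:

```python
from statistics import NormalDist

norm = NormalDist()  # standard normal, mean 0 and sd 1

# Test H0: slope = 1 for each coefficient, using (estimate, SE)
# from the regression output.
for name, b, se in [("food",    1.3795, 0.3533),
                    ("decor",   1.1043, 0.1761),
                    ("service", 1.0480, 0.3811)]:
    t = (b - 1) / se
    p = 2 * norm.cdf(-abs(t))   # two-sided p-value, normal approximation
    print(f"{name:8s} t = {t:6.3f}   p = {p:.3f}")
```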

(g)

I used the menu (go to Options in the Stat > Regression menu
and type 20 20 20 for the three x values in the "prediction
intervals for new observations" box),
but here are the typed commands in Minitab:

```MTB > Regress 'price' 3 'food' 'decor' 'service';
SUBC>   Constant;
SUBC>   Predict 20 20 20;
SUBC>   Brief 2.```

The prediction output is:

```Predicted Values

Fit  StDev Fit         95.0% CI             95.0% PI
39.973      0.828   (  38.332,  41.615)  (  27.384,  52.563)   ```

The PI is our prediction interval for the price at a particular
restaurant. It's pretty big (I should give more examples
where this actually works well!).

The CI is the interval for the average price (over all conceivable
restaurants with food, decor, service = 20, 20, 20).
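The reported intervals can be reconstructed from the numbers in the output. A sketch using the fitted coefficients, s = 6.298, SE(fit) = 0.828, and the t(110) cutoff 1.9818; the PI uses the standard formula fit +/- t * sqrt(s^2 + SE(fit)^2):

```python
import math

# Coefficients from the regression output, evaluated at (20, 20, 20).
b0, b = -30.664, [1.3795, 1.1043, 1.0480]
xnew = [20, 20, 20]
fit = b0 + sum(bi * xi for bi, xi in zip(b, xnew))

t, s, se_fit = 1.9818, 6.298, 0.828
# CI for the mean price: fit +/- t * SE(fit).
ci = (fit - t * se_fit, fit + t * se_fit)
# PI for one restaurant: fit +/- t * sqrt(s^2 + SE(fit)^2).
half = t * math.sqrt(s ** 2 + se_fit ** 2)
pi = (fit - half, fit + half)

print(round(fit, 3))                 # ~39.972
print([round(v, 2) for v in ci])     # ~[38.33, 41.61]
print([round(v, 2) for v in pi])     # ~[27.38, 52.56]
```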
