Business 41912: Applied Multivariate Analysis

Spring Quarter of 2008

Instructor: Ruey S. Tsay

Office: HPC 455
Tel:      773-702-6750
Fax:     773-702-0458
e-mail: ruey.tsay@ChicagoGSB.edu

Office hour: (a) Wednesday 1:30 pm to 2:30 pm
                    (b) By appointment
You may e-mail me questions; e-mail is the easiest way
to contact me.

Teaching Assistant: Mr. David Matteson

Text: Applied Multivariate Statistical Analysis
         by R.A. Johnson and D.W. Wichern
         6th ed. Prentice Hall, 2007.
         ISBN 0-13-187715-1

Grading: Midterm 30% + Final Exam 45% + Homework 25%,
where the score of each component is normalized to be out of 100.

New focus of the course: recent developments in dimension reduction, such as
sliced inverse regression, high-dimensional panel data, independent
component analysis, etc.

Syllabus of the course.

Computing:

R is the main software package, but students may use any other program.
Instructions for R will be given. The following R packages are useful for the
course: mnormt, fastICA, CCA, mvoutlier.

Lecture:
Week 1: Review of Chapters 1 to 3 and the first half of Chapter 4:
lec1 
Week 2: Random samples from a multivariate normal distribution & inference about a mean vector
lec2
             Data set: Monthly simple returns of IBM, 3M, JNJ, GM, and INTC stocks
                             from January 1996 to December 2005: m-5c9605.txt
             Data set: Monthly log returns of Boeing, Abbott Labs, Motorola, and General Motors stocks
                             from January 1998 to December 2007: m-ba4c9807.txt

             Matlab program to obtain a chi-square QQ-plot: qqchi2.m
             Matlab program to compute Hotelling's T^2: hotelling.m
             Matlab program for transformation: boxcox.m
             Matlab program to compute various confidence intervals for the means of
             components: cfinterval.m
             Matlab programs to handle missing values in a Gaussian random sample:
             (a) EM algorithm: emmiss.m    (b) MCMC method: mcmcmiss.m

             S-Plus commands to obtain a chi-square QQ-plot: splusqqchi2.txt
             R package: mvoutlier provides some new tools for detecting multivariate outliers
             R program to compute a chi-square QQ-plot: rqqchi2.txt
             R program to compute statistics for outlier detection: r-outlier.txt
             R program to compute Hotelling's T^2: hotelling.txt
             R programs to compute various confidence intervals for means:
             using the raw data: r-cregion.txt;  using summary statistics: r-cregiona.txt
             R programs for two multivariate control charts: r-t2chart.txt & r-t2future.txt

             Minitab commands to obtain a chi-square QQ-plot: minitab-d1.txt


Week 3: Multivariate Analysis of Variance (MANOVA)
Lec3
R programs: r-contrast.txt, Behrens.txt, Box-M.txt
           
Week 4: Multivariate Analysis of Variance (MANOVA) & Linear Regression
Lec4;  R demo: r-manova;  Data sets used: t6-14.dat, t6-5.dat, t6-6.dat
R programs: r-profile.txt, r-growth.txt
[Part of the lecture is in Lec3]

Week 5: Multivariate Linear Regression
 Lec4a  
       R example for regression models with time series errors: mlreg-ts 

Week 7: Multivariate multiple linear regression & Principal Component Analysis
Lec5 & Lec6;  R program for MMLR: r-mmlr.txt

       R commands for principal component analysis, independent component analysis,
       and factor analysis: princomp, fastICA, and factanal, respectively.
       R demonstration: r-pca   data: m-pca5c-9003.txt 

Week 8: Independent component analysis and factor models
Lec7;  Data set used: T8-4.DAT
      R demonstration: r-factor

Week 9: Canonical correlation analysis, discriminant analysis, and classification
Lec8 & Lec9
       R program: r-discri.txt;  S-Plus demo: discr-splus.txt
       Data sets used: T11-1.DAT, T11-2.DAT

Week 10: Hierarchical clustering and multidimensional scaling
Lec10;  Data sets used: T12-4.DAT, T12-5.DAT, T12-7m.DAT, T12-9.DAT
R program: r-discrim.txt (discrimination for more than two categories).

Reading materials for high-dimensional data analysis:
(a) Sliced inverse regression approach: sirphd

Homework assignment:

HW#1: Data sets used: Q1-m-5c8807.txt, Q4-t4-6.DAT
Solutions: hw1s
HW#2: Data sets used: T5-2.DAT, T1-8.DAT, T5-8.DAT, T6-11.DAT, T5-12.DAT
Solutions: hw2s
HW#3: Data sets used: T6-9.DAT, T6-10.DAT, T6-12.DAT, T11-7.DAT, T6-17.DAT
Solutions: hw3s
HW#4: Data sets used: T7-6.DAT, T8-4.DAT, T8-5.DAT
Solutions: hw4s
HW#5: Data sets used: T1-5.DAT, T1-6.DAT, T7-7.DAT, T8-5.DAT
Solutions: hw5s


Minitab examples:
(a) Hotelling's T^2 test: minitab-hotel.txt
(b) Square-root of a positive definite matrix: minitab-sq.txt
(c) One-way analysis of variance: crash.txt with data crash.dat
(d) Growth curve fitting: growth.txt

Old exams
(a) Year 2004: pdf;  solution: answer
(b) Year 2006: Midterm & solutions; final & solutions

Midterm: Week 6, Friday, May 9
Exam and solutions

Final Exam: Week 11, Friday, June 13, 3:00 pm to 6:00 pm.
Exam and solutions