Business 41912: Applied Multivariate Analysis

Spring Quarter 2014

Instructor: Ruey S. Tsay

Office: HPC 455

Tel: 773-702-6750

Fax: 773-702-0458

e-mail: ruey.tsay@chicagobooth.edu

Office hours: (a) Friday, 1:30 pm to 2:30 pm; (b) by appointment

You may e-mail me questions; e-mail is the easiest way to contact me.

Teaching Assistant: Mr. Yongning Wang, e-mail: ywang1@chicagobooth.edu

Text: Applied Multivariate Statistical Analysis, by R.A. Johnson and D.W. Wichern, 6th ed., Prentice Hall, 2007.

ISBN 0-13-187715-1

Grading: Midterm 30% + Final Exam 45% + Homework 25%,

where the total credit of each component is normalized to 100.
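The weights above can be written out as a formula. Here $M$, $F$ and $H$ are notation introduced only for illustration. They denote the midterm, final exam and homework scores, each on the normalized 0 to 100 scale. The sketch assumes the percentages are applied directly to those normalized scores:

$$
\text{Course score} = 0.30\,M + 0.45\,F + 0.25\,H .
$$

The weights sum to 1, so the course score also lies on a 0 to 100 scale. For example, hypothetical scores of $M = 80$, $F = 70$ and $H = 90$ give $24 + 31.5 + 22.5 = 78$.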

New focus of the course: recent developments in high-dimensional data analysis, including dimension reduction, Lasso and related sparse regressions, and independent component analysis.

Computing:

R is the main package, but students may use any other program.

Instructions for R will be given. The following R packages are useful for the course: mvtnorm, fastICA, CCA, lars, gamlr, leaps, rgl (3D plots), mvoutlier.

Lectures (notes will be posted weekly before the lectures):

Week 1: Review of Chapters 1 to 3 and the first half of Chapter 4

Lec1; data sets used: T1-2.DAT, Baker.dat. Lec2; data sets used: m-ba4c9807.txt, T5-1.DAT, T4-1.DAT

Week 2: Random sample from a multivariate normal distribution & inference about the mean

Lec2 (continued) Data set:

Matlab program to obtain Chi-square QQ-plot: qqchi2.m

Matlab program to compute Hotelling T^2: hotelling.m

Matlab program for transformation: boxcox.m

Matlab program to compute various confidence intervals for the means of components: cfinterval.m

Matlab programs to handle missing values in a Gaussian random sample: (a) EM algorithm: emmiss.m; (b) MCMC method: mcmcmiss.m

R package mvoutlier provides additional tools for detecting multivariate outliers

R program to compute Chi-square QQ-plot: qqchi2.R

R program to compute a beta-distribution QQ-plot: qqbeta.R

R program to compute statistics for outlier detection: outlier.R

R program to compute Hotelling T^2: Hotelling.R

R programs to compute various confidence intervals for means: from the data, confreg.R; from summary statistics, confrega.R

R programs for two multivariate control charts: t2chart.R & t2future.R

Week 3: Multivariate Analysis of Variance (MANOVA)

Lec3; data sets used: T6-1.DAT, T6-2.DAT, t6-9.dat, t6-14.dat, t6-5.dat, t6-6.dat

R programs: contrast.R, Behrens.R, Box_M.R, profile.R, growth.R

R demo: r-manova

Week 4: Multivariate Analysis of Variance (MANOVA) & Linear Regression

Lec4 Data sets used: T7-5.DAT

Week 5: Multivariate Linear Regression & Principal Component Analysis

Lec5: Data set used: T7-4.dat

R example for regression models with time series errors: mlreg-ts

R program for multivariate multiple linear regression analysis: mmlr.R

Lec6:

R commands: princomp

R demonstration: r-pca data: m-pca5c-9003.txt

Week 7: Dimension reduction: sliced inverse regression, independent components, and factor models.

Lec7: R script: sir.R; data set used: T8-4.DAT; R demonstration: r-factor

Week 8: Canonical correlation analysis and applications.

Lec8; data sets used: T9-12.DAT, m-pca5c-9003.txt

Week 9: Discriminant analysis and classification; cluster analysis

Lec9 & Lec10

R program: discrim.R (allows for more than 2 populations)

Data sets used: T11-1.DAT, T11-2.DAT, t12-3a.dat

Week 10: Hierarchical clustering, multidimensional scaling and visualization of high-dimensional data

Lec11; data sets used: T12-4.DAT, T12-5.DAT, T12-7m.DAT, T12-9.DAT, m-barra-9003.txt

Reading materials for high-dimensional data analysis:

(a) Sliced inverse regression approach:

Homework assignment:

HW#1: Data sets used:

Solutions: hw1s

HW#2: Data sets used:

Solutions: hw2s

HW#3: Data sets used:

Solutions: hw3s

HW#4: Data sets used:

Solutions: hw4s

HW#5: Data sets used:

Solutions: hw5s

Old exams

(a) Year 2010: Midterm & solutions; final & solutions

(b) Year 2012: Midterm & solutions

Midterm: Week 6, Friday, May 9

Exam and solutions

Final Exam: Exam week, Friday, June 13, 8:00 am to 11:00 am.

Exam and solutions