Business 41912: Applied Multivariate Analysis

Spring Quarter of 2016

Instructor: Ruey S. Tsay

Office: HPC 455
Tel:      773-702-6750
Fax:     773-702-0458
e-mail: ruey.tsay@chicagobooth.edu

 

Lecture hours: Thursday 8:30 AM to 11:30 AM, Room HCC10

 

Office hours: (a) Thursday: 4:00 pm to 5:00 pm.
                    (b) By appointment
You may e-mail me questions. E-mail is the easiest way
to reach me.

Teaching Assistant: TBA

Text: (1) Applied Multivariate Statistical Analysis
         by R.A. Johnson and D.W. Wichern
         6th ed. Prentice Hall, 2007.
         ISBN 0-13-187715-1

Text2: (2) An Introduction to Statistical Learning with Applications in R
         by G. James, D. Witten, T. Hastie, and R. Tibshirani
         Springer, 2013.
         ISBN 1-4614-7137-0

Grading: Midterm 30% + Final Exam 45% + Homework 25%
where the total credit of each component is normalized to 100.

New focus of the course: Recent developments in high-dimensional data analysis,
including dimension reduction, Lasso and related sparse regressions, and
independent component analysis.

Syllabus of the course.

Computing:

R is the main software package, but students may use any other program.
Instructions for R will be given. The following R packages are useful for the
course: mvtnorm, fastICA, CCA, lars, gamlr, leaps, rgl (3D plot), mvoutlier, and many others.
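
As a minimal sketch, the packages named above could be checked and installed as follows. The vector name course_pkgs and the interactive() guard are illustrative assumptions, not part of the course materials; installation requires network access to a CRAN mirror.

```r
# Packages named on this page (the list is not exhaustive).
course_pkgs <- c("mvtnorm", "fastICA", "CCA", "lars", "gamlr",
                 "leaps", "rgl", "mvoutlier")

# Which of them are not yet installed in the current R library?
to_install <- setdiff(course_pkgs, rownames(installed.packages()))

# Install only in an interactive session, so that sourcing this
# script in a batch job does not trigger a download.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Once installed, a package is loaded per session with library(), for example library(mvtnorm).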

There are some documents available for multivariate data analysis in R. Two examples are
(1) Introduction to R for Multivariate Data Analysis by Fernando Miguez (2007) and
(2) A Little Book of R for Multivariate Analysis by Avril Coghlan (2014). You may download both
books from the web.

I have written some R routines for this class. They are in ama.R.
You can load these routines into R using the command source("ama.R") after copying the file into your working directory.
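
A hedged sketch of loading the routines is given below. It assumes ama.R sits in the current working directory; the variable names are illustrative only, and the names of the routines it lists depend on the contents of the file.

```r
# Assumption: ama.R has been copied into the current working directory.
ama_file <- "ama.R"

if (file.exists(ama_file)) {
  # Compare the workspace before and after, to list what the file defines.
  before <- ls()
  source(ama_file)
  new_objects <- setdiff(ls(), c(before, "before"))
  print(new_objects)
} else {
  message("ama.R not found in ", getwd(),
          "; copy it there or call setwd() first.")
}
```

The file.exists() check avoids the error that source() raises when the file is missing.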

Lecture: (Will be posted weekly before the lectures)
Week 1: Review Chapters 1 to 3, and the first half of Chapter 4:
Lec1 Data sets: T1-2.DAT, Baker.dat; Lec2 Data sets: m-ba4c9807.txt, T5-1.DAT, T4-1.DAT

R commands used in lecture: Rcommands-lec1.txt


Week 2: Random sample from a multivariate normal distribution & Inference about mean
Lec2 (continued) Data set: See Week 1

For some Matlab programs, see my ama web page of 2014.

The R package mvoutlier has some new tools for detecting multivariate outliers.

R commands used in lecture: Rcommands-lec2.txt


Week 3: Multivariate Analysis of Variance (MANOVA)
Lec3 Data sets used: T6-1.DAT, T6-2.DAT, t6-9.dat, t6-14.dat, t6-5.dat, t6-6.dat
R programs: see ama.R
R demo: r-manova
           
Week 4: Multivariate Analysis of Variance (MANOVA) & Linear Regression
Lec4 Data sets used: T7-5.DAT

LASSO notes: lasso-16; packages used: lars, glmnet, and grpreg. Data sets: oj.csv, prostate.csv

References for high-dimensional linear regression analysis: references

Week 5: Multivariate Linear Regression & Principal Component Analysis
Lec5: Data sets used: T6-17.DAT (peanut data), T7-4.dat & T7-7.DAT
R example for regression models with time series errors: mlreg-ts 
R scripts: mmlr and mmlrTest are available in ama.R
R commands used: Rcommands-lec5.txt


Week 7: Principal Component Analysis and Dimension Reduction: sliced inverse regression,
independent components, and factor models.

Lec6: Principal component analysis
       R commands: princomp
       R demonstration: r-pca   data: m-pca5c-9003.txt 

Lec7: R package: dr; Data set used: T8-4.DAT; R demonstration: r-factor; Data: m-tenstocks.txt

Boston housing data: housing.txt and description: housing.names

Week 8: Canonical correlation analysis and applications.
Lec8 Data sets used: T9-12.DAT & m-pca5c-9003.txt


Week 9: Discriminant analysis and classification, clustering analysis
Lec9 & Lec10
       R program: discrim.R is included in ama.R (allows for more than 2 populations)
       Data sets used: T11-1.DAT, T11-2.DAT, t12-3a.dat


Week 10: Hierarchical clustering, multidimensional scaling and visualization of high-dimensional data
Lec11 Data sets used: T12-4.DAT, T12-5.DAT, T12-7m.DAT, T12-9.DAT, m-barra-9003.txt


Reading materials for high dimensional data analysis:
(a) Sliced inverse regression approach: 

Homework assignment:

HW#1: Data sets used: Diabetes.txt
Solutions: hw1s ; R output: hw1
HW#2: Data sets used: m-3stocks3dx-6115.txt
Solutions: hw2s ; R output: hw2
HW#3: Data sets used: fish.txt, boys-dental.txt
Solutions: hw3s ; R output
HW#4: Data sets used: T1-7.DAT, T1-9.DAT, T7-7.DAT, and T4-6.DAT. Boston housing data are given in the lecture notes.
Solutions: hw4s ; R output
HW#5: Data sets used: T7-7.DAT, T11-6.DAT, T11-7.DAT, T11-9.DAT
Solutions: hw5s ; R output

Old exams
(a) Year 2014: Midterm & solutions; final & solutions
(b) Year 2012: Midterm & solutions;

Midterm: Week 6, Thursday, May 5. Data sets used: seishu.txt, ProblemG.txt, cellphone.txt, m-ibmmsftsp0015.txt
Exam and solutions & R output

Final Exam: Exam week
Exam and Solutions