Welcome to my personal website!

I am an Assistant Professor in Econometrics and Statistics and a James S. Kemper Foundation Faculty Scholar at the University of Chicago Booth School of Business.

My work brings together statistical methodology, theory and computation to develop high-performance tools for analyzing large datasets.

My research interests reside at the intersection of Bayesian and frequentist statistics, and include: data mining, variable selection, optimization, non-parametric methods, factor models, high-dimensional decision theory and inference.

Contact Information

Veronika.Rockova@ChicagoBooth.edu
369 Charles M. Harper Center
5807 South Woodlawn Avenue
Chicago, IL 60637

Publications and Manuscripts

NEW!

  • Dynamic Variable Selection with Spike-and-Slab Process Priors
    Rockova V. and McAlinn K. (2017)
    Submitted
    pdf
  • Posterior Concentration for Bayesian Regression Trees and Their Ensembles
    Rockova V. and van der Pas S. (2017)
    Submitted
    pdf
  • Simultaneous Variable and Covariance Selection with the Multivariate Spike-and-Slab Lasso
    Deshpande S., Rockova V. and George E. (2017)
    Submitted
    link
  • Bayesian Dyadic Trees and Histograms for Regression
    van der Pas S. and Rockova V. (2017)
    Neural Information Processing Systems 2017 (to appear)
    pdf

Statistical Journals

  • Particle EM for Variable Selection
    Rockova V. (2016)
    Journal of the American Statistical Association, Theory and Methods
    (Accepted)
    pdf | supplement
  • The Spike-and-Slab LASSO
    Rockova V. and George E. (2016)
    Journal of the American Statistical Association, Theory and Methods
    (Accepted)
    pdf | supplement
  • Hospital Mortality Rate Estimation for Public Reporting
    George E., Rockova V., Rosenbaum P., Satopaa V. and Silber J. (2016)
    Journal of the American Statistical Association, Applications (Accepted) link
  • Bayesian Estimation of Sparse Signals with a Continuous Spike-and-Slab Prior
    Rockova V. (2017)
    The Annals of Statistics (Accepted)
    pdf | supplement
  • Fast Bayesian Factor Analysis via Automatic Rotations to Sparsity
    Rockova V. and George E. (2016)
    Journal of the American Statistical Association, Theory and Methods (111), 1608-1622
    pdf | supplement
  • Bayesian Penalty Mixing: The Case of a Non-separable Penalty
    Rockova V. and George E. (2015)
    Statistical Analysis for High-Dimensional Data - The Abel Symposium 2014 Springer Series pdf
  • EMVS: The EM Approach to Bayesian Variable Selection
    Rockova V. and George E. (2014)
    Journal of the American Statistical Association, Theory and Methods (109), 828-846 link
  • Negotiating Multicollinearity with Spike-and-Slab Priors
    Rockova V. and George E. (2014)
    Metron (72), 217-229 link
  • Incorporating Grouping in Bayesian Variable Selection with Applications in Genomics
    Rockova V. and Lesaffre E. (2014)
    Bayesian Analysis (9), 221-258. link
  • Hierarchical Bayesian Formulations for Selecting Variables in Regression Models
    Rockova V., Lesaffre E., Luime J. and Lowenberg B. (2012)
    Statistics in Medicine (31), 1221-1237. link

Health and Policy

  • Improving Medicare's Hospital Compare Mortality Model
    Silber, J. H., Satopaa, V. A., Mukherjee, N., Rockova, V., Wang, W., Hill, A., Even-Shoshan, O., Rosenbaum, P. R., and George, E. (2016)
    Health Services Research Journal

Refereed Proceedings

  • Determinantal Regularization for Ensemble Variable Selection
    Rockova V., Moran G. and George E. (2016)
    19th International Conference on Artificial Intelligence & Statistics pdf
  • Determinantal Priors for Bayesian Variable Selection
    Rockova V. and George E. (2014)
    47th Scientific Meeting of Italian Statistical Society pdf
  • Fast Bayesian Factor Analysis with the Indian Buffet Process
    Rockova V. and George E. (2014)
    47th Scientific Meeting of Italian Statistical Society
  • Dual Coordinate Ascent EM for Bayesian Variable Selection
    George E., Rockova V., Lesaffre E. (2013)
    28th International Workshop in Statistical Modeling, ISBN: 978-88-96251-47-8, 165-171
  • Sparse Bayesian Factor Regression Approach to Genomic Data Integration
    Rockova V. and Lesaffre E. (2013)
    28th International Workshop in Statistical Modeling, ISBN: 978-88-96251-47-8, 337-343
  • Incorporating Prior Biological Knowledge in Bayesian Modeling of Sparse Networks
    Rockova V. and Lesaffre E. (2012)
    27th International Workshop in Statistical Modeling, ISBN: 978-80-263-0250-6, 291-296

Biomedical

  • Risk-stratification of Intermediate-risk Acute Myeloid Leukemia: Integrative Analysis of a Multitude of Gene Mutation and Expression Markers
    Rockova V., Abbas S., Wouters B.J., Erpelinck C., Beverloo B., Delwel R., van Putten W., Lowenberg B. and Valk P. (2011)
    Blood (118), 1069-1076
  • The Prognostic Relevance of miR-212 Expression with Survival in Cytogenetically and Molecularly Heterogeneous AML
    Sun S., Rockova V., Bullinger L., Dijkstra M., Dohner H., Lowenberg B., Jongen-Lavrencic M. (2013)
    Leukemia (27), 100-106
  • Mutant DNMT3A: a Marker of Poor Prognosis in Acute Myeloid Leukemia
    Ribeiro A., Pratcorona M., Erpelinck C., Rockova V., Sanders M., Abbas S., Figueroa M., Zeilemaker Z., Melnick A., Lowenberg B., Valk P. and Delwel R. (2012)
    Blood (119), 5824-5831
  • Retroviral Integration Mutagenesis in Mice and Comparative Analysis in Human AML Identify Reduced PTP4A3 Expression as a Prognostic Indicator
    Beekman E., Valkhof M., Erkeland S., Taskesen E., Rockova V., Peeters J., Valk P., Lowenberg B. and Touw I. (2011)
    PLoS ONE 6(10), e26537
  • Deregulated Expression of EVI1 Defines a Poor Prognostic Subset of MLL-Rearranged Acute Myeloid Leukemias
    Groschel S., Schlenk R., Engelmann J., Rockova V., Teleanu V., Kuhn M., Eiwen K., Erpelinck C., Havermans M., Lubbert M., Germing U., Schmidt-Wolf I., Beverloo B., Schuurhuis G., Bargetzi M., Krauter J., Ganser A., Valk P., Lowenberg B., Dohner K., Dohner H., Delwel R. (2013)
    Journal of Clinical Oncology 31(1), 95-103

"Machinarium"

EMVS

An R package, written in C++, implementing an EM algorithm for Bayesian variable selection described in Rockova and George (2014). The software is made available as is, and no warranty (about the software, its performance or its conformity to any specification) is given or implied. Please email me with comments and suggestions.

EMVS_0.1.zip


The package can be installed via R CMD build and R CMD INSTALL from a local R library directory.
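The build-and-install step can be sketched in shell as follows. This is a minimal sketch under stated assumptions, not the official installation procedure: the helper name `install_r_package` is hypothetical, the archive is assumed to unpack to a source directory named after the package, and the built tarball is assumed to be named `<package>_<version>.tar.gz`.

```shell
# Hypothetical helper (not shipped with the package): build and install an
# R source package from a downloaded archive.
# Assumptions: the archive <pkg>_<version>.zip unpacks to a source directory
# named <pkg>, and R CMD build produces <pkg>_<version>.tar.gz.
install_r_package() {
    pkg=$1
    version=$2
    # With DRY_RUN=1 the commands are printed instead of executed.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run=echo
    else
        run=
    fi
    # Stop at the first step that fails.
    $run unzip "${pkg}_${version}.zip" &&
    $run R CMD build "$pkg" &&
    $run R CMD INSTALL "${pkg}_${version}.tar.gz"
}
```

Run from the directory holding the downloaded archive, `install_r_package EMVS 0.1` would perform the three steps in order. Setting `DRY_RUN=1` first prints the commands so they can be checked before anything is installed.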


Check out help(EMVS) for examples.

Spike-and-Slab LASSO

An R package, written in C, implementing coordinate-wise optimization for Spike-and-Slab LASSO priors in linear regression (Rockova and George (2015)). The code is adapted from the ncvreg package of Breheny and Huang (2011).

SSL_0.1.zip

The package depends on the GNU Scientific Library (GSL).

The package can be installed via R CMD build and R CMD INSTALL from a local R library directory.


Check out help(SSL) for examples.

Teaching

Big Data (BUS 41201)

Course Description

BUS 41201 is a course about data mining: the analysis, exploration, and simplification of large high-dimensional datasets. Students will learn how to model and interpret complicated "Big Data" and become adept at building powerful models for prediction and classification. Techniques covered include an advanced overview of linear and logistic regression, model choice and false discovery rates, multinomial and binary regression, classification, decision trees, factor models, clustering, the bootstrap and cross-validation. We learn both basic underlying concepts and practical computational skills, including techniques for analysis of distributed data. Heavy emphasis is placed on analysis of actual datasets, and on development of application-specific methodology. Among other examples, we will consider consumer database mining, internet and social media tracking, network analysis, and text mining.

Syllabus

Teaching Assistants:

Mohsen Mirtaher (mmirtaher@gmail.com)
Siying Cao (katesiying@gmail.com)

Office Hours:

By appointment

R Resources:

Download R, R Project Site, RStudio

Tutorials: Google developer, Princeton, TryR code school, Quick R
Books: R in a Nutshell, Art of R Programming, Library E-Books, Introductory Statistics with R

Piazza link

piazza.com/uchicago/spring2017/busn412010185bigdata/home

First Class Assignment:

Make yourself familiar with R! The course is a fast-paced introduction to a wide variety of statistical learning methods. Knowing the basics of R before you start will make your life much easier and allow you to concentrate your effort on learning data science tools and concepts. If you are new to R, I recommend starting with an R tutorial, such as the TryR tutorial at http://tryr.codeschool.com.

Week 1 : Inference at scale


Slides


Datasets:


Trucks: pickup.R , pickup.csv
Diabetes: dm2_pvals.R , dm2_fdr.R , diabetes.csv
Cholesterol: lipids.R , jointGwasMc_LDL.txt
Extra Code: fdr.R


Week 2 : Regression


Slides


Datasets:


Orange juice: oj.R , oj.csv
Spam: spam.R , spam.csv
Extra Code: deviance.R

Week 3 : Model Selection


Slides


Datasets:


Comscore: comscore.R , CS2006demographics.csv , CS2006domains.csv.csv , CS2006sites.txt , CS2006totalspend.csv
Semiconductor: semiconductor.R , semiconductor.csv
Extra Code: naref.R

Week 4 : Treatment Effects


Slides


Datasets:


Abortion: abortion.dat , abortion.R , us_cellphone.csv
Paidsearch: paidsearch.csv , paidsearch.R
Extra Code: mab.R

Week 5 : Classification


Slides


Datasets:


Credit: credit.csv , credit.R , data_description
Glass: glass.R
Extra Code: roc.R

Week 6 : Networks


Slides


Datasets:


Marriage: firenze.R , firenze.txt
Karate: karate.R
Lastfm: lastfm.R , lastfm.csv
Websearch: CaliforniaEdges.csv , CaliforniaNodes.txt , websearch.R

Week 7 : Clustering


Slides


Datasets:


Protein: protein.R , protein.csv
Wine: wine.R , wine.csv
We8there: we8there.R
Extra Code: kIC.R


Week 8 : Factor Models


Slides


Datasets:


Protein: protein.R , protein.csv
Rollcall: rollcall_votes.R , rollcall.csv , rollcall-members.csv
NBC: nbc_demographics.csv , nbc_pilotsurvey.csv , nbc_showdetails.csv , nbc.R
Gas: gas.R , gasoline.csv

Week 9 : Trees


Slides


Datasets:


Prostate: prostate_cancer.R , prostate.csv
Mcycle: mcycle.R
Calhomes: CAhousing.csv , calhomes.R