Dynamic Trees for Learning and Design (dynaTree R package)
dynaTree is an R package implementing sequential Monte Carlo inference for dynamic tree regression and classification models by particle learning (PL). The sequential nature of inference and the active learning (AL) hooks provided facilitate thrifty sequential design and optimization. The current version supports
- regression by constant and linear leaf models
- classification by multinomial leaf models
- sequential design for regression models by active learning heuristics including predictive variance (ALM) and the expected reduction in predictive variance (ALC)
- optimization of regression models by expected improvement (EI) statistics
- sequential exploration of classification boundaries by the predictive entropy
- variable selection by relevance statistics
- Saltelli-style input sensitivity analysis
- fully online learning via retirement and active discarding for massive data
- forgetting factors for drifting concepts
- Obtain R from cran.r-project.org by selecting the version for your operating system.
- Install the dynaTree package from within R.
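From an R session this is a single command, assuming a standard CRAN mirror:

> install.packages("dynaTree")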
- Optionally, install the akima, plgp and tgp packages, which
are helpful for some of the comparisons in the examples and demos.
> install.packages(c("akima", "plgp", "tgp"))
- Load the library as you would for any R library.
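For example:

> library(dynaTree)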
- See the package documentation. A PDF version of the
reference manual (help pages) is also available.
The help pages can be accessed from within
R. The best way to acquaint yourself with the functionality
of this package is to run the demos, which illustrate the examples
contained in the papers referenced below. Try starting with...
> ?dynaTree # follow the examples
> demo(package="dynaTree") # for a listing of the demos
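As a quick first fit, the following sketch trains a dynamic tree regression on a noisy 1-d sinusoid and obtains posterior predictive summaries; argument names (model, XX) follow the reference manual, and the particular data and settings here are illustrative only:

> library(dynaTree)
> X <- matrix(seq(0, 1, length=50), ncol=1)    # 1-d design
> y <- sin(10*X[,1]) + rnorm(50, sd=0.1)       # noisy responses
> fit <- dynaTree(X, y, model="linear")        # particle learning fit, linear leaves
> p <- predict(fit, XX=X)                      # posterior predictive at the inputs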
- Dynamic trees for learning and design (2011) with Matt Taddy and Nicholas Polson. Journal of the American Statistical Association, 106(493), pp. 109-123; preprint on arXiv:0912.1586
- Variable selection and sensitivity analysis via dynamic trees with an application to computer code performance tuning (2013) with Matt Taddy and Stefan Wild. Annals of Applied Statistics, 7(1), pp. 51-80; preprint on arXiv:1108.4739; also see our science highlight at Argonne
- Information-theoretic data discarding for dynamic trees on data streams (2013) with Christoforos Anagnostopoulos; Entropy 15(12), pp. 5510-5535; preprint on arXiv:1201.5568. A short version was presented at the NIPS workshop on Bayesian Optimization, Experimental Design and Bandits (Granada, Spain)
- Sequential regression for optimal stopping problems (2013) with Mike Ludkovski; preprint on arXiv:1309.3832
- Empirical performance modeling of GPU kernels using active learning (2014) with Prasanna Balaprakash, Karl Rupp, Azamat Mametjanov, Paul Hovland and Stefan Wild; ParCo 2013 proceedings in Parallel Computing: Accelerating Computational Science and Engineering (CSE) vol. 25, pp. 646-655; preprint at ANL/MCS-P4097-0713
- Active-learning-based surrogate models for empirical performance tuning (2013) with Prasanna Balaprakash and Stefan Wild; in IEEE Cluster 2013 proceedings; preprint at ANL/MCS-P4073-0513
- Bayesian treed response surface models (2013) with Hugh Chipman, Ed George and Rob McCulloch; WIREs Data Mining and Knowledge Discovery, 3(4)
Please send questions and comments to rbgramacy_AT_chicagobooth_DOT_edu. Enjoy!
Robert B. Gramacy -- 2013