R-package for

Interpretable Clustering using Unsupervised Binary Trees

B. Ghattas, M. Svarc, R. Fraiman, P. Michel

Download the current version, cubt_3.2, for Windows or for Linux.

Latest version: cubt_3.2, delivered on 20th of October 2019

Minor modifications:

- The main addition is one new function, variable.importance().

- Several help pages have been completed.

cubt_3.0 delivered on 12th of January 2016

Major modifications:

- The package now handles qualitative nominal variables. The splitting criterion is the sum of the variables' entropies. Continuous and nominal data are distinguished by the mode of the data matrix: for nominal data, it must be of type character.

- Bugs in predict.cubt have been fixed, and the function has been adapted to the nominal case.

- For the nominal case, the function opt.minsize() is provided to optimize the parameter minsize. It uses cross-validation of the intra-class deviance.

- The function Gendata2 generates the datasets used to illustrate the qualitative version of CUBT; see the paper by L. Boyer, B. Ghattas and P. Michel.

- Several functions have been added:

1. cv.strat, for stratified cross-validation sampling.

2. rand.samp.strat, for drawing a test sample with stratification.

3. CU, to compute the category utility.
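
The two quantities named in the list above, the sum of the variables' entropies and the category utility, can be sketched on a toy nominal dataset. This is a standalone Python illustration and not the package's code: the function names are hypothetical, the entropy uses the natural logarithm, and the category utility follows the usual Gluck and Corter definition, which may differ from what CU returns (for example in its normalization).

```python
from collections import Counter
from math import log


def entropy(values):
    """Shannon entropy (natural log) of one nominal variable."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


def sum_of_entropies(rows):
    """Sum of the per-variable entropies over a set of observations."""
    columns = list(zip(*rows))  # transpose: one tuple per variable
    return sum(entropy(col) for col in columns)


def squared_probabilities(rows):
    """Sum over variables j and values v of P(x_j = v)^2 within rows."""
    n = len(rows)
    n_vars = len(rows[0])
    return sum((c / n) ** 2
               for j in range(n_vars)
               for c in Counter(r[j] for r in rows).values())


def category_utility(rows, labels):
    """Category utility of a partition (assumed Gluck and Corter form)."""
    n = len(rows)
    base = squared_probabilities(rows)
    clusters = {}
    for row, lab in zip(rows, labels):
        clusters.setdefault(lab, []).append(row)
    total = 0.0
    for members in clusters.values():
        weight = len(members) / n  # P(C_k)
        total += weight * (squared_probabilities(members) - base)
    return total / len(clusters)


# Toy data: two nominal variables, two perfectly separated clusters.
rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
labels = [0, 0, 1, 1]
node_entropy = sum_of_entropies(rows)   # 2 * log(2) for this data
cu = category_utility(rows, labels)     # 0.5 for this data
```

How the package combines the entropies of the child nodes when it evaluates a split is not stated here, so the sketch only computes the sum for a single set of observations.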

cubt_2.0 delivered on 20th of March 2014

Major modifications:

- The package now handles discrete (ordinal) variables. The user may choose between two splitting criteria.

- A new entropy-based criterion may be used for splits on ordinal variables.

- Pruning may use any distance-based measure (the default is Euclidean).

- The choice of the pruning parameter mindist is now guided by a new parameter, qdist. Before pruning, distances are computed between all pairs of nodes. qdist specifies which quantile of those distances is used; its default value is 0.1.
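
The role of qdist described above can be sketched as follows. This is a standalone Python illustration under stated assumptions, not the package's code: each node is summarized by its centroid, the distance between two nodes is the Euclidean distance between their centroids, and the quantile is computed by linear interpolation. The function names are hypothetical, and the package may measure node-to-node distance differently.

```python
from itertools import combinations
from math import dist
from statistics import mean


def centroid(points):
    """Coordinate-wise mean of the observations in one node."""
    return tuple(mean(coord) for coord in zip(*points))


def quantile(sorted_values, q):
    """Quantile by linear interpolation (one of several common definitions)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def mindist_from_qdist(nodes, qdist=0.1):
    """Quantile of the distances between all pairs of nodes (assumed form)."""
    centers = [centroid(pts) for pts in nodes]
    distances = sorted(dist(a, b) for a, b in combinations(centers, 2))
    return quantile(distances, qdist)


# Three toy nodes whose centroids are (0, 1), (3, 1) and (0, 5),
# so the pairwise distances are 3, 4 and 5.
nodes = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(3.0, 1.0), (3.0, 1.0)],
    [(0.0, 5.0), (0.0, 5.0)],
]
threshold = mindist_from_qdist(nodes, qdist=0.1)
```

A small qdist, such as the default 0.1, yields a threshold near the smallest pairwise distance, so the resulting value tracks the scale of the data rather than being fixed by hand.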

Other changes:

- Some bugs that crashed R because of memory usage have been fixed.

- Only the distinct values of each covariate are considered as candidate splits over continuous covariates.
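
Restricting candidate splits to distinct values, as described above, can be sketched in a few lines. This is a standalone Python illustration and not the package's code: the function name is hypothetical, and placing each cut at the midpoint between consecutive distinct values is an assumption.

```python
def candidate_splits(values):
    """Candidate cut points built from the distinct values of one covariate."""
    distinct = sorted(set(values))  # repeated values are considered only once
    # Assumption: cut halfway between consecutive distinct values.
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


x = [2.0, 1.0, 2.0, 3.0, 1.0, 2.0]
cuts = candidate_splits(x)  # six observations, but only two candidate cuts
```

With many tied observations, this reduces the number of splits to evaluate from the number of observations to the number of distinct values minus one.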

Older version: CUBT_1.0, delivered on 2nd of April 2013