R package for Interpretable Clustering using Unsupervised Binary Trees
B. Ghattas, M. Svarc, R. Fraiman, P. Michel
Download
Current version: cubt_3.2 for Windows, for Linux.
Latest version: cubt_3.2, released on 20 October 2019
Minor modifications:
cubt_3.0, released on 12 January 2016
Major modifications:
- The package now handles qualitative nominal variables. The criterion used for splitting is the sum of the variables' entropies. Continuous and nominal data are distinguished by the mode of the data matrix; for nominal data, it should be of type character.
- Bugs in predict.cubt have been fixed, and the function has been adapted to the nominal case.
- For the nominal case, an optimization function, opt.minsize(), is provided for the parameter minsize. It uses cross-validation of the intra-class deviance.
- The function Gendata2 generates the datasets used to illustrate the qualitative version of CUBT; see the paper by L. Boyer, B. Ghattas and P. Michel.
- Several functions have been added:
  1. cv.strat, for stratified cross-validation sampling
  2. rand.samp.strat, for drawing a stratified test sample
  3. CU, to compute category utility
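To illustrate the entropy-based splitting criterion for nominal variables listed above, here is a minimal sketch in Python (chosen only for illustration; it is not the package's R code). It computes the sum of the per-variable Shannon entropies in a node. The function names, the natural logarithm, and the size-weighted child score used to compare a split with its parent are assumptions, not taken from cubt.

```python
import math
from collections import Counter


def column_entropy(values):
    """Shannon entropy (natural log) of one nominal variable."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def node_entropy(rows):
    """Sum of the per-variable entropies over all columns of a node."""
    return sum(column_entropy(list(col)) for col in zip(*rows))


def split_score(rows, var, category):
    """Assumed score: size-weighted sum of the entropies of the two children
    obtained by sending rows with rows[var] == category to the left."""
    left = [r for r in rows if r[var] == category]
    right = [r for r in rows if r[var] != category]
    n = len(rows)
    return sum(len(ch) / n * node_entropy(ch) for ch in (left, right) if ch)


# Toy nominal data: every value is a character string.
rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]

parent = node_entropy(rows)                      # entropy before splitting
after = split_score(rows, var=0, category="a")   # entropy after the split
```

In this toy node, splitting on the first variable yields two pure children, so the score drops from 2 ln 2 to 0.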
cubt_2.0, released on 20 March 2014
Major modifications:
- The package now also handles discrete (ordinal) variables. The user may choose between two splitting criteria.
- A new entropy-based criterion may be used for splitting on ordinal variables.
- Pruning may use any distance-based measure (the default is Euclidean).
- The choice of the pruning parameter mindist is now guided by a new parameter, qdist. Before pruning, distances are computed between all pairs of nodes; qdist is the order of the quantile of those distances, and its default value is 0.1.
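One way to read the role of qdist is sketched below in Python (for illustration only, not the package's R code). The sketch assumes that each node is summarized by a hypothetical centroid, that distances are Euclidean, and that the quantile is obtained by linear interpolation; none of these details are confirmed by the notes above.

```python
import math
from itertools import combinations


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list (assumed rule)."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


# Hypothetical node summaries (one centroid per node).
centroids = [(0.0, 0.0), (0.0, 1.0), (3.0, 4.0), (6.0, 8.0)]

# Euclidean distances between all pairs of nodes, sorted ascending.
distances = sorted(math.dist(a, b) for a, b in combinations(centroids, 2))

qdist = 0.1                            # default quantile order from the notes
mindist = quantile(distances, qdist)   # threshold suggested for pruning
```

Under this reading, a threshold derived from the 0.1 quantile sits near the smallest pairwise distances, so only the closest nodes would fall below it.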
Other changes:
- Some bugs that crashed R because of memory usage have been fixed.
- Only the distinct values of each covariate are considered for splits on continuous covariates.
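The restriction to distinct values can be pictured with a short Python sketch (illustrative only, not the package's R code). The helper name candidate_splits is hypothetical, and treating each distinct value directly as a candidate cut point is an assumption.

```python
def candidate_splits(values):
    """Return the sorted distinct values of a continuous covariate,
    so that repeated observations contribute a single candidate."""
    return sorted(set(values))


x = [2.5, 1.0, 2.5, 4.0, 1.0, 1.0]
cands = candidate_splits(x)   # three candidates instead of six
```

With many tied observations, this reduces the number of splits that need to be evaluated without changing which partitions are reachable.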
Older version: CUBT_1.0, released on 2 April 2013