disstree {TraMineR}    R Documentation

Dissimilarity Tree

Description

Tree-structured discrepancy analysis of non-measurable objects described by their pairwise dissimilarities.

Usage

disstree(formula, data = NULL, minSize = 0.05, maxdepth = 5,
   R = 1000, pval = 0.01)

Arguments

formula A formula whose left-hand side is a dissimilarity matrix and whose right-hand side lists the candidate variables used to partition the population.
data A data frame in which the variables named in formula are looked up.
minSize Minimum number of cases in a node; interpreted as a proportion of cases if less than 1 (the default 0.05 means 5 percent).
maxdepth Maximum depth of the tree.
R Number of permutations used to assess the significance of a split.
pval Maximum p-value for a split to be retained, expressed as a proportion (the default 0.01 corresponds to 1 percent).

Details

The procedure iteratively splits the data. At each step, it selects the variable and split that explain the largest share of the discrepancy, i.e. the split with the highest pseudo R2. The significance of the retained split is then assessed through a permutation test.
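
The following base R sketch illustrates how a pseudo R2 and a permutation p-value can be obtained for one candidate split. It is not TraMineR's internal code: the helper names (diss_ss, pseudo_r2, perm_pvalue) and the toy data are hypothetical, and the sum of squares is assumed to be computed directly from the dissimilarities as SS = sum(d) / (2n). Some formulations square the dissimilarities first.

```r
## Illustrative sketch only; helper names and data are hypothetical.

## Sum of squares of a group from its pairwise dissimilarity matrix,
## assuming SS = (1 / (2 n)) * sum_i sum_j d_ij
diss_ss <- function(d) sum(d) / (2 * nrow(d))

## Pseudo R2 of a partition: share of the total sum of squares
## that is not left within the groups
pseudo_r2 <- function(d, group) {
  ss_total <- diss_ss(d)
  ss_within <- sum(vapply(
    split(seq_along(group), group),
    function(idx) diss_ss(d[idx, idx, drop = FALSE]),
    numeric(1)
  ))
  1 - ss_within / ss_total
}

## Permutation test: shuffle the group labels R times and count how
## often the permuted pseudo R2 reaches the observed one
perm_pvalue <- function(d, group, R = 1000) {
  observed <- pseudo_r2(d, group)
  permuted <- replicate(R, pseudo_r2(d, sample(group)))
  (sum(permuted >= observed) + 1) / (R + 1)
}

## Toy data: two well-separated groups on a line
set.seed(1)
x <- c(seq(0, 1, length.out = 20), seq(5, 6, length.out = 20))
d <- as.matrix(dist(x))
group <- rep(c("a", "b"), each = 20)

r2 <- pseudo_r2(d, group)
p <- perm_pvalue(d, group, R = 199)
```

In a tree-growing loop, a computation of this kind would be repeated for every candidate variable and split, the split with the highest pseudo R2 would be kept, and its p-value would be compared with pval.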

Value

An object of class disstree that contains the following components:

root A node object (see below), root of the tree
adjustment A dissassoc object
formula The formula used to generate the tree

A node object contains the following components:

split Selected predictor, NULL for terminal nodes
vardis Node discrepancy, see dissvar
children Child nodes, NULL for terminal nodes
ind Index of individuals in this node
depth Depth of the node, starting from root node
label Node label
R2 R squared of the split, NULL for terminal nodes
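
As a rough illustration of how the node components listed above can be used, the sketch below walks a tree recursively and collects its terminal nodes. To stay self-contained, it uses a hand-built stand-in with made-up values rather than a real disstree result. It assumes that children is a list of node objects, which this page does not state explicitly; the helper names make_leaf and terminal_nodes are hypothetical.

```r
## Hypothetical stand-in for a node tree, built from the documented fields.
## All values are made up for illustration.
make_leaf <- function(label, ind, depth) {
  list(split = NULL, vardis = NA_real_, children = NULL,
       ind = ind, depth = depth, label = label, R2 = NULL)
}

root <- list(
  split = "male", vardis = 10.2, R2 = 0.08,
  depth = 1, label = "root", ind = 1:6,
  ## Assumption: children is a list of node objects
  children = list(make_leaf("male = no", 1:3, 2),
                  make_leaf("male = yes", 4:6, 2))
)

## Recursively collect terminal nodes (those whose children is NULL)
terminal_nodes <- function(node) {
  if (is.null(node$children)) return(list(node))
  do.call(c, lapply(node$children, terminal_nodes))
}

leaves <- terminal_nodes(root)

## Number of individuals in each terminal node
sizes <- vapply(leaves, function(n) length(n$ind), integer(1))
```

With a fitted tree dt, the same function would be called as terminal_nodes(dt$root), provided the assumption about children holds.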

References

Studer, M., G. Ritschard, A. Gabadinho and N. S. Müller (2009). Analyse de dissimilarités par arbre d'induction. Revue des Nouvelles Technologies de l'Information, EGC'2009.

Batagelj, V. (1988). Generalized Ward and related clustering problems. In H. Bock (Ed.), Classification and related methods of data analysis, pp. 67-74. North-Holland, Amsterdam.

Anderson, M. J. (2001). A new method for non-parametric multivariate analysis of variance. Austral Ecology 26, 32-46.

Piccarreta, R. and F. C. Billari (2007). Clustering work and family trajectories by using a divisive algorithm. Journal of the Royal Statistical Society A 170(4), 1061-1078.

See Also

seqtree2dot to generate a graphical representation of disstree objects when analyzing state sequences.

disstree2dot, a more general interface for generating such representations.

dissvar to compute the discrepancy from dissimilarities, and for a basic introduction to discrepancy analysis.

dissassoc to test the association between objects represented by their dissimilarities and a covariate.

dissmfac to perform multi-factor analysis of variance from pairwise dissimilarities.

disscenter to compute the distance of each object to its group center from pairwise dissimilarities.

Examples

data(mvad)

## Defining a state sequence object
mvad.seq <- seqdef(mvad[, 17:86])

## Computing dissimilarities
mvad.lcs <- seqdist(mvad.seq, method="LCS")
dt <- disstree(mvad.lcs ~ male + Grammar + funemp + gcse5eq + fmpr + livboth,
    data=mvad, R = 10)
print(dt)

## Using simplified interface to generate a file for GraphViz
seqtree2dot(dt, "mvadseqtree", seqdata=mvad.seq, type="d",
        border=NA, withlegend=FALSE, axes=FALSE, ylab="", yaxis=FALSE)

## Generating a file for GraphViz
disstree2dot(dt, "mvadtree", imagefunc=seqdplot, imagedata=mvad.seq, 
        ## Additional parameters passed to seqdplot
        withlegend=FALSE, axes=FALSE, ylab="")
  
## Second method, using a custom plotting function
myplotfunction <- function(individuals, seqs, ...) {
        par(font.sub=2, mar=c(3,0,6,0), mgp=c(0,0,0))

        ## Order the sequences of the node along the first MDS dimension
        mds <- cmdscale(seqdist(seqs[individuals,], method="LCS"), k=1)
        seqiplot(seqs[individuals,], sortv=mds, ...)
}

## Generating a file for GraphViz
## If imagedata is not set, the indexes of individuals are sent to imagefunc
disstree2dot(dt, "mvadtree", imagefunc=myplotfunction, title.cex=3,
        ## additional parameters passed to myplotfunction
        seqs=mvad.seq,
        ## additional parameters passed to seqiplot (through myplotfunction)
        withlegend=FALSE, axes=FALSE, tlim=0, space=0, ylab="", border=NA)

## To run GraphViz (dot) from R and generate an "svg" file
## (shell() is available on Windows only; use system() on other platforms)
## shell("dot -Tsvg -O mvadtree.dot")

[Package TraMineR version 1.2-1 Index]