seqplot {TraMineR} | R Documentation |
High-level plot functions for state sequence objects that can produce state distribution, sequence frequency, sequence index, transversal entropy, sequence of modal states, mean time, and representative sequence plots.
seqplot(seqdata, group=NULL, type="i", title=NULL, cpal=NULL,
        missing.color=NULL, ylab=NULL, yaxis=TRUE, axes="all",
        xtlab=NULL, cex.plot=1, withlegend="auto", ltext=NULL,
        cex.legend=1, use.layout=(!is.null(group) | withlegend!=FALSE),
        legend.prop=NA, rows=NA, cols=NA, ...)

seqdplot(seqdata, group=NULL, title=NULL, ...)
seqfplot(seqdata, group=NULL, title=NULL, ...)
seqiplot(seqdata, group=NULL, title=NULL, ...)
seqHtplot(seqdata, group=NULL, title=NULL, ...)
seqmsplot(seqdata, group=NULL, title=NULL, ...)
seqmtplot(seqdata, group=NULL, title=NULL, ...)
seqrplot(seqdata, group=NULL, title=NULL, ...)
seqdata |
a state sequence object created with the seqdef function. |
group |
a factor. One plot is produced for each level of the factor. |
type |
the type of the plot. Available types are "d" for state distribution plots, "f" for sequence frequency plots, "Ht" for entropy index plots, "i" for sequence index plots, "ms" for plotting the sequence of modal states, "mt" for mean times plots and "r" for representative sequence plots. |
title |
title for the graphic. Defaults to NULL. |
cpal |
alternative color palette to use for the states. If user specified, a vector of colors with number of elements equal to the number of distinct states. By default, the 'cpal' attribute of the 'seqdata' sequence object is used (see seqdef ). |
missing.color |
alternative color for representing missing values inside the sequences. By default, this color is taken from the "missing.color" attribute of the sequence object being plotted. |
ylab |
an optional label for the y axis. If set to NA, no label is drawn. |
yaxis |
controls whether a y axis is plotted. If set to 'NULL', the value is chosen according to the plot type, i.e. 'FALSE' for type="i" and 'TRUE' for all other types. When set to 'TRUE', sequence indexes are displayed for "i", mean time values for "mt" and percentages for "d" and "f". |
axes |
if set to "all" (default value) x axes are drawn for each plot in the graphic. If set to "bottom" and group is used, axes are drawn only under the plots located at the bottom of the graphic area. If FALSE, no x axis is drawn. |
xtlab |
optional labels for the x axis ticks. If unspecified, the column names of the 'seqdata' sequence object are used (see seqdef). |
cex.plot |
expansion factor for setting the size of the font for the axis labels and names. The default value is 1. Values less than 1 reduce the size of the font, values greater than 1 increase it. |
withlegend |
defines if and where the legend of the state colors is plotted. The default value 'auto' sets the position of the legend automatically. The other possible value is 'right'. The obsolete value 'TRUE' is identical to 'auto'. |
ltext |
optional description of the states to appear in the legend. Must be a vector of character strings with number of elements equal to the size of the alphabet. If unspecified, the 'label' attribute of the 'seqdata' sequence object is used (see seqdef). |
cex.legend |
expansion factor for setting the size of the font for the labels in the legend. The default value is 1. Values less than 1 reduce the size of the font, values greater than 1 increase it. |
use.layout |
if TRUE, layout is used to arrange plots when using the group option or plotting a legend. If layout is used, the standard 'par(mfrow=....)' for arranging plots will not work anymore. When withlegend is FALSE and group is NULL, layout is automatically deactivated and 'par(mfrow=....)' will work. |
legend.prop |
sets the proportion of the graphic area used for plotting the legend when use.layout=TRUE and withlegend=TRUE. Default value is set according to the place (bottom or right of the graphic area) where the legend is plotted. Values from 0 to 1. |
rows,cols |
optional arguments to arrange plots when use.layout=TRUE. |
... |
arguments to be passed to the function called to produce the appropriate statistics and the associated plot method (see details), or other graphical parameters. |
seqplot is the generic function for high-level plots of state sequence objects with group splits and automatic display of the color legend. Many different types of plots can be produced by means of the type argument. Except for sequence index plots, seqplot first calls the specific function producing the required statistics and then the plot method for objects produced by this function (see below). For sequence index plots, the state sequence object itself is plotted by calling the plot.stslist method. When splitting by groups and/or displaying the color legend, the layout function is used for arranging the plots.
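As a minimal sketch of this layout behaviour, the following assumes the argument names of the TraMineR version documented on this page (in particular withlegend) and the actcal example data shipped with the package. With no group and no legend, layout should not be used, so par(mfrow=...) is expected to control the arrangement of the plots.

```r
library(TraMineR)

## Example data shipped with TraMineR
data(actcal)
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal, 13:24, labels = actcal.lab)

## No group and no legend: layout is deactivated, so the standard
## par(mfrow=...) mechanism is left free to place two plots side by side
old.par <- par(mfrow = c(1, 2))
seqplot(actcal.seq, type = "d", withlegend = FALSE)
seqplot(actcal.seq, type = "mt", withlegend = FALSE)
par(old.par)  # restore the previous graphical settings
```

Restoring the saved settings with par(old.par) keeps later plots from inheriting the two-panel arrangement.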
The seqdplot, seqfplot, seqiplot, seqHtplot, seqmsplot, seqmtplot and seqrplot functions are aliases for calling seqplot with the type argument set respectively to "d", "f", "i", "Ht", "ms", "mt" or "r".
State distribution plots (type="d") represent the sequence of the transversal state frequencies by position (time point) computed by the seqstatd function.
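The statistics behind a state distribution plot can also be inspected directly. The sketch below assumes that seqstatd returns its frequencies in a component named Frequencies and that the returned object has a plot method, as is the case in the TraMineR versions we are aware of. The actcal data and labels are those used in the examples of this page.

```r
library(TraMineR)

## Example data shipped with TraMineR
data(actcal)
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal, 13:24, labels = actcal.lab)

## Transversal state frequencies: one row per state, one column per position
actcal.statd <- seqstatd(actcal.seq)
round(actcal.statd$Frequencies, 2)

## At each position the state frequencies are proportions over the states
colSums(actcal.statd$Frequencies)

## Plot the computed distribution directly, without group or legend handling
plot(actcal.statd)
```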
Sequence frequency plots (type="f") display the most frequent sequences, each one with a horizontal stacked bar of its successive states. Sequences are ordered bottom-up according to the relative frequencies computed by the seqtab function. The plot.stslist.freq plot method is called for producing the plot.
The optional tlim argument may be specified for selecting the number of sequences to be plotted (default is 10, i.e. the ten most frequent sequences). The width of the bars representing the sequences is by default proportional to the sequence frequencies, but this can be disabled with the optional pbarw=FALSE argument. If weights have been specified when creating seqdata, weighted frequencies will be returned by seqtab since the default option is weighted=TRUE. See the examples below and the seqtab and plot.stslist.freq manual pages for a complete list of optional arguments, and Müller et al. (2008) for a description of sequence frequency plots.
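To see the frequency table that underlies such a plot, seqtab can be called on its own. This is a sketch only: it relies on the default of ten sequences described above and on the pbarw argument as documented on this page, and uses the biofam data from the package examples.

```r
library(TraMineR)

## Example data shipped with TraMineR
data(biofam)
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
                "Child", "Left+Child", "Left+Marr+Child", "Divorced")
biofam.seq <- seqdef(biofam, 10:25, labels = biofam.lab)

## Frequency table of the most frequent sequences (ten by default)
biofam.tab <- seqtab(biofam.seq)
print(biofam.tab)

## Same sequences plotted with equal-width bars instead of
## widths proportional to the sequence frequencies
seqfplot(biofam.seq, pbarw = FALSE)
```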
In sequence index plots (type="i"), the requested individual sequences are rendered with horizontal stacked bars depicting the states over successive positions (time). The optional tlim argument specifies the indexes of the sequences to be plotted (defaults to the first ten sequences, i.e. tlim=1:10). To plot a whole (large) set nicely, use tlim=0 together with the additional graphical parameters border=NA and space=0 to suppress bar borders and the space between bars. The sortv argument can be used to pass a vector of numerical values for sorting the sequences. See plot.stslist for a complete list of optional arguments.
The interest of sequence index plots has for instance been stressed by Scherer (2001) and Brzinsky-Fay et al. (2006). Note that index plots for thousands of sequences result in very heavy PDF or PostScript graphic files. A dramatic file size reduction may be achieved by saving the figures in bitmap format, for instance with the png graphic device instead of postscript or pdf.
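A possible way of writing a full index plot to a bitmap file is sketched below. The output file name and the image dimensions are arbitrary choices made for illustration, and the tlim, border and space arguments follow the version of TraMineR documented on this page.

```r
library(TraMineR)

## Example data shipped with TraMineR
data(actcal)
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal, 13:24, labels = actcal.lab)

## Hypothetical output file in the session's temporary directory
out.file <- file.path(tempdir(), "actcal-iplot.png")

## Open the png device, draw all sequences without borders or gaps,
## then close the device so that the file is written to disk
png(out.file, width = 800, height = 600)
seqiplot(actcal.seq, tlim = 0, border = NA, space = 0)
dev.off()

file.exists(out.file)
```

The pixel dimensions given to png fix the resolution of the image, so they may need to be raised when the figure is meant for print.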
The entropy index plot (type="Ht") displays the evolution over positions of the transversal entropies (Billari, 2001). Transversal entropies are computed by calling the seqstatd function and then plotted by calling the plot.stslist.statd plot method.
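The two steps can be reproduced by hand, as in the following sketch. It assumes that the object returned by seqstatd stores the entropies in a component named Entropy and that its plot method accepts type="Ht", which holds in the TraMineR versions we are aware of.

```r
library(TraMineR)

## Example data shipped with TraMineR
data(actcal)
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal, 13:24, labels = actcal.lab)

## Step 1: compute the transversal statistics, entropies included
actcal.statd <- seqstatd(actcal.seq)
round(actcal.statd$Entropy, 3)  # one entropy value per position

## Step 2: plot the entropies with the plot method of the returned object
plot(actcal.statd, type = "Ht")
```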
The modal state sequence plot (type="ms") displays the sequence of the modal states with each mode proportional to its frequency at the given position. The seqmodst function is called to obtain this sequence, and the result is plotted by calling the plot.stslist.modst plot method.
The mean time plot (type="mt") displays the mean time spent in each state of the alphabet as computed by the seqmeant function. The plot.stslist.meant plot method is used to plot the resulting statistics.
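As an illustration, the mean times can be computed and plotted separately. This sketch assumes that seqmeant returns one row per state with the mean time in its first column, and uses the actcal data from the package examples.

```r
library(TraMineR)

## Example data shipped with TraMineR
data(actcal)
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal, 13:24, labels = actcal.lab)

## Mean number of positions (here months) spent in each state
actcal.meant <- seqmeant(actcal.seq)
print(actcal.meant)

## Without missing values, the mean times should add up
## to the sequence length
sum(unclass(actcal.meant)[, 1])

## Plot the statistics with the plot method of the returned object
plot(actcal.meant)
```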
The representative sequence plot (type="r") displays a reduced, non-redundant set of representative sequences extracted from the provided state sequence object and sorted according to a representativeness criterion. The seqrep function is called to extract the representative set, which is then plotted by calling the plot.stslist.rep method. A distance matrix is required; it is either passed with the dist.matrix argument or computed by calling the seqdist function if dist.matrix=NULL. The criterion argument sets the representativeness criterion used to sort the sequences. See the examples below and the seqrep and plot.stslist.rep manual pages for a complete list of optional arguments, and Gabadinho et al. (2009) for more details on the extraction of representative sets.
Billari, F. C. (2001). The analysis of early life courses: Complex description of the transition to adulthood. Journal of Population Research 18(2), 119-142.
Brzinsky-Fay, C., U. Kohler and M. Luniak (2006). Sequence Analysis with Stata. The Stata Journal, 6(4), 435-460.
Gabadinho, A., G. Ritschard, M. Studer and N. S. Müller (2009). Summarizing Sets of Categorical Sequences. In International Conference on Knowledge Discovery and Information Retrieval, Madeira, 6-8 October. INSTICC.
Müller, N. S., A. Gabadinho, G. Ritschard and M. Studer (2008). Extracting knowledge from life courses: Clustering and visualization. In Data Warehousing and Knowledge Discovery, 10th International Conference DaWaK 2008, Turin, Italy, September 2-5, LNCS 5182, Berlin: Springer, 176-185.
Scherer, S. (2001). Early Career Patterns: A Comparison of Great Britain and West Germany. European Sociological Review, 17(2), 119-144.
## The examples use functions and data sets from the TraMineR package
library(TraMineR)

## ======================================================
## Creating state sequence objects from example data sets
## ======================================================

## biofam data set
data(biofam)
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
                "Child", "Left+Child", "Left+Marr+Child", "Divorced")
biofam.seq <- seqdef(biofam, 10:25, labels=biofam.lab)

## actcal data set
data(actcal)
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal, 13:24, labels=actcal.lab)

## ex1 using weights
data(ex1)
ex1.seq <- seqdef(ex1, 1:13, weights=ex1$weights)

## ========================
## Sequence frequency plots
## ========================

## Plot of the 10 most frequent sequences
seqplot(biofam.seq, type="f")

## Grouped by sex
seqfplot(actcal.seq, group=actcal$sex, tlim=10)

## Unweighted vs weighted frequencies
seqfplot(ex1.seq, weighted=FALSE)
seqfplot(ex1.seq, weighted=TRUE)

## =====================
## Modal states sequence
## =====================
seqplot(biofam.seq, type="ms")
## same as
seqmsplot(biofam.seq)

## ====================
## Representative plots
## ====================

## Computing a distance matrix
## with OM metric
costs <- seqsubm(biofam.seq, method="TRATE")
biofam.om <- seqdist(biofam.seq, method="OM", sm=costs)

## Plot of the representative sets grouped by sex
## using the default frequency criterion
seqrplot(biofam.seq, dist.matrix=biofam.om, group=biofam$sex)

## Plot of the representative sets grouped by sex
## using the default frequency criterion
seqrplot(biofam.seq, group=biofam$sex, dist.matrix=biofam.om)

## Plot of the representative sets grouped by sex
## using the "density" criterion
seqrplot(biofam.seq, group=biofam$sex, criterion="density",
         dist.matrix=biofam.om)

## ====================
## Sequence index plots
## ====================

## First ten sequences
seqiplot(biofam.seq)

## All sequences sorted by age in 2000
## grouped by sex
## using 'border=NA' and 'space=0' options to have a nicer plot
seqiplot(actcal.seq, group=actcal$sex, tlim=0, border=NA, space=0,
         sortv=actcal$age00)

## ===================
## Entropy index plots
## ===================
seqplot(biofam.seq, type="Ht", group=biofam$sex)

## =======================
## State distribution plot
## =======================

## Grouped by sex
seqplot(actcal.seq, type="d", group=actcal$sex)

## Sequence index plot (first 10 seq.)
## for the actcal data set
## grouped by sex
seqplot(actcal.seq, type="i", group=actcal$sex)

## ==============
## Mean time plot
## ==============

## actcal data set, grouped by sex
seqplot(actcal.seq, type="mt", group=actcal$sex)

## biofam data set, grouped by sex
seqmtplot(biofam.seq, group=biofam$sex)