seqdef {TraMineR} | R Documentation |
Create a state sequence object with attributes such as alphabet, color palette and state labels. Most TraMineR functions for state sequences require such a state sequence object as input argument. There are specific methods for plotting, summarizing and printing state sequence objects.
seqdef(data, var=NULL, informat="STS", stsep=NULL, alphabet=NULL, states=NULL, id=NULL, weights=NULL, start=1, left=NA, right="DEL", gaps=NA, missing=NA, void="%", nr="*", cnames=NULL, cpal=NULL, missing.color="darkgrey", labels=NULL, ...)
data |
a data frame or matrix containing sequence data. |
var |
the list of columns containing the sequences. Defaults to NULL, i.e. all the columns. Whether the sequences are in the compressed (successive states in a character string) or extended format is automatically detected. |
informat |
format of the original data. Default is "STS". Available formats are: STS, SPS, SPELL. See the TraMineR user's manual (Gabadinho et al., 2008) for a description of the formats. |
stsep |
the character used as separator in the original data if the input format is successive states in a character string. If NULL (default value), the seqfcheck function is called to automatically detect a separator among "-" and ":". Other separators must be specified explicitly. |
alphabet |
optional vector containing the alphabet (the list of all possible states). Use this option if some states in the alphabet don't appear in the data or if you want to reorder the states. The specified vector MUST contain AT LEAST all the states appearing in the data. It may possibly contain additional states not appearing in the data. If NULL, the alphabet is set to the distinct states appearing in the data as returned by the seqstatl function. |
states |
an optional vector containing the labels for the states. Must have a length equal to the number of states in the data, and the labels must be given in the same order as the values returned by the seqstatl function. |
id |
optional argument for setting the rownames of the sequence object. If NULL (default), the rownames are taken from the input data. If set to "auto", sequences are numbered from 1 to the number of sequences. A vector of rownames with length equal to the number of sequences may be specified as well. |
weights |
optional numerical vector containing weights, which may be used by some functions to compute weighted statistics. EXPERIMENTAL. |
start |
starting time. For instance, if your sequences begin at age 15, you can specify 15. At this stage, used only for labelling column names. |
left |
the behavior for missing values appearing before the first (leftmost) valid state in each sequence. See Gabadinho et al. (2008) for more details on the options for handling missing values when defining sequence objects. By default, left missing values are treated as 'real' missing values and converted to the internal missing value code defined by the nr option. Other options are "DEL" to delete the positions containing missing values or a state code (belonging to the alphabet or not) to replace the missing values. |
right |
the behavior for missing values appearing after the last (rightmost) valid state in each sequence. Same options as for the left argument. |
gaps |
the behavior for missing values appearing inside the sequences, i.e. after the first (leftmost) valid state and before the last (rightmost) valid state of each sequence. Same options as for the left argument. |
missing |
the code used for missing values in the input data. When specified, all cells containing this value will be replaced by NA's, the internal R code for missing values. If 'missing' is not specified, cells containing NA's are considered to be missing values. |
void |
the internal code used by TraMineR for representing void elements in the sequences. Default is "%". |
nr |
the internal code used by TraMineR for representing real missing elements in the sequences. Default is "*". |
cnames |
optional names for the columns composing the sequence data. Those names will be used by default in the graphics as axis labels. If NULL (default), names are taken from the original column names in the data. |
cpal |
an optional color palette for representing the states in the graphics. If NULL (default), a color palette is created by calling the brewer.pal function of the RColorBrewer package. If the number of states is less than or equal to 8, the "Accent" palette is used. If the number of states is between 9 and 12, the "Set3" palette is used. If the number of states in the data is greater than 12, you have to specify your own palette. The list of available colors is displayed by the colors function. Alternatively, you can use other palettes from the RColorBrewer package. |
missing.color |
alternative color for representing missing values inside the sequences. Defaults to "darkgrey". |
labels |
optional state labels used for the color legend of TraMineR's graphics. If NULL (default), the state names in the alphabet are used as state labels as well. |
... |
options passed to the seqformat function for handling input data that is not in STS format. |
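As a rough illustration of the presentation-related arguments above (id, cnames, cpal, labels), the sketch below sets them on the actcal example data and reads them back. The "resp" row names and the four color names are arbitrary values chosen for this example, not package defaults; cpal() and stlab() are the TraMineR accessor functions assumed here for retrieving the palette and the state labels.

```r
library(TraMineR)
data(actcal)

## Hypothetical row identifiers, one per sequence (illustration only)
ids <- paste0("resp", seq_len(nrow(actcal)))

## Arbitrary palette: one color per state of the alphabet (A, B, C, D)
my.pal <- c("darkgreen", "orange", "gold", "grey60")

s <- seqdef(actcal, 13:24,
            id = ids,
            cnames = month.abb,
            cpal = my.pal,
            labels = c("> 37 hours", "19-36 hours", "1-18 hours", "no work"))

## Reading the attributes back from the sequence object
rownames(s)[1:3]
colnames(s)
alphabet(s)
cpal(s)
stlab(s)
```

The accessor calls at the end are only there to check what was stored; the same attributes are what the plotting functions use for axis labels, colors and legends.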
Applying subscripts to sequence objects (e.g. seq[,1:5] or seq[1:10,]) returns a state sequence object with some attributes preserved (alphabet, missing) and some others (start, column names) adapted to the selected column or row subset. If only one column is specified, a factor is returned.
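The subscripting behavior described above can be checked with a short sketch on the actcal example data. The object names sub and first are chosen for this illustration only.

```r
library(TraMineR)
data(actcal)
actcal.seq <- seqdef(actcal, 13:24)

## Row and column subset: still a state sequence object
sub <- actcal.seq[1:10, 1:5]
class(sub)
dim(sub)

## The alphabet is carried over from the full object
alphabet(sub)

## Selecting a single column yields a factor, not a sequence object
first <- actcal.seq[, 1]
class(first)
```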
An object of class stslist. There are print, plot and summary methods for such objects. State sequence objects are required as argument to other functions such as plotting functions (seqdplot, seqiplot or seqfplot), functions to compute distances (seqdist), etc.
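A minimal sketch of how the returned object is typically used: it is checked for the stslist class, printed and summarized, then passed to seqdist. The choice of the "LCS" method and of a 50-sequence subset is an assumption made to keep the example small, not a recommendation.

```r
library(TraMineR)
data(actcal)
actcal.seq <- seqdef(actcal, 13:24)

## The object returned by seqdef is of class stslist
inherits(actcal.seq, "stslist")

## Work on a small subset to keep the distance matrix small
small <- actcal.seq[1:50, ]
print(small[1:5, ])
summary(small)

## Pairwise distances between the 50 sequences (LCS chosen for illustration)
d <- seqdist(small, method = "LCS")
dim(d)
```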
Gabadinho, A., G. Ritschard, M. Studer and N. S. Müller (2008). Mining Sequence Data in R with TraMineR: A user's guide. Department of Econometrics and Laboratory of Demography, University of Geneva.
plot.stslist to plot state sequence objects, seqplot for high level plots of state sequence objects, seqecreate to create an event sequence object, seqformat for options to handle several longitudinal data formats.
## Creating a sequence object with the columns 13 to 24
## in the 'actcal' example data set
data(actcal)
actcal.seq <- seqdef(actcal, 13:24,
	labels=c("> 37 hours", "19-36 hours", "1-18 hours", "no work"))

## Displaying the first 10 rows of the sequence object
actcal.seq[1:10,]

## Displaying the first 10 rows of the sequence object
## in SPS format
print(actcal.seq[1:10,], format="SPS")

## Plotting the first 10 sequences
plot(actcal.seq)

## Re-ordering the alphabet
actcal.seq <- seqdef(actcal, 13:24, alphabet=c("B","A","D","C"))
alphabet(actcal.seq)

## Adding a state not appearing in the data to the
## alphabet
actcal.seq <- seqdef(actcal, 13:24, alphabet=c("A","B","C","D","E"))
alphabet(actcal.seq)

## Adding a state not appearing in the data to the
## alphabet and changing the state labels
actcal.seq <- seqdef(actcal, 13:24,
	alphabet=c("A","B","C","D","E"),
	states=c("FT","PT","LT","NO","TR"))
alphabet(actcal.seq)
actcal.seq[1:10,]

## ===========================
## Example with missing values
## ===========================
data(ex1)

## With right="DEL" default value
seqdef(ex1, 1:13)

## Eliminating 'left' missing values
seqdef(ex1, 1:13, left="DEL")

## Eliminating 'left' missing values and gaps
seqdef(ex1, 1:13, left="DEL", gaps="DEL")

## ====================
## Example with weights
## ====================
ex1.seq <- seqdef(ex1, 1:13, weights=ex1$weights)

## weighted sequence frequencies
seqtab(ex1.seq)
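The automatic detection of the compressed format (var argument) and the use of stsep for a non-standard separator can be tried with the following sketch. It assumes that the one-column character matrix returned by seqconc is accepted directly as input to seqdef; the "/" separator is an arbitrary choice for illustration.

```r
library(TraMineR)
data(actcal)

## Reference object built from the extended format (one column per position)
actcal.seq <- seqdef(actcal, 13:24)

## Compressed format with the "-" separator: detected automatically
actcal.comp <- seqconc(actcal, 13:24)
head(actcal.comp, 3)
comp.seq <- seqdef(actcal.comp)

## Compressed format with another separator: must be given through stsep
actcal.slash <- seqconc(actcal, 13:24, sep = "/")
slash.seq <- seqdef(actcal.slash, stsep = "/")

## Both objects should have the same dimensions and alphabet as the reference
dim(comp.seq)
dim(slash.seq)
alphabet(slash.seq)
```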