kml-package {kml} | R Documentation |
KmL is a new implementation of k-means for longitudinal data (or trajectories).
Here is an overview of the package. For the description of the algorithm, see kml.
Package: | kml |
Type: | Package |
Version: | 0.9.1 |
Date: | 2009-01-01 |
License: | GPL (>= 2) |
Lazyload: | yes |
Depends: | methods,clv |
URL: | http://www.r-project.org |
URL: | http://christophe.genolini.free.fr/kml |
To cluster data, KmL goes through three steps, each of which is associated with some functions: data preparation, building the clusterization, and exporting the results.
KmL works on objects of class ClusterizLongData (abbreviated cld).
Data preparation therefore simply consists in transforming data into a ClusterizLongData object.
This can be done with the function clusterizLongData (cld for short) or as.clusterizLongData (as.cld for short).
The former lets the user build data from scratch; the latter converts a data.frame into a ClusterizLongData object.
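As an illustration of the kind of input involved, here is a minimal base-R sketch of longitudinal data stored in a data.frame, with one row per individual and one column per measurement time. The wide layout, the column names (id, t0 to t4) and the random values are assumptions made for the example; they are not the documented input format of as.clusterizLongData.

```r
set.seed(1)

# Hypothetical design: 6 individuals measured at 5 time points.
times <- 0:4
n_ind <- 6

# One row per individual, one column per time point (random toy values).
values <- matrix(rnorm(n_ind * length(times)), nrow = n_ind)
colnames(values) <- paste0("t", times)

# Wide data.frame: an identifier column followed by the trajectory values.
traj <- data.frame(id = paste0("i", seq_len(n_ind)), values)
```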
Instead of working on real data, one can also work on artificial data. Such data can be created with generateArtificialLongData (gald for short). The resulting data will be of class ArtificialLongData, which is a subclass of ClusterizLongData.
Once an object of class ClusterizLongData has been created, the algorithm kml can be run.
Starting with a ClusterizLongData, kml builds a Clusterization.
An object of class Clusterization is a partition of trajectories into subgroups that also contains some information, such as the percentage of trajectories contained in each group or the Calinski & Harabasz criterion.
kml is a "hill-climbing" algorithm. This kind of algorithm always converges towards a maximum, but one cannot know whether that maximum is local or global. It offers no guarantee of optimality.
To improve the chances of getting a good Clusterization, it is better to run the hill-climbing algorithm several times and then choose the best solution. By default, kml runs the hill-climbing algorithm 20 times and chooses the Clusterization maximising the determinant of the between matrix.
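The restart strategy can be sketched with base R's kmeans, which is not the kml implementation. The toy data are hypothetical, and the selection rule used here (largest between-cluster sum of squares) is a stand-in chosen for simplicity, not the determinant criterion described above.

```r
set.seed(42)

# Hypothetical toy data: 40 trajectories over 5 time points, two groups.
traj <- rbind(
  matrix(rnorm(20 * 5, mean = 0), nrow = 20),
  matrix(rnorm(20 * 5, mean = 4), nrow = 20)
)

# Run k-means 20 times, each from a single random start.
n_runs <- 20
runs <- lapply(seq_len(n_runs), function(i) {
  kmeans(traj, centers = 2, nstart = 1)
})

# Keep the run with the largest between-cluster sum of squares
# (assumed selection rule, substituted for the determinant criterion).
between <- vapply(runs, function(r) r$betweenss, numeric(1))
best <- runs[[which.max(between)]]
```

Each run may stop at a different local optimum; comparing the runs on a common criterion is what makes the final choice less dependent on any single starting point.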
Likewise, the optimal number of clusters cannot be known beforehand. It is possible, however, to compute criteria afterwards that help to choose it. kml uses the Calinski & Harabasz criterion.
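For reference, the Calinski & Harabasz criterion is usually defined as (B / (k - 1)) / (W / (n - k)), where B is the between-cluster sum of squares, W the within-cluster sum of squares, k the number of clusters and n the number of trajectories. The base-R function below is a sketch of that standard definition; the function name calinski_harabasz is hypothetical, and this page does not say whether kml computes the criterion in exactly this way.

```r
# Standard Calinski & Harabasz index for a partition `cluster` of the rows of `x`.
# Hypothetical helper written for illustration; not part of the kml package.
calinski_harabasz <- function(x, cluster) {
  x <- as.matrix(x)
  n <- nrow(x)
  k <- length(unique(cluster))

  # Total sum of squares around the grand mean.
  total_ss <- sum(sweep(x, 2, colMeans(x))^2)

  # Within-cluster sum of squares around each cluster mean.
  within_ss <- sum(vapply(split(seq_len(n), cluster), function(idx) {
    xi <- x[idx, , drop = FALSE]
    sum(sweep(xi, 2, colMeans(xi))^2)
  }, numeric(1)))

  between_ss <- total_ss - within_ss
  (between_ss / (k - 1)) / (within_ss / (n - k))
}

# Small worked case: values 0, 2 in one cluster and 10, 12 in the other.
# B = 100, W = 4, k = 2, n = 4, so the index is (100 / 1) / (4 / 2) = 50.
calinski_harabasz(matrix(c(0, 2, 10, 12), ncol = 1), c(1, 1, 2, 2))
```

Higher values indicate clusters that are well separated relative to their internal spread.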
In the end, kml tests by default 2, 3, 4, 5 and 6 clusters, 20 times each.
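The default search described above (2 to 6 clusters, 20 runs each) can be imitated in base R as follows. This is a sketch under stated assumptions: it uses stats::kmeans rather than kml, the toy data are hypothetical, and kmeans with nstart = 20 keeps the run with the lowest total within-cluster sum of squares, which is not the selection rule kml is described as using.

```r
set.seed(7)

# Hypothetical toy data: three groups of 15 trajectories over 5 time points.
traj <- rbind(
  matrix(rnorm(15 * 5, mean = 0), nrow = 15),
  matrix(rnorm(15 * 5, mean = 5), nrow = 15),
  matrix(rnorm(15 * 5, mean = 10), nrow = 15)
)
n <- nrow(traj)

# For each candidate number of clusters, let kmeans try 20 random starts.
ks <- 2:6
fits <- lapply(ks, function(k) kmeans(traj, centers = k, nstart = 20))

# Calinski & Harabasz index of each fit, from the sums of squares kmeans returns.
ch <- vapply(seq_along(ks), function(i) {
  f <- fits[[i]]
  (f$betweenss / (ks[i] - 1)) / (f$tot.withinss / (n - ks[i]))
}, numeric(1))

# Number of clusters with the highest index.
best_k <- ks[which.max(ch)]
```

Inspecting the whole vector ch, rather than only best_k, shows how clear-cut the choice is.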
When kml has constructed some Clusterization objects, the user can examine them one by one and choose to export some. This can be done with the function choice. choice opens a graphics window showing various information, including the trajectories clustered by a specific Clusterization.
Once one or more Clusterization objects have been selected (the user can select more than one), they can be saved. The clusters are then exported to the file nom-cluster.csv. Criteria are exported to nom-criteres.csv. The graphs are exported according to their extension.
Christophe Genolini
PSIGIAM: Paris Sud Innovation Group in Adolescent Mental Health
INSERM U669 / Maison de Solenn / Paris
Contact author : <genolini@u-paris10.fr>
Raphaël Ricaud
Laboratoire "Sport & Culture" / "Sports & Culture" Laboratory
University of Paris 10 / Nanterre
Article submitted
Web site: http://christophe.genolini.free.fr/kml
Overview: kml-package
Classes: ClusterizLongData, Clusterization, ArtificialLongData
Methods: clusterizLongData, kml, generateArtificialLongData, choice, as.clusterizLongData
Plot: plot: overview, plot(ClusterizLongData), plot(Calinski), plotSubGroups(ClusterizLongData), plotAll(ClusterizLongData)
### 1. Data Preparation
myCld <- as.clusterizLongData(generateArtificialLongData())

### 2. Building "optimal" clusterization (with only 3 redrawings)
kml(myCld, nbRedrawing=3, printCal=TRUE, printTraj=TRUE)

### 3. Exporting results
try(choice(myCld))