traceWminim {nsRFA}    R Documentation
Formation of disjoint regions for Regional Frequency Analysis.
traceWminim(X, centers)
sumtraceW(clusters, X)
nearest(clusters, X)
X: a numeric matrix of characteristics, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns)
centers: the number of clusters
clusters: a numeric vector giving the cluster to which each element (row) of X belongs
The Euclidean distance is used. Given p different classification variables, the distance between two elements i and j is:
$$d_{ij} = \sqrt{\frac{1}{p} \sum_{h=1}^{p} (x_{hi} - x_{hj})^2}$$
where $x_{hi}$ is the value of the h-th variable of the i-th element.
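The distance above is the ordinary Euclidean distance divided by $\sqrt{p}$. The sketch below computes it for one pair of rows and for all pairs at once; `d_ij` and `d_all` are hypothetical helper names, not functions of nsRFA.

```r
# Scaled Euclidean distance d_ij between rows i and j of X:
# square root of the mean squared difference over the p variables.
d_ij <- function(X, i, j) {
  X <- as.matrix(X)
  sqrt(mean((X[i, ] - X[j, ])^2))
}

# All pairwise distances at once: stats::dist() returns the plain
# Euclidean distance, so dividing by sqrt(p) applies the 1/p factor.
d_all <- function(X) {
  X <- as.matrix(X)
  dist(X, method = "euclidean") / sqrt(ncol(X))
}

X <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)
d_ij(X, 1, 2)            # sqrt((3^2 + 4^2) / 2) = sqrt(12.5)
as.matrix(d_all(X))      # full 3 x 3 distance matrix
```

Dividing by $\sqrt{p}$ rescales all distances by the same constant, so it does not change which elements are closest to each other.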
The function traceWminim combines a hierarchical algorithm, that of Ward (1963), with an optimisation procedure that minimises:
$$W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \delta_{ij}^2$$
where k is the number of clusters (obtained initially with Ward's algorithm), $n_i$ is the number of sites in the i-th cluster and $\delta_{ij}$ is the Euclidean distance between the j-th element of the i-th cluster and the center of mass of the i-th cluster.
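A minimal sketch of the quantity W for a given partition follows. It is not the nsRFA source of sumtraceW: `within_W` is a hypothetical name, and it assumes $\delta_{ij}$ is the plain (unscaled) Euclidean distance. Using the $1/p$-scaled distance would multiply W by the constant $1/p$, which does not change which partition minimises it.

```r
# Hypothetical helper (not the nsRFA code): within-cluster sum of squared
# Euclidean distances from each element to its cluster's center of mass.
within_W <- function(clusters, X) {
  X <- as.matrix(X)
  W <- 0
  for (g in unique(clusters)) {
    Xg <- X[clusters == g, , drop = FALSE]
    centre <- colMeans(Xg)                 # center of mass of cluster g
    W <- W + sum(sweep(Xg, 2, centre)^2)   # adds delta_gj^2 for every j in g
  }
  W
}

X <- matrix(c(0, 0,
              2, 0,
              10, 10,
              10, 12), ncol = 2, byrow = TRUE)
within_W(c(1, 1, 2, 2), X)   # 2 + 2 = 4
within_W(c(1, 2, 1, 2), X)   # a poorer partition gives a larger W
```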
W is computed by sumtraceW. The optimisation moves a site from one cluster to another whenever the move decreases W.
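The two stages can be illustrated with the sketch below. It is an assumption-laden illustration, not the nsRFA implementation: the initial partition uses `stats::hclust(method = "ward.D2")` cut with `cutree`, a site is moved only if the move strictly lowers W and leaves its old cluster non-empty, and passes over all sites repeat until no move helps. `ward_then_reallocate` and `within_W` are hypothetical names.

```r
# Within-cluster sum of squared distances to the cluster centers of mass.
within_W <- function(clusters, X) {
  W <- 0
  for (g in unique(clusters)) {
    Xg <- X[clusters == g, , drop = FALSE]
    W <- W + sum(sweep(Xg, 2, colMeans(Xg))^2)
  }
  W
}

# Hypothetical two-stage procedure: Ward start, then single-site moves.
ward_then_reallocate <- function(X, k, max_pass = 100) {
  X <- as.matrix(X)
  hc <- hclust(dist(X), method = "ward.D2")  # assumption: Ward variant used
  clusters <- cutree(hc, k = k)              # initial partition into k groups
  W <- within_W(clusters, X)
  for (pass in seq_len(max_pass)) {
    improved <- FALSE
    for (s in seq_len(nrow(X))) {
      from <- clusters[s]
      if (sum(clusters == from) == 1) next   # never empty a cluster
      for (to in setdiff(seq_len(k), from)) {
        trial <- clusters
        trial[s] <- to
        W_trial <- within_W(trial, X)
        if (W_trial < W - 1e-12) {           # accept only strict decreases
          clusters <- trial
          W <- W_trial
          improved <- TRUE
          break
        }
      }
    }
    if (!improved) break                     # no move lowers W: stop
  }
  list(clusters = unname(clusters), W = W)
}

set.seed(1)                                  # reproducible toy data
X <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 10), ncol = 2))
res <- ward_then_reallocate(X, k = 2)
res$clusters
```

Because every accepted move strictly lowers W and there are finitely many partitions, the loop always terminates; `max_pass` is only a safety cap.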
traceWminim returns a vector defining the subdivision of the elements characterised by X into n = centers clusters.
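Once the cluster vector has names (the examples below assign site codes with `names(clusters) <-`), the sites in each cluster can be listed directly. The vector here is a made-up stand-in for a traceWminim result, used only for illustration.

```r
# Hypothetical cluster vector for six sites named A to F
clusters <- c(A = 1, B = 2, C = 1, D = 3, E = 2, F = 1)

# Site names grouped by cluster number
members <- split(names(clusters), clusters)
members
```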
sumtraceW returns W (it is used by traceWminim).
nearest returns the sites nearest to the centers of mass of the clusters (it is used by traceWminim).
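The sketch below shows one way to find, for each cluster, the site closest to its center of mass. `nearest_site` is a hypothetical helper, not the nsRFA function; it assumes the search runs over all rows of X and returns one row index per cluster.

```r
# Hypothetical helper: for each cluster, the index of the row of X that is
# closest (squared Euclidean distance) to that cluster's center of mass.
nearest_site <- function(clusters, X) {
  X <- as.matrix(X)
  groups <- sort(unique(clusters))
  idx <- sapply(groups, function(g) {
    centre <- colMeans(X[clusters == g, , drop = FALSE])
    d2 <- rowSums(sweep(X, 2, centre)^2)   # squared distance of every row
    which.min(d2)
  })
  names(idx) <- groups
  idx
}

X <- matrix(c(0, 0,
              1, 0,
              2, 0,
              10, 0,
              12, 0,
              20, 0), ncol = 2, byrow = TRUE)
nearest_site(c(1, 1, 1, 2, 2, 2), X)   # centers (1, 0) and (14, 0): rows 2 and 5
```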
Alberto Viglione, e-mail: alviglio@tiscali.it.
Everitt, B. (1974) Cluster Analysis. Social Science Research Council. Halsted Press, New York.
Hosking, J.R.M. and Wallis, J.R. (1997) Regional Frequency Analysis: an approach based on L-moments, Cambridge University Press, Cambridge, UK.
Viglione, A., Claps, P. and Laio, F. (2006) Utilizzo di criteri di prossimità nell'analisi regionale del deflusso annuo [Use of proximity criteria in the regional analysis of annual runoff], XXX Convegno di Idraulica e Costruzioni Idrauliche (IDRA 2006), Rome, 10-15 September 2006.
Viglione, A. (2007) Metodi statistici non-supervised per la stima di grandezze idrologiche in siti non strumentati [Unsupervised statistical methods for estimating hydrological variables at ungauged sites], PhD thesis, in press.
Ward, J. (1963) Hierarchical grouping to optimize an objective function, Journal of the American Statistical Association, 58, pp. 236-244.
library(nsRFA)
data(hydroSIMN)
parameters
summary(parameters)

# traceWminim
param <- parameters[c("Hm","Ybar")]
n <- dim(param)[1]; k <- dim(param)[2]
# standardise each column (mean() and sd() no longer accept a data frame in current R)
param.norm <- (param - matrix(colMeans(param), nrow=n, ncol=k, byrow=TRUE)) /
  matrix(apply(param, 2, sd), nrow=n, ncol=k, byrow=TRUE)
clusters <- traceWminim(param.norm, 4)
names(clusters) <- parameters["cod"][,]
clusters

annualflows
summary(annualflows)
x <- annualflows["dato"][,]
cod <- annualflows["cod"][,]

# annual flows of the sites in cluster 1
fac <- factor(annualflows["cod"][,], levels=names(clusters[clusters==1]))
x1 <- annualflows[!is.na(fac),"dato"]
cod1 <- annualflows[!is.na(fac),"cod"]
#HW.tests(x1,cod1)   # it takes some time

# annual flows of the sites in cluster 3
fac <- factor(annualflows["cod"][,], levels=names(clusters[clusters==3]))
x3 <- annualflows[!is.na(fac),"dato"]
cod3 <- annualflows[!is.na(fac),"cod"]
#HW.tests(x3,cod3)   # it takes some time