LCS {qualV}R Documentation

Algorithm for the Longest Common Subsequence Problem

Description

Determines the longest common subsequence of two strings.

Usage

LCS(a, b)

Arguments

a vector (numeric or character), missing values are not accepted
b vector (numeric or character), missing values are not accepted

Details

A longest common subsequence (LCS) is a common subsequence of two strings of maximum length. The LCS Problem consists of finding a LCS of two given strings and its length (LLCS). The QSI is computed by division of the LLCS over maximum length of 'a' and 'b'.

Value

a vector 'a'
b vector 'b'
LLCS length of LCS
LCS longest common subsequence
QSI quality similarity index
va one possible LCS of vector 'a'
vb one possible LCS of vector 'b'

Note

Basing on the most prominent but simple calculation scheme this algorithm is not very efficient with respect to its time and memory requirements.

References

Wagner, R. A. and Fischer, M. J. (1974) The String-to-String Correction Problem. Journal of the ACM, 21, 168-173.

Paterson, M. and Dancík, V. (1994) Longest Common Subsequences. Mathematical Foundations of Computer Science, 841, 127-142.

Gusfield, D. (1997) Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, England, ISBN 0-521-58519-8.

Examples

# direct use
a <- c("b", "c", "a", "b", "c", "b")
b <- c("a", "b", "c", "c", "b")
LCS(a, b)

# a constructed example
x <- seq(0, 2 * pi, 0.1)  # time
y <- 5 + sin(x)           # a process
o <- y + rnorm(x, sd=0.2) # observation with random error
p <- y + 0.1              # simulation with systematic bias
plot(x, o); lines(x, p)

lcs <- LCS(f.slope(x, o), f.slope(x, p))  # too much noise
lcs$LLCS
lcs$QSI

os <- ksmooth(x, o, kernel = "normal", bandwidth = dpill(x, o), x.points = x)$y
lcs <- LCS(f.slope(x, os), f.slope(x, p))
lcs$LLCS
lcs$QSI

# observed and measured data with non-matching time intervals
data(phyto)
bbobs    <- dpill(obs$t, obs$y)
n        <- tail(obs$t, n = 1) - obs$t[1] + 1
obsdpill <- ksmooth(obs$t, obs$y, kernel = "normal", bandwidth = bbobs,
                    n.points = n)
obss     <- data.frame(t = obsdpill$x, y = obsdpill$y)
obss     <- obss[match(sim$t, obss$t),]
obs_f1   <- f.slope(obss$t, obss$y)
sim_f1   <- f.slope(sim$t, sim$y)
lcs      <- LCS(obs_f1, sim_f1)
lcs$QSI

[Package qualV version 0.2-4 Index]