lago {lago}R Documentation

The LAGO Suite: An Introductory Overview

Description

LAGO is a tool for finding rare targets in a database.

Details

To find rare targets in a database, LAGO scores every item in the database with a specialized radial basis function network (RBFnet), trained with some learning data. Items with higher scores are deemed by LAGO to be more likely than others to be from the rare class. The RBFnet constructed by LAGO has exactly the same form as the support vector machine (SVM), but it is constructed without the use of any iterative optimization procedure at all, whereas the SVM must solve a quadratic programming problem. Hence, LAGO is a very efficient kernel machine.

A snapshot of the theoretical justification for why iterative optimization can be avoided is as follows: Suppose p1 is the density function of the rare class and p0, the density function of the background class. The RBFnet constructed by LAGO is an adaptive-bandwidth kernel density estimator of p1 adjusted locally by a factor that approximates p0 to the first-order. The resulting ranking function f(x) is thus approximately a monotonic transformation of the posterior probability that item x belongs to the rare class.

The original LAGO uses elliptical radial basis functions which can adapt more flexibly to the training data. A simpler (but perhaps more generally useful) variation that uses spherical radial basis functions is also included in this package. To differentiate the two, the original LAGO is now referred to as eLAGO (e.g., eLago, rank.eLago) and the spherical variation, sLAGO (e.g., sLago, rank.sLago). We now recommended that you use sLAGO instead of eLAGO for your problem.

Like most nonparametric models such as SVM and CART, LAGO need to be fine tuned by a number of control (or regularization) parameters. The two control parameters for LAGO are: K and alpha; see help pages for eLago, sLago and tuneLago for more details. It is important that one select these parameters carefully (e.g., by cross-validation) in order for LAGO to produce reasonable results. Between the two parameters, alpha is much more important than K. Often, it suffices to fix K=5 and tune alpha alone.

Author(s)

Alexandra Laflamme-Sanders and Mu Zhu, University of Waterloo, Canada. Alexandra is the principal C programmer for the package.

References

Zhu M, Su W, Chipman HA (2006). LAGO: A computationally efficient approach for statistical detection. Technometrics, 48(2), 193 – 205.

Zhu M (2008). Kernels and ensembles: Perspectives on statistical learning. The American Statistician, 62(2), 97 – 109. [The relevant section is Section 4.2.]

Examples

## generate some training data
stuff<-generateLAGOexample(n=500,show=TRUE)
X<-stuff$X
y<-stuff$y

## generate some independent test data 
stuff<-generateLAGOexample(n=500,show=TRUE)
Xnew<-stuff$X
ynew<-stuff$y

## sLAGO example
my.lago<-sLago(y, X, kernel="g") 
lago.score<-rank.sLago(my.lago, Xnew)
par(mfrow=c(2,2))
view.sLago(my.lago,X) -> stuff
show.test(stuff, Xnew, ynew, score=lago.score)
avgp(ynew, lago.score)
topn(ynew, lago.score, n=20)

## eLAGO example
my.lago<-eLago(y, X, kernel="g") 
lago.score<-rank.eLago(my.lago, Xnew)
view.eLago(my.lago,X) -> stuff
show.test(stuff, Xnew, ynew, score=lago.score)
avgp(ynew, lago.score)
topn(ynew, lago.score, n=20)

## use the test data above as validation data to tune the control parameters
aGrid<-c(0.05,0.1,0.5,1,2,5,10)
Kgrid<-c(2,5,10,20,30)
tuneLago(y,X,ynew,Xnew,aGrid,Kgrid,kernel="g",version='s')
tuneLago(y,X,ynew,Xnew,aGrid,Kgrid,kernel="g",version='e')

[Package lago version 0.1-1 Index]