summary.CountingProcessSamplePath {STAR} | R Documentation |
These functions / methods are designed to test a
CountingProcessSamplePath
object against a uniform Poisson
process with rate 1.
## S3 method for class 'CountingProcessSamplePath': summary(object, exact = TRUE, lag.max = NULL, d = max(c(2, sqrt(length(object$ppspFct()))%/%5)), ...) ## S3 method for class 'CountingProcessSamplePath.summary': print(x, digits = 5, ...) ## S3 method for class 'CountingProcessSamplePath.summary': plot(x, y, which = c(1,2,6,8), main, caption = c(expression(paste("Uniform on ", Lambda," Test")), "Berman's Test", "Log Survivor Function", expression(paste(U[k+1]," vs ", U[k])), "Variance vs Mean Test", "Wiener Process Test", "Autocorrelation Fct.", "Renewal Test"), ask = FALSE, lag.max = NULL, d = max(c(2, sqrt(length(eval(x$call[[2]])$ppspFct()))%/%5)), ...)
object |
A CountingProcessSamplePath object. |
exact |
Should an exact Kolmogorov test be used? See ks.test . |
lag.max |
See renewalTestPlot . |
d |
See renewalTestPlot . |
x |
A CountingProcessSamplePath.summary object. |
digits |
An integer , the number of digits to be used while
printing summaries. See round . |
y |
Not used but required for compatibility with the
plot method. |
which |
If a subset of the test plots is required, specify a subset of
the numbers 1:6 . |
main |
Title to appear above the plots, if missing the
corresponding element of caption will be used. |
caption |
Default caption to appear above the plots or, if
main is given, bellow it |
ask |
A logical ; if TRUE , the user is asked to hit
the return key before
each plot generation, see par(ask=.) . |
... |
Passed to chisq.test used internally by
summary , not used in plot and print . |
If the CountingProcessSamplePath
object x
is a the
realization of a homogeneous Poisson process then, conditioned on the number of
events observed, the location of the events (jumps in N(t)) is uniform on the
period of observation. This is a basic property of
the homogeneous Poisson process derived in Chap. 2 of Cox and Lewis
(1966) and Daley and Vere-Jones (2003). Component UniformGivenN
of a CountingProcessSamplePath.summary
list contains the p.value of
the Kolmogorov test of this uniform null hypothesis. The first graph
generated by the plot
method displays the Kolgorov test
graphically (i.e., the empirical cumulative distribution
function and the null hyptohesis (the diagonal). The two
dotted lines on both sides of the diagonal correspond to 95 and
99% (asymptotic) confidence intervals. This is the graph shown on Fig. 9 (p 19) of
Ogata (1988). Notice that the summary
method allows you to compute the exact p.value.
If we write x[i] the jump locations of the
CountingProcessSamplePath
object x
and if the latter is
the realization of a homogeneous Poisson process then the intervals:
y[i]=x[i+1]-x[i]
are realizations of iid rvs from an exponential distribution with rate 1 and the:
u[i]=1 - exp(-y[i])
are realizations of iid rvs from a uniform distribution on [0,1). The second graph
generated by the plot
method tests this uniform distribution hypotheses with
a Kolmogorov Test. This is the graph shown on Fig. 10 (p 19) of
Ogata (1988) which was suggested by Berman. This is also the one of
the graphs proposed by Brown et al (2002) (the other one is a Q-Q plot for the
same quantities). The two dotted lines on both sides of the diagonal correspond to 95 and
99% (asymptotic) confidence intervals. Component BermanTest
of
a CountingProcessSamplePath.summary
list contains the p.value of
the Kolmogorov test of this uniform null hypothesis.
Following the line of the previous paragraph, if the distribution of
the y[i] is an exponential distribution with rate 1, then
their survivor function is: exp(-y). This is what's
shown on the third graph generated by the plot
method, using a log scale for
the ordinate. The point wise CI at 95 and 99% are also drawn (dotted
lines). This is the graph shown on Fig. 12 (p 20) of
Ogata (1988)
If the u[i] of the second paragraph are realizations of iid uniform rvs on
[0,1) then a plot of u[i+1] vs u[i] should
fill uniformly the unit square [0,1) x [0,1). This is the fourth
generated graph (the one shown on Fig. 11 (p 20) of
Ogata (1988)) by the plot
method while the seventh graph shows
the autocorrelation function of the u[i]s. Component RenewalTest
of a
CountingProcessSamplePath.summary
list contains a slightly more
elaborated (and quantitative) version of this test summarizing the
fourth graph (bottom right) generated by a call to
renewalTestPlot
. This list element is itself a list with
elements: chi2.95
(a logical
), chi2.99
(a
logical
) and total
(an integer
). The bounds resulting from repetitively
testing a sequence of what are, under the null hypothesis, iid
chi2 rvs are obtained using Donsker's Theorem
(see bellow). For each lag the number of degrees of freedom of the
chi2 distribution is subtracted from each
chi2 value. These centered values are then divided by
their sd (assuming the null hypothesis is correct). The cumulative sum
of the centered and scaled sequence is formed and is divided by the
square root of the maximal lag used. This is "plugged-in" the
Donsker's Theorem. The eighth graph of the plot
method displays
the resulting Wiener process. With the tight confidence regions of
Kendall et al (2007), see bellow.
If the x[i] are realization of a homogeneous Poisson
process observed between 0 and T, then
the number of events observed on non-overlapping windows of length t
should be iid Poisson rv with mean t (and variance t). The observation
period is therefore chopped into non-overlapping windows of increasing length
and the empirical variance of the event count is graphed versus the
empirical mean, together with 95 and 99% CI (using a normal
approximation). This is done by calling internally
varianceTime
. That's what's generated by the fifth graph
of the plot
method. This is the graph shown on Fig. 13 (p 20) of
Ogata (1988). Component varianceTimeSummary
of a
CountingProcessSamplePath.summary
list contains a summary of
this test, counting the number of events out of each band.
The last graph generated by the plot
method and the companions
components, Wiener95
and Wiener99
, of a
CountingProcessSamplePath.summary
list represent "new" tests
(as far as I know). They are based on the fact that if the
y[i] above are realizations of iid rvs following an exponential distribution
with rate 1, then the w[i]=y[i]-1 are realizations of
iid rvs with mean 0
and variance 1. We can then form the partial sums:
S[n]=w[1]+...+w[n]
and define the random right continuous with a left-hand limit functions on [0,1]:
S[floor(n*t)]/sqrt(n)
These functions are realizations of a process which converges (weakly)
to a Wiener process on [0,1]. The proof of this statement is a corollary of Donsker's Theorem
and can be found on pp 146-147, Theorem 14.1, of Billingsley (1999). I
thank Vilmos Prokaj for pointing this reference to me.What is then
done is testing if the putative Wiener process is entirely within the
tight boundaries defined by Kendall et al (2007) for a true Wiener
process, see crossTight
.
summary.CountingProcessSamplePath
returns a
CountingProcessSamplePath.summary
object which is a list
with the following components:
UniformGivenN |
A numeric , the p.value of the Kolmogorov
test of uniformity of the events times given the number of events. |
Wiener95 |
A logical : is the scaled martingale within the
tight 95% confidence band? |
Wiener99 |
A logical : is the scaled martingale within the
tight 99% confidence band? |
BermanTest |
A numeric , the p.value of the Kolmogorov
test of uniformity of the scaled inter events intervals. |
RenewalTest |
A list with components:
chi2.95 , chi2.99 and total . chi2.95
resp. chi2.99 is a logical and is TRUE if the
Wiener process obtained as described above is within the "tight" 95% resp. 99% confidence band of Kendall et al (2007). total gives the total number of
lags. See renewalTestPlot . |
varianceTime |
A varianceTime object. |
varianceTimeSummary |
A numeric vector with components:
total , out95 and out99 . total gives the total number of
window sizes explored. out95 gives the number of windows within
which the variance is out of the 95% confidence band. out99
gives the number of windows within
which the variance is out of the 99% confidence band. See varianceTime . |
n |
An integer , the number of events. |
call |
The matched call. |
I thank Vilmos Prokaj for pointing out Donsker's Theorem and for indicating me the proof's location (Patrick Billingsley's book).
I also thank Olivier Faugeras and Jonathan Touboul for pointing out Donsker's therom to me.
If you wan these tests to be meaningful do not apply them to the data you just used to fit your conditional intensity model.
These functions / methods are designed to replace the
summary.transformedTrain
and plot.transformedTrain
ones. The former have a more general design.
Of course to be fully usable, these functions must be coupled to
functions allowing users to fit conditional intensity models.The
support for that in STAR
is not complete yet but is coming
soon. See for now the example bellow.
The end of the example bellow (not ran by default) shows that the coverage probability of the Wiener Process confidence bands are really good even for small (50) sample sizes.
Christophe Pouzat christophe.pouzat@gmail.com
Patrick Billingsley (1999) Convergence of Probability Measures. Wiley - Interscience.
Brillinger, D. R. (1988) Maximum likelihood analysis of spike trains of interacting nerve cells. Biol. Cybern. 59: 189–200.
Brown, E. N., Barbieri, R., Ventura, V., Kass, R. E. and Frank, L. M. (2002) The time-rescaling theorem and its application to neural spike train data analysis. Neural Computation 14: 325-346.
D. R. Cox and P. A. W. Lewis (1966) The Statistical Analysis of Series of Events. John Wiley and Sons.
Daley, D. J. and Vere-Jones D. (2003) An Introduction to the Theory of Point Processes. Vol. 1. Springer.
Ogata, Yosihiko (1988) Statistical Models for Earthquake Occurrences and Residual Analysis for Point Processes. Journal of the American Statistical Association 83: 9-27.
Johnson, D.H. (1996) Point process models of single-neuron discharges. J. Computational Neuroscience 3: 275–299.
W. S. Kendall, J.- M. Marin and C. P. Robert (2007) Brownian Confidence Bands on Monte Carlo Output. Statistics and Computing 17: 1–10. Preprint available at: http://www.ceremade.dauphine.fr/%7Exian/kmr04.rev.pdf
mkCPSP
,
as.CPSP
,
plot.CountingProcessSamplePath
,
print.CountingProcessSamplePath
,
varianceTime
,
crossTight
,
renewalTestPlot
,
ks.test
## load one spike train data set of STAR data(e060824spont) ## Create the CountingProcessSamplePath object n1spt.cp <- as.CPSP(e060824spont[["neuron 1"]]) ## print it n1spt.cp ## plot it plot(n1spt.cp) ## get the summary ## Notice the warning due to few identical interspike intervals ## leading to an inaccurate Berman's test. summary(n1spt.cp) ## Simulate data corresponding to a renewal process with ## an inverse Gaussian ISI distribution in the spontaneous ## regime modulated by a multiplicative stimulus whose time ## course is a shifted and scaled chi2 density. ## Define the "stimulus" function stimulus <- function(t, df=5, tonset=5, timeFactor=5, peakFactor=10) { dchisq((t-tonset)*timeFactor,df=df)*peakFactor } ## Define the conditional intensity / hazard function hFct <- function(t, tlast, df=5, tonset=5, timeFactor=5, peakFactor=10, mu=0.075, sigma2=3 ) { hinvgauss(t-tlast,mu=mu,sigma2=sigma2)*exp(stimulus(t,df,tonset,timeFactor,peakFactor)) } ## define the function simulating the train with the thinning method makeTrain <- function(tstop=10, peakCI=200, preTime=5, df=5, tonset=5, timeFactor=5, peakFactor=10, mu=0.075, sigma2=3) { result <- numeric(500) - preTime - .Machine$double.eps result.n <- 500 result[1] <- 0 idx <- 1 currentTime <- result[1] while (currentTime < tstop+preTime) { currentTime <- currentTime+rexp(1,peakCI) p <- hFct(currentTime, result[idx], df=df, tonset=tonset+preTime, timeFactor=timeFactor, peakFactor=peakFactor, mu=mu, sigma2=sigma2)/peakCI rthreshold <- runif(1) if (p>1) stop("Wrong peakCI") while(p < rthreshold) { currentTime <- currentTime+rexp(1,peakCI) p <- hFct(currentTime, result[idx], df=df, tonset=tonset+preTime, timeFactor=timeFactor, peakFactor=peakFactor, mu=mu, sigma2=sigma2)/peakCI if (p>1) stop("Wrong peakCI") rthreshold <- runif(1) } idx <- idx+1 if (idx > result.n) { result <- c(result,numeric(500)) - preTime - .Machine$double.eps result.n <- result.n + 500 } result[idx] <- currentTime } result[preTime < result & result <= tstop+preTime] - preTime } ## set the seed set.seed(20061001) ## "make" the train t1 <- makeTrain() ## create the corresponding CountingProcessSamplePath ## object cpsp1 <- mkCPSP(t1) ## print it cpsp1 ## test it cpsp1.summary <- summary(cpsp1) cpsp1.summary plot(cpsp1.summary) ## Define a function returning the conditional intensity function (cif) ciFct <- function(t, tlast, df=5, tonset=5, timeFactor=5, peakFactor=10, mu=0.075, sigma2=3 ) { sapply(t, function(x) { if (x <= tlast[1]) return(1/mu) y <- x-max(tlast[tlast<x]) hinvgauss(y,mu=mu,sigma2=sigma2)*exp(stimulus(x,df,tonset,timeFactor,peakFactor)) } ) } ## Compute the cif of the train tt <- seq(0,10,0.001) lambda.true <- ciFct(tt,cpsp1$ppspFct()) ## plot it together with the events times ## Notice that the representation is somewhat inaccurate, the cif ## is in fact a left continuous function plot(tt,lambda.true,type="l",col=2) rug(cpsp1$ppspFct()) ## plot the integrated intensity function and the counting process plot(tt,cumsum(lambda.true)*0.001,type="l",col=2) lines(cpsp1) ## define a function doing the time transformation / rescaling ## by integrating the cif and returning another CountingProcessSamplePath transformCPSP <- function(cpsp, ciFct, CIFct, method=c("integrate","discrete"), subdivisions=100, ... ) { if (!inherits(cpsp,"CountingProcessSamplePath")) stop("cpsp should be a CountingProcessSamplePath objet") st <- cpsp$ppspFct() n <- length(st) from <- cpsp$from to <- cpsp$to if (missing(CIFct)) { if (method[1] == "integrate") { lwr <- c(from,st) upr <- c(st,to) Lambda <- sapply(1:(n+1), function(idx) integrate(ciFct, lower=lwr[idx], upper=upr[idx], subdivisions=subdivisions, ...)$value ) Lambda <- cumsum(Lambda) st <- Lambda[1:n] from <- 0 to <- Lambda[n+1] } ## End of conditional on method[1] == "integrate" if (method[1] == "discrete") { lwr <- c(from,st) upr <- c(st,to) xx <- unlist(lapply(1:(n+1), function(idx) seq(lwr[idx], upr[idx], length.out=subdivisions) ) ) Lambda <- cumsum(ciFct(xx[-length(xx)])*diff(xx)) Lambda <- Lambda - Lambda[1] st <- Lambda[(1:n)*subdivisions] from <- 0 to <- Lambda[length(Lambda)] } ## End of conditional on method[1] == "discrete" } else { result <- CIFct(c(from,st,to)) result <- result-result[1] from <- result[1] to <- result[n+2] st <- result[2:(n+1)] } ## End of conditional on missing(CIFct) mkCPSP(st,from,to) } ## transform cpsp1 cpsp1t <- transformCPSP(cpsp1,function(t) ciFct(t,cpsp1$ppspFct())) ## test it cpsp1t.summary <- summary(cpsp1t) cpsp1t.summary plot(cpsp1t.summary) ## Not run: ## compare the finite sample performances of the ## Kolmogorov test (test the uniformity of the ## jump times given the number of events) with the ## ones of the new "Wiener process test" empiricalCovProb <- function(myRates=c(10,(1:8)*25,(5:10)*50,(6:10)*100), nbRep=1000, exact=NULL ) { b95 <- function(t) 0.299944595870772 + 2.34797018726827*sqrt(t) b99 <- function(t) 0.313071417065285 + 2.88963206734397*sqrt(t) result <- matrix(numeric(4*length(myRates)),nrow=4) colnames(result) <- paste(myRates) rownames(result) <- c("ks95","ks99","wp95","wp99") for (i in 1:length(myRates)) { rate <- myRates[i] partial <- sapply(1:nbRep, function(repIdx) { st <- cumsum(rexp(5*rate,rate)) while(max(st) < 1) st <- c(st,max(st)+cumsum(rexp(5*rate,rate))) st <- st[st<=1] ks <- ks.test(st,punif,exact=exact)$p.value w <- (st*rate-seq(st))/sqrt(rate) c(ks95=0.95 < ks, ks99=0.99 < ks, wp95=any(w < -b95(st) | b95(st) < w), wp99=any(w < -b99(st) | b99(st) < w) ) } ) result[,i] <- apply(partial,1,sum) } attr(result,"nbRep") <- nbRep attr(result,"myRates") <- myRates attr(result,"call") <- match.call() result/nbRep } plotCovProb <- function(covprob,ci=0.95) { nbMax <- max(attr(covprob,"myRates")) plot(c(0,nbMax),c(0.94,1), type="n", xlab="Expected number of Spikes", ylab="Empirical cov. prob.",xaxs="i",yaxs="i") nbRep <- attr(covprob,"nbRep") polygon(c(0,nbMax,nbMax,0), c(rep(qbinom((1-ci)/2,nbRep,0.95)/nbRep,2),rep(qbinom(1-(1-ci)/2,nbRep,0.95)/nbRep,2)), col="grey50",border=NA) polygon(c(0,nbMax,nbMax,0), c(rep(qbinom((1-ci)/2,nbRep,0.99)/nbRep,2),rep(qbinom(1-(1-ci)/2,nbRep,0.99)/nbRep,2)), col="grey50",border=NA) nbS <- attr(covprob,"myRates") points(nbS,1-covprob[1,],pch=3) points(nbS,1-covprob[2,],pch=3) points(nbS,1-covprob[3,],pch=1) points(nbS,1-covprob[4,],pch=1) } system.time(covprobA <- empiricalCovProb()) plotCovProb(covprobA) ## End(Not run)