runAnalyzer-class {mspath}    R Documentation

Class to Analyze Runtimes

Description

Assists in analyzing the factors that affect the performance of jobs. Each job must be identified by the IDs of the cases it computed.

Details

runAnalyzer(estimate) constructs an object of this class. estimate should be a data frame or matrix with one row per case. The ‘ID’ column, which must exist, holds the ID of the case. The other columns hold values believed to predict how long the computation on the case will take. Each predictor should be additive: summing it across a set of cases should give a sensible prediction of how long those cases will take together.

A runTime object holds information about a particular job, in particular various measures of time. To use a runTime object with a runAnalyzer, construct it with the IDs of the cases the job will evaluate, e.g., aRunTime <- runTime(c(1, 56, 90)) for a job that will evaluate cases 1, 56, and 90. If ra is a runAnalyzer, add the results of a run with ra <- addResult(ra, aRunTime).

Once all results have been added, results <- smoosh(ra) returns a summary of the runs. Each row corresponds to a single job; the ‘ID’ column holds the ID of the first case in the job. Don't be fooled: the rest of the row applies to the whole job, and hence to all the cases in the job. The columns give each predictor summed over the cases in the job, followed by the performance measures, labelled cpu, wall, wait, start, end, and rank. See runTime for their meaning.
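To make the shape of that summary concrete, here is a minimal base R sketch that builds a similar table by hand. It does not call mspath; the objects estimate, jobs, and timings, the predictor name nobs, and all the numbers are invented for illustration, and the real smoosh output may differ in its columns and their order.

```r
# Hypothetical per-case predictors, in the form expected for 'estimate'.
estimate <- data.frame(ID = c(1, 56, 90, 7), nobs = c(10, 40, 25, 5))

# Hypothetical partition of the cases into two jobs.
jobs <- list(c(1, 56, 90), 7)

# Hypothetical measured times for each job, in seconds.
timings <- data.frame(cpu = c(2.1, 0.3), wall = c(2.5, 0.4))

# One row per job: ID of the first case, predictor summed over the job's cases.
summaryRows <- lapply(jobs, function(ids) {
  rows <- estimate[estimate$ID %in% ids, , drop = FALSE]
  data.frame(ID = ids[1], nobs = sum(rows$nobs))
})
result <- cbind(do.call(rbind, summaryRows), timings)
print(result)
```

The first row is labelled with ID 1 but its nobs value covers cases 1, 56, and 90 together, which is the point the warning above is making.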

If you wish to repeat the exercise, use ra <- newRun(ra).

Currently, evaluation of the performance of the predictors is up to you, and if you want to look at previous runs you will need to pull them out with ra@prior.
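One way to carry out that evaluation yourself is an ordinary linear regression of a performance measure on a summed predictor. The sketch below is only an illustration: the result data frame is made up to resemble a smoosh summary, the column name estimate is an assumption, and lm is one choice among many.

```r
# Hypothetical summary in the shape described above: one row per job.
result <- data.frame(
  ID       = c(1, 7, 12, 30),
  estimate = c(75, 5, 40, 20),        # predictor summed over each job's cases
  cpu      = c(15.2, 1.1, 8.3, 3.9)   # measured CPU seconds per job
)

# Regress measured CPU time on the summed predictor.
fit <- lm(cpu ~ estimate, data = result)

# Slope and intercept, then the share of variance explained.
print(coef(fit))
print(summary(fit)$r.squared)
```

A slope that is stable across rounds and a high R-squared would suggest the predictor is useful for balancing jobs; a poor fit suggests trying a different column of estimate.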

Objects from the Class

runAnalyzer(estimate) creates one of these objects. estimate should contain, for each case, the predictors of runtime for the cases in the jobs that will later be added.

Slots

estimate:
A "matrix" or "data.frame" with one row for each case to be analyzed. There must be a column named ‘ID’ whose values match those identifying a job in the runTimes. Other columns contain possible predictors of how much computation the case will require.
actual:
An initially empty "list" that will hold the runTime objects of the jobs.
prior:
Only relevant if you conduct more than one round of analysis, i.e., call newRun. In that case it will be a list of lists. The first entry holds the values in actual for the first run, the second entry holds those for the second run, and so on.

Methods

addResult
signature(analyzer = "runAnalyzer", runTime = "runTime"): adds a job and its time.
newRun
signature(analyzer = "runAnalyzer"): returns the analyzer argument, ready for a new round of analysis.
smoosh
signature(analyzer = "runAnalyzer"): returns a data.frame or matrix summarizing the results.

Background

This class assists in the following general setting. You have computations to perform on cases; each job may calculate values for one or more cases, e.g., you may distribute the calculation of a likelihood.

You also have information that you think will predict how long the cases will take to compute.

You divide the cases into jobs, perform the computations, and then want to see how well your predictors explain various measures of performance.

Based on this analysis, or perhaps randomly, you may partition the cases in a different way and try again.
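As an illustration of repartitioning based on a predictor, the sketch below uses a simple greedy heuristic: take cases in decreasing predicted cost and assign each to the job with the smallest running total. This is not something mspath provides; the heuristic, the column name cost, and the numbers are all assumptions made for the example.

```r
# Hypothetical cases, each with a predicted cost.
estimate <- data.frame(ID = 1:6, cost = c(9, 7, 6, 5, 4, 2))
nJobs <- 2

# Greedy balancing: visit cases from most to least costly and give each
# to the job whose predicted total is currently smallest.
total <- numeric(nJobs)
jobs <- vector("list", nJobs)
for (i in order(estimate$cost, decreasing = TRUE)) {
  j <- which.min(total)
  jobs[[j]] <- c(jobs[[j]], estimate$ID[i])
  total[j] <- total[j] + estimate$cost[i]
}
print(jobs)
print(total)
```

Each element of jobs is a vector of case IDs, which is the form runTime expects when a job is constructed.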

This class collects the results of all the runs (including prior rounds if you choose to do more than one round), and can provide a summary of the performance versus the predictors.

Note

Subject to change. Not terribly general at the moment, and rather hackish.

Author(s)

Ross Boylan

See Also

addResult to populate; smoosh to analyze; runTime for internal data.

Examples

   # Slightly contrived, since nothing here is distributed.
   # One row per case: its ID and a predictor of its computation time.
   estimate <- data.frame(ID=c(30, 50), estimate=exp(c(30, 50)))
   analyzer <- runAnalyzer(estimate)
   for( x in c(30, 50) ) {
     rt <- runTime(x)   # a job consisting of the single case x
     mpirank(rt) <- 1
     tjob <- system.time(factorial(x))
     remoteTime(rt) <- c(tjob[1:3], 0) # last number is the wait (delay) time
     analyzer <- addResult(analyzer, rt)
   }
   result <- smoosh(analyzer)   # one summary row per job

[Package mspath version 0.9-9 Index]