runAnalyzer-class {mspath} | R Documentation
Description

Assists in the analysis of the factors affecting the performance of
jobs. It requires individual jobs to be identified by the IDs of the
cases the job computed.
Usage

runAnalyzer(estimate)

constructs an object of this class. The estimate argument should be a
data frame or matrix with each row giving information for a single
case. The ‘ID’ column, which must exist, is the ID of the case. Other
columns give values believed to predict how long the computation on
the case will take. It should be sensible to sum a predictor across
cases to predict how long a group of cases will take.
A runTime object includes information about a particular job, in
particular various measures of time. To use the object with a
runAnalyzer, the job must be constructed with the list of IDs it will
analyze, e.g., aRunTime <- runTime(c(1, 56, 90)) for a job that will
evaluate cases 1, 56, and 90. If ra is a runAnalyzer, then add the
results of a run with ra <- addResult(ra, aRunTime).
Once you are done, results <- smoosh(ra) gets you a summary of the
runs. Each row corresponds to a single job; the ‘ID’ column will have
the ID of the first case in the job. Don't be fooled: the rest of the
row applies to the whole job, and hence to all the cases in the job.
The columns give the sum of the predictors over the cases in the job,
and the various performance measures, labelled appropriately: cpu,
wall, wait, start, end, rank. See runTime for their meaning.
If you wish to repeat the exercise, use ra <- newRun(ra). Currently,
evaluation of the performance of the predictors is up to you, and if
you want to look at previous runs you will need to pull them out with
ra@prior.
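A minimal sketch of this two-round workflow, assuming the mspath
package is attached and using a hypothetical helper runJob() (not part
of the package) that executes one job and fills in its times:

```r
library(mspath)  # provides runAnalyzer, runTime, addResult, smoosh, newRun

# Predictors for four cases; 'size' is a made-up predictor column.
estimate <- data.frame(ID = 1:4, size = c(10, 40, 25, 5))
ra <- runAnalyzer(estimate)

# First round: two jobs, partitioned as {1, 2} and {3, 4}.
# runJob() is hypothetical: it runs the cases and records the times.
ra <- addResult(ra, runJob(runTime(c(1, 2))))
ra <- addResult(ra, runJob(runTime(c(3, 4))))
round1 <- smoosh(ra)

# Second round with a different partition of the same cases.
ra <- newRun(ra)
ra <- addResult(ra, runJob(runTime(c(1, 3))))
ra <- addResult(ra, runJob(runTime(c(2, 4))))
round2 <- smoosh(ra)

# The first round remains available through the prior slot.
firstRunTimes <- ra@prior[[1]]
```

The summaries round1 and round2 can then be compared to see which
partition made better use of the predictors.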
Objects from the Class

runAnalyzer(estimate) creates one of these objects. The estimate
should contain estimates of the run times of the jobs that will later
be added.
Slots

estimate: "matrix" or "data.frame" with one row for each case to be
analyzed. There must be a column named ‘ID’ whose values match those
identifying a job in the runTimes. Other columns contain possible
predictors of how much computation the case will require.

actual: "list" that will hold the runTimes of the jobs.

prior: "list" filled in by newRun. In that case it will be a list of
lists. The first entry holds the values in actual for the first run,
the second entry holds the second run, and so on.
Methods

addResult signature(analyzer = "runAnalyzer", runTime = "runTime"):
adds a job and its time.

newRun signature(analyzer = "runAnalyzer"): returns the analyzer
argument, ready for a new round of analysis.

signature(analyzer = "runAnalyzer"): ...

smoosh signature(analyzer = "runAnalyzer"): returns a data.frame or
matrix summarizing the results.
Details

This class assists in the following general setting. You have
computations to perform on cases; each job may calculate values for
one or more cases, e.g., you may distribute the calculation of a
likelihood. You also have information that you think will predict how
long the cases will take to compute. You divide the cases into jobs,
perform the computations, and then want to see how well your
predictors explain various measures of performance. Based on this
analysis, or perhaps randomly, you may partition the cases in a
different way and try again. This class collects the results of all
the runs (including prior rounds if you choose to do more than one
round) and can provide a summary of performance versus the predictors.
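One way to evaluate the predictors is an ordinary linear regression of
a timing measure on the summed predictors. This sketch assumes result
is the output of smoosh and has the documented wall column plus a
summed-predictor column called size (a made-up name for illustration):

```r
# Coerce in case smoosh returned a matrix rather than a data frame.
d <- as.data.frame(result)

# How well does the summed predictor explain wall-clock time per job?
fit <- lm(wall ~ size, data = d)
summary(fit)

# Visual check of the relationship.
plot(d$size, d$wall, xlab = "summed predictor", ylab = "wall time")
abline(fit)
```

Residuals from such a fit can suggest which cases were badly
estimated before you repartition for the next round.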
Note

Subject to change. Not terribly general at the moment, and rather
hackish.
Author(s)

Ross Boylan
See Also

addResult to populate; smoosh to analyze; runTime for internal data.
Examples

# slightly contrived, since non-distributed
estimate <- data.frame(ID = c(30, 50), estimate = exp(c(30, 50)))
analyzer <- runAnalyzer(estimate)
for (x in c(30, 50)) {
  rt <- runTime(x)
  mpirank(rt) <- 1
  tjob <- system.time(factorial(x))
  remoteTime(rt) <- c(tjob[1:3], 0)  # last number is delay wait
  analyzer <- addResult(analyzer, rt)
}
result <- smoosh(analyzer)