papply {papply} | R Documentation |
An apply-like function which uses Rmpi to distribute the processing evenly across a cluster. Will use a non-MPI version if distributed processing is not available.
papply(arg_sets, papply_action, papply_commondata = list(), show_errors = TRUE, do_trace = FALSE, also_trace = c())
arg_sets |
a list, where each item will be given as an argument to papply_action |
papply_action |
A function which takes one argument. It will be called on each element of arg_sets |
papply_commondata |
A list containing the names and values of variables to be accessible to the papply_action. 'attach' is used locally to import this list. |
show_errors |
If set to TRUE, overrides Rmpi's default, and messages for errors which occur in R slaves are produced. |
do_trace |
If set to TRUE, causes the papply_action function to be traced. i.e. Each statement is output before it is executed by the slaves. |
also_trace |
If supplied an array of function names, as strings, tracing will also occur for the specified functions. |
Similar to apply and lapply, applies a function to all items of a list, and returns a list with the corresponding results.
Uses Rmpi to implement a pull idiom in order to distribute the processing evenly across a cluster. If Rmpi is not available, or there are no slaves, implements this as a non-parallel algorithm.
papply will not recursively distribute load. If papply is called within papply_action, it will use a non-parallel version.
The named elements in the list papply_commondata are imported (using 'attach') into a global namespace, and appear as global variables to the code in papply_action.
A list of return values from papply_action. Each value corresponds to the element of arg_sets used as a parameter to papply_action
Does not support distributing recursive calls in parallel. If papply is used inside papply_action, it will call a non-parallel version
Duane Currie duane.currie@acadiau.ca
http://ace.acadiau.ca/math/ACMMaC/software/papply/
# A couple trivial examples library(papply) number_lists <- list(1:10,4:40,2:27) results <- papply(number_lists,sum) results biased_sum <- function(number_list) { return(sum(number_list+bias)) } results <- papply(number_lists,biased_sum,list(bias=2)) results # A slightly larger example - training of a neural net over a parameter space. # Produces information on best rss result for each set of parameters. # Maintains static random seeds in order to provide reproducible results. # (This isn't ideal. e.g. RSS not the best measure given variation in # sizes of test sets. But, it should show closely how papply is really used) # Read in libaries and Boston Housing Data library(papply) library(MASS) data(Boston) # Generate list of parameter sets decays <- c(0.2,0.1,0.01) n_hidden <- c(2,4,6) parameters <- expand.grid(decays,n_hidden) # Set random seeds in order to have reproduceable runs. # 100 is seed for main process which generates folds. Each task will # have a random seed of the task number main_seed <- 100 seeds <- c(1:nrow(parameters)) # Create list of argument sets. Each argument is actually a list of # decay rate, number of hidden nodes, and random seed arguments <- list() for (i in 1:nrow(parameters)) { arguments[[i]] <- list(decay=parameters[i,1], hidden=parameters[i,2], seed=seeds[i]) } # Need to set random seed before generating folds # Generate random fold labels set.seed(main_seed) folds <- sample(rep(1:10,length=nrow(Boston))) # Make a list of all shared data that should exist in all slaves in the # cluster shared_data <- list(folds=folds,Boston=Boston) # Create function to run on the slave nodes. # arg is a list with decay, hidden, and seed elements try_networks <- function(arg) { # Make sure nnet library is loaded. # NOTE: notice that nnet is not loaded above for the master - it doesn't # need it. It is loaded here because the slaves need it to be # loaded. The is.loaded function is used to test if the nnet # function has been loaded. If not, it loads the nnet library. if (!is.loaded("nnet")) { library("nnet") } # Set the random seed to the provided value set.seed(arg$seed) # Set up a matrix to store the rss values from produced nets rss_values <- array(0,dim=c(10,5)) # For each train/test combination for (i in 1:10) { # Try 5 times for (j in 1:5) { # Build a net to predict home value based on other 13 values. trained_net <- nnet(Boston[folds!=i,1:13], Boston[folds!=i,14], size=arg$hidden, decay=arg$hidden, linout=TRUE) # Try building predictions based on the net generated. test <- predict(trained_net, Boston[folds==i,1:13],type="raw") # Compute and store the rss values. rss <- sqrt(sum((Boston[folds==i,14] - test)^2)) rss_values[i,j] <- rss } } # Return the rss value of the neural net which had the lowest # rss value against predictions on the test set. return(min(rss_values)) } # Call the above function for all sets of parameters results <- papply(arguments, try_networks, shared_data) # Output a list of parameter vs. minimum rss values df <- data.frame(decay=parameters[,1],hidden=parameters[,2], rss=sapply(results,function(x) {return(x)}) ) df