grid.init {GridR}    R Documentation

grid.init

Description

grid.init initializes the GridR package. If you use a config file, you do not need to pass any of the parameters explained below.

Usage

grid.init(confFile=NULL, localTmpDir=NULL, verbose=TRUE, sshRemoteIp=NULL, sshUsername=NULL, sshRemoteDir=NULL, myProxyHost=NULL, myProxyUsername=NULL, credentialName=NULL, myProxyPwd=NULL, myProxyPort=NULL, service=NULL, sshKey=NULL, debug=FALSE, sharedDir=NULL, remoteRPath=NULL, schedulerIp=NULL, schedulerPort=NULL)

Arguments

confFile Path to the config file
localTmpDir Path to a directory in which temporary data is stored
verbose If verbose=TRUE, a message is printed when a result is available
sshRemoteIp If an ssh mode is used, the IP of the remote host
sshUsername If an ssh mode is used, the username on the remote host
sshRemoteDir If an ssh mode is used, the temporary directory on the remote host
sshKey If an ssh mode is used on Windows, the path to the public RSA key
myProxyHost The IP of the host on which a MyProxy server is running
myProxyUsername The username of the MyProxy certificate
credentialName The credential name of the MyProxy certificate, if needed
myProxyPwd The password of the MyProxy certificate
myProxyPort The port of the MyProxy server; if not given, the default port is used
service The default service mode; see ?grid.apply
debug If TRUE, temporary files are kept (not deleted) both locally and on the server side
sharedDir Path of the directory from which shared variables are loaded
remoteRPath Path to R on the remote host, if needed
schedulerIp If a scheduler should be used, its IP
schedulerPort Port of the scheduler, if used

Details

The easiest way to use GridR is with a config file.
The config file is named "gridr.conf" and should be placed either in the user's home directory or in the directory from which R is executed. If another path is used, specify it with the argument "confFile".
The required content depends on which modes you want to use.
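
To see which config file would be picked up from the two default locations, a small check in base R can help. This is only a sketch: find_gridr_conf is a hypothetical helper, not part of GridR, and the search order (working directory first, then home directory) is an assumption, since the text above does not state which location takes precedence.

```r
# Hypothetical helper (not part of GridR): look for "gridr.conf" in the
# given directories and return the first match, or NA if none exists.
# Assumption: the working directory is checked before the home directory.
find_gridr_conf <- function(dirs = c(getwd(), path.expand("~"))) {
  candidates <- file.path(dirs, "gridr.conf")
  found <- candidates[file.exists(candidates)]
  if (length(found) == 0) NA_character_ else found[1]
}

conf <- find_gridr_conf()
if (is.na(conf)) {
  message("No gridr.conf found; pass confFile or individual arguments to grid.init().")
} else {
  message("Found config file: ", conf)
}
```

If the file lives elsewhere, its path can be passed to grid.init through the confFile argument instead.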

There are different ways to submit a function. The services are only available for Linux servers, and the system must be set up so that the user can log in to the remote computer via ssh without entering a password. To do so, generate RSA keys (for Linux see e.g. http://www.csua.berkeley.edu/~ranga/notes/ssh_nopass.html).
To use the services from Windows, a version based on Trilead Java SSH is implemented. Download trilead-ssh2-build212.jar or similar from http://www.trilead.com/Download/Trilead_SSH_for_Java/ and place it in <GridRInstallationDir>/GridR/GridR/. Then generate RSA keys with PuTTYgen (e.g. http://the.earth.li/~sgtatham/putty/latest/x86/puttygen.exe): click Generate, which creates a private/public RSA key pair; change the comment to yourLocalUsername@yourLocalComputer; click Conversions/Export OpenSSH Key and save the key to the path given in your config file. Finally, add the text shown under "Public key for pasting into OpenSSH authorized_keys file" as a new line to ~/.ssh/authorized_keys on the server on which GridR is to execute.
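
For the Linux case, whether passwordless login actually works can be checked from R before calling grid.init. The following is a minimal sketch, assuming an OpenSSH client named ssh is on the PATH; ssh_batch_args and check_passwordless_ssh are hypothetical helpers, not GridR functions. The OpenSSH option BatchMode=yes disables password prompts, so the command fails instead of waiting for input when key-based login is not set up.

```r
# Hypothetical helpers (not part of GridR).
# Build the arguments for a non-interactive ssh call that runs `true` remotely.
ssh_batch_args <- function(user, host) {
  c("-o", "BatchMode=yes", paste0(user, "@", host), "true")
}

# Return TRUE if ssh exits with status 0, i.e. login succeeded without a prompt.
check_passwordless_ssh <- function(user, host) {
  status <- suppressWarnings(
    system2("ssh", ssh_batch_args(user, host), stdout = FALSE, stderr = FALSE)
  )
  identical(as.integer(status), 0L)
}

# Example call (placeholder user and address, not run here):
# check_passwordless_ssh("user", "192.0.2.10")
```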

On each server, R must be on the PATH environment variable, R_HOME must be set, or remoteRPath must be declared. If Condor modes are used, either add a link to R in /usr/bin/ or add the following line to the config file: <REMOTERPATH>/path/to/R</REMOTERPATH>
If the Condor batch modes are used, the package GridR must be installed on each cluster host.

Available modes:

variableSharing

Only variable and function sharing is initialized.
You must provide at least the following settings, either in the config file or as arguments to grid.init:

<GRIDR> # start tag, necessary
<SHAREDDIR>/home/user/mlohmeyer/share</SHAREDDIR> # directory from which shared variables are loaded
<SERVICE>variableSharing</SERVICE> # default service to use
</GRIDR> # end tag

local

The function is executed locally.
You must provide at least the following settings, either in the config file or as arguments to grid.init:

<GRIDR> # start tag, necessary
<LOCALTMPDIR>/home/user</LOCALTMPDIR> # local directory for temporary files
<SERVICE>local</SERVICE> # default service to use
</GRIDR> # end tag
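
A config file in this layout can also be generated from R. The sketch below is not part of GridR: write_gridr_conf is a hypothetical helper, and it assumes that each tag name is the upper-cased grid.init argument name (which matches the tags shown in this page) and that the trailing "#" comments are optional.

```r
# Hypothetical helper (not part of GridR): write a named list of settings
# as <TAG>value</TAG> lines wrapped in <GRIDR> ... </GRIDR>.
# Assumption: tag names are the upper-cased grid.init argument names.
write_gridr_conf <- function(settings, path = "gridr.conf") {
  stopifnot(is.list(settings), !is.null(names(settings)),
            all(nzchar(names(settings))))
  tags <- toupper(names(settings))
  values <- vapply(settings, as.character, character(1), USE.NAMES = FALSE)
  body <- sprintf("<%s>%s</%s>", tags, values, tags)
  writeLines(c("<GRIDR>", body, "</GRIDR>"), path)
  invisible(path)
}

# Write the minimal "local" configuration to a temporary location and show it.
conf_path <- file.path(tempdir(), "gridr.conf")
write_gridr_conf(list(localTmpDir = "/home/user", service = "local"), conf_path)
cat(readLines(conf_path), sep = "\n")
```

The resulting file could then be passed to grid.init through confFile, or copied to one of the default locations.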

remote.ssh

The function is copied with ssh to a single computer and is executed there directly in R.
You must provide at least the following settings, either in the config file or as arguments to grid.init:

<GRIDR> # start tag, necessary
<LOCALTMPDIR>/home/user</LOCALTMPDIR> # local directory for temporary files
<SSHREMOTEDIR>grid/</SSHREMOTEDIR> # remote directory for temporary files
<SSHREMOTEIP>ip</SSHREMOTEIP> # IP of the remote host
<SSHUSERNAME>user</SSHUSERNAME> # ssh username for logging in to the remote host
<SERVICE>remote.ssh</SERVICE> # default service to use
<SSHKEY>/home/user/.ssh/id_rsa</SSHKEY> # on Windows systems or with javaSsh=TRUE, the path to your public RSA key must be specified
</GRIDR> # end tag
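
The same settings can be passed directly as arguments instead of through the config file. The following sketch uses only argument names from the Usage section; the address 192.0.2.10 and the paths are placeholders, and the call is only attempted if GridR is installed.

```r
# Arguments mirroring the remote.ssh config block above.
# All values are placeholders; replace them with your own host, user, and paths.
ssh_args <- list(
  service      = "remote.ssh",
  localTmpDir  = "/home/user",
  sshRemoteDir = "grid/",
  sshRemoteIp  = "192.0.2.10",
  sshUsername  = "user",
  sshKey       = "/home/user/.ssh/id_rsa"
)

# Only call grid.init when the GridR package is available.
if (requireNamespace("GridR", quietly = TRUE)) {
  do.call(GridR::grid.init, ssh_args)
}
```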

condor.ssh

The function is copied with ssh to a computer that is connected to a Condor pool and from which jobs can be submitted.
You must provide at least the following settings, either in the config file or as arguments to grid.init:

<GRIDR> # start tag, necessary
<LOCALTMPDIR>/home/user</LOCALTMPDIR> # local directory for temporary files
<SSHREMOTEDIR>grid/</SSHREMOTEDIR> # remote directory for temporary files
<SSHREMOTEIP>ip</SSHREMOTEIP> # IP of the remote host
<SSHUSERNAME>username</SSHUSERNAME> # ssh username for logging in to the remote host
<SERVICE>condor.ssh</SERVICE> # default service to use
<SSHKEY>/home/user/.ssh/id_rsa</SSHKEY> # on Windows systems or with javaSsh=TRUE, the path to your public RSA key must be specified
<REMOTERPATH>pathToRemoteR</REMOTERPATH> # add this tag if R is not linked to /usr/bin/R on the server side
</GRIDR> # end tag

scheduler

If the variable schedulerIp is set, all jobs are started through a scheduler. When grid.init() is called, it checks whether there are jobs that were started earlier from another R session; if so, they are imported into your active R session. To start the scheduler, copy all files from R.home("library/GridR/GridR/scheduler") to the server on which the scheduler should run, and set up passwordless ssh logins from your client to this scheduler server and from the scheduler server to all execution machines. Download trilead-ssh2-build212.jar or similar from http://www.trilead.com/Download/Trilead_SSH_for_Java/ and save it to the lib directory. You can then start the scheduler with ./start.sh <port>. The scheduler can be restarted with grid.restartScheduler and stopped with grid.stopScheduler. In scheduler mode, all submitted jobs can be stopped with grid.stopJob or restarted with grid.restartJob.

Author(s)

Malte Lohmeyer

See Also

grid.apply, GridR, grid.share

Examples

# define a function that will be executed through GridR
a <- function(s) { return(2 * s) }
# load the GridR code
library("GridR")
# initialize GridR with explicitly passed parameters (no config file is needed here)
grid.init(service="local", localTmpDir="GridRTmp/")
# apply function `a` to the argument 3 and write the result to variable x;
# until the function has been executed, x is locked
grid.apply("x", a, 3, wait=TRUE)
x
# if internal functions are used, it is important to set check=FALSE;
# otherwise the codetools package returns an error
grid.apply("y", sum, 1:5, wait=TRUE, check=FALSE)
y

[Package GridR version 0.9.1 Index]