HPC Package Documentation

Introduction

The HPC package lets you run Sho functions on an HPC cluster for parameter sweeps and related scenarios.  In its most basic form, you specify a function that takes two inputs: one that is the same for all remote instances, and one that is a single element of a list you also pass in.  Advanced features let you specify data and output directories that are copied to and from the cluster, and let you reconstitute a past cluster session from its output directory.
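
The calling convention of the basic form can be tried locally before anything is sent to a cluster. The sketch below is plain Python and does not use the HPC package: local_sweep is a hypothetical stand-in that only mimics "same first input, one list element as second input", and the index-keyed return value is an assumption modeled on the walkthrough output.

```python
def addstuff(data, param):
    """Example function: data is shared by every instance, param varies."""
    return data + param


def local_sweep(fn, data, paramlist):
    """Hypothetical local stand-in for a sweep: one fn call per list element.

    Not part of the HPC package; it only mimics the calling convention so
    that fn can be checked on the local machine first.
    """
    return {index: fn(data, param) for index, param in enumerate(paramlist)}


results = local_sweep(addstuff, 10, [1, 2, 3, 4, 5])
print(results)
```

A function that behaves correctly under this kind of local loop has at least the right argument order for the two-input case.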

To use this package, you need access to an HPC cluster and the name of its head node.  You must also know the path to your personal working directory on the cluster and have read/write access to it.  Your cluster administrator should be able to give you all of this information.

Functions

  1. session = clusterrun(cluster, fn, data, paramlist, inDir=None, outDir=None, includeFilter=None, verbose=False)

    FN is a function (defined in some .py file) with two, three, or four arguments. On the cluster, when FN is called, the first argument will be DATA; the second argument will be an element from PARAMLIST. CLUSTERRUN finds the directory that contains the .py file in which FN is defined (we'll refer to this file as FNMODULE), then copies all .py and .dll files in that directory (and its subdirectories, recursively) to the cluster. Note that DATA and the elements of PARAMLIST must be of types that are available in a new instance of Sho that has imported only FNMODULE.

    INDIR is a local directory that contains additional data your function may need; if INDIR is defined, it will be copied to the cluster; the cluster-accessible location will be the third argument to FN.

    OUTDIR is a local directory; if specified, FN will get a fourth argument, a directory, to which the instance can write output information. When the command terminates, files in that directory will be copied back to OUTDIR on the local machine, in a subdirectory that is the session ID. Note that files written to this directory should have unique names for each instance, since they will all be copied to one directory.

    INCLUDEFILTER is a list of additional file types and names (beyond the default .py and .dll) that will be copied from the directory containing the module defining FN, as well as its subdirectories. Example: includeFilter=["*.cs", "*.pl", "foo.txt"]

    SESSION is an object that identifies this job; you can pass it to the clustergetresults() function once clusterrun has returned.  SESSION also contains the working directory for the job (SESSION.WorkDir) as well as the session ID (SESSION.id).

  2. cl = clustersetup(HEADNODE, WORKDIR)

    Sets up a cluster for use, including creating the working directory and installing the current version of Sho. HEADNODE is a string specifying the name of the head node; WORKDIR is the path to the working directory for your account on the cluster. The running version of Sho is copied into the directory "sho" inside your working directory; if it is already there, it is not recopied.

    Returns a Cluster object cl, where cl.WorkDir is the working directory.

  3. session = clustersession(SESSIONDIR)

    Reconstitutes a session from a session working directory.  For instance, if you run an experiment and want to look at the results later, you can save the working directory (SESSION.WorkDir) from a cluster execution, and then use clustersession() to reconstitute it and examine the results.  clustersession does not require access to the cluster as long as you have the session working directory.
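
As an illustration of the INDIR and OUTDIR arguments described for clusterrun, here is a minimal sketch of a four-argument function. The function name count_words, the file name corpus.txt, and the output naming scheme are all hypothetical. The code runs locally, with temporary directories standing in for the cluster-side copies of INDIR and OUTDIR, and it does not call the HPC package.

```python
import os
import tempfile


def count_words(data, param, indir, outdir):
    """Hypothetical four-argument function.

    data   -- value shared by every instance (here: a file name inside indir)
    param  -- one element of the parameter list (here: a word to count)
    indir  -- stand-in for the cluster-accessible copy of INDIR
    outdir -- stand-in for the directory that is copied back to OUTDIR
    """
    with open(os.path.join(indir, data)) as handle:
        count = handle.read().split().count(param)
    # Put the parameter in the file name so that instances do not overwrite
    # each other when all files end up in one directory.
    out_path = os.path.join(outdir, "count-%s.txt" % param)
    with open(out_path, "w") as handle:
        handle.write("%s %d\n" % (param, count))
    return count


# Local dry run with temporary directories in place of INDIR and OUTDIR.
indir = tempfile.mkdtemp()
outdir = tempfile.mkdtemp()
with open(os.path.join(indir, "corpus.txt"), "w") as handle:
    handle.write("red green red blue")

counts = [count_words("corpus.txt", word, indir, outdir) for word in ["red", "blue"]]
written = sorted(os.listdir(outdir))
print(counts)
print(written)
```

Deriving the output file name from the parameter is one simple way to satisfy the unique-name requirement; any scheme that is unique per instance would serve equally well.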

Walkthrough

# setup
>>> cl = clustersetup('HEADNODENAME','//FILESERVER/YOURWORKDIR')
# note that your actual working directory, "YOURWORKDIR" may have your username or other info tacked onto the base workdir, e.g., //FILESERVER/BASEWORKDIR/LOCATION/USERNAME; check with your system administrator for details.

# write a function in a file; in this example, the function addstuff is defined in the file rctest.py
>>> addpath("blah/foo/blah/foo")
>>> import rctest
>>> rctest.addstuff(3,4)
7

# now run it on the cluster and get the results
>>> session = clusterrun(cl, rctest.addstuff, 10, [1,2,3,4,5])
<a dialog pops up asking for your credentials; it then blocks until execution is complete>
>>> session.getresults()
Dictionary[int, object]({0 : 11, 1 : 12, 2 : 13, 3 : 14, 4 : 15})

# explore the job's working directory to look at stdout and stderr and other output
>>> session.WorkDir
'//FILESERVER/YOURWORKDIR\\session-9dbd860b-e141-4dd0-9896-0cc470640ff7'

# now let’s say you’re recovering a past experiment whose results are in some directory – note the dir doesn’t have to be on the cluster, you could have copied it somewhere local to keep it around
>>> session = clustersession('//FILESERVER/YOURWORKDIR/session-eb821858-b00f-4e6b-8f90-76f92838c9d6')
>>> session.getdata()
10
>>> session.getparams()
[1,2,3,4,5]
>>> session.getresults()
Dictionary[int, object]({0 : 11, 1 : 12, 2 : 13, 3 : 14, 4 : 15})
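
The results in the walkthrough appear to be keyed by position in the parameter list (key 0 holds the result for the first parameter, and so on). Under that assumption, results can be paired back with the parameters that produced them. The sketch below is plain Python: pair_results is a hypothetical helper, and an ordinary dict stands in for the Dictionary[int, object] shown above.

```python
def pair_results(params, results):
    """Return (param, result) pairs, assuming results is keyed by position."""
    return [(params[index], results[index]) for index in sorted(results)]


# Values copied from the walkthrough; a plain dict stands in for the
# Dictionary[int, object] returned there.
params = [1, 2, 3, 4, 5]
results = {0: 11, 1: 12, 2: 13, 3: 14, 4: 15}

pairs = pair_results(params, results)
# Pick the parameter whose result is largest.
best_param, best_value = max(pairs, key=lambda pair: pair[1])
print(pairs)
print(best_param, best_value)
```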