Nanobio

Nanobio is the second scientific computation cluster at the HPCf.  In order to work in a high-performance environment such as Nanobio, there is some information you should know.  Below are some important topics regarding working on Nanobio.  Feel free to refer to them whenever you find yourself stuck on something.  If you have a question that is not answered in the topics below, let us know at help@hpcf.upr.edu, and we’ll be glad to help you out.

Note: Nanobio has been largely succeeded by our newest cluster, Boqueron, and new users of HPCf resources are given an account on Boqueron, not on Nanobio.  This page remains on our site as a reference for existing users who continue to work on Nanobio.

  • Logging in to Nanobio
  • Submitting Jobs
  • Submitting Parallel Jobs
  • Monitoring Jobs
  • Environment Modules
  • Staging Jobs

Logging in to Nanobio

Your Nanobio username and password are for logging in to Nanobio through SSH.  How you connect to Nanobio will depend on your operating system.

Note: Do not try to connect to Nanobio by pointing your Web browser to https://nanobio.hpcf.upr.edu.  Your SSH username and password will not work there.

Linux or Mac OS X

On Linux or Mac OS X, the process to log in to Nanobio is quite straightforward.

  1. Open a Terminal window.
  2. Type in the following command (without the leading $):
    $ ssh <username>@nanobio.hpcf.upr.edu

    substituting your actual username for <username>.  Then hit Enter.

  3. If this is your first time connecting, you will now be asked if you want to trust the connection to Nanobio; answer “yes” and hit Enter again.
  4. You will then be asked for your password.  This completes your login.

Windows

Unlike Linux or Mac OS X, Windows does not provide an SSH-capable terminal by default.  Therefore, you’ll need to download software that provides such functionality.

The most popular tool (and our recommendation) is PuTTY, which you can download for free from the PuTTY website.

After you download and install PuTTY (or whichever software you choose), you can connect to Nanobio with the following information.

  • Hostname: nanobio.hpcf.upr.edu
  • Port: 22
  • Username: Your Nanobio username
  • Password: Your Nanobio password

Click on the Connect button.  If this is your first time connecting, you will now be asked if you want to trust the connection to Nanobio; answer “yes” and hit Enter again.

You should now be connected to Nanobio.

Android

Though you’ll find yourself connecting to Nanobio mostly through your workstation or laptop, you might find it useful sometimes to connect to Nanobio through your smartphone or tablet.  Apps such as JuiceSSH allow you to establish SSH connections from your Android device.  Just remember to use nanobio.hpcf.upr.edu as the hostname/address of the server you want to connect to, and make sure you are connecting using port 22.

Submitting Jobs

Nanobio provides a high-performance file system for you to work in, appropriately called /work.  HPCf guidelines require that you always keep any files related to work you are doing in the /work file system.  When you log in to Nanobio, you are placed in your home directory, which is meant to hold the final results of jobs you have run.  Therefore, you will need to change into the /work file system before submitting any jobs to be run on Nanobio.  To do so, simply run the command

$ cdw

and you’ll be in your work directory.

Note: The command cdw is not a standard Linux command.  It is an alias local to the Nanobio cluster.

Before submitting a job, remember to have all the job-related data in your /work directory.
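
For example, a typical sequence before submitting a job might look like this (the input file name is illustrative):

$ cdw
$ cp ~/input-data.txt .
$ ls
input-data.txt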

Running Non-Interactive Jobs (aka Batch Jobs)

Once you have all your data in place, you are ready to create a “submit script”, which is simply a file that specifies the instructions your job will run.

The following is an example submit script sge-script.sh:

#!/bin/bash

#$ -N job-name
# The name you want to give the job.

#$ -M user-email@something.com
# The email address of the user.

#$ -cwd
# Runs the job from the directory of submission.

#$ -S /bin/bash
# Specifies the interpreting shell for this job to be the Bash shell.

#$ -l h_rt=hh:mm:ss
# Specifies the maximum running time this job will have.
# The maximum time you may request is 168 hours (1 week).

# This script simply prints out the current clock time in 12-hour format.

/bin/date +%r

The lines that start with #$ are options passed to the queue manager (the software that schedules and manages jobs on Nanobio).  There are many other options that may be specified, but these defaults work for most jobs.  You don’t need to concern yourself with the line beginning with #!; just make sure it is included at the very top of your script.  Any other lines starting with the # character are “comments”: arbitrary text that is ignored by the system.

Below the lines starting with #$ (and any optional comments) is where you place the actual commands your job will execute.  In this example, we simply execute the date command, which prints the current clock time in 12-hour format.

Note: The #$ queue manager options in the example submit script above are for the Sun Grid Engine (SGE) queue manager, which is the current queue manager on Nanobio.  These options will not work with other queue managers used by other systems, such as SLURM.

Once you have your submit script ready, you can submit your job to the queue manager by running the qsub command with the name of your submit script as an argument:

$ qsub sge-script.sh

The job will generate various files in your current directory.  These files contain the job output and errors (if any).  When reporting any problems with your jobs, please include any errors generated by the job.
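
By default, SGE writes the job’s standard output to a file named <job-name>.o<job-ID> and its standard error to a file named <job-name>.e<job-ID> in the directory from which you submitted it.  For the example script above, the directory might look something like this after the job finishes (the job ID is illustrative):

$ ls
sge-script.sh  job-name.o12345  job-name.e12345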

Running Interactive Jobs

There may be times when you need to work with software that must be run interactively rather than in batch form.  For these kinds of situations, you don’t need to create a submit script (see Running Non-Interactive Jobs above).  Instead, you must tell the queue manager to log you into a node in the cluster for you to work on.  You do so by running the qlogin command.

$ qlogin

After the command runs successfully, you will be logged into a worker node and may proceed to run your interactive software.

Note: The qlogin command is part of the Sun Grid Engine (SGE) queue manager, which is the current queue manager at Nanobio. The command will not work with other queue managers used by other systems, such as SLURM.

Do I really need to run qlogin?

You might be wondering, “can’t I just run an interactive job without running the qlogin command?”  The problem with this approach is that Nanobio’s architecture consists of a head node that manages other so-called worker nodes.  The head node is the machine you log into when you connect to Nanobio.  Since it is meant to be used for management only, and since it is the point of connection for all users of the Nanobio cluster, running interactive jobs directly on the head node may cause it to become unstable, which would disrupt not only your work but that of all other Nanobio users.  Because of this, running jobs on the head node is against HPCf usage policy.  Always use qlogin before starting interactive computation sessions.
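
As a sketch, an interactive session boils down to requesting a node with qlogin, doing your work there, and exiting when you are done (the program name is illustrative):

$ qlogin
$ ./my-interactive-program
$ exit

Typing exit ends the interactive session, returns you to the head node, and frees the worker node for other users.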

Submitting Parallel Jobs

The -pe option is used with qsub and qlogin to specify the parallel environment to use, and it also tells SGE how many processors the job will use.  There are various parallel environments implemented on nanobio.hpcf.upr.edu; here is a list with a description of each (example commands follow the list):

  • orte – This PE is used to execute MPI jobs with OpenMPI, and it uses a fill-up allocation rule to allocate processors for a job.  It will give you any number of processors to execute a job, but it will not ensure that a complete node is set aside for the job, so a job using this PE may share resources with another job.
  • mpich – This PE is similar to orte because it uses fill-up allocation; the difference is that it uses MPICH to execute MPI jobs.
  • smp – This PE is for running shared-memory parallel jobs, which is why it only uses one node.  It can be used with any program that supports SMP computing.
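
For example, the number of slots (processors) is given right after the parallel environment name, whether you are submitting a batch job or requesting an interactive session (the slot counts and script name are illustrative):

$ qsub -pe orte 8 my-script.sh
$ qsub -pe mpich 8 my-script.sh
$ qsub -pe smp 4 my-script.sh
$ qlogin -pe smp 4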

Compiling and Executing an MPI job with OpenMPI

The Linux cluster can also be used to execute parallel jobs that are implemented with the Message Passing Interface (MPI).  When executing an MPI job, a couple of SGE environment variables need to be included in the script; the most important one in the examples below is $NSLOTS, which SGE sets to the number of slots requested with the -pe option.  The following is an example of how to compile and execute an MPI job with OpenMPI using the orte PE.

First, copy or download mpi-hello.c into your home directory:

[username@nanobio ~]$ cat mpi-hello.c
#include <stdio.h>
#include "mpi.h"

int main( int argc, char **argv )
{
    int rank, size;
    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    printf( "Hello world from process %d of %d\n", rank, size );
    MPI_Finalize();
    return 0;
}

Then compile this example with mpicc and create the executable.

[username@nanobio ~]$ /opt/openmpi/bin/mpicc -o mpi-hello mpi-hello.c
[username@nanobio ~]$ ls mpi-hello*
mpi-hello  mpi-hello.c

Then create an SGE script to submit the job.

[username@nanobio ~]$ cat mpi-script.sh
#!/bin/bash
#$ -M user-email@something.com
#$ -cwd
#$ -l h_rt=hh:mm:ss
#$ -S /bin/bash

/opt/openmpi/bin/mpirun -np $NSLOTS /home/username/mpi-hello

Then submit the job and verify the output:

[username@nanobio ~]$ qsub -pe orte 4 mpi-script.sh
Your job 8008 ("mpi-script.sh") has been submitted
[username@nanobio ~]$ cat mpi-script.sh.o8008
Hello world from process 0 of 4
Hello world from process 1 of 4
Hello world from process 2 of 4
Hello world from process 3 of 4

Compiling and Executing an MPI job with MPICH

Using MPICH to submit an MPI job is similar to using OpenMPI; the difference is in the way you compile and that it uses the mpich PE.  The following shows how to compile and execute the MPI example mpi-hello.c.

[username@nanobio ~]$ /opt/mpich2/gnu/bin/mpicc -o mpich-hello mpi-hello.c
[username@nanobio ~]$ ls -l mpich-hello
-rwxrwxr-x 1 username username 359569 Apr 28  2009 mpich-hello

Then you create an SGE script to execute the job.

[username@nanobio ~]$ cat mpich-script.sh
#!/bin/bash
#$ -M user-email@something.com
#$ -cwd
#$ -l h_rt=hh:mm:ss
#$ -S /bin/bash

/opt/mpich2/gnu/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines $HOME/mpich-hello

Next you submit the MPICH job and verify the result.

[username@nanobio ~]$ qsub -pe mpich 4 mpich-script.sh
Your job 8281 ("mpich-script.sh") has been submitted
[username@nanobio ~]$ cat mpich-script.sh.o8281
Hello world from process 2 of 4
Hello world from process 1 of 4
Hello world from process 3 of 4
Hello world from process 0 of 4

Monitoring Jobs

You can check job status using the qstat command.  When qstat is executed with no arguments, it will display a summarized list of jobs that are currently running.

[root@nanobio ~]# qstat 
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
264554 0.60500 Li12Mn9Mo3 jifishojan   r     03/24/2010 16:51:39 all.q@compute-0-13.local           4        
301989 0.50500 rafa-prim5 rarce        r     03/28/2010 19:16:50 all.q@compute-0-17.local           1        
302006 0.50500 ls_cuatro_ rarce        r     03/30/2010 07:43:35 all.q@compute-0-14.local           1        
302075 0.60500 Li3Ni9W3O2 jifishojan   r     03/30/2010 17:49:52 all.q@compute-0-12.local           4        
302273 0.50500 rafa-prim1 rarce        r     04/03/2010 18:40:30 all.q@compute-0-9.local            1       

You can also check all of your jobs submitted to the queue using the -u username argument, like this:

[root@nanobio ~]# qstat -u rarce
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
301989 0.50500 rafa-prim5 rarce        r     03/28/2010 19:16:50 all.q@compute-0-17.local           1        
302006 0.50500 ls_cuatro_ rarce        r     03/30/2010 07:43:35 all.q@compute-0-14.local           1        
302273 0.50500 rafa-prim1 rarce        r     04/03/2010 18:40:30 all.q@compute-0-9.local            1       

If you want to display a summary of the cluster’s CPU resources, use the following command:

[root@nanobio ~]# qstat -g c    
CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE  
--------------------------------------------------------------------------------
all.q                             0.20    126      0     90    256      0     40
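
If you need detailed information about a single job, for example to see why it has not started yet, you can pass its job ID to qstat with the -j option.  Using one of the job IDs from the listing above:

$ qstat -j 302006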

Environment Modules

In order to manage the large amount of software available on Nanobio, software is organized into environment modules, which are loaded by a user on demand.  Environment modules allow users to load and unload all the environment variables (such as PATH, CFLAGS, LDFLAGS, LD_LIBRARY_PATH, MANPATH) for the software installed on Nanobio.  Modules are accessed through the module application.  In this guide, you will learn how to load and unload modules and how to find software installed on Nanobio.

Let’s get started.

Module subcommands

The module command receives a subcommand as an argument. The basic subcommands are:

avail
load
unload
list

Finding available modules

To see a list of all available modules execute the avail subcommand.

$ module avail

Loading and unloading modules

These subcommands, the ones you will be using most often, are pretty much self-explanatory.

$ module load gcc/4.6.0
$ module unload gcc/4.6.0

You may specify more than one modulefile to both subcommands:

$ module load gcc/4.6.0 openmpi/1.4.3
$ module unload gcc/4.6.0 openmpi/1.4.3

You may run into errors while loading a module for two main reasons:

1) the module you are trying to load has a prerequisite; in this case, the prerequisite module has to be loaded before attempting to load this one (see the example below).

2) the module you are trying to load has a conflict with one already loaded; in this case, a module has to be unloaded before attempting to load this one.
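
For example, if a hypothetical build of openmpi/1.4.3 required gcc/4.6.0 as a prerequisite, you would load the compiler module first:

$ module load gcc/4.6.0
$ module load openmpi/1.4.3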

Listing loaded modules

Sometimes you will need to know if a modulefile is already loaded. The subcommand list will allow you to do this.

$ module list

Staging Jobs

When a set of processors is requested by a job, the different nodes share data in the home directory of the job owner by mounting this directory on each node using the Lustre file system.  Lustre is a network file system, meaning it allows a computer to access files over a network as easily as if they were on its local disks.  The problem with using Lustre in the cluster is that if a program reads or writes a lot of small files on disk, this causes a large amount of latency and the program execution can take much longer than expected.

We use staging to solve this problem:

We first need to create a directory that is local to the execution node.  On Nanobio, the /state/partition1/tmp directory is local to each machine and is not shared over the network, so we will create our own directory there.  To ensure that this directory is unique for each job, we will use the SGE environment variable JOB_ID.  We will create a variable TMPDIR whose value is the path of a per-job directory whose name is the string “tmpdir” followed by the value of JOB_ID.  Because this TMPDIR is local to the node, accessing data located in TMPDIR does not require a request to Lustre.  Before the job starts executing, we copy all the files the job needs into TMPDIR.  We also change the current directory to TMPDIR so that all output files generated by the job are written into TMPDIR.  At the end of the job, all the files should be copied from TMPDIR to the home directory of the job owner, and TMPDIR should be removed from the system.  An SGE script that uses staging would look like this:

##########Start of my script#########################

# Directives and options for SGE
#$ ...
#$ ...
#$ ...

# Create TMPDIR variable and create the TMPDIR directory
export TMPDIR=/state/partition1/tmp/tmpdir.$JOB_ID
mkdir $TMPDIR

# Copy necessary files to TMPDIR
cp inputfile1 $TMPDIR
cp inputfile2 $TMPDIR
...
cp inputfilen $TMPDIR

# Change directory to TMPDIR
cd $TMPDIR

# Execution of program
./myProgram

# Copy output files from TMPDIR to HOME
cp outputfile1 $HOME
cp outputfile2 $HOME
...
cp outputfilem $HOME

# Remove TMPDIR
rm -rf $TMPDIR
##############End of my script#######################