19 Jun

Old Cluster Nanobio’s Retirement

We at the HPCf have decided that our old cluster Nanobio (Boqueron’s predecessor) will be retired at the end of July of this year. On August 1st, 2017, Nanobio will be powered down, and any data left on any of its file systems (including /home and /work) will no longer be available from that point onward. As we have done since Boqueron went online in March 2016, we encourage all users who still have data on Nanobio to move or copy it to another system as soon as possible. Transferring data to Boqueron is one option, but please keep in mind that, per our usage policies, users should transfer to Boqueron only the data they will actually need for their work there; Boqueron is not designed or intended for long-term storage. Users are responsible for ensuring they do not incur any data loss as a result of Nanobio’s retirement.

Any users whose HPCf accounts were created after Boqueron went online may disregard this notice since they do not have a Nanobio account. As always, if you have any questions or comments, feel free to send them to help@hpcf.upr.edu.

26 Jan

Boqueron Scheduled Partial Downtime Jan 30 – Jan 31, 2017

The staff of the data center where Boqueron is hosted will be carrying out electrical work next week that will impact Boqueron.  To accommodate this work and to protect as many jobs as possible from being killed unexpectedly, we have opened a maintenance window starting at 7:30 am on Monday, January 30th and ending at 12:00 pm on Tuesday, January 31st.  Any newly submitted jobs that cannot complete before the maintenance window begins will be held in the queue until the window ends.
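Slurm decides whether a job "fits" before a window by comparing the job's requested wall-clock limit against the window's start time, so giving your job an explicit, realistic time limit helps it run sooner. A minimal job-script sketch (the script name, job name, and program are placeholders, not anything specific to Boqueron):

```shell
# Write a hypothetical job script with an explicit 4-hour time limit.
# Jobs whose --time fits before the maintenance window may still be
# scheduled; jobs that don't fit are held until the window ends.
cat > /tmp/myjob.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=myrun
#SBATCH --time=04:00:00   # wall-clock limit (HH:MM:SS)
srun ./my_program         # placeholder for your actual command
EOF
cat /tmp/myjob.sbatch
```

Submitting it with `sbatch /tmp/myjob.sbatch` then lets Slurm schedule around the window.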

During this window, the Boqueron login node will continue to operate, and the file systems /home and /work should continue to be available, so you should be able to access your files during this time.  The worker nodes will be powered down, however.

Jobs that are currently running will be allowed to continue, but any that are still running when the maintenance window begins will unfortunately be killed.

We realize this announcement comes on rather short notice, but please understand that the HPCf staff was only notified of this electrical work yesterday afternoon.

We apologize for any inconvenience you may experience from this maintenance window, and we thank you for your cooperation.  As always, if you have any questions or comments, please send them to help@hpcf.upr.edu.

31 Oct

Boqueron Power Outage During the Halloween Weekend

An unexpected power outage in the data center where Boqueron is hosted took down 60 of Boqueron’s nodes over the weekend. All jobs running on those nodes were killed.  We have since brought the nodes back up, and they should now be working as usual.

To make sure the entire cluster has recovered from the outage, the 20 nodes that remained up during the outage (nodes 41-60) will be rebooted at a later time; until then, they will remain closed to new jobs (Slurm will list them as either “draining” or “drained”). The jobs currently running on them will be allowed to finish uninterrupted.

The outage did not affect Boqueron’s login node or Boqueron’s /home or /work file systems.  Other HPCf services such as our web site, however, were affected.  They are also back online, and you should not experience any issues when using them.

We are currently in talks with the administrators at the data center to assess the situation and prevent it from occurring again.

As always, if you have any questions, problems, or comments, you may write to us at help@hpcf.upr.edu.

15 Aug

/work Filesystem Outage Has Been Fixed

Early today, Boqueron’s /work had a hiccup and went offline for a few hours.  This caused some odd behavior, including failed sign-ins through certain software such as SCP clients.  We have fixed the issue, but because jobs run out of /work, the jobs still running when /work went offline were eventually killed.  Everything should be back to normal now, and you should be able to resubmit your jobs without problems.  No data loss should have occurred as a result of this error.

Please remember that, per the HPCf Usage Policies, data in /work is not backed up.  Always make sure to move important data off /work and into a more permanent storage location, such as a computer in your laboratory or a personal workstation.
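A simple pattern for getting data off /work is to bundle a results directory into one compressed archive first, which makes the subsequent copy (with scp, rsync, etc.) faster and easier to verify. A sketch with placeholder paths standing in for a real /work project directory:

```shell
# Create a demo "results" directory standing in for /work/<user>/project.
mkdir -p /tmp/work_demo/project
echo "important output" > /tmp/work_demo/project/output.dat
# Bundle it into one compressed tarball (-c create, -z gzip, -f file);
# -C changes directory first so the archive holds relative paths.
tar -czf /tmp/work_demo/project.tar.gz -C /tmp/work_demo project
# List the archive's contents to verify it before deleting anything.
tar -tzf /tmp/work_demo/project.tar.gz
```

Verifying the listing (and ideally a checksum) before removing the originals is cheap insurance against an incomplete transfer.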

As always, if you have any other issues or questions, please let us know at help@hpcf.upr.edu.

01 Jun

Boqueron Unscheduled Maintenance During Morning of June 1st, 2016

As you may have noticed, Boqueron underwent unscheduled, urgent maintenance earlier today.  This maintenance unfortunately killed all running and pending jobs.

What happened?

An issue in the way Slurm (the queue manager) was interacting with the cluster manager software we use on Boqueron (Bright CM) was causing Slurm’s configuration to be rewritten spontaneously when certain actions were taken on the cluster.  Every time this happened, jobs were killed.  Today, we contacted Bright support, and they were kind enough to help us out through a live screen-sharing session.  The changes they had to make required the Slurm configuration to be rewritten, much like those other times, so jobs were killed earlier today as well.

Is the issue resolved?

Yes.  The issue had been bugging us for a few weeks, but it should now be completely resolved.

Can I submit jobs again?  Won’t they get killed again?

Yes, you may submit jobs again; and no, they should not get killed again.  No system is perfect, but the solution we arrived at today with Bright support should result in continuous, stable queue operation under normal, day-to-day circumstances.

But I’m afraid they’ll get killed again!

You shouldn’t be afraid.  We recognize that the recent shakiness of the queues has likely caused user confidence to drop, but again, the core issue should now be resolved and we expect stable times for our cluster. *knocks on wood*

We do apologize for the inconvenience this has created.  For any further questions or comments, feel free to contact us at help@hpcf.upr.edu.

10 May

Notice of Boqueron Scheduled Maintenance – Wed May 18, 2016

Boqueron will be undergoing scheduled maintenance on Wednesday, May 18th, 2016.  Effective immediately, any job that cannot complete its run by 12:00 am on May 18th will not be allocated until maintenance is over.  We have reserved a maintenance window starting at 12:00 am and ending at 1:00 pm that same day, though we anticipate the maintenance operations will take much less time than that.  We will email another notice when maintenance is done so that you may submit jobs again.

Reason for Maintenance Window

As some of you may have noticed, Boqueron is currently set up in a way that allows a single compute core to run multiple jobs at once.  That is to say, a user’s job currently does not actually reserve compute cores; the cores are shared among various jobs.  This is not Boqueron’s intended behavior.  Not only does this hurt job performance, it also places a heavy load on the compute nodes.

To fix this, Slurm (the resource manager) must be switched off and reconfigured.  Switching off Slurm would kill any jobs that are running at the moment, so we need to create a maintenance window to ensure that no jobs are killed in the process.
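For the curious: in Slurm, whether cores are shared or exclusively allocated is governed by the selection plugin configured in slurm.conf.  We aren't publishing our exact configuration here, but a typical fix for unintended core sharing looks something like the following fragment (illustrative values, not our actual settings):

```ini
# slurm.conf (fragment) -- illustrative values only.
# select/cons_res tracks consumable resources instead of whole nodes,
# and CR_Core makes the consumable unit a core, so each allocated core
# belongs to exactly one job at a time.
SelectType=select/cons_res
SelectTypeParameters=CR_Core
```

Changing SelectType is one of the settings that requires Slurm to be fully restarted, which is why a maintenance window is needed rather than a live reconfiguration.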

What effect will this change have on future jobs?

After the change, we anticipate that all user jobs will run much faster than they do right now.  There’s a small trade-off, though: since cores will now be reserved, we’ll start seeing jobs actually waiting in line to be allocated.  So far, because of the core sharing, most jobs run almost as soon as they are submitted (but with considerably degraded performance).  We anticipate this will change: jobs will actually have to wait before being allocated, but they will run much faster once they are.

As always, if you have any questions, feel free to contact us at help@hpcf.upr.edu.

16 Mar

Boqueron – All Users Have Been Added, plus VASP and Gaussian 09

The build of our new cluster Boqueron has progressed well in the past few weeks, and we are pleased to announce that all currently-registered users have finally been added to Boqueron.  If you believe that you were registered to receive an account but have not received one yet, please write to help@hpcf.upr.edu so that we can help you out.

Additionally, Gaussian 09 and VASP 5 are now available on our cluster for those users who are authorized by the programs’ respective licensors to use them.  If you need any help getting them set up, please let us know.  VASP in particular is still being tested on Boqueron, since some users have run into errors that appear to be known VASP issues.

So if you will use VASP (which you may do only if we have received prior written authorization from VASP that you may do so), please keep in mind that we are still on a test drive, and please report any errors to us so that we can get VASP running well as soon as possible.

08 Mar

New Cluster Build – Beta Launch is Live

We are pleased to announce that Boqueron’s beta has launched.  We are adding users progressively instead of all at once in order to ease the transition from Nanobio.  Users will know that they have been added when they receive an email message with details on logging in to Boqueron.  If you haven’t received your email message yet, you will soon.  Users are encouraged to explore the cluster and compare how it works with how Nanobio works.

Software is still being added to the cluster, but even if your software is not yet available, you can start working on becoming familiar with Boqueron and transferring the files you will need for your work.

Also, please remember to request the software that you will need if it has not yet been added to Boqueron.  You can do so by sending us a message at help@hpcf.upr.edu with the subject line

[Boqueron Software Request]

We thank you for your patience throughout this long process and hope that you will enjoy using Boqueron for your work.

26 Feb

New Cluster Build – Finishing Touches and Beta Launch


We are pleased to announce that our new cluster build has been making great strides lately and that we expect our new cluster to be up and running sometime next week (finally!).  We can now reveal that the name of our new cluster will be Boqueron (pronounced boh-keh-ron) and that it will initially feature over 2,200 computation cores and 200 TB of /work space served over a 10 Gbps and QDR InfiniBand backbone.

Registered users should soon be receiving email messages with details on logging in to Boqueron.  When the cluster launches, however, you might notice that not all the software you used on Nanobio is readily available on Boqueron.  This is by design.  Instead of importing all the software on Nanobio that users may or may not actually use, we want you to request the software that you will actually use so that we have a clean slate with Boqueron.  In that sense, Boqueron’s launch will be a sort of beta launch.

What this means for you

We need our users to request the software that they will actually use on Boqueron.  You can either wait until the launch of the new cluster to do so, or you can do so right now.  Just drop us a line at help@hpcf.upr.edu with the subject line

[Boqueron Software Request]

and we will queue your requested software for installation.  We can’t guarantee that all the requested software will be available at launch time, but the sooner you request it, the sooner it’ll be installed.

A note on license-based software

All software has a license of some sort.  However, some licenses are more restrictive than others.  This “more-restrictive-license” software is what we refer to as “license-based software”.  License-based software might take longer to become available on Boqueron precisely because of its more restrictive, and sometimes for-profit, nature.  Some vendors will readily transfer our Nanobio licenses to Boqueron, but others have much stricter processes and policies that will make the transfer over to Boqueron slower.  We ask for your patience regarding the availability of this software.

You might be wondering: “Well, why did you wait until the cluster was ready to launch to deal with the licenses?”  The answer is that we have to wait until our system is actually up and running before we can ask vendors to authorize it for running their software.  We can’t ask vendors to authorize a machine that doesn’t exist yet.

What software is currently available on Boqueron?

Just like on Nanobio, once you log in to Boqueron, you can run

$ module avail

to see what software is currently installed on Boqueron.  We will offer a preview, however, of the software that is already installed so that you don’t have to request it:

samtools 1.2
bcftools 1.2
bowtie2 2.2.6
boost (headers only) 1.59.0
python2 2.7.11
python3 3.5.1
pip2 7.1.2
jdk 1.8.0_65
R 3.2.3
trinityrnaseq 2.1.1
dsk 2.0.7
kanalyze 0.9.7
jellyfish 2.2.4
tophat 2.1.0
gatk/queue 3.5
trinotate 2.0.2
ncbi-blast+ 2.2.31
hmmer 3.1b2
signalp 4.1
tmhmm 2.0c
rnammer 1.2
Blast+ dbs:
 - SwissProt
 - Uniref90
HMMER dbs:
 - Pfam-A

Do you have documentation on Boqueron available?

Boqueron’s documentation is still a work in progress, but you are free to browse it if you wish to start becoming familiar with the new cluster.  It’s available here.

As always, we’d like to remind you that any dates we give for the new cluster are always subject to change.  That’s just the nature of working with computing resources.  Unforeseen circumstances always come up.  We thank you for your patience throughout this long process.  If you have any further questions or comments, feel free to contact us at help@hpcf.upr.edu.