18 Jul

Boqueron /work Unexpected Downtime Resolved

Boqueron’s /work file system recently experienced unexpected downtime due to a hardware issue. Our colleagues at Central Administration provided the replacement hardware we needed, and the issue now appears to be resolved. As far as we can tell, no data was lost; however, we still encourage users to connect to Boqueron and verify the integrity of their files.

We’d also like to take this opportunity to remind our users that, per our Usage Policies, the /work directory in Boqueron is built for performance, not reliability, and that data in users’ /work directories is not and will not be backed up. Users are responsible for maintaining their own backups of any data they keep in /work.
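For users who keep their own backup copy, one possible way to check integrity is to compare checksums of the files in /work against that copy. The following is only a minimal Python sketch, not an official HPCf tool: the function names are hypothetical, and it assumes you have a backup of the same directory tree available somewhere you can read.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            manifest[os.path.relpath(full_path, root)] = sha256_of_file(full_path)
    return manifest


def compare_manifests(work, backup):
    """Return (missing, mismatched) paths in work, relative to the backup."""
    # Files present in the backup but absent from the /work copy.
    missing = sorted(p for p in backup if p not in work)
    # Files present in both copies whose contents differ.
    mismatched = sorted(p for p in backup if p in work and work[p] != backup[p])
    return missing, mismatched
```

To use it, build one manifest for your directory under /work and one for your backup, then pass both to compare_manifests; the exact paths depend on your account and are not specified here. Two empty lists mean every file in the backup is also present in /work with identical contents. Files that exist only in /work are not reported.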

We apologize for any delay in communication regarding this issue. The HPCf has been understaffed for the past two weeks, which is precisely when the hardware problem arose. All HPCf staff should be back at the office starting Monday, July 23.

We apologize for the inconvenience this issue has no doubt caused, and we thank you for your patience and continued support for the HPCf. As always, you may direct any questions or comments to help(at)hpcf.upr.edu.

19 Jan

Boqueron Power Outage on Friday January 19, 2018

On the morning of Friday, January 19, there was a power outage at the data center where HPCf computers are hosted. This outage was outside of HPCf’s control, and it unfortunately knocked out 60 of Boqueron’s nodes, which resulted in many jobs being killed unexpectedly. Power has been restored, and most of Boqueron’s features should now be working as normal. Some of the nodes may take a bit longer to reconnect to the cluster, however, so we kindly ask for your patience in this matter. We apologize for the inconvenience this situation has caused our users. If you have any questions or experience any errors, please write to us at help(at)hpcf.upr.edu, and we will gladly help you out.