The data center that hosts the Boqueron cluster will undergo electrical maintenance from Saturday, February 15 to Sunday, February 16. This work will require a full power-down of Boqueron this Wednesday, February 12. After Boqueron is brought back online (which should happen the same day), it will operate at reduced capacity until the electrical work at the data center is completed. "Reduced capacity" means that a significant portion of Boqueron's nodes will remain offline to lighten the load on the data center's circuits.
Because of this, we have placed some limits on the jobs Boqueron can currently accept. Specifically, Boqueron will not accept any job that cannot finish running by 8:00 am on Wednesday until the cluster has been powered down and brought back online. Furthermore, once Boqueron is back online, only half of its nodes will be available to run jobs; the remaining half will not accept any jobs until the electrical maintenance at the data center is completed.
All of Boqueron's nodes will be powered down on Wednesday morning. Jobs that are currently running will continue as usual, but any job still running when the nodes are powered down will unfortunately be killed.
We realize this information comes on somewhat short notice, but we at the HPCf were only recently informed that the electrical work would take place this weekend, so we could not alert our users any earlier. We apologize for any inconvenience this may cause for your work.
Half of Boqueron's nodes are scheduled to be powered back on on Wednesday, February 12, and the rest on Monday, February 17. The following nodes will remain powered off throughout the weekend: node011 through node020, node031 through node040, node051 through node060, and node071 through node080.
As always, if you have any questions or comments, please send them to help(at)hpcf.upr.edu.