The first non-holiday Wednesday of each month is dedicated to performing maintenance on Mana. Maintenance typically begins between 9 AM and 10 AM and can run as late as 6 PM. If maintenance is expected to run longer than planned, an additional email notification will be sent during the maintenance window with an updated estimate of when it will conclude.
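
The rule in the first sentence can be computed to find a given month's maintenance day. The Python sketch below assumes the caller supplies the set of holiday dates; which holidays count is not specified here, and the function name `maintenance_day` is hypothetical.

```python
import calendar
import datetime as dt


def maintenance_day(year: int, month: int, holidays: set) -> dt.date:
    """Return the first Wednesday of the month that is not a holiday.

    `holidays` is a caller-supplied set of datetime.date objects (hypothetical
    input; the actual holiday calendar is not specified).
    """
    day = dt.date(year, month, 1)
    # Move forward 0-6 days to reach the first Wednesday of the month.
    day += dt.timedelta(days=(calendar.WEDNESDAY - day.weekday()) % 7)
    # Skip any Wednesday that falls on a holiday.
    while day in holidays:
        day += dt.timedelta(weeks=1)
    if day.month != month:
        raise ValueError("no non-holiday Wednesday in this month")
    return day
```

For example, January 1, 2025 fell on a Wednesday and is a holiday (New Year's Day), so for that month the sketch returns January 8, 2025.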

...

Each type of maintenance below lists its frequency, the systems (nodes) affected, how updates are applied, and the impact on users.

Routine patching
Frequency: 9 to 11 times a year

Border systems:

  • Login nodes
  • Open OnDemand nodes
  • Data transfer nodes

Internal systems:

  • Compute nodes
  • Firewall
  • File Systems
Updates to all nodes except the compute nodes are applied in a rolling fashion by power cycling them during the maintenance window. Compute nodes are updated by having the scheduler first drain each node (no new jobs are started on it) and then power cycling the node once it is idle. After restarting, the node is placed back into service.

Interactive jobs may be impacted, but batch jobs in non-preempting partitions should not be.

Momentary disruptions to the storage systems and to access via the border systems will occur as the border systems are rebooted for patching.
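
The drain-then-reboot procedure for compute nodes can be illustrated with a small Python simulation. Everything in it is hypothetical (the `ComputeNode` class, the integer "ticks" of time, and the node names): it models only the order of the steps described above, not how the scheduler actually implements them, and it leaves out the border systems.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeNode:
    # Hypothetical model: remaining run time (in ticks) of each job on the node.
    name: str
    job_ticks_left: list[int] = field(default_factory=list)
    state: str = "IN_SERVICE"
    patched: bool = False


def rolling_update(nodes: list[ComputeNode]) -> list[tuple[int, str]]:
    """Drain every node, then reboot each one as soon as it becomes idle.

    Returns (tick, node name) pairs in the order nodes returned to service.
    """
    # Step 1: the scheduler stops placing new jobs on every node.
    for node in nodes:
        node.state = "DRAINING"

    returned: list[tuple[int, str]] = []
    tick = 0
    while any(node.state == "DRAINING" for node in nodes):
        for node in nodes:
            if node.state == "DRAINING" and not node.job_ticks_left:
                # Step 2: the node is idle, so power cycle it to apply updates.
                node.patched = True
                # Step 3: after restarting, put the node back into service.
                node.state = "IN_SERVICE"
                returned.append((tick, node.name))
        # Advance time: running jobs make progress and finished jobs drop off.
        tick += 1
        for node in nodes:
            node.job_ticks_left = [t - 1 for t in node.job_ticks_left if t > 1]
    return returned
```

An idle node returns to service immediately, while a busy node returns only after its last job finishes, so running batch jobs are allowed to complete.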

Core infrastructure updates
Frequency: 1 to 3 times a year

Border systems:

  • Login nodes
  • Open OnDemand nodes
  • Data transfer nodes

Internal systems:

  • Compute nodes
  • Switches
  • File systems
  • Firewall

All jobs, regardless of progress, are either halted and requeued or, if the job was submitted as non-requeueable, canceled.

User access is denied, and all servers are idled or powered down for the duration of the maintenance window.

Once maintenance is complete, nodes are restarted and requeued jobs are released to run again. Jobs that do not save checkpoints will have lost any progress made before the downtime.
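
Because a requeued job starts over, saving checkpoints is what preserves progress across a maintenance downtime. The Python sketch below shows the general pattern of periodically writing state to disk and resuming from it on restart. The file name, JSON format, and `run_with_checkpoints` function are hypothetical placeholders for whatever state a real application would save.

```python
import json
import os
from pathlib import Path
from typing import Optional


def run_with_checkpoints(checkpoint: Path, total_steps: int,
                         stop_after: Optional[int] = None) -> dict:
    """Sum 0..total_steps-1 as placeholder work, checkpointing every step.

    `stop_after` simulates the job being halted by maintenance after that
    many steps in the current run.
    """
    # Resume from the last saved state if a checkpoint exists.
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text())
    else:
        state = {"step": 0, "total": 0}

    steps_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_run == stop_after:
            return state  # simulated interruption; progress is already on disk
        state["total"] += state["step"]  # one unit of placeholder work
        state["step"] += 1
        steps_this_run += 1
        # Write a temporary file, then swap it in, so an interruption
        # mid-write does not leave a half-written checkpoint.
        tmp = checkpoint.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        os.replace(tmp, checkpoint)
    return state
```

In practice, checkpoint often enough to limit lost work but not so often that writing state dominates the run time.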

A maintenance reservation is placed on the day of maintenance, starting at 8 AM HST.

Batch jobs whose requested time limit would run past the start of the maintenance reservation remain queued until the reservation ends, and only then are they allocated resources.

Interactive jobs that request a time limit overlapping the maintenance reservation will be denied by the scheduler.

Jobs that do not overlap the maintenance reservation will be scheduled as normal.
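
The reservation rules above can be expressed as a small decision function. This Python sketch is hypothetical: it assumes overlap is judged from the job's requested time limit with an earliest start equal to the submission time, and the reservation end time must be supplied because it is not stated here. Hawaii does not observe daylight saving time, so HST is a fixed UTC-10 offset.

```python
import datetime as dt

# Hawaii does not observe daylight saving time, so HST is always UTC-10.
HST = dt.timezone(dt.timedelta(hours=-10), "HST")


def reservation_decision(submit: dt.datetime, time_limit: dt.timedelta,
                         interactive: bool, res_start: dt.datetime,
                         res_end: dt.datetime) -> str:
    """Return "schedule", "wait", or "deny" for a job versus a reservation.

    Assumes (hypothetically) the job could start at `submit` and would run
    for its full requested `time_limit`.
    """
    job_end = submit + time_limit
    # Two intervals overlap if each one starts before the other ends.
    overlaps = submit < res_end and job_end > res_start
    if not overlaps:
        return "schedule"  # scheduled as normal
    if interactive:
        return "deny"      # interactive jobs that would overlap are refused
    return "wait"          # batch jobs stay queued until the reservation ends


def max_time_limit(submit: dt.datetime, res_start: dt.datetime) -> dt.timedelta:
    """Longest time limit that still finishes by the reservation start."""
    return max(res_start - submit, dt.timedelta(0))
```

For example, a batch job submitted at 8 PM HST the evening before maintenance can request at most 12 hours and still be scheduled normally; requesting 13 hours would leave it waiting until the reservation ends.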

...