Infrastructure Maintenance Windows Principle
Univ of Hawaii - ITS Technical Architecture - Principle
Principle
To the greatest extent possible, potentially disruptive infrastructure maintenance should occur within predefined maintenance windows.
Sunday 6:00AM to noon will be the standard infrastructure maintenance window. A handful of applications (detailed below) will have their own maintenance windows, which apply except in unusual and rare situations where very broad outages are necessary.
Tuesday before 10:00AM will be the standard infrastructure maintenance window for non-production systems.
Production Maintenance Windows
The table below defines the maintenance windows for each ITS-operated system. This table applies to production systems and to systems that have been designated “production-like” by application teams (e.g., the KFS team keeps the QA environment identical to production at all times).
Systems | Time |
All Applications (except those listed in following rows) | Sunday, 6:00AM to Noon |
Laulima | Friday, 10:00PM to Midnight |
Luminus | Friday, 10:00PM to Midnight |
HPC Cluster | Wednesday, 8:00AM to 5:00PM |
PeopleSoft | Sunday, 6:00AM to 11:00AM |
HR Data Mart | Sunday, 6:00AM to 11:00AM |
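As an illustration only, the table above could be encoded so that a proposed start time can be checked against a system's window. This is a minimal sketch in Python, not an existing ITS tool: the names `DEFAULT_WINDOW`, `EXCEPTION_WINDOWS`, `production_window` and `in_production_window` are hypothetical, and times are assumed to be naive Hawaii local times.

```python
from datetime import datetime, time

# Weekday numbers follow datetime.weekday(): Monday is 0, Sunday is 6.
WEDNESDAY, FRIDAY, SUNDAY = 2, 4, 6

# Each window is (weekday, start, end). An end of None means "until midnight".
DEFAULT_WINDOW = (SUNDAY, time(6, 0), time(12, 0))

# Exceptions copied from the table above.
EXCEPTION_WINDOWS = {
    "Laulima": (FRIDAY, time(22, 0), None),
    "Luminus": (FRIDAY, time(22, 0), None),
    "HPC Cluster": (WEDNESDAY, time(8, 0), time(17, 0)),
    "PeopleSoft": (SUNDAY, time(6, 0), time(11, 0)),
    "HR Data Mart": (SUNDAY, time(6, 0), time(11, 0)),
}


def production_window(system):
    """Return the window for a system, falling back to the standard one."""
    return EXCEPTION_WINDOWS.get(system, DEFAULT_WINDOW)


def in_production_window(system, when):
    """True if `when` (naive local time) falls inside the system's window."""
    weekday, start, end = production_window(system)
    if when.weekday() != weekday:
        return False
    current = when.time()
    return current >= start and (end is None or current < end)
```

Any system not named in the table falls back to the Sunday window, which mirrors the "All Applications" row.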
Non-production Maintenance Windows
Tuesday morning before 10:00AM will be the standard window for maintenance of non-production systems. Usage of the Tuesday maintenance window should be announced by noon Monday.
Maintenance outside of this window should be negotiated with affected teams as necessary.
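The non-production rule above can be sketched the same way. The Python below is illustrative only: `in_nonprod_window` and `announce_by` are hypothetical names, times are assumed to be naive Hawaii local times, and because the principle gives no start time for the Tuesday window, the sketch assumes it opens at midnight.

```python
from datetime import date, datetime, time, timedelta

TUESDAY = 1  # datetime.weekday(): Monday is 0


def in_nonprod_window(when):
    """True if `when` is on a Tuesday before 10:00AM (assumed to open at midnight)."""
    return when.weekday() == TUESDAY and when.time() < time(10, 0)


def announce_by(maintenance_day):
    """Return noon on the Monday before a Tuesday maintenance day."""
    if maintenance_day.weekday() != TUESDAY:
        raise ValueError("the standard non-production window falls on a Tuesday")
    monday = maintenance_day - timedelta(days=1)
    return datetime.combine(monday, time(12, 0))
```

Work planned for any other day raises an error in this sketch, since such maintenance is negotiated with affected teams rather than announced on the standard schedule.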
Maintenance types
ITS recognizes two types of maintenance that must occur:
Disruptive maintenance: Maintenance that is expected to cause an outage to one or more systems.
Non-disruptive maintenance: Maintenance that is not expected to cause an outage to any systems.
Disruptive maintenance should occur during system maintenance windows.
Non-disruptive maintenance for production systems can occur at any time, but should occur outside of business hours (weekdays 7:45AM to 4:30PM) if there is concern about accidental disruption of service. Non-disruptive maintenance for non-production systems can occur at any time.
Non-disruptive maintenance should not occur during peak business periods (e.g. start of term, year-end-close), if it could affect those systems.
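The three timing rules above can be read as a small decision procedure. The following Python sketch is one possible reading, not an official ITS check: `in_business_hours` and `timing_advice` are hypothetical names, the advice strings are invented for illustration, and the caller is assumed to already know whether the work falls in the system's maintenance window, whether accidental disruption is a concern, and whether it could affect a system in a peak business period.

```python
from datetime import datetime, time

# Business hours as stated in the principle: weekdays 7:45AM to 4:30PM.
BUSINESS_START, BUSINESS_END = time(7, 45), time(16, 30)


def in_business_hours(when):
    """True on Monday through Friday between 7:45AM and 4:30PM (local time)."""
    return when.weekday() < 5 and BUSINESS_START <= when.time() < BUSINESS_END


def timing_advice(*, disruptive, production, in_window, when,
                  accident_risk=False, affects_peak_system=False):
    """Return a short advisory string for a proposed maintenance time."""
    if disruptive:
        # Disruptive work belongs in the system's maintenance window.
        return "ok" if in_window else "move to maintenance window"
    if affects_peak_system:
        # Non-disruptive work should still avoid peak business periods.
        return "avoid: peak business period"
    if production and accident_risk and in_business_hours(when):
        return "prefer outside business hours"
    return "ok"
```

The principle phrases these rules as "should", so the function returns advice rather than a hard pass or fail.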
Maintenance urgency
Additionally, maintenance work may have different levels of urgency. ITS has identified two levels of urgency for maintenance:
Critical: Maintenance that is very time sensitive. Often includes maintenance that addresses a specific ongoing technical problem, fixes a security hole, or addresses some other serious system flaw.
Other: Maintenance that is not very time sensitive. Often this includes upgrades, less urgent security patches and other system improvements.
Critical maintenance of production systems should be announced to ITS stakeholders as early as possible, generally by the Tuesday Change Management meeting immediately before the maintenance.
Other (non-Critical) disruptive maintenance of production systems should be communicated thirty days in advance for most applications, although some smaller applications may require less advance warning.
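To make the two notice periods concrete, the sketch below computes the latest date by which production maintenance would generally need to be announced. It is a hedged illustration in Python: `latest_notice_date` is a hypothetical name, the Change Management meeting is assumed to fall on the Tuesday strictly before the maintenance day, and the thirty-day figure is a parameter because smaller applications may need less warning.

```python
from datetime import date, timedelta

TUESDAY = 1  # datetime.weekday(): Monday is 0


def latest_notice_date(maintenance_day, critical, other_lead_days=30):
    """Return the latest date by which stakeholders should be notified."""
    if critical:
        # Step back to the most recent Tuesday strictly before the maintenance
        # day (a full week back if the maintenance itself is on a Tuesday).
        days_back = (maintenance_day.weekday() - TUESDAY - 1) % 7 + 1
        return maintenance_day - timedelta(days=days_back)
    # Other (non-Critical) disruptive maintenance: thirty days by default.
    return maintenance_day - timedelta(days=other_lead_days)
```

For maintenance on Sunday, August 9, 2015, this gives Tuesday, August 4 for Critical work and July 10 for Other work. Critical maintenance should still be announced as early as possible; the computed date is only the general outer limit.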
The intention of this process is for the infrastructure group to provide advance notification to key ITS stakeholders (such as application team members) so they can adequately plan for the impact of the change. Part of this process is the ability for stakeholders to raise concerns and suggest alternatives. The infrastructure team should schedule maintenance thoughtfully, allow feedback from stakeholders, and make appropriate adjustments.
End-user Communication
This principle does not propose changes to how ITS communicates; rather, we propose leveraging existing communication channels exactly as they are used today. For example, application teams may announce maintenance outages via their own email lists, and infrastructure teams may announce maintenance outages through the ITS web site.
Very rarely, a broad multi-system outage may be necessary as part of maintenance. If this occurs, a special communication plan may need to be developed by a cross-ITS team.
Departmental Systems
ITS Contract Services supports many local departmental systems. Contract Services should perform maintenance on schedules appropriate for each department, so those systems are outside the scope of this document.
Departments may be affected by infrastructure changes to back-end systems, so departmental SLAs should reflect that departments may be affected by the standard ITS maintenance window. In that case, Contract Services can handle communication responsibilities.
Change History
Approved August 2015