Maintenance windows are periods of time during which a service is made unavailable to allow for maintenance operations. They are announced in advance, so that users are not caught by surprise and can plan to use the service before or after the maintenance window. Typically, maintenance windows are scheduled during off-peak hours.
Some organisations try to avoid maintenance windows as much as possible, to maximise their uptime. I think maintenance windows are a good thing for IT teams, and in this article I’ll explain why.
(And yes, the purpose of this article is to convince our customers who are reluctant to schedule maintenance.)
TL;DR: Periodic maintenance windows can be healthy both for your team and your infrastructure. Business won’t be damaged.
Maintenance windows are not incidents
Let me start with something that is obvious to everyone in theory, but that some people overlook in practice: maintenance windows are not incidents!
Users are told about maintenance windows. They know in advance that they won’t be able to use the service at a certain time. In most cases they should be able to organise their activities so that the maintenance window doesn’t harm them.
Incidents are bad for reputation. Maintenance windows, if used wisely, are not. No one will think that your organisation is not reliable because it scheduled some maintenance! Of course you should try to avoid having too many maintenance windows.
Updates and optimisations
You should always have some redundancy but, in practice, many services out there rely on standalone servers. When those servers are down, the service is down. Even clusters sometimes require a restart, and not many services rely on multi-cluster high-availability solutions. So an upgrade or certain configuration changes may require the service to be unavailable for a while.
But you can’t avoid upgrades and configuration changes, and you can’t delay them for long. You’d end up with obsolete versions, known security bugs, suboptimal configuration and performance, and so on. Scheduling a maintenance window is far better.
Check your skills
Does your team know what to do when a server crashes or needs to be restarted? What if some configuration error or some other anomaly prevents a server from restarting? You should check the logs, understand the messages you read, and take action. But you need proper skills and knowledge to do so.
You don’t want to find out that you don’t know what to do after an unscheduled crash!
A maintenance window is different. The scheduled downtime is based on prudent estimates, so if something goes wrong you should have enough time to find the problem and fix it. Remember to document any problems you encounter. Knowledge acquired during scheduled maintenance windows can be valuable.
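To make "prudent estimates" a bit more concrete, here is a minimal sketch of how a window could be sized. Everything in it is an assumption made for illustration: the `plan_window` function, the step names, the durations, the date and the safety factor of 2 are hypothetical, not a recommendation. The idea is simply to pad the planned work generously and to reserve time for a rollback on top of that.

```python
from datetime import datetime, timedelta


def plan_window(start, step_minutes, rollback_minutes, safety_factor=2.0):
    """Return (start, end) for a maintenance window.

    The planned work is multiplied by safety_factor to leave room for
    troubleshooting, and the rollback time is added on top so that there
    is still time to back out if the work cannot be completed.
    """
    if safety_factor < 1.0:
        raise ValueError("safety_factor must be at least 1.0")
    planned = sum(step_minutes.values())
    total = planned * safety_factor + rollback_minutes
    return start, start + timedelta(minutes=total)


# Hypothetical steps and durations (in minutes), for illustration only.
steps = {"stop service": 5, "upgrade packages": 20, "restart and verify": 15}
start, end = plan_window(datetime(2024, 1, 14, 2, 0), steps, rollback_minutes=20)
print(start, "->", end)
```

With these made-up numbers, 40 minutes of planned work becomes a 100-minute window. The right safety factor depends on how well your team knows the procedure, which is exactly the kind of knowledge that documented maintenance windows help you build.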
Fear of the unknown
Sometimes restarting a service, a server or a cluster may be highly desirable but, strictly speaking, avoidable. The decision whether to do it or not may depend on several factors, and one of them may be your fear, whether you admit it or not.
If you’re afraid to restart something, it means you’re not confident in its good health, or in your team’s skills. This lack of confidence has surely caused problems already, and will cause even more. It shouldn’t be the case. Use maintenance windows to strengthen your confidence.
Maintenance windows allow organisations to do the necessary maintenance and be more comfortable with normal operations. The alternative is to live with a perpetual lack of confidence, and with obsolete and non-optimised systems. Use maintenance windows wisely, whenever they’re necessary.