A checklist for Database backup automation

by Federico Razzoli | Jan 6, 2022 | Strategy

A typical question we ask new customers is: do you have proper backups in place? Some of them have some form of backups, but answer no. The others give a wrong answer. Here’s a checklist of things you should have in place.

Checklist

If you want to have reliable, rock-solid backups, ask yourself these questions:

Do you have clear, measurable performance and availability goals?
Are your backups automated?
Do you take your backups frequently enough?
In case of a failure, can you restore a backup quickly?
In case the latest backup is not available, can’t be restored or contains corrupted data, can you recover an older one?
Do you have an automated restore procedure?
Do you regularly, automatically test backup & restore procedures?
Do you monitor backups?
Do you need to compress backups?
Do you need to encrypt backups?
Do you monitor restore procedures?
When a backup fails, is another backup ready to be used?
Do you understand that “the cloud” is not magic?

The bullet points try to follow a logical order, not an importance order. The list is incomplete. The exact bullets and their importance vary depending on your organisation.

Performance and availability goals

Do you have clear, measurable performance and availability goals? In other words, do you have SLOs (Service Level Objectives)? You can (and probably should) even have different SLOs for different services, because services/pages don’t have the same importance. For example, the user registration page is more critical than the About Us page.

Objectives look like this:

The page must load in < 1 second.
Slowdowns that affect > 2% of users must be no longer than 1 hour, and occur no more than twice a month, with a minimum distance of 7 days from each other.
Unplanned outages must not last more than 30 minutes.

SLOs should determine our choices when it comes to backups, as our decisions will determine how much recent data we could lose in case of an incident, and how much time is needed to restore a backup.

Objectives must be realistic, and they must take into account the actual losses that performance problems and outages cause. This means that you should know how many sales you will lose in a 30-minute slowdown, and how many users will leave your website and never come back (farewell, Customer Lifetime Value). You should quantify these losses, and then you’ll know how much you can invest in measures that aim to avoid them.

Really, calculate these numbers. If you can’t, use some guesstimates and plan to collect proper metrics in the future. Your decisions will be more rational.
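As a rough illustration of that calculation, here is a minimal sketch. Every figure in it is a made-up guesstimate, and the function and parameter names are hypothetical; replace them with your own metrics.

```python
def estimated_incident_loss(duration_minutes, orders_per_minute, lost_order_fraction,
                            average_order_value, users_affected, churn_fraction,
                            customer_lifetime_value):
    """Rough cost of one incident: lost sales plus the lifetime value of users who leave."""
    lost_sales = (duration_minutes * orders_per_minute
                  * lost_order_fraction * average_order_value)
    lost_lifetime_value = users_affected * churn_fraction * customer_lifetime_value
    return lost_sales + lost_lifetime_value


# Hypothetical guesstimates for a 30-minute slowdown.
loss = estimated_incident_loss(
    duration_minutes=30,
    orders_per_minute=4,
    lost_order_fraction=0.25,      # share of orders abandoned during the slowdown
    average_order_value=50.0,
    users_affected=2000,
    churn_fraction=0.01,           # share of affected users who never come back
    customer_lifetime_value=300.0,
)
print(f"Estimated loss per incident: {loss:.2f}")
```

Multiplying this figure by the number of incidents you expect in a year gives an upper bound on what it is reasonable to spend on avoiding them.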

Automation

Are your backups automated? Manual backups should never be relied on. People don’t take them regularly, make mistakes, etc. Seriously… backups should be automated.

Frequency

Do you take your backups frequently enough? This depends on how much recent data you can afford to lose. A common decision is to take them every 24 hours. That is fine only if you can afford to lose up to one day of data in case of failure.

Sure, backups are heavy operations. But they can (should!) be taken from a replica or an unused node of a cluster, to avoid slowing down production. Sure, they also take space. But you can probably use incremental backups from a replica or even from the master.
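To check whether a schedule matches the amount of data you can afford to lose, you can compare the longest gap between backups with that limit. The sketch below assumes you can list the completion times of your backups; the timestamps and the six-hour limit are invented for the example.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(backup_times, now):
    """Return the longest stretch of time not covered by a backup.

    This is the largest gap between two consecutive backups, also counting
    the gap between the latest backup and `now`.
    """
    if not backup_times:
        raise ValueError("no backups at all: everything could be lost")
    points = sorted(backup_times) + [now]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


# Hypothetical schedule: one backup every 24 hours, checked shortly before the next one.
backups = [
    datetime(2022, 1, 3, 2, 0),
    datetime(2022, 1, 4, 2, 0),
    datetime(2022, 1, 5, 2, 0),
]
now = datetime(2022, 1, 6, 1, 30)
max_acceptable_loss = timedelta(hours=6)  # assumption: derived from your own objectives

gap = worst_case_data_loss(backups, now)
if gap > max_acceptable_loss:
    print(f"Backups are too infrequent: up to {gap} of data could be lost")
```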

Short term availability

In case of a failure, can you restore a backup quickly? The time to recover includes the time that is necessary to make a backup available, so a copy of the latest backup should probably be kept on the database server. Maybe even on the same disk, but this means that, if the disk is damaged, you will have to copy a backup from elsewhere. Using Network Attached Storage could be a good compromise.

If your database is a cloud instance, you can at least make sure that the latest backup is in the same private network.

The time to recover also includes the interval of time after the corruption happens and before you take action. So, make sure your monitoring and alerting systems are adequate.
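The components of the time to recover can simply be added up and compared with your objective. This sketch uses the 30-minute outage objective from the example above; all the other durations are hypothetical.

```python
def estimated_recovery_minutes(detection, fetch, restore):
    """Total minutes from the incident to a working database.

    detection: time between the corruption and someone taking action
    fetch:     time to make the backup available on the database server
    restore:   time to run the restore itself
    """
    return detection + fetch + restore


OUTAGE_OBJECTIVE_MINUTES = 30  # the example objective used earlier in this article

# Hypothetical durations, in minutes.
local_copy = estimated_recovery_minutes(detection=5, fetch=0, restore=20)
remote_archive = estimated_recovery_minutes(detection=5, fetch=45, restore=20)

for label, minutes in [("backup on the server", local_copy),
                       ("archived backup", remote_archive)]:
    verdict = "within" if minutes <= OUTAGE_OBJECTIVE_MINUTES else "over"
    print(f"{label}: {minutes} min ({verdict} the {OUTAGE_OBJECTIVE_MINUTES} min objective)")
```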

Long term availability

In case the latest backup is not available, can’t be restored or contains corrupted data, can you recover an older one? An older backup should be archived somewhere safe. This means that recovering may take time. This will only happen in extraordinary cases, so a longer recovery time should be acceptable. Normally you won’t rely on archived backups. Yet, you should check that restore won’t take too much time.

Automated restore

Do you have an automated restore procedure? Is it regularly, automatically tested?

Automated backups should be tested with an automated procedure. This can easily be combined with another need that your organisation surely has: feeding staging databases. Backups can be restored into staging database servers every night.

Having a restore script means that you can run it in case of need. This makes the restore faster and documented, and avoids human mistakes.
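A restore script can be as simple as an ordered list of commands that stops at the first failure and records how long each step took. The sketch below is only a skeleton: the steps are placeholders that do nothing, and you would replace them with the actual commands of your backup tool.

```python
import subprocess
import sys
import time


def run_restore(steps):
    """Run each (description, argv) step in order and stop at the first failure.

    Returns a list of (description, seconds, succeeded) tuples, which can be
    logged or sent to your monitoring system.
    """
    report = []
    for description, argv in steps:
        started = time.monotonic()
        result = subprocess.run(argv, capture_output=True, text=True)
        elapsed = time.monotonic() - started
        succeeded = result.returncode == 0
        report.append((description, elapsed, succeeded))
        if not succeeded:
            print(f"FAILED: {description}: {result.stderr.strip()}", file=sys.stderr)
            break
    return report


# Placeholder steps: each one only runs a no-op Python command.
placeholder = [sys.executable, "-c", "pass"]
steps = [
    ("copy the latest backup to the staging server", placeholder),
    ("restore the backup", placeholder),
    ("run sanity checks on the restored data", placeholder),
]

for description, seconds, succeeded in run_restore(steps):
    print(f"{'ok' if succeeded else 'FAILED'}  {seconds:6.2f}s  {description}")
```

Running the same script every night against staging and during a real incident means the procedure you rely on in an emergency is the one that gets tested daily.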

Monitoring

Do you monitor backups? You should monitor that they exist, that they’re not empty, how big they are, and how much time they take.

Do you monitor restore procedures? You should check that they don’t fail, and how much time they take.
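The backup checks listed above could be sketched as follows. The thresholds (a backup smaller than half the previous one is suspicious, a maximum duration) are arbitrary assumptions, and the file used in the demonstration is a throwaway one.

```python
import tempfile
from pathlib import Path


def check_backup(path, previous_size=None, min_size_ratio=0.5,
                 duration_seconds=None, max_duration_seconds=None):
    """Return a list of problems found with one backup file (empty list = healthy)."""
    path = Path(path)
    if not path.is_file():
        return [f"{path.name}: backup does not exist"]
    problems = []
    size = path.stat().st_size
    if size == 0:
        problems.append(f"{path.name}: backup is empty")
    elif previous_size and size < previous_size * min_size_ratio:
        problems.append(f"{path.name}: size dropped from {previous_size} to {size} bytes")
    if (duration_seconds is not None and max_duration_seconds is not None
            and duration_seconds > max_duration_seconds):
        problems.append(
            f"{path.name}: took {duration_seconds}s, limit is {max_duration_seconds}s")
    return problems


# Demonstration on a throwaway file standing in for a real backup.
with tempfile.TemporaryDirectory() as folder:
    backup = Path(folder) / "backup.sql"
    backup.write_bytes(b"x" * 100)
    for problem in check_backup(backup, previous_size=1000,
                                duration_seconds=90, max_duration_seconds=60):
        print(problem)
```

In a real setup the returned problems would be sent to your alerting system rather than printed.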

Compression

Do you need to compress backups?

Backup compression will help reduce the costs of storing and moving around your backups. This includes the resources you have to pay for, but also the time taken by backup-related operations (archiving, restoring).

But remember: everything you do with your backups will add some complexity and increase the likelihood of a backup failure, or a restore failure. Monitor your backup size as mentioned above, and therefore monitor the cost of keeping those backups. If compression is not necessary, you may prefer to leave backups uncompressed. This is especially true for the latest backup.
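If you do compress, one way to limit the added risk is to verify each compressed file right after creating it. The sketch below uses gzip from the Python standard library as a stand-in for whatever compression your backup tool offers, and it works on a throwaway file.

```python
import gzip
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(stream):
    """Hash a binary stream in chunks, so that large backups do not fill the memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(1024 * 1024), b""):
        digest.update(chunk)
    return digest.hexdigest()


def compress_and_verify(source, target):
    """Compress source into target, then check that target decompresses to the same data.

    Returns (original_size, compressed_size) in bytes.
    """
    with open(source, "rb") as plain, gzip.open(target, "wb") as packed:
        shutil.copyfileobj(plain, packed)
    with open(source, "rb") as plain, gzip.open(target, "rb") as unpacked:
        if sha256_of(plain) != sha256_of(unpacked):
            raise ValueError(f"{target} does not decompress to the original data")
    return Path(source).stat().st_size, Path(target).stat().st_size


# Demonstration on a throwaway file standing in for a real backup.
with tempfile.TemporaryDirectory() as folder:
    dump = Path(folder) / "backup.sql"
    dump.write_bytes(b"INSERT INTO t VALUES (1, 'example');\n" * 1000)
    original_size, compressed_size = compress_and_verify(dump, Path(folder) / "backup.sql.gz")
    print(f"{original_size} bytes compressed to {compressed_size} bytes")
```

The two sizes it returns are the numbers to feed into the size and cost monitoring mentioned above.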

Encryption

Do you need to encrypt backups?

Applicable regulations, like GDPR, and your organisation’s policies determine if your backups should be encrypted. If so, monitor that they are encrypted and can successfully be decrypted. Make sure you use secure algorithms, and make sure you’ll receive an alert if a vulnerability is discovered in the encryption software you use.

Multiple strategies

When a backup fails, is another backup ready to be used?

Even if the restore automation, the monitoring, and the alerting are perfect, the only thing they do is let you know that a backup failed. Hopefully you can fix the problem before the next scheduled backup, but in the meantime you don’t have a valid backup.

For this reason, you should have more than one backup strategy in place. When needed, you will try to restore your fastest, most reliable backup. If that fails, you will try to restore progressively slower or less reliable backups. You may have snapshots as a primary strategy.
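The fallback logic can be sketched as a list of restore functions, ordered from the fastest and most reliable to the slowest. The three strategies below are hypothetical stand-ins; in practice each function would call your real restore procedure and raise an exception when it fails.

```python
def restore_with_fallback(strategies):
    """Try each (name, restore_function) in order until one succeeds.

    Returns the name of the strategy that worked, plus the list of
    (name, error) pairs for the strategies that failed before it.
    """
    failures = []
    for name, restore in strategies:
        try:
            restore()
        except Exception as error:
            failures.append((name, error))
            continue
        return name, failures
    raise RuntimeError(f"all restore strategies failed: {failures}")


# Hypothetical strategies, for illustration only.
def restore_from_snapshot():
    raise OSError("snapshot is corrupted")


def restore_from_physical_backup():
    pass  # pretend that this restore succeeded


def restore_from_archived_dump():
    pass  # slowest option, never reached in this example


used, failures = restore_with_fallback([
    ("snapshot", restore_from_snapshot),
    ("physical backup", restore_from_physical_backup),
    ("archived dump", restore_from_archived_dump),
])
print(f"Restored from: {used}; failed before that: {[name for name, _ in failures]}")
```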

The cloud

Do you understand that “the cloud” is not magic? By “you” I mean your team, your management, and whoever is involved. Everyone needs to understand that your favourite cloud provider fails often, and cannot guarantee that the snapshots it provides will always work.

Your vendor has written somewhere, somehow, what they do or don’t guarantee. Did you read those warnings? They should save their a**e if you take them to court. Saving yours in case of a backup failure is up to you.

Conclusions

We tried to summarise the features that your database infrastructure should have. If they’re missing, you have some degree of technical debt.

If you need help with backup automation and backup testing automation, consider our database automation service.

Federico Razzoli

All content in this blog is distributed under the CreativeCommons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please refer to Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/

About Federico Razzoli

Federico Razzoli is a database professional, with a preference for open source databases, who has been working with DBMSs since 2000. In the past 20+ years, he has served in a number of companies as a DBA, Database Engineer, Database Consultant and Software Developer. In 2016, Federico summarized his extensive experience with MariaDB in the “Mastering MariaDB” book published by Packt. An experienced speaker at database events, Federico speaks at professional conferences and meetups and conducts database trainings. He is also a supporter and advocate of open source software. As the Director of Vettabase, Federico does business worldwide but prefers to do it from Scotland where he lives.
