Schedule Meeting

a

Dropping database objects in a safe way

by | Aug 28, 2020 | MariaDB, MySQL

Need Help?  Click Here for Expert Support
A drop of rain on a wooden table
A DROP on a TABLE

Whenever I worked as a DBA, I was reasonably scared by DROP commands. Don’t get me wrong: despite this, I badly wanted to drop obsolete tables. After that, I could more easily ask developers to fix bad practices, like tables without primary keys or without indexes at all. Similarly, I wanted to drop obsolete columns, especially from very big tables, and to normalise big sparse matrixes (which implies moving some columns to a new table). And I wanted to drop duplicate or unused indexes, as they add a useless overhead to writes and to the optimiser.

But every time a developer said something like “it’s ok, this table is not used anymore”, I knew I could not trust that statement. And actually, from time to time, their optimistic opinions proved wrong. Not because I worked with stupid developers! I am sure that they always ran at least a git grep <tablename> on their repositories. But maybe the table was used in some hideous job or script, and they missed it.

The same happened with columns. On Federico-Razzoli.com, I already wrote how dropping an index (even an unused one) can break an application.

But don’t let this stop you. Here I’m discussing the process I used to safely drop stuff. Of course not all companies are the same, so the process could vary, and should probably vary in your case.

XKCD strip. School calls a mom complaining about her son's name: Robert'); DROP TABLE students;
Technically, this woman could be in good faith

Checking if objects are used

If you use MariaDB or Percona Server, you should definitely enable the user_statistics plugin. It can be useful in many ways. PMM uses it to draw graphs on table usage, and you can probably do the same with other technologies, like Zabbix templates. If a table is used, you will see from a graph. To manually check tables usage, and which tables are unused, see these queries.

Check if the table or column is used in any stored routines, triggers or events. Some LIKE queries will help you, unless you have many procedures and you are looking for a table name like order:

SELECT ROUTINE_SCHEMA
    FROM information_schema.ROUTINES
    WHERE ROUTINE_BODY LIKE '%table_name%';
SELECT EVENT_OBJECT_SCHEMA, EVENT_OBJECT_TABLE
    FROM information_schema.TRIGGERS
    WHERE ACTION_STATEMENT LIKE '%table_name%';
SELECT EVENT_SCHEMA, EVENT_NAME
    FROM information_schema.EVENTS
    WHERE EVENT_BODY LIKE '%table_name%';

Check of course your scripts, including scripts for tasks like backups or anonymisation.

The rest is up to your developers. If they say that a table is not used, you will pretend you believe them. But be sure to check with all relevant teams. This typically includes analysts and data scientists – often, developers working on the operational side have no idea what analysts do or need. Don’t use private messages for this type of communications. The best way would be to add tasks in their bug tracker (JIRA or whatever) to ask them to check if a table or column is used. This will give more people the opportunity to report that an object shouldn’t be dropped, and will show that you did all the checks you should do in this cases. Also, they will probably take the task more seriously, as they can show their managers how they used their time.

Don’t forget that the same checks must be done for indexes, as they could be mentioned in optimiser hints.

Dropping objects

Whatever you are dropping – a table, a column or an index – the process is the same. Here’s a summary – below we’ll discuss every point:

  1. Drop or rename the object in staging.
  2. Rename the object in production.
  3. Wait and monitor.
  4. Take a backup.
  5. Set the backup retention time.
  6. Drop the object.

The first step must be dropping the table/column/index in staging. You already know this, and I won’t stress this obvious concept. Instead, I want to stress that in the real world this is not enough. I saw several cases where a table was never used in staging, but it was used in production, maybe by some job or by some forgotten feature.

In production, you should rename the object. Add a prefix, for example __old_. If I can establish a naming convention (a good topic for another time) I disallow names to start with a _ character. This makes __old_ and other special prefixes safer, so that you can reliably use them in your scripts (for example, excluding __old_ tables from dumps).

You may want to rename the object (instead of dropping it) in staging, too.

The next step is simply waiting. You will have a standard time period to wait until the table is considered safe to delete, for example 2 days. What a reasonable period is depends on factors that are specific to your company, for example the frequency of job runs.

Application errors are hopefully monitored. If something goes bad in that period, you can very quickly restore the old situation by renaming back the table, column or index. Since this operation only affects metadata, it is very fast (unless it’s locked by a long-running query/transaction).

If nothing went wrong, you should take a backup of the object, including its SQL definition. In the case of a column, a backup could be a dump of the primary key plus the column to delete. In the case of an index, its definition is all you need.

Decide for how much time the backup should be preserved. The wise answer could easily be forever. Why? Because you never know what can happen. For example, you may have a table with data about the translators of contents that disappeared from your website. Developers will probably tell you that the table can be deleted. And that’s wrong: your company could still have some legal dispute with those people in the future.

Finally, you can drop the object!

Invisible columns

If you use MariaDB, you can use invisible columns. Making a column invisible will prevent it from being returned by queries of type SELECT *. If the column is still used by applications, errors will occur, and you can make the column visible again.

Making a column visible or invisible is an instantaneous operation.

Ignored and Invisible indexes

Indexes can be renamed, just like tables and columns. But this only affects queries that explicitly try to use them using optimiser hints. Queries may still use those indexes because MySQL optimiser knows thinks that it’s a good idea. If the optimiser is right, and usually it is, dropping an index could make some queries slow.

MariaDB 10.6+ supports Ignored Indexes, and MySQL 8.0 implements invisible indexes. An invisible index cannot be used in any way by a query. The optimiser will not take such index into consideration. And it an optimiser hint suggests to use (or even ignore) that index, the query will fail.

A small overhead

Note that these extra steps don’t add a big overhead. Once a table is renamed, you can go back to your everyday work. And maybe once a week you can backup and drop renamed tables. This step can even be automated, and quite easily – just don’t forget that every automation needs to be properly monitored.

See also

Ebooks

Related articles

Related courses

Federico Razzoli

Did you like this article?

Meme: if having a coffee in the morning doesn't wake you up, try deleting a table in production instead.

All content in this blog is distributed under the CreativeCommons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please refer to Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/

About Federico Razzoli
Federico Razzoli is a database professional, with a preference for open source databases, who has been working with DBMSs since year 2000. In the past 20+ years, he served in a number of companies as a DBA, Database Engineer, Database Consultant and Software Developer. In 2016, Federico summarized his extensive experience with MariaDB in the “Mastering MariaDB” book published by Packt. Being an experienced database events speaker, Federico speaks at professional conferences and meetups and conducts database trainings. He is also a supporter and advocate of open source software. As the Director of Vettabase, Federico does business worldwide but prefers to do it from Scotland where he lives.

Recent Posts

Coming up next, a ColumnStore webinar

Coming up next, a ColumnStore webinar

The 27th is fast approaching but you can still signup to our live webinar where we will be exploring the benefits of time series data using MariaDB ColumnStore. We will be jumping into a live demo with some example data and queries simulating high volume time series...

The benefits of MariaDB ColumnStore

The benefits of MariaDB ColumnStore

Last week Richard announced our projects on MariaDB ColumnStore. Take a look at his great post, if you didn't already. Since then, I've got some questions from customers, colleagues and friends: why did you guys decide to robustly invest into ColumnStore, and offer...

Let’s go, MariaDB ColumnStore at Vettabase!

Let’s go, MariaDB ColumnStore at Vettabase!

I have been an avid user of Columnar based storage engines for as long as I have been a DBA. The advantage is instantly obvious once you and your teams start to write or convert queries which previously took tens of minutes, are now returning in fractions of a second....

Services

Need Help?  Click Here for Expert Support

3 Comments

  1. Karl Levik

    Thank you for a useful and insightful article!

    I haven’t tested this is, but I assume that cascading foreign key constraints referencing a table will behave the same way when you drop the table as when you delete from the table. If so, then that might be another thing to keep in mind when dropping a table in production. It’s not always the case that these constraints have been defined correctly, so you might end up inadvertently deleting a lot more data than you intended to!

    Reply
    • federico.razzoli

      Hi Karl! Thanks for dropping by.

      No really. You cannot drop a table referenced by a foreign key. You can drop a table that contains a foreign key, but this has no cascade effect (actually it’s not supposed to). Foreign keys can be temporarily disabled, if necessary, with SET foreign_key_checks = 0. The problem is that this disables all foreign keys, not just the ones that cause troubles.

      Foreign keys also make online schema changes virtually impossible. pt-osc has ways to handle foreign keys, but there is always a certain amount of risk. But this is another story 🙂

      Reply
      • Karl

        Ah, of course! Yes, you can’t drop a table referenced by a foreign key (except with foreign_key_checks=OFF like you said). Where was my brain today?

        Reply

Submit a Comment

Your email address will not be published. Required fields are marked *