Schedule Meeting

a

The risks of replication filters in MariaDB and MySQL

by | Apr 23, 2021 | MariaDB, MySQL

Need Help?  Click Here for Expert Support

MariaDB and MySQL support what they call replication filters. This feature allows us to determine which tables are replicated, and which are not. Filters can be set on a replica to filter-in tables and databases: the replica will not replicate everything from the master, but only the specified databases and tables. Or they can filter-out tables and databases: the specified databases and tables will be ignored, everything else will be replicated. Or we can use a combination of inclusive filters and exclusive filters. And we can filter specific databases, specific tables, or regular expressions for table names (for example, to include all the tables with a certain prefix).

A bit confusing but powerful. Unfortunately, this feature is also very dangerous, as it can easily cause inconsistencies (different data) between the master and the replica. This can happen if applications run queries in a wrong way, or if a DBA runs a migration without following certain rules.

Inconsistent data is per se bad news, even when no one notices the inconsistencies. But keep also in mind that, if MariaDB/MySQL detects an inconsistency, replication will break. MariaDB/MySQL detects inconsistencies, for example, and it tries to replicate a row deletion but the target row doesn’t exist anymore.

Before listing the risks themselves, let me clarify a couple of concepts that some of my readers may not familiar with.

The risks of MariaDB/MySQL replication filters
Blixy replicated itself without a couple of limbs, thanks to MariaDB Replication Filters!

Some MariaDB/MySQL concepts

Let me explain a couple of MySQL concepts first, because they’re important to understand the risks of replication filters.

I want this article to be understandable for non-expert MySQL users. But if you know everything about binary log formats and you know what default database means, just skip to How replication filters can cause inconsistencies and Binary log filters, below.

Statement replication

All the risks only affect the replication of events from the master, in the STATEMENT format. Events are logged as STATEMENT if:

  • binlog_format = STATEMENT;
  • binlog_format = MIXED (for non-deterministic statements);
  • If a user with the SUPER privilege changes the binlog_format, maybe just temporarily and at session level;
    • This could also happen in a stored procedure created by a superuser;
  • DDL statements (CREATE TABLEALTER TABLEDROP TABLE…) are always logged as STATEMENT.

For more details, see Setting The Binary Log Format, in MySQL documentation.

Default database

A default database is the currently selected database. If there is none, you’ll need to write queries in this form:

SELECT * FROM my_database.my_table;

Using a default database means to write something like this:

USE my_database;
SELECT * FROM my_table;

When writing an application, usually one specified a database when establishing the connection to MySQL, instead of running USE.

How replication filters can cause inconsistencies

  • When no default database is selected, database-level filters won’t work.
  • When a database is selected but we are writing to a table into a different database, database-level filters won’t work.
  • For the reason explained above, data changes made by inter-database foreign keys and triggers escape replication filters.
  • replicate_do_table doesn’t prevent changes made in stored functions. One could object that functions should not write to tables, but that is not always true. First of all, anything that is supported by the technology could be done by the users. And also, a function could treat table data as static variables, for example to implement a counter.
  • Replication breaks when a single statement modifies both a filtered-in table and a filtered-out table. (multi-table DELETE, multi-table UPDATE).

The last risk is particularly tricky if you use patterns to exclude (or include) table names. For example, you may add a filter to ignore all tables starting with old_. But after some months/years a new table called old_index may be created, and you may want to replicate it. At that point, it’s very easy to make a mistake.

Binary log filters

Replication filters are set on the replica to decide which events – sent by the master – will be replicated. Binary log filters are set on the master to decide which events should be written in the binary log. Filtered-out events are not logged and not sent to any replica. They are twin options: identical, with a different prefix (replicate_ or binlog_).

Binary log filters are subject to the same limitations/risks as replication filters. But, while they sound like an optimisation (because events that we don’t want to replicate are not logged by the master and not seen by the replicas at all), they are inherently subject to two important problems:

  • Not logged events cannot be replicated by any replica;
  • Using the binary log as an incremental backupwill not restore any non-logged change;
    • A way to undo a destructive change is to restore the latest backup and re-apply the changes from the binary log, except for the one we want to undo. But if the binary log doesn’t include all changes made to the data, this will not be possible.

Recommendations for a safe replication

My recommendations about the use of these features are:

  • Don’t use binary log filters at all.
  • Filtering out the mysql system database normally does not harm. This is desirable to avoid replicating master users and permissions on the replica.
    • The only case when this is dangerous is when we write into the mysql database manually, instead of using statements like CREATE USER.
  • sys database should be different for every MySQL version. It is a good idea to filter it out.
  • Filtering out information_schema or performance_schema has no effect, simply don’t.
  • Sometimes we really want to avoid replicate some regular databases. And using replication filters is the only way. But at least, we should follow some rules strictly.
    • Use ROW binary log format.
    • Make sure that applications select a default database and never write to a non-default database.
    • If you do migrations manually, make sure to properly use USE.

Conclusions

We discussed the risks of replication filters and binary log filters, and I offered my recommendations to make your replication safe.

To get an expert review of your MariaDB or MySQL servers, including the reliability of your replication, consider a MariaDB Health Check or a MySQL Health Check.

Federico Razzoli

All content in this blog is distributed under the CreativeCommons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please refer to Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/

About Federico Razzoli
Federico Razzoli is a database professional, with a preference for open source databases, who has been working with DBMSs since year 2000. In the past 20+ years, he served in a number of companies as a DBA, Database Engineer, Database Consultant and Software Developer. In 2016, Federico summarized his extensive experience with MariaDB in the “Mastering MariaDB” book published by Packt. Being an experienced database events speaker, Federico speaks at professional conferences and meetups and conducts database trainings. He is also a supporter and advocate of open source software. As the Director of Vettabase, Federico does business worldwide but prefers to do it from Scotland where he lives.

Recent Posts

New project: Awesome MariaDB list

New project: Awesome MariaDB list

At Vettabase, we're delighted to announce our new project, sponsored by MariaDB Foundation: the Awesome MariaDB list on GitHub! We already interact with the MariaDB Foundation as Technology Partners, and this project is the newest outcome of our long-time friendship....

How to query a REST API with MariaDB CONNECT engine

How to query a REST API with MariaDB CONNECT engine

Querying a remote REST API directly from the database: does it sound like pure madness? Some people will surely answer that yes, it does. So, first of all, let me spend a paragraph to show some great benefits of doing so. Then I'll show you how to query a REST API...

Services

Need Help?  Click Here for Expert Support

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *