Schedule Meeting

a

The Mystery of ProxySQL Galera Writer Switchover!

by | Dec 13, 2024 | ProxySQL, Galera

Need Help?  Click Here for Expert Support

Writer switchover issues with ProxySQL and Galera can feel like an unsolved puzzle. Recently, I encountered two strange behaviors that revealed underlying issues in how ProxySQL interacts with Galera clusters. In this post, I’ll walk through the unexpected behaviors I faced, and the troubleshooting approaches that helped me address these challenges, and workarounds to avoid them.

ProxySQL is one of the best open source solutions for load balancing and failover, and it’s an essential component in the setups we design and implement for many of our customers. ProxySQL is SQL-aware, which allows read-write split. Its flexibility and advanced features make it a key part of achieving high availability in MySQL and MariaDB including Galera clusters and asynchronous replication.

At Vettabase, using ProxySQL with Galera is a part of our daily work supporting high availability and failover for customers. It’s worth noting that a switchover isn’t always the result of a master crash. Often, it’s an intentional operation – for example, when you need to convert the current writer node into a maintenance mode. While this should be straightforward, the issues I uncovered show that achieving smooth, predictable switchovers with ProxySQL and Galera requires careful consideration and, in some cases, workarounds.

We usually set up Galera clusters with just one writer, even though it’s a multi-master system. This helps avoid write conflicts, which can happen if more than one node is handling writes at the same time. These conflicts can lead to a node eviction or, in rare cases, even a cluster crash. While Galera supports multi-master writes, keeping one logical master makes things more stable and reliable.

Through hands-on deployments and troubleshooting, I uncovered two cases, now documented as issues #4702 and #4750, revealing unexpected behavior in ProxySQL’s interaction with Galera.

  • Issue #4702: Unexpected behavior for Galera hosts and write switching.
  • Issue #4750: ProxySQL writer switchover fails on the first LOAD MYSQL SERVERS TO RUNTIME execution, succeeding only on the second attempt.

Let’s dive into each of these issues.

Issue #4702: Unexpected behavior for Galera hosts and write switching

In our setup, ProxySQL’s role is to manage writes and reads across multiple Galera nodes, ensuring that only one node serves as the active writer at any given time. However, with Issue #4702, we noticed that ProxySQL wasn’t always switching the writer role as expected.

ProxySQL seemed to “stick” to the original writer node, even after adjusting the hostgroup settings to prioritize a different node. To work around this, we manually changed the hostgroup and weights and reloaded the runtime configuration. This behavior is partially related to the Galera configucation, suggesting that changing the hostgroup ID alone might not be sufficient to initiate a proper writer switchover.

Writers selection

The nodes to be placed as writers in the writer hostgroup are deterministically selected from the currently available cluster nodes based on weight + hostname + port. You can check how this selection is exercised in the different flowcharts. The same selection is used when the node is going to be set initially in a target hostgroup, or for deciding whether a server already present in a hostgroup should be replaced or not.

In reality the process of moving a node to the writer_hostgroup is not as simple as stated in the diagram.

When a new node is identified as a writer, a process for selecting the new writers takes place. This means that the node that joins is not guaranteed to be a writer, or any of the nodes that previously were writers are guaranteed to keep being online, after a new writer joins.

This is determined by a fixed order rule composed by: “weight + hostname + port. If a node that was previously a writer is going to be replaced with a newly-joined node, the previous will be kept in the writer_hostgroup, but set as SHUNNED.

Conclusion: If you’re working with ProxySQL for writer switchover, remember that adjusting hostgroup IDs may not be enough; keep weight and hostname settings in mind to ensure the expected node takes over as the writer.

UPDATE MYSQL_SERVERS SET weight=1;
UPDATE MYSQL_SERVERS SET weight=10 WHERE hostname='10.10.20.15';
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
SELECT * FROM mysql_servers;
SELECT hostgroup_id, hostname, status, weight FROM runtime_mysql_servers;
+--------------+--------------+---------+--------+
| hostgroup_id | hostname     | status  | weight |
+--------------+--------------+---------+--------+
| 10           | 10.10.20.15  | ONLINE  | 10     |
| 10           | 10.10.20.8   | SHUNNED | 1      |
| 10           | 10.10.20.9   | SHUNNED | 1      |
| 11           | 10.10.20.8   | ONLINE  | 1      |
| 11           | 10.10.20.9   | ONLINE  | 1      |
| 20           | 10.10.20.8   | ONLINE  | 1      |
| 20           | 10.10.20.9   | ONLINE  | 1      |
+--------------+--------------+---------+--------+

Issue #4750: Switchover Fails on First LOAD MYSQL SERVERS TO RUNTIME, Succeeds on Second Attempt

Issue #4750 took the writer switchover puzzle even further. ProxySQL has a LOAD MYSQL SERVERS TO RUNTIME command that reloads the MySQL servers and applies changes to runtime. However, in some instances, executing this command only once did not apply the expected writer switchover changes. We found that running the command twice was necessary to fully update ProxySQL’s runtime configuration.

We’ll be eagerly awaiting feedback from ProxySQL support on this behavior.

Closing Thoughts and Workarounds

Writer switchover issues with ProxySQL and Galera have felt like piecing together a puzzle. Thanks to hands-on troubleshooting and real-world deployments at Vettabase, we’ve documented these quirks and reported them to the ProxySQL team. For now, using the workarounds described can help manage these issues in production environments.

While the journey to uncover these behaviors was complex, knowing these quirks provides us with the insights needed to better support our customers and their high-availability setups.

Mykhaylo Rykmas

All content in this blog is distributed under the CreativeCommons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please refer to Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/

About Mike Rykmas
Mike Rykmas is a skilled software engineer with years of expertise in database administration, cloud computing, and IT infrastructure. Thanks to his strong background in data management and performance optimization, Mike has successfully led and implemented scalable solutions, managing petabytes of data to meet diverse business needs.

Recent Posts

No Results Found

The page you requested could not be found. Try refining your search, or use the navigation above to locate the post.

Services

Need Help?  Click Here for Expert Support

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *