Last updated on 9 October 2021
One great thing about microservice architectures is that different services can use different database technologies. For example, an online store may use MariaDB to persist purchases, Redis to cache a complex catalogue, Elasticsearch for full-text searches against the product descriptions, Apache Cassandra to show purchase statistics by month and week, and so on. Each of these databases does things that the others can’t do too well (or at all).
Using different database systems for different purposes can be a great idea if the use case justifies it. But it has costs that you need to understand.
Pay close attention to all the following points, because once you adopt a new technology, you have to rely on it.
Coding for the database
Developers should choose a database technology based on their application’s (or microservice’s) needs. From that moment on, they’ll have to adapt their code logic to the database they’ve chosen. A graph database offers different functionality from a key/value store, a queue, and so on. Relational databases are arguably the most general-purpose technology, but more specialised models perform much better when used in the right contexts.
The vast majority of developers know something about the relational model. Most of them don’t understand a relational DBMS in depth: they can’t build proper indexes for complex queries, and they don’t know how to relax transaction locking. They understand other models even less: they surely know something about them, but they may not know which problems each of them brings, and they probably don’t know the solutions.
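To make the indexing point concrete, here is a small sketch using SQLite (standing in for any relational DBMS; table and index names are made up for the example). The same query goes from a full table scan to an index search once a composite index matching the WHERE clause exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INT, status TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i % 100, "shipped" if i % 2 else "pending", "2021-10-09") for i in range(1000)],
)

query = "SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending'"

# Without any index, the plan is a full table scan.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan_before[0][-1])  # e.g. "SCAN orders"

# A composite index covering both WHERE columns lets the engine seek directly.
conn.execute("CREATE INDEX idx_orders_cust_status ON orders (customer_id, status)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan_after[0][-1])  # e.g. "SEARCH orders USING INDEX idx_orders_cust_status ..."
```

Knowing which columns to index, and in which order, is exactly the kind of per-technology knowledge this section is about; the equivalent skill in a document store or a search engine looks completely different.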
They’ll learn. Developers are very good at learning. But a model cannot be learnt in a week. If you need to hire a new developer for a team, you will have to look for someone who already knows a certain technology, or allocate time for them to learn it. The same happens when you move developers between teams.
More expensive infrastructure
Every time you introduce a new database technology in your company, you should expand your infrastructure to include some new facilities for that technology. Typical components to add are:
- Monitoring and/or log analysis.
- Backups (unless the data can be rebuilt from elsewhere in a reasonable time).
- Backup monitoring and testing.
- Load balancing and failover.
- Automation: configuration, maintenance, upgrade, scale out…
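As an illustration of the first bullet, here is a minimal sketch of the kind of monitoring check each new technology needs. Everything here is an assumption for the example: `get_replication_lag` is a hypothetical stand-in for a real metrics query against the DBMS or your monitoring system, and the thresholds must be tuned per technology:

```python
WARN_SECONDS = 30    # assumed thresholds; every technology needs its own
CRIT_SECONDS = 120

def get_replication_lag(replica: str) -> float:
    """Hypothetical stand-in: a real check would query the DBMS or metrics system."""
    return {"replica-1": 4.2, "replica-2": 95.0}[replica]  # fake data

def check_lag(replica: str) -> str:
    lag = get_replication_lag(replica)
    if lag >= CRIT_SECONDS:
        return f"CRITICAL: {replica} is {lag:.0f}s behind"
    if lag >= WARN_SECONDS:
        return f"WARNING: {replica} is {lag:.0f}s behind"
    return f"OK: {replica} is {lag:.0f}s behind"

print(check_lag("replica-1"))
print(check_lag("replica-2"))
```

Even a check this trivial has to be written, threshold-tuned, deployed and maintained once per database technology you adopt, which is where the extra cost comes from.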
Note that, in theory, your team needs the skills to plan, implement, operate, fine-tune and troubleshoot all of these components, as well as the technology itself.
In practice, this is not possible. Nowadays DBAs and DevOps engineers administer too many different technologies to know all of them in depth. For this reason, you may need to buy consulting and/or support for these technologies. And yes, for some technologies you can buy services from us.
But it’s on the cloud!
Having a database in the cloud makes your infrastructure more flexible. But it doesn’t in any way save you from poor performance and disasters. We have seen all kinds of problems with our customers.
So yes, you can use, for example, your vendor’s snapshots and automated scale-out. Maybe you should. But only if you have a fair understanding of how these things work, what can go wrong, and what to do when (not if!) something goes wrong. Your cloud vendor’s snapshots share a fundamental characteristic with every other backup type: they can fail. So an automated restore should be tested periodically, and snapshots shouldn’t be your only backup strategy. Automated failover is great, but it isn’t magic. Test it periodically: check that it actually works, how much time it takes, and whether your applications keep working correctly after a failover.
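The restore test is the part most teams skip, so here is a sketch of what it means: restore into a fresh instance and verify the data, rather than just checking that a backup file exists. SQLite stands in for any database here, and the table is invented for the example:

```python
import sqlite3

# Original database with some data.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE purchases (id INTEGER PRIMARY KEY, amount_cents INT)")
src.executemany("INSERT INTO purchases (amount_cents) VALUES (?)",
                [(999,), (2450,), (310,)])
src.commit()

# Take a backup. In a real setup this would be a file, a dump or a snapshot.
backup = sqlite3.connect(":memory:")
src.backup(backup)

# The actual test: restore into a fresh instance and verify the contents.
restored = sqlite3.connect(":memory:")
backup.backup(restored)
count = restored.execute(
    "SELECT COUNT(*), SUM(amount_cents) FROM purchases").fetchone()
print(count)  # (3, 3759)
```

Comparing row counts and checksums between the source and the restored copy is a reasonable minimum; a snapshot that has never been restored is not a backup, it’s a hope.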
Test these things yourself; don’t just ask your vendor. There are often interesting discrepancies between what they say and what actually happens, and in any case not everything depends on them. There are many ways to break working backup systems, working failover, and so on, on your side. If you want some examples, let me know.
Different databases need to be consistent with each other.
In some cases, the needed consistency is absolute: when certain data are modified, the change must immediately affect all relevant databases. We must take care of error handling and implement a form of global rollback. Maybe we even need some form of isolation (make the change invisible everywhere until it’s visible everywhere – which is mostly a utopia), as well as durability (don’t let crashes undo recent changes). These are the goals of XA transactions, but not many technologies support them.
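The prepare/commit protocol behind XA-style transactions can be sketched with in-memory stand-ins for the participating databases (the class names and the failure flag are inventions for the example): every participant must first promise it can commit, and if any refuses, all prepared participants are rolled back.

```python
class Participant:
    """In-memory stand-in for one database taking part in the transaction."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare
        self.pending = None
        self.committed = False

    def prepare(self, change):            # phase 1: promise to commit
        if self.fail_on_prepare:
            raise RuntimeError(f"{self.name} cannot prepare")
        self.pending = change

    def commit(self):                     # phase 2: make the change durable
        self.committed = True

    def rollback(self):
        self.pending = None

def two_phase_commit(participants, change):
    prepared = []
    try:
        for p in participants:
            p.prepare(change)
            prepared.append(p)
    except RuntimeError:
        for p in prepared:                # global rollback on any failure
            p.rollback()
        return False
    for p in participants:
        p.commit()
    return True

dbs = [Participant("mariadb"), Participant("elasticsearch", fail_on_prepare=True)]
ok = two_phase_commit(dbs, {"sku": "X1", "price": 10})
print(ok, [p.committed for p in dbs])  # False [False, False]
```

A real implementation also has to survive a coordinator crash between the two phases, which is precisely why so few technologies support XA well.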
Sometimes consistency can be relaxed. This doesn’t mean that we can ignore the problems mentioned above: we still need to reflect the changes across the different databases at some point, and changes must not be undone by server crashes. In some situations, we may also have to find a way to verify whether the data are consistent, and decide what to do when they aren’t.
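One common way to verify consistency between two stores, sketched here under the assumption that both can be enumerated by key, is to compare per-record digests instead of shipping the full data sets around. The two dicts stand in for a primary store and a derived one (say, a search index):

```python
import hashlib
import json

# Fake data: the search index has drifted on record "p2".
primary = {"p1": {"name": "mug", "price": 500}, "p2": {"name": "pen", "price": 120}}
search_index = {"p1": {"name": "mug", "price": 500}, "p2": {"name": "pen", "price": 99}}

def digest(record):
    """Stable digest of one record; sort_keys makes it field-order independent."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

def find_drift(a, b):
    """Return the keys whose records differ (or exist on one side only)."""
    keys = a.keys() | b.keys()
    return sorted(k for k in keys if digest(a.get(k)) != digest(b.get(k)))

print(find_drift(primary, search_index))  # ['p2']
```

What to do with the drifted keys (re-sync, alert, or just count them against an agreed threshold) is exactly the policy question the next paragraph raises.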
Another important question is: to what extent can we allow inconsistencies between different databases? How much time can pass before a change is reflected in a certain database? How much data can be inconsistent, at most?
We will not really dig into any of these topics, but you surely understand that they are quite complex.
Use the right tool for the right purpose.
Different database types exist for very good reasons. Trying to use a database for something it can’t do, or doesn’t do well, can lead to big problems. But underestimating the costs of introducing a new technology can lead to even bigger problems. Proper knowledge and proper infrastructure are essential to keep your databases stable, secure and fast.