The InnoDB buffer pool is the most important memory area to allocate. It contains the most frequently read data and index entries from InnoDB tables. Its size is determined by innodb_buffer_pool_size, which is one of the two most important settings for performance.
The official recommendation from MySQL is to set it to 80% of your available memory. The official MariaDB recommendation is 70-80% of your memory. Both are wrong.
Why is the rule of thumb wrong?
Because it makes you think that there is a magic number. In technical documentation, magic numbers look much more attractive than the usual answer “it depends”. Unfortunately, the usual answer comes up so often because it’s true so often.
The problems with this particular magic number are:
- A huge buffer pool can be slow.
- Having a buffer pool bigger than the data we most frequently read gives us very small benefit.
- Having a buffer pool bigger than the whole dataset is useless.
- Other buffers need memory. In particular, session buffers may be quite fat if concurrency is high and users run many big joins. Some tools may need memory, too (backup, etc).
So the universal rule is: keep the buffer pool as big as the frequently read data. But this advice is too generic: normally we don’t know the size of the most frequently accessed data. So let’s see how to find out if the buffer pool is too small.
Checking if your buffer pool is too small
There are several metrics you can check. You can see them with
SHOW ENGINE InnoDB STATUS, which is easy for humans to read. But it’s much better to check them in a graph and see how they change over time; for that, you need a monitoring solution. Here we’ll mention the status variables to check. Depending on your monitoring solution, you will need to find the corresponding graphs.
Size of the free list
Innodb_buffer_pool_pages_free is the number of pages in the buffer pool that are not currently in use. But in many cases, you see a value and wonder… is this number too low, or is it fine?
First of all, free pages should never be zero, or close to zero. And there is a rule of thumb that the free pages shouldn’t be less than 5% of the total pages, at least not often. But these are vague indications. Actually, this metric is clearly not enough.
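The 5% rule of thumb can be sketched as a quick check. This is a minimal Python sketch: it assumes you have already collected the two page counters (for example from SHOW GLOBAL STATUS), and the sample values are invented for illustration.

```python
# Counters as they would come from SHOW GLOBAL STATUS
# (sample values invented for illustration).
status = {
    "Innodb_buffer_pool_pages_total": 65536,
    "Innodb_buffer_pool_pages_free": 2048,
}

def free_pages_pct(status):
    """Percentage of buffer pool pages that are currently free."""
    return (100.0 * status["Innodb_buffer_pool_pages_free"]
            / status["Innodb_buffer_pool_pages_total"])

pct = free_pages_pct(status)
print(f"free pages: {pct:.1f}%")  # 2048 / 65536 = 3.1%, below the 5% rule of thumb
```

As the article notes, this alone proves nothing: a low value only becomes interesting when graphed over time and combined with the other metrics below.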
Dirty pages flush
Innodb_buffer_pool_wait_free counts how many times InnoDB had to wait for an “emergency flush” operation to end before it could find a free page. It shouldn’t grow often. But what does that mean?
InnoDB background threads always try to keep some pages free. In this way, when new data must be added to the buffer pool, InnoDB can just write it into a free page.
It could happen that no free pages are available. There could be several reasons for this, like a busy CPU, slow disks, or slow operating system sync functions. But normally this happens when the buffer pool is not big enough. If queries frequently read data that is not in the buffer pool, InnoDB keeps reading data from disk and caching it. But while doing so, it has to evict other data that is probably frequently read too.
Note that, while such an emergency flush occurs, a user connection is on hold.
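Since this variable is a cumulative counter, what matters is whether it grows between monitoring samples, not its absolute value. A sketch, with invented sample values, assuming two snapshots of SHOW GLOBAL STATUS taken some time apart:

```python
def wait_free_delta(prev, curr):
    """How many new waits for free pages occurred between two samples.

    A non-zero delta means user threads stalled waiting for an
    emergency flush during that interval.
    """
    return (curr["Innodb_buffer_pool_wait_free"]
            - prev["Innodb_buffer_pool_wait_free"])

# Two snapshots of the counter, e.g. taken one minute apart
# (values invented for illustration).
prev = {"Innodb_buffer_pool_wait_free": 120}
curr = {"Innodb_buffer_pool_wait_free": 155}
print(wait_free_delta(prev, curr))  # 35 new waits: user connections were put on hold
```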
Innodb_buffer_pool_pages_flushed counts the number of flushed pages. Flushed pages shouldn’t be too many compared to the read pages; if they are, caching pages is not saving InnoDB enough work. However, this counter covers all flushed pages, not just the pages flushed to quickly free some memory.
Innodb_buffer_pool_read_requests is the number of logical read requests. InnoDB tries to satisfy each of them from the buffer pool first. If a page is not in the buffer pool, it is read from disk, and
Innodb_buffer_pool_reads is incremented.
The purpose of a buffer pool is to reduce disk reads, so the ratio Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests is a good indicator of its efficiency: the closer to zero, the better.
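The ratio can be computed as a miss ratio: the fraction of logical read requests that had to hit the disk. A sketch, with invented counter values:

```python
def miss_ratio(status):
    """Fraction of logical read requests that missed the buffer pool
    and had to be read from disk. Closer to zero is better."""
    return (status["Innodb_buffer_pool_reads"]
            / status["Innodb_buffer_pool_read_requests"])

# Sample counters, invented for illustration.
status = {
    "Innodb_buffer_pool_read_requests": 1_000_000,
    "Innodb_buffer_pool_reads": 5_000,
}
print(miss_ratio(status))  # 0.005: only 0.5% of logical reads went to disk
```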
We can check these metrics:
- Innodb_buffer_pool_pages_free should not be zero or close to zero. We should always have free pages available.
  - There is a general rule of thumb that it should normally be >= 5% of total pages.
- Innodb_buffer_pool_wait_free should not grow. It indicates that a user thread is waiting because it found no free pages to write into.
- Innodb_buffer_pool_pages_flushed should be low, or we are flushing too many pages just to free pages. Compare it with the number of read pages.
- Innodb_buffer_pool_reads should be low compared to Innodb_buffer_pool_read_requests, or the buffer pool is not preventing enough disk reads.
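This checklist can be pulled together into one sketch. The thresholds below are the rough rules of thumb from this article, not official recommendations, and the sample counters are invented; in practice you would compare deltas between monitoring samples rather than raw cumulative values.

```python
def buffer_pool_warnings(status):
    """Evaluate the rough rules of thumb from the checklist above."""
    warnings = []
    total = status["Innodb_buffer_pool_pages_total"]
    if status["Innodb_buffer_pool_pages_free"] < 0.05 * total:
        warnings.append("free pages below 5% of the buffer pool")
    # Cumulative counter: ideally check its growth between samples.
    if status["Innodb_buffer_pool_wait_free"] > 0:
        warnings.append("threads waited for free pages (wait_free > 0)")
    miss = (status["Innodb_buffer_pool_reads"]
            / status["Innodb_buffer_pool_read_requests"])
    if miss > 0.01:  # more than 1% of logical reads went to disk
        warnings.append(f"high miss ratio: {miss:.1%}")
    return warnings

# Sample counters, invented for illustration.
status = {
    "Innodb_buffer_pool_pages_total": 65536,
    "Innodb_buffer_pool_pages_free": 1024,
    "Innodb_buffer_pool_wait_free": 0,
    "Innodb_buffer_pool_read_requests": 1_000_000,
    "Innodb_buffer_pool_reads": 30_000,
}
for warning in buffer_pool_warnings(status):
    print(warning)
```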
We saw some quick methods to find out if the InnoDB buffer pool is not big enough by taking a look at graphs. These are not the only ways. Feel free to comment and let us know which methods you use. And remember: comments are welcome, and they help make this website a collective KnowledgeBase!
If you find out that your InnoDB buffer pool is not big enough, you may want to consider our MariaDB Health Check or our MySQL Health Check. If your buffer pool is too small, probably many more settings can be optimised, and fixing them would noticeably improve your database performance.
Thank you for your great explanation. Another common “wisdom” used to be that the swap size should be set to the same size as the main RAM. I wonder how many computers (or smartphones or tablets) really need any swap space at all nowadays. And I hope that nobody would configure their buffer pool to be so big that it would cause excessive swapping.
I would like to point out that in MariaDB Server 10.5, we simplified the buffer pool and the page flushing. There is only one buffer pool instance and one dedicated page cleaner thread. Here are some related tickets from https://jira.mariadb.org:
MDEV-15058 gives some reasoning for moving to a single buffer pool instance
MDEV-15053 and MDEV-22871 reduced buf_pool.mutex contention
MDEV-23399 simplified the LRU page eviction and removed the single-page flushing. The LRU eviction is now being executed by user threads, not by the page cleaner thread anymore. This is what you refer to as “emergency flush”. I prefer to call it “eviction flushing”.
MDEV-23855 reduced the latency of another “emergency flush”, which I call “checkpoint flushing”. When the circular redo log file fills up, we must perform a log checkpoint, which will logically discard the start of the log. If we allowed the end of the log to overwrite the start, then crash recovery would not work.
These changes align with Albert Einstein’s maxim of “making things as simple as possible, but not simpler”. The last part is important, because our refactoring introduced an inadvertent regression:
MDEV-24537 innodb_max_dirty_pages_pct_lwm=0 lost its special meaning
At FOSDEM 2021, I will give a talk on buffer pool performance improvements in the MariaDB server.
Einstein’s wisdom was amazing.
Thanks for pointing out these changes. You mentioned some of them in our interview and our meetup in London, but it’s useful to have the links in one place.
Thanks for also mentioning the regression. I wish that admitting limits and flaws were more common in software communities.
I’m against swapping, especially in a cluster (having a crashed node is better than having a frozen cluster). But, speaking of the buffer pool size… I’ve seen some unreasonably big values for innodb_buffer_pool_size, almost equal to the RAM.
I’ll be waiting to hear your talk at FOSDEM!
Is Innodb_buffer_pool_pages_free being non-zero/small really a good metric? If there are no free pages, then isn’t taking an LRU page just as quick (provided it’s not dirty) as taking a free page?
I’m 100% in agreement about Innodb_buffer_pool_read_requests and Innodb_buffer_pool_wait_free. Thanks for writing this up, and I’ve removed the 70-80% magic numbers from the MariaDB page (finally, thanks to the reminder here).
Hi Daniel! Thanks for your feedback, I’m happy that my article helped eliminate the magic number! I’ll update the text.
No, I don’t consider Innodb_buffer_pool_pages_free a good metric alone, but:
– I’d normally expect to see it well above zero.
– If one of the other metrics is not low but Innodb_buffer_pool_pages_free is also not low, I conclude that our problem is not the buffer pool size.
Are you sure it should be Innodb_buffer_pool_read_requests / Innodb_buffer_pool_reads and not the other way around?
If there are 10 read requests and only one logical read that InnoDB could not satisfy from the buffer pool and had to read directly from disk (Innodb_buffer_pool_reads = 1), then the result is 10 / 1 = 10.
That is not even close to zero. The more read requests we have, the higher the number will be.
In the case of Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests we would have 1 / 10 = 0.1.
The more read requests we have, the closer the result gets to zero.
Yes, the order of operands was inverted. I fixed the article. Thank you!