Schedule Meeting

a

Tuning PostgreSQL Auto-vacuum

by | Apr 28, 2022 | PostgreSQL

Need Help?  Click Here for Expert Support

Before we look at how to tune PostgreSQL auto-vacuum, let’s take a brief look at the importance of vacuuming in PostgreSQL.

Vettabase PostgreSQL Health Check

Why Autovacuum is necessary

In PostgreSQL, rows that are deleted or modified are not completely removed. Rather they are marked as dead tuples. In other words, PostgreSQL does not physically remove these rows but places a marker on them to prevent future queries from returning or accessing dead tuples.

Dead rows consume storage space, which leads to a bloated database. Bloated database is one of the major reasons why database servers perform poorly (excluding other architectural reasons).

To prevent dead rows from taking up excess storage space, we need to run a background process to recover storage space occupied by dead rows or tuples.

PostgreSQL provides the VACUUM command and VACUUM ANALYZE commands to get rid of dead rows or tuples. However, both commands differ in how they operate.

The VACUUM command focuses only on getting rid of dead rows to recover storage space whilst the VACUUM ANALYZE command gets rid of storage space occupied by dead rows and also updates the statistics used by the query planner to select or choose the most efficient plan for executing a query.

There are other parameters such as FULL , FREEZE, and DISABLE_PAGE_SKIPPING that we can attach to the VACUUM command to run vacuum tasks against tables in PostgreSQL.

The PostgreSQL documentation contains additional parameters to customize vacuuming tasks.

The Role of Visibility Map in Vacuum Processing

Postgres version earlier than or before version 8.4 performed poorly in vacuuming dead tuples. Tables are vacuumed via the following steps:

  • Postgres scans all tables to get all dead tuples and freeze old tuples if necessary.
  • Remove index tuples that point to respective dead tuples.
  • Remove dead tuples, update live tuples, clean up index and update table statistics.

However the introduction of visibility map improved vacuum processing in Postgresql. The visibility map determines whether each page in the table file has dead tuples. Vacuum processing skips a page that does not have dead tuples. No scanning of tables is needed in this case.

Configure Postgres Vacuum Processing

Let’s discuss how to appropriately tune auto vacuuming or automatic vacuuming in PostgreSQL using the following parameters in PostgreSQL.

auto_vacuum: This parameter is a boolean type: ON or OFF. The default value is ON. In this case, the server starts the autovacuum launcher daemon. In addition, PostgreSQL recommends that the track_counts parameter must be enabled for the autovacuum to work.

track_counts parameter enables the collection of statistics on database activity.

Even if you decide to set auto_vacuum to OFF, PostgreSQL will launch an autovacuum process if there is an urgent need to prevent transaction wraparound ID.

log_autovacuum_min_duration: Logs vacuuming tasks for a specified amount of time. This parameter is helpful if you want to track or log the activity of autovacuum tasks. For instance, you want to check if there is an autovacuum task that did not run or execute due to a non-existing relation. Setting the value to -1 logs such information.

autovacuum_max_workers: autovacuum workers/processes are not the same as the autovacuum launcher. The autovacuum launcher looks for tables/relations that need to be vacuumed and then starts the autovacuum workers to work on selected tables/relations. Based on the INTEGER value assigned to this parameter, let’s say, four workers. The launcher starts four workers to work on selected tables.

The default value for the autovacuum_max_workers is 3. If you are dealing with many dead tuples we recommend you to increase number of workers to speed up vacuum processing. Otherwise, the default value is okay.

autovacuum_naptime: This parameter defines the naptime for the autovacuum launcher. So assuming we want the launcher to check up on the database to issue VACUUM and ANALYZE commands after every two minutes. We can set the integer value of the autovacuum_naptime to 2. Setting appropriate naptime prevents contention of resources with other background Postgres processes since vacuum processing is initialized periodically. Note that this parameter is not specifically related to autovacuum workers but rather the launcher.

autovacuum_vacuum_threshold: This parameter defines the threshold to trigger the VACUUM command on PostgreSQL tables. So assuming you want to trigger the VACUUM command tables with 100 dead tuples. You can set the integer value to 100. The default value is 50 deleted/updated tuples/rows. This parameter specifies the minimum threshold for deleted/updated tuples/rows.

autovacuum_vacuum_insert_threshold: This parameter is similar to the autovacuum_vacuum_threshold parameter. The only difference is inserting tuples/rows instead of updated/deleted rows/tuples. The default value is 1000 tuples. However, if -1 is specified, autovacuum will not initiate a VACUUM command on any of the selected tables. You can increase the value for the autovacuum_vacuum_insert_threshold parameter if there are enough resources to spare for vacuuming. If not, we advise you not to adjust the default value.

autovacuum_analyze_threshold: This parameter specifies the minimum or least number of inserted, updated, or deleted tuples to initiate or trigger the ANALYZE command on selected tables. The default value is 50 tuples/rows.

Assuming your Postgres server runs on a platform with multiple faster cores, It’s advisable to increase the default value to run or execute the ANALYZE command frequently since the query planner depends on the statistical results produced by the ANALYZE command to choose the most efficient execution plan for queries. Thus, if the result in the pg_statistic system catalog is not up to date, the query planner cannot choose the most efficient execution plan for queries.

autovacuum_vacuum_scale_factor: This parameter basically specifies a fraction of the table size which decides when to trigger the VACUUM command. It is added to the parameter autovacuum_vacuum_threshold. The default value of 20% is optional and appropriate for data platforms with minimal processing speed.

autovacuum_vacuum_insert_scale_factor: This parameter is similar to the previous parameter we discussed. The only difference is that this parameter is rather added to the autovacuum_vacuum_insert_threshold parameter instead of the autovacuum_vacuum_threshold parameter.

autovacuum_analyze_scale_factor: This parameter determines or specifies a fraction of the table size that is in included the autovacuum_analyze_threshold parameter to decide when to execute the ANALYZE command. The default value is 0.1 or 10%. Again, it’s advisable to increase the default value if Postgres runs on servers with multiple cores since updated statistics improve query performance.

autovacuum_freeze_max_age: This parameter specifies the maximum value for a table’s pg_class.relfrozenxid column to prevent transaction wraparound within the table. The VACUUM command is executed or triggered to freeze old rows or convert old rows into frozen rows. The default value for this transaction is 200 million transactions. We suggest the default value is appropriate for most cases.

autovacuum_multixact_ freeze_min_age: This parameter is similar to the previous one we just discussed but it focuses on the minimum value for multixact before the VACUUM command is triggered to prevent multixact ID wraparound within the table. The default value for this parameter is 400 million mutlixacts.

Note that multixact in Postgres refers to more than one transaction concurrently locking a row.

autovacuum_vacuum_cost_delay: This parameter is used to prevent autovacuum workers from exceeding work points specified by the parameter autovacuum_vacuum_cost_limit. It is advisable not to increase the value for this parameter to prevent workers from excessively consuming operating system resources to avoid resource contention.

autovacuum_vacuum_cost_limit: This parameter prevents autovacuum workers from consuming too much available operating system resources. It is advisable to set the value of the parameter to 0 if you are dealing with large tables to avoid bloated tables.

However, you can set it to -1 if you are dealing with small tables to allow other cpu cores to focus on other high priority processing tasks.

We do not advise you to strictly follow the above instructions to tune autovacuum for PostgreSQL. However you can make use of these steps to understand how autovaccum works in PostgreSQL and tune it to your own preference.

Conclusions

Configuring autovacuum correctly is important to avoid performance degradation cause by dead tuples. If you need advice about tuning it properly, consider our PostgreSQL Health Checks.

Michael Aboagye

All content in this blog is distributed under the CreativeCommons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please refer to Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/

About Michael Aboagye
Michael is PostgreSQL consultant at Vettabase. He is specialised in PostgreSQL performance, security and automation.

Recent Posts

First steps with pgBackRest, a backup solution for PostgreSQL

First steps with pgBackRest, a backup solution for PostgreSQL

pgBackRest is an open source backup tool that creates physical backups with some improvements compared to the classic pg_basebackup tool. pg_basebackup is included in the PostgreSQL binaries, and it offers a great set of features for hot binary backups, remote...

Hints to optimise queries with a LIKE comparison

Hints to optimise queries with a LIKE comparison

In SQL, using the LIKE operator is a powerful way to find strings that match a certain pattern. It's suitable for most use cases, thanks to its two wildcard characters: _ means any one character. % means any sequence of zero or more characters. However, many queries...

Services

Need Help?  Click Here for Expert Support

3 Comments

  1. Daniel Kroeger

    Hi,

    For the parameters:
    – autovacuum_analyze_threshold
    – autovacuum_analyze_scale_factor

    you suggest to “[…] it’s advisable to increase the default value if Postgres runs on servers with multiple cores since updated statistics improve query performance”.

    But what you mean is “decrease” and not “increase” the value to let the analyze run “earlier” to have more up-to-date statistics no ?

    Reply
  2. Michael Aboagye

    Hi Daniel,

    Yes. It is “decrease”. Not “increase”. so the analyze command updates table statistics more frequently.

    Reply
  3. Michael Aboagye

    I will update it shortly.

    Thanks.

    Reply

Submit a Comment

Your email address will not be published. Required fields are marked *