Before we look at how to tune PostgreSQL auto-vacuum, let’s take a brief look at the importance of vacuuming in PostgreSQL.
Why Autovacuum is necessary
In PostgreSQL, rows that are deleted or modified are not completely removed. Rather they are marked as dead tuples. In other words, PostgreSQL does not physically remove these rows but places a marker on them to prevent future queries from returning or accessing dead tuples.
Dead rows consume storage space, which leads to a bloated database. Bloated database is one of the major reasons why database servers perform poorly (excluding other architectural reasons).
To prevent dead rows from taking up excess storage space, we need to run a background process to recover storage space occupied by dead rows or tuples.
PostgreSQL provides the
VACUUM command and
VACUUM ANALYZE commands to get rid of dead rows or tuples. However, both commands differ in how they operate.
VACUUM command focuses only on getting rid of dead rows to recover storage space whilst the
VACUUM ANALYZE command gets rid of storage space occupied by dead rows and also updates the statistics used by the query planner to select or choose the most efficient plan for executing a query.
There are other parameters such as
DISABLE_PAGE_SKIPPING that we can attach to the
VACUUM command to run vacuum tasks against tables in PostgreSQL.
The PostgreSQL documentation contains additional parameters to customize vacuuming tasks.
The Role of Visibility Map in Vacuum Processing
Postgres version earlier than or before version 8.4 performed poorly in vacuuming dead tuples. Tables are vacuumed via the following steps:
- Postgres scans all tables to get all dead tuples and freeze old tuples if necessary.
- Remove index tuples that point to respective dead tuples.
- Remove dead tuples, update live tuples, clean up index and update table statistics.
However the introduction of visibility map improved vacuum processing in Postgresql. The visibility map determines whether each page in the table file has dead tuples. Vacuum processing skips a page that does not have dead tuples. No scanning of tables is needed in this case.
Configure Postgres Vacuum Processing
Let’s discuss how to appropriately tune auto vacuuming or automatic vacuuming in PostgreSQL using the following parameters in PostgreSQL.
auto_vacuum: This parameter is a boolean type:
OFF. The default value is
ON. In this case, the server starts the
autovacuum launcher daemon. In addition, PostgreSQL recommends that the
track_counts parameter must be enabled for the
autovacuum to work.
track_counts parameter enables the collection of statistics on database activity.
Even if you decide to set
OFF, PostgreSQL will launch an autovacuum process if there is an urgent need to prevent transaction wraparound ID.
log_autovacuum_min_duration: Logs vacuuming tasks for a specified amount of time. This parameter is helpful if you want to track or log the activity of autovacuum tasks. For instance, you want to check if there is an autovacuum task that did not run or execute due to a non-existing relation. Setting the value to
-1 logs such information.
autovacuum_max_workers: autovacuum workers/processes are not the same as the
autovacuum launcher. The
autovacuum launcher looks for tables/relations that need to be vacuumed and then starts the autovacuum workers to work on selected tables/relations. Based on the INTEGER value assigned to this parameter, let’s say, four workers. The launcher starts four workers to work on selected tables.
The default value for the
autovacuum_max_workers is 3. If you are dealing with many dead tuples we recommend you to increase number of workers to speed up vacuum processing. Otherwise, the default value is okay.
autovacuum_naptime: This parameter defines the naptime for the autovacuum launcher. So assuming we want the launcher to check up on the database to issue
ANALYZE commands after every two minutes. We can set the integer value of the
2. Setting appropriate naptime prevents contention of resources with other background Postgres processes since vacuum processing is initialized periodically. Note that this parameter is not specifically related to autovacuum workers but rather the launcher.
autovacuum_vacuum_threshold: This parameter defines the threshold to trigger the
VACUUM command on PostgreSQL tables. So assuming you want to trigger the
VACUUM command tables with 100 dead tuples. You can set the integer value to 100. The default value is 50 deleted/updated tuples/rows. This parameter specifies the minimum threshold for deleted/updated tuples/rows.
autovacuum_vacuum_insert_threshold: This parameter is similar to the
autovacuum_vacuum_threshold parameter. The only difference is inserting tuples/rows instead of updated/deleted rows/tuples. The default value is 1000 tuples. However, if -1 is specified, autovacuum will not initiate a
VACUUM command on any of the selected tables. You can increase the value for the
autovacuum_vacuum_insert_threshold parameter if there are enough resources to spare for vacuuming. If not, we advise you not to adjust the default value.
autovacuum_analyze_threshold: This parameter specifies the minimum or least number of inserted, updated, or deleted tuples to initiate or trigger the
ANALYZE command on selected tables. The default value is 50 tuples/rows.
Assuming your Postgres server runs on a platform with multiple faster cores, It’s advisable to increase the default value to run or execute the
ANALYZE command frequently since the query planner depends on the statistical results produced by the
ANALYZE command to choose the most efficient execution plan for queries. Thus, if the result in the
pg_statistic system catalog is not up to date, the query planner cannot choose the most efficient execution plan for queries.
autovacuum_vacuum_scale_factor: This parameter basically specifies a fraction of the table size which decides when to trigger the
VACUUM command. It is added to the parameter
autovacuum_vacuum_threshold. The default value of 20% is optional and appropriate for data platforms with minimal processing speed.
autovacuum_vacuum_insert_scale_factor: This parameter is similar to the previous parameter we discussed. The only difference is that this parameter is rather added to the
autovacuum_vacuum_insert_threshold parameter instead of the
autovacuum_analyze_scale_factor: This parameter determines or specifies a fraction of the table size that is in included the
autovacuum_analyze_threshold parameter to decide when to execute the
ANALYZE command. The default value is 0.1 or 10%. Again, it’s advisable to increase the default value if Postgres runs on servers with multiple cores since updated statistics improve query performance.
autovacuum_freeze_max_age: This parameter specifies the maximum value for a table’s
pg_class.relfrozenxid column to prevent transaction wraparound within the table. The
VACUUM command is executed or triggered to freeze old rows or convert old rows into frozen rows. The default value for this transaction is 200 million transactions. We suggest the default value is appropriate for most cases.
autovacuum_multixact_ freeze_min_age: This parameter is similar to the previous one we just discussed but it focuses on the minimum value for multixact before the
VACUUM command is triggered to prevent multixact ID wraparound within the table. The default value for this parameter is 400 million mutlixacts.
multixact in Postgres refers to more than one transaction concurrently locking a row.
autovacuum_vacuum_cost_delay: This parameter is used to prevent autovacuum workers from exceeding work points specified by the parameter
autovacuum_vacuum_cost_limit. It is advisable not to increase the value for this parameter to prevent workers from excessively consuming operating system resources to avoid resource contention.
autovacuum_vacuum_cost_limit: This parameter prevents autovacuum workers from consuming too much available operating system resources. It is advisable to set the value of the parameter to 0 if you are dealing with large tables to avoid bloated tables.
However, you can set it to -1 if you are dealing with small tables to allow other cpu cores to focus on other high priority processing tasks.
We do not advise you to strictly follow the above instructions to tune autovacuum for PostgreSQL. However you can make use of these steps to understand how autovaccum works in PostgreSQL and tune it to your own preference.
Configuring autovacuum correctly is important to avoid performance degradation cause by dead tuples. If you need advice about tuning it properly, consider our PostgreSQL Health Checks.
For the parameters:
you suggest to “[…] it’s advisable to increase the default value if Postgres runs on servers with multiple cores since updated statistics improve query performance”.
But what you mean is “decrease” and not “increase” the value to let the analyze run “earlier” to have more up-to-date statistics no ?
Yes. It is “decrease”. Not “increase”. so the analyze command updates table statistics more frequently.
I will update it shortly.