
Populating tables faster in PostgreSQL

by Michael Aboagye | Jul 9, 2021 | PostgreSQL


You have been assigned by the marketing team to load bulk data from a CSV file into a PostgreSQL table. How would you go about it?

The COPY command in PostgreSQL is the preferred option here. Let’s look at how the COPY command can be used to load data from a CSV file into a PostgreSQL database.

We assume you already know how to create a database and a table, and how to define data types in PostgreSQL. If not, you can check the PostgreSQL documentation for detailed information.


How does the COPY command in PostgreSQL work?

The COPY command copies data from CSV files into tables in a PostgreSQL database, or from tables to CSV files. It is the preferred method when you want to load bulk data into a PostgreSQL table.

The COPY command is different from the \copy meta-command, which is used by the psql client to load bulk data into PostgreSQL tables. The COPY command reads files on the PostgreSQL server hosting the target tables, so the file must be accessible to the server process. The \copy meta-command, in contrast, reads files on the host where the psql client runs.
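
For comparison, a client-side load with the \copy meta-command might look like the sketch below. The table name and file path are hypothetical placeholders, and the path refers to the machine running psql, not to the database server:

-- hypothetical table and client-side file path
\copy my_table FROM 'data.csv' DELIMITER ',' CSV HEADER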

There are two forms of the COPY command in PostgreSQL: COPY TO and COPY FROM. COPY TO copies a PostgreSQL table to a file, whilst COPY FROM copies the records in a file into a table.

The COPY command supports different file formats such as csv, binary, and text.

There is, however, a slight difference between COPY TO and COPY FROM in how file paths are handled. When copying data from a file into a PostgreSQL table with COPY FROM, the path can be relative (to the server’s working directory) or absolute, but when copying data from a PostgreSQL table to a CSV file with COPY TO, you must specify an absolute path for the output file.
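
For example, exporting a table back to a CSV file could look like the sketch below. The table name and output path are hypothetical, and the path must be absolute and writable by the PostgreSQL server process:

-- hypothetical table and output path
COPY my_table
    TO '/tmp/my_table_export.csv'
    WITH (FORMAT csv, HEADER);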

Using the COPY FROM command

Now let’s look at how to use the COPY FROM command to copy data in CSV file format.

Apart from the CSV format, COPY FROM also supports the text and binary file formats.

Copy from CSV file to PostgreSQL table
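
The examples below assume a details table whose columns match the CSV header. The original table definition is not shown in this post, so the sketch below is only an assumption about the column types:

-- hypothetical definition: column types are assumptions
CREATE TABLE details (
    FName VARCHAR(50),   -- first name
    LName VARCHAR(50),   -- last name
    DoB   DATE,          -- date of birth
    Email VARCHAR(100)   -- email address
);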

To copy the details.csv file using the COPY FROM command, execute the following in a psql session connected to the database:

COPY details (FName, LName, DoB, Email)
    FROM '/home/mikey/Downloads/details.csv'
    DELIMITER ','
    CSV HEADER;

The output below shows that the COPY FROM command has copied all 10 rows of the details.csv file into the details table.

COPY 10

The query below displays all the rows that were imported from the details.csv file into the details table in PostgreSQL.

select * from details;

 fname  |   lname   |    dob     |        email
--------+-----------+------------+--------------------
 John   | Doe       | 1995-01-05 | jo**@gm***.com
 Jane   | Doe       | 1995-02-05 | ja******@gm***.com
 Davies | Michel    | 1990-04-05 | da****@gm***.com
 Amas   | Wick      | 1998-03-08 | wi**@gm***.com
 Esther | Wigan     | 1997-06-05 | wi***@io*****.com
 Frank  | James     | 1996-04-07 | ja***@gm***.com
 Gordon | Mack      | 1978-04-03 | **@gm***.com
 Femi   | Akemidola | 1998-05-06 | ak***@gm***.com
 Denis  | Wayo      | 1996-05-04 | wa**@gm***.com
 Shola  | Brown     | 1995-04-02 | sh***@gm***.com
(10 rows)

How long does it take the COPY FROM command to import data?

If we want to find out how long it takes to import the 10 rows in the main.csv file into the batch table, we can enable timing in psql with the following meta-command:

\timing on

Then import the 10 rows in the main.csv file into the batch table to find out how long it takes:

COPY batch (FName, City, Gender)
    FROM '/home/mikey/Downloads/main.csv'
    DELIMITER ','
    CSV HEADER;

The output below shows that the COPY FROM command took 55 milliseconds to import 10 rows into the batch table.

COPY 10
Time: 55.664 ms

So the COPY FROM command is far more efficient for importing data into a PostgreSQL table in bulk than inserting rows one after another: the single-row INSERT below already takes longer than the COPY of all 10 rows.

INSERT INTO memo (FName, City, Gender) VALUES ('mike', 'Accra', 'male');
INSERT 0 1
Time: 67.555 ms

Conclusions

Although the COPY FROM command is optimized for bulk loading of data by default, we can make it more efficient by focusing on the following:

  • Increase the maintenance_work_mem parameter: increasing the value of this parameter improves the performance of the COPY FROM command if you are going to create indexes immediately after populating the table with COPY.
  • Increase checkpoint_segments (replaced by max_wal_size in PostgreSQL 9.5 and later): when a checkpoint occurs, PostgreSQL flushes all dirty pages to disk for persistence. Each WAL segment is 16 megabytes by default, so when loading bulk data PostgreSQL may checkpoint more often than the frequency set via the checkpoint_timeout parameter.

Frequent checkpointing results in performance issues because of the extra disk I/O it causes. You can prevent it by allowing more WAL to accumulate between checkpoints, that is, by raising checkpoint_segments (or max_wal_size on recent versions), as in the sketch below.
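
As a rough sketch, and with values that are only illustrative starting points rather than recommendations, the settings could be adjusted like this before a large import:

-- session level: helps if indexes are created right after the bulk load (value is illustrative)
SET maintenance_work_mem = '1GB';

-- server level (postgresql.conf), followed by a configuration reload:
-- max_wal_size = 4GB           -- PostgreSQL 9.5 and later
-- checkpoint_segments = 64     -- PostgreSQL 9.4 and earlier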

Finally, it is advisable to keep the autovacuum daemon enabled so that the query planner has up-to-date statistics about the table into which you are bulk loading data. For instance, when we import two large CSV files into the same table one after the other, the planner can rely on recent statistics about the table to make efficient planning decisions.
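
For example, you can check that autovacuum is enabled, and refresh the planner statistics yourself right after a large import (the table name below is the details table used above):

SHOW autovacuum;    -- should return 'on'
ANALYZE details;    -- refreshes planner statistics immediately after the load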

If you need assistance with your PostgreSQL operations, consider our PostgreSQL Health Checks.

Michael Aboagye


All content in this blog is distributed under the Creative Commons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please refer to Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/

About Michael Aboagye
Michael is a PostgreSQL consultant at Vettabase. He specialises in PostgreSQL performance, security and automation.
