How to delete duplicate rows in MariaDB

by Federico Razzoli | Nov 28, 2021 | MariaDB

Deleting duplicate values can be necessary, for example, when we realise that a column (or a combination of columns) should be UNIQUE. If the column existed in production for some time, it’s possible that it now contains some duplicate values, so trying to create a UNIQUE index will fail:

MariaDB [test]> ALTER TABLE person ADD UNIQUE unq_email (email);
ERROR 1062 (23000): Duplicate entry 'john@smith.com' for key 'unq_email'

There are two cases here.

We can delete duplicate values without any criteria – that is: if there are two rows with email='john@smith.com' we want to delete one of them, and we don’t have a reason to choose one or another.
Or we may need to delete rows based on some criteria – for example, we may choose to delete the newest rows, or prefer to delete the rows where a column is empty.

Forcing a UNIQUE index creation

Forcing the creation of a UNIQUE index is the easiest way to delete duplicates without any criteria. In other words, you’ll remove duplicates, but you don’t know which rows will survive. We can do it this way:

ALTER IGNORE TABLE person ADD UNIQUE unq_email (email);

The IGNORE keyword tells MariaDB to ignore (delete) duplicate rows and create the index.
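
Because this statement discards rows, it can be worth checking how many rows it would remove, and keeping a copy of the data, before running it. The following is only a sketch: the person_backup table name is an assumption and not part of the original example.

```sql
-- How many rows share an email with another row?
-- COUNT(email) skips NULLs, which a UNIQUE index does not treat as duplicates.
SELECT COUNT(email) - COUNT(DISTINCT email) AS rows_to_remove
    FROM person
;

-- Keep a copy of the data (person_backup is a hypothetical name).
CREATE TABLE person_backup LIKE person;
INSERT INTO person_backup SELECT * FROM person;
```

If rows_to_remove is 0, the index can be created without IGNORE.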

Selectively deleting duplicate rows

First I’ll explain what we need to do, so that the logic will be clear. Then I’ll show example queries.

Theory

We want to do the following:

Order the rows by the column that contains duplicates that we need to eliminate. The duplicates will be “grouped” together.
Add a secondary order so that the duplicate we choose to preserve appears first.
Add a progressive number to the duplicates, so we can delete those with a progressive number greater than 1.
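
To try the steps above on sample data, a small test table can help. This is a hypothetical setup for an empty test database: the column names come from the queries in this article, but the column types and all the inserted values are assumptions made up for illustration.

```sql
-- Hypothetical schema: the column types are assumptions.
CREATE TABLE person (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(100) NOT NULL,
    full_name VARCHAR(100) NULL,
    registration_date DATE NOT NULL
);

-- Invented sample rows: three share one email, two share another.
INSERT INTO person (id, email, full_name, registration_date) VALUES
    (1, 'john@smith.com', 'John Smith', '2020-01-10'),
    (2, 'john@smith.com', '',           '2020-03-05'),
    (3, 'john@smith.com', NULL,         '2021-07-21'),
    (4, 'doctor@who.com', NULL,         '2019-11-23'),
    (5, 'doctor@who.com', 'The Doctor', '2021-02-14')
;
```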

Practice

First, we need a query that makes it clear which rows are duplicate:

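-- The alias is quoted with backticks so that it stays valid
-- on versions where ROW_NUMBER is a reserved word.
SELECT
        email, full_name,
        ROW_NUMBER() OVER (
            PARTITION BY email
            ORDER BY email, full_name
        ) AS `row_number`
    FROM person
    GROUP BY email, full_name
;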

ROW_NUMBER() is a window function. In the simplest case, it will return a progressive number for all returned rows, following the specified order. With PARTITION BY email, it will reset the count every time it finds a different value for email. For example:

-- Number the rows within each email, lowest id first.
SELECT
        p.id, p.email,
        ROW_NUMBER() OVER (
            PARTITION BY p.email
            ORDER BY p.email, p.id
        ) AS `row_number`
    FROM person p
    GROUP BY p.email, p.id
;

The result looks like this:

+----+----------------+------------+
| id | email          | row_number |
+----+----------------+------------+
|  4 | doctor@who.com |          1 |
|  5 | doctor@who.com |          2 |
|  1 | john@smith.com |          1 |
|  2 | john@smith.com |          2 |
|  3 | john@smith.com |          3 |
+----+----------------+------------+

We can delete the rows with a row_number greater than 1:

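-- Delete every row except the first one (lowest id) for each email.
-- The alias is quoted so that it stays valid where ROW_NUMBER is reserved.
DELETE person
    FROM person
    INNER JOIN (
        SELECT
            p.id,
            ROW_NUMBER() OVER (
                PARTITION BY p.email
                ORDER BY p.email, p.id
            ) AS `row_number`
        FROM person p
        GROUP BY p.email, p.id
    ) dup
    ON person.id = dup.id
    WHERE dup.`row_number` > 1
;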
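
Since the DELETE is destructive, one cautious way to run it is inside a transaction, checking for remaining duplicates before committing and then creating the index that originally failed. This sketch assumes that person uses a transactional storage engine such as InnoDB, which the article does not state. The rn alias and the occurrences column name are arbitrary.

```sql
START TRANSACTION;

DELETE person
    FROM person
    INNER JOIN (
        SELECT
            p.id,
            ROW_NUMBER() OVER (
                PARTITION BY p.email
                ORDER BY p.id
            ) AS rn  -- rn is an arbitrary alias
        FROM person p
    ) dup
    ON person.id = dup.id
    WHERE dup.rn > 1
;

-- Should return no rows if all duplicates are gone.
SELECT email, COUNT(*) AS occurrences
    FROM person
    GROUP BY email
    HAVING COUNT(*) > 1
;

-- If the result is not what we expect, run ROLLBACK instead.
COMMIT;

-- ALTER TABLE causes an implicit commit, so run it only after the check.
ALTER TABLE person ADD UNIQUE unq_email (email);
```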

Now we want to add some logic.

Delete newest rows

To delete the newest rows and preserve the oldest, we can modify the window function ORDER BY clause:

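-- Within each email, the oldest registration_date comes first and is preserved.
-- The alias is quoted so that it stays valid where ROW_NUMBER is reserved.
DELETE person
    FROM person
    INNER JOIN (
        SELECT
            p.id,
            ROW_NUMBER() OVER (
                PARTITION BY p.email
                ORDER BY
                    p.email,
                    p.registration_date,
                    p.id
            ) AS `row_number`
        FROM person p
        GROUP BY p.email, p.id
    ) dup
    ON person.id = dup.id
    WHERE dup.`row_number` > 1
;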
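
The opposite policy, keeping the newest row for each email, only requires reversing the sort direction. As a sketch, the query below is a read-only preview of the rows that such a policy would delete, so the choice can be reviewed before turning it into a DELETE. The rn alias is arbitrary.

```sql
-- Preview only: lists the rows that would be deleted if we kept the newest
-- row for each email. Nothing is modified.
SELECT dup.id, dup.email, dup.registration_date
    FROM (
        SELECT
            p.id, p.email, p.registration_date,
            ROW_NUMBER() OVER (
                PARTITION BY p.email
                ORDER BY
                    p.registration_date DESC,
                    p.id DESC
            ) AS rn  -- rn is an arbitrary alias
        FROM person p
    ) dup
    WHERE dup.rn > 1
    ORDER BY dup.email, dup.registration_date
;
```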

Delete empty values first

To try to preserve a row with a non-empty full_name:

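-- Within each email, rows with a non-empty full_name come first and are preserved.
-- The alias is quoted so that it stays valid where ROW_NUMBER is reserved.
DELETE person
    FROM person
    INNER JOIN (
        SELECT
            p.id,
            ROW_NUMBER() OVER (
                PARTITION BY p.email
                ORDER BY
                    p.email,
                    p.full_name > '' DESC,
                    p.id
            ) AS `row_number`
        FROM person p
        GROUP BY p.email, p.id
    ) dup
    ON person.id = dup.id
    WHERE dup.`row_number` > 1
;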

Note that p.full_name > '' DESC will cause the values that are not NULL and not empty to be returned first.
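
To see why this works, it may help to evaluate the comparison on its own. The literal values and column aliases below are made up for illustration.

```sql
SELECT
        'John Smith' > '' AS non_empty,
        '' > ''           AS empty_string,
        NULL > ''         AS null_value
;
-- non_empty = 1, empty_string = 0, null_value = NULL
```

With DESC, 1 sorts before 0, and NULL sorts last, so a non-empty name gets row number 1 whenever one exists for that email.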

Before MariaDB 10.2

Support for window functions was added in MariaDB 10.2. Older versions do not support them.

When working with older versions, we can use a query like this to find values that occur multiple times:

SELECT email, COUNT(*) AS count
    FROM person
    GROUP BY email
    HAVING COUNT(*) > 1
;

If we want to get more information about duplicate rows, we can use a query like this:

SELECT p.*
    FROM person p
    INNER JOIN (
        -- emails that occur more than once
        SELECT email, COUNT(*) AS count
            FROM person
            GROUP BY email
            HAVING COUNT(*) > 1
    ) dup
    -- the subquery only exposes email, so we join on it
    ON p.email = dup.email
;
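
For the simplest criterion, preserving the row with the lowest id, a commonly used pattern on older versions is a multi-table DELETE with a self-join. This is not taken from the original article, and it is only a sketch; more elaborate criteria quickly become cumbersome to express this way.

```sql
-- For each email, delete every row that has a "twin" with a lower id.
-- Rows with a NULL email never satisfy p1.email = p2.email, so they are kept.
DELETE p1
    FROM person p1
    INNER JOIN person p2
        ON p1.email = p2.email
        AND p1.id > p2.id
;
```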

Conclusions

Deleting duplicate values is a common problem when creating a UNIQUE index. While MariaDB allows us to “brutally” delete duplicates without any criteria, that is not always acceptable. Before version 10.2, window functions were not supported, so there were no easy solutions.

With modern MariaDB versions, we can order the rows so that the one we care about appears first, and then thanks to ROW_NUMBER() we can delete the duplicates.


All content in this blog is distributed under the CreativeCommons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please refer to Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/
