Schedule Meeting

a

Navigating Tree and Graph Data with Recursive SQL

by | Dec 27, 2025 | MariaDB, PostgreSQL, SQL Language

Need Help?  Click Here for Expert Support

Hierarchical and networked data appears everywhere in modern databases: organisational charts, product category trees, dependency graphs, and even transport networks. Applications need to retrieve this data to draw a chart, find out whom a certain employee reports to, or find the routes that connect two train stops.

Storing and querying this kind of data in a relational database is not trivial. If it’s modelled poorly, you might easily end up running a query for each node: an example of the infamous N+1 issue that doesn’t scale, and usually represents a performance bottleneck. This article shows you how to design tables to store tree or graph data, and how to navigate these structures efficiently with a single SQL query.

We’ll use example written for MariaDB, but the SELECT queries work equally well with PostgreSQL.

Querying Trees

A tree is a data structure where nodes have exactly one parent node, except root nodes, which have none. A node can have any number of children. Here’s an example from the Encyclopédie, ou dictionnaire raisonné des sciences, des arts et des métiers (1752), source: Wikipedia.

Variants

There are variants. For example, you might require that a node has exactly 0 or 2 children. You can still use the above table, but you’ll have to enforce this constraint on the application level, which requires additional queries. Or you can enforce it with stored procedures to add 2 children and delete 2 children, and make sure that the application user doesn’t have permissions to directly write to the table. But, in this case, it might be easier to invert the relationship’s direction: you can remove parent_id, and add child1 and child2 columns.

We can’t cover all variants of a tree in this article. By reading the rest of the text you’ll find out the principles, and you’ll have to figure out by yourself how to apply them to any variant you might need to implement.

Tree Tables

Let’s see how to represent a tree in a database with an example. The following table represents product categories, where a category can contain other categories. You can create the following table with MariaDB:

CREATE OR REPLACE TABLE category (
    id INT UNSIGNED AUTO_INCREMENT,
    parent_id INT UNSIGNED,
    name VARCHAR(100) NOT NULL
        CHECK (name > ''),
    PRIMARY KEY (id),
    UNIQUE unq_id_parent_id (id, parent_id),
    FOREIGN KEY fk_parent_id_to_id (parent_id)
        REFERENCES category (id)
        ON DELETE CASCADE
        ON UPDATE RESTRICT,
    UNIQUE unq_name (name)
);

The key parts are:

  • id: Normally I recommend to use the UUID data type for primary keys, but we’re going to use INT to make the results more reasable.
  • parent_id: For root nodes it’s NULL, for other nodes it points to id. This allows self-joins or, in simple words, connecting a row from this table to another row in the same table.
  • A foreign key called fk_parent_id_to_id officialises this relationship.

Step-By-Step Operations

Let’s start easy. The following are simple queries that can be used to navigate the tree step by step, using the id and parent_id fields. Let’s assume that we are navigating statrting from the row with id=11.

Find the immediate parent

SELECT id, parent_id, name FROM category WHERE id =
    (SELECT parent_id FROM category WHERE id = 11);

Find the immediate children

SELECT id, parent_id, name FROM category WHERE parent_id = 11;

Find all siblings

SELECT id, parent_id, name FROM category WHERE parent_id =
    (SELECT parent_id FROM category WHERE id = 11);

Find next sibling

SELECT id, parent_id, name FROM category
    WHERE
        parent_id =
            (SELECT parent_id FROM category WHERE id = 11)
        AND id > 11
    ORDER BY id
    LIMIT 1
;

Find previous sibling

SELECT id, parent_id, name FROM category
    WHERE
        parent_id =
            (SELECT parent_id FROM category WHERE id = 11)
        AND id < 11
    ORDER BY id DESC
    LIMIT 1
;

Obtaining a Tree From a Query

The following query returns trees starting from root nodes:

WITH RECURSIVE category_tree AS (
    SELECT
            id, parent_id, name,
            id AS root_id, 1 AS level, TRUE AS is_root
        FROM category
        WHERE parent_id IS NULL
    UNION ALL
    SELECT
            c.id, c.parent_id, c.name,
            ct.root_id, ct.level + 1, FALSE AS is_root
        FROM category c
        INNER JOIN category_tree ct
            ON c.parent_id = ct.id
)
SELECT id, parent_id, root_id, level, is_root, name
    FROM category_tree
    ORDER BY root_id, level, id
;
+------+-----------+---------+-------+---------+------------------+
| id   | parent_id | root_id | level | is_root | name             |
+------+-----------+---------+-------+---------+------------------+
|    1 |      NULL |       1 |     1 |       1 | Home & Garden    |
|   11 |         1 |       1 |     2 |       0 | Kitchen          |
|   12 |         1 |       1 |     2 |       0 | Bedroom          |
|  111 |        11 |       1 |     3 |       0 | Ovens            |
|  112 |        11 |       1 |     3 |       0 | Cookers          |
|  113 |        11 |       1 |     3 |       0 | Fridges          |
|  121 |        12 |       1 |     3 |       0 | Beds             |
|  122 |        12 |       1 |     3 |       0 | Wardrobes        |
|  123 |        12 |       1 |     3 |       0 | Night Tables     |
|    2 |      NULL |       2 |     1 |       1 | Electronics      |
|   21 |         2 |       2 |     2 |       0 | Computers        |
|   22 |         2 |       2 |     2 |       0 | Audio & Video    |
|  211 |        21 |       2 |     3 |       0 | Laptops          |
|  212 |        21 |       2 |     3 |       0 | Gaming Computers |
|  221 |        22 |       2 |     3 |       0 | TV               |
|  222 |        22 |       2 |     3 |       0 | Wi-Fi            |
+------+-----------+---------+-------+---------+------------------+

Le’ts analyse this query.

We have a recursive Common Table Expression (CTE) called category_tree.

The CTE starts with an anchor part, which obtains the root nodes and runs only once:

SELECT
        id, parent_id, name,
        id AS root_id, 1 AS level, TRUE AS is_root
    FROM category
    WHERE parent_id IS NULL

Then we have an INNER JOIN that recursively joins the CTE’s results with the category table:

SELECT
        c.id, c.parent_id, c.name,
        ct.root_id, ct.level + 1, FALSE AS is_root
    FROM category c
    INNER JOIN category_tree ct
        ON c.parent_id = ct.id

The first time, it will only joins the root nodes (the anchor part’s results) with their immediate children. Then, it will join these two levels with their immediate childre. And so on.

The anchor part and the recursive part are merged using UNION ALL. The reason is that there can’t be any duplicate rows, so there is no need to do any additional work to remove them.

Finally, we have an outer query that orders and returns all the results from the CTE’s last execution.

Note that we have some additional columns. They aren’t required, but you might find the, useful in some cases. These columns are:

  • root_id is the root node’s id. It’s useful if the root level is particularly important for you. root_id is assigned in the anchor, and is copied as-is in the next levels.
  • level is the row’s tree level. It starts as 1 in the anchor, and is incremented at every CTE execution.
  • is_root isn’t necessary in this case, because you might check whether parent_id is NULL. But a root node returned by your query might be any node at any tree level. For example, you might arbitrarily decide that it’s the node with id=12. In this case, is_root might be useful. t’s assigned to TRUE in the anchor and to FALSE for the next level.

Querying Graphs

Graphs can be considered similar to trees. If you see it in this way, the main difference is that nodes can have more than one parent.

But actually, in most graphs it doesn’t make sense to distinguish between parents and children, or talk about siblings. There are only peer nodes, and links between them called arcs. These links have a direction.

In the following examples, however, we’ll use a graph that is actually a small variation of the category tree we used earlier. Is a graph, and no a tree, because a category can have multiple parents: trees don’t allow this.

A graph example is a public transport map:

Variations

An arc may carry additional metadata, such as creation and deletion timestamps, tags, or the reason why the arc exists.

Additionally, arcs might have a weight. If the graph represents a transport map, the weight is probably the dinstance that separates the nodes linked by an arc. This might be taken into account when we need to find the shortest path between two nodes. But this is beyond the scope of this article. If you need to use weights in this way, I recommend using the MariaDB OQGRAPH engine. If there is interest, it might be a good topic for a future article.

Graph Tables

A graph table is a many to many relationship between a table and itself. Let’s see a practical example:

CREATE OR REPLACE TABLE category (
    id INT UNSIGNED AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL
        CHECK (name > ''),
    PRIMARY KEY (id),
    INDEX idx_name (name)
);

CREATE OR REPLACE TABLE category_arc (
    id INT UNSIGNED AUTO_INCREMENT,
    from_category INT UNSIGNED NOT NULL,
    to_category INT UNSIGNED NOT NULL,
    PRIMARY KEY (id),
    UNIQUE unq_from_category_to_category (from_category, to_category),
    FOREIGN KEY fk_from_category_to_id (from_category)
        REFERENCES category (id)
        ON DELETE CASCADE
        ON UPDATE RESTRICT,
    FOREIGN KEY fk_to_category_to_id (to_category)
        REFERENCES category (id)
        ON DELETE CASCADE
        ON UPDATE RESTRICT
);

We modified the category table by removing the parent_id column.

The many to many relationship is represented by category_arc. As anticipated, this relationship has a direction: from_category points to a parent category, and to_category points to a child category. This design allows us to have multiple children for the same parent, and multiple parents for the same child.

Step-By-Step Operations

Since arcs have a direction, we’ll need to run queries that keep into account this direction. We’ll assume that we’re statrting from the row with id=21.

Find current node’s parents and children

SELECT
        c.id, c.name,
        JSON_ARRAYAGG(DISTINCT c_parent.from_category ORDER BY 1) AS parents,
        JSON_ARRAYAGG(DISTINCT c_child.to_category ORDER BY 1) AS children
    FROM category c
    INNER JOIN category_arc c_parent
        ON c.id = c_parent.to_category
    INNER JOIN category_arc c_child
        ON c.id = c_child.from_category
    WHERE c.id = 21
    GROUP BY c.id, c.name
;

Find all adjacent nodes (no distinction between parents and children)

SELECT
        c.id, c.name
    FROM (
        (SELECT from_category AS id FROM category_arc WHERE to_category = 21)
        UNION ALL
        (SELECT to_category AS id FROM category_arc WHERE from_category = 21)
    ) a
    INNER JOIN category c
        ON a.id = c.id
;

Finding All Reachable Nodes, Recursively

The following query recursively finds all nodes that are directly or indirectly linked to the starting node. It treats parents and children in the same way.

WITH RECURSIVE category_network AS (
    (
        SELECT
                id, name,
                1 AS distance
            FROM category
            WHERE id = 21
    )
    UNION ALL
    (
        (
            SELECT
                    c.id, c.name,
                    cn.distance + 1 AS distance
                FROM category_network cn
                INNER JOIN category_arc ca 
                    ON cn.id = ca.from_category
                INNER JOIN category c 
                    ON c.id = ca.to_category
                WHERE c.id <> cn.id
        ) UNION ALL (
            SELECT
                    c.id, c.name,
                    cn.distance + 1 AS distance
                FROM category_network cn
                INNER JOIN category_arc ca 
                    ON cn.id = ca.to_category
                INNER JOIN category c 
                    ON c.id = ca.from_category
                WHERE c.id <> cn.id
        )
    )
) CYCLE id RESTRICT
SELECT DISTINCT
        id, name,
        distance
    FROM category_network
    WHERE id <> 21
    ORDER BY id
;

Again, we start with a CTE called category_network, and the first part is the anchor. It selects the starting node, which doesn’t have to be an edge node.

The anchor is the first part of a UNION. Second part is another UNION. This is the case because, with our table design, parents and children are linked using different columns: from_category and to_category, but our query shouldn’t distinguish between parents and children.

Another way to do this would be to write ON clauses with an OR logical operator or an IN comparison operator:

ON t1.id = t2.from_category OR t1.id = t2.to_category
ON t1.id IN (t2.from_category, t2.to_category)

But this wouldn’t be index-friendly. So we prefer to read parents and children in the two branches of a UNION, which should use indexes properly. If the rows returned by the UNION are not too many, this query will be fast.

Note that all UNIONs are of type UNION ALL. This is because they can’t possibly return duplicate rows, so we don’t need the database to do extra work to remove duplicates.

Then we have cycle detection syntax: CYCLE id RESTRICT. We have to do this because, in a graph, two nodes can be connected by more than one path. In this case, we’ll have an infinite loop. To avoid this, we need to use the CYCLE ... RESTRICT clause. This query was tested on MariaDB. PostgreSQL requires a slightly different syntax for cycle detection, the the logic is exactly the same.

Finally, we have an outer query that excludes the starting node (you might want to do so or not, depending on your application logic) and sorts the rows by id.

In this query, I add an additional columns: distance. You might need it or not, The logic is the same that I used for the level column of the tree query.

Conclusions

We discussed how to store tree and graph data structures in a relational database. We discussed how to run simple next-node queries, and how to run queries that return all the nodes reachable form the starting point. For graph examples, we actually used a variant of the tree table design, which is still a logical tree but allows a node to have multiple parents. In a training, I wouldn’t do such a thing. But if you managed to follow my explanation, you learnt how to handle graphs the harder way, and shouldn’t have any problems working with a proper graph, which doesn’t have a distinction between parents and children.

We used MariaDB for the examples. Almost all the queries will work on PostgreSQL, too. But, as mentioned, MariaDB offers another interesting feature that I didn’t discuss in this article: the OQGRAPH storage engine. This engine allows us to work with weighted graphs and easily find the shortest path between two nodes. If there is interest, I can illustrate OQGRAPH in a dedicate article.

If you want to know more about trees, graphs, CTEs, or other advanced aspects of SQL, consider our SQL training courses.

Federico Razzoli

All content in this blog is distributed under the CreativeCommons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please refer to Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/

About Federico Razzoli
Federico Razzoli is a database professional, with a preference for open source databases, who has been working with DBMSs since year 2000. In the past 20+ years, he served in a number of companies as a DBA, Database Engineer, Database Consultant and Software Developer. In 2016, Federico summarized his extensive experience with MariaDB in the “Mastering MariaDB” book published by Packt. Being an experienced database events speaker, Federico speaks at professional conferences and meetups and conducts database trainings. He is also a supporter and advocate of open source software. As the Director of Vettabase, Federico does business worldwide but prefers to do it from Scotland where he lives.

Recent Posts

Deploying garbd (Galera Arbitrator Daemon) | MariaDB Galera pt 2

Deploying garbd (Galera Arbitrator Daemon) | MariaDB Galera pt 2

In the first part of this series, we deployed a 3-node MariaDB Galera Cluster on Ubuntu 24.04. While a 3-node topology provides the best fault tolerance, sometimes you need a simpler setup - for example, a two-node cluster with a lightweight arbitrator to maintain...

Why Your Database Deserves Consistent Names and Types

Why Your Database Deserves Consistent Names and Types

I see a lot of problems with database schemas: missing indexes, wrong types, tables used like an spreadsheet... but all these problems are widely known. An often overlooked problem is inconsistency. Let me explain how you should guarantee database consistency, and how...

Services

Need Help?  Click Here for Expert Support

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *