Developers who are not familiar with SQL are often confused by MariaDB and MySQL’s infamous GROUP BY
error. From time to time, customers ask us to explain it, so it’s time we publish an article on this topic. The error I’m talking about is the following:
ERROR 1055 (42000): 'db_name.table_name.column_name' isn't in GROUP BY
Let’s see what this error means, and how to rewrite a query accordingly.
Understanding the error
The error components are the following:
1055
is the error code, specific to MariaDB and MySQL;'42000'
is the SQLSTATE, a sort of standard error code consisting of five characters, used by most DBMSs;'db_name.table_name.column_name' isn't in GROUP BY
is the error message.
These different components are returned separately by mysql_errno()
, mysql_sqlstate()
, and mysql_error()
C API functions (which is the core of other connectors). In stored procedures, they can be read separately with GET DIAGNOSTICS
.
See the MariaDB error codes and SQLSTATEs reference.
What causes the error
As the error says, when a GROUP BY
clause is present, you can’t select any column. You can only select columns that are:
- Mentioned in the
GROUP BY
clause; - passed to aggregate functions.
Expressions that involve GROUP BY
columns and aggregate functions are permitted, expressions that involve other columns are not.
But why do we have this limitation? Well, this is better explained with an example. Look at this query:
SELECT uuid, department, MAX(salary)
FROM company.employee
GROUP BY department
;
ERROR 1055 (42000): 'company.employee.uuid' isn't in GROUP BY
Here’s what we’re telling MariaDB:
GROUP BY
department – Return a single row for each department, similarly toSELECT DISTINCT
.MAX(salary)
– For each department, find the highest salary.
This query reads actual table rows as input, but what it returns as output are calculated values. The result of MAX(salary)
can’t be associated to a specific uuid
.
This is counter-intuitive, because we know that an employee who earns the highest salary in its department has a UUID. But again, we’re thinking in terms of actual rows, whereas our query returns aggregates. And by the way, if you think better about it, you’ll realise that there can be several rows where salary = MAX(salary)
in each department, whereas GROUP BY
can’t return multiple rows per department.
This becomes more intuitive if we select AVG(salary). It’s likely that no rows has salary = AVG(salary)
, because an average is just an artificial number that is supposed to summarise a series of numbers.
This query will work (it involves an expression, but doesn’t select any disallowed column):
SELECT
department,
COUNT(*) AS employee_count,
AVG(salary) AS salary_avg,
MAX(salary) - MIN(salary) AS salary_range
FROM employee
GROUP BY department
;
+------------+----------------+------------+--------------+
| department | employee_count | salary_avg | salary_range |
+------------+----------------+------------+--------------+
| ADMIN | 1 | 50000.0000 | 0 |
| HR | 2 | 31000.0000 | 2000 |
| MARKETING | 3 | 42066.6667 | 4200 |
| SALES | 2 | 63000.0000 | 16000 |
| WEB | 5 | 67500.0000 | 6000 |
+------------+----------------+------------+--------------+
How to suppress the error in the wrong way
Since the early days of MySQL, long before MariaDB existed, MySQL provided a dirty shortcut to make the above query work. It is the ONLY_FULL_GROUP_BY
flag in the sql_mode
variable. This flag made the above query work. It is still incorrect SQL, so let’s see it in this way: MariaDB/MySQL get a malformed request, but they do their best to find out what the user wanted to achieve. See it in action:
> SET @@sql_mode = CONCAT(@@sql_mode, ',ONLY_FULL_GROUP_BY');
> SELECT @@sql_mode;
+-------------------------------------------------------------------+
| @@sql_mode |
+-------------------------------------------------------------------+
| ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO |
+-------------------------------------------------------------------+
> SELECT uuid, department, MAX(salary) FROM company.employee GROUP BY department LIMIT 3;
ERROR 1055 (42000): 'company.employee.uuid' isn't in GROUP BY
> SET @@sql_mode = REPLACE(@@sql_mode, 'ONLY_FULL_GROUP_BY', '');
> SELECT @@sql_mode;
+------------------------------------------------+
| @@sql_mode |
+------------------------------------------------+
| STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO |
+------------------------------------------------+
> SELECT uuid, department, MAX(salary) FROM company.employee GROUP BY department LIMIT 3;
+--------------------------------------+------------+-------------+
| uuid | department | MAX(salary) |
+--------------------------------------+------------+-------------+
| f4c299f0-b4af-11ef-b216-0242ac110002 | ADMIN | 50000 |
| f4c299bf-b4af-11ef-b216-0242ac110002 | HR | 32000 |
| f4c29491-b4af-11ef-b216-0242ac110002 | MARKETING | 44200 |
+--------------------------------------+------------+-------------+
Is ONLY_FULL_GROUP_BY
on by default?
- In MariaDB,
ONLY_FULL_GROUP_BY
is OFF by default; - In MySQL 5.6 or older,
ONLY_FULL_GROUP_BY
is OFF by default; - In MySQL 5.7 or later,
ONLY_FULL_GROUP_BY
is ON by default.
I recommend to include ONLY_FULL_GROUP_BY
in the sql_mode
, in the global scope.
Ideally you should rewrite incorrect GROUP BY
queries accordingly, but this might require a long time, including the tests to identify problematic queries.
If you have problems rewriting incorrect queries, it’s also possible to temporarily disable ONLY_FULL_GROUP_BY
at session level:
SET @@sql_mode = REPLACE(@@sql_mode, 'ONLY_FULL_GROUP_BY', '');
...incorrect GROUP BY query...
SET @@sql_mode = CONCAT(@@sql_mode, ',ONLY_FULL_GROUP_BY');
How to fix the query in a right way
Sticking to our example, suppose that you want to know the UUIDs of the employees who receive the highest salaries in their departments. If several employees receive the highest salary, you want to see all their UUIDs.
The general solution is to join:
- A
GROUP BY
subquery with the aggregate value; - The base table with the columns we need to read.
In out example, we can write a JOIN
like this:
SELECT
max.department, max.salary_max,
e.uuid AS employee_uuid
FROM (
SELECT
department, MAX(salary) AS salary_max
FROM company.employee
GROUP BY department
) max
INNER JOIN company.employee e
ON max.department = e.department
AND max.salary_max = e.salary
;
+------------+------------+--------------------------------------+
| department | salary_max | employee_uuid |
+------------+------------+--------------------------------------+
| MARKETING | 44200 | f4c29909-b4af-11ef-b216-0242ac110002 |
| WEB | 70000 | f4c29988-b4af-11ef-b216-0242ac110002 |
| WEB | 70000 | f4c299a0-b4af-11ef-b216-0242ac110002 |
| HR | 32000 | f4c299d7-b4af-11ef-b216-0242ac110002 |
| ADMIN | 50000 | f4c299f0-b4af-11ef-b216-0242ac110002 |
| SALES | 71000 | f4c29a0a-b4af-11ef-b216-0242ac110002 |
| SALES | 71000 | 0a3fa336-b4ff-11ef-b216-0242ac110002 |
+------------+------------+--------------------------------------+
As you can see, the WEB department has two employees who receive the maximum salary. They appear in two separate rows. If you prefer to see a comma-separated list of UUIDs in the same row, you can use GROUP_CONCAT()
. But this is out of the scope of this article.
Aggregates without GROUP BY
Sometimes, you might encounter an error that is very similar to the one we already mentioned:
ERROR 1140 (42000): Mixing of GROUP columns (MIN(),MAX(),COUNT(),...) with no GROUP columns is illegal if there is no GROUP BY clause
This happens when we mix columns and aggregate values without GROUP BY
.
For example, you want to get the highest salary and the UUID of the employee who receives it. The wrong query is the following:
SELECT uuid, MAX(salary) FROM company.employee;
The previous solutions can be used with minor changes. I’ll leave it to you as an exercise.
But for MIN()
and MAX()
functions, you can also use the WITH TIES
syntax:
SELECT
uuid, salary AS salary_max
FROM employee
ORDER BY salary DESC
FETCH FIRST 1 ROW WITH TIES
;
For more details, see MariaDB: WITH TIES syntax.
DISTINCT
DISTINCT
can be seen as a GROUP BY
that includes all the columns. As a consequence:
- Since all columns are grouped, any use of aggregate functions is valid and won’t generate the above errors;
- That said, aggregating a single value is never useful.
Conclusions
We discussed MariaDB/MySQL errors number 1055 and 1140. As we discussed, both of them are caused by an incomplete understanding of how row aggregation works in SQL. We discussed a quick workaround based on sql_mode, but I discourage you from using it, if possible. We also discussed that the correct way to obtain what you need is joining a GROUP BY
subquery with the base table.
If you don’t fully understand SQL and you are a data analyst, you should consider our services for data analysts.
If you have problems with slow queries, you should consider our SQL optimisation training.
Federico Razzoli
0 Comments