Schedule Meeting

a

PostgreSQL Features Missing in Amazon Redshift

by | Aug 10, 2020 | Amazon Redshift, PostgreSQL

Need Help?  Click Here for Expert Support

Redshift is an Amazon database technology available for AWS customers. It is based on PostgreSQL 8. The codebase was modified to develop a horizontally scalable SQL DBMS, suitable for analytics, and in particular Data WareHouse.

Many PostgreSQL features are not available in Redshift, particularly those that are not suitable for a database cluster or for analytics, and those that Amazon didn’t consider important enough to spend the necessary resources to maintain them. Some PostgreSQL features were added later, probably after being reimplemented by the Redshift team.

This article shows an incomplete list of PostgreSQL features that are missing in Redshift. It is mainly written for those who are familiar with PostgreSQL and are evaluating Redshift for analytics. Note that I’m trying to maintain this article up to date over time.

Is Redshift about distributing elephants?

Main missing features

Let me stress it again: this is not meant to be a complete list.

The code

Redshift is not open source, nor source available.

PostgreSQL is open source, but its license does not disallow Amazon from using its codebase to build a cloud database.

Ownership

One great thing about PostgreSQL is that it is not developer or owned by a company. Instead, many organisations and individuals contribute to improve it and have interest in keeping it alive.

Redshift’s future appears solid because there is interest around it, and its owner is solid. Still, it depends on a single company. As a corner case, if Amazon’s plans change for any reason, Redshift may be retired at some point. And since it is not open source, no one would be able to keep it alive without an agreement with the vendor.

Latency

Amazon Redshift has a good throughput, and it can optimise some big queries with many joins by running them in parallel through the cluster.

However, the latency is far too high and unpredictable for OLTP workloads. If a simple query that would run in less than 0.1 on PostgreSQL is run against Redshift, it could very well take 0.5 seconds, or sometimes even 1 second. Data analysts don’t care about such a small wait, but for an OLTP workload it would be a disaster.

Redshift is only intended for analytics. Don’t try to use it for different purposes.

SELECT *

Redshift stores data in a columnar fashion. As a consequence, selecting all columns (SELECT *) or many columns causes performance penalties.

Extensions

Redshift does not support the concept of extensions.

Data types

Redshift does not support certain types, like:

User-defined types, pseudo-types and domains are also not supported.

Indexes

Redshift allows a DISTKEY and a SORTKEY for each table (very similar to Cassandra primary keys), but there are no secondary indexes.

Even for those keys, PostgreSQL ASC and DESC options, index types, partial indexes, expressional indexes, non-key columns are not available. Generated columns are not available.

Constraints

Constraints like primary key uniqueness, UNIQUE constraints and foreign keys are not enforced. Instead, Redshift assumes that they are enforced by the client application, so for example:

  • SELECT DISTINCT pk will return the same results as SELECT pk.
  • If there is a primary key or a UNIQUE key, COUNT(DISTINCT *) will return the same number as COUNT(*).

This may cause queries to return wrong results.

CHECK constraints, ASSERTIONs and triggers are not supported.

Inheritance

Not supported.

Foreign Data Wrappers

Foreign data wrappers are not supported in Redshift.

Though, it has external tables to use data in some specific formats stored in Amazon S3. External tables are populated from the results of a query. The UNLOAD statement can be used to export data to Apache Parquet files in S3.

Transactions

Transactions have the following important limitations:

But again, keep in mind that Redshift is designed for data warehousing. It allows to:

  • Load data while users run queries;
  • Load data concurrently into different tables;
  • Alter a dimension structure and its data inside the same transaction.

SQL

The following statements (between others) are missing:

  • MERGE;
  • INSERT ... ON CONFLICT;
  • COPY has less options and more specific semantics;
  • RETURNING;
  • CHECKPOINT;
  • SECURITY LABEL;
  • LISTEN;
  • VALUES.

Stored procedures and functions

  • Stored procedures can only be written in PL/plSQL or Python. Other procedural languages are not supported.
  • Stored functions are very limited in Redshift:
    • Functions can only be a single SELECT or a Python script. Other languages, including PL/plSQL, are not supported.
    • Functions can’t read data from tables (they’re SELECTs without a FROM clause).
    • Functions can only use positional arguments. There are no named arguments.

There are several other minor limitations for both procedures and functions. For example VARIADIC arguments are not supported, and there is no COMMENT clause.

Security features

Missing security features:

Conclusions

I used Redshift in the past and I will probably use it again. Let’s be clear: as a general rule, I prefer to use open source technologies, and in particular I don’t like to depend on a single vendor. But many organisations are already depending on Amazon, and don’t have any plans to change that.

Redshift is not easy because it’s in the cloud. It’s not something easy to switch to because it’s just good ol’ Postgres. You will need to learn things like how keys work and compression. Its costs are also potentially high (something not covered in this article).

All that said, would we recommend it? Sometimes we would. The main reason for that is integration with other Amazon services, where lock-ins are considered acceptable.

Federico Razzoli

Did you like this article?

Image credit

All content in this blog is distributed under the CreativeCommons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please refer to Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/

About Federico Razzoli
Federico Razzoli is a database professional, with a preference for open source databases, who has been working with DBMSs since year 2000. In the past 20+ years, he served in a number of companies as a DBA, Database Engineer, Database Consultant and Software Developer. In 2016, Federico summarized his extensive experience with MariaDB in the “Mastering MariaDB” book published by Packt. Being an experienced database events speaker, Federico speaks at professional conferences and meetups and conducts database trainings. He is also a supporter and advocate of open source software. As the Director of Vettabase, Federico does business worldwide but prefers to do it from Scotland where he lives.

Recent Posts

First steps with pgBackRest, a backup solution for PostgreSQL

First steps with pgBackRest, a backup solution for PostgreSQL

pgBackRest is an open source backup tool that creates physical backups with some improvements compared to the classic pg_basebackup tool. pg_basebackup is included in the PostgreSQL binaries, and it offers a great set of features for hot binary backups, remote...

Hints to optimise queries with a LIKE comparison

Hints to optimise queries with a LIKE comparison

In SQL, using the LIKE operator is a powerful way to find strings that match a certain pattern. It's suitable for most use cases, thanks to its two wildcard characters: _ means any one character. % means any sequence of zero or more characters. However, many queries...

Services

Need Help?  Click Here for Expert Support

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *