Redshift is an Amazon database technology available for AWS customers. It is based on PostgreSQL 8. The codebase was modified to develop a horizontally scalable SQL DBMS, suitable for analytics, and in particular Data WareHouse.
Many PostgreSQL features are not available in Redshift, particularly those that are not suitable for a database cluster or for analytics, and those that Amazon didn’t consider important enough to spend the necessary resources to maintain them.
This article shows an incomplete list of PostgreSQL features that are missing in Redshift. It is mainly written for those who are familiar with PostgreSQL and are evaluating Redshift for analytics.
Main missing features
Let me stress it again: this is not meant to be a complete list.
Redshift is not open source, nor source available.
PostgreSQL is open source, but its license does not disallow Amazon from using its codebase to build a cloud database.
One great thing about PostgreSQL is that it is not developer or owned by a company. Instead, many organisations and individuals contribute to improve it and have interest in keeping it alive.
Redshift’s future appears solid because its owner appears solid. Still, it depends on a single company. As a corner case, should Amazon’s plans change for any reason, Redshift may even be retired at some point.
While Redshift has a good throughput, the latency is far too high and unpredictable for OLTP workloads. Redshift is only intended for analytics.
Redshift stores data in a columnar fashion. As a consequence, selecting all columns (
SELECT *) or many columns causes performance penalties.
Redshift does not support the concept of extensions.
Redshift does not support certain types, like:
Redshift allows a
DISTKEY and a
SORTKEY for each table (very similar to Cassandra primary keys), but there are no secondary indexes.
No index type can be specified.
SELECT DISTINCT pkwill return the same results as
- If there is a primary key or a
COUNT(DISTINCT *)will return the same number as
ASSERTIONs and triggers are not supported.
Foreign Data Wrappers
Foreign data wrappers are not supported in Redshift.
Transactions have the following important limitations:
- The only supported isolation level is
SAVEPOINTs are not supported;
VACUUMimplicitly commit the current transaction.
But again, keep in mind that Redshift is designed for data warehousing. It allows to:
- Load data while users run queries;
- Load data concurrently into different tables;
- Alter a dimension structure and its data inside the same transaction.
The following statements (between others) are missing:
COPYhas less options and more specific semantics;
Stored procedures, functions and triggers can only be written in PL/plSQL. Other procedural languages are not supported.
I used Redshift in the past and I will probably use it again. Let’s be clear: as a general rule, I prefer to use open source technologies, and in particular I don’t like to depend on a single vendor. But many organisations are already depending on Amazon, and don’t have any plans to change that.
Redshift is not easy because it’s in the cloud. It’s not something easy to switch to because it’s just good ol’ Postgres. You will need to learn things like how keys work and compression. Its costs are also potentially high (something not covered in this article).
All that said, would we recommend it? Sometimes we would. The main reason for that is integration with other Amazon services, where lock-ins are considered acceptable.