Optimizing Redshift SQL Queries Via Query Plan Estimates

SQL queries that generate reports spanning several days of data can take a non-trivial amount of time to run. While it is tempting to simply throw more hardware at the problem, doing so does little to address the underlying problem of inefficient queries.

Inefficient queries are often just precursors to their final, production-ready counterparts, much like in software development, where the first version is often scrappy and unoptimized. Once the business logic is nailed down, the code can be optimized by swapping programming constructs for leaner ones, removing redundant logic, abstracting common code into a parent class, and so on.

Thankfully, in most database systems we can inspect the query execution plan, either with the EXPLAIN and EXPLAIN ANALYZE commands or with a query profiler.

In this post, I will explain how I optimized a large and complex Amazon Redshift query using its execution plan, obtained via the EXPLAIN command. The optimization steps that I took reduced my query’s average execution time by 61%. Do note that your mileage may vary, depending on the sort keys, views, data types, distribution keys, and so on in your data warehouse.

The actual query has been generalized and greatly simplified in this post. 


Overview

A while back I developed a complex Amazon Redshift query that allowed querying a large table across multiple days. As expected, execution time increased proportionally with the number of days, but I was determined to optimize the query before increasing the amount of hardware resources it was allocated.

To give a general overview of the query, it involved the following steps:

    • Left join TABLE_A with TABLE_B and TABLE_C
      • TABLE_A = the main table
      • TABLE_B = records to exclude from TABLE_A
      • TABLE_C = unused (from legacy logic)
    • Filter TABLE_A rows based on the integer and string values of several important columns (converting NULLs to empty strings)
    • Select a subset of columns from TABLE_A and TABLE_B (including some unnecessary ones) to be returned

Before we do any optimization, let's look at the query plan with estimated costs. Normally, I would first run ANALYZE to refresh the table statistics, and then use EXPLAIN ANALYZE (where available) to get the execution plan together with actual execution statistics such as run times and row counts. However, I was using an account with limited permissions and did not have the privilege to update table statistics, so I could only use the EXPLAIN command to obtain estimated query plans.
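
For reference, getting the estimated plan is as simple as prefixing the query with EXPLAIN. A minimal sketch, reusing TABLE_A and Column_1 from the overview above as stand-ins for the real query:

-- With sufficient privileges, refresh statistics first so the estimates are accurate.
ANALYZE TABLE_A;

-- Prefix the query with EXPLAIN to get the estimated plan without executing it.
EXPLAIN
SELECT Column_1, COUNT(*)
  FROM TABLE_A
 GROUP BY Column_1;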

Here are the key parts of the original execution plan to benchmark any improvements against:

  • Sequential Scan on TABLE_A with row filtering (i.e. WHERE clauses):
    • XN Seq Scan on TABLE_A (cost=0.00..786825920.92 rows=1 width=280)
  • Sequential Scan on TABLE_B
    • XN Seq Scan on TABLE_B (cost=0.00..17667.88 rows=1766788 width=17)
  • Sequential Scan on TABLE_C
    • XN Seq Scan on TABLE_C (cost=0.00..22312.14 rows=2231214 width=28)
  • TABLE_A LEFT JOIN TABLE_C
    • XN Hash Right Join DS_DIST_OUTER (cost=786825920.92..180999361907.60 rows=1 width=682)
  • Using result set from previous line, LEFT JOIN with TABLE_B
    • XN Hash Right Join DS_DIST_OUTER (cost=180999361907.60..229998349578.89 rows=176679 width=714)
  • Final Cost
    • XN HashAggregate (cost=229998358412.84..229998359612.84 rows=40000 width=714)

You can find out what these values mean in the Redshift documentation: https://docs.aws.amazon.com/redshift/latest/dg/c-the-query-plan.html

Optimization Steps

Step 1: Discarding the Unnecessary Columns / Tables

Before even looking at the query plan, we can already spot some quick wins, namely:

  • remove the unnecessary LEFT JOIN on TABLE_C
  • do not SELECT any unnecessary columns (a before/after sketch of both changes follows this list)
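
Here is a hypothetical sketch of this step; the join conditions and the extra column names are assumptions for illustration, not the actual query:

-- Before: joins TABLE_C and selects columns that are never used downstream.
SELECT  a.relevant_column,
        a.unnecessary_column,   -- not needed by the report
        b.Column_2,
        c.legacy_column         -- not needed by the report
  FROM  TABLE_A a
  LEFT  JOIN TABLE_B b ON a.Column_1 = b.Column_1
  LEFT  JOIN TABLE_C c ON a.Column_1 = c.Column_1;

-- After: the unused join and columns are dropped.
SELECT  a.relevant_column,
        b.Column_2
  FROM  TABLE_A a
  LEFT  JOIN TABLE_B b ON a.Column_1 = b.Column_1;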

After removing the unnecessary join and columns, the estimated costs look like this:

  • Sequential Scan on TABLE_A with row filtering:
    • XN Seq Scan on TABLE_A (cost=0.00..786825920.92 rows=1 width=280)
  • Sequential Scan on TABLE_B
    • XN Seq Scan on TABLE_B (cost=0.00..17667.88 rows=1766788 width=17)
  • TABLE_A LEFT JOIN TABLE_B (improvement)
    • XN Hash Right Join DS_DIST_OUTER (cost=180999361907.60..229998349578.89 rows=176679 width=666)
  • Final Cost (improvement)
    • XN HashAggregate (cost=229998356204.36..229998357304.36 rows=40000 width=666)

Step 2: JOIN Bottlenecks in Query Execution Plan

When I analyzed the new execution plan, I noticed the following:

  • A LEFT JOIN between TABLE_A and TABLE_B on Column_1, done only so that rows can be filtered out in the WHERE clause based on the presence of a TABLE_B.Column_2 value (i.e. not null)
  • To illustrate the concept, here is a similar and simplified query:
    SELECT  TABLE_A.[relevant_columns]
      FROM  TABLE_A LEFT JOIN
              (SELECT Column_1, 'invalid' AS Column_2 FROM TABLE_B) AS TABLE_B
        ON  TABLE_A.Column_1 = TABLE_B.Column_1
     WHERE  nvl(TABLE_B.Column_2, '') != 'invalid'

JOINs are expensive operations, and they expand the column width of the result set. Furthermore, this approach is rather inefficient: we perform an entire JOIN against a table just so that we can remove the matching rows later on – an awful waste of compute cycles!

Thus the next step was smarter row filtering: instead of creating Column_2 just for the sake of ‘marking’ rows, we can simplify by selecting just Column_1 in a Common Table Expression (CTE) and excluding those values in the main query:

WITH records_to_remove AS (
  SELECT Column_1 FROM TABLE_B
),
only_relevant_columns AS (
  SELECT  [relevant_columns]
    FROM  TABLE_A
   WHERE  [other_conditions]
)

SELECT *
  FROM only_relevant_columns
 WHERE Column_1 NOT IN
         (SELECT Column_1 FROM records_to_remove)

Note: In this simplified example, I could have used a plain subquery instead. However, in the actual query this CTE was reused elsewhere.

The improvements here were removing a JOIN operation, as well as the nvl() function call that would have been applied to every row in the result set. The new execution plan looks like this:

  • Sequential Scan on TABLE_A with row filtering (improvement)
    • XN Seq Scan on TABLE_A (cost=0.00..710681476.96 rows=1 width=232)
  • Sequential Scan on TABLE_B
    • XN Seq Scan on TABLE_B (cost=0.00..17667.88 rows=1766788 width=17)
  • Filtering TABLE_A against TABLE_B, now via NOT IN rather than a LEFT JOIN (improvement)
    • XN Hash Join DS_DIST_BOTH (cost=0.02..710719877.03 rows=1 width=236)
  • Final Cost (improvement)
    • XN HashAggregate (cost=710764046.76..710764046.80 rows=1 width=236)

Definitely a big improvement, and we can do better!

Step 3: Smarter Column Value Filtering

From the execution plan above, most of the estimated cost comes from the sequential scan of TABLE_A, due to the complicated row filtering (i.e. WHERE clauses). I tried several combinations, and the following worked for me:

  • Look at table DDL to determine if certain checks are irrelevant (e.g. checking if column value is not null when the column does not allow for nulls)
  • Collapse multiple conditions on the same column into one condition using POSIX operators
  • Remove unnecessary functions

To illustrate, the following example WHERE conditions can be simplified:

  • From:
    WHERE Column_5 IS NOT NULL
      AND UPPER(Column_5) NOT LIKE '%APPLE%'
      AND UPPER(Column_5) NOT LIKE '%BANANA%'
      AND UPPER(Column_5) NOT LIKE '%COFFEE%'
  • To:
    WHERE Column_5 !~* '.*(APPLE|BANANA|COFFEE).*'
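
The first and third points can be sketched in a similar way. Assuming (hypothetically) that the table DDL declares Column_7 as NOT NULL, both the runtime null check and the nvl() wrapper become redundant:

  • From:
    WHERE Column_7 IS NOT NULL
      AND nvl(Column_7, '') != 'obsolete'
  • To:
    WHERE Column_7 != 'obsolete'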

After “refactoring” the WHERE clauses in the query, my new estimated costs look like this:

  • Sequential Scan on TABLE_A with row filtering (improvement)
    • XN Seq Scan on TABLE_A (cost=0.00..474117753.57 rows=1 width=220)
  • Sequential Scan on TABLE_B
    • XN Seq Scan on TABLE_B (cost=0.00..17667.88 rows=1766788 width=17)
  • Filtering TABLE_A against TABLE_B, now via NOT IN rather than a LEFT JOIN (improvement)
    • XN Hash Join DS_DIST_BOTH (cost=474154553.67..474154553.70 rows=1 width=114)
  • Final Cost (improvement)
    • XN Aggregate (cost=474234779.75..474234779.75 rows=1 width=0)

Step 4: Not All Optimizations Are Good – Run Actual Query to Verify Cost Estimates

Alternatively, use EXPLAIN ANALYZE where your database and permissions allow it.

It is good practice to run the actual query (without the EXPLAIN) to verify that the cost estimates translate into real improvements. I found out that one of the “optimizations” I made actually increased the execution time.

Thinking that POSIX operators were more efficient, I converted one of the large NOT IN clauses to use them instead.

From:

  • WHERE Column_6 NOT IN ('Apple', 'Banana', 'Coffee', [many_more_values])

To:

  • WHERE Column_6 !~* '(Apple|Banana|Coffee|[many_more_values])'

The query plan indicated that this approach gave a significant overall improvement, with an estimated final cost roughly 50% lower than in Step 2:

  • Final Cost
    • XN HashAggregate (cost=349466949.86..349466949.86 rows=1 width=0) 

However, when I actually ran the query, it was 20% slower. My hunch is that the large set of values actually made the POSIX comparison slower than the “NOT IN” approach.
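
To measure actual run times, one option is to read them back from Redshift's system tables after running the query. A small sketch, assuming read access to the STL_QUERY system table:

-- List the most recent queries with their actual elapsed times.
SELECT query,
       DATEDIFF(ms, starttime, endtime) AS elapsed_ms,
       TRIM(querytxt) AS query_text
  FROM stl_query
 ORDER BY starttime DESC
 LIMIT 20;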

Bonus Step: Using Common Table Expressions (CTE) to Avoid Duplicate Sequential Scans

After my improvements to the main query, I analyzed the other CTEs and found that I had a severely inefficient query that looked similar to the following:

SELECT DISTINCT col_alias, Column_2
  FROM (SELECT DISTINCT Column_5 AS col_alias, Column_2
          FROM TABLE_Z
         WHERE TABLE_Z.Column_1 = 'value'
         UNION
        SELECT DISTINCT Column_8 AS col_alias, Column_2
          FROM TABLE_Z
         WHERE TABLE_Z.Column_1 = 'value') AS unioned
 GROUP BY col_alias, Column_2

where TABLE_Z contained a non-trivial number of rows. The inefficiencies I found via the execution plan were:

  • Outer DISTINCT was redundant as the UNION operator already enforces uniqueness in the result set
  • Two sequential table scans were being done on TABLE_Z to check for Column_1 = 'value'

To remove the redundant sequential scan, I made a CTE that performed the WHERE clause filtering, and used it in both halves of the UNION. In addition, I removed the outer SELECT DISTINCT and GROUP BY, as they were not needed.

The query now looks like this:

WITH filtered_table_z AS (
  SELECT  Column_2, Column_5, Column_8
    FROM  TABLE_Z
   WHERE  Column_1 = 'value'
)
SELECT  DISTINCT Column_5 AS col_alias, Column_2
  FROM  filtered_table_z
 UNION
SELECT  DISTINCT Column_8 AS col_alias, Column_2
  FROM  filtered_table_z

The new execution plan showed that only one sequential scan would be done. I verified that the execution time was indeed faster, and that the number of rows in the result set remained the same.
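
For reference, the row-count check can be as simple as counting both versions; a sketch based on the queries above:

-- Old version: count the rows produced by the original query.
SELECT COUNT(*) AS old_row_count
  FROM (SELECT DISTINCT col_alias, Column_2
          FROM (SELECT DISTINCT Column_5 AS col_alias, Column_2
                  FROM TABLE_Z WHERE Column_1 = 'value'
                 UNION
                SELECT DISTINCT Column_8 AS col_alias, Column_2
                  FROM TABLE_Z WHERE Column_1 = 'value') AS unioned
         GROUP BY col_alias, Column_2) AS old_version;

-- New version: count the rows produced by the rewritten query.
WITH filtered_table_z AS (
  SELECT Column_2, Column_5, Column_8
    FROM TABLE_Z
   WHERE Column_1 = 'value'
)
SELECT COUNT(*) AS new_row_count
  FROM (SELECT DISTINCT Column_5 AS col_alias, Column_2 FROM filtered_table_z
         UNION
        SELECT DISTINCT Column_8 AS col_alias, Column_2 FROM filtered_table_z) AS new_version;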

Summary

By analyzing the query execution plan (obtained via the EXPLAIN command), I was able to identify bottlenecks and optimize my complex Redshift query. The estimated cost dropped from 229998359612.84 units to 474234779.75 units, and the average execution time fell by 61%.

I chose to stop here following the Pareto principle (80/20 rule).

Are there more areas to optimize further? Yes, and here are some possible ideas:

  • Assuming we could change the table DDL, we could change the distribution keys to improve the JOIN operation (identified by the DS_DIST_BOTH keyword); a DDL sketch follows this list
  • Assuming we had a more permissive account, we could create pre-filtered views that no longer require expensive sequential scans for row filtering, since every row in the view would already be relevant to the query
  • Assuming we could edit the ingestion scripts, we could apply some business rules there (e.g. UPPER(), nvl2(), COALESCE()) to reduce the number of operations that queries need to perform at read time
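
For the first idea, here is a hedged DDL sketch; the table and column names reuse the placeholders from this post, and the real DDL would of course depend on the actual schema. Distributing both sides of the join on the join column lets Redshift perform the join without redistributing rows (DS_DIST_NONE instead of DS_DIST_BOTH):

-- Recreate TABLE_A distributed on the join column (the same idea applies to TABLE_B).
CREATE TABLE table_a_dist
DISTSTYLE KEY
DISTKEY (Column_1)
AS
SELECT * FROM TABLE_A;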

Disclaimer: Some options may not actually be feasible as a data warehouse is typically designed to serve an entire organization and/or multiple teams. My query is not the only “customer” of the data warehouse, and narrowly optimizing the tables to benefit it may be detrimental for other queries.

Hope this helps anyone who is looking for tips to optimize their queries!
