SQL Lateral Joins

Drew Landis

Recently, I heard about a relatively new (as of 9.3) feature in Postgres called a LATERAL JOIN. A LATERAL JOIN enables a subquery in the from part of a clause to reference columns from preceding items in the from list (quoted from here). Without LATERAL, each subquery is evaluated independently and so cannot cross-reference any other from item (quoted from here).

I’ve been looking for a good way to use this feature in our internal code and I finally found one. In our specific instance, using a LATERAL JOIN sped up a query by an order of magnitude! However, our example was relatively complex and specific to us, so here is a generic (very contrived) example.

Setup:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
-- Create a table with 10 million rows of 6 character random strings of numbers
-- takes about 30 sec to create
CREATE TABLE rand_table AS
SELECT
id,
LPAD(FLOOR(RANDOM()*1000000)::text, 6, '0') AS rand_string
FROM GENERATE_SERIES(1,10000000) id;
-- Create table with 999,999 rows of 6 character strings of numbers (from 1 to 999,999)
-- takes about 1 sec to create
CREATE TABLE series_table AS
SELECT
id,
LPAD(id::text, 6, '0') AS series
FROM GENERATE_SERIES(1,999999) id
-- Vacuum analyze both tables
VACUUM ANALYZE rand_table;
VACUUM ANALYZE series_table;

Test:

Let’s count how many instances of '010170' there are in rand_table. Then we’ll LEFT JOIN that to the series_table. Like, I said, super contrived…

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
SELECT
st.id AS series_id,
st.series,
rt.count
FROM series_table st
-- Fast query (using a LATERAL JOIN) ~2 seconds
/*
LEFT JOIN LATERAL (
SELECT
rand_string,
COUNT(rand_string)
FROM rand_table
WHERE rand_string = st.series -- this is the lateral magic!
GROUP BY rand_string
) rt ON st.series = rt.rand_string
*/
-- Slow query (using a regular JOIN) ~10 seconds
/*
LEFT JOIN (
SELECT
rand_string,
COUNT(rand_string)
FROM rand_table
GROUP BY rand_string
) rt ON st.series = rt.rand_string
*/
WHERE st.id = 10170

Clean Up:

1
2
DROP TABLE rand_table;
DROP TABLE series_table;

Results:

The LATERAL JOIN returns results in about 2 seconds, while the regular JOIN takes about 10 seconds.

In the slow (LEFT JOIN) query, the subquery is forced to return all data, and then JOIN on that entire result set. It takes a long time to grab and count the entire result set of the subquery, which considerably increases the overall query time. In the fast (LEFT JOIN LATERAL) query, the subquery is able to reference the st.series column from a preceding item, and pare down the subquery result set to only include data that will ultimately be JOINed upon.

As stated, this example is super contrived and there are definitely other ways to rewrite and improve it, but hopefully this will give you the gist of how a LATERAL JOIN should look and function. Also note, this query speed only improved by about 5 times, however for our internal query, we were able to improve query time by an entire order of magnitude. Depending where you use LATERAL JOINs, some queries will improve more than others.

Hope that helps!

Update (July 15, 2016):

I updated the above test so that rand_table has 100 million rows of 7 character numeric strings, and series_table has 9,999,999 rows of 7 character numeric strings, then re-ran the test. This time, the LATERAL JOIN finished in about 44 seconds and the regular JOIN finished in about 20 minutes and 45 seconds (–> 1,245 seconds). This means that the LATERAL JOIN completed the query 28 times faster than the regular JOIN!