Resizing a Numeric Column in a PostgreSQL Table without Changing Data

Drew Landis

While using PostgreSQL, you may find yourself in a situation where a column's data type is now too small and its length needs to be increased. Updating a column type in PostgreSQL can, at times, be nothing short of painful.

Let’s say you have a column of type varchar(25). When you first created the column, you decided there was absolutely no way you could ever need more than 25 characters in it. Fast forward months or years, and you now realize that column requires 40 characters. This would be fine, except that (A) the table is huge, which means the command could take a significant amount of time to finish, and (B) there are views and rules that depend on that column, which cause errors when you try the standard ALTER TABLE my_table ALTER COLUMN my_column TYPE varchar(40);. There is no good solution to (A) other than waiting, which may or may not be acceptable given your business needs. The solution to (B) is painfully manual: you need to drop all dependent views and rules (plus constraints such as the primary key, etc.), make the column data type change, then recreate everything you dropped, as sketched below. This sucks.
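
For the record, the manual route looks roughly like this (a sketch with hypothetical object names; a real schema will usually have many more dependent objects):

-- Manual route (hypothetical object names): drop dependents, alter, recreate
BEGIN;
DROP VIEW my_dependent_view;
ALTER TABLE my_table ALTER COLUMN my_column TYPE varchar(40);
CREATE VIEW my_dependent_view AS
SELECT my_column
FROM my_table;
COMMIT;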

Luckily, the folks over at sniptools.com solved this exact problem in this post. I won’t go into the details (you should look at the post directly), but suffice it to say, I have used their solution multiple times on a production database and it has worked amazingly well.
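
As I understand it, their trick boils down to a single catalog update: for a varchar(n) column, atttypmod is stored as n + 4, so widening to varchar(40) looks something like the following (a sketch with hypothetical table and column names; read the original post before trying it):

-- Widen my_table.my_column to varchar(40) without an ALTER (40 + 4 = 44)
UPDATE pg_attribute
SET atttypmod = 44
WHERE attrelid = 'my_table'::regclass
AND attname = 'my_column';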

Great. Problem solved. …Except that now we have the exact same problem with columns of type numeric(precision, scale). I have a column of type numeric(2,0) and I really need it to be numeric(4,0). I’m running into all of the same problems as the varchar issue above.

Thankfully, there is a very similar solution! To demonstrate this, let’s start by creating a fake table of varying numeric types:

CREATE TABLE my_numeric_test(
    numeric_one_zero numeric(1,0),
    numeric_one_one numeric(1,1),
    numeric_two_zero numeric(2,0),
    numeric_two_one numeric(2,1),
    numeric_three_zero numeric(3,0),
    numeric_three_one numeric(3,1),
    numeric_four_zero numeric(4,0),
    numeric_four_one numeric(4,1),
    numeric_fortyfive_fifteen numeric(45,15),
    numeric_sixhundredeightythree_threehundred numeric(683,300)
);

Next, inspect the atttypmod of the different columns:

SELECT atttypmod, attname
FROM pg_attribute
WHERE 1 = 1
AND attrelid = 'my_numeric_test'::regclass
AND attname LIKE ('numeric_%')
ORDER BY atttypmod;

Notice that there is a pattern here:
atttypmod = precision * 65,536 + scale + 4
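
You can verify that pattern by decoding precision and scale back out of atttypmod: subtract the 4, then precision sits in the high 16 bits and scale in the low 16 bits. A quick sanity check against the catalog:

-- Decode precision and scale back out of atttypmod to confirm the pattern
SELECT attname,
    atttypmod,
    ((atttypmod - 4) >> 16) & 65535 AS precision,
    (atttypmod - 4) & 65535 AS scale
FROM pg_attribute
WHERE attrelid = 'my_numeric_test'::regclass
AND attname LIKE ('numeric_%')
ORDER BY atttypmod;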

Let’s say we want to update column numeric_four_zero to have type numeric(9,0). A couple of tests:

INSERT INTO my_numeric_test (numeric_four_zero) VALUES (1234); -- works!
INSERT INTO my_numeric_test (numeric_four_zero) VALUES (123456789); -- ERROR: numeric field overflow

Using the formula from above, for numeric(9,0) we get atttypmod = 9 * 65,536 + 0 + 4 = 589,828. Here is how we can update the column type:

-- Regular way to update column type (don't use this for this example...)
--ALTER TABLE my_numeric_test ALTER COLUMN numeric_four_zero TYPE numeric(9,0);
-- Hack way to update column type
UPDATE pg_attribute
SET atttypmod = 589828
WHERE attrelid = 'my_numeric_test'::regclass
AND attname = 'numeric_four_zero';
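
If you would rather not do the multiplication by hand, you can let PostgreSQL evaluate the same formula inline; this is equivalent to the hard-coded 589828 above:

-- Same effect as SET atttypmod = 589828, with the formula spelled out
UPDATE pg_attribute
SET atttypmod = 9 * 65536 + 0 + 4 -- numeric(9,0)
WHERE attrelid = 'my_numeric_test'::regclass
AND attname = 'numeric_four_zero';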

We can run the same test as above and see that it works:

INSERT INTO my_numeric_test (numeric_four_zero) VALUES (123456789); -- works!

We can also select the column from the table and see that the column type has changed:

SELECT * FROM my_numeric_test;
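
If you would rather confirm the change from the catalogs than eyeball the result set, format_type will render the column's type, typmod included:

-- Should now report numeric(9,0) for the altered column
SELECT attname, format_type(atttypid, atttypmod) AS column_type
FROM pg_attribute
WHERE attrelid = 'my_numeric_test'::regclass
AND attname = 'numeric_four_zero';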

Finally, cleanup:

DROP TABLE my_numeric_test;

Warning: I’m not sure whether there are any side effects of doing this in your own database. I think it should work, but I make no guarantees, implicit or explicit, that it will not turn your database into a smoking, ruined heap.

Again, many thanks to this sniptools post. Without it, this solution would not have been possible.

Hope that helps!

Update (Sept 19, 2016):

I was too worried about potential side effects of using this hack and opted to not use it on a production environment. Instead, I dropped 80 views, updated about 65 column data types, and then recreated the 80 views. It required lots more work, but this way, I’m more confident in the final product. As stated before, if you do use this hack, do so at your own risk.

SQL Lateral Joins

Drew Landis

Recently, I heard about a relatively new (as of 9.3) feature in Postgres called a LATERAL JOIN. A LATERAL JOIN enables a subquery in the FROM clause to reference columns from preceding items in the FROM list (quoted from here). Without LATERAL, each subquery is evaluated independently and so cannot cross-reference any other FROM item (quoted from here).
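
Before the full (and contrived) test below, here is a tiny runnable illustration of the shape: the lateral subquery runs once per row of the preceding FROM item and can use that row's values.

-- Toy example: the lateral subquery can reference x from the preceding FROM item
SELECT x, y
FROM GENERATE_SERIES(1, 3) x
LEFT JOIN LATERAL (
    SELECT x * 10 AS y
) sub ON true;
-- Returns (1,10), (2,20), (3,30); without LATERAL, referencing x in the subquery is an error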

I’ve been looking for a good way to use this feature in our internal code and I finally found one. In our specific instance, using a LATERAL JOIN sped up a query by an order of magnitude! However, our example was relatively complex and specific to us, so here is a generic (very contrived) example.

Setup:

-- Create a table with 10 million rows of 6 character random strings of numbers
-- takes about 30 sec to create
CREATE TABLE rand_table AS
SELECT
    id,
    LPAD(FLOOR(RANDOM()*1000000)::text, 6, '0') AS rand_string
FROM GENERATE_SERIES(1,10000000) id;

-- Create table with 999,999 rows of 6 character strings of numbers (from 1 to 999,999)
-- takes about 1 sec to create
CREATE TABLE series_table AS
SELECT
    id,
    LPAD(id::text, 6, '0') AS series
FROM GENERATE_SERIES(1,999999) id;

-- Vacuum analyze both tables
VACUUM ANALYZE rand_table;
VACUUM ANALYZE series_table;

Test:

Let’s count how many instances of '010170' there are in rand_table. Then we’ll LEFT JOIN that to the series_table. Like I said, super contrived…

-- Uncomment exactly one of the two joins below before running
SELECT
    st.id AS series_id,
    st.series,
    rt.count
FROM series_table st
-- Fast query (using a LATERAL JOIN) ~2 seconds
/*
LEFT JOIN LATERAL (
    SELECT
        rand_string,
        COUNT(rand_string)
    FROM rand_table
    WHERE rand_string = st.series -- this is the lateral magic!
    GROUP BY rand_string
) rt ON st.series = rt.rand_string
*/
-- Slow query (using a regular JOIN) ~10 seconds
/*
LEFT JOIN (
    SELECT
        rand_string,
        COUNT(rand_string)
    FROM rand_table
    GROUP BY rand_string
) rt ON st.series = rt.rand_string
*/
WHERE st.id = 10170;

Clean Up:

DROP TABLE rand_table;
DROP TABLE series_table;

Results:

The LATERAL JOIN returns results in about 2 seconds, while the regular JOIN takes about 10 seconds.

In the slow (LEFT JOIN) query, the subquery is forced to return counts for every rand_string in rand_table, and the outer query then JOINs against that entire result set. Grabbing and counting the whole table takes a long time, which considerably increases the overall query time. In the fast (LEFT JOIN LATERAL) query, the subquery is able to reference the st.series column from a preceding FROM item and pare its result set down to only the data that will ultimately be JOINed upon.

As stated, this example is super contrived and there are definitely other ways to rewrite and improve it, but hopefully this gives you the gist of how a LATERAL JOIN should look and function. Also note, this query's speed only improved by about 5 times; however, for our internal query, we were able to improve query time by an entire order of magnitude. Depending on where you use LATERAL JOINs, some queries will improve more than others.
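
For example, and this is an assumption on my part rather than something I benchmarked for this post, an index on the join column should let the LATERAL variant satisfy each per-row lookup with an index scan instead of scanning rand_table:

-- Hypothetical improvement (not benchmarked here)
CREATE INDEX rand_table_rand_string_idx ON rand_table (rand_string);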

Hope that helps!

Update (July 15, 2016):

I updated the above test so that rand_table has 100 million rows of 7 character numeric strings, and series_table has 9,999,999 rows of 7 character numeric strings, then re-ran the test. This time, the LATERAL JOIN finished in about 44 seconds and the regular JOIN finished in about 20 minutes and 45 seconds (1,245 seconds). This means that the LATERAL JOIN completed the query roughly 28 times faster than the regular JOIN!
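
For reference, the larger setup for that re-run looked roughly like this (a sketch reconstructed from the description above, not the exact script I ran):

-- 100 million rows of 7 character random strings of numbers
CREATE TABLE rand_table AS
SELECT
    id,
    LPAD(FLOOR(RANDOM()*10000000)::text, 7, '0') AS rand_string
FROM GENERATE_SERIES(1,100000000) id;

-- 9,999,999 rows of 7 character strings of numbers
CREATE TABLE series_table AS
SELECT
    id,
    LPAD(id::text, 7, '0') AS series
FROM GENERATE_SERIES(1,9999999) id;

VACUUM ANALYZE rand_table;
VACUUM ANALYZE series_table;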