Generating URL-Safe Short IDs in Postgresql

postgres elephant words

In Postgresql, generally an incrementing primary key is used, which SERIAL is an excellent helper type for. Sometimes though, an incrementing ID exposed in an API or URLs reveals too much about the service, like the number of users you have (see the German Tank Problem).

You could use UUIDs for this, but they're long and obnoxious if you're exposing them to a user via URLs. Instead, we want a short string, say 8 characters in length, that is a cryptographically-secure, random value. We need a cryptographically-secure generator because otherwise someone could guess the generator algorithm, seed the previous value, and we'd again be vulnerable to the German Tank Problem.

Postgresql comes with an extension called pgcrypto that has a generate_random_bytes function. We'll use this to generate 6 bytes, giving us 2^48 possible keys, and then encode them in URL-safe Base64. The Birthday Problem probability table says that we can expect a 1% chance of having had a collision once we've generated 240,000 IDs, 25% after 1,300,000 IDs, and 50% at 20,000,000 IDs. So even after 20,000,000 IDs generated, we may only have had 1 collision. We'll use a LOOP just in case we hit a collision.

First, let's make sure pgcrypto is loaded.

CREATE EXTENSION IF NOT EXISTS "pgcrypto";

Then, we'll write a Trigger Procedure which generates a random key and ensures that it is unique to a table.

-- Create a trigger function that takes no arguments.
-- Trigger functions automatically have OLD, NEW records
-- and TG_TABLE_NAME as well as others.
CREATE OR REPLACE FUNCTION unique_short_id()
RETURNS TRIGGER AS $$

 -- Declare the variables we'll be using.
DECLARE
  key TEXT;
  qry TEXT;
  found TEXT;
BEGIN

  -- generate the first part of a query as a string with safely
  -- escaped table name, using || to concat the parts
  qry := 'SELECT id FROM ' || quote_ident(TG_TABLE_NAME) || ' WHERE id=';

  -- This loop will probably only run once per call until we've generated
  -- millions of ids.
  LOOP

    -- Generate our string bytes and re-encode as a base64 string.
    key := encode(gen_random_bytes(6), 'base64');

    -- Base64 encoding contains 2 URL unsafe characters by default.
    -- The URL-safe version has these replacements.
    key := replace(key, '/', '_'); -- url safe replacement
    key := replace(key, '+', '-'); -- url safe replacement

    -- Concat the generated key (safely quoted) with the generated query
    -- and run it.
    -- SELECT id FROM "test" WHERE id='blahblah' INTO found
    -- Now "found" will be the duplicated id or NULL.
    EXECUTE qry || quote_literal(key) INTO found;

    -- Check to see if found is NULL.
    -- If we checked to see if found = NULL it would always be FALSE
    -- because (NULL = NULL) is always FALSE.
    IF found IS NULL THEN

      -- If we didn't find a collision then leave the LOOP.
      EXIT;
    END IF;

    -- We haven't EXITed yet, so return to the top of the LOOP
    -- and try again.
  END LOOP;

  -- NEW and OLD are available in TRIGGER PROCEDURES.
  -- NEW is the mutated row that will actually be INSERTed.
  -- We're replacing id, regardless of what it was before
  -- with our key variable.
  NEW.id = key;

  -- The RECORD returned here is what will actually be INSERTed,
  -- or what the next trigger will get if there is one.
  RETURN NEW;
END;
$$ language 'plpgsql';

Replace id in the select, and NEW.id with whatever your PRIMARY KEY is called.

Now let's hook this Trigger up to a Table to execute whenever there is an INSERT.

CREATE TABLE test (id TEXT PRIMARY KEY, name TEXT);

-- We name the trigger "trigger_test_genid" so that we can remove
-- or replace it later.
-- If an INSERT contains multiple RECORDs, each one will call
-- unique_short_id individually.
CREATE TRIGGER trigger_test_genid BEFORE INSERT ON test FOR EACH ROW EXECUTE PROCEDURE unique_short_id();

You'll note that we want unique_short_id to trigger before the INSERT is finalized, so that we can replace the PRIMARY KEY.

Now, whenever we INSERT, our Trigger Procedure will run and generate a unique ID.

INSERT INTO test (name) VALUES ('cheese'), ('ham'), ('turkey'), ('chicken');
SELECT * FROM test;
    id    |  name
----------+---------
 Ixw1yIj7 | cheese
 SXq0jZ-q | ham
 KKWXEtBu | turkey
 DRRXFs1U | chicken
(4 rows)

We've got short, URL-safe, unique IDs that we can expose in APIs and URLs that don't reveal any information and aren't cumbersome or ugly.


Update: @lancestout points out that there is a chance of producing swear words. You could search a blacklist of words allowing for mutations, but maybe it'd be easier to encode it as hex instead of base64 with: encode(gen_random_bytes(6), 'hex'). The downside of using hex is that encoding 6 bytes takes around 12 characters.


If you'd like assistance in your web applications and APIs, let us know!

You might also enjoy reading:

Blog Archives: