🐷 ✊🏽 💝 PostgreSQL Antipatterns: Unique IDs 👸🏻 🏪 🚵🏼

Quite often, a developer needs to generate some unique identifiers for records in the PostgreSQL table - both when inserting records and when reading them.

Counters table

It would seem - what is easier? We set up a separate plate, in it - an entry with a counter. We need to get a new identifier - read from there to write a new value - do it UPDATE...

Do n't do that ! Because tomorrow you will have to solve problems:

persistent overlapping locks when UPDATE

see PostgreSQL Antipatterns: fighting hordes of "dead"
gradual degradation of the speed of access to the data of the counter table

see PostgreSQL Antipatterns: updating a large table under load
... and the need to clean it up with active transactions that will bother you,

see DBA: when VACUUM passes, we clean the table manually

SEQUENCE object

For such tasks, PostgreSQL provides a separate entity - SEQUENCE. It is nontransactional, that is, it does not cause locks , but two "parallel" transactions will certainly receive different values .

To get the next ID from a sequence, just use the function nextval:

SELECT nextval('seq_name'::regclass);

Sometimes you need to get several IDs at once - for streaming recording via COPY, for example. Using for this setval(currval() + N)is fundamentally wrong ! For the simple reason that between calls to the "inner" ( currval) and "outer" ( setval) functions, a concurrent transaction could change the current value of the sequence. The correct way is to call the nextvalrequired number of times:

SELECT
  nextval('seq_name'::regclass)
FROM
  generate_series(1, N);

Serial pseudo

It is not very convenient to work with sequences in "manual" mode. But our typical task is to ensure the insertion of a new record with a new sequence-ID! Especially for this purpose, PostgreSQL is invented serial, which, when generating a table, "expands" into something like . There is no need to remember the name of the automatically generated sequence linked to the field, there is a function for this . The same function can be used in your own substitutions - for example, if there is a need to make a common sequence for several tables at once. However, since working with the sequence is nontransactional, if the identifier from it was received by a rollbacked transaction, then the sequence of IDs in the saved table records will be "leaky"id integer NOT NULL DEFAULT nextval('tbl_id_seq')

pg_get_serial_sequence(table_name, column_name)DEFAULT

...

GENERATED columns

Starting with PostgreSQL 10 , it is possible to declare an identity column ( GENERATED AS IDENTITY) that conforms to the SQL: 2003 standard. In the variant, the GENERATED BY DEFAULTbehavior is equivalent serial, but with GENERATED ALWAYSeverything more interesting:

CREATE TABLE tbl(
  id
    integer
      GENERATED ALWAYS AS IDENTITY
);

INSERT INTO tbl(id) VALUES(DEFAULT);
--   :     10 .
INSERT INTO tbl(id) VALUES(1);
-- ERROR:  cannot insert into column "id"
-- DETAIL:  Column "id" is an identity column defined as GENERATED ALWAYS.
-- HINT:  Use OVERRIDING SYSTEM VALUE to override.

Yes, in order to insert a specific value "across" such a column, you will have to make extra efforts with OVERRIDING SYSTEM VALUE:

INSERT INTO tbl(id) OVERRIDING SYSTEM VALUE VALUES(1);
--   :     11 .

Note that now we have two identical values in the table id = 1- that is, GENERATED does not impose additional UNIQUE conditions and indices , but is purely a declaration, as well as serial.

In general, on modern PostgreSQL versions the use of serial is deprecated, with the preferred replacement for GENERATED. Except, perhaps, the situation of support for cross-version applications working with PGs below 10.

Generated UUID

Everything is fine as long as you work within one database instance. But when there are several of them, there is no adequate way to synchronize the sequences (however, this does not prevent you from “inadequately” synchronizing them , if you really want to). This is where the type UUIDand functions for generating values for it come to the rescue . I usually use it uuid_generate_v4()as the most "casual" one.

Hidden system fields

tableoid / ctid

Sometimes, when fetching records from a table, you need to somehow address a specific "physical" record, or find out from which particular section a particular record was obtained when accessing the "parent" table using inheritance .

In this case, the hidden system fields present in each record will help us :

tableoidstores the oid-id of the table - that is, tableoid::regclass::textgives the name of a particular table-section
ctid - "physical" address of the record in the format (<>,<>)

For example, ctidit can be used for operations with a table without a primary key , but tableoidfor the implementation of certain kinds of foreign keys.

oid

Up to 11 PostgreSQL was possible to declare when you create the attribute table WITH OIDS:

CREATE TABLE tbl(id serial) WITH OIDS;

Each entry in this table gets an additional hidden field oidwith a globally unique value within the database - as it was organized for system tables like pg_class, pg_namespace...

When you insert a record in a table generated value is returned immediately to the result of the query:

INSERT INTO tbl(id) VALUES(DEFAULT);

  :   OID 16400   11 .

Such a field is invisible for a "normal" table query:

SELECT * FROM tbl;

id
--
 1

It, like other system fields, must be requested explicitly:

SELECT tableoid, ctid, xmin, xmax, cmin, cmax, oid, * FROM tbl;

tableoid | ctid  | xmin | xmax | cmin | cmax | oid   | id
---------------------------------------------------------
   16596 | (0,1) |  572 |    0 |    0 |    0 | 16400 |  1

True, the value oidis only 32 bits , so it is very easy to get an overflow, after which oidit will not even be possible to create any table (it needs a new one !). Therefore, since PostgreSQL 12, it is WITH OIDSno longer supported .

"Fair" time clock_timestamp

Sometimes, during a long execution of a query or procedure, you want to bind the "current" time to the record. Failure awaits anyone who tries to use the function to do this now()- it will return the same value throughout the entire transaction .

To get the "right now" time, there is a function clock_timestamp()(and another bunch of its brothers). The difference in behavior of these functions can be seen on the example of a simple query:

SELECT
  now()
, clock_timestamp()
FROM
  generate_series(1, 4);

              now              |        clock_timestamp
-------------------------------+-------------------------------
 2020-08-19 16:26:05.626629+03 | 2020-08-19 16:26:05.626758+03
 2020-08-19 16:26:05.626629+03 | 2020-08-19 16:26:05.626763+03
 2020-08-19 16:26:05.626629+03 | 2020-08-19 16:26:05.626764+03
 2020-08-19 16:26:05.626629+03 | 2020-08-19 16:26:05.626765+03

PostgreSQL Antipatterns: Unique IDs