PostgreSQL Antipatterns: Unique IDs

Quite often, a developer needs to generate unique identifiers for records in a PostgreSQL table, both when inserting records and when reading them.





Counters table



It would seem, what could be easier? We create a separate table and put a row with a counter in it. When we need a new identifier, we read it from there, and to write the new value back we do an UPDATE...
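To make the pattern concrete, here is a minimal sketch of the counter-table approach just described. The table name counters and its columns are hypothetical, chosen only for illustration.

```sql
-- Hypothetical counter table: one row per counter (names are illustrative only)
CREATE TABLE counters(
  name  text PRIMARY KEY
, value bigint NOT NULL
);

INSERT INTO counters(name, value) VALUES('tbl_id', 0);

-- "Getting a new identifier": increment and read in one statement.
-- The updated row stays locked until the enclosing transaction ends,
-- so every concurrent caller has to wait at this point.
UPDATE
  counters
SET
  value = value + 1
WHERE
  name = 'tbl_id'
RETURNING
  value;
```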



Don't do that! Tomorrow you will have to deal with the problems it causes, starting with concurrent transactions waiting on the lock of the counter row.





SEQUENCE object



For such tasks, PostgreSQL provides a dedicated entity: SEQUENCE. It is nontransactional, that is, it does not cause locks, yet two "parallel" transactions are guaranteed to receive different values.



To get the next ID from a sequence, just use the function nextval:



SELECT nextval('seq_name'::regclass);
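The nontransactional behavior can be seen in a short sketch. It assumes that the sequence seq_name (the placeholder name used above) does not exist yet and that no other session touches it in the meantime.

```sql
CREATE SEQUENCE seq_name;

BEGIN;
SELECT nextval('seq_name'::regclass); -- 1
ROLLBACK;

-- The rollback does not return the value to the sequence
SELECT nextval('seq_name'::regclass); -- 2
```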


Sometimes you need to get several IDs at once, for example, for streaming writes via COPY. Using setval(currval() + N) for this is fundamentally wrong! The simple reason is that between the calls to the "inner" (currval) and "outer" (setval) functions, a concurrent transaction could change the current value of the sequence. The correct way is to call nextval the required number of times:



SELECT
  nextval('seq_name'::regclass)
FROM
  generate_series(1, N);
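As a concrete variant of the query above, the following sketch reserves three IDs and returns them as a single array. The value 3 and the alias ids are arbitrary choices for illustration.

```sql
-- Reserve 3 IDs in one query and return them as an array
SELECT
  array_agg(nextval('seq_name'::regclass)) AS ids
FROM
  generate_series(1, 3);
```

The values obtained are guaranteed to be unique, but they are not necessarily consecutive: other sessions may call nextval on the same sequence while this query runs.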


The serial pseudo-type



Working with sequences in "manual" mode is not very convenient. But our typical task is to insert a new record with a new ID taken from a sequence! PostgreSQL has the serial pseudo-type for exactly this purpose; when the table is created, it "expands" into something like id integer NOT NULL DEFAULT nextval('tbl_id_seq').

There is no need to remember the name of the automatically generated sequence linked to the field: the function pg_get_serial_sequence(table_name, column_name) returns it. The same function can be used in your own DEFAULT expressions, for example, if you need a common sequence for several tables at once.

However, since working with a sequence is nontransactional, if an identifier was obtained by a transaction that was later rolled back, the sequence of IDs in the saved table records will have "holes".
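A possible sketch of both uses of pg_get_serial_sequence mentioned here. The table names tbl_a and tbl_b and the column val are hypothetical.

```sql
CREATE TABLE tbl_a(
  id serial
, val text
);

-- Returns the name of the sequence behind tbl_a.id,
-- e.g. public.tbl_a_id_seq if the table lives in schema public
SELECT pg_get_serial_sequence('tbl_a', 'id');

-- A second table that draws its IDs from the same sequence
CREATE TABLE tbl_b(
  id integer NOT NULL DEFAULT nextval(pg_get_serial_sequence('tbl_a', 'id'))
, val text
);
```

With this setup, IDs never collide between tbl_a and tbl_b, because both defaults call nextval on the same sequence.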



...



GENERATED columns



Starting with PostgreSQL 10, it is possible to declare an identity column (GENERATED AS IDENTITY) that conforms to the SQL:2003 standard. In the GENERATED BY DEFAULT variant, the behavior is equivalent to serial, but with GENERATED ALWAYS things get more interesting:



CREATE TABLE tbl(
  id
    integer
      GENERATED ALWAYS AS IDENTITY
);


INSERT INTO tbl(id) VALUES(DEFAULT);
-- Query returned successfully: one row affected, 10 ms.
INSERT INTO tbl(id) VALUES(1);
-- ERROR:  cannot insert into column "id"
-- DETAIL:  Column "id" is an identity column defined as GENERATED ALWAYS.
-- HINT:  Use OVERRIDING SYSTEM VALUE to override.


Yes, to insert a specific value into such a column, bypassing the generator, you have to make an extra effort with OVERRIDING SYSTEM VALUE:



INSERT INTO tbl(id) OVERRIDING SYSTEM VALUE VALUES(1);
-- Query returned successfully: one row affected, 11 ms.


Note that the table now contains two identical values, id = 1. That is, GENERATED does not impose any additional UNIQUE constraints or indexes; it is purely a declaration, just like serial.
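If uniqueness is required, it has to be declared explicitly. A minimal sketch, using a hypothetical table tbl_pk that adds a PRIMARY KEY to the identity column:

```sql
CREATE TABLE tbl_pk(
  id
    integer
      GENERATED ALWAYS AS IDENTITY
      PRIMARY KEY
);

INSERT INTO tbl_pk(id) VALUES(DEFAULT); -- gets id = 1

INSERT INTO tbl_pk(id) OVERRIDING SYSTEM VALUE VALUES(1);
-- ERROR:  duplicate key value violates unique constraint "tbl_pk_pkey"
```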



In general, on modern PostgreSQL versions the use of serial is discouraged, and GENERATED is the preferred replacement. The possible exception is cross-version applications that must support PostgreSQL versions below 10.



Generated UUID



Everything is fine as long as you work within a single database instance. But when there are several, there is no adequate way to synchronize the sequences (though that does not stop you from synchronizing them "inadequately", if you really want to). This is where the UUID type and the functions for generating its values come to the rescue. I usually use uuid_generate_v4() as the most "random" one.
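A minimal sketch of a UUID key generated on insert. The table tbl_uuid and the column val are hypothetical; uuid_generate_v4() is provided by the uuid-ossp extension, which must be available on the server.

```sql
-- uuid_generate_v4() comes from the uuid-ossp extension
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE tbl_uuid(
  id uuid PRIMARY KEY DEFAULT uuid_generate_v4()
, val text
);

-- The generated key can be read back right away
INSERT INTO tbl_uuid(val) VALUES('test') RETURNING id;
```

On PostgreSQL 13 and later, the built-in gen_random_uuid() also produces version 4 UUIDs and does not require an extension.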



Hidden system fields



tableoid / ctid



Sometimes, when fetching records from a table, you need to address a specific "physical" record, or find out which particular partition a record came from when querying the "parent" table via inheritance.



In this case, the hidden system fields present in every record will help us:



  • tableoid stores the oid of the table, so tableoid::regclass::text gives the name of the specific partition table
  • ctid is the "physical" address of the record in the format (<page>,<index>)


For example, ctid can be used for operations on a table without a primary key, and tableoid for implementing certain kinds of foreign keys.
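Both fields can be tried out with a small sketch. The tables parent and child_1 are hypothetical, and the duplicate removal shown is only one possible way of using ctid on a table without a primary key.

```sql
-- Hypothetical parent table with one child attached via inheritance
CREATE TABLE parent(val integer);
CREATE TABLE child_1() INHERITS (parent);

INSERT INTO child_1(val) VALUES(1), (1), (2);

-- Which table did each row physically come from, and where is it stored?
SELECT
  tableoid::regclass::text
, ctid
, *
FROM
  parent;

-- Remove duplicates from a table without a primary key:
-- of each group of equal val, only the row with the smallest ctid survives
DELETE FROM
  child_1 a
USING
  child_1 b
WHERE
  a.val = b.val AND
  a.ctid > b.ctid;
```

Keep in mind that ctid is not a stable identifier: it changes when a row is updated or the table is rewritten, so it is only suitable for use within a single statement or transaction.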



oid



Up to and including PostgreSQL 11, a table could be declared with the WITH OIDS attribute when it was created:



CREATE TABLE tbl(id serial) WITH OIDS;


Each record in such a table gets an additional hidden field oid with a value that is globally unique within the database, the same way it is organized for system tables like pg_class, pg_namespace...



When you insert a record into such a table, the generated value is returned right in the query result:



INSERT INTO tbl(id) VALUES(DEFAULT);


Query returned successfully: one row with OID 16400 inserted, 11 ms.


Such a field is invisible for a "normal" table query:



SELECT * FROM tbl;


id
--
 1


It, like other system fields, must be requested explicitly:



SELECT tableoid, ctid, xmin, xmax, cmin, cmax, oid, * FROM tbl;


tableoid | ctid  | xmin | xmax | cmin | cmax | oid   | id
---------+-------+------+------+------+------+-------+---
   16596 | (0,1) |  572 |    0 |    0 |    0 | 16400 |  1


True, an oid value is only 32 bits, so it is very easy to get an overflow, after which it will not even be possible to create a table (it needs a new oid!). Therefore, since PostgreSQL 12, WITH OIDS is no longer supported.



"Fair" time clock_timestamp



Sometimes, during a long-running query or procedure, you want to attach the "current" time to a record. Anyone who tries to use now() for this is in for a failure: it returns the same value throughout the entire transaction.



To get the "right now" time, there is the function clock_timestamp() (and a bunch of its siblings). The difference in behavior between these functions can be seen in a simple query:



SELECT
  now()
, clock_timestamp()
FROM
  generate_series(1, 4);


              now              |        clock_timestamp
-------------------------------+-------------------------------
 2020-08-19 16:26:05.626629+03 | 2020-08-19 16:26:05.626758+03
 2020-08-19 16:26:05.626629+03 | 2020-08-19 16:26:05.626763+03
 2020-08-19 16:26:05.626629+03 | 2020-08-19 16:26:05.626764+03
 2020-08-19 16:26:05.626629+03 | 2020-08-19 16:26:05.626765+03
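The "siblings" differ in the moment they refer to. A small sketch, assuming the statements are sent to the server one by one (as in an interactive psql session):

```sql
BEGIN;

-- Let some time pass inside the transaction
SELECT pg_sleep(1);

SELECT
  transaction_timestamp() -- same as now(): start of the current transaction
, statement_timestamp()   -- start of the current statement
, clock_timestamp();      -- actual current time, changes even within one statement

COMMIT;
```

Here the first value stays at the moment of BEGIN, while the other two are at least a second later.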


