CREATE TABLE task AS
SELECT
id
, (random() * 100)::integer person -- 100
, least(trunc(-ln(random()) / ln(2)), 10)::integer priority -- 2
FROM
generate_series(1, 1e5) id; -- 100K
CREATE INDEX ON task(person, priority);
The word "is" in SQL turns into
EXISTS
- here's the simplest version and let's start:
SELECT
*
FROM
generate_series(0, 99) pid
WHERE
EXISTS(
SELECT
NULL
FROM
task
WHERE
person = pid AND
priority = 10
);
all plan pictures are clickable
So far, everything looks good, but ...
EXISTS + IN
... then they came to us and asked to include not only
priority = 10
8 and 9 as "super" :
SELECT
*
FROM
generate_series(0, 99) pid
WHERE
EXISTS(
SELECT
NULL
FROM
task
WHERE
person = pid AND
priority IN (10, 9, 8)
);
They read 1.5 times more, and it also affected the execution time.
OR + EXISTS
Let's try to use our knowledge that it is
priority = 8
much more likely to meet a record with than 10:
SELECT
*
FROM
generate_series(0, 99) pid
WHERE
EXISTS(
SELECT
NULL
FROM
task
WHERE
person = pid AND
priority = 8
) OR
EXISTS(
SELECT
NULL
FROM
task
WHERE
person = pid AND
priority = 9
) OR
EXISTS(
SELECT
NULL
FROM
task
WHERE
person = pid AND
priority = 10
);
Note that PostgreSQL 12 is already smart enough to make subsequent
EXISTS
-subqueries only for those "not found" by the previous ones after 100 searches for value 8 - only 13 for value 9, and only 4 for 10.
CASE + EXISTS + ...
On previous versions, a similar result can be achieved by "hiding under CASE" the following queries:
SELECT
*
FROM
generate_series(0, 99) pid
WHERE
CASE
WHEN
EXISTS(
SELECT
NULL
FROM
task
WHERE
person = pid AND
priority = 8
) THEN TRUE
ELSE
CASE
WHEN
EXISTS(
SELECT
NULL
FROM
task
WHERE
person = pid AND
priority = 9
) THEN TRUE
ELSE
EXISTS(
SELECT
NULL
FROM
task
WHERE
person = pid AND
priority = 10
)
END
END;
EXISTS + UNION ALL + LIMIT
The same, but you can get a little faster if you use the "hack"
UNION ALL + LIMIT
:
SELECT
*
FROM
generate_series(0, 99) pid
WHERE
EXISTS(
(
SELECT
NULL
FROM
task
WHERE
person = pid AND
priority = 8
LIMIT 1
)
UNION ALL
(
SELECT
NULL
FROM
task
WHERE
person = pid AND
priority = 9
LIMIT 1
)
UNION ALL
(
SELECT
NULL
FROM
task
WHERE
person = pid AND
priority = 10
LIMIT 1
)
LIMIT 1
);
Correct indexes are the key to the health of the database
Now let's look at the problem from a completely different side. If we know for sure that the
task
number of records that we want to find are several times less than the rest , then we will make a suitable partial index. At the same time, let's go straight from the "dot" enumeration
8, 9, 10
to
>= 8
:
CREATE INDEX ON task(person) WHERE priority >= 8;
SELECT
*
FROM
generate_series(0, 99) pid
WHERE
EXISTS(
SELECT
NULL
FROM
task
WHERE
person = pid AND
priority >= 8
);
I had to read 2 times faster and 1.5 times less!
But, probably, to subtract all suitable ones at
task
once - will it be even faster? ..
SELECT DISTINCT
person
FROM
task
WHERE
priority >= 8;
Far from always, and certainly not in this case - because instead of 100 reads of the first available records, we have to read more than 400!
And in order not to guess which of the query options will be more effective, and to know it with confidence - use explain.tensor.ru .