Double negative and query performance

In certain situations, a query can be optimised using double negation (NOT NOT).

PostgreSQL has two implementations of hash join: Hash Join itself and Hashed Subplan, which may serve anti- and semi-joins.

The first one is shown in the plan as Hash Join node. It is normally used for explicit JOIN operations, including multiple tables in FROM section. EXISTS(), IN(), and NOT EXISTS() conditions in WHERE clause may get rewritten to joins as well if there is a subquery in the parentheses. Such joins (called semi-joins and anti-joins) may be executed using Hash Join as well. Like any other join node, Hash Join sits on top of two other plan nodes, called outer and inner. Hash Join node always comes with Hash node over the inner one, in fact Hash is an intrinsic part of Hash Join, we never find any of them them without the other in a plan.

postgres=# explain
postgres-# select *
postgres-# from pg_attribute
postgres-# where attrelid in (select oid from pg_class);
                               QUERY PLAN
------------------------------------------------------------------------
 Hash Join  (cost=18.83..134.10 rows=2623 width=205)
   Hash Cond: (pg_attribute.attrelid = pg_class.oid)
   ->  Seq Scan on pg_attribute  (cost=0.00..82.23 rows=2623 width=205)
   ->  Hash  (cost=14.48..14.48 rows=348 width=4)
         ->  Seq Scan on pg_class  (cost=0.00..14.48 rows=348 width=4)

If the inner relation is small enough to fit into work_mem when hashed, Classic Hash Join algorithm is used. First, inner relation data gets scanned to build a hash table. Then the outer relation is scanned only once, and for every tuple obtained from it, Postgres looks up for the matching records in the hash table. Depending on the join kind (left/inner/semi-/anti-), all or not all matched or unmatched resulting rows are emitted, but always in the same order as they were obtained from the outer relation. Right and outer joins also produce some tuples that are not based on any particular outer relation tuple.

However, the inner relation may be larger than the amount of RAM allowed for the operation. In this case, Postgres uses Hybrid Hash Join algorithm. Using a hash function, it splits both inner and outer relations into the same number of batches, and dumps them to disk. Then, for each pair of inner and outer batches, it reads them from disk and matches up their tuples using Classic Hash Join approach. As a hash function distributes values pseudo-randomly across batches, there’s no chance for the order to be preserved.

Despite some planning-time estimations, the final decision whether to use single-batch or multi-batch algorithm is made on the execution stage, so the planner cannot guarantee that Hash Join node preserves the tuple order of its outer node.

The other PostgreSQL approach to hash joins is called a Hashed Subplan. It is a method to evaluate an IN() condition found in SELECT or WHERE section of a query, with an uncorrelated subquery in the parentheses of IN(). In a plan, it’s displayed as a separate subplan which is referenced in the filter or output section of the node that uses it.

postgres=# explain (verbose)
postgres-# select attname,
postgres-#        attrelid,
postgres-#        attrelid in (select oid from pg_class)
postgres-# from pg_attribute
postgres-# where attrelid not in (
postgres-#        select adrelid from pg_attrdef
postgres-#       );
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Seq Scan on pg_catalog.pg_attribute  (cost=35.75..124.25 rows=1182 width=69)
   Output: pg_attribute.attname, pg_attribute.attrelid, (hashed SubPlan 1)
   Filter: (NOT (hashed SubPlan 2))
   SubPlan 2
     ->  Seq Scan on pg_catalog.pg_attrdef  (cost=0.00..18.30 rows=830 width=4)
           Output: pg_attrdef.adrelid
   SubPlan 1
     ->  Seq Scan on pg_catalog.pg_class  (cost=0.00..14.50 rows=350 width=4)
           Output: pg_class.oid

The idea of Hashed Subplan is to execute the subplan only once, to reproduce its result as a hash table in RAM, and to evaluate the IN() expression by a hash table lookup for each row of the node of the main query. This algorithm preserves main query node ordering.

What happens if the hash table doesn’t fit the RAM allowed for the operation? The subplan normally gets wrapped into Materialize node, so it is still executed only once. But the result does not get hashed, it is stored in RAM or even in a temporary file on disk instead. For each row of the main query node, the subquery result gets scanned to find whether IN condition holds or not. To emit false as a result of IN() expression, we need to scan it all the way through, which may take some time, especially when repeated for every row of the main query node.

The choice between Hashed and plain Subplan approaches is made on the planning stage. In case of wrong estimates, this can lead to suboptimal plan or RAM overuse. But unlike Hash Join, both hashed and ordinary subplans preserve order, and the planner is aware of it. If certain ordering is requested and it’s provided by the node, Postgres won’t need to sort it once again. That can speed things up.

One might desire to choose between the two algorithms if the operation can be described by both of them. One example of such queries is an anti-join of two tables written using NOT IN () or NOT EXISTS (). They are sadly remembered to be semantically different because of nulls, but in the non-null case they are equivalent. In Postgres, NOT IN uses Subplan (potentially hashed), while NOT EXISTS gets rewritten into an anti-join, which is then executed using one of the three join algorithms: Hash Join, Merge Join or Nested Loop.

A less known example is a semi-join, which can be written using IN() or NOT NOT IN(). They are fully equivalent in terms of semantics. The former one gets rewritten into a join to be executed by one of the three join algorithms. The latter one doesn’t, so it falls back to Subplan, which may be hashed if we have enough RAM.

This is an example of how dramatic the speed-up could be:

explain analyze
select code3
from t
where not not code1 in (
    select code1
    from s
)
order by code3;

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                   QUERY PLAN                                                                    │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Only Scan using t_code3_code1_idx on t  (cost=236.60..816237.45 rows=5000138 width=69) (actual time=1.129..3197.339 rows=9850647 loops=1) │
│   Filter: (hashed SubPlan 1)                                                                                                                    │
│   Rows Removed by Filter: 149353                                                                                                                │
│   Heap Fetches: 0                                                                                                                               │
│   SubPlan 1                                                                                                                                     │
│     ->  Index Only Scan using s_code1_code2_idx on s  (cost=0.28..226.25 rows=3865 width=29) (actual time=0.010..0.524 rows=3865 loops=1)       │
│           Heap Fetches: 0                                                                                                                       │
│ Planning time: 0.078 ms                                                                                                                         │
│ Execution time: 3455.053 ms                                                                                                                     │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

We get data from t sorted in an efficient way using an Index-only Scan, and then check whether corresponding values are present in s without breaking the order. Solutions without NOT NOT require at least 16 seconds on the same hardware, as they either have to sort the joined result explicitly, or to access s for each row of t individually using Nested Loop. See the snippets below for details.

set lc_messages = 'C';
\set ON_ERROR_STOP 1
select version();
\pset pager off
\timing on
drop schema if exists tmp cascade;
create schema tmp;
set search_path = tmp;

create table s(code1 text not null, code2 text not null);
create table t(code1 text not null, code3 text not null);

-- A many-to-many correspondence between a 1k-element set of code1 and a 1k-element set of code2.
-- About 0.39% of all possible combinations: ~4k pairs.
insert into s(code1, code2)
select repeat(i::text, 10),
       repeat(j::text, 10)
from generate_series(1, 1000) i
cross join generate_series(1, 1000) j
where substr(md5(i::text || '-' || j::text), 1, 2) = '00';

-- 10M lines, each with a random code1 and unique code3
insert into t(code1, code3)
select repeat(ceil(random()*1000)::text, 10),
       repeat(i::text, 10)
from generate_series(1, 10000000) i;

create index on s(code2, code1);
create index on s(code1, code2);

alter table t add primary key(code3);
create index on t(code1, code3);
create index on t(code3, code1);

vacuum analyze s;
vacuum analyze t;

set work_mem to '2GB';

\echo try NOT NOT
explain analyze
select code3
from t
where not not code1 in (
    select code1
    from s
)
order by code3;
\g
\g

\echo try straightforward
explain analyze
select code3
from t
where code1 in (
    select code1
    from s
)
order by code3;
\g
\g

\echo try join
explain analyze
select code3
from t
join (
    select distinct code1
    from s
) _ using (code1)
order by code3;
\g
\g

\echo try dirty tricks, but still not as fast
set enable_hashjoin to off;
set enable_sort to off;

explain analyze
select code3
from t
where code1 in (
    select code1
    from s
)
order by code3;
\g
\g
SET
Time: 0.152 ms
┌───────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                version                                                │
├───────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4, 64-bit │
└───────────────────────────────────────────────────────────────────────────────────────────────────────┘
(1 row)

Time: 0.180 ms
Pager usage is off.
Timing is on.
psql:/home/l/701.sql:6: NOTICE:  drop cascades to 2 other objects
DETAIL:  drop cascades to table s
drop cascades to table t
DROP SCHEMA
Time: 567.898 ms
CREATE SCHEMA
Time: 3.662 ms
SET
Time: 0.078 ms
CREATE TABLE
Time: 71.308 ms
CREATE TABLE
Time: 69.541 ms
INSERT 0 3865
Time: 669.934 ms
INSERT 0 10000000
Time: 63039.886 ms (01:03.040)
CREATE INDEX
Time: 61.875 ms
CREATE INDEX
Time: 77.114 ms
ALTER TABLE
Time: 54377.507 ms (00:54.378)
CREATE INDEX
Time: 89829.445 ms (01:29.829)
CREATE INDEX
Time: 55355.726 ms (00:55.356)
VACUUM
Time: 58.286 ms
VACUUM
Time: 7983.110 ms (00:07.983)
SET
Time: 0.131 ms
try NOT NOT
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                   QUERY PLAN                                                                   │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Only Scan using t_code3_code1_idx on t  (cost=92.00..816075.89 rows=4999882 width=69) (actual time=1.233..4312.916 rows=9849748 loops=1) │
│   Filter: (hashed SubPlan 1)                                                                                                                   │
│   Rows Removed by Filter: 150252                                                                                                               │
│   Heap Fetches: 0                                                                                                                              │
│   SubPlan 1                                                                                                                                    │
│     ->  Seq Scan on s  (cost=0.00..81.65 rows=3865 width=29) (actual time=0.005..0.423 rows=3865 loops=1)                                      │
│ Planning time: 0.247 ms                                                                                                                        │
│ Execution time: 4596.496 ms                                                                                                                    │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(8 rows)

Time: 4597.139 ms (00:04.597)
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                   QUERY PLAN                                                                   │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Only Scan using t_code3_code1_idx on t  (cost=92.00..816075.89 rows=4999882 width=69) (actual time=1.141..3608.677 rows=9849748 loops=1) │
│   Filter: (hashed SubPlan 1)                                                                                                                   │
│   Rows Removed by Filter: 150252                                                                                                               │
│   Heap Fetches: 0                                                                                                                              │
│   SubPlan 1                                                                                                                                    │
│     ->  Seq Scan on s  (cost=0.00..81.65 rows=3865 width=29) (actual time=0.019..0.436 rows=3865 loops=1)                                      │
│ Planning time: 0.108 ms                                                                                                                        │
│ Execution time: 3894.312 ms                                                                                                                    │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(8 rows)

Time: 3894.750 ms (00:03.895)
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                   QUERY PLAN                                                                   │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Only Scan using t_code3_code1_idx on t  (cost=92.00..816075.89 rows=4999882 width=69) (actual time=1.181..3486.537 rows=9849748 loops=1) │
│   Filter: (hashed SubPlan 1)                                                                                                                   │
│   Rows Removed by Filter: 150252                                                                                                               │
│   Heap Fetches: 0                                                                                                                              │
│   SubPlan 1                                                                                                                                    │
│     ->  Seq Scan on s  (cost=0.00..81.65 rows=3865 width=29) (actual time=0.009..0.451 rows=3865 loops=1)                                      │
│ Planning time: 0.108 ms                                                                                                                        │
│ Execution time: 3768.449 ms                                                                                                                    │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(8 rows)

Time: 3768.892 ms (00:03.769)
try straightforward
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                            QUERY PLAN                                                                             │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Gather Merge  (cost=1000.99..3222090.45 rows=9850227 width=69) (actual time=3.073..16761.902 rows=9849748 loops=1)                                                │
│   Workers Planned: 2                                                                                                                                              │
│   Workers Launched: 2                                                                                                                                             │
│   ->  Nested Loop Semi Join  (cost=0.97..2084129.83 rows=4104261 width=69) (actual time=0.065..13070.764 rows=3283249 loops=3)                                    │
│         ->  Parallel Index Only Scan using t_code3_code1_idx on t  (cost=0.69..732653.20 rows=4166569 width=98) (actual time=0.033..814.085 rows=3333333 loops=3) │
│               Heap Fetches: 0                                                                                                                                     │
│         ->  Index Only Scan using s_code1_code2_idx on s  (cost=0.28..0.35 rows=4 width=29) (actual time=0.003..0.003 rows=1 loops=10000000)                      │
│               Index Cond: (code1 = t.code1)                                                                                                                       │
│               Heap Fetches: 0                                                                                                                                     │
│ Planning time: 0.772 ms                                                                                                                                           │
│ Execution time: 17118.777 ms                                                                                                                                      │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

Time: 17119.864 ms (00:17.120)
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                            QUERY PLAN                                                                             │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Gather Merge  (cost=1000.99..3222090.45 rows=9850227 width=69) (actual time=3.042..16850.081 rows=9849748 loops=1)                                                │
│   Workers Planned: 2                                                                                                                                              │
│   Workers Launched: 2                                                                                                                                             │
│   ->  Nested Loop Semi Join  (cost=0.97..2084129.83 rows=4104261 width=69) (actual time=0.063..13143.046 rows=3283249 loops=3)                                    │
│         ->  Parallel Index Only Scan using t_code3_code1_idx on t  (cost=0.69..732653.20 rows=4166569 width=98) (actual time=0.032..805.428 rows=3333333 loops=3) │
│               Heap Fetches: 0                                                                                                                                     │
│         ->  Index Only Scan using s_code1_code2_idx on s  (cost=0.28..0.35 rows=4 width=29) (actual time=0.003..0.003 rows=1 loops=10000000)                      │
│               Index Cond: (code1 = t.code1)                                                                                                                       │
│               Heap Fetches: 0                                                                                                                                     │
│ Planning time: 0.756 ms                                                                                                                                           │
│ Execution time: 17208.877 ms                                                                                                                                      │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

Time: 17209.975 ms (00:17.210)
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                            QUERY PLAN                                                                             │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Gather Merge  (cost=1000.99..3222090.45 rows=9850227 width=69) (actual time=2.695..16704.513 rows=9849748 loops=1)                                                │
│   Workers Planned: 2                                                                                                                                              │
│   Workers Launched: 2                                                                                                                                             │
│   ->  Nested Loop Semi Join  (cost=0.97..2084129.83 rows=4104261 width=69) (actual time=0.061..13068.243 rows=3283249 loops=3)                                    │
│         ->  Parallel Index Only Scan using t_code3_code1_idx on t  (cost=0.69..732653.20 rows=4166569 width=98) (actual time=0.032..809.870 rows=3333333 loops=3) │
│               Heap Fetches: 0                                                                                                                                     │
│         ->  Index Only Scan using s_code1_code2_idx on s  (cost=0.28..0.35 rows=4 width=29) (actual time=0.003..0.003 rows=1 loops=10000000)                      │
│               Index Cond: (code1 = t.code1)                                                                                                                       │
│               Heap Fetches: 0                                                                                                                                     │
│ Planning time: 0.757 ms                                                                                                                                           │
│ Execution time: 17064.898 ms                                                                                                                                      │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

Time: 17065.950 ms (00:17.066)
try join
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                      QUERY PLAN                                                                      │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop  (cost=92.00..148390176.21 rows=9849769 width=69) (actual time=1.313..615325.620 rows=9849748 loops=1)                                   │
│   Join Filter: (t.code1 = s.code1)                                                                                                                   │
│   Rows Removed by Join Filter: 4993383935                                                                                                            │
│   ->  Index Only Scan using t_code3_code1_idx on t  (cost=0.69..790985.16 rows=9999765 width=98) (actual time=0.032..2884.871 rows=10000000 loops=1) │
│         Heap Fetches: 0                                                                                                                              │
│   ->  Materialize  (cost=91.31..115.94 rows=985 width=29) (actual time=0.000..0.021 rows=500 loops=10000000)                                         │
│         ->  HashAggregate  (cost=91.31..101.16 rows=985 width=29) (actual time=1.087..1.198 rows=985 loops=1)                                        │
│               Group Key: s.code1                                                                                                                     │
│               ->  Seq Scan on s  (cost=0.00..81.65 rows=3865 width=29) (actual time=0.008..0.330 rows=3865 loops=1)                                  │
│ Planning time: 0.209 ms                                                                                                                              │
│ Execution time: 615652.812 ms                                                                                                                        │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

Time: 615653.353 ms (10:15.653)
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                      QUERY PLAN                                                                      │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop  (cost=92.00..148390176.21 rows=9849769 width=69) (actual time=1.321..616491.724 rows=9849748 loops=1)                                   │
│   Join Filter: (t.code1 = s.code1)                                                                                                                   │
│   Rows Removed by Join Filter: 4993383935                                                                                                            │
│   ->  Index Only Scan using t_code3_code1_idx on t  (cost=0.69..790985.16 rows=9999765 width=98) (actual time=0.033..2916.248 rows=10000000 loops=1) │
│         Heap Fetches: 0                                                                                                                              │
│   ->  Materialize  (cost=91.31..115.94 rows=985 width=29) (actual time=0.000..0.021 rows=500 loops=10000000)                                         │
│         ->  HashAggregate  (cost=91.31..101.16 rows=985 width=29) (actual time=1.093..1.206 rows=985 loops=1)                                        │
│               Group Key: s.code1                                                                                                                     │
│               ->  Seq Scan on s  (cost=0.00..81.65 rows=3865 width=29) (actual time=0.008..0.336 rows=3865 loops=1)                                  │
│ Planning time: 0.218 ms                                                                                                                              │
│ Execution time: 616822.648 ms                                                                                                                        │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

Time: 616823.284 ms (10:16.823)
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                      QUERY PLAN                                                                      │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop  (cost=92.00..148390176.21 rows=9849769 width=69) (actual time=1.507..616559.377 rows=9849748 loops=1)                                   │
│   Join Filter: (t.code1 = s.code1)                                                                                                                   │
│   Rows Removed by Join Filter: 4993383935                                                                                                            │
│   ->  Index Only Scan using t_code3_code1_idx on t  (cost=0.69..790985.16 rows=9999765 width=98) (actual time=0.039..2910.991 rows=10000000 loops=1) │
│         Heap Fetches: 0                                                                                                                              │
│   ->  Materialize  (cost=91.31..115.94 rows=985 width=29) (actual time=0.000..0.021 rows=500 loops=10000000)                                         │
│         ->  HashAggregate  (cost=91.31..101.16 rows=985 width=29) (actual time=1.226..1.396 rows=985 loops=1)                                        │
│               Group Key: s.code1                                                                                                                     │
│               ->  Seq Scan on s  (cost=0.00..81.65 rows=3865 width=29) (actual time=0.010..0.401 rows=3865 loops=1)                                  │
│ Planning time: 0.234 ms                                                                                                                              │
│ Execution time: 616890.813 ms                                                                                                                        │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

Time: 616891.480 ms (10:16.891)
try dirty tricks, but still not as fast
SET
Time: 0.113 ms
SET
Time: 0.090 ms
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                            QUERY PLAN                                                                             │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Gather Merge  (cost=1000.99..3222090.45 rows=9850227 width=69) (actual time=3.492..16450.801 rows=9849748 loops=1)                                                │
│   Workers Planned: 2                                                                                                                                              │
│   Workers Launched: 2                                                                                                                                             │
│   ->  Nested Loop Semi Join  (cost=0.97..2084129.83 rows=4104261 width=69) (actual time=0.084..12960.563 rows=3283249 loops=3)                                    │
│         ->  Parallel Index Only Scan using t_code3_code1_idx on t  (cost=0.69..732653.20 rows=4166569 width=98) (actual time=0.040..785.196 rows=3333333 loops=3) │
│               Heap Fetches: 0                                                                                                                                     │
│         ->  Index Only Scan using s_code1_code2_idx on s  (cost=0.28..0.35 rows=4 width=29) (actual time=0.003..0.003 rows=1 loops=10000000)                      │
│               Index Cond: (code1 = t.code1)                                                                                                                       │
│               Heap Fetches: 0                                                                                                                                     │
│ Planning time: 0.747 ms                                                                                                                                           │
│ Execution time: 16804.019 ms                                                                                                                                      │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

Time: 16805.085 ms (00:16.805)
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                            QUERY PLAN                                                                             │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Gather Merge  (cost=1000.99..3222090.45 rows=9850227 width=69) (actual time=3.023..16540.460 rows=9849748 loops=1)                                                │
│   Workers Planned: 2                                                                                                                                              │
│   Workers Launched: 2                                                                                                                                             │
│   ->  Nested Loop Semi Join  (cost=0.97..2084129.83 rows=4104261 width=69) (actual time=0.066..12983.642 rows=3283249 loops=3)                                    │
│         ->  Parallel Index Only Scan using t_code3_code1_idx on t  (cost=0.69..732653.20 rows=4166569 width=98) (actual time=0.035..808.068 rows=3333333 loops=3) │
│               Heap Fetches: 0                                                                                                                                     │
│         ->  Index Only Scan using s_code1_code2_idx on s  (cost=0.28..0.35 rows=4 width=29) (actual time=0.003..0.003 rows=1 loops=10000000)                      │
│               Index Cond: (code1 = t.code1)                                                                                                                       │
│               Heap Fetches: 0                                                                                                                                     │
│ Planning time: 0.751 ms                                                                                                                                           │
│ Execution time: 16895.938 ms                                                                                                                                      │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

Time: 16896.980 ms (00:16.897)
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                            QUERY PLAN                                                                             │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Gather Merge  (cost=1000.99..3222090.45 rows=9850227 width=69) (actual time=2.768..16495.479 rows=9849748 loops=1)                                                │
│   Workers Planned: 2                                                                                                                                              │
│   Workers Launched: 2                                                                                                                                             │
│   ->  Nested Loop Semi Join  (cost=0.97..2084129.83 rows=4104261 width=69) (actual time=0.061..12980.924 rows=3283249 loops=3)                                    │
│         ->  Parallel Index Only Scan using t_code3_code1_idx on t  (cost=0.69..732653.20 rows=4166569 width=98) (actual time=0.033..789.839 rows=3333333 loops=3) │
│               Heap Fetches: 0                                                                                                                                     │
│         ->  Index Only Scan using s_code1_code2_idx on s  (cost=0.28..0.35 rows=4 width=29) (actual time=0.003..0.003 rows=1 loops=10000000)                      │
│               Index Cond: (code1 = t.code1)                                                                                                                       │
│               Heap Fetches: 0                                                                                                                                     │
│ Planning time: 0.749 ms                                                                                                                                           │
│ Execution time: 16853.613 ms                                                                                                                                      │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

Time: 16854.722 ms (00:16.855)
SET
Time: 0.332 ms
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                 version                                                  │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ PostgreSQL 11devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4, 64-bit │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(1 row)

Time: 0.297 ms
Pager usage is off.
Timing is on.
psql:/home/alexey/701.sql:7: NOTICE:  drop cascades to 2 other objects
DETAIL:  drop cascades to table s
drop cascades to table t
DROP SCHEMA
Time: 599.657 ms
CREATE SCHEMA
Time: 5.045 ms
SET
Time: 0.086 ms
CREATE TABLE
Time: 9.046 ms
CREATE TABLE
Time: 5.092 ms
INSERT 0 3865
Time: 665.500 ms
INSERT 0 10000000
Time: 58605.960 ms (00:58.606)
CREATE INDEX
Time: 18.408 ms
CREATE INDEX
Time: 20.334 ms
ALTER TABLE
Time: 53155.667 ms (00:53.156)
CREATE INDEX
Time: 78656.668 ms (01:18.657)
CREATE INDEX
Time: 42493.020 ms (00:42.493)
VACUUM
Time: 36.282 ms
VACUUM
Time: 2120.070 ms (00:02.120)
SET
Time: 0.096 ms
try NOT NOT
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                   QUERY PLAN                                                                    │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Only Scan using t_code3_code1_idx on t  (cost=236.60..816237.45 rows=5000138 width=69) (actual time=1.915..3835.155 rows=9850647 loops=1) │
│   Filter: (hashed SubPlan 1)                                                                                                                    │
│   Rows Removed by Filter: 149353                                                                                                                │
│   Heap Fetches: 0                                                                                                                               │
│   SubPlan 1                                                                                                                                     │
│     ->  Index Only Scan using s_code1_code2_idx on s  (cost=0.28..226.25 rows=3865 width=29) (actual time=0.009..1.283 rows=3865 loops=1)       │
│           Heap Fetches: 0                                                                                                                       │
│ Planning time: 0.147 ms                                                                                                                         │
│ Execution time: 4103.452 ms                                                                                                                     │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)

Time: 4103.925 ms (00:04.104)
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                   QUERY PLAN                                                                    │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Only Scan using t_code3_code1_idx on t  (cost=236.60..816237.45 rows=5000138 width=69) (actual time=1.216..3355.182 rows=9850647 loops=1) │
│   Filter: (hashed SubPlan 1)                                                                                                                    │
│   Rows Removed by Filter: 149353                                                                                                                │
│   Heap Fetches: 0                                                                                                                               │
│   SubPlan 1                                                                                                                                     │
│     ->  Index Only Scan using s_code1_code2_idx on s  (cost=0.28..226.25 rows=3865 width=29) (actual time=0.019..0.574 rows=3865 loops=1)       │
│           Heap Fetches: 0                                                                                                                       │
│ Planning time: 0.087 ms                                                                                                                         │
│ Execution time: 3624.756 ms                                                                                                                     │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)

Time: 3625.149 ms (00:03.625)
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                   QUERY PLAN                                                                    │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Only Scan using t_code3_code1_idx on t  (cost=236.60..816237.45 rows=5000138 width=69) (actual time=1.129..3197.339 rows=9850647 loops=1) │
│   Filter: (hashed SubPlan 1)                                                                                                                    │
│   Rows Removed by Filter: 149353                                                                                                                │
│   Heap Fetches: 0                                                                                                                               │
│   SubPlan 1                                                                                                                                     │
│     ->  Index Only Scan using s_code1_code2_idx on s  (cost=0.28..226.25 rows=3865 width=29) (actual time=0.010..0.524 rows=3865 loops=1)       │
│           Heap Fetches: 0                                                                                                                       │
│ Planning time: 0.078 ms                                                                                                                         │
│ Execution time: 3455.053 ms                                                                                                                     │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)

Time: 3455.438 ms (00:03.455)
try straightforward
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                             QUERY PLAN                                                                             │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Gather Merge  (cost=1000.99..3222616.99 rows=9850524 width=69) (actual time=3.908..22078.676 rows=9850647 loops=1)                                                 │
│   Workers Planned: 2                                                                                                                                               │
│   Workers Launched: 2                                                                                                                                              │
│   ->  Nested Loop Semi Join  (cost=0.97..2084622.08 rows=4104385 width=69) (actual time=0.070..17224.643 rows=3283549 loops=3)                                     │
│         ->  Parallel Index Only Scan using t_code3_code1_idx on t  (cost=0.69..732665.89 rows=4166782 width=98) (actual time=0.029..1096.818 rows=3333333 loops=3) │
│               Heap Fetches: 0                                                                                                                                      │
│         ->  Index Only Scan using s_code1_code2_idx on s  (cost=0.28..0.35 rows=4 width=29) (actual time=0.004..0.004 rows=1 loops=10000000)                       │
│               Index Cond: (code1 = t.code1)                                                                                                                        │
│               Heap Fetches: 0                                                                                                                                      │
│ Planning time: 0.674 ms                                                                                                                                            │
│ Execution time: 22538.693 ms                                                                                                                                       │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

Time: 22539.708 ms (00:22.540)
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                             QUERY PLAN                                                                             │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Gather Merge  (cost=1000.99..3222616.99 rows=9850524 width=69) (actual time=2.551..23439.469 rows=9850647 loops=1)                                                 │
│   Workers Planned: 2                                                                                                                                               │
│   Workers Launched: 2                                                                                                                                              │
│   ->  Nested Loop Semi Join  (cost=0.97..2084622.08 rows=4104385 width=69) (actual time=0.066..18309.754 rows=3283549 loops=3)                                     │
│         ->  Parallel Index Only Scan using t_code3_code1_idx on t  (cost=0.69..732665.89 rows=4166782 width=98) (actual time=0.033..1148.646 rows=3333333 loops=3) │
│               Heap Fetches: 0                                                                                                                                      │
│         ->  Index Only Scan using s_code1_code2_idx on s  (cost=0.28..0.35 rows=4 width=29) (actual time=0.005..0.005 rows=1 loops=10000000)                       │
│               Index Cond: (code1 = t.code1)                                                                                                                        │
│               Heap Fetches: 0                                                                                                                                      │
│ Planning time: 0.728 ms                                                                                                                                            │
│ Execution time: 23921.557 ms                                                                                                                                       │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

Time: 23922.638 ms (00:23.923)
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                             QUERY PLAN                                                                             │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Gather Merge  (cost=1000.99..3222616.99 rows=9850524 width=69) (actual time=9.089..24387.297 rows=9850647 loops=1)                                                 │
│   Workers Planned: 2                                                                                                                                               │
│   Workers Launched: 2                                                                                                                                              │
│   ->  Nested Loop Semi Join  (cost=0.97..2084622.08 rows=4104385 width=69) (actual time=0.075..18991.672 rows=3283549 loops=3)                                     │
│         ->  Parallel Index Only Scan using t_code3_code1_idx on t  (cost=0.69..732665.89 rows=4166782 width=98) (actual time=0.035..1204.805 rows=3333333 loops=3) │
│               Heap Fetches: 0                                                                                                                                      │
│         ->  Index Only Scan using s_code1_code2_idx on s  (cost=0.28..0.35 rows=4 width=29) (actual time=0.005..0.005 rows=1 loops=10000000)                       │
│               Index Cond: (code1 = t.code1)                                                                                                                        │
│               Heap Fetches: 0                                                                                                                                      │
│ Planning time: 0.691 ms                                                                                                                                            │
│ Execution time: 24898.043 ms                                                                                                                                       │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

Time: 24899.088 ms (00:24.899)
try join
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                      QUERY PLAN                                                                       │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop  (cost=0.97..148397891.44 rows=9850273 width=69) (actual time=0.788..503048.784 rows=9850647 loops=1)                                     │
│   Join Filter: (t.code1 = s.code1)                                                                                                                    │
│   Rows Removed by Join Filter: 4993406349                                                                                                             │
│   ->  Index Only Scan using t_code3_code1_idx on t  (cost=0.69..791000.84 rows=10000277 width=98) (actual time=0.028..3009.261 rows=10000000 loops=1) │
│         Heap Fetches: 0                                                                                                                               │
│   ->  Materialize  (cost=0.28..250.69 rows=985 width=29) (actual time=0.000..0.017 rows=500 loops=10000000)                                           │
│         ->  Unique  (cost=0.28..235.92 rows=985 width=29) (actual time=0.007..0.853 rows=985 loops=1)                                                 │
│               ->  Index Only Scan using s_code1_code2_idx on s  (cost=0.28..226.25 rows=3865 width=29) (actual time=0.006..0.526 rows=3865 loops=1)   │
│                     Heap Fetches: 0                                                                                                                   │
│ Planning time: 0.175 ms                                                                                                                               │
│ Execution time: 503359.327 ms                                                                                                                         │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

Time: 503359.859 ms (08:23.360)
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                      QUERY PLAN                                                                       │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop  (cost=0.97..148397891.44 rows=9850273 width=69) (actual time=0.706..503826.677 rows=9850647 loops=1)                                     │
│   Join Filter: (t.code1 = s.code1)                                                                                                                    │
│   Rows Removed by Join Filter: 4993406349                                                                                                             │
│   ->  Index Only Scan using t_code3_code1_idx on t  (cost=0.69..791000.84 rows=10000277 width=98) (actual time=0.023..3122.452 rows=10000000 loops=1) │
│         Heap Fetches: 0                                                                                                                               │
│   ->  Materialize  (cost=0.28..250.69 rows=985 width=29) (actual time=0.000..0.017 rows=500 loops=10000000)                                           │
│         ->  Unique  (cost=0.28..235.92 rows=985 width=29) (actual time=0.006..0.791 rows=985 loops=1)                                                 │
│               ->  Index Only Scan using s_code1_code2_idx on s  (cost=0.28..226.25 rows=3865 width=29) (actual time=0.006..0.489 rows=3865 loops=1)   │
│                     Heap Fetches: 0                                                                                                                   │
│ Planning time: 0.153 ms                                                                                                                               │
│ Execution time: 504149.007 ms                                                                                                                         │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

Time: 504149.525 ms (08:24.150)
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                      QUERY PLAN                                                                       │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop  (cost=0.97..148397891.44 rows=9850273 width=69) (actual time=1.145..504033.354 rows=9850647 loops=1)                                     │
│   Join Filter: (t.code1 = s.code1)                                                                                                                    │
│   Rows Removed by Join Filter: 4993406349                                                                                                             │
│   ->  Index Only Scan using t_code3_code1_idx on t  (cost=0.69..791000.84 rows=10000277 width=98) (actual time=0.030..3079.642 rows=10000000 loops=1) │
│         Heap Fetches: 0                                                                                                                               │
│   ->  Materialize  (cost=0.28..250.69 rows=985 width=29) (actual time=0.000..0.017 rows=500 loops=10000000)                                           │
│         ->  Unique  (cost=0.28..235.92 rows=985 width=29) (actual time=0.009..1.154 rows=985 loops=1)                                                 │
│               ->  Index Only Scan using s_code1_code2_idx on s  (cost=0.28..226.25 rows=3865 width=29) (actual time=0.008..0.697 rows=3865 loops=1)   │
│                     Heap Fetches: 0                                                                                                                   │
│ Planning time: 0.235 ms                                                                                                                               │
│ Execution time: 504349.079 ms                                                                                                                         │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

Time: 504349.651 ms (08:24.350)
try dirty tricks, but still not as fast
SET
Time: 0.110 ms
SET
Time: 0.072 ms
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                             QUERY PLAN                                                                             │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Gather Merge  (cost=1000.99..3222616.99 rows=9850524 width=69) (actual time=5.662..22534.438 rows=9850647 loops=1)                                                 │
│   Workers Planned: 2                                                                                                                                               │
│   Workers Launched: 2                                                                                                                                              │
│   ->  Nested Loop Semi Join  (cost=0.97..2084622.08 rows=4104385 width=69) (actual time=0.056..17632.458 rows=3283549 loops=3)                                     │
│         ->  Parallel Index Only Scan using t_code3_code1_idx on t  (cost=0.69..732665.89 rows=4166782 width=98) (actual time=0.025..1123.205 rows=3333333 loops=3) │
│               Heap Fetches: 0                                                                                                                                      │
│         ->  Index Only Scan using s_code1_code2_idx on s  (cost=0.28..0.35 rows=4 width=29) (actual time=0.004..0.004 rows=1 loops=10000000)                       │
│               Index Cond: (code1 = t.code1)                                                                                                                        │
│               Heap Fetches: 0                                                                                                                                      │
│ Planning time: 0.735 ms                                                                                                                                            │
│ Execution time: 23076.513 ms                                                                                                                                       │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

Time: 23077.568 ms (00:23.078)
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                             QUERY PLAN                                                                             │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Gather Merge  (cost=1000.99..3222616.99 rows=9850524 width=69) (actual time=6.082..24545.920 rows=9850647 loops=1)                                                 │
│   Workers Planned: 2                                                                                                                                               │
│   Workers Launched: 2                                                                                                                                              │
│   ->  Nested Loop Semi Join  (cost=0.97..2084622.08 rows=4104385 width=69) (actual time=0.072..19204.305 rows=3283549 loops=3)                                     │
│         ->  Parallel Index Only Scan using t_code3_code1_idx on t  (cost=0.69..732665.89 rows=4166782 width=98) (actual time=0.035..1202.854 rows=3333333 loops=3) │
│               Heap Fetches: 0                                                                                                                                      │
│         ->  Index Only Scan using s_code1_code2_idx on s  (cost=0.28..0.35 rows=4 width=29) (actual time=0.005..0.005 rows=1 loops=10000000)                       │
│               Index Cond: (code1 = t.code1)                                                                                                                        │
│               Heap Fetches: 0                                                                                                                                      │
│ Planning time: 0.707 ms                                                                                                                                            │
│ Execution time: 25033.991 ms                                                                                                                                       │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

Time: 25035.075 ms (00:25.035)
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                             QUERY PLAN                                                                             │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Gather Merge  (cost=1000.99..3222616.99 rows=9850524 width=69) (actual time=5.388..24322.633 rows=9850647 loops=1)                                                 │
│   Workers Planned: 2                                                                                                                                               │
│   Workers Launched: 2                                                                                                                                              │
│   ->  Nested Loop Semi Join  (cost=0.97..2084622.08 rows=4104385 width=69) (actual time=0.076..19032.374 rows=3283549 loops=3)                                     │
│         ->  Parallel Index Only Scan using t_code3_code1_idx on t  (cost=0.69..732665.89 rows=4166782 width=98) (actual time=0.037..1191.194 rows=3333333 loops=3) │
│               Heap Fetches: 0                                                                                                                                      │
│         ->  Index Only Scan using s_code1_code2_idx on s  (cost=0.28..0.35 rows=4 width=29) (actual time=0.005..0.005 rows=1 loops=10000000)                       │
│               Index Cond: (code1 = t.code1)                                                                                                                        │
│               Heap Fetches: 0                                                                                                                                      │
│ Planning time: 1.073 ms                                                                                                                                            │
│ Execution time: 24820.320 ms                                                                                                                                       │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

Time: 24821.705 ms (00:24.822)

It should be noted that, in case of data growth, Hashed Subplan can turn into an ordinary Subplan, with performance degrading royally. Use the trick only if you trust the underlying query result cardinality not to go up rapidly.

Leave a comment