Files
impala/testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test
Tim Armstrong b2d9901fb8 IMPALA-9176: shared null-aware anti-join build
This switches null-aware anti-join (NAAJ) to use shared
join builds with mt_dop > 0. To support this, we
make all access to the join build data structures
from the probe read-only. NAAJ requires iterating
over rows from build partitions at various steps
in the algorithm and before this patch this was not
thread-safe. We avoided that problem by having a
separate builder for each join node and duplicating
the data.

The main challenge was iteration over
null_aware_partition()->build_rows() from the probe
side, because it uses an embedded iterator in the
stream so was not thread-safe (since each thread
would be trying to use the same iterator).

The solution is to extend BufferedTupleStream to
allow multiple read iterators into a pinned,
read-only, stream. Each probe thread can then
iterate over the stream independently with no
thread safety issues.

With BufferedTupleStream changes, I partially abstracted
ReadIterator more from the rest of BufferedTupleStream,
but decided not to completely refactor so that this patchset
didn't cause excessive churn. I.e. much BufferedTupleStream
code still accesses internal fields of ReadIterator.

Fix a pre-existing bug in grouping-aggregator where
Spill() hit a DCHECK because the hash table was
destroyed unnecessarily when it hit an OOM. This was
flushed out by the parameter change in test_spilling.

Testing:
Add test to buffered-tuple-stream-test for multiple readers
to BTS.

Tweaked test_spilling_naaj_no_deny_reservation to have
a smaller minimum reservation, required to keep the
test passing with the new, lower, memory requirement.

Updated a TPC-H planner test where resource requirements
slightly decreased for the NAAJ.

Ran the naaj tests in test_spilling.py with TSAN enabled,
confirmed no data races.

Ran exhaustive tests, which passed after fixing IMPALA-9611.

Ran core tests with ASAN.

Ran backend tests with TSAN.

Perf:
I ran this query that exercises EvaluateNullProbe() heavily.

  select l_orderkey, l_partkey, l_suppkey, l_linenumber
  from tpch30_parquet.lineitem
  where l_suppkey = 4162 and l_shipmode = 'AIR'
        and l_returnflag = 'A' and l_shipdate > '1993-01-01'
        and if(l_orderkey > 5500000, NULL, l_orderkey) not in (
          select if(o_orderkey % 2 = 0, NULL, o_orderkey + 1)
          from orders
          where l_orderkey = o_orderkey)
  order by 1,2,3,4;

It went from ~13s to ~11s running on a single impalad with
this change, because of the inlining of CreateOutputRow() and
EvalConjuncts().

I also ran TPC-H SF 30 on Parquet with mt_dop=4, and there was
no change in performance.

Change-Id: I95ead761430b0aa59a4fb2e7848e47d1bf73c1c9
Reviewed-on: http://gerrit.cloudera.org:8080/15612
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-24 20:56:58 +00:00

128 lines
3.8 KiB
Plaintext

# This file contains tests where we don't want the python test framework to supply the
# debug_action value because the test won't succeed with all possible debug_action values.
====
---- QUERY
# Tests for the case where a spilled partition has 0 probe rows and so we don't build the
# hash table in a partitioned hash join. Always runs with the minimum reservation to force
# spilling.
# INNER JOIN
set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
select straight_join count(*)
from
lineitem a, lineitem b
where
a.l_partkey = 1 and
a.l_orderkey = b.l_orderkey;
---- TYPES
BIGINT
---- RESULTS
173
---- RUNTIME_PROFILE
row_regex: .*NumHashTableBuildsSkipped: .* \([1-9][0-9]*\)
====
---- QUERY
# spilled partition with 0 probe rows, NULL AWARE LEFT ANTI JOIN
set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
select straight_join count(*)
from
lineitem a
where
a.l_partkey not in (select l_partkey from lineitem where l_partkey > 10)
and a.l_partkey < 1000;
---- TYPES
BIGINT
---- RESULTS
287
---- RUNTIME_PROFILE
row_regex: .*NumHashTableBuildsSkipped: .* \([1-9][0-9]*\)
====
---- QUERY
# spilled partition with 0 probe rows, RIGHT OUTER JOIN
set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
select straight_join count(*)
from
supplier right outer join lineitem on s_suppkey = l_suppkey
where s_acctbal > 0 and s_acctbal < 10;
---- TYPES
BIGINT
---- RESULTS
12138
---- RUNTIME_PROFILE
row_regex: .*NumHashTableBuildsSkipped: .* \([1-9][0-9]*\)
====
---- QUERY
# spilled partition with 0 probe rows, RIGHT OUTER JOIN
# Setting max_row_size == default_spillable_buffer_size was sufficient to trigger
# IMPALA-9349, because it means there is no surplus reservation during repartitioning.
set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
select straight_join count(*)
from
supplier right outer join lineitem on s_suppkey = l_suppkey
where s_acctbal > 0 and s_acctbal < 10;
---- TYPES
BIGINT
---- RESULTS
12138
---- RUNTIME_PROFILE
row_regex: .*NumHashTableBuildsSkipped: .* \([1-9][0-9]*\)
====
---- QUERY
# spilled partition with 0 probe rows, RIGHT ANTI JOIN
set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
with x as (select * from supplier limit 10)
select straight_join count(*)
from
x right anti join lineitem on s_suppkey + 100 = l_suppkey;
---- TYPES
BIGINT
---- RESULTS
5995258
---- RUNTIME_PROFILE
row_regex: .*NumHashTableBuildsSkipped: .* \([1-9][0-9]*\)
====
---- QUERY
# Aggregation query that will OOM and fail to spill because of IMPALA-3304 without
# any help from DEBUG_ACTION.
set mem_limit=75m;
select l_orderkey, group_concat(l_comment) comments
from lineitem
group by l_orderkey
order by comments desc
limit 5
---- CATCH
Memory limit exceeded
====
---- QUERY
# Top-N query with large limit that will OOM because spilling is not implemented:
# IMPALA-3471. It does not need any help from DEBUG_ACTION.
set topn_bytes_limit=-1;
set mem_limit=100m;
select *
from lineitem
order by l_orderkey desc
limit 6000000
---- CATCH
Memory limit exceeded
====
---- QUERY
# Hash join that will fail to repartition and therefore fail from out-of-memory because
# of a large number of duplicate keys on the build side: IMPALA-4857. It does not need
# any help from DEBUG_ACTION.
set mem_limit=250m;
select straight_join *
from supplier join /* +broadcast */ lineitem on s_suppkey = l_linenumber
order by l_tax desc
limit 5
---- CATCH
row_regex:.*Cannot perform hash join at node with id .*. Repartitioning did not reduce the size of a spilled partition.*
====
---- QUERY
# Analytic query with certain kinds of large windows can't be spilled: IMPALA-5738. It
# does not need any help from DEBUG_ACTION.
set mem_limit=100m;
select avg(l_tax) over (order by l_orderkey rows between 100000000 preceding and 10000000 following)
from lineitem
---- CATCH
Memory limit exceeded
====