Files
impala/testdata/workloads/functional-query/queries/QueryTest/spilling.test
Tim Armstrong fcf08d1822 IMPALA-9725: incorrect spilling join results for wide keys
The control flow was broken if the join operator hit
the end of the expression values cache before the end
of the probe batch, immediately after processing a row
for a spilled partition. In NextProbeRow(), the problematic
code path was:
* The last row in the expression values cache was for a
  spilled partition, so skip_row=true and it falls out
  of the loop with 'current_probe_row_' pointing to that
  row.
* probe_batch_iterator->AtEnd() is false, because
  the expression value cache is smaller than the probe batch,
  so 'current_probe_row_' is not nulled out.

Thus we end up in a state where 'current_probe_row_' is
set, but 'hash_table_iterator_' is unset.

In the case of a left anti join, this was interpreted by
ProcessProbeRowLeftSemiJoins() as meaning that there was
no hash table match for 'current_probe_row_', and it
therefore returned the row.

This bug could only occur under specific circumstances:
* The join key takes up > 256 bytes in the expression values
  cache (assuming the default batch size of 1024).
* The join spilled.
* The join operator returns rows that were unmatched in
  the right input, i.e. LEFT OUTER JOIN, LEFT ANTI JOIN,
  FULL OUTER JOIN.

The core of the fix is to null out 'current_probe_row_' when
falling out the bottom of the loop in NextProbeRow(). Related
DCHECKS were fixed and some control flow was slightly
simplified.

Testing:
Added a test query on TPC-H that reproduces the problem reliably.

Ran exhaustive tests.

Change-Id: I9d7e5871c35a90e8cf24b8dded04775ee1eae9d8
Reviewed-on: http://gerrit.cloudera.org:8080/15904
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-15 16:49:32 +00:00

16 KiB