IMPALA-6258: Uninitialized tuple pointers in row batch for empty rows

Tuple pointers in the generated row batches may not be initialized
if a tuple has byte size 0. There are some codes which compare these
uninitialized pointers against nullptr so having them uninitialized
may return wrong (and non-deterministic) results, e.g.:
impala::TupleIsNullPredicate::GetBooleanVal()

The following query produces non-deterministic results currently:

SELECT count(v.x) FROM functional.alltypestiny t3 LEFT OUTER JOIN (
  SELECT true AS x FROM functional.alltypestiny t1 LEFT OUTER JOIN
  functional.alltypestiny t2 ON (true)) v ON (v.x = t3.bool_col)
WHERE t3.bool_col = true;

The alltypestiny table has 8 records, 4 records of them has the true
boolean value for bool_col. Therefore, the above query is a fancy
way of saying "8 * 8 * 4", i.e. it should return 256.

The solution is that scanners initialize tuple pointers to a non-null
value when there are no materialized slots. This non-null value is
provided by the new static member Tuple::POISON.

I extended QueryTest/scanners.test with the above query. This test
executes the query against all table formats.

This change has the biggest performance impact on count(*) queries on
large kudu tables. For my quick benchmark I copied tpch_kudu.lineitem
and doubled its data. The resulting table has 12,002,430 rows.

Without this patch 'select count(*) from biglineitem' runs for ~0.12s.
With the patch applied, the overhead is around a dozens of ms. I measured
the query on my desktop PC using a relase build of Impala. On debug builds,
the execution time of the patched version is around 160% of the original
version.

Without this patch:
+--------------+--------+----------+----------+--------+------------+-----------+---------------+---------------------+
| Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail              |
+--------------+--------+----------+----------+--------+------------+-----------+---------------+---------------------+
| 03:AGGREGATE | 1      | 127.50us | 127.50us | 1      | 1          | 28.00 KB  | 10.00 MB      | FINALIZE            |
| 02:EXCHANGE  | 1      | 22.32ms  | 22.32ms  | 3      | 1          | 0 B       | 0 B           | UNPARTITIONED       |
| 01:AGGREGATE | 3      | 1.78ms   | 1.89ms   | 3      | 1          | 16.00 KB  | 10.00 MB      |                     |
| 00:SCAN KUDU | 3      | 8.00ms   | 8.28ms   | 12.00M | -1         | 512.00 KB | 0 B           | default.biglineitem |
+--------------+--------+----------+----------+--------+------------+-----------+---------------+---------------------+

With this patch:
+--------------+--------+----------+----------+--------+------------+-----------+---------------+---------------------+
| Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail              |
+--------------+--------+----------+----------+--------+------------+-----------+---------------+---------------------+
| 03:AGGREGATE | 1      | 129.01us | 129.01us | 1      | 1          | 28.00 KB  | 10.00 MB      | FINALIZE            |
| 02:EXCHANGE  | 1      | 33.00ms  | 33.00ms  | 3      | 1          | 0 B       | 0 B           | UNPARTITIONED       |
| 01:AGGREGATE | 3      | 1.99ms   | 2.13ms   | 3      | 1          | 16.00 KB  | 10.00 MB      |                     |
| 00:SCAN KUDU | 3      | 13.13ms  | 13.97ms  | 12.00M | -1         | 512.00 KB | 0 B           | default.biglineitem |
+--------------+--------+----------+----------+--------+------------+-----------+---------------+---------------------+

Change-Id: I298122aaaa7e62eb5971508e0698e189519755de
Reviewed-on: http://gerrit.cloudera.org:8080/9239
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This commit is contained in:
Zoltan Borok-Nagy
2018-02-07 17:05:00 +01:00
committed by Impala Public Jenkins
parent a8fc9f0fc7
commit 0eaab69fff
5 changed files with 27 additions and 2 deletions

View File

@@ -98,3 +98,15 @@ select count(*) from alltypes where rand() - year > month;
---- TYPES
BIGINT
====
---- QUERY
# IMPALA-6258: Uninitialized tuple pointers in row batch for empty rows
# The following query was non-deterministic because of this bug
select count(v.x) from alltypestiny t3 left outer join (
select true as x from alltypestiny t1 left outer join
alltypestiny t2 on (true)) v
on (v.x = t3.bool_col) where t3.bool_col = true
---- RESULTS
256
---- TYPES
BIGINT
====