IMPALA-5554: sorter DCHECK on null column

The bug was in the DCHECK. The DCHECK is intended to make sure that a
tuple's string data is not split across blocks. Its logic assumed that
if the second or a later string column was in the next block, then the
tuple's strings had been split between blocks. However, that assumption
is invalid when there are NULL strings, which do not belong to any
block: if the first string column is NULL, the first non-NULL string
can legitimately be the first data stored in the next block.
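A minimal C++ model of the flawed assumption, for illustration only. `StringSlot`, `OldCheckPasses` and `DemoOldCheck` are hypothetical names, and reducing each string to a "block index" is a simplification; this is not Impala's sorter code. The model walks a tuple's string slots, skips NULLs, and treats a jump to a later block as a split unless it happens at slot 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one string slot in a sort tuple. A NULL string has
// no var-len data, so its block_index is meaningless.
struct StringSlot {
  bool is_null;
  std::uint32_t block_index;  // block holding the string's data, if non-NULL
};

// Hypothetical model of the original check: only slot 0 may be the first
// string stored in a later block. A jump at any later slot is treated as a
// split tuple, even when every earlier slot is NULL.
bool OldCheckPasses(const std::vector<StringSlot>& slots,
                    std::uint32_t current_block) {
  for (std::size_t i = 0; i < slots.size(); ++i) {
    if (slots[i].is_null) continue;  // NULL strings live in no block.
    if (slots[i].block_index > current_block) {
      if (i != 0) return false;  // The faulty assumption: "not slot 0" == split.
      current_block = slots[i].block_index;
    }
  }
  return true;
}

// Usage: the first string column is NULL and the second string starts the
// next block. Nothing is split, yet the old check rejects the tuple.
void DemoOldCheck() {
  std::vector<StringSlot> null_first = {{true, 0}, {false, 1}};
  assert(!OldCheckPasses(null_first, 0));
}
```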

The fix for the DCHECK (which is still useful) is to count the number
of non-NULL strings and make sure that no non-NULL strings were split
between blocks.
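The fixed check can be sketched in the same simplified model. `StringSlot`, `NonNullStringsNotSplit` and `DemoFixedCheck` are hypothetical names, not Impala's code; the point is only that counting non-NULL strings lets the first non-NULL string, wherever it sits, start a new block, while a jump at any later non-NULL string is still reported.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one string slot in a sort tuple (not Impala's types).
struct StringSlot {
  bool is_null;               // NULL strings have no var-len data in any block
  std::uint32_t block_index;  // block holding the string's data, if non-NULL
};

// Hypothetical model of the fixed check: count the non-NULL strings and allow
// a jump to a later block only at the first one. A jump at any later non-NULL
// string means the tuple's string data was split between blocks.
bool NonNullStringsNotSplit(const std::vector<StringSlot>& slots,
                            std::uint32_t current_block) {
  int num_non_null = 0;
  for (const StringSlot& slot : slots) {
    if (slot.is_null) continue;
    ++num_non_null;
    if (slot.block_index > current_block) {
      if (num_non_null != 1) return false;
      current_block = slot.block_index;
    }
  }
  return true;
}

void DemoFixedCheck() {
  // NULL first, non-NULL second starting the next block: accepted.
  std::vector<StringSlot> null_first = {{true, 0}, {false, 1}};
  assert(NonNullStringsNotSplit(null_first, 0));
  // Two non-NULL strings in different blocks: a genuine split, rejected.
  std::vector<StringSlot> split = {{false, 0}, {false, 1}};
  assert(!NonNullStringsNotSplit(split, 0));
}
```

The check stays useful because a real split (second case) is still caught.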

Testing:
Added a test that reproduces the crash.

Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90
Reviewed-on: http://gerrit.cloudera.org:8080/7295
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Author: Tim Armstrong
Date: 2017-06-25 15:37:33 -07:00
Committed by: Impala Public Jenkins
Commit: 4a3ef9c773
Parent: d9fc9be021
2 changed files with 39 additions and 3 deletions


@@ -29,6 +29,38 @@ STRING,STRING,STRING
 'Faaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa','',''
 'Faaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa','',''
 ---- RUNTIME_PROFILE
-row_regex: .* TotalMergesPerformed: [^0] .*
-row_regex: .* SpilledRuns: [^0] .*
+row_regex: .* TotalMergesPerformed: [^0].*
+row_regex: .* SpilledRuns: [^0].*
 ====
+---- QUERY
+# Regression test for IMPALA-5554: first string column in sort tuple is null
+# on boundary of spilled block. Test does two sorts with a NULL and non-NULL
+# string column in both potential orders.
+set max_block_mgr_memory=50m;
+select *
+from (
+select *, first_value(col) over (order by sort_col) fv
+from (
+select concat(l_linestatus, repeat('a', 63)) sort_col, if(l_returnflag = 'foo', l_returnflag, NULL) col
+from tpch_parquet.lineitem limit 100000
+union all
+select if(l_returnflag = 'foo', l_returnflag, NULL) sort_col, concat(l_linestatus, repeat('a', 63)) col
+from tpch_parquet.lineitem) q limit 100000
+) q2
+limit 10
+---- TYPES
+STRING,STRING,STRING
+---- RESULTS
+'Faaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa','NULL','NULL'
+'Faaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa','NULL','NULL'
+'Faaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa','NULL','NULL'
+'Faaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa','NULL','NULL'
+'Faaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa','NULL','NULL'
+'Faaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa','NULL','NULL'
+'Faaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa','NULL','NULL'
+'Faaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa','NULL','NULL'
+'Faaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa','NULL','NULL'
+'Faaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa','NULL','NULL'
+---- RUNTIME_PROFILE
+row_regex: .* SpilledRuns: [^0].*
+====
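The diff also relaxes the existing RUNTIME_PROFILE patterns from `[^0] .*` to `[^0].*`, which the commit message does not explain. A small C++ check shows one concrete difference: the old pattern requires a space immediately after the first character of the counter value, so it rejects a multi-digit count such as 12. The sample profile lines are hypothetical, and matching the pattern against a whole line with `std::regex_match` is an assumption about how `row_regex` is evaluated, not a description of Impala's test framework.

```cpp
#include <regex>
#include <string>

// Patterns copied from the RUNTIME_PROFILE section of the diff.
const std::regex kOldPattern(".* TotalMergesPerformed: [^0] .*");
const std::regex kNewPattern(".* TotalMergesPerformed: [^0].*");

// Whole-line match; assumed here to approximate how row_regex is applied.
bool Matches(const std::regex& pattern, const std::string& line) {
  return std::regex_match(line, pattern);
}
```

Under these assumptions, the old pattern accepts a line ending in `TotalMergesPerformed: 1 (1)` but rejects one ending in `TotalMergesPerformed: 12 (12)`; the new pattern accepts both, and both still reject a value of 0.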