IMPALA-6324: Support reading RLE-encoded boolean values in Parquet scanner

Impala already supported RLE encoding for levels and dictionary pages, so
the only task was to integrate it into BoolColumnReader.
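
For readers unfamiliar with the encoding, the following is a minimal
sketch of decoding Parquet's RLE/bit-packed hybrid format at bit width 1,
which is the width used for booleans. It is not Impala's implementation.
The function names are hypothetical, and the sketch assumes the buffer
starts at the first run header (any length prefix already stripped).

```python
def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 varint; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def decode_rle_bools(buf, num_values):
    """Decode num_values booleans from an RLE/bit-packed hybrid buffer.

    Assumes bit width 1 and that buf begins at the first run header.
    """
    values = []
    pos = 0
    while len(values) < num_values:
        header, pos = read_uleb128(buf, pos)
        if (header & 1) == 0:
            # RLE run: header >> 1 repetitions of a single value,
            # stored in one byte when the bit width is 1.
            run_len = header >> 1
            value = bool(buf[pos] & 1)
            pos += 1
            values.extend([value] * run_len)
        else:
            # Bit-packed run: header >> 1 groups of 8 values,
            # one byte per group at bit width 1, least significant bit first.
            num_groups = header >> 1
            for _ in range(num_groups):
                byte = buf[pos]
                pos += 1
                for bit in range(8):
                    values.append(bool((byte >> bit) & 1))
    # A bit-packed run may be padded past the requested count.
    return values[:num_values]


# Example input: an RLE run of five True values, then one bit-packed group.
example = bytes([0x0A, 0x01, 0x03, 0b00000101])
decoded = decode_rle_bools(example, 8)
```

Here the first header byte 0x0A is even, so it describes an RLE run of
length 5. The header byte 0x03 is odd, so it describes one bit-packed
group of eight values.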

A new benchmark, rle-benchmark.cc, is added to measure the speed of RLE
decoding for different bit widths and run lengths.

There might be a small performance impact on PLAIN-encoded booleans
because of the additional branch taken when the BoolColumnReader cache
is filled. As the cache size is 128, the branch runs at most once per
128 values, so I considered it to be outside the "hot loop".

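The reasoning about the extra branch can be illustrated with a small
sketch. This is not the actual BoolColumnReader. The class, its methods
and the list-backed "page" are hypothetical stand-ins. Only the cache
size of 128 comes from the description above. The point is that the
encoding check runs once per refill, not once per value.

```python
CACHE_SIZE = 128  # cache size stated in the commit message


class CachedBoolReader:
    """Hypothetical reader: the encoding branch is taken only on refill."""

    def __init__(self, encoding, page_values, cache_size=CACHE_SIZE):
        self.encoding = encoding
        self.page_values = page_values  # stand-in for an encoded data page
        self.page_pos = 0
        self.cache = []
        self.cache_pos = 0
        self.cache_size = cache_size
        self.refills = 0  # counts how often the encoding branch is taken

    def _read_batch(self):
        # Stand-in for a real PLAIN or RLE batch decoder.
        end = self.page_pos + self.cache_size
        batch = self.page_values[self.page_pos:end]
        self.page_pos += len(batch)
        return batch

    def _refill(self):
        self.refills += 1
        # The additional branch: evaluated once per cache fill.
        if self.encoding == "PLAIN":
            self.cache = self._read_batch()
        elif self.encoding == "RLE":
            self.cache = self._read_batch()
        else:
            raise ValueError("unsupported encoding: " + self.encoding)
        self.cache_pos = 0
        if not self.cache:
            raise EOFError("no more values in page")

    def next_value(self):
        # Hot path: no encoding check while the cache has values left.
        if self.cache_pos == len(self.cache):
            self._refill()
        value = self.cache[self.cache_pos]
        self.cache_pos += 1
        return value


reader = CachedBoolReader("PLAIN", [i % 2 == 0 for i in range(300)])
out = [reader.next_value() for _ in range(300)]
```

Reading 300 values this way triggers three refills (128 + 128 + 44
values), so the encoding check runs three times in this sketch.
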
Testing:

As Impala cannot currently write RLE-encoded bool columns, parquet-mr
was used to create a test file, testdata/data/rle_encoded_bool.parquet.

tests/query_test/test_scanners.py#test_rle_encoded_bools creates a table
that uses this file and runs queries against it.

Change-Id: I4644bf8cf5d2b7238b05076407fbf78ab5d2c14f
Reviewed-on: http://gerrit.cloudera.org:8080/9403
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This commit is contained in:
Tim Armstrong
2018-02-09 12:00:54 -08:00
committed by Impala Public Jenkins
parent d03b66ca35
commit 588e1d46e9
10 changed files with 331 additions and 53 deletions


@@ -0,0 +1,17 @@
====
---- QUERY
# Verify that the total number of True values is correct.
select count(*) from rle_encoded_bool where b;
---- TYPES
BIGINT
---- RESULTS
140
====
---- QUERY
# Verify that the bool and int values match.
select count(*) from rle_encoded_bool where (b and i = 1) or (not b and i = 0);
---- TYPES
BIGINT
---- RESULTS
279
====