mirror of
https://github.com/apache/impala.git
synced 2026-01-08 03:02:48 -05:00
IMPALA-6324: Support reading RLE-encoded boolean values in Parquet scanner
Impala already supported RLE encoding for levels and dictionary pages, so the only task was to integrate it into BoolColumnReader.

A new benchmark, rle-benchmark.cc, is added to test the speed of RLE decoding for different bit widths and run lengths.

There might be a small performance impact on PLAIN encoded booleans, because of the additional branch when the cache of BoolColumnReader is filled. As the cache size is 128, I considered this to be outside the "hot loop".

Testing:
As Impala cannot write RLE encoded bool columns at the moment, parquet-mr was used to create a test file, testdata/data/rle_encoded_bool.parquet.

tests/query_test/test_scanners.py#test_rle_encoded_bools creates a table that uses this file, and tries to query from it.

Change-Id: I4644bf8cf5d2b7238b05076407fbf78ab5d2c14f
Reviewed-on: http://gerrit.cloudera.org:8080/9403
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
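For readers unfamiliar with the encoding, the following is a minimal sketch of how a Parquet RLE/bit-packed hybrid stream can be decoded at bit width 1, which is the case relevant to boolean columns. It is not Impala's RleDecoder or BoolColumnReader code: the function name DecodeRleBools and its signature are hypothetical, and the sketch assumes that any length prefix in front of the run stream has already been stripped by the caller.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper for illustration only (not part of Impala).
// Decodes `num_values` booleans from a Parquet RLE/bit-packed hybrid
// stream with bit width 1. Assumes `buf` starts at the first run header.
std::vector<bool> DecodeRleBools(const std::vector<uint8_t>& buf,
                                 size_t num_values) {
  std::vector<bool> out;
  out.reserve(num_values);
  size_t pos = 0;
  while (out.size() < num_values) {
    // Each run starts with a ULEB128 varint header.
    uint32_t header = 0;
    int shift = 0;
    while (true) {
      if (pos >= buf.size() || shift > 28) {
        throw std::runtime_error("truncated or oversized run header");
      }
      const uint8_t byte = buf[pos++];
      header |= static_cast<uint32_t>(byte & 0x7F) << shift;
      if ((byte & 0x80) == 0) break;
      shift += 7;
    }
    if ((header & 1) == 0) {
      // RLE run: (header >> 1) repetitions of a single value, which
      // occupies one byte when the bit width is 1.
      const size_t run_len = header >> 1;
      if (run_len == 0 || pos >= buf.size()) {
        throw std::runtime_error("invalid RLE run");
      }
      const bool value = (buf[pos++] & 1) != 0;
      for (size_t i = 0; i < run_len && out.size() < num_values; ++i) {
        out.push_back(value);
      }
    } else {
      // Bit-packed run: (header >> 1) groups of 8 values. At bit width 1
      // each group is one byte, with values stored least significant bit
      // first.
      const size_t num_bytes = header >> 1;
      if (num_bytes == 0 || num_bytes > buf.size() - pos) {
        throw std::runtime_error("invalid bit-packed run");
      }
      for (size_t b = 0; b < num_bytes; ++b) {
        const uint8_t packed = buf[pos++];
        for (int bit = 0; bit < 8 && out.size() < num_values; ++bit) {
          out.push_back(((packed >> bit) & 1) != 0);
        }
      }
    }
  }
  return out;
}
```

As a usage example, the bytes 0x06 0x01 0x03 0x05 describe an RLE run of three true values followed by one bit-packed group holding 0x05. Asking for six values yields true, true, true, true, false, true. A production decoder would avoid the per-value push_back and decode into a fixed-size cache in batches, which is the design the commit message alludes to.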
committed by
Impala Public Jenkins
parent
d03b66ca35
commit
588e1d46e9
17
testdata/workloads/functional-query/queries/QueryTest/parquet-rle-encoded-bool.test
@@ -0,0 +1,17 @@
====
---- QUERY
# Verify that the total number of True values is correct.
select count(*) from rle_encoded_bool where b;
---- TYPES
BIGINT
---- RESULTS
140
====
---- QUERY
# Verify that the bool and int values match.
select count(*) from rle_encoded_bool where (b and i = 1) or (not b and i = 0);
---- TYPES
BIGINT
---- RESULTS
279
====