Mirror of https://github.com/apache/impala.git (synced 2025-12-19 18:12:08 -05:00)
IMPALA-3964: Fix crash when a count(*) is performed on a nested collection.
The Bug: Prior to this patch, a DCHECK was used to verify that the underlying memory pool for the scratch batch was empty in a count-based scenario. For IMPALA-3964 (where a count(*) is performed on a nested collection), if a Parquet column chunk is compressed, each new data page is decompressed when it is read, and the decompressed data is eventually placed into the underlying scratch batch memory pool, causing the aforementioned DCHECK to fail. The test suite did not catch this because the TPCH nested Parquet data is not compressed.

The Fix: Removed the erroneous DCHECK. Added logic to determine whether any memory in the scratch batch needs to be freed (due to the transfer that occurs from the decompressed data pool) and, if so, to free it.

Augmented the load_nested.py script to Snappy-compress each of the tables within the 'tpch_nested_parquet' database. This is consistent with how the flat TPCH Parquet data set is stored.

Regarding test coverage, a number of existing tests already perform nested collection counts against the tables in the 'tpch_nested_parquet' database. For uncompressed nested Parquet, the 'test_nested_types.py' test suite uses the 'ComplexTypesTbl' table to provide good coverage.

Change-Id: Id0955c85d18dfba4bd29a35ec95d0355da050607
Reviewed-on: http://gerrit.cloudera.org:8080/3940
Reviewed-by: Michael Ho <kwho@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
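The interaction described under "The Bug" and "The Fix" can be modelled in a few lines of Python. This is a simplified sketch, not Impala's code: the real scanner is C++, and `MemPool`, `count_rows`, the `old_dcheck` flag, and the one-page-per-batch loop are hypothetical stand-ins. The sketch shows why an "always empty" check fails once decompressed page buffers are transferred into the scratch batch's pool, and why freeing that memory instead keeps a count-only scan working.

```python
class MemPool:
    """Hypothetical stand-in for a memory pool; tracks buffer sizes in bytes."""

    def __init__(self):
        self.buffers = []

    def total_allocated_bytes(self):
        return sum(self.buffers)

    def acquire_data(self, other):
        # Take ownership of every buffer held by `other`, leaving it empty.
        self.buffers.extend(other.buffers)
        other.buffers.clear()

    def free_all(self):
        self.buffers.clear()


def count_rows(pages, compressed, old_dcheck):
    """Count rows over data pages, as a count(*) scan would (toy model).

    pages: list of (num_rows, page_size_bytes) tuples, one page per batch.
    old_dcheck=True reproduces the removed check; False applies the fix.
    """
    scratch_pool = MemPool()       # hypothetical pool behind the scratch batch
    decompressed_pool = MemPool()  # hypothetical pool for decompressed pages
    total = 0
    for num_rows, page_size in pages:
        leftover = scratch_pool.total_allocated_bytes()
        if old_dcheck and leftover != 0:
            # Equivalent of the erroneous DCHECK firing.
            raise AssertionError(f"scratch pool not empty: {leftover} bytes")
        if leftover > 0:
            # The fix: free memory transferred in by the previous page.
            scratch_pool.free_all()
        if compressed:
            # Reading a compressed page allocates a decompression buffer...
            decompressed_pool.buffers.append(page_size)
            # ...which is later transferred into the scratch batch's pool.
            scratch_pool.acquire_data(decompressed_pool)
        total += num_rows
    return total
```

With two compressed pages, `count_rows(pages, compressed=True, old_dcheck=True)` raises on the second page, mirroring the DCHECK failure, while `old_dcheck=False` returns the row count. Uncompressed pages never add anything to the pool, which matches the explanation of why the uncompressed TPCH nested data did not expose the bug.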
Committed by Internal Jenkins
Parent: 3a630a5d68
Commit: 90a6b3206e

testdata/bin/load_nested.py (3 changed lines)
@@ -257,6 +257,7 @@ def load():

CREATE TABLE customer
STORED AS PARQUET
+TBLPROPERTIES('parquet.compression'='SNAPPY')
AS SELECT * FROM tmp_customer;

DROP TABLE tmp_orders_string;
@@ -265,6 +266,7 @@ def load():

CREATE TABLE region
STORED AS PARQUET
+TBLPROPERTIES('parquet.compression'='SNAPPY')
AS SELECT * FROM tmp_region;

DROP TABLE tmp_region_string;
@@ -272,6 +274,7 @@ def load():

CREATE TABLE supplier
STORED AS PARQUET
+TBLPROPERTIES('parquet.compression'='SNAPPY')
AS SELECT * FROM tmp_supplier;

DROP TABLE tmp_supplier;
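All three hunks add the same TBLPROPERTIES clause to a CREATE TABLE ... AS SELECT statement. A minimal Python sketch of that pattern follows. The `ctas_statement` helper and the `TABLES` tuple are hypothetical, since load_nested.py spells each statement out literally; the table names and the tmp_ source tables are taken from the hunks.

```python
# Hypothetical helper: load_nested.py writes each statement out literally.
TABLES = ("customer", "region", "supplier")


def ctas_statement(table, codec="SNAPPY"):
    """Return a CTAS that copies tmp_<table> into a compressed Parquet table."""
    return (
        f"CREATE TABLE {table}\n"
        "STORED AS PARQUET\n"
        f"TBLPROPERTIES('parquet.compression'='{codec}')\n"
        f"AS SELECT * FROM tmp_{table}"
    )


if __name__ == "__main__":
    # Print the three statements in the same order as the diff hunks.
    for name in TABLES:
        print(ctas_statement(name) + ";\n")
```

Keeping the codec as a parameter makes it visible that the only change per table is the compression property; the schema still comes from the tmp_ source table through SELECT *.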