IMPALA-10482, IMPALA-10493: Fix bugs in full ACID collection query rewrites

IMPALA-10482: A SELECT * query on a non-relative collection column of a
transactional ORC table hits an IllegalStateException.

The AcidRewriter will rewrite queries like
"select item from my_complex_orc.int_array" to
"select item from my_complex_orc t, t.int_array"

This causes trouble in star expansion, because the original query
"select * from my_complex_orc.int_array" is analyzed as
"select item from my_complex_orc.int_array".

But the rewritten query "select * from my_complex_orc t, t.int_array" is
analyzed as "select id, item from my_complex_orc t, t.int_array".

Hidden table refs can also cause issues during regular column
resolution, e.g. when the table has top-level 'pos'/'item'/'key'/'value'
columns.

The workaround is to keep track of the table refs that are added
automatically during query rewrite, so that when we analyze the
rewritten query we can ignore these auxiliary table refs.
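
The idea can be illustrated with a small toy model. This is only a
sketch in Python, not Impala's Java frontend code: TableRef, its
"hidden" flag and expand_star() are hypothetical names made up for
this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableRef:
    """Toy stand-in for a table ref in a FROM clause (hypothetical)."""
    alias: str
    columns: List[str]
    # True for refs added by the rewrite rather than written by the user.
    hidden: bool = False


def expand_star(table_refs: List[TableRef], skip_hidden: bool = True) -> List[str]:
    """Expand SELECT * into column names, optionally ignoring hidden refs."""
    result = []
    for ref in table_refs:
        if skip_hidden and ref.hidden:
            continue
        result.extend(ref.columns)
    return result


# select * from my_complex_orc.int_array
original = [TableRef("int_array", ["item"])]

# select * from my_complex_orc t, t.int_array  ('t' was added by the rewrite)
rewritten = [
    TableRef("t", ["id"], hidden=True),
    TableRef("int_array", ["item"]),
]
```

With the hidden ref skipped, both FROM clauses expand to the same
column list; without the flag, the rewritten query also picks up 'id'.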

IMPALA-10493: Using JOIN ON syntax to join two full ACID collections
produces wrong results.

When AcidRewriter.splitCollectionRef() creates a new collection ref,
it doesn't copy all the information needed to execute the query
correctly. E.g. it dropped the ON clause, turning INNER joins into
CROSS joins.
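
A toy model of this class of bug, again as a Python sketch rather than
the actual Java code: CollectionRef and both split functions below are
hypothetical and only show why building the new ref from scratch loses
the join attributes.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CollectionRef:
    """Toy stand-in for a collection table ref (hypothetical)."""
    path: str
    alias: str
    join_op: str = "CROSS JOIN"
    on_clause: Optional[str] = None


def split_lossy(ref: CollectionRef, base_alias: str) -> CollectionRef:
    """Builds the new ref from scratch, so join_op and on_clause are lost."""
    collection = ref.path.rsplit(".", 1)[-1]
    return CollectionRef(path=f"{base_alias}.{collection}", alias=ref.alias)


def split_preserving(ref: CollectionRef, base_alias: str) -> CollectionRef:
    """Copies the original ref and only changes the path."""
    collection = ref.path.rsplit(".", 1)[-1]
    return replace(ref, path=f"{base_alias}.{collection}")


ref = CollectionRef(
    path="my_complex_orc.int_array",
    alias="a",
    join_op="INNER JOIN",
    on_clause="a.item = b.item",
)
lossy = split_lossy(ref, "t")
fixed = split_preserving(ref, "t")
```

Here 'lossy' silently falls back to a CROSS JOIN with no ON clause,
while 'fixed' keeps the INNER JOIN and its predicate.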

Testing:
 * added e2e tests

Change-Id: I8fc758d3c1e75c7066936d590aec8bff8d2b00b0
Reviewed-on: http://gerrit.cloudera.org:8080/17038
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Zoltan Borok-Nagy
2021-02-08 12:58:09 +01:00
committed by Impala Public Jenkins
parent 676f79aa81
commit f0f083e45e
8 changed files with 233 additions and 2 deletions

@@ -800,6 +800,20 @@ DELETE FROM {db_name}{db_suffix}.{table_name} WHERE id % 2 = 0;
---- DATASET
functional
---- BASE_TABLE_NAME
pos_item_key_value_complextypestbl
---- COLUMNS
pos bigint
item int
key string
value int
int_array array<int>
int_map map<string, int>
---- DEPENDENT_LOAD_HIVE
INSERT OVERWRITE TABLE {db_name}{db_suffix}.{table_name} SELECT id, id, CAST(id AS STRING), CAST(id AS STRING), int_array, int_map FROM {db_name}{db_suffix}.complextypestbl;
====
---- DATASET
functional
---- BASE_TABLE_NAME
complextypestbl_non_transactional
---- COLUMNS
id bigint