IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

INTERSECT and EXCEPT set operations are implemented as rewrites to
joins. Currently only the DISTINCT-qualified operators are implemented;
the ALL-qualified variants are not. The operator MINUS is supported as
an alias for EXCEPT.
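
As a rough illustration of the rewrite idea, and not Impala's planner
code, the sketch below models INTERSECT DISTINCT as a semi join and
EXCEPT DISTINCT as an anti join over de-duplicated left rows. The
function names and the tuple row representation are assumptions made
for this example only.

```python
def distinct(rows):
    """Return rows with duplicates removed, preserving first-seen order."""
    seen = set()
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


def intersect_distinct(left, right):
    """Model INTERSECT DISTINCT: keep distinct left rows that match a right row."""
    right_keys = set(right)
    return [row for row in distinct(left) if row in right_keys]


def except_distinct(left, right):
    """Model EXCEPT DISTINCT (MINUS): keep distinct left rows with no right match."""
    right_keys = set(right)
    return [row for row in distinct(left) if row not in right_keys]


# Hypothetical sample rows; None stands in for SQL NULL.
left = [(1, "a"), (1, "a"), (2, "b"), (None, "c")]
right = [(1, "a"), (None, "c"), (3, "d")]

common = intersect_distinct(left, right)
only_left = except_distinct(left, right)
```

Python treats None as equal to None, which happens to match set
operation semantics, where NULLs compare as equal. An ordinary SQL
equality join does not behave that way, so a join-based rewrite would
presumably need a null-safe comparison.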

We mimic the non-standard behavior of Oracle and Hive, which give all
set operators the same precedence, whereas the SQL standard gives
INTERSECT higher precedence.
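
The difference shows up when UNION and INTERSECT are mixed without
parentheses. A small sketch using Python sets as stand-ins for DISTINCT
result sets (the values are made up for illustration):

```python
# Stand-ins for the results of three SELECT statements.
a = {1, 2}
b = {2, 3}
c = {3, 4}

# Equal precedence, evaluated left to right:
#   a UNION b INTERSECT c  ==  (a UNION b) INTERSECT c
left_to_right = (a | b) & c

# SQL standard precedence, INTERSECT binds tighter:
#   a UNION b INTERSECT c  ==  a UNION (b INTERSECT c)
standard = a | (b & c)
```

Here left_to_right is {3} while standard is {1, 2, 3}, so queries that
depend on a particular grouping should use explicit parentheses.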

A new class, SetOperationStmt, was created to encompass the previous
UnionStmt behavior. UnionStmt is preserved as a special case for
statements with union-only operands, to ensure compatibility with
previous union planning behavior.

Tests:
* Added parser and analyzer tests.
* Ensured no test failures or plan changes for union tests.
* Added TPC-DS queries 14,38,87 to functional and planner tests.
* Added functional tests test_intersect and test_except.
* Added new planner test testSetOperationStmt.

Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Reviewed-on: http://gerrit.cloudera.org:8080/16123
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Shant Hovsepian
2020-06-08 19:31:12 -04:00
committed by Tim Armstrong
parent 033a4607e2
commit ea3f073881
30 changed files with 5117 additions and 796 deletions

@@ -122,6 +122,12 @@ class TestQueries(ImpalaTestSuite):
    result = self.execute_query(query_string, vector.get_value('exec_option'))
    assert result.data[0] == '60'

  def test_intersect(self, vector):
    self.run_test_case('QueryTest/intersect', vector)

  def test_except(self, vector):
    self.run_test_case('QueryTest/except', vector)

  def test_sort(self, vector):
    if vector.get_value('table_format').file_format == 'hbase':
      pytest.xfail(reason="IMPALA-283 - select count(*) produces inconsistent results")