Mirror of https://github.com/apache/impala.git, synced 2025-12-30 03:01:44 -05:00
This patch enables the VALIDATE_CARDINALITY test option in several
planner tests that touch AggregationNode. Enabling it revealed three
bugs.

First, with IMPALA-13405, the cardinality estimate of a MERGE-phase
aggregation is not capped by the output cardinality of the EXCHANGE
node below it. This patch fixes that by adding the cap.

Second, the tuple-based optimization from IMPALA-13405 can
underestimate cardinality when HAVING predicates exist, because a
default selectivity of 10% is applied for each HAVING predicate. This
patch skips the tuple-based optimization if AggregationNode.conjuncts_
is ever non-empty. The optimization stays skipped on stats recompute,
even if conjuncts_ is transferred to the next merge AggregationNode
above it in the plan.

Skipping the optimization causes the following PlannerTest files
(under testdata/workloads/functional-planner/queries/PlannerTest/) to
revert their cardinality estimates to their state prior to
IMPALA-13405:
- tpcds/tpcds-q39a.test
- tpcds/tpcds-q39b.test
- tpcds_cpu_cost/tpcds-q39a.test
- tpcds_cpu_cost/tpcds-q39b.test

In the future, we should consider raising the default selectivity for
HAVING predicates and undoing this skipping logic (IMPALA-13542).

Third, stats are not recomputed after conjunct transfer in multi-phase
aggregation. This will be fixed separately by IMPALA-13526.

Testing:
- Enable cardinality validation in testMultipleDistinct*.
- Update aggregation.test to reflect current PlannerTest output, and
  add some test cases to it.
- Run and pass TpcdsPlannerTest and TpcdsCpuPlannerTest.
- Selectively run more planner tests that touch AggregationNode and
  pass them.

Change-Id: Iadb4af9fd65fdb85b66fae1e403ccec8ca5eb102
Reviewed-on: http://gerrit.cloudera.org:8080/22184
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
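The first two fixes can be illustrated with a small Java model, Java being the language of Impala's planner frontend. This is a hypothetical sketch, not Impala's code: `AggCardinalitySketch`, `capMergeAggCardinality`, `AggNode`, `computeCardinality`, and the convention that `-1` means "unknown cardinality" are all assumptions made for illustration. It also assumes that the 10% HAVING selectivities multiply independently, and it models the tuple-based optimization as a smaller upper bound on the number of groups.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two aggregation-cardinality fixes.
// None of these names are Impala's actual classes or methods.
class AggCardinalitySketch {
  // Assumption: -1 marks an unknown cardinality.
  static final long UNKNOWN = -1;
  // Default selectivity charged per HAVING predicate (10%).
  static final double DEFAULT_HAVING_SELECTIVITY = 0.1;

  // Fix 1: a MERGE-phase aggregation cannot output more rows than the
  // EXCHANGE that feeds it, so cap its raw estimate by that input.
  static long capMergeAggCardinality(long rawEstimate, long exchangeCardinality) {
    if (rawEstimate == UNKNOWN) return exchangeCardinality;
    if (exchangeCardinality == UNKNOWN) return rawEstimate;
    return Math.min(rawEstimate, exchangeCardinality);
  }

  // Simplified aggregation node. HAVING predicates live in 'conjuncts'.
  static final class AggNode {
    final List<String> conjuncts = new ArrayList<>();
    // Fix 2: sticky flag. Once any HAVING conjunct has been seen, the
    // tuple-based optimization stays disabled, even after the conjuncts
    // are transferred to another node and stats are recomputed.
    private boolean skipTupleBasedOpt = false;

    // Called on every stats (re)compute.
    // groupingEstimate: estimate without the tuple-based optimization.
    // tupleBasedBound:  lower bound the tuple-based optimization would use.
    long computeCardinality(long groupingEstimate, long tupleBasedBound) {
      if (!conjuncts.isEmpty()) skipTupleBasedOpt = true;
      long groups = skipTupleBasedOpt
          ? groupingEstimate
          : Math.min(groupingEstimate, tupleBasedBound);
      // Apply the default selectivity once per HAVING predicate.
      double rows = groups;
      for (int i = 0; i < conjuncts.size(); i++) {
        rows *= DEFAULT_HAVING_SELECTIVITY;
      }
      return Math.round(rows);
    }
  }

  public static void main(String[] args) {
    System.out.println(capMergeAggCardinality(5000, 1200));
    AggNode node = new AggNode();
    node.conjuncts.add("sum(x) > 10");
    System.out.println(node.computeCardinality(1000, 100));
  }
}
```

In this model, a node with one HAVING conjunct, a grouping estimate of 1000, and a tuple-based bound of 100 reports 100 rows (1000 × 10%) instead of 10 (100 × 10%). After the conjunct is moved to the merge aggregation above, the sticky flag keeps the estimate at 1000 rather than letting it fall back to the tuple-based 100.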