mirror of
https://github.com/apache/impala.git
synced 2026-01-07 00:02:28 -05:00
The bug:
Analytic functions introduced a few challenges in properly wrapping exprs with TupleIsNullPredicates when substituting exprs from outer-joined inline views.
1. The logical-to-physical tuple mapping during the plan generation of analytics invalidated the tuple ids originally set in upstream TupleIsNullPredicates introduced during analysis (e.g., in the result exprs).
2. TupleIsNullPredicates require specific tuple ids for evaluation. Since a sort node materializes a new tuple, it is impossible to evaluate TupleIsNullPredicates referring to a sort's input after the sort. Non-analytic sorts handle this case during analysis by materializing the result of that select block. However, analytic sorts used to materialize only the slots of the materialized tuple ids of the input plan node.

The fixes:
1. Move the TupleIsNullPredicate wrapping from the inline-view analysis into the inline-view planning. This avoids the original problem because all physical output tuples are known during plan generation. This simple change has a few subtle consequences:
   First, we must rely on the plan root's output smap for substituting the final result exprs, and *not* use the top-level base table smap generated during analysis.
   Second, during plan generation we must use an inline view's smap (and *not* its base table smap) for generating the output smap of its plan, such that we can properly wrap the rhs exprs in TupleIsNullPredicates at every level.
   This change also fixes IMPALA-1946 by deferring the TupleIsNullPredicate wrapping to planning time.
2. To preserve the information on whether an input tuple was null at an analytic sort, we materialize TupleIsNullPredicates, which are then substituted by a SlotRef into the sort's tuple in ancestor nodes.

This patch also cleans up and consolidates the code used for wrapping exprs into TupleIsNullPredicates.

Change-Id: I5c6d142bdf9c99ece2a564e557d4ffe22ac90865
Reviewed-on: http://gerrit.cloudera.org:8080/317
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
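
The following is a toy Python model of the idea described in the commit message above, not Impala's planner code. It assumes that a wrapped expr yields NULL whenever all of the referenced tuples are missing, and that a sort replaces its input tuples with a single new tuple. All names (`tuple_is_null`, `wrap_expr`, `materialize_sort_tuple`, the tuple ids) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Optional, Sequence

# A row maps tuple ids to tuples; a tuple is None when the outer join
# found no match for that side. This layout is an assumption of the sketch.
Row = Dict[int, Optional[dict]]
Expr = Callable[[Row], object]

T1, T2, SORT = 0, 1, 2  # hypothetical tuple ids


def tuple_is_null(tids: Sequence[int]) -> Expr:
    """Toy predicate: true when every listed tuple is absent from the row."""
    return lambda row: all(row.get(tid) is None for tid in tids)


def wrap_expr(expr: Expr, tids: Sequence[int]) -> Expr:
    """Return an expr that yields None when the tuples in tids are all null."""
    is_null = tuple_is_null(tids)
    return lambda row: None if is_null(row) else expr(row)


def const_expr(row: Row) -> str:
    """Stands in for a constant select-list item of an inline view over T2."""
    return "const"


matched = {T1: {"id": 1}, T2: {"id": 1}}
unmatched = {T1: {"id": 2}, T2: None}  # left outer join without a match

# Unwrapped, the constant survives the outer join; wrapped, it becomes None.
wrapped = wrap_expr(const_expr, [T2])


def materialize_sort_tuple(row: Row) -> Row:
    """Toy sort: emit a single new tuple and store the predicate result in it."""
    return {SORT: {"t2_is_null": tuple_is_null([T2])(row)}}


def wrapped_after_sort(row: Row) -> object:
    """Above the sort, read the materialized slot instead of re-evaluating."""
    return None if row[SORT]["t2_is_null"] else const_expr(row)


sorted_matched = materialize_sort_tuple(matched)
sorted_unmatched = materialize_sort_tuple(unmatched)
```

In this model, evaluating `tuple_is_null([T2])` on a row produced by the sort always reports the tuple as null, because `T2` no longer exists there. That mirrors why the predicate result is materialized into the sort tuple and read back through a slot.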

This directory contains Impala test workloads. The directory layout for the workloads should follow:

workloads/
   <data set name>/<data set name>_dimensions.csv  <- The test dimension file
   <data set name>/<data set name>_core.csv        <- A test vector file
   <data set name>/<data set name>_pairwise.csv
   <data set name>/<data set name>_exhaustive.csv
   <data set name>/queries/<query test>.test       <- The queries for this workload
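
As a quick illustration of the layout above, this Python sketch builds the paths a workload is expected to provide. The helper name `expected_workload_files`, the data set name `demo`, and the query test name `q1` are hypothetical. The sketch only assembles path strings and does not touch the file system.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# File name suffixes taken from the layout described above.
CSV_SUFFIXES = ("dimensions", "core", "pairwise", "exhaustive")


def expected_workload_files(data_set: str,
                            query_tests: Iterable[str],
                            root: str = "workloads") -> List[str]:
    """List the paths the layout prescribes for one data set."""
    base = PurePosixPath(root) / data_set
    files = [str(base / f"{data_set}_{suffix}.csv") for suffix in CSV_SUFFIXES]
    files += [str(base / "queries" / f"{name}.test") for name in query_tests]
    return files


for path in expected_workload_files("demo", ["q1"]):
    print(path)
```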