mirror of
https://github.com/apache/impala.git
synced 2025-12-30 12:02:10 -05:00
Equivalence classes are used to get the equivalences between slots. The concept is ill-defined and the current implementation is inefficient. This patch removes them and uses the information from the value transfer graph directly.

The value transfer graph is reimplemented using Tarjan's strongly connected component algorithm and BFS with adjacency lists to speed it up on both condensed and sparse graphs.

Testing:
It passes the existing tests. In the planner tests, the equivalence between the SCC-condensed graph and the uncondensed graph is checked. A test case is added for the helper class IntArrayList. An outer-join edge case is added to the planner tests.

On a query with 1800 union operations, the equivalence class computation time is reduced from 7m57s to 65ms and the planning time is reduced from 8m5s to 13s.

Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Reviewed-on: http://gerrit.cloudera.org:8080/8317
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
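The combination of Tarjan's strongly connected component algorithm and BFS over adjacency lists mentioned in the commit message can be illustrated with a small sketch. This is not the planner's code. The function names (`tarjan_scc`, `has_value_transfer`) and the use of plain integers as slot ids are assumptions made for illustration only. The idea shown is that slots in the same SCC can reach each other, and reachability between SCCs can be answered by BFS on the condensed graph.

```python
from collections import deque


def tarjan_scc(num_nodes, adj):
    """Iterative Tarjan. Returns (scc_id per node, number of SCCs)."""
    index = [-1] * num_nodes      # discovery order, -1 means unvisited
    low = [0] * num_nodes         # lowest index reachable from the node
    on_stack = [False] * num_nodes
    scc_id = [-1] * num_nodes
    stack = []
    next_index = 0
    num_sccs = 0
    for root in range(num_nodes):
        if index[root] != -1:
            continue
        index[root] = low[root] = next_index
        next_index += 1
        stack.append(root)
        on_stack[root] = True
        work = [(root, 0)]        # (node, next position in its adjacency list)
        while work:
            v, i = work.pop()
            if i < len(adj[v]):
                w = adj[v][i]
                work.append((v, i + 1))
                if index[w] == -1:
                    index[w] = low[w] = next_index
                    next_index += 1
                    stack.append(w)
                    on_stack[w] = True
                    work.append((w, 0))
                elif on_stack[w]:
                    low[v] = min(low[v], index[w])
            else:
                if low[v] == index[v]:
                    # v is the root of an SCC: pop its members off the stack.
                    while True:
                        w = stack.pop()
                        on_stack[w] = False
                        scc_id[w] = num_sccs
                        if w == v:
                            break
                    num_sccs += 1
                if work:
                    parent = work[-1][0]
                    low[parent] = min(low[parent], low[v])
    return scc_id, num_sccs


def has_value_transfer(num_nodes, edges, src, dst):
    """True if dst is reachable from src (hypothetical helper)."""
    adj = [[] for _ in range(num_nodes)]
    for a, b in edges:
        adj[a].append(b)
    scc_id, num_sccs = tarjan_scc(num_nodes, adj)
    # Build the condensed graph: one node per SCC, adjacency lists between SCCs.
    condensed = [set() for _ in range(num_sccs)]
    for a, b in edges:
        if scc_id[a] != scc_id[b]:
            condensed[scc_id[a]].add(scc_id[b])
    # BFS over the condensed graph.
    start, goal = scc_id[src], scc_id[dst]
    seen = {start}
    queue = deque([start])
    while queue:
        c = queue.popleft()
        if c == goal:
            return True
        for d in condensed[c]:
            if d not in seen:
                seen.add(d)
                queue.append(d)
    return False
```

For example, with edges 0->1, 1->0, 1->2, 2->3, nodes 0 and 1 fall into the same SCC, node 3 is reachable from node 0, and node 0 is not reachable from node 3.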
This directory contains Impala test workloads. The directory layout for the workloads should follow:

workloads/
   <data set name>/<data set name>_dimensions.csv  <- The test dimension file
   <data set name>/<data set name>_core.csv        <- A test vector file
   <data set name>/<data set name>_pairwise.csv
   <data set name>/<data set name>_exhaustive.csv
   <data set name>/queries/<query test>.test       <- The queries for this workload
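The layout convention above can be expressed as a small helper that lists the paths expected for one data set. This is only a sketch: the function name `expected_workload_files` and the example names `mydata` and `basic` are hypothetical and not part of the Impala test framework.

```python
from pathlib import PurePosixPath


def expected_workload_files(data_set, query_tests=()):
    """Return the relative paths the layout convention implies for a data set."""
    base = PurePosixPath("workloads") / data_set
    # One dimension file plus the three test vector files.
    suffixes = ("dimensions", "core", "pairwise", "exhaustive")
    files = [base / f"{data_set}_{suffix}.csv" for suffix in suffixes]
    # One .test file per query test, under the queries/ subdirectory.
    files += [base / "queries" / f"{name}.test" for name in query_tests]
    return [str(p) for p in files]


paths = expected_workload_files("mydata", ["basic"])
```

Comparing such a list against the files actually present is one way to check that a new workload follows the convention.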