1. Improved join cardinality estimation.
For each equi-join predicate, we try to determine whether it is
a foreign/primary key (FK/PK) join condition, and either use a
special FK/PK estimation or a generic estimation method. We
maintain the minimum cardinality for each method separately,
and finally return in order of preference:
- the FK/PK estimate, if there was at least one FK/PK predicate
- the generic estimate, if there was at least one predicate with
sufficient stats
- otherwise, we optimistically assume a FK/PK join with a join
selectivity of 1, and return the left-hand side cardinality
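
The order of preference above can be sketched as follows. This is an
illustrative sketch only, not the planner code: the function name, the
"fkpk"/"generic" labels, and the convention that an estimate of None
means "insufficient stats" are assumptions made for the example. The
per-predicate estimates are taken as inputs because their formulas are
not described here.

```python
from typing import Iterable, Optional, Tuple


def join_cardinality(
    lhs_card: int,
    predicate_estimates: Iterable[Tuple[str, Optional[int]]],
) -> int:
    """Pick a join cardinality from per-predicate estimates.

    Each entry is (kind, estimate), where kind is "fkpk" or "generic"
    (hypothetical labels) and estimate is None when the predicate lacks
    sufficient stats.
    """
    fkpk_min: Optional[int] = None
    generic_min: Optional[int] = None

    # Keep the minimum cardinality for each method separately.
    for kind, estimate in predicate_estimates:
        if estimate is None:
            continue
        if kind == "fkpk":
            fkpk_min = estimate if fkpk_min is None else min(fkpk_min, estimate)
        else:
            generic_min = (
                estimate if generic_min is None else min(generic_min, estimate)
            )

    # Order of preference: FK/PK, then generic, then the optimistic fallback.
    if fkpk_min is not None:
        return fkpk_min
    if generic_min is not None:
        return generic_min
    # FK/PK join with selectivity 1: return the left-hand side cardinality.
    return lhs_card
```

Note that the FK/PK estimate wins even when a generic estimate is
smaller, since the two minima are never compared with each other.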
2. More robust handling of conjuncts with unknown selectivities,
and conjuncts that are not independent. We now use exponential
backoff when combining their selectivities.
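
One common form of exponential backoff for conjunct selectivities is
sketched below. The exact formula, the default selectivity of 0.1 for
conjuncts without stats, and the choice to apply that default only
once are assumptions made for illustration; this message does not
spell them out.

```python
from typing import Optional, Sequence

# Assumed default for conjuncts whose selectivity is unknown.
DEFAULT_SELECTIVITY = 0.1


def combined_selectivity(
    selectivities: Sequence[Optional[float]],
    default: float = DEFAULT_SELECTIVITY,
) -> float:
    """Combine conjunct selectivities with exponential backoff.

    None marks a conjunct with unknown selectivity. All unknown
    conjuncts together contribute a single default value (assumption).
    """
    known = [s for s in selectivities if s is not None]
    if len(known) != len(selectivities):
        known.append(default)

    # Most selective conjunct first; each later one is dampened more:
    # s0 * s1^(1/2) * s2^(1/3) * ...
    known.sort()
    result = 1.0
    for i, s in enumerate(known):
        result *= s ** (1.0 / (i + 1))
    return result
```

Compared with multiplying the selectivities directly, which assumes
the conjuncts are independent, the dampened product shrinks more
slowly as conjuncts are added, so correlated conjuncts are less likely
to drive the estimate toward zero.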
3. More accurate broadcast vs. partitioned join cost estimation.
We now account for the 4-byte per-tuple overhead when serializing
rows over an exchange. This change is especially helpful in cases
where one side of the join has no materialized slots, i.e., it
has a row size of 0, and an exchange used to appear free.
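
The effect of the per-tuple overhead can be illustrated with a
deliberately simplified cost model. Only the 4-byte overhead comes
from this change; the cost formulas (broadcast sends the right-hand
side to every node, partitioned sends both sides once), the function
names, and the tie-breaking rule are assumptions for the example and
are not the planner's actual formulas.

```python
# Per-tuple serialization overhead over an exchange, in bytes.
PER_TUPLE_OVERHEAD_BYTES = 4


def exchange_bytes(
    cardinality: int,
    row_size: int,
    overhead: int = PER_TUPLE_OVERHEAD_BYTES,
) -> int:
    """Bytes sent over an exchange, including per-tuple overhead."""
    return cardinality * (row_size + overhead)


def choose_join_distribution(
    lhs_card: int,
    lhs_row_size: int,
    rhs_card: int,
    rhs_row_size: int,
    num_nodes: int,
    overhead: int = PER_TUPLE_OVERHEAD_BYTES,
) -> str:
    """Pick "broadcast" or "partitioned" under the simplified model."""
    # Broadcast: the right-hand side is sent to every node.
    broadcast_cost = exchange_bytes(rhs_card, rhs_row_size, overhead) * num_nodes
    # Partitioned: both sides are sent over an exchange once.
    partitioned_cost = exchange_bytes(
        lhs_card, lhs_row_size, overhead
    ) + exchange_bytes(rhs_card, rhs_row_size, overhead)
    return "broadcast" if broadcast_cost <= partitioned_cost else "partitioned"
```

With `overhead=0` and a row size of 0, `exchange_bytes` returns 0 for
any cardinality, which is the "free exchange" case described above.
With the overhead included, a large right-hand side with no
materialized slots still has a nonzero cost, so broadcasting it to
many nodes is no longer chosen by default.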
We are obviously not done improving join cardinality estimates.
This patch is merely a step in the right direction. In particular,
the code and behavior are now more explicit and easier to reason about
than before, and better reflect the original intent (i.e., this patch
fixes the IMPALA-976 bug).
Change-Id: I00d8e8230e2844cb807d128d82b35ee78db7d774
Reviewed-on: http://gerrit.cloudera.org:8080/1668
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins