impala

mirror of https://github.com/apache/impala.git synced 2026-01-01 00:00:20 -05:00

Files

Alex Behm 91e1eb0789 CDH-18563: Speed up the computation of transitive value transfers.

The issue: Computing the full transitive closure for all slots can be very
expensive (10s of seconds for >2k slots, minutes for >4k slots).
Queries with many views and/or unions were affected most because each
union/view adds a new tuple with slots, increasing the total number of slots.

The fix: The new algorithm exploits the sparse structure of the value transfer
graph for a significant speedup (>100x). The high-level steps are:
1. Identify complete subgraphs based on bi-directional value transfers, and
   coalesce the slots of each complete subgraph into a single slot.
2. Map the remaining uni-directional value transfers into the new slot domain.
3. Identify the connected components of the uni-directional value transfers.
   This step partitions the value transfers into disjoint sets.
4. Compute the transitive closure of each partition from (3) in the new slot
   domain separately. Hopefully, the partitions are small enough to afford
   the O(N^3) complexity of the brute-force transitive closure computation.
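
The four steps above can be sketched roughly as follows. This is a minimal
illustration in Python, not the code from this commit. It assumes slots are
integers 0..num_slots-1 and value transfers are (src, dst) pairs. The names
transitive_value_transfers and _union_find are hypothetical, and it uses
union-find where the actual implementation may use something else.

```python
from collections import defaultdict
from itertools import product


def _union_find(items):
    """Hypothetical helper: returns (find, union) closures over `items`."""
    parent = {x: x for x in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    return find, union


def transitive_value_transfers(num_slots, transfers):
    """Return all (src, dst) pairs, src != dst, reachable via value transfers."""
    edges = set(transfers)
    slots = range(num_slots)

    # Step 1: coalesce slots connected by bi-directional transfers.
    find_slot, merge = _union_find(slots)
    for a, b in edges:
        if (b, a) in edges:
            merge(a, b)

    # Step 2: map the remaining transfers into the coalesced slot domain.
    coalesced = {(find_slot(a), find_slot(b)) for a, b in edges}
    coalesced = {(a, b) for a, b in coalesced if a != b}

    # Step 3: partition the coalesced slots into connected components,
    # ignoring edge direction.
    nodes = {n for edge in coalesced for n in edge}
    find_part, join = _union_find(nodes)
    for a, b in coalesced:
        join(a, b)
    partitions = defaultdict(list)
    for n in nodes:
        partitions[find_part(n)].append(n)

    # Step 4: brute-force (Warshall-style) closure within each partition.
    reach = set(coalesced)
    for members in partitions.values():
        for k in members:
            for i in members:
                if (i, k) in reach:
                    for j in members:
                        if (k, j) in reach:
                            reach.add((i, j))

    # Expand the result back into the original slot domain.
    groups = defaultdict(list)
    for s in slots:
        groups[find_slot(s)].append(s)
    result = set()
    for members in groups.values():
        result.update((a, b) for a in members for b in members if a != b)
    for a, b in reach:
        if a != b:
            result.update(product(groups[a], groups[b]))
    return result
```

For example, calling transitive_value_transfers(6, [(0, 1), (1, 0), (1, 2),
(2, 1), (2, 3), (3, 4)]) coalesces slots 0, 1 and 2 into one slot, so the
cubic step runs over only three nodes instead of six.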

Change-Id: I35b57295d8f04b92f00ac48c04d1ef1be4daf41b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2360
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins

2014-04-24 23:53:28 -07:00

.settings

IMPALA-150: Performing dynamic partition insert via Impala on "large" table fails and takes down HDFS

2014-01-08 10:50:07 -08:00

src

CDH-18563: Speed up the computation of transitive value transfers.

2014-04-24 23:53:28 -07:00

.gitignore

Move minicluster_xml_conf to HADOOP_CONF_DIR.

2014-01-08 10:53:03 -08:00

pom.xml

Migrate DataErrors tests to Python test framework, re-enable subset of tests

2014-04-18 02:25:11 -07:00