mirror of https://github.com/apache/impala.git synced 2026-01-07 00:02:28 -05:00


Michael Ho 42ca45e830 IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg

Since commit d2d3f4c (on asf-master), TAggregateExpr contains
the logical input types of the Aggregate Expr. The reason they
are included is that merging aggregate expressions will have
input types of the intermediate values which aren't necessarily
the same as the input types. For instance, NDV() uses a binary
blob as its intermediate value and it's passed to its merge
aggregate expressions as a StringVal but the input type of NDV()
in the query could be DecimalVal. In this case, we consider
DecimalVal as the logical input type while StringVal is the
intermediate type. The logical input types are accessed by the
BE via GetConstFnAttr() during interpretation and constant
propagation during codegen.

To handle distinct aggregate expressions (e.g. select count(distinct)),
the FE uses 2-phase aggregation by introducing an extra phase of
split/merge aggregation in which the distinct aggregate expressions'
inputs are converted and added to the group-by expressions in the first
phase while the non-distinct aggregate expressions go through the normal
split/merge treatment.
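
As a rough illustration of the 2-phase idea described above, the following sketch evaluates something like "select count(distinct a), sum(b)" over in-memory rows. It is not Impala code: the class name, method name and row layout are hypothetical, and it ignores NULL handling and the distributed split/merge step that happens within each phase.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TwoPhaseAggSketch {
  /** Returns {count(distinct a), sum(b)} for rows given as (a, b) pairs. */
  static long[] countDistinctAndSum(int[][] rows) {
    // Phase 1: the distinct input 'a' is added to the group-by keys, while the
    // non-distinct aggregate sum(b) is computed per group as a partial result.
    Map<Integer, Long> partialSumByA = new LinkedHashMap<>();
    for (int[] row : rows) {
      partialSumByA.merge(row[0], (long) row[1], Long::sum);
    }
    // Phase 2: count(distinct a) is now a plain count over the phase-1 groups,
    // and sum(b) is obtained by merging the partial sums.
    long distinctCount = 0;
    long totalSum = 0;
    for (long partial : partialSumByA.values()) {
      distinctCount++;
      totalSum += partial;
    }
    return new long[] {distinctCount, totalSum};
  }

  public static void main(String[] args) {
    int[][] rows = {{1, 10}, {1, 20}, {2, 5}, {3, 7}, {2, 1}};
    long[] result = countDistinctAndSum(rows);
    System.out.println(result[0] + " " + result[1]); // prints "3 43"
  }
}
```

Note that in phase 2 the non-distinct aggregate only ever sees partial (intermediate) values, never the original column, which is the situation the bug below is about.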

The bug is that the existing code incorrectly propagates the intermediate
types of the non-grouping aggregate expressions as the logical input types
to the merging aggregate expressions in the second phase of aggregation.
The input aggregate expressions for the non-distinct aggregate expressions
in the second phase aggregation are already merging aggregate expressions
(from phase one), in which case we should not treat their input types as
logical input types.

This change fixes the problem above by checking if the input aggregate
expression passed to FunctionCallExpr.createMergeAggCall() is already
a merging aggregate expression. If so, it will use the logical input
types recorded in its 'mergeAggInputFn_' as references for its logical
input types instead of the aggregate expression input types themselves.
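
The check described above can be sketched as follows. This is a simplified model under stated assumptions, not the actual frontend code: the AggCall class, its fields other than 'mergeAggInputFn_', the string type names and the "buggy" variant are hypothetical stand-ins. Only the names createMergeAggCall() and 'mergeAggInputFn_' come from this commit message.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class MergeAggSketch {
  /** Hypothetical, heavily simplified stand-in for an aggregate call expression. */
  static final class AggCall {
    final String fnName;
    final List<String> childTypes;   // types of the expressions fed to this call
    final String intermediateType;   // type of the partial result it produces
    final AggCall mergeAggInputFn_;  // non-null only for merging aggregate calls

    AggCall(String fnName, List<String> childTypes, String intermediateType,
        AggCall mergeAggInputFn) {
      this.fnName = fnName;
      this.childTypes = childTypes;
      this.intermediateType = intermediateType;
      this.mergeAggInputFn_ = mergeAggInputFn;
    }

    boolean isMergeAggFn() { return mergeAggInputFn_ != null; }

    /** Logical input types: taken from the recorded input call if this is a merge call. */
    List<String> logicalInputTypes() {
      return isMergeAggFn() ? mergeAggInputFn_.childTypes : childTypes;
    }
  }

  /** Fixed behavior: if 'input' is already a merge call, reuse what it recorded. */
  static AggCall createMergeAggCall(AggCall input) {
    AggCall logicalSource = input.isMergeAggFn() ? input.mergeAggInputFn_ : input;
    return new AggCall(input.fnName, Collections.singletonList(input.intermediateType),
        input.intermediateType, logicalSource);
  }

  /** Old behavior (assumed shape): always treats 'input' as the logical source. */
  static AggCall createMergeAggCallBuggy(AggCall input) {
    return new AggCall(input.fnName, Collections.singletonList(input.intermediateType),
        input.intermediateType, input);
  }

  public static void main(String[] args) {
    // NDV() over a decimal column, with a string blob as its intermediate value.
    AggCall ndv = new AggCall("ndv", Arrays.asList("DECIMAL"), "STRING", null);
    AggCall phase1Merge = createMergeAggCall(ndv);
    System.out.println(createMergeAggCall(phase1Merge).logicalInputTypes());      // [DECIMAL]
    System.out.println(createMergeAggCallBuggy(phase1Merge).logicalInputTypes()); // [STRING]
  }
}
```

In this model, the second-phase merge call built from a first-phase merge call still reports the decimal type as its logical input, whereas the unguarded variant leaks the intermediate string type.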

Change-Id: I158303b20d1afdff23c67f3338b9c4af2ad80691
Reviewed-on: http://gerrit.cloudera.org:8080/6724
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins

2017-04-26 21:40:32 +00:00

IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg

2017-04-26 21:40:32 +00:00

bin

IMPALA-5147: Add the ability to exclude hosts from query execution

2017-04-26 01:45:40 +00:00

cmake_modules

IMPALA-4758: (2/2) Impala-side changes to build with latest gutil

2017-03-29 02:52:34 +00:00

common

IMPALA-5147: Add the ability to exclude hosts from query execution

2017-04-26 01:45:40 +00:00

docs

IMPALA-2924: [DOCS] Add docs for HDFS cache-related hints

2017-04-14 22:37:34 +00:00

ext-data-source

IMPALA-5224: remove defunct codehaus repository

2017-04-20 03:24:33 +00:00

IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg

2017-04-26 21:40:32 +00:00

infra

IMPALA-5189: Pin version of setuptools-scm

2017-04-19 22:03:51 +00:00

shell

IMPALA-5182: Explicitly close connection to impalad on error from shell

2017-04-20 23:14:33 +00:00

ssh_keys

Move ssh keys from bin directory to fix packaging build break

2014-01-08 10:44:12 -08:00

testdata

IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg

2017-04-26 21:40:32 +00:00

tests

IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg

2017-04-26 21:40:32 +00:00

www

IMPALA-5147: Add the ability to exclude hosts from query execution

2017-04-26 01:45:40 +00:00

.clang-format

Match .clang-format more closely to actual practice.

2016-10-14 00:08:17 +00:00

.clang-tidy

IMPALA-3200: move bufferpool under runtime

2016-11-22 07:31:34 +00:00

.gitignore

IMPALA-4653: fix sticky config variable problem

2017-01-05 01:43:36 +00:00

buildall.sh

Add a build flag for the undefined behavior sanitizer, aka "ubsan".

2017-03-01 03:09:17 +00:00

CMakeLists.txt

IMPALA-5172: Always pass DEBUG build type to Impala-Lzo

2017-04-19 22:11:14 +00:00

DISCLAIMER

IMPALA-3808: Add incubating DISCLAIMER from the Incubator Branding Guide

2016-09-02 02:12:45 +00:00

EXPORT_CONTROL.md

IMPALA-4406: Add cryptography export control notice

2016-11-04 18:26:40 +00:00

LICENSE.txt

IMPALA-4230: ASF policy issues from 2.7.0 rc3.

2016-10-19 23:59:02 +00:00

LOGS.md

Consolidate test and cluster logs under a single directory.

2016-03-28 19:23:22 +00:00

NOTICE.txt

2017-01-29 00:01:03 +00:00

README.md

IMPALA-4512: Add a script that builds Impala on stock Ubuntu 14.04.

2016-11-29 22:10:58 +00:00

README.md

Welcome to Impala

Lightning-fast, distributed SQL queries for petabytes of data stored in Apache Hadoop clusters.

Impala is a modern, massively-distributed, massively-parallel, C++ query engine that lets you analyze, transform and combine data from a variety of data sources:

Best of breed performance and scalability.
Support for data stored in HDFS, Apache HBase and Amazon S3.
Wide analytic SQL support, including window functions and subqueries.
On-the-fly code generation using LLVM to generate CPU-efficient code tailored specifically to each individual query.
Support for the most commonly-used Hadoop file formats, including the Apache Parquet (incubating) project.
Apache-licensed, 100% open source.

More about Impala

To learn more about Impala as a business user, or to try Impala live or in a VM, please visit the Impala homepage.

If you are interested in contributing to Impala as a developer, or learning more about Impala's internals and architecture, visit the Impala wiki.

Supported Platforms

Impala only supports Linux at the moment.

Build Instructions

See bin/bootstrap_build.sh.

Export Control Notice

This distribution uses cryptographic software and may be subject to export controls. Please refer to EXPORT_CONTROL.md for more information.

Languages

C++ 49.3%

Java 30.4%

Python 14.5%

JavaScript 1.3%

C 1.2%

Other 3.2%