mirror of
https://github.com/apache/impala.git
synced 2026-01-03 06:00:52 -05:00
ba84ad03cb83d7f7aed8524fcfbb0e2cdc9fdd53
9 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
2ee914d5b3 |
IMPALA-5903: Inconsistent specification of result set and result set metadata
Before this commit, it was inconsistent which DDL operations returned a result set and which did not. With this commit, every DDL operation returns a summary of its execution. Each operation declares its result set schema in Frontend.java and provides the summary in CatalogOpExecutor.java. The tests were updated to match the new behavior.

Change-Id: Ic542fb8e49e850052416ac663ee329ee3974e3b9
Reviewed-on: http://gerrit.cloudera.org:8080/9090
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> |
||
|
|
4a39e7c29f |
IMPALA-5980: Upgrade to LLVM 5.0.1
Highlighting a few changes in LLVM:
- Minor changes to some function signatures
- Minor changes to error handling
- Split Bitcode/ReaderWriter.h - https://reviews.llvm.org/D26502
- Introduced an optional new GVN optimization pass.

Needed to fix a bunch of new clang-tidy warnings.

Testing: Ran core and ASAN tests successfully.

Performance: Ran single node TPC-H and targeted perf with scale factor 60. Both improved on average. Identified regression in "primitive_filter_in_predicate" which will be addressed by IMPALA-6621.

| Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
|---|---|---|---|---|---|
| TARGETED-PERF(60) | parquet / none / none | 22.29 | -0.12% | 3.90 | +3.16% |
| TPCH(60) | parquet / none / none | 15.97 | -3.64% | 10.14 | -4.92% |

| Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
|---|---|---|---|---|---|---|---|---|---|
| TARGETED-PERF(60) | PERF_LIMIT-Q1 | parquet / none / none | 0.01 | 0.00 | R +156.43% | * 25.80% * | * 17.14% * | 1 | 5 |
| TARGETED-PERF(60) | primitive_filter_in_predicate | parquet / none / none | 3.39 | 1.92 | R +76.33% | 3.23% | 4.37% | 1 | 5 |
| TARGETED-PERF(60) | primitive_filter_string_non_selective | parquet / none / none | 1.25 | 1.11 | +12.46% | 3.41% | 5.36% | 1 | 5 |
| TARGETED-PERF(60) | primitive_filter_decimal_selective | parquet / none / none | 1.40 | 1.25 | +12.25% | 3.57% | 3.44% | 1 | 5 |
| TARGETED-PERF(60) | primitive_filter_string_like | parquet / none / none | 16.87 | 15.65 | +7.78% | 5.05% | 0.37% | 1 | 5 |
| TARGETED-PERF(60) | primitive_min_max_runtime_filter | parquet / none / none | 1.79 | 1.71 | +4.77% | 0.71% | 1.73% | 1 | 5 |
| TARGETED-PERF(60) | primitive_broadcast_join_2 | parquet / none / none | 0.60 | 0.58 | +3.64% | 3.19% | 3.81% | 1 | 5 |
| TARGETED-PERF(60) | primitive_filter_string_selective | parquet / none / none | 0.95 | 0.93 | +2.91% | 5.23% | 5.85% | 1 | 5 |
| TARGETED-PERF(60) | primitive_broadcast_join_3 | parquet / none / none | 4.33 | 4.21 | +2.83% | 5.46% | 3.25% | 1 | 5 |
| TARGETED-PERF(60) | primitive_groupby_bigint_lowndv | parquet / none / none | 4.59 | 4.47 | +2.82% | 3.73% | 1.14% | 1 | 5 |
| TARGETED-PERF(60) | primitive_conjunct_ordering_3 | parquet / none / none | 0.20 | 0.19 | +2.65% | 4.76% | 2.24% | 1 | 5 |
| TARGETED-PERF(60) | PERF_AGG-Q1 | parquet / none / none | 2.49 | 2.43 | +2.31% | 1.06% | 1.93% | 1 | 5 |
| TARGETED-PERF(60) | PERF_AGG-Q6 | parquet / none / none | 2.04 | 2.00 | +2.09% | 3.51% | 2.80% | 1 | 5 |
| TPCH(60) | TPCH-Q3 | parquet / none / none | 12.37 | 12.17 | +1.62% | 0.80% | 2.45% | 1 | 5 |
| TARGETED-PERF(60) | PERF_STRING-Q5 | parquet / none / none | 4.52 | 4.45 | +1.54% | 1.23% | 1.08% | 1 | 5 |
| TPCH(60) | TPCH-Q6 | parquet / none / none | 2.95 | 2.91 | +1.33% | 1.92% | 1.67% | 1 | 5 |
| TARGETED-PERF(60) | PERF_STRING-Q4 | parquet / none / none | 3.71 | 3.66 | +1.26% | 0.34% | 0.53% | 1 | 5 |
| TPCH(60) | TPCH-Q1 | parquet / none / none | 18.69 | 18.47 | +1.19% | 0.75% | 0.31% | 1 | 5 |
| TARGETED-PERF(60) | PERF_STRING-Q7 | parquet / none / none | 8.15 | 8.07 | +0.99% | 3.92% | 1.58% | 1 | 5 |
| TARGETED-PERF(60) | primitive_groupby_decimal_highndv | parquet / none / none | 31.31 | 31.01 | +0.97% | 1.74% | 1.14% | 1 | 5 |
| TPCH(60) | TPCH-Q5 | parquet / none / none | 7.59 | 7.53 | +0.78% | 0.38% | 0.99% | 1 | 5 |
| TARGETED-PERF(60) | PERF_AGG-Q4 | parquet / none / none | 21.25 | 21.09 | +0.76% | 0.76% | 0.75% | 1 | 5 |
| TARGETED-PERF(60) | primitive_conjunct_ordering_4 | parquet / none / none | 0.24 | 0.24 | +0.75% | 3.14% | 4.76% | 1 | 5 |
| TPCH(60) | TPCH-Q19 | parquet / none / none | 7.88 | 7.82 | +0.74% | 2.39% | 2.64% | 1 | 5 |
| TARGETED-PERF(60) | primitive_orderby_bigint | parquet / none / none | 5.10 | 5.07 | +0.61% | 0.74% | 0.54% | 1 | 5 |
| TARGETED-PERF(60) | PERF_STRING-Q3 | parquet / none / none | 3.61 | 3.59 | +0.60% | 1.45% | 0.90% | 1 | 5 |
| TARGETED-PERF(60) | primitive_orderby_all | parquet / none / none | 27.63 | 27.48 | +0.55% | 0.85% | 0.10% | 1 | 5 |
| TPCH(60) | TPCH-Q4 | parquet / none / none | 5.81 | 5.79 | +0.45% | 1.65% | 2.16% | 1 | 5 |
| TPCH(60) | TPCH-Q13 | parquet / none / none | 23.49 | 23.43 | +0.27% | 0.83% | 0.63% | 1 | 5 |
| TPCH(60) | TPCH-Q21 | parquet / none / none | 68.88 | 68.76 | +0.18% | 0.22% | 0.19% | 1 | 5 |
| TARGETED-PERF(60) | primitive_groupby_decimal_lowndv.test | parquet / none / none | 4.38 | 4.37 | +0.09% | 2.45% | 0.45% | 1 | 5 |
| TARGETED-PERF(60) | primitive_conjunct_ordering_5 | parquet / none / none | 10.40 | 10.40 | +0.07% | 0.77% | 0.50% | 1 | 5 |
| TARGETED-PERF(60) | primitive_long_predicate | parquet / none / none | 222.37 | 222.23 | +0.06% | 0.25% | 0.25% | 1 | 5 |
| TPCH(60) | TPCH-Q8 | parquet / none / none | 10.65 | 10.65 | +0.03% | 0.55% | 1.40% | 1 | 5 |
| TARGETED-PERF(60) | primitive_shuffle_join_one_to_many_string_with_groupby | parquet / none / none | 261.84 | 261.87 | -0.01% | 0.91% | 0.74% | 1 | 5 |
| TARGETED-PERF(60) | PERF_AGG-Q3 | parquet / none / none | 9.44 | 9.45 | -0.02% | 0.92% | 1.33% | 1 | 5 |
| TPCH(60) | TPCH-Q16 | parquet / none / none | 5.21 | 5.21 | -0.02% | 1.46% | 1.64% | 1 | 5 |
| TARGETED-PERF(60) | primitive_top-n_all | parquet / none / none | 34.58 | 34.62 | -0.11% | 0.22% | 0.19% | 1 | 5 |
| TARGETED-PERF(60) | primitive_topn_bigint | parquet / none / none | 4.24 | 4.25 | -0.13% | 6.66% | 2.03% | 1 | 5 |
| TARGETED-PERF(60) | PERF_STRING-Q2 | parquet / none / none | 3.23 | 3.24 | -0.34% | 2.03% | 0.32% | 1 | 5 |
| TARGETED-PERF(60) | primitive_broadcast_join_1 | parquet / none / none | 0.18 | 0.18 | -0.40% | 6.16% | 2.45% | 1 | 5 |
| TARGETED-PERF(60) | primitive_exchange_broadcast | parquet / none / none | 46.27 | 46.51 | -0.52% | 7.83% | * 15.60% * | 1 | 5 |
| TARGETED-PERF(60) | primitive_groupby_bigint_pk | parquet / none / none | 114.32 | 114.92 | -0.52% | 0.24% | 0.61% | 1 | 5 |
| TPCH(60) | TPCH-Q22 | parquet / none / none | 6.66 | 6.70 | -0.53% | 1.39% | 0.84% | 1 | 5 |
| TPCH(60) | TPCH-Q20 | parquet / none / none | 5.78 | 5.81 | -0.62% | 1.25% | 0.67% | 1 | 5 |
| TPCH(60) | TPCH-Q2 | parquet / none / none | 2.53 | 2.55 | -0.64% | 3.86% | 3.72% | 1 | 5 |
| TARGETED-PERF(60) | PERF_AGG-Q5 | parquet / none / none | 0.58 | 0.58 | -0.75% | 0.99% | 6.89% | 1 | 5 |
| TARGETED-PERF(60) | PERF_AGG-Q7 | parquet / none / none | 2.05 | 2.07 | -0.86% | 2.16% | 4.73% | 1 | 5 |
| TARGETED-PERF(60) | primitive_shuffle_join_union_all_with_groupby | parquet / none / none | 54.86 | 55.34 | -0.87% | 0.25% | 0.66% | 1 | 5 |
| TARGETED-PERF(60) | primitive_conjunct_ordering_2 | parquet / none / none | 7.52 | 7.59 | -0.98% | 1.53% | 1.73% | 1 | 5 |
| TPCH(60) | TPCH-Q9 | parquet / none / none | 36.43 | 36.79 | -1.00% | 1.60% | 7.39% | 1 | 5 |
| TARGETED-PERF(60) | PERF_STRING-Q1 | parquet / none / none | 2.79 | 2.82 | -1.10% | 1.15% | 2.25% | 1 | 5 |
| TPCH(60) | TPCH-Q11 | parquet / none / none | 1.95 | 1.97 | -1.18% | 3.14% | 2.24% | 1 | 5 |
| TARGETED-PERF(60) | PERF_AGG-Q2 | parquet / none / none | 10.98 | 11.11 | -1.24% | 0.77% | 1.45% | 1 | 5 |
| TARGETED-PERF(60) | primitive_small_join_1 | parquet / none / none | 0.22 | 0.22 | -1.34% | * 13.03% * | * 12.31% * | 1 | 5 |
| TPCH(60) | TPCH-Q7 | parquet / none / none | 42.82 | 43.41 | -1.37% | 1.63% | 1.51% | 1 | 5 |
| TARGETED-PERF(60) | primitive_empty_build_join_1 | parquet / none / none | 3.30 | 3.35 | -1.54% | 2.15% | 1.27% | 1 | 5 |
| TARGETED-PERF(60) | PERF_STRING-Q6 | parquet / none / none | 10.34 | 10.54 | -1.81% | 0.24% | 2.02% | 1 | 5 |
| TARGETED-PERF(60) | primitive_groupby_bigint_highndv | parquet / none / none | 32.80 | 33.46 | -1.98% | 1.29% | 0.61% | 1 | 5 |
| TARGETED-PERF(60) | primitive_filter_decimal_non_selective | parquet / none / none | 1.62 | 1.67 | -3.01% | 0.79% | 1.65% | 1 | 5 |
| TARGETED-PERF(60) | primitive_conjunct_ordering_1 | parquet / none / none | 0.13 | 0.14 | -3.36% | 8.66% | * 12.66% * | 1 | 5 |
| TARGETED-PERF(60) | primitive_exchange_shuffle | parquet / none / none | 84.92 | 87.96 | -3.46% | 1.46% | 1.50% | 1 | 5 |
| TPCH(60) | TPCH-Q12 | parquet / none / none | 6.98 | 7.31 | -4.57% | 1.03% | 7.13% | 1 | 5 |
| TPCH(60) | TPCH-Q18 | parquet / none / none | 47.54 | 50.39 | -5.64% | 5.70% | 5.53% | 1 | 5 |
| TARGETED-PERF(60) | primitive_filter_bigint_non_selective | parquet / none / none | 0.88 | 0.96 | -7.81% | 4.27% | 5.97% | 1 | 5 |
| TPCH(60) | TPCH-Q15 | parquet / none / none | 8.14 | 9.15 | -11.09% | 0.63% | * 10.44% * | 1 | 5 |
| TPCH(60) | TPCH-Q10 | parquet / none / none | 12.66 | 14.28 | -11.34% | 4.32% | 1.14% | 1 | 5 |
| TPCH(60) | TPCH-Q17 | parquet / none / none | 10.31 | 12.59 | -18.14% | 0.65% | 3.72% | 1 | 5 |
| TARGETED-PERF(60) | primitive_filter_bigint_selective | parquet / none / none | 0.14 | 0.19 | I -27.60% | * 32.55% * | * 39.78% * | 1 | 5 |
| TPCH(60) | TPCH-Q14 | parquet / none / none | 6.10 | 11.00 | I -44.55% | 4.06% | 3.84% | 1 | 5 |

Change-Id: Ib0a15cb53feab89e7b35a56b67b3b30eb3e62c6b
Reviewed-on: http://gerrit.cloudera.org:8080/9584
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins |
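The Delta(Avg) column in the table above appears to be the relative change of Avg(s) against Base Avg(s). The following is a minimal Python sketch under that assumption; the function name `delta_avg_percent` is hypothetical and is not part of Impala's perf tooling. Because the table's inputs are already rounded to two decimals, recomputed deltas for other rows may differ slightly from the reported ones.

```python
def delta_avg_percent(avg_s: float, base_avg_s: float) -> float:
    """Relative change of the new average versus the baseline, in percent.

    Negative values mean the new build is faster than the baseline.
    """
    if base_avg_s == 0:
        raise ValueError("baseline average must be non-zero")
    return (avg_s - base_avg_s) / base_avg_s * 100.0


# TPCH-Q14 from the table: Avg(s) = 6.10, Base Avg(s) = 11.00
q14 = delta_avg_percent(6.10, 11.00)
print(f"{q14:+.2f}%")  # prints -44.55%
```

A row such as PERF_LIMIT-Q1, whose baseline is displayed as 0.00, cannot be recomputed from the rounded values, which is why the sketch rejects a zero baseline.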
||
|
|
7ce519f92b |
IMPALA-6008: Creating a UDF from a shared library with a .ll extension crashes Impala
Impala crashes when creating a UDF from a shared library (.so file) that was renamed to have a .ll extension. The CreateFile() call in GetSymbols() fails and returns an error without closing the codegen object. This patch closes the codegen object on failure, which avoids hitting a DCHECK further up the stack.

The chain of failures also invokes the DiagnosticHandlerFn. The RuntimeState object is NULL when the DiagnosticHandlerFn is called in this case, so this change also adds a check before accessing it for logging.

[localhost:21000] > create function foo4 (string, string) returns string location '/tmp/bad_udf.ll' symbol='MyAwesomeUdf';
Query: create function foo4 (string, string) returns string location '/tmp/bad_udf.ll' symbol='MyAwesomeUdf'
ERROR: AnalysisException: Could not load binary: /tmp/bad_udf.ll
LLVM diagnostic error: Invalid bitcode signature

Change-Id: Id060668802ca9c80367cdc0e8a823b968d549bbb
Reviewed-on: http://gerrit.cloudera.org:8080/9154
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins |
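The "Invalid bitcode signature" diagnostic is what one would expect when an ELF shared object is handed to a bitcode reader: the two formats start with different magic bytes. The following Python sketch only illustrates that difference; `classify_header` is a hypothetical helper, and this is not how Impala or LLVM perform the check.

```python
ELF_MAGIC = b"\x7fELF"          # first four bytes of an ELF file such as a .so
BITCODE_MAGIC = b"BC\xc0\xde"   # first four bytes of a raw LLVM bitcode stream


def classify_header(header: bytes) -> str:
    """Classify a file by its leading bytes, ignoring the file extension."""
    if header.startswith(ELF_MAGIC):
        return "elf"
    if header.startswith(BITCODE_MAGIC):
        return "llvm-bitcode"
    return "unknown"


# A shared library renamed to bad_udf.ll still begins with the ELF magic,
# so a reader expecting bitcode sees the wrong signature.
print(classify_header(b"\x7fELF\x02\x01\x01\x00"))  # prints elf
print(classify_header(b"BC\xc0\xde\x35\x14"))       # prints llvm-bitcode
```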
||
|
|
3ddafcd295 |
IMPALA-6184: Clean up after ScalarExprEvaluator::Clone() fails
Previously, when ScalarExprEvaluator::Clone() failed, the newly created evaluator was not added to the output vector, which made it impossible for callers to close and clean up the evaluators afterwards. This change fixes the problem by always adding the newly created evaluator to the output vector before checking the error status. This path is only exercised in the scanner code. Two new tests are added to exercise the failure paths.

Testing done: newly added tests in udf-errors.test

Change-Id: I45ffd722d0a69ad05ae3c748cf504c7f1a959a1d
Reviewed-on: http://gerrit.cloudera.org:8080/8572
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins |
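The fix described above follows a general pattern: hand ownership of a new object to the caller before running any step that can fail. A minimal Python sketch of that pattern follows; the `Evaluator` class and `clone_all` function are hypothetical stand-ins, not Impala's C++ API.

```python
class Evaluator:
    """Hypothetical stand-in for an expression evaluator with open/close."""

    def __init__(self, name, fail_on_open=False):
        self.name = name
        self.fail_on_open = fail_on_open
        self.closed = False

    def open(self):
        if self.fail_on_open:
            raise RuntimeError(f"failed to open clone of {self.name}")

    def close(self):
        self.closed = True


def clone_all(sources, out):
    """Clone each evaluator into `out`, registering it before it can fail."""
    for src in sources:
        clone = Evaluator(src.name, src.fail_on_open)
        out.append(clone)  # register first so the caller can always close it
        clone.open()       # may raise; the clone is already in `out`


sources = [Evaluator("a"), Evaluator("b", fail_on_open=True), Evaluator("c")]
clones = []
try:
    clone_all(sources, clones)
except RuntimeError as err:
    print(err)
finally:
    # Every clone that was created, including the one that failed, is closed.
    for clone in clones:
        clone.close()
```

Had `out.append(clone)` come after `clone.open()`, the failing clone for "b" would never reach the caller and could not be closed.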
||
|
|
d7246d64c7 |
IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
This change enables codegen for all builtin aggregate functions, e.g. timestamp functions and group_concat. There are several parts to the change:
* Adding support for generic UDAs. Previously, the codegen code did not handle multiple input arguments or NULL return values.
* Defaulting to using the UDA interface when there is not a special codegen path (we have implementations of all builtin aggregate functions for the interpreted path).
* Removing all the logic to disable codegen for the special cases that are now supported.

Also fix the generation of code to get/set NULL bits since I needed to add functionality there anyway.

Testing: Add tests that check that codegen was enabled for builtin aggregate functions. Also fix some gaps in the preexisting tests. Also add tests for UDAs that check input/output nulls are handled correctly, in anticipation of enabling codegen for arbitrary UDAs. The tests are run with both codegen enabled and disabled. To avoid flaky tests, we switch the UDF tests to use "unique_database".

Perf: Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1, since my original approach regressed it ~5%. In the end the problem had to do with the ordering of loads/stores to the slot and null bit in the generated code: the previous version of the code exploited some properties of the particular aggregate function. I ended up replicating this behaviour to avoid regressing perf.

Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260
Reviewed-on: http://gerrit.cloudera.org:8080/4655
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins |
||
|
|
b15d992abe |
IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
This patch is a mostly mechanical move of codegen-related logic from each exec node's Prepare() to its Codegen() function. After this change, code generation will no longer happen in Prepare(). Instead, it will happen after Prepare() completes in PlanFragmentExecutor. This is an intermediate step towards the final goal of sharing compiled code among fragment instances in multi-threading.

As part of the cleanup, this change also removes the logic for lazy codegen object creation. In other words, if codegen is enabled, the codegen object will always be created. This simplifies some of the logic in ScalarFnCall::Prepare() and various Codegen() functions by reducing the error checking needed. This change also removes the logic added for tackling IMPALA-1755, as it is no longer needed after the cleanup.

The cleanup also rectifies a poorly documented situation. Previously, even if a user explicitly set DISABLE_CODEGEN to true, we might still codegen a UDF if it was written in LLVM IR or if it had more than 8 arguments. This patch enforces the query option by failing the query in both cases. To run the query, the user must enable codegen. This change also extends the number of arguments supported in the interpretation path of ScalarFn to 20.

Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Reviewed-on: http://gerrit.cloudera.org:8080/4651
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins |
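The phase split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names (`RuntimeState`, `ExecNode`, `run_fragment`) are hypothetical stand-ins for the C++ classes, and the traversal order within each phase is chosen arbitrarily.

```python
class RuntimeState:
    def __init__(self, codegen_enabled):
        # Assumption for this sketch: the codegen object is created eagerly
        # whenever codegen is enabled, never lazily.
        self.codegen = object() if codegen_enabled else None
        self.log = []


class ExecNode:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def prepare(self, state):
        # Prepare no longer does any code generation.
        state.log.append(f"prepare:{self.name}")
        for child in self.children:
            child.prepare(state)

    def codegen(self, state):
        # Only called when the codegen object exists, so no lazy creation
        # or extra error checking is needed here.
        assert state.codegen is not None
        state.log.append(f"codegen:{self.name}")
        for child in self.children:
            child.codegen(state)


def run_fragment(root, codegen_enabled):
    state = RuntimeState(codegen_enabled)
    root.prepare(state)            # phase 1: prepare the whole tree
    if state.codegen is not None:  # phase 2: codegen runs only afterwards
        root.codegen(state)
    return state.log


plan = ExecNode("agg", [ExecNode("scan")])
print(run_fragment(plan, codegen_enabled=True))
# prints ['prepare:agg', 'prepare:scan', 'codegen:agg', 'codegen:scan']
```

Keeping codegen as a separate pass over an already prepared tree is what would later allow the compiled result to be produced once and reused, as the commit message anticipates.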
||
|
|
2916132283 |
S3: enable more tests for S3
As needed, fix up file paths and other misc things to get more test cases running against S3.

Change-Id: If4eaf9200f2abd17074080a37cd0225d977200ad
Reviewed-on: http://gerrit.cloudera.org:8080/167
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins |
||
|
|
8f0c206bdd |
IMPALA-1087: Fix error handling loading libraries in LibCache
If an error occurred while loading a library in LibCache (e.g. by using CREATE FUNCTION), an error was returned but a cache entry might still exist, which could result in strange errors later when the cache entry was accessed by subsequent queries. This changes LibCache::GetCacheEntry to ensure cache entries do not exist if errors occur.

Because GetCacheEntry needs to take the global lock and then the cache entry lock, but needs to unlock the global lock before performing slow HDFS operations, we set the error status on the cache entry so that all locks can be released when an error occurs. Other threads that attempt to access the cache entry check the status and return if it is not OK. The first thread (the thread that got the error) can then remove the cache entry whenever it is able to acquire the global lock_ again.

Change-Id: I00fd0e2a4611b06fa72ffe0aaaa7d077b7a0c36e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4642
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins |
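The locking protocol described above can be sketched in Python with `threading.Lock`. This is a simplified illustration, not Impala's C++ implementation: the `LibCache` and `CacheEntry` classes here, the `loader` callback (standing in for the slow HDFS copy), and the string status values are all assumptions made for the example.

```python
import threading


class CacheEntry:
    def __init__(self):
        self.lock = threading.Lock()
        self.status = None  # None while loading, then "OK" or an error message
        self.path = None


class LibCache:
    def __init__(self, loader):
        self._global_lock = threading.Lock()
        self._entries = {}
        self._loader = loader  # hypothetical slow fetch, e.g. a copy from HDFS

    def get_cache_entry(self, key):
        with self._global_lock:
            entry = self._entries.get(key)
            is_new = entry is None
            if is_new:
                entry = CacheEntry()
                entry.lock.acquire()  # uncontended: nobody else can see it yet
                self._entries[key] = entry
        if not is_new:
            entry.lock.acquire()  # may wait for the loader; global lock is free
        try:
            if is_new:
                try:
                    # Slow work happens under the entry lock only.
                    entry.path = self._loader(key)
                    entry.status = "OK"
                except OSError as err:
                    entry.status = f"error: {err}"
            status = entry.status
        finally:
            entry.lock.release()
        if status != "OK":
            if is_new:
                # Only the thread that hit the error removes the entry, once
                # it can take the global lock again.
                with self._global_lock:
                    if self._entries.get(key) is entry:
                        del self._entries[key]
            raise RuntimeError(status)
        return entry


def demo_loader(key):
    if key == "bad.so":
        raise OSError("no such file")
    return "/tmp/" + key


cache = LibCache(demo_loader)
try:
    cache.get_cache_entry("bad.so")
except RuntimeError as err:
    print(err)  # prints error: no such file
print("bad.so" in cache._entries)  # prints False
```

Recording the failure on the entry itself lets waiting threads observe the error without the global lock being held during the slow load.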
||
|
|
09aff77a6c |
IMPALA-943: removed database udf_test from front-end tests
Added CATCH section to test files.

Change-Id: I28ba3a6e5ae4c53df5b86505573793d7b150863b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2782
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 5b616715958f3ebfdc45b8dc0e4baa82bd55f1d2)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2912 |