impala

mirror of https://github.com/apache/impala.git synced 2026-01-28 09:03:52 -05:00

Author	SHA1	Message	Date
Csaba Ringhofer	0906e0817c	IMPALA-7889: Write new logical types in Parquet Fill the LogicalType field in Parquet schemas for columns that have an associated logical type. ConvertedType still has to be filled to remain compatible with older readers. Testing: - added new tests to check both logical and converted types to test_insert_parquet.py Change-Id: I6f377950845683ab9c6dea79f4c54db0359d0b91 Reviewed-on: http://gerrit.cloudera.org:8080/12004 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-01-17 02:48:39 +00:00
Tim Armstrong	e1e258039d	IMPALA-8078: fix test_corrupt_stats explain output Testing: impala-py.test tests/metadata/test_compute_stats.py \ -k corrupt_stats \ --workload_exploration_strategy=functional-query:exhaustive Change-Id: I0201f7d5363f793290179defcac3a56c6f47d1a9 Reviewed-on: http://gerrit.cloudera.org:8080/12227 Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-01-17 02:34:12 +00:00
Attila Jeges	3338bae608	IMPALA-8043: Fix BE test failures related to SystemV timezones. This is a fix for the following issue: 1. Some BE tests (e.g. ExprTest.TimestampFunctions) use the system's local timezone but run against a test timezone db (instead of the system's timezone db). 2. On some Linux installations /usr/share/zoneinfo contains symlinks to files in the /usr/share/zoneinfo/SystemV directory (e.g. /usr/share/zoneinfo/America/Los_Angeles is a symlink to ../SystemV/PST8PDT). 3. The 'SystemV' directory is not part of the test timezone db, since it is obsolete and excluded by default. Consequently, if the system's local timezone is set to America/Los_Angeles, BE tests won't find the corresponding timezone file in the test timezone db. BE tests will default to UTC, which will break some of them. This change sets local timezone explicitly for failing BE tests, so they don't depend on the system's local timezone. It also adds 'SystemV' directory to the test timezone db to avoid similar issues in the future. Change-Id: I9288cd24c8af0c059e55d47c86bd92eaf0075681 Reviewed-on: http://gerrit.cloudera.org:8080/12199 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-01-15 17:04:55 +00:00
Janaki Lahorani	f8c3e4c0a4	IMPALA-8064: Fix intermittent test failures from test_min_max_filters test_min_max_filters and test_decimal_min_max_filters record the aggregated probe rows to check whether min-max filter was exercised. In the case of ASAN builds, the probe side started processing before the filters reached the probe side, because ASAN builds are a little slower. The resolution is to increase RUNTIME_FILTER_WAIT_TIME_MS to accommodate ASAN. This issue was also seen earlier in runtime filter tests and fixed through IMPALA-6201. This fix mimics the same, by setting RUNTIME_FILTER_WAIT_TIME_MS to $RUNTIME_FILTER_WAIT_TIME_MS. Change-Id: I111ed15947bd2812753ae68d3bbb8a9871e25b08 Reviewed-on: http://gerrit.cloudera.org:8080/12224 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-01-14 09:07:03 +00:00
paul-rogers	a7ea86b768	IMPALA-8021: Add estimated cardinality to EXPLAIN output Cardinality is vital to understanding why a plan has the form it does, yet the planner normally emits cardinality information only for the detailed levels. Unfortunately, most query profiles we see are at the standard level without this information (except in the summary table), making it hard to understand what happened. This patch adds cardinality to the standard EXPLAIN output. It also changes the displayed cardinality value to be in abbreviated "metric" form: 1.23K instead of 1234, etc. Changing the DESCRIBE output has a huge impact on PlannerTest: all the "golden" test files must change. To avoid doing this twice, this patch also includes: IMPALA-7919: Add predicates line in plan output for partition key predicates This is also the time to also include: IMPALA-8022: Add cardinality checks to PlannerTest The comparison code was changed to allow a set of validators, one of which compares cardinality to ensure it is within 5% of the expected value. This should ensure we don't change estimates unintentionally. While many planner tests are concerned with cardinality, many others are not. Testing showed that the cardinality is actually unstable within tests. For such tests, added filters to ignore cardinality. The filter is enabled by default (for backward compatibility) but disabled (to allow cardinality verification) for the critical tests. Rebasing the tests was complicated by a bug in the error-matching code, so this patch also fixes: IMPALA-8023: Fix PlannerTest to handle error lines consistently Now, the error output written to the output "save results" file matches that expected in the "golden" file -- no more handling these specially. Testing: * Added cardinality verification. * Reran all FE tests. * Rebased all PlannerTest .test files. * Adjusted the metadata/test_explain.py test to handle the changed EXPLAIN output. 
Change-Id: Ie9aa2d715b04cbb279aaffec8c5692686562d986 Reviewed-on: http://gerrit.cloudera.org:8080/12136 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-01-12 04:03:26 +00:00
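The abbreviated "metric" form and the 5% cardinality check described in IMPALA-8021 and IMPALA-8022 can be illustrated with a small Python sketch. This is not the planner's Java code: the function names, the K/M/B suffix set, and the rounding to two decimals are assumptions chosen only to reproduce the `1.23K` example from the message.

```python
def format_cardinality(rows: int) -> str:
    """Abbreviate a row count, e.g. 1234 -> '1.23K'.

    The suffix set (K, M, B) and two-decimal rounding are assumptions.
    """
    if rows < 1000:
        return str(rows)
    value = float(rows)
    suffix = ""
    for suffix in ("K", "M", "B"):
        value /= 1000.0
        if value < 1000.0:
            break
    return f"{value:.2f}{suffix}"


def cardinality_within_tolerance(actual: int, expected: int,
                                 tolerance: float = 0.05) -> bool:
    """Return True if 'actual' is within 'tolerance' (5% by default) of 'expected'."""
    return abs(actual - expected) <= tolerance * expected
```

A validator of this kind accepts small drifts in an estimate while still flagging an unintended change, which is the stated goal of the cardinality check.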
Janaki Lahorani	aacd5c35d3	IMPALA-6533: Add min-max filter for decimal types on kudu tables. The code mimics the code written for other min-max filters. Decimal data can be stored using 4 bytes, 8 bytes and 16 bytes. The code respectively handles these 3 storage configurations. The column definition states the precision and the precision determines the storage size. The minimum and maximum values are stored in a union. The precision from the column will come in as an input. Based on the precision the size will be found, and depending on the size appropriate variable will be used. The code in min-max-filter* follows the general convention of the file, hence uses macros. The test includes 24 decimal columns (as listed below) with the following joins: 1. Inner Join with broadcast (2 tables) 1a. 1 predicate 1b. 4 predicates - all results in decimal min-max filter 1c. 4 predicates - 3 results in decimal min=max filter; 1 doesn't 2. Inner Join with Shuffle (3 tables) 3. Right outer join (2 tables) 4. Left Semi join (2 tables) 5. Right Semi join (2 tables) Decimal Columns: 4bytes: (5,0), (5,1), (5,3), (5,5) (9,0), (9,1), (9,5), (9,9) 8 bytes: (14,0), (14,1), (14,7), (14,14) (18,0), (18,1), (18,9), (18,18) 16 bytes: (28,0), (28,1), (28,14), (28,28) (38,0), (38,1), (38,19), (38,38) The test aggregates the count of probe rows. This shows that the min-max filter is exercised, because the number of probe rows is less than the total number of rows in the probe side table. The count of probe rows is considered to be deterministic. But, it will be beneficial to look out for changes in Kudu that can change the way data is partitioned. Such a change could change the probe row count and in that case, the test will have to be updated. impala_test_suite.py and test_result_verifier.py are enhanced to support saving of aggregation using update_results. 
Change-Id: Ib7e7278e902160d7060f8097290bc172d9031f94 Reviewed-on: http://gerrit.cloudera.org:8080/12113 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-01-10 03:32:25 +00:00
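The mapping from column precision to storage size that IMPALA-6533 relies on can be sketched in Python as follows. The thresholds (9 and 18 digits) are consistent with the column list in the message; the function name is hypothetical and the real filter code is C++ that uses macros.

```python
def decimal_storage_bytes(precision: int) -> int:
    """Return the storage size in bytes for a decimal of the given precision."""
    if not 1 <= precision <= 38:
        raise ValueError(f"precision out of range: {precision}")
    if precision <= 9:
        return 4   # fits in a 32-bit integer
    if precision <= 18:
        return 8   # fits in a 64-bit integer
    return 16      # needs a 128-bit integer
```

A min-max filter would then pick the union member that matches the returned size when recording the minimum and maximum values.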
Tim Armstrong	ae65ff8319	IMPALA-7446: enable buffer pool GC when near process mem limit GC is performed when: * The amount of memory allocated from the system for the buffer pool exceeds the reservation (i.e. free buffers and clean pages are not offset by unused reservation). * The soft or hard process memory limit would otherwise cause an allocation to fail. Testing: Looped the old version of the semi_joins_exhaustive test, which reliably reproduced the issue. I confirmed that the buffer pool GC was running and that it prevented the query failures. Added a backend test that reproduced the issue. A large chunk of the code change is to add infrastructure to use TCMalloc memory metrics for the process memory tracker in backend tests. Ran exhaustive tests. Change-Id: I81e8e29f1ba319f1b499032f9518d32c511b4b21 Reviewed-on: http://gerrit.cloudera.org:8080/12133 Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-01-09 05:26:50 +00:00
Philip Zeyliger	635149b85f	IMPALA-8016: Fix lifecycle of classloader for UDFs. The ClassLoader whence a UDF was loaded needs to be kept open for executions of the UDF, so that the UDF can load other classes from the same jar. (A typical scenario might be a utility class.) This was broken by the fix to IMPALA-7668. This commit moves closing the ClassLoader to the close() function. A test for a UDF that imports a static method from another file has been added. Doing so failed without this change. Change-Id: Ic02e42fb25a2754ede21fe00312a60f07e0ba8a2 Reviewed-on: http://gerrit.cloudera.org:8080/12125 Reviewed-by: Philip Zeyliger <philip@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-12-27 21:02:57 +00:00
Bikramjeet Vig	c209fed535	IMPALA-7351: Add estimates to kudu table sink The kudu table sink allocates untracked memory which is bounded by limits that impala enforces through the kudu client API. This patch adds a constant estimate to this table sink which is based on those limits. Testing: Modified planner tests accordingly. Change-Id: I89a45dce0cfbbe3cc0bc17d55ffdbd41cd7dbfbd Reviewed-on: http://gerrit.cloudera.org:8080/12077 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-12-13 03:46:35 +00:00
Bikramjeet Vig	899bfd10a2	IMPALA-7960: Revert "IMPALA-5929: Remove redundant explicit casts to string" The fix for IMPALA-5929 introduced a bug that produced wrong results. This bug is detailed in IMPALA-7960. Reverting for now. This reverts commit `545163bb0a`. Change-Id: I6f0da62a7ff86f05859a2acbec13a726a9bd6f4c Reviewed-on: http://gerrit.cloudera.org:8080/12073 Reviewed-by: Zoram Thanga <zoram@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-12-12 04:47:41 +00:00
Paul Rogers	d94a474658	IMPALA-7907: Fix ScalarFunction.toSql() Each AST statement node has a toSql() implementation. The code for ScalarFunction and ToSqlUtils has a number of issues: * If Location or Symbol are not set, they are shown as 'null'. Better to omit these clauses if the items are not available. This is mostly an issue during testing. * The generated SQL does not follow the CREATE TABLE syntax. For example, the signature and return value are provided for Java functions, but should not be. * Unlike other statements, this one is generated with a trailing newline. * ToSql.getCreateFunctionSql() fails to separate functions with the semi-colon statement separator. These are all minor issues, but we might as well fix the code to work as intended. Testing: * Added new unit tests to verify the behavior (no tests existed previously). * Re-ran all FE tests. Change-Id: Id34d6df97760a11c299092dff8edbdb7033bce1c Reviewed-on: http://gerrit.cloudera.org:8080/12014 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-12-11 02:47:16 +00:00
Csaba Ringhofer	56dd5767b8	IMPALA-7853: Add support to read int64 NANO timestamps from Parquet PARQUET-1387 added int64 timestamps with nanosecond precision that stores timestamps as nanoseconds since the Unix epoch. As 64 bits are not enough to represent the whole 1400..9999 range of Impala timestamps, this new type works with a limited range: 1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC The benefit of the reduced range is that no validation is necessary during scanning, as every possible 64 bit value represents a valid timestamp in Impala. This may mean that this has the potential be the fastest way to store timestamps in Impala + Parquet. Another way NANO differs from MICRO and MILLI is that NANO can be only described with new logical types in Parquet, it has no converted type equivalent. This made implementing CREATE TABLE LIKE PARQUET less trivial than it was for MICRO/MILLI: the type conversion logic in ParquetHelper.java had to be rewritten to use LogicalTypeAnnotation instead of ConvertedType. The changes on Java side also made bumping CDH_BUILD_NUMBER necessary. Testing: - added a new testfile with int64 nano timestamps - ran core tests Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f Reviewed-on: http://gerrit.cloudera.org:8080/11984 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-12-10 15:39:24 +00:00
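The limited range quoted in IMPALA-7853 follows directly from the bounds of a signed 64-bit integer counting nanoseconds since the Unix epoch. A minimal Python sketch that reproduces both endpoints is shown below; the helper name is hypothetical, and integer arithmetic is used because Python's `datetime` only carries microsecond precision.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
NANOS_PER_SECOND = 1_000_000_000
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1


def format_int64_nanos(nanos: int) -> str:
    """Render nanoseconds since the Unix epoch as a UTC timestamp string."""
    # divmod floors toward negative infinity, so 'frac' is always in [0, 1e9).
    seconds, frac = divmod(nanos, NANOS_PER_SECOND)
    whole = EPOCH + timedelta(seconds=seconds)
    return f"{whole.isoformat(sep=' ')}.{frac:09d}"


print(format_int64_nanos(INT64_MIN))  # 1677-09-21 00:12:43.145224192
print(format_int64_nanos(INT64_MAX))  # 2262-04-11 23:47:16.854775807
```

Because every 64-bit value lands inside this window, and the window lies inside the 1400..9999 range of Impala timestamps, a scanner has nothing to validate per value.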
Bharath Vissapragada	04d027df13	IMPALA-7659: Populate NULL count while computing column stats It was disabled for performance reasons (IMPALA-1003) and this patch re-enables it since a lot of codegen improvements have happened since then. This patch switches the aggregation to use the CASE conditional instead of IF since the former has proper codegen support (IMPALA-7655). Tests: ===== - Updated the affected tests to include the null counts. - Added unit tests that verify IS [NOT] NULL predicates' cardinality estimation. Perf note: ========= I reran the compute stats child query with null counts included on the store_sales table from 1000 SF (1TB) tpcds dataset. The table had 22 non-partitioned columns (on which null counts were computed) and ~2.8B rows. This experiment showed around 7-8% perf drop compared to the same child query without null counts for these columns. Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Reviewed-on: http://gerrit.cloudera.org:8080/11565 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-12-08 00:21:55 +00:00
Joe McDonnell	2ab57438fd	Regenerate missing configuration files in run-all.sh When bin/clean.sh runs, it removes some configuration files in fe/src/test/resources that are required for the minicluster to function. This changes testdata/bin/run-all.sh to detect when these configurations are missing and regenerate them. It detects issues due to running clean.sh, which is the most common case. It is not intended to fix anything else. Change-Id: I9f5955574d304b343e904851cfb9081648e350f8 Reviewed-on: http://gerrit.cloudera.org:8080/12006 Reviewed-by: Philip Zeyliger <philip@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-29 22:09:26 +00:00
Paul Rogers	e64261adb7	IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test Expected results from spillable-buffer-sizing.test and max-row-size.test included incorrect expressions: the unanalyzed GROUP BY expression with ordinals represented as (invalid) casts. SelectStmt is a bit of a mess. There are two copies of the grouping expressions. Here we want to use the analyzed version with the ordinals replaced. Testing: * Problem found when running PlannerTest. PlannerTest now passes, with correct results, after this change. * Turns out there is another path used when generating SQL for a view which does toSql() on an unanalyzed query. Added unit tests for this case. Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a Reviewed-on: http://gerrit.cloudera.org:8080/11993 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-29 05:16:15 +00:00
Tim Armstrong	07fd332089	IMPALA-7869: break up parquet-column-readers.cc Move parquet classes into exec/parquet. Move CollectionColumnReader and ParquetLevelDecoder into separate files. Remove unnecessary 'encoding_' field from ParquetLevelDecoder. Switch BOOLEAN decoding to use composition instead of inheritance. This lets the boolean decoding use the faster batched implementations in ScalarColumnReader and avoids some confusing aspects of the class hierarchy, like the ReadValueBatch() implementation on the base class that was shared between BoolColumnReader and CollectionColumnReader. Improve compile times by instantiating BitPacking templates in a separate file (this looks to give a 30s+ speedup for compiling parquet-column-readers.cc). Testing: Ran exhaustive tests. Change-Id: I0efd5c50b781fe9e3c022b33c66c06cfb529c0b8 Reviewed-on: http://gerrit.cloudera.org:8080/11949 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-27 02:01:12 +00:00
poojanilangekar	5e2dcd25d8	IMPALA-7873: Fix flakiness in TestExchangeMemUsage IMPALA-7367 reduced the memory requirement of the query tested in test_exchange_mem_usage_scaling. This change reduces the mem limit to ensure that the query runs out of memory as expected. Testing: Ran the test 100 times in a loop without any failures. Change-Id: Ib2f063fb88ebf0c7f994b55ecfc860d81726fdd8 Reviewed-on: http://gerrit.cloudera.org:8080/11965 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-21 00:08:25 +00:00
Michael Ho	9ef9dacaf7	IMPALA-7829: Mark a fragment instance as done only after Close() is called As shown in IMPALA-7828. there is some non-determinism on whether the errors detected in FragmentInstanceState::Close() will show up in the final profile sent to the coordinator. The reason is that the current code marks a fragment instance as "done" after ExecInternal() completes but before Close() is called. There is a window between when the final status report is sent and when Close() finishes. This change fixes the problem by not sending the final report until Close() is called. This has no implication on the first row available time for normal queries. It may slightly lengthen the first row available time for DML queries. Testing done: Updated udf-no-expr-rewrite.test to exercise this test Perf run on an 8 node clusters didn't show any regression: TPCH-300 +------------+-----------------------+---------+------------+------------+----------------+ \| Workload \| File Format \| Avg (s) \| Delta(Avg) \| GeoMean(s) \| Delta(GeoMean) \| +------------+-----------------------+---------+------------+------------+----------------+ \| TPCH(_300) \| parquet / none / none \| 23.94 \| -2.05% \| 12.55 \| -2.62% \| +------------+-----------------------+---------+------------+------------+----------------+ Small concurrency +-------------------------+-----------------------+---------+------------+------------+----------------+ \| Workload \| File Format \| Avg (s) \| Delta(Avg) \| GeoMean(s) \| Delta(GeoMean) \| +-------------------------+-----------------------+---------+------------+------------+----------------+ \| TPCDS-UNMODIFIED(_1000) \| parquet / none / none \| 6.89 \| -0.66% \| 6.62 \| +0.41% \| +-------------------------+-----------------------+---------+------------+------------+----------------+ Medium concurrency +-------------------------+-----------------------+---------+------------+------------+----------------+ \| Workload \| File Format \| Avg (s) \| Delta(Avg) 
\| GeoMean(s) \| Delta(GeoMean) \| +-------------------------+-----------------------+---------+------------+------------+----------------+ \| TPCDS-UNMODIFIED(_1000) \| parquet / none / none \| 55.57 \| -1.04% \| 55.27 \| -0.98% \| +-------------------------+-----------------------+---------+------------+------------+----------------+ Change-Id: I61618854ae3f4e7ef20028dcb0ff5cbcfa8adb01 Reviewed-on: http://gerrit.cloudera.org:8080/11939 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-20 23:49:11 +00:00
Joe McDonnell	70fbd1df44	IMPALA-7871: Don't load Hive builtins Dataload has a step of "Loading Hive builtins" that loads a bunch of jars into HDFS/S3/etc. Despite its name, nothing seems to be using these. Dataload and core tests succeed without this step. This removes the Hive builtins step and associated scripts. Change-Id: Iaca5ffdaca4b5506e9401b17a7806d37fd7b1844 Reviewed-on: http://gerrit.cloudera.org:8080/11944 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-19 23:33:20 +00:00
poojanilangekar	2a4835cfba	IMPALA-7367: Pack StringValue and CollectionValue slots This change packs StringValue and CollectionValue slots to ensure they now occupy 12 bytes instead of 16 bytes. This reduces the memory requirements and improves the performance. Since Kudu tuples are populated using a memcopy, 4 bytes of padding was added to StringSlots in Kudu tables. Testing: Ran core tests. Added static asserts to ensure the value sizes are as expected. Performance tests on TPCH-40 produced 3.96% improvement. Change-Id: I32f3b06622c087e4aa288e8db1bf4581b10d386a Reviewed-on: http://gerrit.cloudera.org:8080/11599 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2018-11-19 17:27:13 +00:00
Tim Armstrong	153663c22f	IMPALA-4123: Columnar decoding in Parquet The idea is to optimise the common case where there are long runs of NULL or non-NULL values (i.e. the def level is repeated). We can detect this cheaply by keying the decoding loop in the column reader off the state of the def level RLE decoder - if there's a long run of repeated levels, we can skip checking the def level for every value. We still fall back to decoding, caching and reading value-by-value a batch of def levels whenever the next def level is not in a repeated run. We still use the old approach for decoding rep levels. There might be some benefit to using the same approach for rep levels if repeated def and rep level runs line up. These changes should unlock further optimizations because more time is spent in simple kernel functions, e.g. UnpackAndDecode32Values() for dictionary decompression, which is very optimisable using SIMD etc. Snappy decompression now seems to be the main CPU bottleneck for decoding snappy-compressed Parquet. Perf: Running TPC-H scale factor 60 on uncompressed and snappy parquet both showed a ~4% speedup overall. Microbenchmarks on uncompressed parquet show scans only doing dictionary decoding on uncompressed Parquet is ~75% faster: set mt_dop=1; select min(l_returnflag) from lineitem; Testing: We have alltypes agg with a mix of null and non-null. Many tables have long runs of non-null values. Added new test data and coverage: * a test table manynulls with long runs of null values. * a large CHAR test table * missing coverage for materialising pos slot in flattened nested types scan. * Extended dict test to test longer runs. * A larger version of complextypestbl with interesting collection shapes - NULL collections, empty collections, etc, particularly runs of collections with the same shape. * Test interaction of timestamp validation with conversion * Ran code coverage build to confirm all code paths are tested * ASAN and exhaustive runs. 
Change-Id: I8c03006981c46ef0dae30602f2b73c253d9b49ef Reviewed-on: http://gerrit.cloudera.org:8080/8319 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-17 01:48:05 +00:00
Andrew Sherman	d3948d9a01	IMPALA-5821: Add query with implicit casts to extended explain output. If explain_level is at 'extended' level or higher, then enhance the output from the explain command. (1) Show the analyzed sql in the explain header, this is the rewritten sql, which includes implicit casts, and literals are printed with a cast so that their type is visible. (2) When predicates are shown in the plan these are shown in the same format. The toSql() method can be called on a ParseNode tree to return the sql corresponding to the tree. In the past toSql() has been enhanced to print rewritten sql by partially overloading toSql() [with toSql(boolean)]. This current change requires changing toSql() in many places as NumericLiteral can appear at different points in a parse tree. To avoid many new fragile overloads of toSql() I added toSql(ToSqlOptions), where ToSqlOptions is an enum which controls the form of the Sql that is returned. This changes many files but is safer and means that any future options to toSql() can be added painlessly. If SHOW_IMPLICIT_CASTS is passed to toSql() then - in CastExpr print the implicit cast - in NumericLiteral print the literal with a cast to show the type Add a PlannerTestOption directive that will force the query text showing implicit casts to be included in the PLAN section of a .test file. The analyzed query text is wrapped at 80 characters. Note that the analyzed query cannot always be executed as queries rewritten to use LEFT SEMI JOIN are not legal sql. In addition some space characters may be removed from the query for prettier display. 
Documentation of this change will be done as IMPALA-7718 EXAMPLE OUTPUT: [localhost:21000] default> set explain_level=2; EXPLAIN_LEVEL set to 2 [localhost:21000] default> explain select * from functional_kudu.alltypestiny where bigint_col < 1000 / 100; Query: explain select * from functional_kudu.alltypestiny where bigint_col < 1000 / 100 Max Per-Host Resource Reservation: Memory=0B Threads=2 Per-Host Resource Estimates: Memory=10MB Codegen disabled by planner Analyzed query: SELECT * FROM functional_kudu.alltypestiny WHERE CAST(bigint_col AS DOUBLE) < CAST(10 AS DOUBLE) "" F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 \| Per-Host Resources: mem-estimate=4.88MB mem-reservation=0B thread-reservation=2 PLAN-ROOT SINK \| mem-estimate=0B mem-reservation=0B thread-reservation=0 \| 00:SCAN KUDU [functional_kudu.alltypestiny] predicates: CAST(bigint_col AS DOUBLE) < CAST(10 AS DOUBLE) mem-estimate=4.88MB mem-reservation=0B thread-reservation=1 tuple-ids=0 row-size=97B cardinality=1 in pipelines: 00(GETNEXT) Fetched 16 row(s) in 0.03s TESTING: All end-to-end tests pass. Added a new test in ExprRewriterTest which prints sql with implict casts for some interesting queries. Add a unit test for the code which wraps text at 60 characters. The output of some Planner Tests in .test files has been updated to include the Analyzed sql that is printed when explain_level is at at least 'extended' level. Change-Id: I55c3bdacc295137f66b2316a912fc347da30d6b0 Reviewed-on: http://gerrit.cloudera.org:8080/11719 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Thomas Marshall <thomasmarshall@cmu.edu>	2018-11-15 21:32:07 +00:00
Bikramjeet Vig	0d0356c932	IMPALA-7837: Fix flakiness in test_resource_limits for release builds test_resource_limits was failing in release builds because the queries used were finishing earlier than expected. This resulted in fragment instances not being able to send enough updates to the coordinator in order to hit the limits used for the tests. This patch adds a deterministic sleep to the queries which gives enough time to the coordinator to catch up on reports. Testing: Checked that tests passed on release builds. Change-Id: I4a47391e52f3974db554dfc0d38139d3ee18a1b4 Reviewed-on: http://gerrit.cloudera.org:8080/11933 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-15 05:48:39 +00:00
Csaba Ringhofer	60095a4c6b	IMPALA-5050: Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS from Parquet Changes: - parquet.thrift is updated to a newer version which contains the timestamp logical type. - INT64 columns with converted types TIMESTAMP_MILLIS and TIMESTAMP_MICROS can be read as TIMESTAMP. - If the logical type is timestamp, then the type will contain the information whether the UTC->local conversion is necessary. This feature is only supported for the new timestamp types, so INT96 timestamps must still use flag convert_legacy_hive_parquet_utc_timestamps. - Min/max stat filtering is enabled again for columns that need UTC->local conversion. This was disabled in IMPALA-7559 because it could incorrectly drop column chunks. - CREATE TABLE LIKE PARQUET converts these columns to TIMESTAMP - before the change, an error was returned instead. - Bulk of the Parquet column stat logic was moved to a new class called "ColumnStatsReader". Testing: - Added unit tests for timezone conversion (this needed a new public function in timezone_db.h and adding CET to tzdb_tiny). - Added parquet files (created with parquet-mr) with int64 timestamp columns. Change-Id: I4c7c01fffa31b3d2ca3480adf6ff851137dadac3 Reviewed-on: http://gerrit.cloudera.org:8080/11057 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-14 20:16:14 +00:00
Paul Rogers	b2dbc0f0bc	IMPALA-7805: Emit zero as "0" in toSql() It turns out that Impala has a somewhat baroque way to represent the value of a numeric 0. NumericLiteral.toSql() uses the Java BigDecimal class to convert a numeric value to a string for use in explained plans and in verifying expression rewrites. The default Java behavior is to consider scale when rendering numbers, including 0. Thus, depending on precision and scale, you may get: 0 0.0 0.00 0.000 ... 0E-38 However, mathematically, zero is zero. Plans attach no special meaning to the extra decimal points or trailing zeros. To make testing easier, changed the behavior to always emit "0" when the value is zero, regardless of precision or scale. Testing: Reran the planner tests and modified captured plans that had the 0.0, 0.00 variations of zero. Since this change affects only EXPLAIN output, it cannot affect the operation of queries. It may impact other tests that compare EXPLAIN output to a "golden" copy. Change-Id: I0b2f2f34fe5e6003de407301310ccf433841b9f1 Reviewed-on: http://gerrit.cloudera.org:8080/11878 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-14 10:30:30 +00:00
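The scale-dependent rendering of zero that IMPALA-7805 describes has a close analogue in Python's `decimal.Decimal`, which is used here as a stand-in for Java's BigDecimal. The sketch below is an illustration of the normalization idea, not the NumericLiteral.toSql() code; the function name and the choice of fixed-point formatting for non-zero values are assumptions.

```python
from decimal import Decimal


def numeric_literal_to_sql(value: Decimal) -> str:
    """Render a numeric literal, collapsing every representation of zero to '0'."""
    if value == 0:
        # Zero compares equal regardless of scale (0, 0.00, 0E-38, ...).
        return "0"
    # Assumption: non-zero values keep their scale in plain fixed-point form.
    return format(value, "f")


# Without normalization the scale leaks into the text:
print(str(Decimal("0.00")))                      # 0.00
print(str(Decimal("0E-38")))                     # 0E-38
# With normalization every zero renders the same way:
print(numeric_literal_to_sql(Decimal("0E-38")))  # 0
```

Collapsing all zeros to one spelling means a "golden" plan file does not have to change when only the inferred precision or scale of a zero literal changes.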
Thomas Tauber-Marshall	cd26e807f1	IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test IMPALA-110 added support for queries with multiple DISTINCT aggregates in a single select list. This patch adds queries to test this functionality to our targeted-perf workloads and fixes some incorrect return types in another targeted-perf aggregation query. It also adds some targeted queries to the stress test by extending the regex for stress test files to accept files of the form 'tpch-stress-*' and to allow for multiple tests per file. Testing: - Added an e2e test that runs the stress test file. Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9 Reviewed-on: http://gerrit.cloudera.org:8080/11805 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-13 23:25:02 +00:00
Bikramjeet Vig	898c515882	IMPALA-7528: Fix division by zero when computing cardinalities This patch fixes a case where there can be a division by zero when computing cardinalities of many to many joins on NULL columns. Testing: Added a planner test case Change-Id: Ieedd51d3ad6131a4ea2a5883f05104e6a0f2cd14 Reviewed-on: http://gerrit.cloudera.org:8080/11901 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-08 05:42:45 +00:00
Michael Ho	d1aa1c009f	IMPALA-7828: A temporary workaround for flaky UDF test (test_mem_leak()) Before IMPALA-4063, the error message detected during FragmentInstanceState::Close() was always lost. After IMPALA-4063, we may sometimes get the error message in FragmentInstanceState::Close(). It's non-deterministic as the fragment instance thread may race with the query state thread which reports the final status. The UDF test currently tries to handle this non-determinism by using "row_regex:.*" in the ERRORS section but it doesn't always seem to work. This change works around the issue by commenting out the ERRORS section in udf-no-expr-rewrite.text for now. The actual fix will be done in IMPALA-7829. Change-Id: I6a55d5ad1a5a7278e7390f60854a8df28c1b9f28 Reviewed-on: http://gerrit.cloudera.org:8080/11900 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-08 03:53:53 +00:00
Thomas Tauber-Marshall	c28bc3e4f3	IMPALA-3652: Fix resource transfer in subplans with limits Impala assumes that when Reset() is called on an ExecNode, all of the memory returned from that node by GetNext() has been attached to the output RowBatch. In a query with a LIMIT on the subplan, such that some nodes don't reach 'eos', this may not be the case. The solution is to have Reset() take a RowBatch that any such memory can be attached to. I examined all ExecNodes for resources being transferred on 'eos' and added transferring of those resources in Reset(). Testing: - Added e2e tests that repro the issue for hash and nested loop joins. Change-Id: I3968a379fcbb5d30fcec304995d3e44933dbbc77 Reviewed-on: http://gerrit.cloudera.org:8080/11852 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-07 22:53:57 +00:00
poojanilangekar	b48e256895	IMPALA-7791: Compute AggregationNode's estimated rows using # instances Previously, the AggregationNode calculated the estimated number of rows based on input cardinality without accounting for the division of input data across multiple fragment instances. This bloated up the memory estimates for the node. After this change, the AggregationNode accounts for the number of fragment instances while estimating the number of rows per instance. A skew factor of 1.5 was added to account for data skew among multiple fragment instances. This number was derived using empirical analysis of real-world and benchmark (tpch, tpcds) queries. Testing: Tested queries with changed estimates to avoid cases of significant underestimation of memory. Ran front-end and end-to-end tests affected by this change. Change-Id: I2cb9746fafa3e5952e28caa952837e285bcc22ac Reviewed-on: http://gerrit.cloudera.org:8080/11854 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-07 22:00:22 +00:00
Tim Armstrong	250d85e94e	IMPALA-7822: handle overflows in repeat() builtin We need to carefully check that the intermediate value fits in an int64_t and the final size fits in an int. If they don't we raise an error and fail the query. Testing: Added a couple of backend tests to exercise the overflow check code paths. Change-Id: I872ce77bc2cb29116881c27ca2a5216f722cdb2a Reviewed-on: http://gerrit.cloudera.org:8080/11889 Reviewed-by: Thomas Marshall <thomasmarshall@cmu.edu> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-07 21:49:15 +00:00
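The two-stage check described in IMPALA-7822 (intermediate value must fit in an `int64_t`, final size must fit in an `int`) might look roughly like the sketch below. Python integers never overflow, so the bounds are modelled explicitly; the function name and error messages are hypothetical and not taken from Impala's C++ code.

```python
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1


def checked_repeat_length(str_len: int, count: int) -> int:
    """Return the byte length of repeat(s, count), or raise on overflow.

    Hypothetical sketch of the check, not the actual builtin.
    """
    if count <= 0 or str_len == 0:
        return 0
    # Stage 1: would str_len * count exceed a signed 64-bit value?
    # Dividing first avoids ever forming the oversized product in C-like code.
    if str_len > INT64_MAX // count:
        raise OverflowError("intermediate length does not fit in int64")
    total = str_len * count
    # Stage 2: the final string size must fit in a signed 32-bit int.
    if total > INT32_MAX:
        raise OverflowError("result length does not fit in int32")
    return total


if __name__ == "__main__":
    print(checked_repeat_length(3, 4))
    try:
        checked_repeat_length(2, 2**31)
    except OverflowError as err:
        print("error:", err)
```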
stakiar	98d923243f	IMPALA-5004: Switch to sorting node for large TopN queries Adds a new query option 'topn_bytes_limit' that places a limit on the number of estimated bytes that a TopN operator can process. If the Impala planner estimates that a TopN operator will process more bytes than this limit, it will replace the TopN operator with a sort operator. Since the TopN operator cannot spill to disk, it has to buffer everything in memory. This can cause frequent OOM issues when running with a large limit + offset. Switching to a sort operator allows Impala to spill to disk. We prefer to use the TopN operator when possible as it has better performance than the sort operator for 'order by limit [offset]' queries. The default limit is set to 512MB and is based on micro-benchmarking the topn vs. sort operator for various limits (see the JIRA for full details). The default is set to an intentionally high value in order to avoid performance regressions. Testing: * Added a new planner test to functional-planner/ to validate that 'topn_bytes_limit' properly switches between topn and sort operators. Change-Id: I34c9db33c9302b55e9978f53f9c7061f2806c8a9 Reviewed-on: http://gerrit.cloudera.org:8080/11698 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2018-11-07 01:17:34 +00:00
Michal Ostrowski	cf89c73485	IMPALA-6323 Allow constant analytic window expressions. The constraint imposed by IMPALA-1354 was artificial. If there are constant "partition by" expressions, simply drop them, they are no-ops. Constant "order by" expressions can be ignored as well, though in effect they should be accounted for as null expressions in the backend, with the effect that they combine all rows in the same window (i.e. no window breaks). Change-Id: Idf129026c45120e9470df601268863634037908c Reviewed-on: http://gerrit.cloudera.org:8080/11556 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Ho <kwho@cloudera.com>	2018-11-06 07:54:47 +00:00
Michael Ho	941038229a	IMPALA-4063: Merge report of query fragment instances per executor Previously, each fragment instance executing on an executor will independently report its status to the coordinator periodically. This creates a huge amount of RPCs to the coordinator under highly concurrent workloads, causing lock contention in the coordinator's backend states when multiple fragment instances send them at the same time. In addition, due to the lack of coordination between query fragment instances, a query may end without collecting the profiles from all fragment instances when one of them hits an error before another fragment instance manages to finish Prepare(), leading to missing profiles for certain fragment instances. This change fixes the problem above by making a thread per QueryState (started by QueryExecMgr) to be responsible for periodically reporting the status and profiles of all fragment instances of a query running on a backend. As part of this refactoring, each query fragment instance will not report their errors individually. Instead, there is a cumulative status maintained per QueryState. It's set to the error status of the first fragment instance which hits an error or any general error (e.g. failure to start a thread) when starting fragment instances. With this change, the status reporting threads are also removed. Testing done: exhaustive tests This patch is based on a patch by Sailesh Mukil Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa Reviewed-on: http://gerrit.cloudera.org:8080/11615 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-06 01:01:07 +00:00
Tim Armstrong	58cd69ac48	IMPALA-402: test for random partitioning in insert This adds a basic regression test for the bug reported in IMPALA-402. Testing: Exhaustive build. Looped the modified test overnight. Change-Id: I4bbca5c64977cadf79dabd72f0c8876a40fdf410 Reviewed-on: http://gerrit.cloudera.org:8080/11799 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-06 00:00:12 +00:00
stakiar	31669a6703	IMPALA-7777: Fix crash due to arithmetic overflows in Exchange Node Fixes an arithmetic overflow in ExchangeNode::GetNextMerging. Prior to this patch, the code read: int rows_to_keep = num_rows_skipped_ - offset_; Where num_rows_skipped_ and offset_ were of type int64_t. The result was cast to an int which can lead to an overflow if the result exceeds the value of 2^31. The value of rows_to_keep would be passed into row-batch.h::CopyRows which would crash due to a DCHECK_LE error. This crash arises when the value of the OFFSET is a large number, for example, the query: select int_col from functional.alltypes order by 1 limit 1 offset 9223372036854775800; Would crash the Impalad executor for this query. The fix is to change rows_to_keep to an int64_t to avoid the overflow, which prevents the DCHECK_LE from failing. Change-Id: I8bb8064aae6ad25c8a19f6a8869086be7e70400a Reviewed-on: http://gerrit.cloudera.org:8080/11844 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-05 21:58:45 +00:00
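The narrowing problem described in IMPALA-7777 can be imitated in a few lines. The sketch below models storing a 64-bit difference in a 32-bit `int` as a two's-complement wrap, which is the typical behaviour on common platforms; the input values are hypothetical and chosen only so that the difference exceeds 2^31.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Interpret `value` as a two's-complement signed integer of `bits` bits."""
    value &= (1 << bits) - 1          # keep only the low `bits` bits
    if value >= 1 << (bits - 1):      # high bit set means negative
        value -= 1 << bits
    return value


# Hypothetical counters, both held as 64-bit values in the original code.
num_rows_skipped = 5_000_000_000
offset = 1_000_000_000

# Before the fix: the 64-bit difference is stored in a 32-bit int.
rows_to_keep_int32 = wrap_signed(num_rows_skipped - offset, 32)
# After the fix: the difference is kept in a 64-bit integer.
rows_to_keep_int64 = wrap_signed(num_rows_skipped - offset, 64)

if __name__ == "__main__":
    print("as 32-bit int:", rows_to_keep_int32)
    print("as 64-bit int:", rows_to_keep_int64)
```

A negative or otherwise wrong row count is the kind of value that would trip a bounds check such as the `DCHECK_LE` mentioned in the commit message.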
Tim Armstrong	f876b320aa	IMPALA-5946,IMPALA-5956: add TPC-DS q31,q59,q89 Q31: the substitution variables didn't match the TPC-DS spec. After fixing this, the results match up to 4 digits of rounding (there is some error introduced in intermediate calculations). Q59: the results match the reference results up to rounding. Q89: the results match up to 5 digits of rounding. I verified the matches by using a spreadsheet comparing reference and actual results. https://docs.google.com/spreadsheets/d/1MNEqkfYRRAd3xqY6m20tTHquqjtCThDaGdizzRAQ8co/edit?usp=sharing *https://github.com/gregrahn/tpcds-kit/blob/master/specification/TPC-DS_v2.10.0.pdf ^https://github.com/gregrahn/tpcds-kit/tree/master/answer_sets Change-Id: I49634e8f63066773c9c78dd5585a0ee69daf720a Reviewed-on: http://gerrit.cloudera.org:8080/11845 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-02 22:11:03 +00:00
Michal Ostrowski	15d48c3205	IMPALA-6661 Make NaN values equal for grouping purposes. Similar to the treatment of NULLs, we want to consider NaN values as equal when grouping. - When detecting a NaN in a set of row values, the NaN value must be converted to a canonical value - so that all NaN values have the same bit-pattern for hashing purposes. - When doing equality evaluation, floating point types must have additional logic to consider NaN values as equal. - Existing logic for handling NULLs in this way is appropriate for triggering this behavior for NaN values. - Relabel "force null equality" as "inclusive equality" to expand the scope of the concept to a more generic form that includes NaN. Change-Id: I996c4a2e1934fd887046ed0c55457b7285375086 Reviewed-on: http://gerrit.cloudera.org:8080/11535 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Ho <kwho@cloudera.com>	2018-11-02 02:59:07 +00:00
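The two pieces of IMPALA-6661 (canonicalizing NaN before hashing, and treating NaN as equal to NaN when comparing) can be sketched as follows. This is an illustration in Python, not Impala's backend code; the names `CANONICAL_NAN_BITS`, `canonical_bits` and `inclusive_equals` are hypothetical, and the choice of canonical bit pattern is an assumption.

```python
import math
import struct

# Assumed canonical pattern: the usual quiet NaN for IEEE 754 doubles.
CANONICAL_NAN_BITS = 0x7FF8000000000000


def double_bits(x: float) -> int:
    """Return the raw 64-bit pattern of a double."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def double_from_bits(bits: int) -> float:
    """Build a double from a raw 64-bit pattern."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def canonical_bits(x: float) -> int:
    """Bit pattern to feed into a hash: every NaN maps to one pattern."""
    return CANONICAL_NAN_BITS if math.isnan(x) else double_bits(x)


def inclusive_equals(a: float, b: float) -> bool:
    """Equality for grouping purposes: NaN is considered equal to NaN."""
    if math.isnan(a) and math.isnan(b):
        return True
    return a == b


if __name__ == "__main__":
    nan_a = double_from_bits(0x7FF8000000000001)  # NaN with a payload
    nan_b = double_from_bits(0xFFF8000000000000)  # NaN with the sign bit set
    print(nan_a == nan_b, inclusive_equals(nan_a, nan_b))
    print(canonical_bits(nan_a) == canonical_bits(nan_b))
```

Both steps are needed: without canonicalization two NaN values could hash to different buckets, and without the relaxed comparison they would never be merged into one group even within the same bucket.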
Tim Armstrong	95b56d0e2d	IMPALA-7586: fix predicate pushdown of escaped strings This fixes a class of bugs where the planner incorrectly uses the raw string from the parser instead of the unescaped string. This occurs in several places that push predicates down to the storage layer: * Kudu scans * HBase scans * Data source scans There are some more complex issues with escapes and the LIKE predicate that are tracked separately by IMPALA-2422. This also uncovered a different issue with RCFiles that is tracked by IMPALA-7778 and is worked around by the tests added. In order to make bugs like this more obvious in future, I renamed getValue() to getValueWithOriginalEscapes(). Testing: Added regression test that tests handling of backslash escapes on all file formats. I did not add a regression test for the data source bug since it seems to require some major modification of the data source test infrastructure. Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Reviewed-on: http://gerrit.cloudera.org:8080/11814 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-01 21:27:13 +00:00
Tim Armstrong	e3a702707c	IMPALA-5950: fix TPC-DS Q35a and Q48 queries The query text (for Q48) and substitution parameters didn't match the TPC-DS standard for qualification queries. After fixing that, the queries return the results expected by the TPC-DS standards. Note that this may affect the performance of perf workloads running tpcds-unmodified. Change-Id: Ic7c737f68adf616738d6eb6e5a02593af25bcbaf Reviewed-on: http://gerrit.cloudera.org:8080/11833 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-01 08:05:48 +00:00
Tim Armstrong	0e1f304e6b	IMPALA-7792: fix disabling of ORC scans The previous approach could lead to hangs or cryptic error messages because it removed the ORC data type from a lookup table. Instead check explicitly in the planner for ORC scans and throw a more helpful error message. Testing: Added custom cluster test to exercise code and check error message. Change-Id: I209e79b18745c48d0182800a916d6566083f4609 Reviewed-on: http://gerrit.cloudera.org:8080/11835 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-01 07:30:58 +00:00
Fredy Wijaya	aff174c728	IMPALA-7786: Start HMS and Sentry in debug mode in the minicluster This patch updates the HMS and Sentry run scripts to start HMS and Sentry in debug mode in the minicluster to make it easier to debug issues related to HMS and Sentry. HMS debug port: 30010 Sentry debug port: 30020 Testing: - Connected the debugger to both HMS and Sentry. Change-Id: I29b025cbde36ef398ea36fbe69eff26e27d93e48 Reviewed-on: http://gerrit.cloudera.org:8080/11826 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-01 02:01:07 +00:00
Tim Armstrong	85166afa8a	IMPALA-6374: fix handling of commas in .test files The .test file parser implemented an unconventional method for parsing single-quoted strings in comma-separated value format. This didn't handle trailing commas in the string correctly. This commit switches to using a conventional method for parsing comma-separated value format: * Commas enclosed by single quotes are not treated as field separators * Single quotes can be escaped within a string by doubling them. I looked into using Python's .csv module for this, but it wouldn't work without modifying the test file format more because it automatically discards the quotes during parsing, which are actually semantically important in .test files. E.g. without the quotes we can't distinguish between the literal string 'regex:...' and the regex regex:.... Testing: Ran exhaustive tests and fixed .test files that required modifications. Will rerun before merging. Added a couple of tests to exercise edge cases in the test file parser. Change-Id: I18ddcb0440490ddf8184be66d3681038a1615dd9 Reviewed-on: http://gerrit.cloudera.org:8080/11800 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Tim Armstrong <tarmstrong@cloudera.com>	2018-10-30 22:17:49 +00:00
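The parsing rules listed in IMPALA-6374 (commas inside single quotes are not separators, a doubled single quote is an escaped quote, and the quotes themselves are kept) can be sketched as a small splitter. This is a minimal illustration under those stated rules, not the actual parser in Impala's test framework; `split_fields` is a hypothetical name.

```python
def split_fields(line: str) -> list:
    """Split a comma-separated line, honouring single-quoted strings.

    Quotes are preserved in the output because they are semantically
    meaningful in .test files (e.g. 'regex:...' vs. regex:...).
    """
    fields = []
    current = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "'":
            if in_quotes and i + 1 < len(line) and line[i + 1] == "'":
                # Doubled quote inside a string: an escaped single quote.
                current.append("''")
                i += 2
                continue
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "," and not in_quotes:
            # Unquoted comma ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


if __name__ == "__main__":
    print(split_fields("1,'a,b',2"))
    print(split_fields("'trailing,',3"))
    print(split_fields("'it''s',x"))
```

Python's `csv` module would strip the quotes while parsing, which is the limitation the commit message points out; keeping them in the returned fields is the reason for the hand-written loop here.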
Bharath Vissapragada	35c0e4416d	IMPALA-7727: Fix TStatusCode to TErrorCode mapping - Uses a "GENERAL" TErrorCode type for all non-OK statuses. - Detailed regression root cause description in the jira IMPALA-7727. - Added a regression test. Change-Id: Ie62527734aa73c1524c731773638590bdac9e789 Reviewed-on: http://gerrit.cloudera.org:8080/11778 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-10-30 19:55:12 +00:00
poojanilangekar	44e69e8182	IMPALA-7749: Compute AggregationNode's memory estimate using input cardinality Prior to this change, the AggregationNode's perInstanceCardinality was influenced by the node's selectivity and limit. This was incorrect because the hash table is constructed over the entire input stream before any row batches are produced. This change ensures that the input cardinality is used to determine the perInstanceCardinality. Testing: Added a planner test which ensures that an AggregationNode with a limit estimates memory based on the input cardinality. Ran front-end and end-to-end tests affected by this change. Change-Id: Ifd95d2ad5b677fca459c9c32b98f6176842161fc Reviewed-on: http://gerrit.cloudera.org:8080/11806 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-10-29 20:40:50 +00:00
Tim Armstrong	449fe73d21	IMPALA-7662: fix error race when scanner open fails This is very similar to IMPALA-7335, except happens when 'progress_' is incremented in the call chain HdfsScanNode::ProcessSplit -> HdfsScanNodeBase::CreateAndOpenScanner() -> HdfsScanner::Close() The fix required restructuring the code so that SetDoneInternal() is called with the error before HdfsScanner::Close(). This required a refactoring because HdfsScanNodeBase doesn't actually know about SetDoneInternal(). My fix is to put the common logic between HdfsScanNode and HdfsScanNodeMt into a helper in HdfsScanNodeBase, then in HdfsScanNode, make sure to call SetDoneInternal() before closing the scanner. I also reworked HdfsScanNode::ProcessSplit() to handle error propagation internally. I think the joint responsibility between ProcessSplit() and its caller for handling errors made things harder than necessary. Testing: Added a debug action and test that reproduced the race before the fix. Change-Id: I45a61210ca7d057b048c77d9f2f2695ec450f19b Reviewed-on: http://gerrit.cloudera.org:8080/11596 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-10-27 04:07:51 +00:00
David Knupp	2e5d65819a	IMPALA-7758: Fix LOCATION clause when creating chars_formats_* The current location resolves to /user/hive/warehouse/chars_formats_. Impala's test data actually lives at /test-warehouse/chars_formats_. Tested this by reloading data from scratch and running the core tests. Change-Id: I781b484e7a15ccaa5de590563d68b3dca6a658e5 Reviewed-on: http://gerrit.cloudera.org:8080/11789 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-10-27 00:56:54 +00:00
Tim Armstrong	aa654d4b87	Update .gitignore A few unversioned artifacts crept in over time without corresponding .gitignore entries. These are the updates based on the git status output on my dev env. Change-Id: I281ab3b5c98ac32e5d60663562628ffda6606a6a Reviewed-on: http://gerrit.cloudera.org:8080/11787 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-10-26 22:19:35 +00:00
Bikramjeet Vig	8cbec20ea4	IMPALA-7351: Add estimates to Exchange node Added rough estimates for exchange node and a justification of the method in the in-line comments. Testing: Updated Planner tests. Change-Id: I5b577f9511abc48b992e814d50bba4959f23f7fd Reviewed-on: http://gerrit.cloudera.org:8080/11692 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-10-25 21:17:12 +00:00
Thomas Tauber-Marshall	15e8ce4f27	IMPALA-7677: Fix DCHECK failure in GroupingAggregator After inserting all of its input into its Aggregators, StreamingAggregationNode performs some cleanup, such as calling InputDone() on each Aggregator. Previously, StreamingAggregationNode only checked that all of the child's batches had been fetched before doing this cleanup, which causes problems if the final child batch isn't processed fully in a single GetNext() call. In this case, multiple calls to InputDone() lead to a DCHECK failure. The solution is to only perform the cleanup once the final child batch has been fully processed. Testing: - Added an e2e test with a query that hits this condition. Change-Id: I851007a60472d0e53081c076c863c866c516677c Reviewed-on: http://gerrit.cloudera.org:8080/11626 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-10-24 01:03:38 +00:00

1986 Commits