Implement a test that generates random decimal numbers in the pytest
framework, performs a random mathematical operation in Impala and
verifies that the result is correct by performing the same operation using
the Python decimal module. We try to generate not only completely random
decimal numbers, but also numbers that have interesting properties, such
as the number being a power of two.
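A minimal sketch of the verification idea (run_impala_query is a
hypothetical helper, not part of the actual test code):
from decimal import Decimal
import random

def random_decimal(precision=38, scale=10):
    # Completely random decimal; the real test also injects values with
    # interesting properties, e.g. exact powers of two.
    whole = random.randrange(10 ** (precision - scale))
    frac = random.randrange(10 ** scale)
    return Decimal("%d.%0*d" % (whole, scale, frac))

a, b = random_decimal(), random_decimal()
expected = a + b  # reference result computed with Python's decimal module
# actual = run_impala_query("select %s + %s" % (a, b))  # hypothetical
# assert Decimal(actual) == expected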
Change-Id: I4328125de5c583ec8ead1f78d9a08703b18b2d85
Reviewed-on: http://gerrit.cloudera.org:8080/8898
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Zach Amsden <zamsden@cloudera.com>
Tested-by: Impala Public Jenkins
This patch maps a signed integer logical type in parquet to a supported
Impala column type. This change introduces the following mapping -
INT_8 -> TINYINT
INT_16 -> SMALLINT
INT_32 -> INT
INT_64 -> BIGINT
Also, added a parquet file with the following schema for testing -
schema {
optional int32 id;
optional int32 tinyint_col (INT_8);
optional int32 smallint_col (INT_16);
optional int32 int_col;
optional int64 bigint_col;
}
Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Reviewed-on: http://gerrit.cloudera.org:8080/8548
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Tianyi Wang <twang@cloudera.com>
Tested-by: Impala Public Jenkins
When materialising a nested collection, has_template_tuple() should use
the template tuple for the collection, not the top-level tuple.
Testing:
Added tests based on nested-types-basic.test that operate on a simple
partitioned table. The tests reliably crashed Impala before the fix.
Change-Id: Ic808b824ce3b31af0539036d8ca23d17b18deab4
Reviewed-on: http://gerrit.cloudera.org:8080/8947
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
In this patch we implement rounding when casting string to decimal if
DECIMAL_V2 is enabled. The backend method that parses strings and
converts them to decimals is refactored to make it easier to understand.
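A sketch of the intended semantics using Python's decimal module (the
exact rounding mode shown here is an assumption, not taken from the
patch):
from decimal import Decimal, ROUND_HALF_UP

# With DECIMAL_V2 enabled, casting the string "1.96" to DECIMAL(2,1)
# rounds to 2.0 rather than truncating to 1.9.
result = Decimal("1.96").quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
assert result == Decimal("2.0")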
Testing:
- Added some BE tests.
Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Reviewed-on: http://gerrit.cloudera.org:8080/8774
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins
Verify that query timestamps in the profile have nanosecond
precision.
This commit follows 16d8dd58.
This patch adds a test case that inspects the thrift profile of a
completed query, and verifies that the "Start Time" and
"End Time" of the query have nanosecond precision. We chose to
work with the thrift profile directly, rather than parse the debug
web page, as it is the thrift profile which is consumed by
management API clients of Impala.
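A minimal sketch of such a check (hypothetical accessor names; assumes
the timestamps are rendered with a nine-digit fractional second part):
def has_nanosecond_precision(ts):
    # e.g. "2017-12-06 11:21:34.123456789000" -> True
    fraction = ts.split('.')[-1]
    return len(fraction) >= 9

# profile = get_thrift_profile(query_id)  # hypothetical
# assert has_nanosecond_precision(profile.info_strings["Start Time"])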
Change-Id: Id3421a34cc029ebca551730084c7cbd402d5c109
Reviewed-on: http://gerrit.cloudera.org:8080/8784
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
Modifies COMPUTE STATS TABLESAMPLE to use the new SAMPLED_NDV()
function.
Testing:
- modified/improved existing functional tests
- core/hdfs run passed
Change-Id: I6ec0831f77698695975e45ec0bc0364c765d819b
Reviewed-on: http://gerrit.cloudera.org:8080/8840
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
The current implementation of the rand/random built-in functions
uses rand_r from the C library. We found its randomness to be poor.
pcg32, from a third-party library, shows better randomness than rand_r.
Testing:
Revised the unit tests in expr-test.
Added an E2E test to random.test.
Change-Id: Idafdd5fe7502ff242c76a91a815c565146108684
Reviewed-on: http://gerrit.cloudera.org:8080/8355
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
Adds a new SAMPLED_NDV() aggregate function that is
intended to be used in COMPUTE STATS TABLESAMPLE.
This patch only adds the function itself. Integration
with COMPUTE STATS will come in a separate patch.
SAMPLED_NDV() estimates the number of distinct values (NDV)
based on a sample of data and the corresponding sampling rate.
The main idea is to collect several x/y data points where x is
the number of rows and y is the corresponding NDV estimate.
These data points are used to fit an objective function to the
data such that the true NDV can be extrapolated.
The aggregate function maintains a fixed number of HyperLogLog
intermediates to compute the x/y points.
Several objective functions are fit and the best-fit one is
used for extrapolation.
Adds the MPFIT C library to perform curve fitting:
https://www.physics.wisc.edu/~craigm/idl/cmpfit.html
The library is a C port from Fortran. Scipy uses the
Fortran version of the library for curve fitting.
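A sketch of the extrapolation idea using SciPy, with made-up data and a
single power-law objective (the actual implementation fits several
objective functions with MPFIT in the backend):
import numpy as np
from scipy.optimize import curve_fit

# x: rows scanned, y: NDV estimates from the HyperLogLog intermediates.
x = np.array([1000.0, 2000.0, 4000.0, 8000.0])  # hypothetical sample points
y = np.array([950.0, 1800.0, 3300.0, 5800.0])   # hypothetical NDV estimates

def objective(x, a, b):
    return a * np.power(x, b)  # one candidate objective function

(a, b), _ = curve_fit(objective, x, y)
sampling_rate = 0.01
# Extrapolate to the estimated total row count of the table.
true_ndv_estimate = objective(x[-1] / sampling_rate, a, b)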
Testing:
- added functional tests
- core/hdfs run passed
Change-Id: Ia51d56ee67ec6073e92f90bebb4005484138b820
Reviewed-on: http://gerrit.cloudera.org:8080/8569
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
test_profile_fragment_instances was recently added to verify that the
final runtime profile for a query has the expected fragments and exec
nodes. The test fails on local filesystem builds, though, as it
assumes there will be 3 impalads and therefore 3 fragment instances,
but there is only 1 impalad on local filesystem builds.
The fix is to disable the test on local filesystem builds.
Change-Id: I2c98f160406081626f17709809b8efee9eae1450
Reviewed-on: http://gerrit.cloudera.org:8080/8809
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins
test_basic_filters has been occasionally failing due to a line missing
from a runtime profile for a particular query.
The problem is that the query returns all of its results before all of
its fragment instances are finished executing (due to a limit). Then,
when one fragment instance reports its status, the coordinator returns
to it a 'cancelled' status, causing all remaining instances for that
backend to be cancelled.
Sometimes this cancellation happens quickly enough that the relevant
fragment instances have not yet sent a status report when they are
cancelled. They will still send a report in finalize, but as the
coordinator only updates its runtime profile for 'ok' status reports,
not 'cancelled', the final runtime profile doesn't end up with any
data for those fragment instances, which means the test does not find
the line in the runtime profile it's checking for.
The fix is to have the coordinator update its runtime profile with
every status report it receives, regardless of error status.
Testing:
- Ran existing runtime profile tests, which rely on profile output,
in a loop.
- Manually tested some scenarios with failed queries and checked that
the new profile output is reasonable.
- Added a new e2e test that runs the affected query and checks for the
presence of info for all expected exec nodes in the profile. This
repros the underlying issue consistently.
Change-Id: I4f581c7c8039f02a33712515c5bffab942309bba
Reviewed-on: http://gerrit.cloudera.org:8080/8754
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
This change moves the creation of the runtime profile from DataSink::Prepare()
to the ctor of DataSink derived classes. This makes sure that DataSink::Close()
and other functions can access the profile even if the DataSink fails to initialize.
Testing done: Added a test case which triggers failure in the initialization of output
expressions in a HdfsTableSink. Impalad crashed consistently without the fix.
Change-Id: I2a683000ef180027b929dbebe78bc2a530a4767e
Reviewed-on: http://gerrit.cloudera.org:8080/8770
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
Currently, constant expressions for the LHS of the IN predicate
are not supported. This patch adds this support as a rewrite in
StmtRewriter (where subqueries are rewritten to joins). Since
there is a nested-loop variant of left semijoin, support for IN
is handled by not erring out. NOT IN is handled by a rewrite to
corresponding NOT EXISTS predicate. Support for NOT IN with a
correlated subquery is not included in this change.
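For example (illustrative only):
  SELECT count(*) FROM t WHERE 1 IN (SELECT id FROM s)
is evaluated via a (nested-loop) left semijoin against s, while
  SELECT count(*) FROM t WHERE 1 NOT IN (SELECT id FROM s)
is rewritten to the corresponding NOT EXISTS predicate.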
Re-organized the frontend subquery analysis tests to expand coverage.
Testing:
- added frontend subquery analysis tests
- added e2e tests
Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb
Reviewed-on: http://gerrit.cloudera.org:8080/8322
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
TestRuntimeFilters.test_basic_filters is flaky on ASAN as sometimes
the runtime filters aren't received within the specified
RUNTIME_FILTER_WAIT_TIME_MS.
This patch increases the timeout for ASAN builds.
Change-Id: I8c20cbb75a9b6da73137f220657aa75dea9dfdce
Reviewed-on: http://gerrit.cloudera.org:8080/8646
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
The e2e unit tests for udfs can interact via the backend
lib_cache, causing test flakes. IMPALA-6215 explains a
race between the lib_cache and UdfExecutor in the frontend
which is likely the root cause.
Two e2e tests use the same jar (test_java_udfs and
test_udf_invalid_symbol). test_udf_invalid_symbol drops a
function from that jar, which causes uses of that jar to
fail in the test_java_udfs test. Since the state of the lib_cache
is per-process, it causes these interactions across
unit tests.
This change avoids the interactions by using separate jars for
the separate tests.
Change-Id: Ica3538788b1d2ab5e361261e2ade62780b838e65
Reviewed-on: http://gerrit.cloudera.org:8080/8593
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Currently, parquet row-groups can be pruned at run-time using
min/max stats when predicates (IN or binary comparison) are specified
on columns of scalar types. This patch extends pruning to nested types
for the same class of predicates. A nested value is an instance
of a nested type (struct, array, map). A nested value consists of
other nested and scalar values (as declared by its type).
Predicates that can be used for row-group pruning must be applied to
nested scalar values. In addition, the parent of the nested scalar
must also be required, that is, not empty. The latter requirement
is conservative: some filters that could be used for pruning are
not used for correctness reasons.
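As a hypothetical illustration:
  SELECT ... FROM customers c, c.orders o WHERE o.amount > 1000
Here the min/max stats for o.amount can prune row groups because the
(implicit inner) join requires c.orders to be non-empty; with a LEFT
OUTER join the parent collection is not required, so using the filter
for pruning would not be safe.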
Testing:
- extended nested-types-parquet-stats e2e test cases.
Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
Reviewed-on: http://gerrit.cloudera.org:8080/8480
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This patch implements min-max filters for runtime filters. Each
runtime filter generates a bloom filter or a min-max filter,
depending on whether it has HDFS or Kudu targets, respectively.
In RuntimeFilterGenerator in the planner, each hash join node
generates a bloom and min-max filter for each equi-join predicate, but
only those filters that end up being assigned to a target make it into
the final plan.
Min-max filters are only assigned to Kudu scans if the target expr is
a column, as Kudu doesn't support bounds on general exprs, and only if
the join op is '=' and not 'is distinct from', as Kudu doesn't support
returning NULLs if a bound is set.
Min-max filters are populated by the PartitionedHashJoinBuilder.
Codegen is used to eliminate branching on the type of filter. String
min-max filters truncate their bounds at 1024 chars, so that the max
amount of memory used by min-max filters is negligible.
For now, min-max filters are only applied at the KuduScanner, which
passes them into the Kudu client.
Future work will address applying min-max filters at HDFS scan nodes
and applying bloom filters at Kudu scan nodes.
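A minimal sketch of the min-max filter idea (a Python pseudo-model, not
the actual C++ implementation):
class MinMaxFilter(object):
    def __init__(self):
        self.min = None
        self.max = None

    def insert(self, val):
        # Called by the join builder for every build-side join key.
        if self.min is None or val < self.min:
            self.min = val
        if self.max is None or val > self.max:
            self.max = val

# Conceptually, the Kudu scanner then adds the bounds to the scan:
#   col >= filter.min AND col <= filter.max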
Functional Testing:
- Added new planner tests and updated the old ones. (In the old tests,
many runtime filters are renumbered, as we now always generate min-max
filters even if they don't end up getting assigned, and they take up
some of the RF ids.)
- Updated existing runtime filter tests to work with Kudu.
- Added e2e tests for min-max filter specific functionality.
Perf Testing:
- All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu,
timings are averages of 3 runs.
- Ran a contrived query with a filter that does not eliminate any rows
(full self join of lineitem). The difference in running time was
negligible - 24.46s with filters on, 24.15s with filters off for
a ~1% slowdown.
- Ran a contrived query with a filter that eliminates all rows (self
join on lineitem with a join condition that never matches). The
filters resulted in a significant speedup - 0.26s with filters on,
1.46s with filters off for a ~5.6x speedup. This query is added to
targeted-perf.
Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Reviewed-on: http://gerrit.cloudera.org:8080/7793
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
Switch the decoders to using more batch-oriented interfaces. As an
intermediate step this doesn't make the interfaces of LevelDecoder
or DictDecoder batch-oriented, only the lower-level utility classes.
The next step would be to change those interfaces to be batch-oriented
and make corresponding optimisations in parquet. This could deliver much
larger perf improvements than the current patch.
The high-level changes are:
* BitReader -> BatchedBitReader, which is built to unpack runs of 32
bit-packed values efficiently.
* RleDecoder -> RleBatchDecoder, which exposes the repeated and literal
runs to the caller and uses BatchedBitReader to unpack literal runs
efficiently.
* Dict decoding uses RleBatchDecoder to decode repeated runs efficiently
and uses the BitPacking utilities to unpack and decode in a single
step (a sketch of the batched unpacking idea follows below).
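A simplified sketch of LSB-first batched bit-unpacking (the real code
unpacks runs of 32 values with templated C++; this only shows the idea):
def unpack_batch(data, bit_width, num_values):
    # Unpack 'num_values' bit-packed integers of 'bit_width' bits each
    # from the byte string 'data', least-significant bit first.
    out, buf, bits = [], 0, 0
    mask = (1 << bit_width) - 1
    for byte in bytearray(data):
        buf |= byte << bits
        bits += 8
        while bits >= bit_width and len(out) < num_values:
            out.append(buf & mask)
            buf >>= bit_width
            bits -= bit_width
    return out

assert unpack_batch(b'\x88\xc6\xfa', 3, 8) == [0, 1, 2, 3, 4, 5, 6, 7]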
Also removes an older benchmark that isn't too interesting (since
the batch-oriented approach to encoding and decoding is so much
faster than the value-by-value approach).
Testing:
* Ran core tests.
* Updated unit tests to exercise new code.
* Added test coverage for the deprecated bit-packed level encoding to
verify that it still works (there was no coverage previously).
Perf:
Single-node benchmarks showed a few % performance gain. 16 node cluster
benchmarks only showed a gain for TPC-H nested.
Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6
Reviewed-on: http://gerrit.cloudera.org:8080/8267
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Testing:
Previously I needed ~20 iterations to get the test to fail on my local
machine. After these changes I haven't been able to reproduce the
failure.
Change-Id: I2bea7b0f770dec362a6df075da4e340402bd1d5d
Reviewed-on: http://gerrit.cloudera.org:8080/8562
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
IMPALA-5546 added the ability to create unpartitioned Kudu tables, but
when SHOW CREATE TABLE is run on such a table it still prints
'PARTITION BY', just without a partition clause. This patch removes the
'PARTITION BY' from the output.
Testing:
- Added test that runs SHOW CREATE on an unpartitioned Kudu table.
Change-Id: Icc327266cfb8b5c05efec97348528cea6904bb20
Reviewed-on: http://gerrit.cloudera.org:8080/8506
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Extends the parquet column reader and associated classes to allow more
than one possible physical type for a given logical type. This patch
only adds support for variable-sized byte-array-encoded decimals;
more will be added in upcoming commits.
Also, column-level metadata verification, which was previously being
done per row group, will now only be done once per column per file.
Testing:
Added backend test for verifying newly added decimal types are decoded
correctly.
Added a query test that decodes both plain and dictionary-encoded
decimals using binary encoding.
Performance:
Initial perf testing using tpcds_1000 shows no regression.
Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
Reviewed-on: http://gerrit.cloudera.org:8080/7822
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins
test_wait_time has been flaky recently on ASAN due to hitting a
timeout. The fix is to increase the timeout for ASAN builds.
Change-Id: Iee005bee8e0a535ce59d2e23e56be6004f2eb9de
Reviewed-on: http://gerrit.cloudera.org:8080/8427
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
We don't know the root cause yet but try to improve things:
* Eliminate one possible cause of flakiness - unfinished fragments left
from previous queries.
* Print a profile if an assertion fails so we can see why it failed.
Testing:
Ran core tests.
Change-Id: Ic332dddd96931db807abb960db43b99e5fd0f256
Reviewed-on: http://gerrit.cloudera.org:8080/8403
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
'Test case 16' in test_row_filters has been failing occasionally on
ASAN as the runtime filters are not generated within the specified
RUNTIME_FILTER_WAIT_TIME_MS. The fix is to increase
RUNTIME_FILTER_WAIT_TIME_MS.
This patch updates all of the tests in test_row_filters to use the
same timeout, which is set to a higher value for ASAN builds.
Change-Id: Ia098735594b36a72f02bf7edd051171689618051
Reviewed-on: http://gerrit.cloudera.org:8080/8358
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Testing:
Added test case to verify that CopyRows in select node is successfully
codegened.
Improved test coverage for select node with limit.
Performance:
Queries used (num_nodes set to 1):
500 Predicates: select * from (select * from tpch_parquet.lineitem
limit 6001215) t1 where l_partkey > 10 and l_extendedprice > 10000 and
l_linenumber > 1 and l_comment >'foo0' .... and l_comment >'foo500'
order by l_orderkey limit 10;
1 Predicate: select * from (select * from tpch_parquet.lineitem
limit 6001215) t1 where l_partkey > 10 order by l_orderkey limit 10;
+--------------+-----------------------------------------------------+
| | 500 Predicates | 1 Predicate |
| +------------+-------------+------------+-------------+
| | After | Before | After | Before |
+--------------+------------+-------------+------------+-------------+
| Select Node | 12s385ms | 1m1s | 234ms | 797ms |
| Codegen time | 2s619ms | 1s962ms | 200ms | 181ms |
+--------------+------------+-------------+------------+-------------+
Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e
Reviewed-on: http://gerrit.cloudera.org:8080/8196
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins
Main source for TPCDS query and result definitions: https://github.com/gregrahn/tpcds-kit.
TPC-DS v2.5.0 qualification queries from G. Rahn, Cloudera, Inc.
Data set constructed in mini-cluster using $IMPALA_HOME/buildall.sh -testdata....
This commit continues previous work on IMPALA-5376 in the ASF Impala repo
and the Cloudera Gerrit service.
This commit splits multi-query tests in the TPC-DS suite definition into one
query and result set per test file, as the test framework requires. Names for
such files have -1, -2... inner suffixes.
The portion of the TPC-DS test suite in this commit passes.
It contains no failures, as reflected by runs of
$IMPALA_HOME/tests/run-tests.py query_test/test_tpcds_queries.py ...
IMPALA-6007 addresses the TPC-DS cases that require skipping (because we don't
support them or they flap) or expected-failure (xfail, because we support them
but they fail due to bugs). These require some added tooling for non-Pytest
frameworks like the stress test to avoid attempting them until they work.
Tests that flap are marked to skip, with a bug ID, since they don't reliably pass or xfail.
Expected result sets come from the TPC-DS kit. Some TPC-DS test cases
in this commit have been modified in semantically neutral ways so as to pass
on Impala.
The tests/query_test/test_tpcds_queries.py driver file is authoritative for the
active/skip/xfail status for each case and a brief reason. The following list
describes the current status as:
--- test-name
deviance from TPC-DS spec
changes made
--- tpcds-q22a.test
RESULT MISMATCH in LSD of AVG() values
FIXED, HAND_ROUNDED AVG() VALUES IN RESULT SET
--- tpcds-q26.test
RESULT MISMATCH in LSD of AVG() values
ABSENT, IMPALA-6087
--- tpcds-q28.test
RESULT MISMATCH in LSD of AVG() values
ABSENT, IMPALA-6087
--- tpcds-q30.test
UNRECOGNIZED CHARACTER
ABSENT, IMPALA-5961.
--- tpcds-q31.test
RESULT MISMATCH in LSD of DECIMAL values
ABSENT, IMPALA-5956.
--- tpcds-q35a.test
RESULT MISMATCH
ABSENT, IMPALA-5950.
--- tpcds-q36a.test
RESULT MISMATCH
ABSENT, IMPALA-4741
--- tpcds-q47.test
RESULT MISMATCH in LSD of DECIMAL values
ABSENT, IMPALA-6087
--- tpcds-q48.test
RESULT MISMATCH in scalar value
ABSENT, IMPALA-5950.
--- tpcds-q49.test
RESULT MISMATCH in LSD of DECIMAL values
ABSENT, IMPALA-5945
--- tpcds-q57.test
RESULT MISMATCH, excess scale in DECIMAL values
ABSENT, IMPALA-6087
--- tpcds-q58.test
RESULT MISMATCH in DECIMAL values
ABSENT, IMPALA-5946
--- tpcds-q59.test
RESULT MISMATCH, excess scale in DECIMAL values
ABSENT, IMPALA-6087
--- tpcds-q61.test
RESULT MISMATCH in DECIMAL value
FIXED. CAST RESULT QUOTIENT TO DECIMAL(15, 4), TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q63.test
RESULT MISMATCH, excess scale in DECIMAL values
ABSENT, IMPALA-6087
--- tpcds-q64.test
RESULT MISMATCH
ADDED ORDER BY COLUMNS.
--- tpcds-q66.test
RESULT MISMATCH
ABSENT, IMPALA-4741
--- tpcds-q77a.test
RESULT MISMATCH
FIXED. TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q78.test
RESULT MISMATCH
FIXED. TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q83.test
RESULT MISMATCH
ABSENT, IMPALA-5945.
--- tpcds-q85.test
MISSING TABLE "reason"
ABSENT, IMPALA-5960
--- tpcds-q86a.test
RESULT MISMATCH
FIXED. TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q89.test
RESULT MISMATCH, DECIMAL values flap
ABSENT, ADDED ROUND(2) TO 8th COLUMN, TAKE ACTUAL RESULTS AS EXPECTED, IMPALA-5956.
--- tpcds-q90.test
RESULT MISMATCH
ABSENT, IMPALA-5945.
--- tpcds-q93.test
MISSING TABLE "reason"
ABSENT, IMPALA-5960
--- tpcds-q98.test
RESULT MISMATCH
FIXED, ADDED ROUND() TO LAST COLUMN
Change-Id: I6e284888600a7a69d1f23fcb7dac21cbb13b7d66
Reviewed-on: http://gerrit.cloudera.org:8080/8102
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
This patch adds an always_false flag in bloom filters. The flag is set
if nothing has been inserted into the bloom filter. HdfsScanner uses
this flag to early terminate the scan at file and split granularities.
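A minimal sketch of how the flag short-circuits lookups (not the actual
C++ code):
class BloomFilter(object):
    def __init__(self, num_bytes):
        self.directory = bytearray(num_bytes)
        self.always_false = True  # nothing inserted yet

    def insert(self, h):
        self.always_false = False
        # Simplified: the real filter sets several bits per insert.
        self.directory[h % len(self.directory)] |= 1

    def find(self, h):
        if self.always_false:
            # Lets HdfsScanner skip whole files and splits early.
            return False
        return bool(self.directory[h % len(self.directory)] & 1)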
Testing: It passes existing tests. Two test cases are added checking
that an always-false runtime filter can filter out files and splits.
In single node perf tests, time spent on primitive_empty_build_join_1
is reduced by 75%.
Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Reviewed-on: http://gerrit.cloudera.org:8080/8170
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
A recent commit, "IMPALA-5448: fix invalid number of splits reported in
Parquet scan node", neglected to account for the fact that in some
environments Impala runs without Hive. The typical pattern for tests
that use Hive is to skip them if they are executed against such
environments.
Change-Id: I3ad4b72839f8ac3bcb824287d02dd6964eea3e3e
Reviewed-on: http://gerrit.cloudera.org:8080/8259
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
Parquet splits with multiple columns are marked as completed using
HdfsScanNodeBase::RangeComplete(), which counted the file type once per
column codec type. Thus the reported number of parquet splits was the
real count multiplied by the number of materialized columns.
Furthermore, the Parquet definition allows mixed compression codecs on
different columns. This is handled in this patch as well. A parquet file
using both gzip and snappy compression codecs will be reported as:
FileFormats: PARQUET/(GZIP,SNAPPY):1
This patch introduces a set of compression types for the above cases.
Testing:
Added end-to-end tests covering parquet files with all columns compressed
with snappy, and parquet files with multiple compression codecs.
Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1
Reviewed-on: http://gerrit.cloudera.org:8080/8147
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
test_scanners_fuzz.py currently tests compressed parquet but
does not test uncompressed parquet. This fix adds a new test
case for uncompressed parquet.
Testing
-------
Ran query_test/test_scanners_fuzz.py in a loop (5 times)
and no impalad crash was seen.
Change-Id: I760de7203a51cf82b16016fa8043cadc7c8325bc
Reviewed-on: http://gerrit.cloudera.org:8080/8056
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Impala tries to always store column names in lower case. As part of a
cleanup of issues related to upper case Kudu column names, a check was
added in Analyzer to enforce this.
The check fails when doing star expansion on a struct to select all
fields in the case where a table was created in Hive with upper case
letters in a struct field name. This happens because Hive does not
convert struct field names to all lower case in HMS.
The solution is to force StructField names to lower case.
Testing:
- Added a test in test_nested_types.py
- Fixed FE test that expected struct field to be output in upper case.
Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e
Reviewed-on: http://gerrit.cloudera.org:8080/8169
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
No tests were added/dropped or modified. They are consolidated into
fewer .test files.
Change-Id: Idda4b34b5e6e9b5012b177a4c00077aa7fec394c
Reviewed-on: http://gerrit.cloudera.org:8080/8153
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
If a scan range is skipped at runtime, the scan node skips reading
the range and never figures out the underlying compression codec used
to compress the files. In such a scenario we default the compression
codec to NONE, which can be misleading. This change marks these files
as filtered in the scan node profile, e.g.:
File Formats: TEXT/NONE:364 TEXT/NONE(Skipped):1460
Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Reviewed-on: http://gerrit.cloudera.org:8080/7245
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
This patch keeps test_rows_availability from randomly failing. In this test
the time interval between the 'Rows available' timeline event and the
previous event in the runtime profile is measured in order to make sure
that the rows become available after a specific amount of time. This
measurement is not correct since the previous event is that the
coordinator finished sending the query fragments to the backends, which
means the execution on some backends might have already started. This
patch tracks another event "Ready to start" as the beginning of the time
interval instead. The coordinator begins to send the query fragments to
the backends after this event so the time check should always pass.
Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3
Reviewed-on: http://gerrit.cloudera.org:8080/8036
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Having the repetition level set to REPEATED on the root schema
caused a scan to fail with an error when Impala tried to parse the
table.
As a solution, the 'REPEATED' repetition level is ignored when the
root schema is processed. The reasoning behind this is that the Parquet
format description says that the repetition level of the root schema
should not be set to REPEATED anyway, so it's safe to ignore it in
case it is set to this value for some reason.
Change-Id: I7ea84589e1d122ad9d43adde46893ec0ecc5f9c4
Reviewed-on: http://gerrit.cloudera.org:8080/7870
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
The concurrent test driver did not pick them up because the name
prefix did not match the workload dirname. The query test driver
used a hardcoded prefix.
Testing done: Ran tests/stress/concurrent_select.py,
tests/query_test/test_tpch_nested_queries.py locally; latter
passed, former hit IMPALA-5855 after correctly locating all 22
new tpch_nested query files.
Change-Id: Ie067b201ae20b4f4c61a98be7ac1ec5a3f8febd8
Reviewed-on: http://gerrit.cloudera.org:8080/7891
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins
Add a targeted test that confirms that setting the query option will
force spilling.
Testing:
Ran test_spilling locally.
Change-Id: Ida6b55b2dee0779b1739af5d75943518ec40d6ce
Reviewed-on: http://gerrit.cloudera.org:8080/7809
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
We check that the number/types of columns in a Kudu DML match the
underlying table during analysis. However, it's possible that the
schema may be modified between analysis and execution, and if it's
modified in incompatible ways it can cause Impala to crash.
Once the KuduTable object has been opened by the KuduTableSink, its
schema will remain the same, so we can check in Open() that the schema
is what we're expecting.
If the schema changes between Open() and when the WriteOp is sent to Kudu,
Kudu will send back an error, which we already handle gracefully.
Testing:
- Added an e2e test that concurrently inserts into a Kudu table while
dropping and then adding a column. It relies on timing, but running
in a loop locally it caused Impala to crash every time without this
change.
Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
Reviewed-on: http://gerrit.cloudera.org:8080/7688
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
* Test for disable_unsafe_spills
* Test for buffer size > I/O size (--read_size)
Change-Id: I03de00394bb6bbcf381250f816e22a4b987f1135
Reviewed-on: http://gerrit.cloudera.org:8080/7787
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
IMPALA-4672: Part 2 regressed NAAJ by tightening up the spilling
invariants (e.g. can't unpin with spilling disabled) but we
didn't have tests for spilling NAAJs that could detect the
regression. This patch adds those tests, fixes the regressions,
and improves NAAJ by reliably spilling the probe side and not
trying to bring the whole probe side into memory.
The changes are:
* All null-aware streams start off in memory and are only unpinned if
spilling is enabled.
* The null-aware build partition can be spilled in the same way as hash
partitions.
* Probe streams are unpinned whenever there is memory pressure - if
spilling is enabled and either a build partition is spilled or
appending to the probe stream fails.
* Spilled probe streams are not re-pinned in EvaluateNullProbe().
Instead we just iterate over the rows of the stream.
Testing:
Add query tests where the three different buckets of rows are large
enough to spill: the build and probe of the null-aware partition and the
null probe rows.
Test both spilling and in-memory (with spilling disabled) cases.
Change-Id: Ie2e60eb4dd32bd287a31479a6232400df65964c1
Reviewed-on: http://gerrit.cloudera.org:8080/7367
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This separation will help the user better understand the query
runtime profile.
Testing:
Modified an existing test case.
Change-Id: Ibfc7832963fa0bd278a45c06a5a54e1bf40d8876
Reviewed-on: http://gerrit.cloudera.org:8080/7721
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Adds support for a "max_row_size" query option that instructs Impala
to reserve enough memory to process rows of the specified size. For
spilling operators, the planner reserves enough memory to process
rows of this size. The advantage of this compared to simply
specifying larger values for min_spillable_buffer_size and
default_spillable_buffer_size is that operators may be able to
handle larger rows without increasing the size of all their
buffers.
The default value is 512KB. I picked that number because it doesn't
increase minimum reservations *too* much even with smaller buffers
like 64kb but should be large enough for almost all reasonable
workloads.
This is implemented in the aggs and joins using the variable page size
support added to BufferedTupleStream in an earlier commit. The synopsis
is that each stream requires reservation for one default-sized page
per read and write iterator, and temporarily requires reservation
for a max-sized page when reading or writing larger pages. The
max-sized write reservation is released immediately after the row
is appended and the max-size read reservation is released after
advancing to the next row.
The sorter and analytic simply use max-sized buffers for all pages
in the stream.
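For example, a session expecting unusually wide rows might run (the
value syntax here is illustrative):
  SET MAX_ROW_SIZE=1mb;
before the query, so the spilling operators reserve buffers large
enough for such rows.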
Testing:
Updated existing planner tests to reflect default max_row_size. Added
new planner tests to test the effect of the query option.
Added "set" test to check validation of query option.
Added end-to-end tests exercising spilling operators with large rows
with and without spilling induced by SET_DENY_RESERVATION_PROBABILITY.
Change-Id: Ic70f6dddbcef124bb4b329ffa2e42a74a1826570
Reviewed-on: http://gerrit.cloudera.org:8080/7629
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Make the call to format() compatible with older versions of Python
(< 2.7), which expect explicit indices in the string being formatted,
e.g. "{0} {1} {2}".format('foo', 'bar', 'baz'). Without the numbers,
format() raises an exception.
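For example:
# Works on Python 2.6 and later:
ok = "{0} {1} {2}".format('foo', 'bar', 'baz')

# Raises "ValueError: zero length field name in format" on Python 2.6:
bad = "{} {} {}".format('foo', 'bar', 'baz')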
Tested by running this test suite using python 2.6.6. Before the
patch, the tests failed. After the patch, they pass.
Change-Id: I5384aaf83a6a1f3c7643ed9f15de2dba1a5913a5
Reviewed-on: http://gerrit.cloudera.org:8080/7761
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins
Rejects queries during admission control if (see the sketch after this
list):
* the largest (across all backends) min buffer reservation is
greater than the query mem_limit or buffer_pool_limit
* the sum of the min buffer reservations across the cluster
is larger than the pool max mem resources
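A sketch of the two rejection checks (simplified pseudo-logic, not the
actual code):
def admission_check(min_reservation_per_backend, mem_limit,
                    buffer_pool_limit, pool_max_mem_resources):
    largest = max(min_reservation_per_backend.values())
    total = sum(min_reservation_per_backend.values())
    for limit in (mem_limit, buffer_pool_limit):
        if limit is not None and largest > limit:
            return "REJECTED: min reservation exceeds query memory limit"
    if pool_max_mem_resources is not None and total > pool_max_mem_resources:
        return "REJECTED: sum of min reservations exceeds pool max mem"
    return None  # admitted, subject to the other existing checks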
There are some other interesting cases to consider later:
* every per-backend min buffer reservation is less than the
associated backend's process mem_limit; the current
admission control code doesn't know about other backends'
proc mem_limits.
Also reduces minimum non-reservation memory (IMPALA-5810).
See the JIRA for experimental results that show this
slightly improves min memory requirements for small queries.
One reason to tweak this is to compensate for the fact that
BufferedBlockMgr didn't count small buffers against the
BlockMgr limit, but BufferPool counts all buffers against
it.
Testing:
* Adds new test cases in test_admission_controller.py
* Adds BE tests in reservation-tracker-test for the
reservation-util code.
Change-Id: Iabe87ce8f460356cfe4d1be4d7092c5900f9d79b
Reviewed-on: http://gerrit.cloudera.org:8080/7678
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
In the text scanner, we were writing the partial tuple variable-length
data to the data_buffer_pool_ mempool, which caused strange behavior,
such as incorrect results.
If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.
We fix the problem by writing the varlen partial tuple data to
boundary_pool, which is where the constant length partial tuple data is
written. We also make sure that boundary pool does not hold any tuple
data of returned batches by always deep copying it to output batches.
Testing:
- Ran some tests locally on ASAN build.
- Updated test_scanners_fuzz.py to make slightly more significant
changes to the data files. This change was helpful for finding
issues while developing this patch.
Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Reviewed-on: http://gerrit.cloudera.org:8080/7639
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins
When an in-memory blocking aggregation or join is in the GetNext()
phase, where it is outputting accumulated rows, we expect
memory consumption to monotonically decrease because no more
rows will be accumulated in memory.
This change adds support to release unused reservation and makes
use of it for in-memory aggregations and sorts.
We don't release memory for operators with spilled data, since they
may need the reservation to bring it back into memory. We also
don't release memory in subplans, since it will probably be used
in a later iteration of the subplan.
Testing:
Updated spilling test that now requires less memory.
Ran stress test binary search on tpch_parquet. No changes, except
Q18 now requires 325MB instead of 450MB to execute without spilling.
Ran query with two sorts in the same pipeline and watched /memz to
confirm that the first node in the pipeline was incrementally releasing
memory. Added a regression test based on this experiment.
Added a backend test to directly test reservation decreasing.
Change-Id: I6f4d0ad127d5fcd14b9821a7c127eec11d98692f
Reviewed-on: http://gerrit.cloudera.org:8080/7619
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This reduces the scratch limit to the same value as used in
TestScratchDisk.
Change-Id: If5c42b6ded44d86c3a430a983096f14c0b88a287
Reviewed-on: http://gerrit.cloudera.org:8080/7664
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Impala Public Jenkins
Revert commit 3059024bd8 for
IMPALA-4795: Allow fetching function obj from catalog using
signature
This commit seems to cause TestUdfExecution.test_java_udfs
to fail periodically.
IMPALA-4795 wasn't a critical fix, so let's just revert it
until we know we can fix the flaky test at the same time.
Change-Id: Iae56a75e8ec44af6dae50f18869a486e5f8b608c
Reviewed-on: http://gerrit.cloudera.org:8080/7616
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Impala Public Jenkins
This adds most of the end-to-end tests described in the test plan.
See http://goo.gl/v3Strz.
* End-to-end test for disk spill encryption.
* Admission control test for the case when acquiring initial
reservation fails.
* Initial reservation acquire failure test
* scratch_limit tests for Join, Agg, Sort, Analytic
* Memory usage scaling tests for Join, Agg, Sort, Analytic
Also splits out the slow sort queries in test_spilling and moves them
to exhaustive so the individual tests run faster and have better
parallelism.
Testing:
Ran all the core tests. Will do a full exhaustive run before
committing.
Change-Id: I554aa5ddfef4f8e75295596e720a14eee1afa17f
Reviewed-on: http://gerrit.cloudera.org:8080/7552
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Add debug action to deny reservation increases with some probability.
This allows us to test various scenarios, particularly:
* The case when the node only gets its initial reservation and must
run to completion without increasing its reservation.
* The case when there is some memory pressure and the node sometimes
gets a reservation increase and sometimes doesn't.
E.g. to deny all reservation requests after an ExecNode has opened:
set debug_action=-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0
This was applied to test_spilling. It caught a bug in the PAGG
with spilling string aggregations.
This required some minor extensions to the debug actions.
* Allow debug actions that apply to all ExecNodes if node_id is -1.
* Allow passing parameters to debug actions. The current grammar of the
actions is not well-oriented towards extension, so I resorted to using
@ as a new delimiter.
I also optimised ExecDebugAction() so that it is much faster in the
common case and extended --disable_mem_pools to prevent the buffer pool
from holding onto unused buffers.
Change-Id: Ied39bb091b12156e5dc61b528c6c0cd8de3fe657
Reviewed-on: http://gerrit.cloudera.org:8080/7022
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins