impala

mirror of https://github.com/apache/impala.git synced 2025-12-30 12:02:10 -05:00

Author	SHA1	Message	Date
Jinchul	99962d2e81	IMPALA-4168: Adds Oracle-style hint placement for INSERT/UPSERT Allow to specify Oracle-style hint on INSERT/UPSERT statements. For example, - insert /* +noshuffle */ into table functional.alltypes partition(year, month) select * from functional.alltypes; - upsert /* +noshuffle */ into functional_kudu.alltypes select * from functional.alltypes; Testing: Add unit tests to ParserTest#TestPlanHints Add plan check tests to PlannerTest#testInsert, PlannerTest#testKuduUpsert Add tests to ToSqlTest#planHintsTest Change-Id: Ied7629d70197a0270cdc0853e00cc021fdb4dc20 Reviewed-on: http://gerrit.cloudera.org:8080/8676 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-10 03:03:49 +00:00
Tim Armstrong	d3ff67b8b3	IMPALA-6370: fix partitioned parquet tables with nested types When materialising a nested collection, has_template_tuple() should use the template tuple for the collection, not the top-level tuple. Testing: Added tests based on nested-types-basic.test that operate on a simple partitioned table. The tests reliably crashed Impala before the fix. Change-Id: Ic808b824ce3b31af0539036d8ca23d17b18deab4 Reviewed-on: http://gerrit.cloudera.org:8080/8947 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-05 20:44:21 +00:00
Thomas Tauber-Marshall	96b976aff3	IMPALA-6295: Fix min/max handling of 'nan' and 'inf' This patch fixes several issues related to the min/max aggregate functions and their handling of 'nan' and 'inf': - Previously, if 'inf' or '-inf' was the only value for the min/max and codegen was being used, the result would be incorrect. This occurred, for example in the case of 'inf' and 'min', because we set an initial value of numeric_limits::max, which is less than 'inf', so the returned min was numeric_limits::max when it should be 'inf'. The fix is to set the initial value to numeric_limits::infinity. - Previously, if one of the values was 'nan', the result of min/max was non-deterministic depending on the order the values were evaluated in. This occurs because 'nan' < or > 'any value' is always false, so if the first value added was 'nan', all other comparisons would be false and 'nan' would be returned, whereas if the first value wasn't 'nan' then the 'nan' wouldn't be returned. The fix is to treat 'nan' specially and to always return 'nan' if there is a single 'nan' value. Testing: - Added e2e tests for both scenarios, as well as adding a little extra nan/inf coverage for other aggregate functions. Change-Id: Ia1e206105937ce5afc75ca5044597d39b3dc6a81 Reviewed-on: http://gerrit.cloudera.org:8080/8854 Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-04 01:23:43 +00:00
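The two rules described in IMPALA-6295 (start the min at infinity, and let any 'nan' win) can be sketched in a few lines. This is only an illustration in Python, not the Impala backend code; the function name `min_agg` is hypothetical.

```python
import math

def min_agg(values):
    """Illustrative min aggregate over floats (hypothetical helper, not Impala code)."""
    # Start from +inf: starting from the largest finite float would hide a lone 'inf'.
    # (A real SQL engine would return NULL for empty input; this sketch returns +inf.)
    result = math.inf
    for v in values:
        if math.isnan(v):
            # Any 'nan' input makes the result 'nan', independent of evaluation order.
            return math.nan
        if v < result:
            result = v
    return result
```

With this shape, `min_agg([1.0, math.nan])` and `min_agg([math.nan, 1.0])` both yield 'nan', which removes the order dependence the commit message describes.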
Bikramjeet Vig	545163bb0a	IMPALA-5929: Remove redundant explicit casts to string This patch adds a query rewriter to remove redundant explicit casts to a string type (string, char, varchar) from binary predicates of the form "cast(<non-const expr> to <string type>) <eq/ne op> <string constant>". The cast is redundant if the predicate evaluation is the same even if the cast is removed and the constant is converted to the original type of the expression. For example: cast(int_col as string) = '123456' -> int_col = 123456 Performance: For the following query on a table having 6001215 records - select * from tpch.lineitem where cast(l_linenumber as string) = '0' Scan time (avg, st dev): without rewrite 1s406ms, 44ms; with rewrite 1s099ms, 28ms. Testing: - Added unit tests to ExprRewriteRulesTest - Added functional test to expr.test - Current FE planner tests and BE expr-test run successfully with this change. Change-Id: I91b7c6452d0693115f9b9ed9ba09f3ffe0f36b2b Reviewed-on: http://gerrit.cloudera.org:8080/8660 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-03 01:15:42 +00:00
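One way to picture the redundancy condition in IMPALA-5929 for an integer column is a round-trip check on the string constant. The sketch below is an assumption about how such a check could look, not the rule implemented in Impala's rewriter; `rewrite_cast_eq` is a hypothetical name.

```python
def rewrite_cast_eq(const_str):
    """Return the int constant for `int_col = <const>` if the cast looks redundant, else None.

    Hypothetical helper: assumes the cast is safe to drop only when the string
    constant is exactly what casting the integer back to a string would produce.
    """
    try:
        value = int(const_str)
    except ValueError:
        return None          # e.g. 'abc': no integer can match, keep the original predicate
    if str(value) != const_str:
        return None          # e.g. '0123' or '+5': the string form differs, so do not rewrite
    return value
```

For example, `rewrite_cast_eq('123456')` gives `123456`, while `'0123'` is rejected because no integer casts to that exact string.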
Alex Behm	1f7b3b00e9	IMPALA-5310: Part 3: Use SAMPLED_NDV() in COMPUTE STATS. Modifies COMPUTE STATS TABLESAMPLE to use the new SAMPLED_NDV() function. Testing: - modified/improved existing functional tests - core/hdfs run passed Change-Id: I6ec0831f77698695975e45ec0bc0364c765d819b Reviewed-on: http://gerrit.cloudera.org:8080/8840 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-16 04:58:59 +00:00
Taras Bobrovytsky	7256fcefb4	IMPALA-6284: Mark the intermediate decimal avg struct as packed We saw some failures on the exhaustive release build because the compiler assumed that the pointer to the intermediate struct that is used for computing decimal average was aligned. To fix the problem, we mark the struct with a "packed" attribute so that the compiler does not expect it to be aligned. Testing: - Ran the failing test locally on a release build and it passed. Change-Id: Id25ec6e20dde3f50fb37a22135b355ad251809e0 Reviewed-on: http://gerrit.cloudera.org:8080/8836 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-16 03:26:43 +00:00
Thomas Tauber-Marshall	3f1f706393	IMPALA-6297: Don't partition/sort for DML on unpartitioned Kudu table Impala partitions and sorts rows according to the target table's partitioning scheme before inserting them into Kudu in order to improve the performance of large inserts. A recent change added the ability to create unpartitioned Kudu tables, but Impala still does the partitioning/sorting for them even though it's wasted work. This patch modifies the planner to not add the partition/sort for Kudu inserts if the table is unpartitioned, unless the clustered/shuffle hints are used. It also removes the exchange in the case where the partition exprs are all constant. Testing: - Added planner tests for inserting into an unpartitioned Kudu table, with and without hints, and for when the partition exprs are constant. - Ran the existing correctness tests for inserts into unpartitioned Kudu tables in kudu_create.test Change-Id: I3e01a7dd5284767a25df3218656746a5d0ee4632 Reviewed-on: http://gerrit.cloudera.org:8080/8810 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-16 03:06:19 +00:00
stiga-huang	5c593be59c	IMPALA-6301: Fix test failures when username or group name contains dots Some tests use the local user's group name to construct SQLs, which may lead to syntax errors when group name contains dots. We need to quote the group names in SQL to avoid this error. Besides, a test in test_admission_controller uses '\w+' to match the local user name. This expression cannot match usernames with dots, which causes test failure as well. Instead, we should use '\S+'. Change-Id: Ib8ae15bb6a929dc48d3ad2176c8b3fafff87f32b Reviewed-on: http://gerrit.cloudera.org:8080/8807 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-13 23:06:45 +00:00
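The regex part of IMPALA-6301 is easy to reproduce with Python's `re` module. The user name `jane.doe` below is made up for the illustration.

```python
import re

def matches_whole(pattern, text):
    """True if the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None

user = "jane.doe"  # hypothetical local user name containing a dot

# \w covers letters, digits and underscore only, so the dot breaks the match.
word_match = matches_whole(r"\w+", user)
# \S covers any non-whitespace character, so the dotted name matches.
nonspace_match = matches_whole(r"\S+", user)
```

Here `word_match` is `False` and `nonspace_match` is `True`, which is why the test pattern was switched to `\S+`.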
Jinchul	4feb4f3a54	IMPALA-5754: Improve randomness of rand()/random() The current implementation of the rand/random built-in functions uses rand_r from the C library. We found its randomness to be poor. pcg32 from a third-party library shows better randomness than rand_r. Testing: Revise unit test in expr-test Add E2E test to random.test Change-Id: Idafdd5fe7502ff242c76a91a815c565146108684 Reviewed-on: http://gerrit.cloudera.org:8080/8355 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2017-12-13 10:04:40 +00:00
Zach Amsden	245df3c69a	IMPALA-6245: Tolerate column indenting from Hive The fix for HIVE-3140 started indenting multi-line comments, which breaks Impala testing when run against Hive 2.1.1. To test this using the pure test runner proved difficult since it would require extensive changes to support both row_regexes (since the columns changed order) and subset support (since the number of rows changed). Instead, we manually verify the hints are present in the output in the python test. The fact that the hints have been reformatted leaves us in an uncertain state as to whether they actually get applied, so a new test case has been added to run EXPLAIN SELECT on the view and verify the joins happen exactly as we expect. Testing: Ran the views-ddl test against Impala mini-cluster setups using both Hive 2.1.1 and Hive 1.1.0 Change-Id: I49e53b1230520ca6e850af28078526e6627d69de Reviewed-on: http://gerrit.cloudera.org:8080/8719 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-12 00:17:56 +00:00
Thomas Tauber-Marshall	2e83ba5796	IMPALA-6280: Materialize TupleIsNullPredicate for insert sorts When a sort is inserted into a plan for an INSERT due to either the target table being a Kudu table or the use of the 'clustered' hint, and a TupleIsNullPredicate is present in the output of the sort, the TupleIsNullPredicate may reference an incorrect tuple (i.e. not the materialized sort tuple), leading to errors. The solution is to materialize the TupleIsNullPredicate into the sort tuple and then perform the appropriate expr substitutions, as is already done for the case of analytic sorts. Testing: - Added an e2e test with a query that would previously fail. Change-Id: I6c0ca717aa4321a5cc84edd1d5857912f8c85583 Reviewed-on: http://gerrit.cloudera.org:8080/8791 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-08 23:58:18 +00:00
Alex Behm	11497c2aa9	IMPALA-6286: Remove invalid runtime filter targets. If the target expression of a runtime filter evaluates to a non-NULL value for outer-join non-matches, then assigning the filter below the nullable side of an outer join may lead to incorrect query results. See IMPALA-6286 for an example and explanation. This patch adds a conservative check that prevents the creation of runtime filters that could potentially have such incorrect targets. Some safe opportunities are deliberately missed to keep the code simple. See RuntimeFilterGenerator#getTargetSlots(). Testing: - added planner tests which passed locally Change-Id: I88153eea9f4b5117df60366fad2bd91776b95298 Reviewed-on: http://gerrit.cloudera.org:8080/8783 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-08 23:11:36 +00:00
Thomas Tauber-Marshall	e94c60833a	IMPALA-6069: Fix CodegenAnyVal's handling of 'nan' Previously, CodegenAnyVal used an LLVM function for floating point comparison that considered 'nan' = 'nan' to be true. This is inconsistent with the way we handle 'nan' in the non-codegen path, where we consider 'nan' = 'nan' to be false, leading to inconsistent results. This patch fixes CodegenAnyVal to use an LLVM function for floating point comparison that considers 'nan' = 'nan' to be false. Testing: - Added e2e tests for the two scenarios affected by this: CASE and joins. Change-Id: I1bb8e5074b3c939927dedc46bc9db63ca24486a1 Reviewed-on: http://gerrit.cloudera.org:8080/8790 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-08 22:42:03 +00:00
Michael Ho	ed72910e96	IMPALA-6262: Always initialize runtime profile for DataSink This change moves the creation of the runtime profile from DataSink::Prepare() to the ctor of DataSink derived classes. This makes sure that DataSink::Close() and other functions can access the profile even if the DataSink fails to initialize. Testing done: Added a test case which triggers failure in the initialization of output expressions in a HdfsTableSink. Impalad crashed consistently without the fix. Change-Id: I2a683000ef180027b929dbebe78bc2a530a4767e Reviewed-on: http://gerrit.cloudera.org:8080/8770 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-07 09:47:09 +00:00
Tianyi Wang	c505a8159b	IMPALA-6210: Add query id to lineage graph logging Some tools use lineage graph logging to collect query metrics. Currently only query hash is present in this log. Adding query id into it makes such accounting easier. Testing: The equality of query id in the query profile and lineage log is checked in test_lineage.py. A test for TUniqueIdUtil is added to the FE tests. Change-Id: I4adbd02df37a234dbb79f58b7c46ca11a914229f Reviewed-on: http://gerrit.cloudera.org:8080/8589 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-06 00:52:19 +00:00
Vuk Ercegovac	16c5f514e0	IMPALA-6273: fixes subquery tests for functional_hbase IMPALA-1422 introduced tests that do not work with the testing setup for hbase. Namely, tinyinttable is not defined in the functional_hbase database, but is defined in the functional database. Exhaustive tests uncovered the issue. This change makes two changes so that tests work with functional_hbase: 1) use a table that is present in both functional and functional_hbase. the tests needed a subquery result with a single int column. tinyinttable is replaced with an inline view that provides this single int column in a portable manner. 2) nulls are handled differently with hbase (see IMPALA-728) so the nulltable used in the tests is set to functional.nulltable to avoid inconsistent results across input formats. Testing: - ran e2e tests with exhaustive exploration strategy for the broken test. Change-Id: Ibaa3a3df7362ac6d3ed07aff133dc4b3520bb4e0 Reviewed-on: http://gerrit.cloudera.org:8080/8765 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-06 00:41:11 +00:00
Vuk Ercegovac	633dbff71d	IMPALA-1422: support a constant on LHS of IN predicates. Currently, constant expressions for the LHS of the IN predicate are not supported. This patch adds this support as a rewrite in StmtRewriter (where subqueries are rewritten to joins). Since there is a nested-loop variant of left semijoin, support for IN is handled by not erring out. NOT IN is handled by a rewrite to corresponding NOT EXISTS predicate. Support for NOT IN with a correlated subquery is not included in this change. Re-organized the frontend subquery analysis tests to expand coverage. Testing: - added frontend subquery analysis tests - added e2e tests Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb Reviewed-on: http://gerrit.cloudera.org:8080/8322 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-02 04:09:05 +00:00
Taras Bobrovytsky	575b5a20e6	IMPALA-5017: Error on decimal overflow Before this patch, decimal operations would either silently overflow (in the case of sum() and avg()), or produce a warning. In this patch, the behaviour is changed so that an error is produced in the case of overflow when DECIMAL_v2 is enabled. Decimal v1 behaviour is unchanged. We introduce overflow checks when computing sum() and avg(). This results in a ~30% performance regression when we are in decimal v2 mode compared to decimal v1. Benchmarks: Query: select sum(dec_38_19) from decimal_tbl Decimal v1: 11.57s Decimal v2: 16.58s Query: select avg(dec_38_19) from decimal_tbl Decimal v1: 12.08s Decimal v2: 17.08s The performance regression is not as bad if we are computing the sum or average of decimal column with a lower precision: Query: select sum(dec_9_5) from decimal_tbl Decimal v1: 11.06s Decimal v2: 13.08s Query: select avg(dec_9_5) from decimal_tbl Decimal v1: 11.56s Decimal v2: 13.57s Testing: - Added several end to end tests. - Updated Expr tests to check for error in case of overflow. Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019 Reviewed-on: http://gerrit.cloudera.org:8080/8404 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-01 23:23:01 +00:00
Zach Amsden	24c2ba0cc5	IMPALA-6244: Fix test failures with Hadoop 3.0 The metadata query test fails when run against Hadoop 3.0 due to some defaults changing for sequence files. Testing: Compared the output of hadoop fs -text /test-warehouse/alltypesmixedformat/year=2009/month=2/000023_0 and verified it is the same after a data load on Hadoop 2.6 and Hadoop 3.0; ran the metadata query test and verified it now passes in both cases. Change-Id: I1ccffdb0f712da1feb55f839e8d87a30f15e4fb6 Reviewed-on: http://gerrit.cloudera.org:8080/8656 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-30 07:33:16 +00:00
Alex Behm	b3d8a507cb	IMPALA-5310: Add COMPUTE STATS TABLESAMPLE. Adds the TABLESAMPLE clause for COMPUTE STATS. Syntax: COMPUTE STATS <table> TABLESAMPLE SYSTEM(<number>) [REPEATABLE(<number>)] Computes and replaces the table-level row count and total file size, as well as all table-level column statistics. Existing partition-level row counts are not modified. The TABLESAMPLE clause can be used to limit the scanned data volume to a desired percentage. When sampling, the unmodified results of the COMPUTE STATS queries are sent to the CatalogServer. There, the stats are extrapolated before storing them into the HMS so as not to confuse other engines like Hive/SparkSQL which may rely on the shared HMS fields being accurate. Limitations - Only works for HDFS tables - TABLESAMPLE is not supported for COMPUTE INCREMENTAL STATS - TABLESAMPLE requires --enable_stats_extrapolation=true Changes to EXPLAIN The stored statistics from the HMS are more clearly displayed under a 'stored statistics' section. Example: 00:SCAN HDFS [functional.alltypes, RANDOM] partitions=24/24 files=24 size=478.45KB stored statistics: table: rows=7300 size=478.45KB partitions: 24/24 rows=7300 columns: all Testing: - added new functional tests - core/hdfs run passed Change-Id: I7f3e72471ac563adada4a4156033a85852b7c8b7 Reviewed-on: http://gerrit.cloudera.org:8080/8136 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-29 22:37:01 +00:00
Michael Ho	63f17e9cea	IMPALA-6187: Fix missing conjuncts evaluation with empty projection Previously, scanners assumed that there are no conjuncts associated with a scan node for queries with no materialized slots (e.g. count(*)). This is not necessarily the case as one can write queries such as select count(*) from tpch.lineitem where rand() * 10 < 0; or select count(*) from tpch.lineitem where rand() > <a partition column>. In which case, the conjuncts should still be evaluated once per row. This change fixes the problem in the short-circuit handling logic for count(*) to evaluate the conjuncts once per row and only commits a row to the output row batch if the conjuncts evaluate to true. Testing done: Added the example above to the scanner test Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517 Reviewed-on: http://gerrit.cloudera.org:8080/8623 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com> Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-29 05:53:15 +00:00
Jinchul	2fba80ee5e	IMPALA-5146: Fix inconsistent results at FROM_UNIXTIME() The FROM_UNIXTIME(epoch) and FROM_UNIXTIME(epoch, format) produce different results when epoch is out of range of TimestampValue. The former produces an empty string, while the latter gives NULL. The fix is to harmonize the results to NULL. Testing: Add unit tests to ExprTest.TimestampFunctions. Change-Id: Ie3a5e9a9cb39d32993fa2c7f725be44d8b9ce9f2 Reviewed-on: http://gerrit.cloudera.org:8080/8629 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-29 05:22:32 +00:00
Zoltan Borok-Nagy	4f11bed407	IMPALA-5936: operator '%' overflows on large decimals Suppose we have a large decimal number, which is greater than INT_MAX. We want to calculate the modulo of this number by 3: BIG_DECIMAL % 3 The result of this calculation can be 0, 1, or 2. This can fit into a decimal with precision 1. The in-memory representation of such small decimals is stored in int32_t in the backend. Let's call this int32_t the result type. The backend had the invalid assumption that it can do the calculation as well using the result type. This assumption is true for multiplying or adding numbers, but not for modulo. Now the backend selects the biggest type of ['return type', '1st operand type', '2nd operand type'] to do the calculation. Change-Id: I2b06c8acd5aa490943e84013faf2eaac7c26ceb4 Reviewed-on: http://gerrit.cloudera.org:8080/8574 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-28 21:11:45 +00:00
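The problem in IMPALA-5936 can be imitated with plain integers: if the operand is first squeezed into the 32-bit result type, the remainder comes out wrong. The sketch below simulates 32-bit wraparound and C-style remainder in Python; both helpers are hypothetical, and treating the narrowing as a two's-complement wrap is an assumption made for the illustration.

```python
def wrap_int32(x):
    """Reinterpret the low 32 bits of x as a signed 32-bit integer (assumed wraparound)."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

def c_rem(a, b):
    """Remainder with C semantics: the result takes the sign of the dividend."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r

big = 2**31 + 1                      # just above INT_MAX

narrow = c_rem(wrap_int32(big), 3)   # computed in the 32-bit result type: wrong
wide = c_rem(big, 3)                 # computed in a type wide enough for the operand
```

In this run `wide` is 0, the correct remainder, while `narrow` is -1 because the operand wrapped to a negative value before the modulo was taken.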
Thomas Tauber-Marshall	7dd28ff431	IMPALA-6201: Fix test_basic_filters on ASAN TestRuntimeFilters.test_basic_filters is flaky on ASAN as sometimes the runtime filters aren't received within the specified RUNTIME_FILTER_WAIT_TIME_MS. This patch increases the timeout for ASAN builds. Change-Id: I8c20cbb75a9b6da73137f220657aa75dea9dfdce Reviewed-on: http://gerrit.cloudera.org:8080/8646 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-28 03:01:39 +00:00
Gabor Kaszab	88cb68cfbe	IMPALA-2181: Add query option levels for display Four display levels are introduced for each query option: REGULAR, ADVANCED, DEVELOPMENT and DEPRECATED. When the query options are displayed in Impala shell using SET then only the REGULAR and ADVANCED options are shown. A new command called SET ALL shows all the options grouped by their option levels. When the query options are displayed through the SET SQL statement then the result set would contain an extra column indicating the level of each option. Similarly to Impala shell here the SET command only displays the REGULAR and ADVANCED options while SET ALL shows them all. If the Impala shell connects to an Impala daemon that predates this change then all the options would be displayed in the REGULAR group. Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1 Reviewed-on: http://gerrit.cloudera.org:8080/8447 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-28 00:31:15 +00:00
Vuk Ercegovac	21a96ed2e3	IMPALA-4985: use parquet stats of nested types for dynamic pruning Currently, parquet row-groups can be pruned at run-time using min/max stats when predicates (in, binary) are specified for column scalar types. This patch extends pruning to nested types for the same class of predicates. A nested value is an instance of a nested type (struct, array, map). A nested value consists of other nested and scalar values (as declared by its type). Predicates that can be used for row-group pruning must be applied to nested scalar values. In addition, the parent of the nested scalar must also be required, that is, not empty. The latter requirement is conservative: some filters that could be used for pruning are not used for correctness reasons. Testing: - extended nested-types-parquet-stats e2e test cases. Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe Reviewed-on: http://gerrit.cloudera.org:8080/8480 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-22 22:00:16 +00:00
Tianyi Wang	5e9b4e2fd2	IMPALA-5976: Remove equivalence class computation in FE Equivalent class is used to get the equivalencies between slots. It is ill-defined and the current implementation is inefficient. This patch removes it and directly uses the information from the value transfer graph instead. Value transfer graph is reimplemented using Tarjan's strongly connected component algorithm and BFS with adjacency lists to speed up on both condensed and sparse graphs. Testing: It passes the existing tests. In planner tests the equivalence between SCC-condensed graph and uncondensed graph is checked. A test case is added for a helper class IntArrayList. An outer-join edge case is added in planner test. On a query with 1800 union operations, the equivalence class computation time is reduced from 7m57s to 65ms and the planning time is reduced from 8m5s to 13s. Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a Reviewed-on: http://gerrit.cloudera.org:8080/8317 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-18 09:07:06 +00:00
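For readers unfamiliar with the algorithm named in IMPALA-5976, here is a compact recursive version of Tarjan's strongly connected components algorithm over an adjacency-list graph. It is a textbook sketch, not the frontend's implementation; the slot names and the reading "slots in one component transfer values to each other" are assumptions for the example.

```python
def tarjan_scc(graph):
    """graph: dict mapping node -> list of successors. Returns a list of SCCs."""
    index, low = {}, {}
    stack, on_stack = [], set()
    sccs = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop the stack down to v.
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs

# Hypothetical value transfers: a <-> b mutually, b -> c one way only.
components = tarjan_scc({"a": ["b"], "b": ["a", "c"], "c": []})
```

Here `a` and `b` end up in one component and `c` in its own, so condensing the graph merges only the slots that can reach each other in both directions.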
Thomas Tauber-Marshall	2510fe0aa0	IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter or a min-max filter, depending on if it has HDFS or Kudu targets, respectively. In RuntimeFilterGenerator in the planner, each hash join node generates a bloom and min-max filter for each equi-join predicate, but only those filters that end up being assigned to a target make it into the final plan. Min-max filters are only assigned to Kudu scans if the target expr is a column, as Kudu doesn't support bounds on general exprs, and only if the join op is '=' and not 'is distinct from', as Kudu doesn't support returning NULLs if a bound is set. Min-max filters are inserted into by the PartitionedHashJoinBuilder. Codegen is used to eliminate branching on the type of filter. String min-max filters truncate their bounds at 1024 chars, so that the max amount of memory used by min-max filters is negligible. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Functional Testing: - Added new planner tests and updated the old ones. (in old tests, a lot of runtime filters are renumbered as we always generate min-max filters even if they don't end up getting assigned and they take up some of the RF ids). - Updated existing runtime filter tests to work with Kudu. - Added e2e tests for min-max filter specific functionality. Perf Testing: - All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu, timings are averages of 3 runs. - Ran a contrived query with a filter that does not eliminate any rows (full self join of lineitem). The difference in running time was negligible - 24.46s with filters on, 24.15s with filters off for a ~1% slowdown. - Ran a contrived query with a filter that eliminates all rows (self join on lineitem with a join condition that never matches). The filters resulted in a significant speedup - 0.26s with filters on, 1.46s with filters off for a ~5.6x speedup. This query is added to targeted-perf. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Reviewed-on: http://gerrit.cloudera.org:8080/7793 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-17 21:33:51 +00:00
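The idea behind the min-max filters in IMPALA-4252 can be shown with a toy class: the build side of the join records the smallest and largest key, and the scan side drops rows outside that range. The class below is a hypothetical sketch, not Impala's or Kudu's API, and its treatment of NULLs and of an empty build side is an assumption.

```python
class MinMaxFilter:
    """Toy min-max runtime filter (hypothetical, for illustration only)."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def insert(self, value):
        """Called for each join key seen on the build side."""
        if value is None:
            return  # assumed: NULL keys never match an '=' join, so they set no bound
        if self.lo is None or value < self.lo:
            self.lo = value
        if self.hi is None or value > self.hi:
            self.hi = value

    def may_match(self, value):
        """Called on the scan side; False means the row cannot join."""
        if self.lo is None or value is None:
            return False  # assumed: empty build side or NULL key means no match
        return self.lo <= value <= self.hi

f = MinMaxFilter()
for key in [5, 7, 9]:
    f.insert(key)
kept = [k for k in [1, 5, 8, 10] if f.may_match(k)]
```

`kept` is `[5, 8]`: the value 8 passes even though it is not a build-side key, since a range filter only rules rows out and the join itself still does the exact comparison.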
Michael Ho	3ddafcd295	IMPALA-6184: Clean up after ScalarExprEvaluator::Clone() fails When ScalarExprEvaluator::Clone() fails, the newly created evaluator was not added to the output vector. This makes it impossible for callers to close and clean up the evaluators afterwards. This change fixes this by always adding the newly created evaluator to the output vector before checking for the error status. This path is only exercised in the scanner code. Two new tests are added to exercise the failure paths. Testing done: newly added tests in udf-errors.test Change-Id: I45ffd722d0a69ad05ae3c748cf504c7f1a959a1d Reviewed-on: http://gerrit.cloudera.org:8080/8572 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-17 02:24:48 +00:00
Tim Armstrong	ae116b5bf7	IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding Switch the decoders to using more batch-oriented interfaces. As an intermediate step this doesn't make the interfaces of LevelDecoder or DictDecoder batch-oriented, only the lower-level utility classes. The next step would be to change those interfaces to be batch-oriented and make corresponding optimisations in parquet. This could deliver much larger perf improvements than the current patch. The high-level changes are: * BitReader -> BatchedBitReader, which is built to unpack runs of 32 bit-packed values efficiently. * RleDecoder -> RleBatchDecoder, which exposes the repeated and literal runs to the caller and uses BatchedBitReader to unpack literal runs efficiently. * Dict decoding uses RleBatchDecoder to decode repeated runs efficiently and uses the BitPacking utilities to unpack and encode in a single step. Also removes an older benchmark that isn't too interesting (since the batch-oriented approach to encoding and decoding is so much faster than the value-by-value approach). Testing: * Ran core tests. * Updated unit tests to exercise new code. * Added test coverage for the deprecated bit-packed level encoding so that it still works (there was no coverage previously). Perf: Single-node benchmarks showed a few % performance gain. 16 node cluster benchmarks only showed a gain for TPC-H nested. Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6 Reviewed-on: http://gerrit.cloudera.org:8080/8267 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-16 21:23:09 +00:00
Jinchul	f9155f0d81	IMPALA-5341: Avoid unintended filter out in fe test Change-Id: Ie79f644a37b0ffab7b0d8e94e77650d56423697a Reviewed-on: http://gerrit.cloudera.org:8080/8543 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-16 09:01:37 +00:00
Taras Bobrovytsky	d1b92c8b52	IMPALA-6183: Fix Decimal to Double conversion When converting a decimal to a double, we incorrectly used the powf() function in the backend, which returns a float instead of a double. This caused us to lose precision. We fix the problem by replacing the powf() function with a pow() function, which returns a double. Testing: - Added an EE test. Change-Id: I9bf81d039e5037f22c64a32b328832235aafe9e3 Reviewed-on: http://gerrit.cloudera.org:8080/8547 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-15 02:54:53 +00:00
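The precision loss described in IMPALA-6183 shows up as soon as a power of ten no longer fits in single precision. The sketch below rounds a double to float with `struct` to imitate what a float-returning `powf()` would hand back; dividing the unscaled value by that power is an assumed shape for the conversion, not a copy of the backend code.

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Hypothetical decimal 123456.78901234567 stored as an unscaled integer plus a scale.
unscaled, scale = 12345678901234567, 11

divisor_double = 10.0 ** scale               # what pow() returns: exactly 1e11
divisor_float = to_float32(10.0 ** scale)    # what powf() can represent: 99999997952.0

as_double = unscaled / divisor_double
as_float_pow = unscaled / divisor_float
```

`as_double` is close to 123456.789, while `as_float_pow` is off in the third decimal place, because 1e11 is not exactly representable in 32-bit floating point.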
Tianyi Wang	73d1bc3ad4	IMPALA-2281: Replace FNV with FastHash in exchange nodes FNV is not a good enough hash function. This patch introduces FastHash into the codebase and uses it in exchange nodes. Testing: Two test cases involving arbitrary ordering are changed. Single node performance benchmark shows no performance difference. Change-Id: I778317d982dcdb94173a369a65b39f32b4f7ded2 Reviewed-on: http://gerrit.cloudera.org:8080/8417 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2017-11-08 07:39:02 +00:00
Tim Wood	9923b8297a	IMPALA-6155: Allow tests to pass when ORDER BY does not cover the query. Use VERIFY_IS_EQUAL_SORTED tag on RESULTS section to allow low-order sort deviations to compare. Testing: - Passed local tests/run-tests.py ... - Changed order of expected rows by low-order column and verified test still passes. - Analyzed all TPC-DS query files for similar patterns, found no others. Change-Id: Ib42ba64ce6ac9b75b4a532f20cee0055aaed5a6c Reviewed-on: http://gerrit.cloudera.org:8080/8484 Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-08 00:13:56 +00:00
Bikramjeet Vig	94236ff2ff	IMPALA-2494: Support for byte array encoded decimals in Parquet scanner Extends parquet column reader and associated classes to allow for more than one possible physical type for a given logical type. This patch only adds support for variable sized byte array encoded decimals and more will be added in upcoming commits. Also, column level metadata verification which was previously done per row group will now only be done once per column per file. Testing: Added backend test for verifying newly added decimal types are decoded correctly. Added Query test that decodes both plain and dictionary-encoded decimals using binary encoding. Performance: Initial perf testing using tpcds_1000 shows no regression. Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Reviewed-on: http://gerrit.cloudera.org:8080/7822 Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-07 04:34:26 +00:00
Bikramjeet Vig	f11181cbe5	IMPALA-6123: Fix column order of a query test in test_inline_view_limit Currently a "select *" query in test_inline_view_limit fails during exhaustive testing because Impala returns columns from HBase tables in a different order (IMPALA-886) than the one expected. This fix ensures the column order is consistent by specifying the output columns in the right order in the select query. Testing: Tested locally, with and without exhaustive exploration strategy. Change-Id: I11667872b8788a8b0040bf9252bf07b987b5d330 Reviewed-on: http://gerrit.cloudera.org:8080/8409 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Impala Public Jenkins	2017-10-28 06:29:38 +00:00
Attila Jeges	00fd8388c3	IMPALA-3548: Prune runtime filters based on query options in the FE Currently, the FE generates a number of runtime filters and assigns them to the single node plan without taking the value of RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options into account. The backend then removes filters from exec nodes, based on the following rules: 1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from the exec nodes that are marked as targets not bound by partition columns. 2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from the exec nodes that are marked as remote targets. This may cause some confusion to users because they may see runtime filters in the output of explain that are not applied when the query is executed. This change moves the logic of runtime filter pruning to the planner in the FE. The runtime filter assignment is done on the distributed plan and the above constraints are enforced there directly. Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Reviewed-on: http://gerrit.cloudera.org:8080/7564 Tested-by: Impala Public Jenkins Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2017-10-26 17:06:32 +00:00
Thomas Tauber-Marshall	e7b1c32103	IMPALA-6004: Fix test_row_filters failure on ASAN 'Test case 16' in test_row_filters has been failing occasionally on ASAN as the runtime filters are not generated within the specified RUNTIME_FILTER_WAIT_TIME_MS. The fix is to increase RUNTIME_FILTER_WAIT_TIME_MS. This patch updates all of the tests in test_row_filters to use the same timeout, which is set to a higher value for ASAN builds. Change-Id: Ia098735594b36a72f02bf7edd051171689618051 Reviewed-on: http://gerrit.cloudera.org:8080/8358 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-10-25 18:56:00 +00:00
Bikramjeet Vig	ca55b5926e	IMPALA-4236: Codegen CopyRows() for select nodes Testing: Added test case to verify that CopyRows in select node is successfully codegened. Improved test coverage for select node with limit. Performance: Queries used (num_nodes set to 1): 500 Predicates: select * from (select * from tpch_parquet.lineitem limit 6001215) t1 where l_partkey > 10 and l_extendedprice > 10000 and l_linenumber > 1 and l_comment >'foo0' .... and l_comment >'foo500' order by l_orderkey limit 10; 1 Predicate: select * from (select * from tpch_parquet.lineitem limit 6001215) t1 where l_partkey > 10 order by l_orderkey limit 10; +--------------+-----------------------------------------------------+ \| \| 500 Predicates \| 1 Predicate \| \| +------------+-------------+------------+-------------+ \| \| After \| Before \| After \| Before \| +--------------+------------+-------------+------------+-------------+ \| Select Node \| 12s385ms \| 1m1s \| 234ms \| 797ms \| \| Codegen time \| 2s619ms \| 1s962ms \| 200ms \| 181ms \| +--------------+------------+-------------+------------+-------------+ Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e Reviewed-on: http://gerrit.cloudera.org:8080/8196 Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com> Tested-by: Impala Public Jenkins	2017-10-25 01:17:28 +00:00
Taras Bobrovytsky	5ebea0ec4d	IMPALA-5018: Error on decimal modulo or divide by zero Before this patch, decimal operations would never produce an error. Division by and modulo zero would result in a NULL. In this patch, we change this behavior so that we raise an error instead of returning a NULL. We also modify the format of the decimal expr tests format to also include an error field. Testing: - Added several expr and end to end tests. Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb Reviewed-on: http://gerrit.cloudera.org:8080/8344 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-10-25 00:44:34 +00:00
Tim Wood	f05bd241ea	IMPALA-5376: Implement all TPCDS test cases or alternates for Impala. Main source for TPCDS query and result definitions: https://github.com/gregrahn/tpcds-kit. TPC-DS v2.5.0 qualification queries from G. Rahn, Cloudera, Inc. Data set constructed in mini-cluster using $IMPALA_HOME/buildall.sh -testdata.... This commit continues previous work on IMPALA-5376 in the ASF Impala repo and the Cloudera Gerrit service. This commit splits multi-query tests in the TPC-DS suite definition into one query and result set per test file, as the test framework requires. Names for such files have -1, -2... inner suffixes. The portion of the TPC-DS test suite in this commit passes. It contains no failures, as reflected by runs of $IMPALA_HOME/tests/run-tests.py query_test/test_tpcds_queries.py ... IMPALA-6007 addresses the TPC-DS cases that require skipping (because we don't support them or they flap) or expected-failure (xfail, because we support them but they fail due to bugs.) These require some added tooling for non-Pytest frameworks like the stress test to avoid attempting them until they work. Tests that flap are marked to skip, with a bug ID, since they don't reliably pass or xfail. Expected result sets come from the TPC-DS kit. Some TPC-DS test cases in this commit have been modified in semantically-neutral ways so as to pass on Impala. The tests/query_test/test_tpcds_queries.py driver file is authoritative for the active/skip/xfail status for each case and a brief reason. The following list describes the current status as:
--- test-name | deviance from TPC-DS spec | changes made
--- tpcds-q22a.test | RESULT MISMATCH in LSD of AVG() values | FIXED, HAND_ROUNDED AVG() VALUES IN RESULT SET
--- tpcds-q26.test | RESULT MISMATCH in LSD of AVG() values | ABSENT, IMPALA-6087
--- tpcds-q28.test | RESULT MISMATCH in LSD of AVG() values | ABSENT, IMPALA-6087
--- tpcds-q30.test | UNRECOGNIZED CHARACTER | ABSENT, IMPALA-5961.
--- tpcds-q31.test | RESULT MISMATCH in LSD of DECIMAL values | ABSENT, IMPALA-5956.
--- tpcds-q35a.test | RESULT MISMATCH | ABSENT, IMPALA-5950.
--- tpcds-q36a.test | RESULT MISMATCH | ABSENT, IMPALA-4741
--- tpcds-q47.test | RESULT MISMATCH in LSD of DECIMAL values | ABSENT, IMPALA-6087
--- tpcds-q48.test | RESULT MISMATCH in scalar value | ABSENT, IMPALA-5950.
--- tpcds-q49.test | RESULT MISMATCH in LSD of DECIMAL values | ABSENT, IMPALA-5945
--- tpcds-q57.test | RESULT MISMATCH, excess scale in DECIMAL values | ABSENT, IMPALA-6087
--- tpcds-q58.test | RESULT MISMATCH in DECIMAL values | ABSENT, IMPALA-5946
--- tpcds-q59.test | RESULT MISMATCH, excess scale in DECIMAL values | ABSENT, IMPALA-6087
--- tpcds-q61.test | RESULT MISMATCH in DECIMAL value | FIXED. CAST RESULT QUOTIENT TO DECIMAL(15, 4), TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q63.test | RESULT MISMATCH, excess scale in DECIMAL values | ABSENT, IMPALA-6087
--- tpcds-q64.test | RESULT MISMATCH | ADDED ORDER BY COLUMNS.
--- tpcds-q66.test | RESULT MISMATCH | ABSENT, IMPALA-4741
--- tpcds-q77a.test | RESULT MISMATCH | FIXED. TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q78.test | RESULT MISMATCH | FIXED. TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q83.test | RESULT MISMATCH | ABSENT, IMPALA-5945.
--- tpcds-q85.test | MISSING TABLE "reason" | ABSENT, IMPALA-5960
--- tpcds-q86a.test | RESULT MISMATCH | FIXED. TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q89.test | RESULT MISMATCH, DECIMAL values flap | ABSENT, ADDED ROUND(2) TO 8th COLUMN, TAKE ACTUAL RESULTS AS EXPECTED, IMPALA-5956.
--- tpcds-q90.test | RESULT MISMATCH | ABSENT, IMPALA-5945.
--- tpcds-q93.test | MISSING TABLE "reason" | ABSENT, IMPALA-5960
--- tpcds-q98.test | RESULT MISMATCH | FIXED, ADDED ROUND() TO LAST COLUMN
Change-Id: I6e284888600a7a69d1f23fcb7dac21cbb13b7d66 Reviewed-on: http://gerrit.cloudera.org:8080/8102 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Impala Public Jenkins	2017-10-23 19:32:10 +00:00
Csaba Ringhofer	41f0c6a5a6	IMPALA-5664: Unix time to timestamp conversions may crash Impala TimestampValue::FromSubsecondUnixTime() and UtcFromUnixTimeMicros() are incorrect only in case of the last second of 1399, because these sub-second values are rounded first towards 1400-01-01 00:00:00, which is accepted as a valid date, and the sub-second part is subtracted afterwards, leading to a date outside the valid interval. The maximum case, 9999-12-31 23:59:59 is a bit different, because as I understand, with nanosecond precision posix times, the maximum value is actually 10000-01-01 00:00:00 minus 1 nanosec. TimestampValue::FromUnixTimeNanos() can create problematic TimestampValues both <1400 and >=10000. These timestamps can cause problems, because most code assumes that if HasDate/HasTime is true, then it really is a valid timestamp. To fix this, the posix times are checked in the constructor of TimestampValue, and if a time is outside the valid interval, both time_ and date_ are set to not_a_date_time. Test: select cast(-17987443200-0.1 as timestamp); This query no longer crashes, but returns NULL, similarly to other < 1400 timestamps. Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Reviewed-on: http://gerrit.cloudera.org:8080/7954 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Impala Public Jenkins	2017-10-21 03:13:14 +00:00
Tianyi Wang	87065638f0	IMPALA-5425: Add test for validating input when setting query options This patch adds multiple query option validation testcases to be/src/service/query-options-test.cc. The test cases include parsing edge cases, boundary values, special cases for some options and some testcases moved from testdata/workloads/functional-query/queries/QueryTest/set.test. This patch also fixes a bug generating wrong error message for query option RUNTIME_FILTER_WAIT_TIME_MS. Change-Id: I510e02bb0776673d8cbfc22b903831882c6908d7 Reviewed-on: http://gerrit.cloudera.org:8080/7805 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-10-10 21:54:33 +00:00
stiga-huang	192cd96d9e	IMPALA-5448: fix invalid number of splits reported in Parquet scan node Parquet splits with multiple columns are marked as completed by using HdfsScanNodeBase::RangeComplete(). It counts the file type repeatedly, once per column codec type. Thus the reported number of Parquet splits is the real count multiplied by the number of materialized columns. Furthermore, the Parquet format allows mixed compression codecs on different columns. This is handled in this patch as well. A Parquet file using the gzip and snappy compression codecs will be reported as: FileFormats: PARQUET/(GZIP,SNAPPY):1 This patch introduces a set of compression types for the above cases. Testing: Add end-to-end tests handling Parquet files with all columns compressed in snappy, and handling Parquet files with multiple compression codecs. Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1 Reviewed-on: http://gerrit.cloudera.org:8080/8147 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-10-10 01:30:33 +00:00
Tim Armstrong	c14a090400	IMPALA-5844: use a MemPool for expr result allocations This is also a step towards IMPALA-2399 (remove QueryMaintenance()). "local" allocations containing expression results (either intermediate or final results) have the following properties: * They are usually small allocations * They can be made frequently (e.g. every function call) * They are owned and managed by the Impala runtime * They are freed in bulk at various points in query execution. A MemPool (i.e. bump allocator) is the right mechanism to manage allocations with the above properties. Before this patch FunctionContexts used a FreePool + vector of allocations to emulate the above behaviour. This patch switches to using a MemPool to bring these allocations in line with the rest of the codebase. The steps required to do this conversion: * Use a MemPool for FunctionContext local allocations. * Identify appropriate MemPools for all of the local allocations from function contexts so that the memory lifetime is correct. * Various cleanup and documentation of existing MemPools. * Replaces calls to FreeLocalAllocations() with calls to MemPool::Clear() More involved surgery was required in a few places: * Made the Sorter own its comparator, exprs and MemPool. * Remove FunctionContextImpl::ReallocateLocal() and just have StringFunctions::Replace() do the doubling itself to avoid the need for a special interface. Worst-case this doubles the memory requirements for Replace() since n / 2 + n / 4 + n / 8 + .... bytes of memory could be wasted instead of recycled for an n-byte output string. * Provide a way to redirect agg fn Serialize()/Finalize() allocations to come directly from the output RowBatch's MemPool. This is also potentially applicable to other places where we currently copy out strings from local allocations, e.g. AnalyticEvalNode::AddResultTuple() and Tuple::MaterializeExprs(). 
* --stress_free_pool_alloc was changed to instead intercept at the FunctionContext layer so that it retains the old behaviour even though allocations do not all come from FreePools. The "local" allocation concept was not exposed directly in udf.h so this patch also renames them to better reflect that they're used for expr results. Testing: * ran exhaustive and ASAN Change-Id: I4ba5a7542ed90a49a4b5586c040b5985a7d45b61 Reviewed-on: http://gerrit.cloudera.org:8080/8025 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-10-06 00:01:08 +00:00
Philip Zeyliger	c9740b43d1	IMPALA-5908: Allow SET to unset modified query options. The query 'SET <option>=""' will now unset an option within the session, reverting it to its default state. This change became necessary when "SET" started returning an empty string for unset options which don't have a default. The test infrastructure (impala_test_suite.py) resets options to what it thinks is its defaults, and, when this broke, some ASAN builds started to fail, presumably due to a timing issue with how we re-use connections between tests. Previously, SessionState copied over the default options from the server when the session was created and then mutated that. To support unsetting options at the session layer, this change keeps a pointer to the default server settings, keeps separately the mutations, and overlays the options each time they're requested. Similarly, for configuration overlays that happen per-query, the overlay is now done explicitly, because empty per-query overlay values (key=..., value="") now have no effect. Because "set key=''" is ambiguous between "set to the empty string" and "unset", it's now impossible to set to the empty string, at the session layer, an option that is configured at a previous layer. In practice, this is just debug_action and request_pool. debug_action is essentially an internal tool. For request_pool, this means that setting the default request_pool via impalad command line is now a bad idea, as it can't be cleared at a per-session level. For request_pool, the correct course of action for users is to use placement rules, and to have a default placement rule. Testing: * Added a simple test that triggered this side-effect without this code. Specifically, "impala-python infra/python/env/bin/py.test tests/metadata/test_set.py -s" with the modified set.test triggers. * Amended tests/custom_cluster/test_admission_controller.py; it was useful for testing these code paths. 
* Added cases to query-options-test to check behavior for both defaulted and non-defaulted values. * Added a custom cluster test that checks that overlays are working against * Ran an ASAN build where this was triggering previously. Change-Id: Ia8c383e68064f839cb5000118901dff77b4e5cb9 Reviewed-on: http://gerrit.cloudera.org:8080/8070 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-10-05 03:04:38 +00:00
Bikramjeet Vig	0601f06cb6	IMPALA-4951: Fix database visibility for user with only column privilege Currently a database is not visible to a user that only has column level privileges for tables in that database. This patch will make the database visible, which is the expected behavior in this case. Testing: added a test case to verify the same. Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a Reviewed-on: http://gerrit.cloudera.org:8080/8168 Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com> Tested-by: Impala Public Jenkins	2017-10-04 03:14:20 +00:00
Philip Zeyliger	eb11b46be6	Re-apply: IMPALA-5589: change "set" in impala-shell to show empty string for unset query options (Re-applies reverted commit `387bde0639`. The commit broke ASAN tests due to a race in how test infrastructure re-uses connections. The fix for that is in an adjacent commit.) When converting TQueryOptions to a map<string,string>, we now convert unset options to the empty string. Within TQueryOptions we have optional options (like mt_dop or compression_codec) with no default specified. In this case, the user was seeing 0 for numeric types and the first enum option for enumeration types (e.g., "NONE" in the compression case). This was confusing as the implementation handles this "null" case differently (e.g., using SNAPPY as the default codec in the case reported in the JIRA). When running "set" in impala-shell, the difference is as follows: - BUFFER_POOL_LIMIT: [0] + BUFFER_POOL_LIMIT: [] - COMPRESSION_CODEC: [NONE] + COMPRESSION_CODEC: [] - MT_DOP: [0] + MT_DOP: [] - RESERVATION_REQUEST_TIMEOUT: [0] + RESERVATION_REQUEST_TIMEOUT: [] - SEQ_COMPRESSION_MODE: [0] + SEQ_COMPRESSION_MODE: [] - V_CPU_CORES: [0] + V_CPU_CORES: [] Obviously, the empty string is a valid value for a string-typed option, where it will be impossible to tell the difference between "unset" and "set to empty string." Today, there are two string-typed options: debug_string defaults to "" and request_pool has no default. An alternative would have been to use a special token like "_unset" or to introduce a new field in the beeswax Thrift ConfigVariable struct. I think the empty string approach is clearest. The other users of this state, which I believe are HiveServer2's OpenSession() call and HiveServer2's response to a "SET" query are affected. They benefit from the same fix, and a new test has been added to test_hs2.py. I did a mild refactoring in the HS2 tests to write a helper method for the very common pattern of executing a query. 
Testing: * Manual testing with impala-shell * Modified impala-shell tests to check this explicitly for one case. * Modified HS2 test to check this as well as the SET k=v statement, which looked otherwise untested. Change-Id: I29f5d8ab874cb1338077f16019a9537766cac0c4 Reviewed-on: http://gerrit.cloudera.org:8080/8096 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-10-03 01:11:50 +00:00
Thomas Tauber-Marshall	5d92264c48	IMPALA-5951: Remove flaky test_catalogd_timeout test_catalogd_timeout sets a Kudu operation timeout of 1ms and then performs various Kudu operations which it expects to fail due to a timeout. Since the test was written, things have sped up - for example, Impala used to create a new Kudu client for each operation, but that was changed in IMPALA-5167, such that the operations now occasionally complete quickly enough that they don't timeout. There's not really any way to rewrite this test to ensure that it won't be flaky, so the patch removes it. Change-Id: I29fd67d0acc0ee15943c416f2179ad716d2cac05 Reviewed-on: http://gerrit.cloudera.org:8080/8154 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Impala Public Jenkins	2017-09-30 00:50:43 +00:00
Alex Behm	c1781b73b3	Move tests related to the old join node. No tests were added/dropped or modified. They are consolidated into fewer .test files. Change-Id: Idda4b34b5e6e9b5012b177a4c00077aa7fec394c Reviewed-on: http://gerrit.cloudera.org:8080/8153 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-09-28 18:36:17 +00:00


1374 Commits