impala

mirror of https://github.com/apache/impala.git synced 2026-01-08 12:02:54 -05:00

Author	SHA1	Message	Date
Taras Bobrovytsky	f810458ca4	IMPALA-6231: Implement decimal_v2 fuzz test Implement a test that generates random decimal numbers in the pytest framework, performs a random mathematical operation in Impala and verifies that the result is correct by doing the same operation using the Python decimal module. We try to generate not only completely random decimal numbers, but also numbers that have interesting properties, such as the number being a power of two. Change-Id: I4328125de5c583ec8ead1f78d9a08703b18b2d85 Reviewed-on: http://gerrit.cloudera.org:8080/8898 Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Zach Amsden <zamsden@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-10 03:03:52 +00:00
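The approach in IMPALA-6231 can be illustrated with a minimal sketch. This is not the test from the commit: the names `random_decimal` and `expected_result`, the 25% power-of-two ratio, and the set of operations are assumptions made for illustration. It only shows how random and "interesting" decimals could be generated and how the Python decimal module could supply the reference result.

```python
import random
from decimal import Decimal, localcontext

MAX_PRECISION = 38  # maximum DECIMAL precision assumed for this sketch


def random_decimal(rng):
    """Return a random Decimal, sometimes with an 'interesting' shape."""
    if rng.random() < 0.25:
        # Interesting case: an exact power of two (ratio is an assumption).
        return Decimal(2 ** rng.randint(0, 60))
    precision = rng.randint(1, MAX_PRECISION)
    scale = rng.randint(0, precision)
    digits = tuple(rng.randint(0, 9) for _ in range(precision))
    sign = rng.choice((0, 1))  # 0 = positive, 1 = negative
    # The tuple constructor is exact; it does not round to the context.
    return Decimal((sign, digits, -scale))


# Hypothetical operation table; the real test may cover other operators.
OPERATIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def expected_result(op, left, right):
    """Compute the reference result exactly with the decimal module."""
    with localcontext() as ctx:
        # Enough digits to hold the product of two 38-digit operands.
        ctx.prec = 2 * MAX_PRECISION + 2
        return OPERATIONS[op](left, right)
```

A fuzz loop would then pick an operator at random, run the same expression in Impala, and compare the returned value with `expected_result(...)` after applying whatever result-type rules the engine uses.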
aphadke	38461c524f	IMPALA-5052: Read and write signed integer logical types in Parquet This patch maps a signed integer logical type in parquet to a supported Impala column type. This change introduces the following mapping - INT_8 -> TINYINT INT_16 -> SMALLINT INT_32 -> INT INT_64 -> BIGINT Also, added a parquet file with the following schema for testing - schema { optional int32 id; optional int32 tinyint_col (INT_8); optional int32 smallint_col (INT_16); optional int32 int_col; optional int64 bigint_col; } Change-Id: I47a8371858c9597c6a440808cf6f933532468927 Reviewed-on: http://gerrit.cloudera.org:8080/8548 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Reviewed-by: Tianyi Wang <twang@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-09 04:55:59 +00:00
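The mapping listed in IMPALA-5052 can be written down as a small lookup table. The sketch below is illustrative only: the function `impala_type` is hypothetical, and the fallback for unannotated `int32`/`int64` columns is an assumption based on the `int_col` and `bigint_col` fields of the test schema.

```python
# Parquet signed integer logical type -> Impala column type,
# as stated in the commit message.
SIGNED_INT_LOGICAL_TYPES = {
    "INT_8": "TINYINT",
    "INT_16": "SMALLINT",
    "INT_32": "INT",
    "INT_64": "BIGINT",
}

# Assumption: columns without a logical type map by physical type.
PHYSICAL_DEFAULTS = {"int32": "INT", "int64": "BIGINT"}


def impala_type(physical_type, logical_type=None):
    """Return the Impala column type for a Parquet integer column."""
    if logical_type is not None:
        return SIGNED_INT_LOGICAL_TYPES[logical_type]
    return PHYSICAL_DEFAULTS[physical_type]


# The test schema from the commit message, as (name, physical, logical).
TEST_SCHEMA = [
    ("id", "int32", None),
    ("tinyint_col", "int32", "INT_8"),
    ("smallint_col", "int32", "INT_16"),
    ("int_col", "int32", None),
    ("bigint_col", "int64", None),
]

columns = {name: impala_type(phys, logical) for name, phys, logical in TEST_SCHEMA}
```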
Bharath Vissapragada	6a87eb20a5	IMPALA-6348: Redact only sensitive fields in runtime profiles Without this patch, redaction is applied to every field in the runtime profile. This approach has an undesired side effect when Kerberos auth + email redaction is in place. Since the redaction applies to every field, even principals (from Connected/Delegated User fields) are redacted, as the Kerberos principal format generally pattern matches with an email redactor template. This is particularly problematic for monitoring tools that consume runtime profiles and use these fields to group the queries by user. This patch fixes the problem by redacting only the following sensitive fields. - Query Statement - Error logs (since they can contain column references etc.) - Query Status - Query Plan Other fields in the runtime profile are left unredacted. Change-Id: Iae3b6726009bf458a7ec73131e5d659b12ab73cf Reviewed-on: http://gerrit.cloudera.org:8080/8934 Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-06 22:54:17 +00:00
Zoltan Borok-Nagy	ce65b43d47	IMPALA-2248: Make idle_session_timeout a query option This commit makes idle_session_timeout a query option. idle_session_timeout currently can be set as a command line option, which will be the default timeout for sessions. HS2 sessions can override it with a smaller value by setting it in the configuration overlay of HS2 OpenSession(). However, we can't override idle_session_timeout for JDBC/ODBC connections, because we cannot put this in the connection string. This commit is a workaround for this problem, it allows JDBC/ODBC connections to set the session timeout as a query option with the SET statement. After this commit, the session timeout can be overridden to any value, i.e. the command line flag idle_session_timeout doesn't limit this option anymore. I created an automated test case in JdbcTest.java based on test_hs2.py::test_concurrent_session_mixed_idle_timeout. I also extended the test_session_expiration and test_set_and_unset test suites. Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e Reviewed-on: http://gerrit.cloudera.org:8080/8490 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-06 01:47:47 +00:00
Tim Armstrong	d3ff67b8b3	IMPALA-6370: fix partitioned parquet tables with nested types When materialising a nested collection, has_template_tuple() should use the template tuple for the collection, not the top-level tuple. Testing: Added tests based on nested-types-basic.test that operate on a simple partitioned table. The tests reliably crashed Impala before the fix. Change-Id: Ic808b824ce3b31af0539036d8ca23d17b18deab4 Reviewed-on: http://gerrit.cloudera.org:8080/8947 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-05 20:44:21 +00:00
Gabor Kaszab	7810d1f9a2	IMPALA-6318: Adjustment for hanging query cancellation test Apparently test_query_cancellation_during_fetch hangs occasionally in Jenkins builds. The Impala debug page shows the query being cancelled, however, on the host the ImpalaShell process related to that query is still running. Since I had no luck in reproducing the issue locally I only have a theory what might be going on here: The query is cancelled successfully on Impala backend and when the test tries to get the stdout and stderr from the ImpalaShell it gets stuck. It might be the case that ImpalaShell process fetching the query results holds the stdout. According to the documentation of subprocess.communicate() it may cause issues to fetch data when the data size is large or unlimited, that we can consider to be the case here. As a workaround there is a new optional parameter to util.ImpalaShell to omit the stdout because this test wouldn't use it anyway and we get rid of fetching the large result from ImpalaShell. Change-Id: I082c83b91b6d0c527de92c7992f0dc9d1b290433 Reviewed-on: http://gerrit.cloudera.org:8080/8852 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-03 20:32:24 +00:00
Taras Bobrovytsky	a16fe803ca	IMPALA-5014: Part 1: Round when casting string to decimal In this patch we implement rounding when casting string to decimal if DECIMAL_V2 is enabled. The backend method that parses strings and converts them to decimals is refactored to make it easier to understand. Testing: - Added some BE tests. Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Reviewed-on: http://gerrit.cloudera.org:8080/8774 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-22 11:39:08 +00:00
Zoram Thanga	b581a9d1ee	IMPALA-6225: Part 2: Query profile date-time strings should have ns precision. This commit follows `16d8dd58`. This patch adds a test case that inspects the thrift profile of a completed query, and verifies that the "Start Time" and "End Time" of the query have nanosecond precision. We chose to work with the thrift profile directly, rather than parse the debug web page, as it is the thrift profile which is consumed by management API clients of Impala. Change-Id: Id3421a34cc029ebca551730084c7cbd402d5c109 Reviewed-on: http://gerrit.cloudera.org:8080/8784 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-21 04:26:33 +00:00
Alex Behm	1f7b3b00e9	IMPALA-5310: Part 3: Use SAMPLED_NDV() in COMPUTE STATS. Modifies COMPUTE STATS TABLESAMPLE to use the new SAMPLED_NDV() function. Testing: - modified/improved existing functional tests - core/hdfs run passed Change-Id: I6ec0831f77698695975e45ec0bc0364c765d819b Reviewed-on: http://gerrit.cloudera.org:8080/8840 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-16 04:58:59 +00:00
Jinchul	bfbcd1fe86	IMPALA-4664: Unexpected string conversion in Shell Impala shell can accidentally convert certain literal strings to lowercase. Impala shell splits each command into tokens and then converts the first token to lowercase to figure out how it should execute the command. The splitting is done by spaces only. Thus, if the user types a TAB after the SELECT, the first token after the split becomes the SELECT plus whatever comes after it. Testing: TestImpalaShellInteractive.test_case_sensitive_command TestImpalaShellInteractive.test_unexpected_conversion_for_literal_string_to_lowercase TestImpalaShell.test_var_substitution Change-Id: Ifdce9781d1d97596c188691b62a141b9bd137610 Reviewed-on: http://gerrit.cloudera.org:8080/8762 Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-15 21:32:20 +00:00
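The behaviour described in IMPALA-4664 can be reproduced in a few lines. The two functions below are hypothetical and are not taken from the shell's source. The first mimics splitting on spaces only. The second splits on any whitespace, which is one possible remedy and not necessarily the fix the commit applied.

```python
def first_token_by_space(command):
    """Mimic the described bug: split on spaces only, then lowercase."""
    return command.split(" ")[0].lower()


def first_token_by_whitespace(command):
    """One possible remedy: split on any whitespace, including TAB."""
    return command.split(None, 1)[0].lower()


# A TAB separates SELECT from the string literal.
command = "SELECT\t'ABC'"

buggy = first_token_by_space(command)       # literal is lowercased too
fixed = first_token_by_whitespace(command)  # only the keyword is returned
```

With the space-only split, the whole command is treated as the first token, so the literal `'ABC'` is lowercased along with the keyword.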
stiga-huang	5c593be59c	IMPALA-6301: Fix test failures when username or group name contains dots Some tests use the local user's group name to construct SQLs, which may lead to syntax errors when group name contains dots. We need to quote the group names in SQL to avoid this error. Besides, a test in test_admission_controller uses '\w+' to match the local user name. This expression cannot match usernames with dots, which causes test failure as well. Instead, we should use '\S+'. Change-Id: Ib8ae15bb6a929dc48d3ad2176c8b3fafff87f32b Reviewed-on: http://gerrit.cloudera.org:8080/8807 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-13 23:06:45 +00:00
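The two problems in IMPALA-6301 are easy to demonstrate. In the sketch below, `matches_user` and `quote_identifier` are hypothetical helpers, and the backtick quoting style is an assumption about how the group name would be quoted in the generated SQL.

```python
import re


def matches_user(pattern, username):
    """Return True if the whole username matches the pattern."""
    return re.fullmatch(pattern, username) is not None


def quote_identifier(name):
    """Wrap a name in backticks so that dots do not break the SQL."""
    return "`" + name + "`"


dotted_user = "john.doe"

# \w matches letters, digits and underscore only, so the dot fails.
word_match = matches_user(r"\w+", dotted_user)
# \S matches any non-whitespace character, so the dot is accepted.
non_space_match = matches_user(r"\S+", dotted_user)

quoted_group = quote_identifier("dev.team")
```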
Philip Zeyliger	2fcbf36c32	IMPALA-6270: remove redundant version properties Removes properties that are already defined in the impala-parent pom. I ran the tests. Change-Id: I6812e11bb41716450ef29bb523773479e9f76eec Reviewed-on: http://gerrit.cloudera.org:8080/8827 Reviewed-by: Zach Amsden <zamsden@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-13 22:48:10 +00:00
Jinchul	4feb4f3a54	IMPALA-5754: Improve randomness of rand()/random() The current implementation of the rand/random built-in functions uses rand_r from the C library. We found its randomness to be poor. pcg32 from a third-party library shows better randomness than rand_r. Testing: Revise unit test in expr-test Add E2E test to random.test Change-Id: Idafdd5fe7502ff242c76a91a815c565146108684 Reviewed-on: http://gerrit.cloudera.org:8080/8355 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2017-12-13 10:04:40 +00:00
Alex Behm	0936e32966	IMPALA-5310: Part 2: Add SAMPLED_NDV() function. Adds a new SAMPLED_NDV() aggregate function that is intended to be used in COMPUTE STATS TABLESAMPLE. This patch only adds the function itself. Integration with COMPUTE STATS will come in a separate patch. SAMPLED_NDV() estimates the number of distinct values (NDV) based on a sample of data and the corresponding sampling rate. The main idea is to collect several x/y data points where x is the number of rows and y is the corresponding NDV estimate. These data points are used to fit an objective function to the data such that the true NDV can be extrapolated. The aggregate function maintains a fixed number of HyperLogLog intermediates to compute the x/y points. Several objective functions are fit and the best-fit one is used for extrapolation. Adds the MPFIT C library to perform curve fitting: https://www.physics.wisc.edu/~craigm/idl/cmpfit.html The library is a C port from Fortran. Scipy uses the Fortran version of the library for curve fitting. Testing: - added functional tests - core/hdfs run passed Change-Id: Ia51d56ee67ec6073e92f90bebb4005484138b820 Reviewed-on: http://gerrit.cloudera.org:8080/8569 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-12 22:20:18 +00:00
Philip Zeyliger	d2fe9f437e	IMPALA-6270: create Impala parent pom This commit links together all the individual pom.xml files to have a new "impala-parent" pom as the parent. This enables de-duplicating all the repository configuration. I ran the build to test this. Change-Id: Id744e4357ee4d8e4be4e5490b2159bb76a2192f0 Reviewed-on: http://gerrit.cloudera.org:8080/8753 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-12 04:30:15 +00:00
Zach Amsden	245df3c69a	IMPALA-6245: Tolerate column indenting from Hive The fix for HIVE-3140 started indenting multi-line comments, which breaks Impala testing when run against Hive 2.1.1. To test this using the pure test runner proved difficult since it would require extensive changes to support both row_regexes (since the columns changed order) and subset support (since the number of rows changed). Instead, we manually verify the hints are present in the output in the python test. The fact that the hints have been reformatted leaves us in an uncertain state as to whether they actually get applied, so a new test case has been added to run EXPLAIN SELECT on the view and verify the joins happen exactly as we expect. Testing: Ran the views-ddl test against Impala mini-cluster setups using both Hive 2.1.1 and Hive 1.1.0 Change-Id: I49e53b1230520ca6e850af28078526e6627d69de Reviewed-on: http://gerrit.cloudera.org:8080/8719 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-12 00:17:56 +00:00
Thomas Tauber-Marshall	b4cf5f2174	IMPALA-6298: Skip test_profile_fragment_instances on local filesystem test_profile_fragment_instances was recently added to verify that the final runtime profile for a query has the expected fragments and exec nodes. The test fails on local filesystem builds, though, as it assumes there will be 3 impalads and therefore 3 fragment instances, but there is only 1 impalad on local filesystem builds. The fix is to disable the test on local filesystem builds. Change-Id: I2c98f160406081626f17709809b8efee9eae1450 Reviewed-on: http://gerrit.cloudera.org:8080/8809 Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Philip Zeyliger <philip@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-11 21:45:43 +00:00
Thomas Tauber-Marshall	f3fa3e017f	IMPALA-6081: Fix test_basic_filters runtime profile failure test_basic_filters has been occasionally failing due to a line missing from a runtime profile for a particular query. The problem is that the query returns all of its results before all of its fragment instances are finished executing (due to a limit). Then, when one fragment instance reports its status, the coordinator returns to it a 'cancelled' status, causing all remaining instances for that backend to be cancelled. Sometimes this cancellation happens quickly enough that the relevant fragment instances have not yet sent a status report when they are cancelled. They will still send a report in finalize, but as the coordinator only updates its runtime profile for 'ok' status reports, not 'cancelled', the final runtime profile doesn't end up with any data for those fragment instances, which means the test does not find the line in the runtime profile it's checking for. The fix is to have the coordinator update its runtime profile with every status report it receives, regardless of error status. Testing: - Ran existing runtime profile tests, which rely on profile output, in a loop. - Manually tested some scenarios with failed queries and checked that the new profile output is reasonable. - Added a new e2e test that runs the affected query and checks for the presence of info for all expected exec nodes in the profile. This repros the underlying issue consistently. Change-Id: I4f581c7c8039f02a33712515c5bffab942309bba Reviewed-on: http://gerrit.cloudera.org:8080/8754 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-07 21:07:02 +00:00
Michael Ho	ed72910e96	IMPALA-6262: Always initialize runtime profile for DataSink This change moves the creation of the runtime profile from DataSink::Prepare() to the ctor of DataSink derived classes. This makes sure that DataSink::Close() and other functions can access the profile even if the DataSink fails to initialize. Testing done: Added a test case which triggers failure in the initialization of output expressions in a HdfsTableSink. Impalad crashed consistently without the fix. Change-Id: I2a683000ef180027b929dbebe78bc2a530a4767e Reviewed-on: http://gerrit.cloudera.org:8080/8770 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-07 09:47:09 +00:00
Zoltan Borok-Nagy	d52fa75cb9	IMPALA-3804: Re-enable per-scan filtering for sequence-based scanners IMPALA-3798 disabled per-scan filtering for sequence- based scanners due to a race between runtime filter arrival and header splits processing. This commit enables per-scan filtering again for the sequence based files. In HdfsScanNode::ProcessSplit() we check if the current range is the header of a sequence file. If so, and the filters reject the file, the whole file skipped. If it is not a sequence header, but the filters reject the partition, we call RangeComplete() on the current scan range. Change-Id: I4b38c26bcbe67f83efcc65a1723d766626ae3d3e Reviewed-on: http://gerrit.cloudera.org:8080/8684 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-07 07:13:29 +00:00
Tianyi Wang	c505a8159b	IMPALA-6210: Add query id to lineage graph logging Some tools use lineage graph logging to collect query metrics. Currently only query hash is present in this log. Adding query id into it makes such accounting easier. Testing: The equality of query id in the query profile and lineage log is checked in test_lineage.py. A test for TUniqueIdUtil is added to the FE tests. Change-Id: I4adbd02df37a234dbb79f58b7c46ca11a914229f Reviewed-on: http://gerrit.cloudera.org:8080/8589 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-06 00:52:19 +00:00
Gabor Kaszab	75121819be	IMPALA-6265: Query cancellation test enhancements In the query cancellation tests it is essential to wait until the query gets to a desired state (waiting_to_finish, fetching) and then cancel it. Apparently, ASAN query execution happens slower than on a Release build. As a result a hard coded timeout threshold is not sufficient to cover all the builds, or should be set to a wastefully high value. As a solution the query state is checked on the Impala debug page in intervals until it reaches the desired state or the maximum retry attempt value is reached. Change-Id: Ie0bff485a872df7be8efd784314a6ca91aaadd11 Reviewed-on: http://gerrit.cloudera.org:8080/8713 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-05 21:40:11 +00:00
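The polling strategy described in IMPALA-6265 could look roughly like the sketch below. The helper `wait_for_state`, its parameters, and the state names are assumptions. In the real test the state comes from the Impala debug page, which is represented here by an arbitrary `get_state` callable.

```python
import time


def wait_for_state(get_state, desired_state, max_attempts=30,
                   interval_s=0.5, sleep=time.sleep):
    """Poll get_state() until it returns desired_state.

    Returns True once the state is reached, or False after
    max_attempts polls. `sleep` is injectable so the helper can be
    exercised without real delays.
    """
    for _ in range(max_attempts):
        if get_state() == desired_state:
            return True
        sleep(interval_s)
    return False


# Usage with a fake state source standing in for the debug page.
_states = iter(["RUNNING", "RUNNING", "FETCHING"])
reached = wait_for_state(lambda: next(_states), "FETCHING",
                         max_attempts=5, sleep=lambda s: None)
```

Compared with a single hard-coded timeout, fast builds return as soon as the state is observed, while slow builds such as ASAN get up to `max_attempts * interval_s` seconds.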
Vuk Ercegovac	633dbff71d	IMPALA-1422: support a constant on LHS of IN predicates. Currently, constant expressions for the LHS of the IN predicate are not supported. This patch adds this support as a rewrite in StmtRewriter (where subqueries are rewritten to joins). Since there is a nested-loop variant of left semijoin, support for IN is handled by not erring out. NOT IN is handled by a rewrite to corresponding NOT EXISTS predicate. Support for NOT IN with a correlated subquery is not included in this change. Re-organized the frontend subquery analysis tests to expand coverage. Testing: - added frontend subquery analysis tests - added e2e tests Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb Reviewed-on: http://gerrit.cloudera.org:8080/8322 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-02 04:09:05 +00:00
Taras Bobrovytsky	575b5a20e6	IMPALA-5017: Error on decimal overflow Before this patch, decimal operations would either silently overflow (in the case of sum() and avg()), or produce a warning. In this patch, the behaviour is changed so that an error is produced in the case of overflow when DECIMAL_v2 is enabled. Decimal v1 behaviour is unchanged. We introduce overflow checks when computing sum() and avg(). This results in a ~30% performance regression when we are in decimal v2 mode compared to decimal v1. Benchmarks: Query: select sum(dec_38_19) from decimal_tbl Decimal v1: 11.57s Decimal v2: 16.58s Query: select avg(dec_38_19) from decimal_tbl Decimal v1: 12.08s Decimal v2: 17.08s The performance regression is not as bad if we are computing the sum or average of decimal column with a lower precision: Query: select sum(dec_9_5) from decimal_tbl Decimal v1: 11.06s Decimal v2: 13.08s Query: select avg(dec_9_5) from decimal_tbl Decimal v1: 11.56s Decimal v2: 13.57s Testing: - Added several end to end tests. - Updated Expr tests to check for error in case of overflow. Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019 Reviewed-on: http://gerrit.cloudera.org:8080/8404 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-01 23:23:01 +00:00
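The overflow check that IMPALA-5017 adds to sum() can be sketched conceptually as follows. This is not the backend implementation: `fits` and `checked_sum` are hypothetical names, and treating the result type as DECIMAL(38, 19) for the `dec_38_19` column is an assumption made for the example.

```python
from decimal import Decimal, localcontext


def fits(value, precision, scale):
    """True if value has at most (precision - scale) integer digits."""
    limit = Decimal(10) ** (precision - scale)
    # copy_abs() is exact and unaffected by the arithmetic context.
    return value.copy_abs() < limit


def checked_sum(values, precision=38, scale=19):
    """Sum decimals, raising OverflowError instead of overflowing silently."""
    with localcontext() as ctx:
        ctx.prec = 80  # wide enough that the reference arithmetic is exact
        total = Decimal(0)
        for value in values:
            total += value
            if not fits(total, precision, scale):
                raise OverflowError("decimal sum exceeds DECIMAL(%d,%d)"
                                    % (precision, scale))
        return total
```

The per-row check in the loop is the kind of extra work that the commit message associates with the reported slowdown in decimal v2 mode.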
Lars Volker	ea8d2ba7f6	IMPALA-6255: Add device names to DiskIoMgr thread names This change adds device names to the DiskIoMgr thread names. It will make them easier to identify during debugging. Change-Id: I30faeda6db8846e4aad64ce29ca811366d84910b Reviewed-on: http://gerrit.cloudera.org:8080/8669 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-01 05:51:59 +00:00
Alex Behm	b3d8a507cb	IMPALA-5310: Add COMPUTE STATS TABLESAMPLE. Adds the TABLESAMPLE clause for COMPUTE STATS. Syntax: COMPUTE STATS <table> TABLESAMPLE SYSTEM(<number>) [REPEATABLE(<number>)] Computes and replaces the table-level row count and total file size, as well as all table-level column statistics. Existing partition-level row counts are not modified. The TABLESAMPLE clause can be used to limit the scanned data volume to a desired percentage. When sampling, the unmodified results of the COMPUTE STATS queries are sent to the CatalogServer. There, the stats are extrapolated before storing them into the HMS so as not to confuse other engines like Hive/SparkSQL which may rely on the shared HMS fields being accurate. Limitations - Only works for HDFS tables - TABLESAMPLE is not supported for COMPUTE INCREMENTAL STATS - TABLESAMPLE requires --enable_stats_extrapolation=true Changes to EXPLAIN The stored statistics from the HMS are more clearly displayed under a 'stored statistics' section. Example: 00:SCAN HDFS [functional.alltypes, RANDOM] partitions=24/24 files=24 size=478.45KB stored statistics: table: rows=7300 size=478.45KB partitions: 24/24 rows=7300 columns: all Testing: - added new functional tests - core/hdfs run passed Change-Id: I7f3e72471ac563adada4a4156033a85852b7c8b7 Reviewed-on: http://gerrit.cloudera.org:8080/8136 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-29 22:37:01 +00:00
Tim Armstrong	72ed4fc887	Update incubator-impala -> impala URLs This fixes push_to_asf.py and various other scripts that had the Apache repo location hard-coded. Also fixed the location of the github mirror and mailing list archives. Testing: Ran push_to_asf.py to check I got the URL right. Checked a couple of the github and mailing list URLs to make sure the new URL is valid. Change-Id: Ie49221300340ef34bdd7c01670c35bdbbce3e84f Reviewed-on: http://gerrit.cloudera.org:8080/8685 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Reviewed-by: Jim Apple <jbapple-impala@apache.org> Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-29 20:58:50 +00:00
Gabor Kaszab	6d9da17288	IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C Issue 1: When a query is cancelled via CTRL-C while being executed in Impala-shell, an exception is thrown from the Impala backend saying 'Invalid query handle'. This is because one ImpalaClient was making RPCs while another ImpalaClient cancelled the query on the backend. As a result, RPC handlers in ImpalaServer try to access a ClientRequestState that had been cleared from the backend. The issue is consistently reproducible in both the wait_to_finish and fetch states of the query. As a solution, the query cancellation is indicated to ImpalaClient via a bool flag. Once a cancellation-originated exception reaches Impala shell, this flag is checked to decide whether to suppress the error or not. Issue 2: Every time a query was cancelled, a 'use db' command was issued automatically. This happened for historical reasons but is not needed anymore (see Jira for more details). Change-Id: I6cefaf1dae78baae238289816a7cb9d210fb38e2 Reviewed-on: http://gerrit.cloudera.org:8080/8549 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-29 03:44:51 +00:00
Tim Armstrong	dc1282fbc9	IMPALA-6241: timeout in admission control test under ASAN The fix for IMPALA-6241 is to increase the timeout for all slow builds. While testing that fix, I discovered that the ASAN build detection logic was failing silently, resulting in it assuming that it was testing a DEBUG build. The error was: Unexpected DW_AT_name in first CU: /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-ubuntu-16-04/toolchain/source/llvm/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_preinit.cc; choosing DEBUG The fix for that issue is to remove the build type detection heuristic and instead just write a file with the build type as part of the build process. Testing: Before this change I was able to reproduce locally every 5-10 test iterations. After this change I haven't seen it reproduce. Change-Id: Ia4ed949cac99b9925f72e19e4adaa2ead370b536 Reviewed-on: http://gerrit.cloudera.org:8080/8652 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-29 03:28:22 +00:00
Dimitris Tsirogiannis	a88c3b9c52	Revert "IMPALA-5538: Use explicit catalog versions for deleted objects" This reverts commit `dd340b8810`. This commit caused a number of issues tracked in IMPALA-6001. The issues were due to the lack of atomicity between the catalog version change and the addition to the delete log of a catalog object. Conflicts: be/src/service/impala-server.cc Change-Id: I3a2cddee5d565384e9de0e61b3b7d0d9075e0dce Reviewed-on: http://gerrit.cloudera.org:8080/8667 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-29 02:19:50 +00:00
Thomas Tauber-Marshall	7dd28ff431	IMPALA-6201: Fix test_basic_filters on ASAN TestRuntimeFilters.test_basic_filters is flaky on ASAN as sometimes the runtime filters aren't received within the specified RUNTIME_FILTER_WAIT_TIME_MS. This patch increases the timeout for ASAN builds. Change-Id: I8c20cbb75a9b6da73137f220657aa75dea9dfdce Reviewed-on: http://gerrit.cloudera.org:8080/8646 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-28 03:01:39 +00:00
Gabor Kaszab	88cb68cfbe	IMPALA-2181: Add query option levels for display Four display levels are introduced for each query option: REGULAR, ADVANCED, DEVELOPMENT and DEPRECATED. When the query options are displayed in Impala shell using SET then only the REGULAR and ADVANCED options are shown. A new command called SET ALL shows all the options grouped by their option levels. When the query options are displayed through the SET SQL statement then the result set would contain an extra column indicating the level of each option. Similarly to Impala shell here the SET command only displays the REGULAR and ADVANCED options while SET ALL shows them all. If the Impala shell connects to an Impala daemon that predates this change then all the options would be displayed in the REGULAR group. Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1 Reviewed-on: http://gerrit.cloudera.org:8080/8447 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-28 00:31:15 +00:00
Thomas Tauber-Marshall	abd9b0e70a	IMPALA-4591: Bound Kudu client error mem usage Previously, Kudu client errors could grow in size unbounded, potentially causing the process to be killed. This patch sets a bound on the mem that can be used for these error messages, with the size determined by the flag 'kudu_error_buffer_size'. If the errors for a Kudu client exceed this size, the query will fail, as some errors will be dropped and we won't be able to tell if all of the errors can be safely ignored. Testing: - Added a custom cluster test that verifies that a query that exceeds the limit fails. Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14 Reviewed-on: http://gerrit.cloudera.org:8080/8464 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-27 22:28:37 +00:00
Vuk Ercegovac	628f19ed0b	IMPALA-6092: avoid drop/create function interactions in e2e tests The e2e unit tests for udfs can interact via the backend lib_cache, causing test flakes. IMPALA-6215 explains a race between the lib_cache and UdfExecutor in the frontend which is likely the root cause. Two e2e tests use the same jar (test_java_udfs and test_udf_invalid_symbol), test_udf_invalid_symbol drops a function from that jar, which causes the use of that jar to fail in the test_java_udfs test. Since the state of lib_cache is per process, its state causes these interactions across unit tests. This change avoids the interactions by using separate jars for the separate tests. Change-Id: Ica3538788b1d2ab5e361261e2ade62780b838e65 Reviewed-on: http://gerrit.cloudera.org:8080/8593 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-27 21:20:20 +00:00
Tim Armstrong	1a7b0d0bdc	IMPALA-6227: deflake admission stress tests The problem was that, during the initial admission decision phase, some queries were initially queued then dequeued once memory came available. All of the accounting in the test implicitly relies on queries not being dequeued until queries are later explicitly ended, so if this happened, the test broke in multiple subtle ways. This happened because the query only scanned a small number of rows, which could be all buffered on the receiver side of the exchange even before the client fetched any rows from the coordinator. This means that the reserved memory on some backends could increase then decrease during the initial admission phase, resulting in a query being queued then dequeued. The fix is to increase the number of rows returned by the query so that all fragments remain active during the initial admission phase. This increased test execution time somewhat, so I also had to bump the queue wait timeout for the admission stress tests (they assume that queries don't time out in the queue). Testing: Ran the test under debug, release and ASAN builds, i.e. impala-py.test tests/custom_cluster/test_admission_controller.py \ --workload_exploration_strategy="functional-query:exhaustive" I looped the mem_limit test for a while to confirm it didn't reproduce (it reproduced reliably every 2-3 iterations before this fix). It still reproduces every 5-10 runs with exhaustive+release, so I need to do further work to make it more robust. Change-Id: Iafb3af0ce68f96e5d713dbb3b37dd0b50ea66bb4 Reviewed-on: http://gerrit.cloudera.org:8080/8631 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-23 07:48:18 +00:00
Vuk Ercegovac	21a96ed2e3	IMPALA-4985: use parquet stats of nested types for dynamic pruning Currently, parquet row-groups can be pruned at run-time using min/max stats when predicates (in, binary) are specified for column scalar types. This patch extends pruning to nested types for the same class of predicates. A nested value is an instance of a nested type (struct, array, map). A nested value consists of other nested and scalar values (as declared by its type). Predicates that can be used for row-group pruning must be applied to nested scalar values. In addition, the parent of the nested scalar must also be required, that is, not empty. The latter requirement is conservative: some filters that could be used for pruning are not used for correctness reasons. Testing: - extended nested-types-parquet-stats e2e test cases. Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe Reviewed-on: http://gerrit.cloudera.org:8080/8480 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-22 22:00:16 +00:00
Tim Armstrong	7487c5de04	IMPALA-1575: part 2: yield admission control resources This change releases admission control resources more eagerly, once the query has finished actively executing. Some resources (tracked and untracked) are still consumed by the client request as long as it remains open, e.g. memory for control structures and the result cache. However, these resources are relatively small and should not block admission of new queries. The same as in part 1, query execution is considered to be finished under any of the following conditions: 1. The query encounters an error and fails 2. The query is cancelled due to the idle query timeout 3. The query reaches eos (or the DML completes) 4. The client cancels the query without closing the query Admission control resources are released in two ways: 1. by calling AdmissionController::ReleaseQuery() on the coordinator promptly after query execution finishes, instead of waiting for UnregisterQuery(). This means that the query and its memory is no longer considered "admitted". 2. by changing the behaviour of MemTracker::GetPoolMemReserved() so that it is aware of when a query has finished executing and does not consider its entire memory limit to be "reserved". The preconditions for releasing an admitted query are subtle because the queries are being admitted to a distributed system, not just the coordinator. The comment for ReleaseAdmissionControlResources() documents the preconditions and rationale. Note that the preconditions are not weaker than the preconditions of calling UnregisterQuery() before this patch. Testing: TestAdmissionController is extended to end queries in four ways: cancellation by client, idle timeout, the last row being fetched, and the client closing the query. The test uses a mix of all four. After the query ends, all clients wait for the test to complete before closing the query or closing the connection. This ensures that the admission control decisions are based entirely on the query end behavior. This test works for both query admission control and mem_limit admission control and can detect both kinds of admission control resources ("admitted" and "reserved") not being released promptly. I ran into a problem similar to IMPALA-3772 with the admission control tests becoming flaky due to query timeouts on release builds, which I solved in a similar way by increasing the frequency of statestore updates. This is based on an earlier patch by Joe McDonnell. Change-Id: Ib1fae8dc1c4b0eca7bfa8fadae4a56ef2b37947a Reviewed-on: http://gerrit.cloudera.org:8080/8581 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-20 04:34:47 +00:00
Lars Volker	10334260d8	IMPALA-6109: xfail TestHdfsUnknownErrors::test_hdfs_safe_mode_error_255 The test puts the HDFS name node into safe mode to trigger an "Unknown Error 255" and verifies that the error details can be obtained correctly via the libHDFS API. However, putting the name node into safe mode can trip up HBase (HBASE-18738), which causes sporadic failures of our other HBase tests. To prevent this, we xfail the test until the HBase issue has been addressed (or we find a better way to trigger a 255 error). IMPALA-6212 tracks re-enabling the test in the future. Change-Id: I55979bed07147409949b798d4beb7a3b3b7ec5c3 Reviewed-on: http://gerrit.cloudera.org:8080/8590 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-18 03:22:27 +00:00
Thomas Tauber-Marshall	2510fe0aa0	IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter or a min-max filter, depending on whether it has HDFS or Kudu targets, respectively. In RuntimeFilterGenerator in the planner, each hash join node generates a bloom and min-max filter for each equi-join predicate, but only those filters that end up being assigned to a target make it into the final plan. Min-max filters are only assigned to Kudu scans if the target expr is a column, as Kudu doesn't support bounds on general exprs, and only if the join op is '=' and not 'is distinct from', as Kudu doesn't support returning NULLs if a bound is set. Min-max filters are populated by the PartitionedHashJoinBuilder. Codegen is used to eliminate branching on the type of filter. String min-max filters truncate their bounds at 1024 chars, so that the max amount of memory used by min-max filters is negligible. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Functional Testing: - Added new planner tests and updated the old ones. (in old tests, a lot of runtime filters are renumbered as we always generate min-max filters even if they don't end up getting assigned and they take up some of the RF ids). - Updated existing runtime filter tests to work with Kudu. - Added e2e tests for min-max filter specific functionality. Perf Testing: - All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu, timings are averages of 3 runs. - Ran a contrived query with a filter that does not eliminate any rows (full self join of lineitem). The difference in running time was negligible - 24.46s with filters on, 24.15s with filters off for a ~1% slowdown. - Ran a contrived query with a filter that eliminates all rows (self join on lineitem with a join condition that never matches). The filters resulted in a significant speedup - 0.26s with filters on, 1.46s with filters off for a ~5.6x speedup. This query is added to targeted-perf. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Reviewed-on: http://gerrit.cloudera.org:8080/7793 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-17 21:33:51 +00:00
Tim Armstrong	ae116b5bf7	IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding Switch the decoders to using more batch-oriented interfaces. As an intermediate step this doesn't make the interfaces of LevelDecoder or DictDecoder batch-oriented, only the lower-level utility classes. The next step would be to change those interfaces to be batch-oriented and make corresponding optimisations in parquet. This could deliver much larger perf improvements than the current patch. The high-level changes are: * BitReader -> BatchedBitReader, which is built to unpack runs of 32 bit-packed values efficiently. * RleDecoder -> RleBatchDecoder, which exposes the repeated and literal runs to the caller and uses BatchedBitReader to unpack literal runs efficiently. * Dict decoding uses RleBatchDecoder to decode repeated runs efficiently and uses the BitPacking utilities to unpack and encode in a single step. Also removes an older benchmark that isn't too interesting (since the batch-oriented approach to encoding and decoding is so much faster than the value-by-value approach). Testing: * Ran core tests. * Updated unit tests to exercise new code. * Added test coverage for the deprecated bit-packed level encoding so that it still works (there was no coverage previously). Perf: Single-node benchmarks showed a performance gain of a few percent. 16 node cluster benchmarks only showed a gain for TPC-H nested. Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6 Reviewed-on: http://gerrit.cloudera.org:8080/8267 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-16 21:23:09 +00:00
Philip Zeyliger	155bb77649	Remove unused/defunct Maven repositories. Removes three Maven repositories. davidtrott and codehaus both don't exist any more, so they're not doing anyone any good. (We had previously cleaned up Codehaus in IMPALA-5224, but a reference was resurrected.) The libphonenumber repo was simply misconfigured: the library exists in Maven central in the "normal" place, and a subdirectory repo is unnecessary. To test this, I ran "buildall" after removing ~/.m2/ on my machine. Change-Id: I79eb6c483561726c7cbaf86874001f1979128720 Reviewed-on: http://gerrit.cloudera.org:8080/8497 Tested-by: Impala Public Jenkins Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2017-11-16 15:51:48 +00:00
Vuk Ercegovac	6769220e28	IMPALA-6198: marks a test as debug-only The test_catalog_wait test uses flags that are only compiled for debug binaries. This change marks the test as debug-only so that it does not break release tests. Change-Id: I92640b8192545cccea0411c04cc5fcf59fbefed0 Reviewed-on: http://gerrit.cloudera.org:8080/8573 Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-16 12:04:17 +00:00
Tim Armstrong	9502e21d03	IMPALA-6188: make test_top_n_reclaim less flaky Testing: Previously I needed ~20 iterations to get the test to fail on my local machine. After these changes I haven't been able to reproduce the failure. Change-Id: I2bea7b0f770dec362a6df075da4e340402bd1d5d Reviewed-on: http://gerrit.cloudera.org:8080/8562 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2017-11-16 04:57:39 +00:00
Zoltan Borok-Nagy	6539e89c81	IMPALA-2235: Fix current db when shell auto-reconnects The ImpalaShell didn't issue the 'USE <current-db>' command after reconnecting to the Impala daemon. Therefore the client session used the default DB after reconnection, not the previously selected DB. Setting the current DB is done by the _validate_database method. Before this commit it appended the "use <db>" command to the command queue of the Cmd class. But, at this point we might already have commands in the command queue that will run before the "use <db>" command. In case of reconnection, we want to invoke the USE command right away. Also, the command processed by the precmd() method can entirely skip the command queue, therefore it is not enough to insert the USE command at the front of the command queue. We need to issue the USE command with the onecmd() method to execute it immediately. I extended the _validate_database method with an "immediately" flag. If this flag is true, _validate_database will use the onecmd() method. Otherwise, it will append the USE command to the command queue to maintain the previous behaviour. I added a new automated test suite named test_shell_interactive_reconnect.py to the "custom cluster" tests. It sets the default database, and after reconnection it checks if the shell set it again automatically. One test case checks if the shell set the DB after manually reconnecting to the impala daemon by issuing the CONNECT command. The other test case checks if the shell set the DB after automatic reconnection due to cluster restart. I needed to back up the impala shell history file because I didn't want to pollute it with the test cases (just like the way it is done in tests/shell/test_shell_interactive.py). I created utility functions for this in tests/shell/util.py and now test_shell_interactive.py and the newly created test suite are using these utility functions. Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e Reviewed-on: http://gerrit.cloudera.org:8080/8368 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-15 22:42:22 +00:00
Tim Wood	a8c123b55a	IMPALA-6160: Allow multiple statements in a Query object. Testing: - Reproduced problem with bin/run-workload.py. - Ran bin/run-workload.py --workloads=tpch,targeted-perf,tpcds --impalads=localhost:21000,localhost:21001,localhost:21002 --results_json_file=$PWD/perf_results/IMPALA-6160.json --query_iterations=3 --table_formats=parquet/none --plan_first --query_names='.*' (Close to command line that single_node_perf_run.py builds.) - Manually reviewed perf_results/IMPALA-6160.json to verify presence of plans and proper splitting of query batches. Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297 Reviewed-on: http://gerrit.cloudera.org:8080/8513 Tested-by: Impala Public Jenkins Reviewed-by: Jim Apple <jbapple-impala@apache.org>	2017-11-15 19:38:30 +00:00
Vuk Ercegovac	6a2b7a64fb	IMPALA-4704: Turns on client connections when local catalog initialized. Currently, impalad starts beeswax and hs2 servers even if the catalog has not yet been initialized. As a result, client connections see an error message stating that the impalad is not yet ready. This patch changes the impalad startup sequence to wait until the catalog is received before opening beeswax and hs2 ports and starting their servers. Testing: - python e2e tests that start a cluster without a catalog and check that client connections are rejected as expected. Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6 Reviewed-on: http://gerrit.cloudera.org:8080/8202 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-13 21:14:14 +00:00
Tianyi Wang	fdf94a4003	IMPALA-6164: Fix stale query profile in TestAlwaysFalseFilter TestAlwaysFalseFilter gets the query profile without fetching all the rows, resulting in a stale query profile and failing the test. With this patch all the rows are fetched before getting the query profile. This is enough to get the final profile because the query profile finalization is performed in Coordinator::GetNext after we hit eos. A bug in Base64Decode related to query profile decoding is also fixed. Currently Base64Decode may produce an incorrect output length if the output parameter is not initialized with 0. Testing: TestAlwaysFalseFilter is run and passes 1000 times. It doesn't pass 1000 times consecutively without this patch. Change-Id: I04bb76d20541fa035d88167b593d1b8bc3873e89 Reviewed-on: http://gerrit.cloudera.org:8080/8498 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-10 04:27:59 +00:00
Thomas Tauber-Marshall	3a1073c87c	IMPALA-6173: Fix SHOW CREATE TABLE for unpartitioned Kudu tables IMPALA-5546 added the ability to create unpartitioned Kudu tables, but when SHOW CREATE TABLE is run on such a table it still prints 'PARTITION BY', just without a partition clause. This patch removes the 'PARTITION BY' from the output. Testing: - Added test that runs SHOW CREATE on an unpartitioned Kudu table. Change-Id: Icc327266cfb8b5c05efec97348528cea6904bb20 Reviewed-on: http://gerrit.cloudera.org:8080/8506 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-09 23:59:13 +00:00
Tim Armstrong	a772f84562	IMPALA-6171: Revert "IMPALA-1575: part 2: yield admission control resources" This reverts commit `fe90867d89`. Change-Id: I3eec4b5a6ff350933ffda0bb80949c5960ecdf25 Reviewed-on: http://gerrit.cloudera.org:8080/8499 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-08 22:03:59 +00:00
Tim Armstrong	bf9c2f521f	IMPALA-6151: add query-level fragment/backend counters This adds NumBackends, NumFragments and NumFragmentInstances counters to the query profile to make it easier to manually or programmatically analyse the query. Also add a num-queries-registered metric to track the number of queries that have been executed but are not yet unregistered. Testing: Ran "select count(*) from alltypessmall" and checked profile: Backend startup latencies: Count: 3, min / max: 1ms / 1ms, 25th %-ile: 1ms, 50th %-ile: 1ms, 75th %-ile: 1ms, 90th %-ile: 1ms, 95th %-ile: 1ms, 99.9th %-ile: 1ms Per Node Peak Memory Usage: tarmstrong-box:22000(1.10 MB) tarmstrong-box:22001(1.02 MB) tarmstrong-box:22002(1.02 MB) - FiltersReceived: 0 (0) - FinalizationTimer: 0.000ns - NumBackends: 3 (3) - NumFragmentInstances: 4 (4) - NumFragments: 2 (2) Ran some query tests (both beeswax and HS2) and manually checked the num-queries-registered metric on the /metrics page when the queries were running and after they finished. Added the metric to test_metrics_are_zero() to make sure that there are no accounting errors. Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9 Reviewed-on: http://gerrit.cloudera.org:8080/8461 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-07 21:44:34 +00:00


1363 Commits