impala

mirror of https://github.com/apache/impala.git synced 2026-02-03 00:00:40 -05:00

Author	SHA1	Message	Date
Fang-Yu Rao	efc627d050	IMPALA-10158: Set timezone to UTC for Iceberg-related E2E tests We found that the tests of test_iceberg_query and test_iceberg_profile fail after the patch for IMPALA-9741 has been merged and that it is due to the default timezone of Impala not being UTC. This patch fixes the issue by adding "SET TIMEZONE=UTC;" before those test queries are run. Testing: - Verified in a local development environment that the tests of test_iceberg_query and test_iceberg_profile could pass after applying this patch. Change-Id: Ie985519e8ded04f90465e141488bd2dda78af6c3 Reviewed-on: http://gerrit.cloudera.org:8080/16425 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-09 13:26:42 +00:00
wzhou-code	0c89a9d562	IMPALA-10140: Fix CatalogException for creating database with sync_ddl as true IMPALA-7961 handled the cases for the query "create table if not exists" with sync_ddl as true. Customers reported a similar issue for the query "create database if not exists" with sync_ddl as true. This patch applies a fix similar to the one for IMPALA-7961 to the function CatalogOpExecutor.createDatabase(). Testing: - Manual tests. Since this is a racy bug, I could only reproduce it by forcing frequent topicUpdateLog GCs along with a specific sequence of actions, like: run some DDLs and REFRESHs to trigger a GC in topicUpdateLog, then run the query "create database if not exists" with sync_ddl as true. Verified that the issue couldn't be reproduced after applying this patch. - Passed exhaustive test. Change-Id: Id623118f8938f416414c45d93404fb70d036a9df Reviewed-on: http://gerrit.cloudera.org:8080/16421 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-09 06:52:53 +00:00
Qifan Chen	9f51673a40	IMPALA-10129 Data race in MemTracker::GetTopNQueriesAndUpdatePoolStats This work addresses a data race condition in the admission controller by providing initial values for two data members (is_query_mem_tracker_ and query_id_) in a constructor of the MemTracker class. Without this, the two data members are set without lock protection after the object is constructed, which allows other threads to access either of them at the same time. Testing: 1. Ran the python admission controller test successfully with a tsan build. The data race was not observed with the enhancement and was observed without it. 2. Ran the core test. Change-Id: I9c4ffe8064d3e099a525cc48c218ef73112fb67b Reviewed-on: http://gerrit.cloudera.org:8080/16408 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-09 04:23:17 +00:00
huangtianhua	6aaea3216c	IMPALA-10090 Pull newest code of native-toolchain before building it If native-toolchain exists, we should pull the newest code before building it. Change-Id: I2da3ffce7abb88190be0a5ea0e2cf603f98ee15e Reviewed-on: http://gerrit.cloudera.org:8080/16402 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-08 23:35:41 +00:00
Bikramjeet Vig	fc51cd3bc0	IMPALA-10052: Expose daemon health endpoint for statestore and catalog This change exposes the daemon health of statestored and catalogd via an HTTP endpoint '/healthz'. If the server is healthy, this endpoint will return HTTP code 200 (OK). If it is unhealthy, it will return 503 (Service Unavailable). This is consistent with the endpoint added for impalads in IMPALA-8895. Testing: - Extended test in test_web_pages.py Change-Id: I7714734df8e50dabbbebcb77a86a5a00bd13bf7c Reviewed-on: http://gerrit.cloudera.org:8080/16295 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-08 22:11:23 +00:00
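A minimal client-side sketch of how a monitoring script might interpret '/healthz', assuming only the 200/503 contract described above. The helper names `healthz_status` and `is_healthy` are hypothetical, and the ports in the usage note (25010 for statestored, 25020 for catalogd) are the usual default web UI ports, assumed here rather than taken from this commit.

```python
import urllib.request
from http import HTTPStatus


def healthz_status(daemon_healthy: bool) -> int:
    """Status code '/healthz' is described as returning: 200 if healthy, else 503."""
    return int(HTTPStatus.OK if daemon_healthy else HTTPStatus.SERVICE_UNAVAILABLE)


def is_healthy(url: str, timeout_s: float = 5.0) -> bool:
    """Probe a '/healthz' URL; a non-200 answer or any connection problem counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == HTTPStatus.OK
    except OSError:
        # urllib.error.HTTPError (raised for 503) and URLError both subclass OSError,
        # as do socket timeouts, so every failure mode lands here.
        return False
```

For example, `is_healthy("http://localhost:25010/healthz")` would probe a local statestored under the assumed default port.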
Qifan Chen	1e63722f8d	IMPALA-10124 admission-controller-test fails with no such file or directory error This work addresses a failure by disabling undefined behavior sanitizer testing for the AdmissionControllerTest.TopNQueryCheck test. In the test, std::regex_match() is used to verify the appearance of certain strings and can produce a core dump with a very long stack trace failing in std::vector::operator[](). Testing: 1. Ran the test both in regular mode and with the undefined behavior sanitizer check disabled. No core was seen. Change-Id: I16d6cff8fad8d0e93a24ec3fefa9cc1f8c471aad Reviewed-on: http://gerrit.cloudera.org:8080/16404 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-08 21:33:38 +00:00
skyyws	fb6d96e001	IMPALA-9741: Support querying Iceberg table by impala This patch mainly implements querying Iceberg tables through Impala. The following SQL creates an external Iceberg table: CREATE EXTERNAL TABLE default.iceberg_test ( level string, event_time timestamp, message string ) STORED AS ICEBERG LOCATION 'hdfs://xxx' TBLPROPERTIES ('iceberg_file_format'='parquet'); Or, including just the table name and location: CREATE EXTERNAL TABLE default.iceberg_test STORED AS ICEBERG LOCATION 'hdfs://xxx' TBLPROPERTIES ('iceberg_file_format'='parquet'); 'iceberg_file_format' is the Iceberg file format; currently only PARQUET is supported, and other formats may be supported in the future. If you don't specify this property, the default file format is PARQUET. We achieved this by treating the Iceberg table as a normal unpartitioned HDFS table. When querying an Iceberg table, we push down partition column predicates to Iceberg to decide which data files need to be scanned, and then transfer this information to the BE to do the actual scan. Testing: - Unit test for Iceberg in FileMetadataLoaderTest - Create table tests in functional_schema_template.sql - Iceberg table query test in test_scanners.py Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006 Reviewed-on: http://gerrit.cloudera.org:8080/16143 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-06 02:12:07 +00:00
Adam Tamas	fe6e625747	IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix Although ds_hll_sketch() generates a string value as output, the data is not ASCII-encoded text but a binary sketch; because of this, when the shell gets this data, it disconnects while trying to decode it. The issue can also be reproduced by calling unhex() with input that does not produce valid text. Example: SELECT unhex("aa"); This patch replaces any characters that cannot be decoded as UTF-8 if a UnicodeDecodeError occurs after fetching the data. This solution works with the Thrift 0.9.3 autogenerated gen-py but still fails with Thrift 0.11.0. For Thrift 0.11.0, the error is caught and an error message is sent (this does not work with the beeswax protocol, because it raises a different error (TypeError), which can also occur for other reasons). Testing: - Manual testing with these protocols: 'hs2-http', 'hs2', 'beeswax' Change-Id: I0c5f1290356e21aed8ca7f896f953541942aed05 Reviewed-on: http://gerrit.cloudera.org:8080/16418 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>	2020-09-05 09:42:46 +00:00
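A minimal sketch of the "decode, and on UnicodeDecodeError replace what cannot be decoded" idea from IMPALA-10012, using only Python's built-in `bytes.decode`. The function name is hypothetical; this is not the impala-shell code itself. `unhex("aa")` yields the single byte 0xAA, which is not valid UTF-8 on its own.

```python
def decode_result_cell(raw: bytes) -> str:
    """Decode a fetched value as UTF-8, tolerating binary content such as a sketch."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Substitute U+FFFD for each undecodable byte instead of failing the whole fetch.
        return raw.decode("utf-8", errors="replace")


print(repr(decode_result_cell(bytes.fromhex("aa"))))  # '\ufffd'
```

Valid text passes through unchanged; only undecodable bytes are replaced, so ordinary string results are unaffected.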
Csaba Ringhofer	b7965d8240	Revert "IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix" This reverts commit `75146c9138`. Change-Id: I57f790389a8c847877999d2b9b8185939b416c07 Reviewed-on: http://gerrit.cloudera.org:8080/16417 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>	2020-09-04 12:28:56 +00:00
Adam Tamas	75146c9138	IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix Although ds_hll_sketch() generates a string value as output, the data is not ASCII-encoded text but a binary sketch; because of this, when the shell gets this data, it disconnects while trying to decode it. The issue can also be reproduced by calling unhex() with input that does not produce valid text. Example: SELECT unhex("aa"); This patch replaces any characters that cannot be decoded as UTF-8 if a UnicodeDecodeError occurs after fetching the data. This solution works with the Thrift 0.9.3 autogenerated gen-py but still fails with Thrift 0.11.0. For Thrift 0.11.0, the error is caught and an error message is sent (this does not work with the beeswax protocol, because it raises a different error (TypeError), which can also occur for other reasons). Testing: - Manual testing with these protocols: 'hs2-http', 'hs2', 'beeswax' Change-Id: Ic5cfb907871ca83e5f04a39ca9d7a8e138d711a8 Reviewed-on: http://gerrit.cloudera.org:8080/16305 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>	2020-09-04 12:18:28 +00:00
Daniel Becker	dc08b657e8	IMPALA-7658: Proper codegen for HiveUdfCall Implementing codegen for HiveUdfCall. Testing: Verified that java udf tests pass locally. Benchmarks: Used a UDF from TestUdf.java that adds three integers: create function tpch15_parquet.sum3(int, int, int) returns int location '/test-warehouse/impala-hive-udfs.jar' symbol='org.apache.impala.TestUdf'; Used the following query on the master branch and the change's branch: set num_nodes=1; set mt_dop=1; select min(tpch15_parquet.sum3(cast(l_orderkey as int), cast(l_partkey as int), cast(l_suppkey as int))) from tpch15_parquet.lineitem; Results averaged over 100 runs after warmup: Master: 20.6346s, stddev: 0.3132411856765332 This change: 19.0256s, stddev: 0.42039019873436 This is a ~7.8% improvement. Change-Id: I2f994dac550f297ed3c88491816403f237d4d747 Reviewed-on: http://gerrit.cloudera.org:8080/16314 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-04 00:55:02 +00:00
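The ~7.8% figure is the runtime reduction relative to master, (20.6346 - 19.0256) / 20.6346 ≈ 0.078, which is equivalent to a speedup of about 1.08x. A one-line check of that arithmetic (the helper name is illustrative only):

```python
def relative_improvement(baseline_s: float, new_s: float) -> float:
    """Fraction of the baseline runtime saved by the new version."""
    return (baseline_s - new_s) / baseline_s


print(f"{relative_improvement(20.6346, 19.0256):.1%}")  # 7.8%
```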
Tamas Mate	2359a1be9d	IMPALA-10119: Fix impala-shell history duplication test The flaky test was TestImpalaShellInteractive.test_history_does_not_duplicate_on_interrupt. The test failed with a timeout error when the interrupt signal arrived only after the next test query had started. The impala-shell output was ^C instead of the expected query result. This change adds an additional blocking expect call to wait for the interrupt signal to arrive before sending the next query. Change-Id: I242eb47cc8093c4566de206f46b75b3feab1183c Reviewed-on: http://gerrit.cloudera.org:8080/16391 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2020-09-03 18:25:58 +00:00
Adam Tamas	99e5f5a885	IMPALA-10133: Implement ds_hll_stringify function. This function receives a string that is a serialized Apache DataSketches HLL sketch and returns its stringified format. The stringified format looks like the following and contains this data: select ds_hll_stringify(ds_hll_sketch(float_col)) from functional_parquet.alltypestiny; +--------------------------------------------+ | ds_hll_stringify(ds_hll_sketch(float_col)) | +--------------------------------------------+ | ### HLL sketch summary: | | Log Config K : 12 | | Hll Target : HLL_4 | | Current Mode : LIST | | LB : 2 | | Estimate : 2 | | UB : 2.0001 | | OutOfOrder flag: false | | Coupon count : 2 | | ### End HLL sketch summary | | | +--------------------------------------------+ Change-Id: I85dbf20b5114dd75c300eef0accabe90eac240a0 Reviewed-on: http://gerrit.cloudera.org:8080/16382 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-03 12:11:48 +00:00
Aman Sinha	5e9f10d34c	IMPALA-10064: Support constant propagation for eligible range predicates This patch adds support for constant propagation of range predicates involving date and timestamp constants. Previously, only equality predicates were considered for propagation. The new type of propagation is shown by the following example: Before constant propagation: WHERE date_col = CAST(timestamp_col as DATE) AND timestamp_col BETWEEN '2019-01-01' AND '2020-01-01' After constant propagation: WHERE date_col >= '2019-01-01' AND date_col <= '2020-01-01' AND timestamp_col >= '2019-01-01' AND timestamp_col <= '2020-01-01' AND date_col = CAST(timestamp_col as DATE) As a consequence, since Impala supports table partitioning by date columns but not timestamp columns, the above propagation enables partition pruning based on timestamp ranges. Existing code for equality based constant propagation was refactored and consolidated into a new class which handles both equality and range based constant propagation. Range based propagation is only applied to date and timestamp columns. Testing: - Added new range constant propagation tests to PlannerTest. - Added e2e test for range constant propagation based on a newly added date partitioned table. - Ran precommit tests. Change-Id: I811a1f8d605c27c7704d7fc759a91510c6db3c2b Reviewed-on: http://gerrit.cloudera.org:8080/16346 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-02 22:57:55 +00:00
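A minimal sketch of the range propagation shown in the example, assuming inclusive bounds and an equivalence of the form date_col = CAST(ts_col AS DATE). Because that cast is non-decreasing, inclusive timestamp bounds carry over to inclusive date bounds; the reverse direction is not safe in general (a date bound says nothing about the time of day), so the sketch only propagates from the timestamp side. `Range` and `propagate_date_ranges` are hypothetical names; Impala's planner code is Java and is not reproduced here.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Range:
    """Inclusive range predicate: column >= lo AND column <= hi."""
    column: str
    lo: object
    hi: object


def propagate_date_ranges(cast_equivs, ranges):
    """cast_equivs holds (date_col, ts_col) pairs meaning date_col = CAST(ts_col AS DATE).

    Returns the original ranges plus derived ranges on the date columns.
    """
    derived = list(ranges)
    for date_col, ts_col in cast_equivs:
        for r in ranges:
            if r.column == ts_col:
                # CAST(ts AS DATE) is monotonic, so lo <= ts <= hi implies
                # date(lo) <= CAST(ts AS DATE) <= date(hi).
                derived.append(Range(date_col, r.lo.date(), r.hi.date()))
    return derived


ts_range = Range("timestamp_col", datetime(2019, 1, 1), datetime(2020, 1, 1))
for pred in propagate_date_ranges([("date_col", "timestamp_col")], [ts_range]):
    print(pred)
```

The derived predicate on date_col is what makes partition pruning possible on a date-partitioned table.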
Adam Tamas	f9936549dc	IMPALA-10106: Upgrade DataSketches to version 2.1.0 Upgrade the external DataSketches files for HLL/KLL to version 2.1.0 tests: -Ran the tests from tests/query_test/test_datasketches.py Change-Id: I4faa31c0b628a62c7e56a6c4b9549d0aaa8a02ff Reviewed-on: http://gerrit.cloudera.org:8080/16360 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-02 22:06:50 +00:00
Zoltan Borok-Nagy	502e1134be	IMPALA-10071: Impala shouldn't create filename starting with underscore during ACID TRUNCATE When Impala TRUNCATEs an ACID table, it creates a new base directory with the hidden file "_empty" in it. Newer Hive versions ignore files starting with underscore, therefore they ignore the whole base directory. To resolve this issue we can simply rename the empty file to "empty". Testing: * update acid-truncate.test accordingly Change-Id: Ia0557b9944624bc123c540752bbe3877312a7ac9 Reviewed-on: http://gerrit.cloudera.org:8080/16396 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-02 13:29:25 +00:00
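Why the rename helps, as a small sketch: a reader that skips "hidden" files sees nothing in a base directory whose only file is "_empty". The convention assumed here (names starting with '_' or '.' are hidden) follows Hadoop's usual hidden-file filter; the helper names are hypothetical.

```python
def is_hidden(file_name: str) -> bool:
    """Assumed Hadoop/Hive convention: names starting with '_' or '.' are hidden."""
    return file_name.startswith(("_", "."))


def visible_files(file_names):
    """Files that a reader following this convention would actually consider."""
    return [name for name in file_names if not is_hidden(name)]


print(visible_files(["_empty"]))  # [] : nothing visible in the base directory
print(visible_files(["empty"]))   # ['empty']
```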
Adam Tamas	4cb3c3556e	IMPALA-10108: Implement ds_kll_stringify function This function receives a string that is a serialized Apache DataSketches KLL sketch and returns its stringified format. The stringified format looks like the following and contains this data: select ds_kll_stringify(ds_kll_sketch(float_col)) from functional_parquet.alltypestiny; +--------------------------------------------+ | ds_kll_stringify(ds_kll_sketch(float_col)) | +--------------------------------------------+ | ### KLL sketch summary: | | K : 200 | | min K : 200 | | M : 8 | | N : 8 | | Epsilon : 1.33% | | Epsilon PMF : 1.65% | | Empty : false | | Estimation mode: false | | Levels : 1 | | Sorted : false | | Capacity items : 200 | | Retained items : 8 | | Storage bytes : 64 | | Min value : 0 | | Max value : 1.1 | | ### End sketch summary | | | +--------------------------------------------+ Change-Id: I97f654a4838bf91e3e0bed6a00d78b2c7aa96f75 Reviewed-on: http://gerrit.cloudera.org:8080/16370 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-02 10:49:10 +00:00
zhaorenhai	0098113d95	IMPALA-10090: Create aarch64 development environment on ubuntu 18.04 This includes the following changes: 1. Build native-toolchain locally with a script on the aarch64 platform. 2. Change the version numbers of some native-toolchain libs. 3. Split SKIP_TOOLCHAIN_BOOTSTRAP and DOWNLOAD_CDH_COMPONENTS into two separate settings, because on aarch64 we only need to download the CDP components, not the toolchain. 4. Download the Hadoop aarch64 native libs, which the Impala build needs. With this commit, on Ubuntu 18.04 aarch64, you only need to run bin/bootstrap_development.sh, just like on x86. Change-Id: I769668c834ab0dd504a822ed9153186778275d59 Reviewed-on: http://gerrit.cloudera.org:8080/16065 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-02 06:47:30 +00:00
Vihang Karajgaonkar	28b1542db0	IMPALA-10094: Skip test_refresh_updated_partitions on S3 The test test_refresh_updated_partitions runs some commands using Hive, which causes it to fail in S3-specific jobs since we don't run HiveServer2 in those environments. This patch skips the test in non-HDFS environments. Change-Id: I0d27dd76e772e396a07419a58821ba899ac74188 Reviewed-on: http://gerrit.cloudera.org:8080/16399 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-02 04:31:52 +00:00
Joe McDonnell	106dea63ba	IMPALA-10121: Generate JUnitXML for TSAN messages This adds logic in bin/jenkins/finalize.sh to check the ERROR log for TSAN messages (i.e. WARNING: ThreadSanitizer: ...) and generate a JUnitXML with the message. This happens when TSAN aborts Impala. Testing: - Ran TSAN build (which is currently failing) Change-Id: I44ea33a78482499decae0ec4c7c44513094b2f44 Reviewed-on: http://gerrit.cloudera.org:8080/16397 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-01 23:30:55 +00:00
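The actual change is in the shell script bin/jenkins/finalize.sh; the following is only a Python sketch of the same idea: scan log lines for the "WARNING: ThreadSanitizer:" marker and emit one failed JUnit test case per match. The element and attribute names follow the common JUnitXML layout, while the suite and test-case naming scheme is an assumption.

```python
import xml.etree.ElementTree as ET

TSAN_MARKER = "WARNING: ThreadSanitizer:"


def tsan_junit_xml(log_lines):
    """Return a JUnitXML string with one failed test case per TSAN warning, or None."""
    warnings = [line.strip() for line in log_lines if TSAN_MARKER in line]
    if not warnings:
        return None
    suite = ET.Element("testsuite", name="tsan",
                       tests=str(len(warnings)), failures=str(len(warnings)))
    for i, msg in enumerate(warnings):
        case = ET.SubElement(suite, "testcase", classname="tsan", name=f"tsan_warning_{i}")
        failure = ET.SubElement(case, "failure", message=msg)
        failure.text = msg  # keep the full warning text in the failure body as well
    return ET.tostring(suite, encoding="unicode")
```

Returning None when there are no matches lets a caller skip writing a report file entirely.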
Zoltan Borok-Nagy	329bb41294	IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files Currently Impala checks the file metadata field 'hive.acid.version' to decide whether a file uses the full ACID schema. There are cases when Hive forgets to set this value for full ACID files, e.g. query-based compactions. So it's more robust to check the schema elements instead of the metadata field. Also, Hive sometimes writes the schema with different character cases, e.g. originalTransaction vs originaltransaction, so we should compare the column names in a case-insensitive way. Testing: * added test for full ACID compaction * added test_full_acid_schema_without_file_metadata_tag to test full ACID file without metadata 'hive.acid.version' Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14 Reviewed-on: http://gerrit.cloudera.org:8080/16383 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-01 22:27:27 +00:00
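A minimal sketch of a schema-based, case-insensitive check as described above. It assumes the full ACID ORC row layout used by Hive, with top-level fields operation, originalTransaction, bucket, rowId, currentTransaction and row; the function name is hypothetical and this is not Impala's implementation.

```python
# Assumed Hive full ACID top-level layout, in order.
FULL_ACID_FIELDS = ("operation", "originalTransaction", "bucket",
                    "rowId", "currentTransaction", "row")


def is_full_acid_schema(top_level_fields) -> bool:
    """True if the file's top-level field names match the full ACID layout, ignoring case."""
    if len(top_level_fields) != len(FULL_ACID_FIELDS):
        return False
    return all(actual.lower() == expected.lower()
               for actual, expected in zip(top_level_fields, FULL_ACID_FIELDS))
```

Comparing lowercased names accepts both "originalTransaction" and "originaltransaction", the case difference mentioned in the commit.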
abeltian	69d0d0af47	IMPALA-10087: IMPALA-6050 causes alluxio not to be supported This change adds file type support for Alluxio. Alluxio URLs have a different prefix, such as: alluxio://zk@zk-1:2181,zk-2:2181,zk-3:2181/path/ Testing: Added a unit test for Alluxio file system type checks. Change-Id: Id92ec9cb0ee241a039fe4a96e1bc2ab3eaaf8f77 Reviewed-on: http://gerrit.cloudera.org:8080/16379 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-01 09:43:03 +00:00
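A sketch of scheme-based detection using Python's standard `urllib.parse.urlsplit`; the function name is hypothetical, and this is not Impala's implementation. Looking only at the URI scheme means the multi-host ZooKeeper authority shown above needs no special handling.

```python
from urllib.parse import urlsplit


def is_alluxio_path(path: str) -> bool:
    """Classify a path as Alluxio by its URI scheme (urlsplit lowercases the scheme)."""
    return urlsplit(path).scheme == "alluxio"
```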
Shant Hovsepian	f4273a40fe	IMPALA-7310: Partial fix for NDV cardinality with NULLs. This fix only handles the case where a column's cardinality is zero but the column is nullable and null stats indicate there are null values; in that case we adjust the cardinality from 0 to 1. A cardinality of zero was especially problematic when calculating cardinalities for multiple predicates with multiplication. The 0 would propagate up the plan tree and result in poor plan choices, such as always using broadcast joins where shuffle would have been better. Testing: * 26 Node TPC-DS 30TB run had better plans for Q4 and Q11 - Q4 172s -> 80s - Q11 103s -> 77s * CardinalityTest * TpcdsPlannerTest Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 Reviewed-on: http://gerrit.cloudera.org:8080/16349 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-01 08:54:42 +00:00
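The adjustment can be expressed as a small rule. This sketch uses hypothetical names and assumes that a null count of -1 means "unknown", so only a positive null count triggers the adjustment. It also shows why a zero is harmful when per-predicate estimates are multiplied.

```python
from math import prod


def adjusted_ndv(ndv: int, nullable: bool, num_nulls: int) -> int:
    """Count NULL as one distinct value when stats report only NULLs for the column."""
    if ndv == 0 and nullable and num_nulls > 0:  # num_nulls == -1 (unknown) is left alone
        return 1
    return ndv


# A single 0 factor zeroes the whole product; after the adjustment it no longer does.
print(prod([1000, 0, 50]))                          # 0
print(prod([1000, adjusted_ndv(0, True, 42), 50]))  # 50000
```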
Sahil Takiar	f85dbff976	IMPALA-10030: Remove unnecessary jar dependencies Removes the dependency on hadoop-hdfs. This jar contains the core code for implementing HDFS, and thus pulls in a bunch of unnecessary transitive dependencies. Impala currently only requires this jar for some configuration key names. Most of these configuration key names have been moved to the appropriate HDFS client jars, and some others are deprecated altogether. Removing this jar required a few code changes to move the location of the referenced configuration keys. Removes all transitive Kafka dependencies from the Apache Ranger dependency. Previously, Impala only excluded Kafka jars with binary version kafka_2.11; however, it seems Ranger recently upgraded the dependency version to kafka_2.12. Now all Kafka dependencies are excluded, regardless of artifact name. Removes all transitive dependencies from the Apache Ozone dependency. Impala has a dependency on the Ozone client shaded-jar, which already includes all required transitive dependencies. For some reason, Ozone still pulls in some transitive dependencies even though they are not needed. Made some other minor cleanup / improvements in the fe/pom.xml file. This saves about 70 MB of space in the Docker images. Testing: * Ran exhaustive tests * Ran on-prem cluster E2E tests Change-Id: Iadbb6142466f73f067dd7cf9d401ff81145c74cc Reviewed-on: http://gerrit.cloudera.org:8080/16311 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-01 01:15:30 +00:00
Fang-Yu Rao	1cdae465b7	IMPALA-10118: Update shaded-deps/hive-exec/pom.xml for GenericHiveLexer In HIVE-19064 the class of GenericHiveLexer was introduced as an intermediate class between the classes of HiveLexer and Lexer. In order for ToSqlUtils.java to be compiled once we bump up CDP_BUILD_NUMBER that includes this change on the Hive side, this patch updates shaded-deps/hive-exec/pom.xml to include the jar of GenericHiveLexer so that Impala could be successfully built. Testing: - Verified that Impala could compile in a local development environment after applying this patch. Change-Id: I27db1cb8de36dd86bae08b7177ae3f1c156d73bc Reviewed-on: http://gerrit.cloudera.org:8080/16390 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-01 00:09:48 +00:00
Shant Hovsepian	827070b473	IMPALA-10099: Push down DISTINCT in Set operations INTERSECT/EXCEPT are not duplicate-preserving operations. The distinct aggregations can happen in each operand, in the leftmost operand only, or after all the operands in a separate aggregation step. Except for a couple of special cases, we would use the last strategy most often. This change pushes the distinct aggregation down to the leftmost operand in cases where there are no analytic functions, or when a distinct or grouping operation already eliminates duplicates. In general, DISTINCT placement such as in this case should be done throughout the entire plan tree in a cost-based manner, as described in IMPALA-5260. Testing: * TpcdsPlannerTest * PlannerTest * TPC-DS 30TB Perf run for any affected queries - Q14-1 180s -> 150s - Q14-2 109s -> 90s - Q8 no significant change * SetOperation Planner Tests * Analyzer tests * Tpcds Functional Workload Change-Id: Ia248f1595df2ab48fbe70c778c7c32bde5c518a5 Reviewed-on: http://gerrit.cloudera.org:8080/16350 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2020-08-31 18:34:07 +00:00
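Why a single DISTINCT on the leftmost operand can be enough: INTERSECT and EXCEPT only ask whether a left row appears in the other operand, and duplicates on the right never change that answer. The sketch below illustrates these set semantics with Python lists standing in for row sets; it is not Impala's planner, and the function names are hypothetical.

```python
def _distinct_left_then_filter(left, right, keep_matches):
    """Deduplicate only the leftmost operand, then filter by membership in the right one."""
    right_rows = set(right)  # duplicates on the right never change a membership test
    seen = set()
    result = []
    for row in left:
        if row in seen:
            continue
        seen.add(row)
        if (row in right_rows) == keep_matches:
            result.append(row)
    return result


def intersect_distinct(left, right):
    return _distinct_left_then_filter(left, right, keep_matches=True)


def except_distinct(left, right):
    return _distinct_left_then_filter(left, right, keep_matches=False)


print(intersect_distinct([1, 1, 2, 3, 3], [3, 1, 1]))  # [1, 3]
print(except_distinct([1, 1, 2, 3, 3], [3]))           # [1, 2]
```

No separate aggregation step after the set operation is needed, because the output already contains each left row at most once.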
stiga-huang	578933fe74	IMPALA-10065: Fix DCHECK when retrying a query in FINISHED state A query will come into the FINISHED state when some rows are available, even when some fragment instances are still executing. When a retryable query comes into the FINISHED state and the client hasn't fetched any results, we are still able to retry it for any retryable failures. This patch fixes a DCHECK when retrying a FINISHED state query. Tests: - Add a test in test_query_retries.py for retrying a query in FINISHED state. Change-Id: I11d82bf80640760a47325833463def8a3791bdda Reviewed-on: http://gerrit.cloudera.org:8080/16351 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-08-31 13:28:29 +00:00
Tim Armstrong	ea75e68f9e	IMPALA-10110: bloom filter target fpp query option This adds a BLOOM_FILTER_ERROR_RATE option that takes a value between 0 and 1 (exclusive) that can override the default target false positive probability (fpp) value of 0.75 for selecting the filter size. It does not affect whether filters are disabled at runtime. Adds estimated FPP and bloom size to the routing table so we have some observability. Here is an example: tpch_kudu> select count(*) from customer join nation on n_nationkey = c_nationkey; ID Src. Node Tgt. Node(s) Target type Partition filter Pending (Expected) First arrived Completed Enabled Bloom Size Est fpp ----------------------------------------------------------------------------------------------------------------------------------------- 1 2 0 LOCAL false 0 (3) N/A N/A true MIN_MAX 0 2 0 LOCAL false 0 (3) N/A N/A true 1.00 MB 1.04e-37 Testing: Added a test that shows the query option affecting filter size. Ran core tests. Change-Id: Ifb123a0ea1e0e95d95df9837c1f0222fd60361f3 Reviewed-on: http://gerrit.cloudera.org:8080/16377 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-08-29 05:48:37 +00:00
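One way to relate filter size to the target FPP is the textbook Bloom filter approximation fpp = (1 - e^(-k*n/m))^k, with n distinct build-side values, m filter bits, and k bits set per value. The sketch assumes k = 8 (as in a split-block Bloom filter), power-of-two sizes, a 1 MB minimum, and a 16 MB maximum; these are assumptions, not values read from this commit, and the function names are hypothetical. With n = 25 (the row count of the TPC-H nation table) and a 1 MB filter, the formula gives about 1.04e-37, consistent with the Est fpp in the example above.

```python
import math

BITS_SET_PER_VALUE = 8               # assumed k: 8 bits set per inserted value
MIN_FILTER_BYTES = 1024 * 1024       # assumed lower bound (1 MB)
MAX_FILTER_BYTES = 16 * 1024 * 1024  # assumed upper bound (16 MB)


def estimated_fpp(ndv: int, filter_bytes: int, k: int = BITS_SET_PER_VALUE) -> float:
    """Textbook approximation (1 - e^(-k*n/m))^k, with m = filter size in bits."""
    m_bits = filter_bytes * 8
    # -expm1(-x) equals 1 - e^(-x) but stays accurate when x is tiny.
    return (-math.expm1(-k * ndv / m_bits)) ** k


def choose_filter_bytes(ndv: int, target_fpp: float) -> int:
    """Smallest power-of-two size within the bounds whose estimated fpp <= target_fpp."""
    size = MIN_FILTER_BYTES
    while size < MAX_FILTER_BYTES and estimated_fpp(ndv, size) > target_fpp:
        size *= 2
    return size


print(f"{estimated_fpp(25, 1024 * 1024):.2e}")                 # 1.04e-37
print(choose_filter_bytes(10_000_000, 0.1) // (1024 * 1024))  # 8
```

Lowering the target (as BLOOM_FILTER_ERROR_RATE allows) can only keep the chosen size the same or grow it, up to the assumed maximum.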
Sahil Takiar	5daff34724	IMPALA-10073: Create shaded dependency for S3A and aws-java-sdk-bundle The aws-java-sdk-bundle is one of the largest dependencies in the Impala Docker images and continues to grow. The jar includes SDKs for every single AWS service. This patch removes most of the unnecessary SDKs from the aws-java-sdk-bundle, thus drastically decreasing the size of the dependency. The Maven shade plugin is used to do this, and the implementation is similar to what is currently done for the hive-exec jar. This patch takes a conservative approach to removing packages from the aws-java-sdk-bundle jar, and I ensured no direct dependencies of the S3 SDK were removed. The idea is to only remove dependencies that S3A would never conceivably need. Given the huge number of AWS services, I only focused on removing the largest SDKs (the size of each SDK is estimated by the number of classes in the SDK). This decreases the size of the Docker images by about 100 MB. Testing: * Ran core tests against S3 Change-Id: I0939f73be986f83cc1fd07921563b4d9201780f2 Reviewed-on: http://gerrit.cloudera.org:8080/16342 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-08-29 00:58:34 +00:00
Qifan Chen	2ef6184ee1	IMPALA-9989 Improve admission control pool stats logging This work addresses a current limitation in the admission controller by appending the last known memory consumption statistics about the set of queries running or waiting on a host or in a pool to the existing memory exhaustion message. The statistics are logged in impalad.INFO when a query is queued, or queued and then timed out, due to memory pressure in the pool or on the host. The statistics can also be part of the query profile. The new memory consumption statistics can be either host stats or aggregated pool stats. The host stats describe memory consumption for every pool on a host. The aggregated pool stats describe the aggregated memory consumption on all hosts for a pool. For each stats type, information such as the query IDs and memory consumption of up to the top 5 queries is provided, in addition to the min, max, average, and total memory consumption for the query set. When a query request is queued due to memory exhaustion, the new consumption statistics are logged when the BE logging level is set to 2. When a query request times out due to memory exhaustion, the new consumption statistics are logged when the BE logging level is set to 1. Testing: 1. Added a new test TopNQueryCheck in admission-controller-test.cc to verify that the topN query memory consumption details are reported correctly. 2. Added two new tests in test_admission_controller.py to simulate queries being queued and then timed out due to pool or host memory pressure. 3. Added a new test TopN in mem-tracker-test.cc to verify that the topN query memory consumption details are computed correctly from a mem tracker hierarchy. 4. Ran Core tests successfully. Change-Id: Id995a9d044082c3b8f044e1ec25bb4c64347f781 Reviewed-on: http://gerrit.cloudera.org:8080/16220 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-08-28 20:22:57 +00:00
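A minimal sketch of the per-set summary described in IMPALA-9989 (top-N queries by memory, plus min, max, average, and total) for one host or pool. The input shape, a mapping from query ID to bytes consumed, and the function name are assumptions; this is not Impala's C++ implementation.

```python
import heapq


def summarize_mem(consumption_by_query, top_n=5):
    """Top-N queries by memory plus min/max/average/total over the whole query set."""
    if not consumption_by_query:
        return None
    values = list(consumption_by_query.values())
    total = sum(values)
    return {
        "top": heapq.nlargest(top_n, consumption_by_query.items(), key=lambda kv: kv[1]),
        "min": min(values),
        "max": max(values),
        "average": total / len(values),
        "total": total,
    }


stats = summarize_mem({"q1": 100, "q2": 300, "q3": 200, "q4": 50, "q5": 400, "q6": 250})
print(stats["top"])  # [('q5', 400), ('q2', 300), ('q6', 250), ('q3', 200), ('q1', 100)]
```

`heapq.nlargest` avoids sorting the whole set when only the top few entries are needed, while min, max, average, and total still cover every query.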
wzhou-code	3733c4cc2c	IMPALA-10050: Fixed DCHECK error for backend in terminal state. The recent patch for IMPALA-6788 makes the coordinator cancel in-flight query fragment instances when it receives a failure report from one backend. It's possible that BackendState::Cancel() is called for one fragment instance before the first execution status report from its backend is received and processed by the coordinator. Since the status of BackendState is set to Cancelled after Cancel() is called, the execution of the fragment instance is treated as Done in such a case, so the status report will NOT be processed. Hence the backend receives an OK response from the coordinator even though it sent a report with an execution error. This makes the backend hit a DCHECK error if it is in a terminal state with an error. This patch fixes the issue by making the coordinator send a CANCELLED status in the response to the status report if the backend status is not OK and the execution status report is not applied. Testing: - The issue could be reproduced by running test_failpoints for about 20 iterations. Verified the fix by running test_failpoints for over 200 iterations without DCHECK failure. - Passed TestProcessFailures::test_kill_coordinator. - Passed TestRPCException::test_state_report_error. - Passed exhaustive tests. Change-Id: Iba6a72f98c0f9299c22c58830ec5a643335b966a Reviewed-on: http://gerrit.cloudera.org:8080/16303 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-08-28 01:39:55 +00:00
Fang-Yu Rao	34668fab87	IMPALA-10092: Do not skip test vectors of Kudu tests in a custom cluster We found that the following 4 tests do not run even if we remove all the decorators, such as "@SkipIfKudu.no_hybrid_clock" or "@SkipIfHive3.kudu_hms_notifications_not_supported", that skip the tests. This is because those 3 classes inherit from CustomClusterTestSuite, which adds a constraint that only allows test vectors with 'file_format' and 'compression_codec' being "text" and "none", respectively, to be run. 1. TestKuduOperations::test_local_tz_conversion_ops 2. TestKuduClientTimeout::test_impalad_timeout 3. TestKuduHMSIntegration::test_create_managed_kudu_tables 4. TestKuduHMSIntegration::test_kudu_alter_table To address this issue, in this patch we create a parent class for those 3 classes and override the method add_custom_cluster_constraints() in this new parent class so that we do not skip test vectors with 'file_format' and 'compression_codec' being "kudu" and "none", respectively. In addition, this patch removes a redundant method call to super(CustomClusterTestSuite, cls).add_test_dimensions() in CustomClusterTestSuite.add_custom_cluster_constraints(), since super(CustomClusterTestSuite, cls).add_test_dimensions() had already been called immediately before the call to add_custom_cluster_constraints() in CustomClusterTestSuite.add_test_dimensions(). Testing: - Manually verified that after removing the decorators that skip those tests, those tests could be run. Change-Id: I60a4bd4ac5a9026629fb840ab9cc7b5f9948290c Reviewed-on: http://gerrit.cloudera.org:8080/16348 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-08-28 01:37:16 +00:00
stiga-huang	568b3394b2	IMPALA-10080: Skip loading HDFS cache pools for non-HDFS file systems In global invalidate metadata, we always load HDFS cache pools using the CachePoolReader. However, this only works for HDFS, not for other file systems such as S3 or the local file system. We already handle this in CatalogServiceCatalog#CatalogServiceCatalog(). This patch adds a check in CatalogServiceCatalog#reset() to skip loading cache pools if the file system is not a true HDFS file system. Tests - Ran tests on S3. Verified that the IllegalStateException no longer occurs. Change-Id: Ib243d349177e1b982b313dd6e87ecc2ef4dfc3d8 Reviewed-on: http://gerrit.cloudera.org:8080/16335 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-08-27 10:17:31 +00:00
stiga-huang	61dcc805e5	IMPALA-9225: Query option for retryable queries to spool all results before returning any to the client If we have returned any results to the client in the original query, query retry will be skipped to avoid incorrect results. This patch adds a query option, spool_all_results_for_retries, for retryable queries to spool all results before returning any to the client. It defaults to true. If the query results cannot all fit in the allocated result spooling space, we return results and thus disable query retry for the query. Setting spool_all_results_for_retries to false falls back to the original behavior: clients can fetch results as soon as any of them are ready. So we explicitly set it to false in the retried query, since the retried query won't be retried again. For non-retryable queries or queries that don't enable result spooling, the spool_all_results_for_retries option has no effect. To implement this, this patch defers the time when results are ready to be fetched. By default, the “rows available” event happens when any results are ready. For a retryable query, when spool_query_results and spool_all_results_for_retries are both true, the “rows available” event happens after all results are spooled or after an error stops us from doing so, e.g. a full batch queue, cancellation, or a failure. After waiting for the root fragment instance’s Open() to finish, the coordinator waits until the results of BufferedPlanRootSink are ready. BufferedPlanRootSink sets the results-ready signal in its Send(), Close(), Cancel(), and FlushFinal() methods. Tests: - Add a test to verify that a retryable query spools all its results when result spooling and spool_all_results_for_retries are enabled. - Add a test to verify that query retry succeeds when a retryable query is still spooling its results (spool_all_results_for_retries=true). 
- Add a test to verify that the retried query won't spool all results even when result spooling and spool_all_results_for_retries are enabled in the original query. - Add a test to verify that the original query can be canceled correctly. We need this because the added logic for spool_all_results_for_retries is related to the cancellation code path. - Add a test to verify that results are returned when they cannot all fit into the result spooling space, and that query retry is skipped. Change-Id: I462dbfef9ddab9060b30a6937fca9122484a24a5 Reviewed-on: http://gerrit.cloudera.org:8080/16323 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-08-27 04:09:58 +00:00
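The timing rule for the “rows available” event in IMPALA-9225 can be written as a small predicate. This is a minimal Python model, not Impala's C++ coordinator code. The function and parameter names are hypothetical. The stopped_early flag stands in for the full-batch-queue, cancellation, and failure cases listed in the message.

```python
def rows_available(retryable: bool,
                   spool_query_results: bool,
                   spool_all_results_for_retries: bool,
                   any_results_ready: bool,
                   all_results_spooled: bool,
                   stopped_early: bool) -> bool:
    """Hypothetical model of when clients may start fetching results.

    stopped_early models anything that ends spooling before all results
    are buffered: a full batch queue, cancellation, or a failure.
    """
    if retryable and spool_query_results and spool_all_results_for_retries:
        # Defer fetching until every result is spooled (or spooling stops),
        # so that no rows reach the client while a retry is still possible.
        return all_results_spooled or stopped_early
    # Original behavior: fetching may start as soon as any results are ready.
    return any_results_ready
```

The retried query runs with spool_all_results_for_retries set to false, so it takes the second branch and behaves like a query without the option.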
Shant Hovsepian	0fcf846592	IMPALA-10095: Include query plan tests for all of TPC-DS Added TpcdsPlannerTest to include each TPC-DS query as a separate plan test file. Removed the previous tpcds-all test file. This means that no TPC-DS plans are checked when running only PlannerTest; however, TpcdsPlannerTest is included as part of a full frontend test run. The tests run with cardinality and resource checks and use Parquet tables so that predicate pushdowns are covered. Change-Id: Ibaf40d8b783be1dc7b62ba3269feb034cb8047da Reviewed-on: http://gerrit.cloudera.org:8080/16345 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2020-08-27 00:11:56 +00:00
Kevin Risden	c4d4a42528	IMPALA-10060: Upgrade Postgres JDBC driver to 42.2.14 Change-Id: I969a1901b484b7fe6a830ab935e2b32674eaa512 Reviewed-on: http://gerrit.cloudera.org:8080/16362 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2020-08-26 17:10:48 +00:00
Vihang Karajgaonkar	21c50f8dbb	IMPALA-4364: [Addendum] Compare specific fields in StorageDescriptor The query option REFRESH_UPDATED_HMS_PARTITIONS was introduced earlier in IMPALA-4364 to detect changes in the partition objects in HMS when a refresh table command is issued. Originally, it relied on the StorageDescriptor#equals() method to determine whether a partition in catalogd is the same as the partition in HMS while executing the refresh statement. However, StorageDescriptor#equals() depends on the HMS version and may introduce inconsistent behavior after upgrades. For example, when we backported the original patch to an older distribution that uses Hive-2, the SkewedInfo field of StorageDescriptor was not null. This field causes the comparison to fail, because catalogd doesn't store the SkewedInfo field in the cached StorageDescriptor, to optimize memory usage. This patch modifies the comparison logic to use an explicit implementation in the HdfsPartition class that compares only the fields cached in the HdfsPartition object. Testing: 1. Added a new test for the comparison method. 2. Modified the existing test for the query option. Change-Id: I90c797060265f8f508d0b150e15da3d0f9961b9b Reviewed-on: http://gerrit.cloudera.org:8080/16363 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Vihang Karajgaonkar <vihang@cloudera.com>	2020-08-26 17:08:43 +00:00
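The addendum compares only the cached fields instead of relying on StorageDescriptor#equals(). That idea can be sketched in Python. This is an illustrative model only. The dataclass is a hypothetical stand-in for the HMS StorageDescriptor. The list of compared fields is an assumption, since the commit message does not say which fields HdfsPartition caches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageDescriptor:
    """Hypothetical, simplified stand-in for the HMS StorageDescriptor."""
    location: str
    input_format: str
    output_format: str
    serde_lib: str
    # Filled in by some HMS versions, but never kept in the catalog cache.
    skewed_info: Optional[str] = None


# Assumed set of fields the cached partition keeps; illustrative only.
CACHED_FIELDS = ("location", "input_format", "output_format", "serde_lib")


def cached_fields_equal(cached: StorageDescriptor,
                        from_hms: StorageDescriptor) -> bool:
    """Compare only the fields the cache actually stores, ignoring the rest."""
    return all(getattr(cached, name) == getattr(from_hms, name)
               for name in CACHED_FIELDS)
```

Full equality (the dataclass's generated __eq__) would report every partition as modified whenever HMS fills in skewed_info. The field-wise comparison still catches real changes, such as a moved location.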
Gabor Kaszab	28d94851b1	IMPALA-10020: Implement ds_kll_cdf_as_string() function This adds support for the Cumulative Distribution Function (CDF) from the Apache DataSketches KLL algorithm collection. It receives a serialized KLL sketch and one or more float values that represent ranges in the sketched values. E.g. [1, 5, 10] means the following ranges: (-inf, 1), (-inf, 5), (-inf, 10), (-inf, +inf) It returns a comma-separated string in which each value is a number in the range [0,1] that shows what fraction of the data falls in the corresponding range. Note, ds_kll_cdf() should return an Array of doubles as the result, but for that we have to wait for complex type support. Until then, we provide ds_kll_cdf_as_string(), which can be deprecated once we have array support. The tracking Jira for returning complex types from functions is IMPALA-9520. Example: select ds_kll_cdf_as_string(ds_kll_sketch(float_col), 2, 4, 10) from alltypes;
+----------------------------------------------------------+
| ds_kll_cdf_as_string(ds_kll_sketch(float_col), 2, 4, 10) |
+----------------------------------------------------------+
| 0.2,0.401644,1,1                                         |
+----------------------------------------------------------+
Change-Id: I77e6afc4556ad05a295b89f6d06c2e4a6bb2cf82 Reviewed-on: http://gerrit.cloudera.org:8080/16359 Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-08-26 10:59:49 +00:00
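The following Python sketch makes the CDF output format of ds_kll_cdf_as_string() concrete. It computes the exact (non-sketched) CDF for a set of split points and formats it the same way. It does not use DataSketches: a KLL sketch only approximates these fractions. The function name is hypothetical. Requiring unique, increasing split points is an assumption modeled on the DataSketches API.

```python
from typing import Sequence


def exact_cdf_as_string(values: Sequence[float],
                        split_points: Sequence[float]) -> str:
    """Exact analogue of the CDF output.

    Returns one fraction per range (-inf, s), then 1 for (-inf, +inf).
    """
    points = list(split_points)
    if not values:
        raise ValueError("values must be non-empty")
    if points != sorted(set(points)):
        raise ValueError("split points must be unique and increasing")
    n = len(values)
    fractions = [sum(1 for v in values if v < s) / n for s in points]
    fractions.append(1.0)  # every value falls in (-inf, +inf)
    return ",".join(format(f, "g") for f in fractions)
```

For example, `exact_cdf_as_string(list(range(10)), [2, 4, 10])` returns `"0.2,0.4,1,1"`. Because each range starts at -inf, the fractions never decrease.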
Gabor Kaszab	a8a35edbc4	IMPALA-10019: Implement ds_kll_pmf_as_string() function This adds support for the Probability Mass Function (PMF) from the Apache DataSketches KLL algorithm collection. It receives a serialized KLL sketch and one or more float values that represent ranges in the sketched values. E.g. [1, 5, 10] means the following ranges: (-inf, 1), [1, 5), [5, 10), [10, +inf) It returns a comma-separated string in which each value is a number in the range [0,1] that shows what fraction of the data falls in the corresponding range. Note, ds_kll_pmf() should return an Array of doubles as the result, but for that we have to wait for complex type support. Until then, we provide ds_kll_pmf_as_string(), which can be deprecated once we have array support. The tracking Jira for returning complex types from functions is IMPALA-9520. Example: select ds_kll_pmf_as_string(ds_kll_sketch(float_col), 2, 4, 10) from alltypes;
+----------------------------------------------------------+
| ds_kll_pmf_as_string(ds_kll_sketch(float_col), 2, 4, 10) |
+----------------------------------------------------------+
| 0.202192,0.199452,0.598356,0                             |
+----------------------------------------------------------+
Change-Id: I222402f2dce2f49ab2b3f6e81a709da5539293ba Reviewed-on: http://gerrit.cloudera.org:8080/16336 Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-08-26 01:50:16 +00:00
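For the PMF of ds_kll_pmf_as_string(), the ranges are disjoint, half-open intervals rather than prefixes. The Python sketch below computes the exact PMF for given split points, using bisect to find each value's interval. As with any such sketch, it is an exact stand-in for what the KLL sketch approximates. The function name and the split-point validation are assumptions, not part of Impala.

```python
import bisect
from typing import Sequence


def exact_pmf_as_string(values: Sequence[float],
                        split_points: Sequence[float]) -> str:
    """Exact analogue of the PMF output format.

    Returns fractions for (-inf, s1), [s1, s2), ..., [sk, +inf).
    """
    points = list(split_points)
    if not values:
        raise ValueError("values must be non-empty")
    if points != sorted(set(points)):
        raise ValueError("split points must be unique and increasing")
    counts = [0] * (len(points) + 1)
    for v in values:
        # bisect_right returns how many split points are <= v, which is the
        # index of the half-open range that contains v.
        counts[bisect.bisect_right(points, v)] += 1
    return ",".join(format(c / len(values), "g") for c in counts)
```

For example, `exact_pmf_as_string(list(range(10)), [2, 4, 10])` returns `"0.2,0.2,0.6,0"`. Unlike the CDF output, the PMF fractions always sum to 1.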
Tim Armstrong	e133d1838a	IMPALA-7782: fix constant NOT IN subqueries that can return 0 rows The bug was that the statement rewriter converted NOT IN <subquery> predicates to != <subquery> predicates even when the subquery could return an empty set. This was invalid, because NOT IN (<empty set>) is true, but != (<empty set>) is false. Testing: Added targeted planner and end-to-end tests. Ran exhaustive tests. Change-Id: I66c726f0f66ce2f609e6ba44057191f5929a67fc Reviewed-on: http://gerrit.cloudera.org:8080/16338 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-08-25 23:08:40 +00:00
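The semantic gap behind IMPALA-7782 can be modeled in a few lines of Python. This is a simplified sketch, not Impala's rewriter. It ignores NULLs in the operands. It assumes standard SQL semantics, where a scalar subquery over an empty set yields NULL (modeled as None), and a WHERE clause drops rows whose predicate is NULL, just as if it were false.

```python
from typing import Optional, Sequence


def not_in(x: int, subquery_rows: Sequence[int]) -> bool:
    """x NOT IN (subquery), assuming no NULLs: TRUE for an empty set."""
    return all(x != row for row in subquery_rows)


def not_equal_scalar_subquery(x: int,
                              subquery_rows: Sequence[int]) -> Optional[bool]:
    """x != (scalar subquery); an empty result is modeled as NULL (None)."""
    if len(subquery_rows) > 1:
        raise ValueError("scalar subquery returned more than one row")
    if not subquery_rows:
        return None  # comparison with NULL is NULL, so the row is filtered
    return x != subquery_rows[0]
```

With exactly one row, the two forms agree. With zero rows, not_in keeps the row while the != form filters it out, which is why the rewrite is only valid when the subquery cannot be empty.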
Eugene Zimichev	adf2c464ae	IMPALA-8547: get_json_object fails to get value for numeric key Allows numeric keys for JSON objects in get_json_object. This patch makes Impala's get_json_object consistent with Hive and Postgres behavior. Queries such as "select get_json_object('{"1": 5}', '$.1');" failed before this patch; now the query returns '5'. Testing: * Added tests to expr-test Change-Id: I7df037ccf2c79da0ba86a46df1dd28ab0e9a45f4 Reviewed-on: http://gerrit.cloudera.org:8080/14905 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-08-25 22:29:45 +00:00
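The intended get_json_object behavior from IMPALA-8547 can be sketched in Python. Each dot-separated path segment is looked up as an object key string, so a segment such as 1 matches the key "1". This helper is hypothetical, not Impala's C++ implementation. It handles only plain `$.key.key` paths, with no array indexes or wildcards.

```python
import json
from typing import Optional


def get_json_object_sketch(json_text: str, path: str) -> Optional[str]:
    """Resolve a simple '$.k1.k2' path; numeric segments are plain key strings."""
    if not path.startswith("$"):
        return None
    rest = path[1:]
    if rest and not rest.startswith("."):
        return None
    node = json.loads(json_text)
    for key in rest.split(".")[1:]:
        # JSON object keys are always strings, so '1' is looked up as "1".
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    # Strings are returned unquoted; other values are rendered as JSON text.
    if isinstance(node, str):
        return node
    return json.dumps(node, separators=(",", ":"))
```

For example, `get_json_object_sketch('{"1": 5}', '$.1')` returns `'5'`, matching the query in the commit message.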
Tim Armstrong	b46ea7664c	IMPALA-10103: upgrade jquery to 3.5.1 Testing: Manually clicked through most of the web UI pages and interacted with data tables, etc. Change-Id: Icf0445163a6bf15c56de0c6ca10798e09e0a4fcb Reviewed-on: http://gerrit.cloudera.org:8080/16355 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-08-25 21:22:15 +00:00
Gabor Kaszab	41065845e9	IMPALA-9962: Implement ds_kll_quantiles_as_string() function This function is very similar to ds_kll_quantile(), but it can receive any number of rank parameters and returns a comma-separated string that holds the results for all of the given ranks. For more details about ds_kll_quantile(), see IMPALA-9959. Note, ds_kll_quantiles() should return an Array of floats as the result, but for that we have to wait for complex type support. Until then, we provide ds_kll_quantiles_as_string(), which can be deprecated once we have array support. The tracking Jira for returning complex types from functions is IMPALA-9520. Change-Id: I76f6039977f4e14ded89a3ee4bc4e6ff855f5e7f Reviewed-on: http://gerrit.cloudera.org:8080/16324 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-08-25 18:06:22 +00:00
stiga-huang	e0a6e942b2	IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator The minimum requirement for a spillable operator is ((min_buffers - 2) * default_buffer_size) + 2 * max_row_size. In the min reservation, we only reserve space for two large pages, one for reading and the other for writing. However, to make the non-streaming GroupingAggregator work correctly, we have to manage these extra reservations carefully, so that it doesn't run out of the min reservation when it actually needs to spill a large page or read a large page. Specifically, how the large write page reservation is managed depends on whether needs_serialize is true or false: - If the aggregator needs to serialize the intermediate results when spilling a partition, we have to save a large page's worth of reservation for the serialize stream, in case it needs to write large rows. This space can be restored when all the partitions are spilled, since the serialize stream is not needed until we build/repartition a spilled partition and thus have pinned partitions again. If the large write page reservation is used, we save it back whenever possible after we spill or close a partition. - If the aggregator doesn't need the serialize stream at all, we can restore the large write page reservation whenever we fail to add a large row, before spilling any partitions, and reclaim it whenever possible after we spill or close a partition. A special case is when we are processing a large row that is the last row in building/repartitioning a spilled partition: the large write page reservation can be restored for it regardless of whether we need the serialize stream, because the partitions are read out after this, so no spilling is needed. The large read page reservation is transferred to the spilled BufferedTupleStream that we are reading when building/repartitioning a spilled partition. 
The stream restores some of it when reading a large page, and reclaims it when the output row batch is reset. Note that because the stream is read in attach_on_read mode, the large page is attached to the row batch's buffers and only freed when the row batch is reset. Tests: - Add tests in test_spilling_large_rows (test_spilling.py) with different row sizes to reproduce the issue. - One test in test_spilling_no_debug_action became flaky after this patch. Revise the query to make the UDF allocate larger strings so it passes consistently. - Run CORE tests. Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Reviewed-on: http://gerrit.cloudera.org:8080/16240 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2020-08-25 16:11:20 +00:00
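The minimum-reservation formula at the start of the IMPALA-9955/IMPALA-9957 message is easy to check with a small calculation. The Python helper below is hypothetical. The buffer sizes in the usage line are illustrative values, not Impala defaults. The two max_row_size terms correspond to the one large read page and one large write page that the message describes.

```python
MiB = 1024 * 1024


def min_spillable_reservation(min_buffers: int, default_buffer_size: int,
                              max_row_size: int) -> int:
    """((min_buffers - 2) * default_buffer_size) + 2 * max_row_size.

    Two of the minimum buffers are budgeted as large pages of max_row_size:
    one for reading and one for writing.
    """
    if min_buffers < 2:
        raise ValueError("min_buffers must be at least 2")
    return (min_buffers - 2) * default_buffer_size + 2 * max_row_size
```

For example, `min_spillable_reservation(4, 2 * MiB, 8 * MiB)` is 2 * 2 MiB + 2 * 8 MiB = 20 MiB. Only those two large-page slots cover large rows, which is why the aggregator must hand them back and forth carefully.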
Qifan Chen	2ebf554dfd	IMPALA-7779 Parquet Scanner can write binary data into profile This fix addresses a limitation in which an ill-formatted Parquet version string was not properly formatted before appearing in an error message or impalad.INFO. With the fix, any such string is first converted to a hex string. The hex string is a sequence of space-separated groups, each of one or two hex digits, such as "6c 65 2e a". Testing: Ran "core" tests successfully. Change-Id: I281d6fa7cb2f88f04588110943e3e768678b9cf1 Reviewed-on: http://gerrit.cloudera.org:8080/16331 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Sahil Takiar <stakiar@cloudera.com>	2020-08-25 15:42:01 +00:00
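The hex format described in IMPALA-7779 (one lowercase group per byte, no zero padding, space-separated) can be reproduced in Python. The function name is hypothetical. The commit message does not show Impala's C++ code, so this only matches the documented output shape.

```python
def to_hex_groups(data: bytes) -> str:
    """Render bytes as space-separated lowercase hex without zero padding."""
    return " ".join(format(b, "x") for b in data)
```

For example, `to_hex_groups(b"le.\n")` returns `"6c 65 2e a"`, the example string from the commit message. The final newline byte (0x0a) prints as the single digit `a`.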
zhaorenhai	6390e7e1da	IMPALA-9544 Replace Intel's SSE instructions with ARM's NEON instructions Replace Intel's SSE instructions with ARM's NEON instructions Replace Intel's crc32 instructions with ARM's instructions Replace Intel's popcntq instruction with ARM's mechanism Replace Intel's pcmpestri and pcmpestrm instructions with ARM mechanism Change-Id: Id7dfe17125b2910ece54e7dd18b4e4b25d7de8b9 Reviewed-on: http://gerrit.cloudera.org:8080/15531 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2020-08-24 16:49:15 +00:00
Shant Hovsepian	3f1b1476af	IMPALA-10034: Add remaining TPC-DS queries to workload. Include the remaining TPC-DS queries in the testdata workload definition. Q8 and Q38 were using non-standard variants; those have been replaced by the official query versions. Q35 uses an official variant. Had to escape a table alias in Q90, as we treat 'AT' as a reserved keyword. Change-Id: Id5436689390f149694f14e6da1df624de4f5f7ad Reviewed-on: http://gerrit.cloudera.org:8080/16280 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2020-08-24 16:02:45 +00:00
Tim Armstrong	d65cb05bb8	IMPALA-7714: try to avoid be test crash in statestore We didn't get to a clear root cause for this, so I'm going to try two things. First, under the theory that the problem is somehow the destruction of the strings, convert them to char*, which does not require destruction on process teardown. Second, add some logging if the map lookup fails so we can better understand what may have happened. Change-Id: Id4363a93addb8a808d292906cac44ebd25c16889 Reviewed-on: http://gerrit.cloudera.org:8080/16341 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-08-19 22:28:35 +00:00
Vihang Karajgaonkar	cd52932125	IMPALA-4364: Query option to refresh updated HMS partitions This patch introduces a new boolean query option, REFRESH_UPDATED_HMS_PARTITIONS. When this query option is set, the refresh table command reloads the partitions that have been modified in HMS, in addition to adding new partitions and removing partitions that were removed. To do this, the refresh table command needs to fetch all the partitions instead of just the partition names, which can degrade the performance of refresh table when the query option is set. However, for certain use cases there is currently no other way to detect changed partitions using the refresh table command. For instance, if certain partition locations have been changed, a refresh table will not update those partitions. Testing: 1. Added a new test which sets the query option and makes sure that the partitions updated from Hive are reloaded after the refresh table command. 2. Ran exhaustive tests with the patch. Change-Id: I50e8680509f4eb0712e7bb3de44df5f2952179af Reviewed-on: http://gerrit.cloudera.org:8080/16308 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-08-15 02:01:05 +00:00
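The effect of REFRESH_UPDATED_HMS_PARTITIONS can be modeled with two dictionaries that map partition names to descriptors: one for the catalog cache and one for HMS. This Python sketch is hypothetical. The names and plain-string descriptors are illustrative, and it models only partition-level metadata, not file metadata reloading. It shows that adding and removing partitions needs only names, while detecting modified partitions needs the full partition objects.

```python
from typing import Dict, List, Tuple


def plan_refresh(cached: Dict[str, str], in_hms: Dict[str, str],
                 refresh_updated_hms_partitions: bool
                 ) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, reloaded) partition names for a refresh.

    Values stand in for partition metadata such as the location.
    """
    added = sorted(in_hms.keys() - cached.keys())
    removed = sorted(cached.keys() - in_hms.keys())
    reloaded: List[str] = []
    if refresh_updated_hms_partitions:
        # Only this branch needs the full partition objects, not just names.
        reloaded = sorted(name for name in cached.keys() & in_hms.keys()
                          if cached[name] != in_hms[name])
    return added, removed, reloaded
```

With the option off, a partition whose location changed in HMS (p=2 in the test) is neither added, removed, nor reloaded, which is the gap the commit describes.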
wzhou-code	e0681615c2	IMPALA-10039 (part 2): Fixed Expr-test crash due to race condition The root cause of the crash is that QueryState::Cancel() was called before the thread-unsafe function QueryState::Init() had completed. This patch fixes the race condition between QueryState::Cancel() and QueryState::Init(). QueryState::Init() is safe to be called at any time. Testing: - The issue could be reproduced by running expr-test for 10-20 iterations. Verified the fix by running expr-test for over 1000 iterations without a crash. - Passed TestProcessFailures::test_kill_coordinator. - Passed core tests. Change-Id: Ib0d3b9c59924a25b70fa20afeb6e8ca93016eca9 Reviewed-on: http://gerrit.cloudera.org:8080/16313 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>	2020-08-14 22:32:55 +00:00


9401 Commits