Commit Graph

9401 Commits

Author SHA1 Message Date
Fang-Yu Rao
efc627d050 IMPALA-10158: Set timezone to UTC for Iceberg-related E2E tests
We found that test_iceberg_query and test_iceberg_profile fail after the
patch for IMPALA-9741 was merged, because Impala's default timezone is not
UTC. This patch fixes the issue by adding "SET TIMEZONE=UTC;" before those
test queries are run.
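
As an illustration of the pattern, a minimal Python E2E sketch (the test
body and test-file name are hypothetical, not the exact change in this
patch):

    def test_iceberg_query(self, vector):
        # Pin the session timezone so TIMESTAMP values are rendered
        # deterministically, regardless of the host's default timezone.
        self.execute_query("SET TIMEZONE=UTC")
        self.run_test_case('QueryTest/iceberg-query', vector)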

Testing:
 - Verified in a local development environment that test_iceberg_query and
   test_iceberg_profile pass after applying this patch.

Change-Id: Ie985519e8ded04f90465e141488bd2dda78af6c3
Reviewed-on: http://gerrit.cloudera.org:8080/16425
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-09 13:26:42 +00:00
wzhou-code
0c89a9d562 IMPALA-10140: Fix CatalogException for creating a database with sync_ddl as true
IMPALA-7961 handled the case of "create table if not exists" queries with
sync_ddl set to true. Customers reported a similar issue for "create
database if not exists" queries with sync_ddl set to true. This patch
applies the same fix as IMPALA-7961 to CatalogOpExecutor.createDatabase().

Testing:
 - Manual tests
   Since this is a racy bug, I could only reproduce it by forcing
   frequent topicUpdateLog GCs along with a specific sequence of
   actions, like: run some DDLs and REFRESHs to trigger a GC in
   topicUpdateLog, then run query "create database if not exists" with
   sync_ddl as true. Verified that the issue couldn't be reproduced
   after applying this patch.
 - Passed exhaustive tests.

Change-Id: Id623118f8938f416414c45d93404fb70d036a9df
Reviewed-on: http://gerrit.cloudera.org:8080/16421
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-09 06:52:53 +00:00
Qifan Chen
9f51673a40 IMPALA-10129 Data race in MemTracker::GetTopNQueriesAndUpdatePoolStats
This work addresses a data race condition in the admission controller by
initializing two data members (is_query_mem_tracker_ and query_id_) in a
constructor of the MemTracker class. Without this, the two data members
are set, without lock protection, after the object is constructed, which
allows other threads to modify either of them at the same time.

Testing:
1. Ran the Python admission controller test successfully with a TSAN
   build. The data race was not observed with the fix, and was observed
   without it.
2. Ran the core test.

Change-Id: I9c4ffe8064d3e099a525cc48c218ef73112fb67b
Reviewed-on: http://gerrit.cloudera.org:8080/16408
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-09 04:23:17 +00:00
huangtianhua
6aaea3216c IMPALA-10090 Pull the newest native-toolchain code before building it
If native-toolchain already exists, we should pull the newest code
before building it.

Change-Id: I2da3ffce7abb88190be0a5ea0e2cf603f98ee15e
Reviewed-on: http://gerrit.cloudera.org:8080/16402
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-08 23:35:41 +00:00
Bikramjeet Vig
fc51cd3bc0 IMPALA-10052: Expose daemon health endpoint for statestore and catalog
This change exposes the daemon health of statestored and catalogd via
an HTTP endpoint '/healthz'. If the server is healthy, this endpoint
will return HTTP code 200 (OK). If it is unhealthy, it will return
503 (Service Unavailable). This is consistent with the endpoint added
for impalads in IMPALA-8895.
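
A minimal sketch of how a liveness probe could use the endpoint (Python,
illustrative; 25010 and 25020 are the default statestored and catalogd
webserver ports):

    import urllib.request
    from urllib.error import HTTPError, URLError

    def is_healthy(host, port):
        # /healthz returns 200 (OK) when healthy and 503 (Service Unavailable)
        # when not; treat any HTTP error or connection failure as unhealthy.
        try:
            resp = urllib.request.urlopen(
                "http://%s:%d/healthz" % (host, port), timeout=5)
            return resp.status == 200
        except (HTTPError, URLError):
            return False

    print(is_healthy("localhost", 25010))  # statestored
    print(is_healthy("localhost", 25020))  # catalogd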

Testing:
- Extended test in test_web_pages.py

Change-Id: I7714734df8e50dabbbebcb77a86a5a00bd13bf7c
Reviewed-on: http://gerrit.cloudera.org:8080/16295
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-08 22:11:23 +00:00
Qifan Chen
1e63722f8d IMPALA-10124 admission-controller-test fails with no such file or
directory error

This work addresses a failure by disabling undefined behavior sanitizer
testing for the AdmissionControllerTest.TopNQueryCheck test. In the test,
std::regex_match() is used to verify the appearance of certain strings and
can produce a core with a very long stack trace failing in
std::vector::operator[]().

Testing:
1. Ran the test both in regular mode and with the undefined behavior
   sanitizer check disabled. No core was seen.

Change-Id: I16d6cff8fad8d0e93a24ec3fefa9cc1f8c471aad
Reviewed-on: http://gerrit.cloudera.org:8080/16404
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-08 21:33:38 +00:00
skyyws
fb6d96e001 IMPALA-9741: Support querying Iceberg table by impala
This patch adds support for querying Iceberg tables through Impala. We can
use the following SQL to create an external Iceberg table:
    CREATE EXTERNAL TABLE default.iceberg_test (
        level string,
        event_time timestamp,
        message string
    )
    STORED AS ICEBERG
    LOCATION 'hdfs://xxx'
    TBLPROPERTIES ('iceberg_file_format'='parquet');
Or with just the table name and location, like this:
    CREATE EXTERNAL TABLE default.iceberg_test
    STORED AS ICEBERG
    LOCATION 'hdfs://xxx'
    TBLPROPERTIES ('iceberg_file_format'='parquet');
'iceberg_file_format' is the file format of the Iceberg table; currently
only PARQUET is supported, and other formats will be supported in the
future. If you don't specify this property, the default file format is
PARQUET.

We achieve this by treating the Iceberg table as a normal unpartitioned
HDFS table. When querying an Iceberg table, we push down partition column
predicates to Iceberg to decide which data files need to be scanned, and
then pass this information to the BE to do the real scan operation.

Testing:
- Unit test for Iceberg in FileMetadataLoaderTest
- Create table tests in functional_schema_template.sql
- Iceberg table query test in test_scanners.py

Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Reviewed-on: http://gerrit.cloudera.org:8080/16143
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-06 02:12:07 +00:00
Adam Tamas
fe6e625747 IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix
While ds_hll_sketch() generates a string value as output, the data is not
ASCII-encoded text but a binary sketch; because of this, the shell
disconnects when it tries to decode the fetched data.

The issue can be reproduced with a simple query, such as calling unhex
with an input that is not valid UTF-8.
Example: SELECT unhex("aa");

This patch contains a solution where we replace any characters that cannot
be decoded as UTF-8 if we run into a UnicodeDecodeError after fetching the
data.
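
A minimal sketch of the shell-side handling (Python, illustrative only;
the function name is hypothetical, not the actual impala-shell code):

    def to_printable(val):
        # Binary sketches are not valid UTF-8; replace the undecodable bytes
        # instead of letting the shell die on the decode error.
        try:
            return val.decode('utf-8')
        except UnicodeDecodeError:
            return val.decode('utf-8', errors='replace')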

This solution works with the Thrift 0.9.3 autogenerated gen-py code but
still fails with Thrift 0.11.0.

For Thrift 0.11.0 the error is caught and an error message is shown (this
does not work with the beeswax protocol, because it generates a different
error (TypeError) which can also occur for other reasons).

Testing:
- Manual testing with these protocols: 'hs2-http', 'hs2', 'beeswax'

Change-Id: I0c5f1290356e21aed8ca7f896f953541942aed05
Reviewed-on: http://gerrit.cloudera.org:8080/16418
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
2020-09-05 09:42:46 +00:00
Csaba Ringhofer
b7965d8240 Revert "IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix"
This reverts commit 75146c9138.

Change-Id: I57f790389a8c847877999d2b9b8185939b416c07
Reviewed-on: http://gerrit.cloudera.org:8080/16417
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>
2020-09-04 12:28:56 +00:00
Adam Tamas
75146c9138 IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix
While ds_hll_sketch() generates a string value as output, the data is not
ASCII-encoded text but a binary sketch; because of this, the shell
disconnects when it tries to decode the fetched data.

The issue can be reproduced with a simple query, such as calling unhex
with an input that is not valid UTF-8.
Example: SELECT unhex("aa");

This patch contains a solution where we replace any characters that cannot
be decoded as UTF-8 if we run into a UnicodeDecodeError after fetching the
data.

This solution works with the Thrift 0.9.3 autogenerated gen-py code but
still fails with Thrift 0.11.0.

For Thrift 0.11.0 the error is caught and an error message is shown (this
does not work with the beeswax protocol, because it generates a different
error (TypeError) which can also occur for other reasons).

Testing:
- Manual testing with these protocols: 'hs2-http', 'hs2', 'beeswax'

Change-Id: Ic5cfb907871ca83e5f04a39ca9d7a8e138d711a8
Reviewed-on: http://gerrit.cloudera.org:8080/16305
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>
2020-09-04 12:18:28 +00:00
Daniel Becker
dc08b657e8 IMPALA-7658: Proper codegen for HiveUdfCall
Implement codegen for HiveUdfCall.

Testing:
Verified that the Java UDF tests pass locally.

Benchmarks:
Used a UDF from TestUdf.java that adds three integers:

  create function tpch15_parquet.sum3(int, int, int) returns int
  location '/test-warehouse/impala-hive-udfs.jar'
  symbol='org.apache.impala.TestUdf';

Used the following query on the master branch and the change's branch:

  set num_nodes=1; set mt_dop=1;
  select min(tpch15_parquet.sum3(cast(l_orderkey as int),
    cast(l_partkey as int), cast(l_suppkey as int)))
  from tpch15_parquet.lineitem;

Results averaged over 100 runs after warmup:
Master: 20.6346s, stddev: 0.3132411856765332
This change: 19.0256s, stddev: 0.42039019873436

This is a ~7.8% improvement.

Change-Id: I2f994dac550f297ed3c88491816403f237d4d747
Reviewed-on: http://gerrit.cloudera.org:8080/16314
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-04 00:55:02 +00:00
Tamas Mate
2359a1be9d IMPALA-10119: Fix impala-shell history duplication test
The flaky test was
TestImpalaShellInteractive.test_history_does_not_duplicate_on_interrupt

The test failed with a timeout error when the interrupt signal arrived
after the next test query had already started. The impala-shell output was
^C instead of the expected query result.

This change adds an additional blocking expect call to wait for the
interrupt signal to arrive before sending in the next query.
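
A rough sketch of the idea using pexpect (hypothetical snippet; the prompt
pattern and queries are illustrative, not the actual test code):

    import pexpect

    shell = pexpect.spawn('impala-shell.sh')
    shell.expect('default>')          # wait for the shell prompt
    shell.sendline('select sleep(100000);')
    shell.sendintr()                  # interrupt the running query
    shell.expect(r'\^C')              # block until the interrupt is handled
    shell.sendline('select 1;')       # only now send the next query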

Change-Id: I242eb47cc8093c4566de206f46b75b3feab1183c
Reviewed-on: http://gerrit.cloudera.org:8080/16391
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2020-09-03 18:25:58 +00:00
Adam Tamas
99e5f5a885 IMPALA-10133: Implement ds_hll_stringify function
This function receives a string that is a serialized Apache DataSketches
HLL sketch and returns its stringified format.

The stringified format looks like the following and contains this data:

select ds_hll_stringify(ds_hll_sketch(float_col)) from
functional_parquet.alltypestiny;
+--------------------------------------------+
| ds_hll_stringify(ds_hll_sketch(float_col)) |
+--------------------------------------------+
| ### HLL sketch summary:                    |
|   Log Config K   : 12                      |
|   Hll Target     : HLL_4                   |
|   Current Mode   : LIST                    |
|   LB             : 2                       |
|   Estimate       : 2                       |
|   UB             : 2.0001                  |
|   OutOfOrder flag: false                   |
|   Coupon count   : 2                       |
| ### End HLL sketch summary                 |
|                                            |
+--------------------------------------------+

Change-Id: I85dbf20b5114dd75c300eef0accabe90eac240a0
Reviewed-on: http://gerrit.cloudera.org:8080/16382
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-03 12:11:48 +00:00
Aman Sinha
5e9f10d34c IMPALA-10064: Support constant propagation for eligible range predicates
This patch adds support for constant propagation of range predicates
involving date and timestamp constants. Previously, only equality
predicates were considered for propagation. The new type of propagation
is shown by the following example:

Before constant propagation:
 WHERE date_col = CAST(timestamp_col as DATE)
  AND timestamp_col BETWEEN '2019-01-01' AND '2020-01-01'
After constant propagation:
 WHERE date_col >= '2019-01-01' AND date_col <= '2020-01-01'
  AND timestamp_col >= '2019-01-01' AND timestamp_col <= '2020-01-01'
  AND date_col = CAST(timestamp_col as DATE)

As a consequence, since Impala supports table partitioning by date
columns but not timestamp columns, the above propagation enables
partition pruning based on timestamp ranges.

Existing code for equality based constant propagation was refactored
and consolidated into a new class which handles both equality and
range based constant propagation. Range based propagation is only
applied to date and timestamp columns.

Testing:
 - Added new range constant propagation tests to PlannerTest.
 - Added e2e test for range constant propagation based on a newly
   added date partitioned table.
 - Ran precommit tests.

Change-Id: I811a1f8d605c27c7704d7fc759a91510c6db3c2b
Reviewed-on: http://gerrit.cloudera.org:8080/16346
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-02 22:57:55 +00:00
Adam Tamas
f9936549dc IMPALA-10106: Upgrade DataSketches to version 2.1.0
Upgrade the external DataSketches files for HLL/KLL to version 2.1.0

Tests:
- Ran the tests from tests/query_test/test_datasketches.py

Change-Id: I4faa31c0b628a62c7e56a6c4b9549d0aaa8a02ff
Reviewed-on: http://gerrit.cloudera.org:8080/16360
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-02 22:06:50 +00:00
Zoltan Borok-Nagy
502e1134be IMPALA-10071: Impala shouldn't create filename starting with underscore during ACID TRUNCATE
When Impala TRUNCATEs an ACID table, it creates a new base directory
with the hidden file "_empty" in it. Newer Hive versions ignore files
starting with underscore, therefore they ignore the whole base
directory.

To resolve this issue we can simply rename the empty file to "empty".

Testing:
 * update acid-truncate.test accordingly

Change-Id: Ia0557b9944624bc123c540752bbe3877312a7ac9
Reviewed-on: http://gerrit.cloudera.org:8080/16396
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-02 13:29:25 +00:00
Adam Tamas
4cb3c3556e IMPALA-10108: Implement ds_kll_stringify function
This function receives a string that is a serialized Apache DataSketches
KLL sketch and returns its stringified format.

The stringified format looks like the following and contains this data:

select ds_kll_stringify(ds_kll_sketch(float_col))
from functional_parquet.alltypestiny;
+--------------------------------------------+
| ds_kll_stringify(ds_kll_sketch(float_col)) |
+--------------------------------------------+
| ### KLL sketch summary:                    |
|    K              : 200                    |
|    min K          : 200                    |
|    M              : 8                      |
|    N              : 8                      |
|    Epsilon        : 1.33%                  |
|    Epsilon PMF    : 1.65%                  |
|    Empty          : false                  |
|    Estimation mode: false                  |
|    Levels         : 1                      |
|    Sorted         : false                  |
|    Capacity items : 200                    |
|    Retained items : 8                      |
|    Storage bytes  : 64                     |
|    Min value      : 0                      |
|    Max value      : 1.1                    |
| ### End sketch summary                     |
|                                            |
+--------------------------------------------+

Change-Id: I97f654a4838bf91e3e0bed6a00d78b2c7aa96f75
Reviewed-on: http://gerrit.cloudera.org:8080/16370
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-02 10:49:10 +00:00
zhaorenhai
0098113d95 IMPALA-10090: Create aarch64 development environment on ubuntu 18.04
Includes the following changes:
1. Build native-toolchain locally by script on the aarch64 platform.
2. Change some native-toolchain lib version numbers.
3. Split SKIP_TOOLCHAIN_BOOTSTRAP and DOWNLOAD_CDH_COMPONENTS into two
   separate flags, because on aarch64 we only need to download the CDP
   components, not the toolchain.
4. Download the Hadoop aarch64 native libs; building Impala needs them.

With this commit, on Ubuntu 18.04 aarch64, you just need to run
bin/bootstrap_development.sh, just like on x86.

Change-Id: I769668c834ab0dd504a822ed9153186778275d59
Reviewed-on: http://gerrit.cloudera.org:8080/16065
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-02 06:47:30 +00:00
Vihang Karajgaonkar
28b1542db0 IMPALA-10094: Skip test_refresh_updated_partitions on S3
The test test_refresh_updated_partitions runs some commands using Hive,
which causes it to fail on S3-specific jobs since we don't run HiveServer2
in those environments. This patch skips the test on non-HDFS environments.
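
Conceptually the change boils down to a skip marker like the following
(pytest-style sketch with a hypothetical flag; the actual patch uses the
Impala test suite's own skip decorators):

    import pytest
    from tests.common.environ import IS_HDFS  # hypothetical flag name

    @pytest.mark.skipif(not IS_HDFS,
                        reason="Hive commands need HiveServer2, which is not "
                               "running in non-HDFS (e.g. S3) environments")
    def test_refresh_updated_partitions(self, vector, unique_database):
        ...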

Change-Id: I0d27dd76e772e396a07419a58821ba899ac74188
Reviewed-on: http://gerrit.cloudera.org:8080/16399
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-02 04:31:52 +00:00
Joe McDonnell
106dea63ba IMPALA-10121: Generate JUnitXML for TSAN messages
This adds logic in bin/jenkins/finalize.sh to check the ERROR
log for TSAN messages (i.e. WARNING: ThreadSanitizer: ...)
and generate a JUnitXML with the message. This happens when
TSAN aborts Impala.
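
A Python sketch of the kind of check being added (illustrative; the real
logic is shell code in bin/jenkins/finalize.sh):

    import glob

    def find_tsan_errors(log_glob="logs/**/*ERROR*"):
        # Collect ThreadSanitizer reports from the ERROR logs; a non-empty
        # result would be turned into a JUnitXML failure entry.
        hits = []
        for path in glob.glob(log_glob, recursive=True):
            with open(path, errors="replace") as f:
                hits += [line.strip() for line in f
                         if "WARNING: ThreadSanitizer:" in line]
        return hits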

Testing:
 - Ran TSAN build (which is currently failing)

Change-Id: I44ea33a78482499decae0ec4c7c44513094b2f44
Reviewed-on: http://gerrit.cloudera.org:8080/16397
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-01 23:30:55 +00:00
Zoltan Borok-Nagy
329bb41294 IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
Currently Impala checks the file metadata key 'hive.acid.version' to decide
whether a file has the full ACID schema. There are cases when Hive forgets
to set this value for full ACID files, e.g. query-based compactions.

So it's more robust to check the schema elements instead of the metadata
field. Also, sometimes Hive writes the schema with different character
cases, e.g. originalTransaction vs originaltransaction, so we should rather
compare the column names in a case-insensitive way.
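
A sketch of the intended check (Python, illustrative; the actual check is
in Impala's metadata handling, and the field set is the usual full-ACID
wrapper columns):

    # Full ACID files wrap user columns in a fixed set of fields; compare the
    # names case-insensitively, since Hive may write e.g. 'originaltransaction'.
    FULL_ACID_FIELDS = {"operation", "originaltransaction", "bucket",
                        "rowid", "currenttransaction", "row"}

    def has_full_acid_schema(top_level_field_names):
        names = {name.lower() for name in top_level_field_names}
        return FULL_ACID_FIELDS.issubset(names)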

Testing:
* added test for full ACID compaction
* added test_full_acid_schema_without_file_metadata_tag to test full
  ACID file without metadata 'hive.acid.version'

Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14
Reviewed-on: http://gerrit.cloudera.org:8080/16383
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-01 22:27:27 +00:00
abeltian
69d0d0af47 IMPALA-10087: IMPALA-6050 causes alluxio not to be supported
This change adds file system type support for Alluxio.
Alluxio URIs have a different prefix,
such as: alluxio://zk@zk-1:2181,zk-2:2181,zk-3:2181/path/

Testing:
Added a unit test for the Alluxio file system type check.

Change-Id: Id92ec9cb0ee241a039fe4a96e1bc2ab3eaaf8f77
Reviewed-on: http://gerrit.cloudera.org:8080/16379
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-01 09:43:03 +00:00
Shant Hovsepian
f4273a40fe IMPALA-7310: Partial fix for NDV cardinality with NULLs.
This fix just handles the case where a column's cardinality is zero but
the column is nullable and we have null stats indicating there are null
values; in that case we adjust the cardinality from 0 to 1.

The cardinality of zero was especially problematic when calculating
cardinalities for multiple predicates with multiplication. The 0 would
propagate up the plan tree and result in poor plan choices such as
always using broadcast joins where shuffle would've been more optimal.
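
The adjustment itself is tiny; a sketch of the rule (Python, illustrative;
the real change is in the frontend planner code):

    def adjusted_ndv(ndv, is_nullable, num_nulls):
        # An NDV of 0 zeroes out multiplicative cardinality estimates; if the
        # column is nullable and stats show NULLs, treat it as 1 distinct value.
        if ndv == 0 and is_nullable and num_nulls > 0:
            return 1
        return ndv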

Testing:
  * 26 Node TPC-DS 30TB run had better plans for Q4 and Q11
    - Q4 172s -> 80s
    - Q11 103s -> 77s
  * CardinalityTest
  * TpcdsPlannerTest

Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032
Reviewed-on: http://gerrit.cloudera.org:8080/16349
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-01 08:54:42 +00:00
Sahil Takiar
f85dbff976 IMPALA-10030: Remove unnecessary jar dependencies
Remove the dependency on hadoop-hdfs, this jar file contains the core
code for implementing HDFS, and thus pulls in a bunch of unnecessary
transitive dependencies. Impala currently only requires this jar for
some configuration key names. Most of these configuration key names have
been moved to the appropriate HDFS client jars, and some others are
deprecated altogether. Removing this jar required making a few code
changes to move the location of the referenced configuration keys.

Removes all transitive Kafka dependencies from the Apache Ranger
dependency. Previously, Impala only excluded Kafka jars with binary
version kafka_2.11, however, it seems the Ranger recently upgraded the
dependency version to kafka_2.12. Now all Kafka dependencies are
excluded, regardless of artifact name.

Removes all transitive dependencies from the Apache Ozone dependency.
Impala has a dependency on the Ozone client shaded-jar, which already
includes all required transitive dependencies. For some reason, Ozone
still pulls in some transitive dependencies even though they are not
needed.

Made some other minor cleanup / improvements in the fe/pom.xml file.

This saves about 70 MB of space in the Docker images.

Testing:
* Ran exhaustive tests
* Ran on-prem cluster E2E tests

Change-Id: Iadbb6142466f73f067dd7cf9d401ff81145c74cc
Reviewed-on: http://gerrit.cloudera.org:8080/16311
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-01 01:15:30 +00:00
Fang-Yu Rao
1cdae465b7 IMPALA-10118: Update shaded-deps/hive-exec/pom.xml for GenericHiveLexer
In HIVE-19064 the class GenericHiveLexer was introduced as an intermediate
class between HiveLexer and Lexer. In order for ToSqlUtils.java to compile
once we bump up CDP_BUILD_NUMBER to a build that includes this change on
the Hive side, this patch updates shaded-deps/hive-exec/pom.xml to include
GenericHiveLexer so that Impala can be built successfully.

Testing:
 - Verified that Impala compiles in a local development environment after
   applying this patch.

Change-Id: I27db1cb8de36dd86bae08b7177ae3f1c156d73bc
Reviewed-on: http://gerrit.cloudera.org:8080/16390
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-01 00:09:48 +00:00
Shant Hovsepian
827070b473 IMPALA-10099: Push down DISTINCT in Set operations
INTERSECT/EXCEPT are not duplicate-preserving operations. The distinct
aggregations can happen in each operand, in the leftmost operand only, or
after all the operands in a separate aggregation step. Except for a couple
of special cases, we would use the last strategy most often.

This change pushes the distinct aggregation down to the leftmost operand
in cases where there are no analytic functions, or when a distinct or
grouping operation already eliminates duplicates.

In general DISTINCT placement such as in this case should be done
throughout the entire plan tree in a cost based manner as described in
IMPALA-5260

Testing:
 * TpcdsPlannerTest
 * PlannerTest
 * TPC-DS 30TB Perf run for any affected queries
   - Q14-1 180s -> 150s
   - Q14-2 109s -> 90s
   - Q8 no significant change
 * SetOperation Planner Tests
 * Analyzer tests
 * Tpcds Functional Workload

Change-Id: Ia248f1595df2ab48fbe70c778c7c32bde5c518a5
Reviewed-on: http://gerrit.cloudera.org:8080/16350
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2020-08-31 18:34:07 +00:00
stiga-huang
578933fe74 IMPALA-10065: Fix DCHECK when retrying a query in FINISHED state
A query will come into the FINISHED state when some rows are available,
even when some fragment instances are still executing. When a retryable
query comes into the FINISHED state and the client hasn't fetched any
results, we are still able to retry it for any retryable failures. This
patch fixes a DCHECK when retrying a FINISHED state query.

Tests:
 - Add a test in test_query_retries.py for retrying a query in FINISHED
   state.

Change-Id: I11d82bf80640760a47325833463def8a3791bdda
Reviewed-on: http://gerrit.cloudera.org:8080/16351
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-08-31 13:28:29 +00:00
Tim Armstrong
ea75e68f9e IMPALA-10110: bloom filter target fpp query option
This adds a BLOOM_FILTER_ERROR_RATE option that takes a
value between 0 and 1 (exclusive) that can override
the default target false positive probability (fpp)
value of 0.75 for selecting the filter size.

It does not affect whether filters are disabled
at runtime.

Adds estimated FPP and bloom size to the routing
table so we have some observability. Here is an
example:

tpch_kudu> select count(*) from customer join nation on n_nationkey = c_nationkey;

 ID  Src. Node  Tgt. Node(s)  Target type  Partition filter  Pending (Expected)  First arrived  Completed  Enabled  Bloom Size    Est fpp
-----------------------------------------------------------------------------------------------------------------------------------------
  1          2             0        LOCAL             false               0 (3)            N/A        N/A     true     MIN_MAX
  0          2             0        LOCAL             false               0 (3)            N/A        N/A     true     1.00 MB   1.04e-37

Testing:
Added a test that shows the query option affecting filter size.

Ran core tests.

Change-Id: Ifb123a0ea1e0e95d95df9837c1f0222fd60361f3
Reviewed-on: http://gerrit.cloudera.org:8080/16377
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-08-29 05:48:37 +00:00
Sahil Takiar
5daff34724 IMPALA-10073: Create shaded dependency for S3A and aws-java-sdk-bundle
The aws-java-sdk-bundle is one of the largest dependencies in the Impala
Docker images and continues to grow. The jar includes SDKs for
every single AWS service.

This patch removes most of the unnecessary SDKs from the
aws-java-sdk-bundle, thus drastically decreasing the size of the
dependency. The Maven shade plugin is used to do this, and the
implementation is similar to what is currently done for the hive-exec
jar.

This patch takes a conservative approach to removing packages from the
aws-java-sdk-bundle jar, and I ensured no direct dependencies of the S3
SDK were removed. The idea is to only remove dependencies that S3A would
never conceivably need. Given the huge number of AWS services, I only
focused on removing the largest SDKs (the size of each SDK is estimated
by the number of classes in the SDK).

This decreases the size of the Docker images by about 100 MB.

Testing:
* Ran core tests against S3

Change-Id: I0939f73be986f83cc1fd07921563b4d9201780f2
Reviewed-on: http://gerrit.cloudera.org:8080/16342
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-08-29 00:58:34 +00:00
Qifan Chen
2ef6184ee1 IMPALA-9989 Improve admission control pool stats logging
This work addresses the current limitation in the admission controller by
appending the last known memory consumption statistics about the set of
queries running or waiting on a host or in a pool to the existing memory
exhaustion message. The statistics are logged in impalad.INFO when a query
is queued, or queued and then timed out, due to memory pressure in the
pool or on the host. The statistics can also be part of the query profile.

The new memory consumption statistics can be either per-host stats or
aggregated pool stats. The per-host stats describe memory consumption for
every pool on a host. The aggregated pool stats describe the aggregated
memory consumption on all hosts for a pool. For each stats type,
information such as the query IDs and memory consumption of up to the top
5 queries is provided, in addition to the min, the max, the average and
the total memory consumption for the query set.

When a query request is queued due to memory exhaustion, the above new
consumption statistics are logged when the BE logging level is set to 2.

When a query request is timed out due to memory exhaustion, the above new
consumption statistics are logged when the BE logging level is set to 1.

Testing:
1. Added a new test TopNQueryCheck in admission-controller-test.cc to
   verify that the topN query memory consumption details are reported
   correctly.
2. Added two new tests in test_admission_controller.py to simulate
   queries being queued and then timed out due to pool or host memory
   pressure.
3. Added a new test TopN in mem-tracker-test.cc to
   verify that the topN query memory consumption details are computed
   correctly from a mem tracker hierarchy.
4. Ran Core tests successfully.

Change-Id: Id995a9d044082c3b8f044e1ec25bb4c64347f781
Reviewed-on: http://gerrit.cloudera.org:8080/16220
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-08-28 20:22:57 +00:00
wzhou-code
3733c4cc2c IMPALA-10050: Fixed DCHECK error for backend in terminal state.
The recent patch for IMPALA-6788 makes the coordinator cancel in-flight
query fragment instances when it receives a failure report from one
backend. It's possible that BackendState::Cancel() is called for one
fragment instance before the first execution status report from its
backend is received and processed by the coordinator. Since the status of
the BackendState is set to Cancelled after Cancel() is called, the
execution of the fragment instance is treated as Done in that case, so the
status report will NOT be processed. Hence the backend receives an OK
response from the coordinator even though it sent a report with an
execution error. This makes the backend hit a DCHECK error if it is in a
terminal state with an error.
This patch fixes the issue by making the coordinator send a CANCELLED
status in the status-report response if the backend status is not OK and
the execution status report is not applied.

Testing:
 - The issue could be reproduced by running test_failpoints for about
   20 iterations. Verified the fix by running test_failpoints for over
   200 iterations without a DCHECK failure.
 - Passed TestProcessFailures::test_kill_coordinator.
 - Passed TestRPCException::test_state_report_error.
 - Passed exhaustive tests.

Change-Id: Iba6a72f98c0f9299c22c58830ec5a643335b966a
Reviewed-on: http://gerrit.cloudera.org:8080/16303
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-08-28 01:39:55 +00:00
Fang-Yu Rao
34668fab87 IMPALA-10092: Do not skip test vectors of Kudu tests in a custom cluster
We found that the following 4 tests do not run even if we remove all the
decorators, such as "@SkipIfKudu.no_hybrid_clock" or
"@SkipIfHive3.kudu_hms_notifications_not_supported", that skip them.
This is because those 3 classes inherit from CustomClusterTestSuite, which
adds a constraint that only allows test vectors with 'file_format' and
'compression_codec' being "text" and "none", respectively, to run.

1. TestKuduOperations::test_local_tz_conversion_ops
2. TestKuduClientTimeout::test_impalad_timeout
3. TestKuduHMSIntegration::test_create_managed_kudu_tables
4. TestKuduHMSIntegration::test_kudu_alter_table

To address this issue, this patch creates a parent class for those 3
classes above and overrides add_custom_cluster_constraints() in the newly
created parent class so that we do not skip test vectors with
'file_format' and 'compression_codec' being "kudu" and "none",
respectively.

In addition, this patch removes a redundant call to
super(CustomClusterTestSuite, cls).add_test_dimensions() in
CustomClusterTestSuite.add_custom_cluster_constraints(), since it is
already called immediately before add_custom_cluster_constraints() in
CustomClusterTestSuite.add_test_dimensions().

Testing:
 - Manually verified that those tests run after removing the skip
   decorators.

Change-Id: I60a4bd4ac5a9026629fb840ab9cc7b5f9948290c
Reviewed-on: http://gerrit.cloudera.org:8080/16348
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-08-28 01:37:16 +00:00
stiga-huang
568b3394b2 IMPALA-10080: Skip loading HDFS cache pools for non-HDFS file systems
During a global INVALIDATE METADATA, we always load HDFS cache pools using
the CachePoolReader. However, this only works for HDFS file systems, not
for other file systems like S3, the local file system, etc. We already
handle this in CatalogServiceCatalog#CatalogServiceCatalog(). This patch
adds a check in CatalogServiceCatalog#reset() to skip loading cache pools
if it's not a true HDFS file system.

Tests
- Ran tests on S3. Verified that the IllegalStateException no longer
  occurs.

Change-Id: Ib243d349177e1b982b313dd6e87ecc2ef4dfc3d8
Reviewed-on: http://gerrit.cloudera.org:8080/16335
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-08-27 10:17:31 +00:00
stiga-huang
61dcc805e5 IMPALA-9225: Query option for retryable queries to spool all results before returning any to the client
If we have returned any results to the client in the original query,
query retry will be skipped to avoid incorrect results. This patch adds a
query option, spool_all_results_for_retries, for retryable queries to
spool all results before returning any to the client. It defaults to true.
If all the query results cannot fit in the allocated result spooling
space, we'll start returning results and thus disable query retry for the
query.

Setting spool_all_results_for_retries to false falls back to the original
behavior: the client can fetch results as soon as any of them are ready.
So we explicitly set it to false in the retried query, since it won't be
retried again. For non-retryable queries, or queries that don't enable
results spooling, the spool_all_results_for_retries option has no effect.
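
A usage sketch from the client's point of view (illustrative; uses the
impyla client, and the table name is hypothetical):

    from impala.dbapi import connect

    cur = connect(host='localhost', port=21050).cursor()
    cur.execute("SET RETRY_FAILED_QUERIES=true")
    cur.execute("SET SPOOL_QUERY_RESULTS=true")
    cur.execute("SET SPOOL_ALL_RESULTS_FOR_RETRIES=true")  # the default
    cur.execute("SELECT count(*) FROM tpch.lineitem")      # hypothetical table
    print(cur.fetchall())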

To implement this, this patch defers the point at which results are ready
to be fetched. By default, the “rows available” event happens when any
results are ready. For a retryable query, when spool_query_results and
spool_all_results_for_retries are both true, the “rows available” event
happens after all results are spooled, or after any error that stops us
from doing so, e.g. the batch queue being full, cancellation or failures.
After waiting for the root fragment instance’s Open() to finish, the
coordinator will wait until the results of BufferedPlanRootSink are ready.
BufferedPlanRootSink sets the results-ready signal in its Send(), Close(),
Cancel(), and FlushFinal() methods.

Tests:
- Add a test to verify that a retryable query will spool all its results
  when results spooling and spool_all_results_for_retries are enabled.
- Add a test to verify that query retry succeeds when a retryable query
  is still spooling its results (spool_all_results_for_retries=true).
- Add a test to verify that the retried query won't spool all results
  even when results spooling and spool_all_results_for_retries are
  enabled in the original query.
- Add a test to verify that the original query can be canceled
  correctly. We need this because the added logics for
  spool_all_results_for_retries are related to the cancellation code
  path.
- Add a test to verify results will be returned when all of them can't
  fit into the result spooling space, and query retry will be skipped.

Change-Id: I462dbfef9ddab9060b30a6937fca9122484a24a5
Reviewed-on: http://gerrit.cloudera.org:8080/16323
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-08-27 04:09:58 +00:00
Shant Hovsepian
0fcf846592 IMPALA-10095: Include query plan tests for all of TPC-DS
Added TpcdsPlannerTest to include each TPC-DS query as a separate plan
test file. Removed the previous tpcds-all test file.

This means that when running only PlannerTest, no TPC-DS plans are
checked; however, as part of a full frontend test run, TpcdsPlannerTest
will be included.

Runs with cardinality and resource checks, as well as using parquet
tables to include predicate pushdowns.

Change-Id: Ibaf40d8b783be1dc7b62ba3269feb034cb8047da
Reviewed-on: http://gerrit.cloudera.org:8080/16345
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2020-08-27 00:11:56 +00:00
Kevin Risden
c4d4a42528 IMPALA-10060: Upgrade Postgres JDBC driver to 42.2.14
Change-Id: I969a1901b484b7fe6a830ab935e2b32674eaa512
Reviewed-on: http://gerrit.cloudera.org:8080/16362
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2020-08-26 17:10:48 +00:00
Vihang Karajgaonkar
21c50f8dbb IMPALA-4364: [Addendum] Compare specific fields in StorageDescriptor
The query option REFRESH_UPDATED_HMS_PARTITIONS was introduced earlier in
IMPALA-4364 to detect changes in the partition objects in HMS when a
refresh table command is issued. Originally, it relied on the
StorageDescriptor#equals() method to determine whether the Partition in
catalogd is the same as the partition in HMS while executing the refresh
statement.
However, using StorageDescriptor#equals() is dependent on HMS
version and may introduce inconsistent behaviors after upgrades.
For example, when we backported the original patch to older
distribution which uses Hive-2, the SkewedInfo field of
StorageDescriptor is not null. This field causes the comparison
logic to fail, since catalogd doesn't store the SkewedInfo
field in the cached StorageDescriptor to optimize memory usage.

This patch modifies the comparison logic to use an explicit implementation
in the HdfsPartition class which compares only the fields that are cached
in the HdfsPartition object.

Testing:
1. Added a new test for the comparison method.
2. Modified existing test for the query option.

Change-Id: I90c797060265f8f508d0b150e15da3d0f9961b9b
Reviewed-on: http://gerrit.cloudera.org:8080/16363
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Vihang Karajgaonkar <vihang@cloudera.com>
2020-08-26 17:08:43 +00:00
Gabor Kaszab
28d94851b1 IMPALA-10020: Implement ds_kll_cdf_as_string() function
This is the support for Cumulative Distribution Function (CDF) from
Apache DataSketches KLL algorithm collection. It receives a serialized
KLL sketch and one or more float values to represent ranges in the
sketched values.
E.g. [1, 5, 10] will mean the following ranges:
(-inf, 1), (-inf, 5), (-inf, 10), (-inf, +inf)
Returns a comma-separated string where each value in the string is a
number in the range [0,1] and shows what fraction of the data is in the
particular range.

Note, ds_kll_cdf() should return an Array of doubles as the result, but
for that we have to wait for complex type support. Until then, we provide
ds_kll_cdf_as_string(), which can be deprecated once we have array
support. The tracking Jira for returning complex types from functions is
IMPALA-9520.

Example:
select ds_kll_cdf_as_string(ds_kll_sketch(float_col), 2, 4, 10)
from alltypes;
+----------------------------------------------------------+
| ds_kll_cdf_as_string(ds_kll_sketch(float_col), 2, 4, 10) |
+----------------------------------------------------------+
| 0.2,0.401644,1,1                                         |
+----------------------------------------------------------+

Change-Id: I77e6afc4556ad05a295b89f6d06c2e4a6bb2cf82
Reviewed-on: http://gerrit.cloudera.org:8080/16359
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-08-26 10:59:49 +00:00
Gabor Kaszab
a8a35edbc4 IMPALA-10019: Implement ds_kll_pmf_as_string() function
This is the support for Probabilistic Mass Function (PMF) from Apache
DataSketches KLL algorithm collection. It receives a serialized KLL
sketch and one or more float values to represent ranges in the
sketched values.
E.g. [1, 5, 10] will mean the following ranges:
(-inf, 1), [1, 5), [5, 10), [10, +inf)
Returns a comma-separated string where each value in the string is a
number in the range [0,1] and shows what fraction of the data is in the
particular range.

Note, ds_kll_pmf() should return an Array of doubles as the result, but
for that we have to wait for complex type support. Until then, we provide
ds_kll_pmf_as_string(), which can be deprecated once we have array
support. The tracking Jira for returning complex types from functions is
IMPALA-9520.

Example:
select ds_kll_pmf_as_string(ds_kll_sketch(float_col), 2, 4, 10)
from alltypes;
+----------------------------------------------------------+
| ds_kll_pmf_as_string(ds_kll_sketch(float_col), 2, 4, 10) |
+----------------------------------------------------------+
| 0.202192,0.199452,0.598356,0                             |
+----------------------------------------------------------+

Change-Id: I222402f2dce2f49ab2b3f6e81a709da5539293ba
Reviewed-on: http://gerrit.cloudera.org:8080/16336
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-08-26 01:50:16 +00:00
Tim Armstrong
e133d1838a IMPALA-7782: fix constant NOT IN subqueries that can return 0 rows
The bug was that the statement rewriter converted NOT IN <subquery>
predicates to != <subquery> predicates when the subquery could be an
empty set. This was invalid, because NOT IN (<empty set>) is true, but
!= (<empty set>) is false.

Testing:
Added targeted planner and end-to-end tests.

Ran exhaustive tests.

Change-Id: I66c726f0f66ce2f609e6ba44057191f5929a67fc
Reviewed-on: http://gerrit.cloudera.org:8080/16338
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-08-25 23:08:40 +00:00
Eugene Zimichev
adf2c464ae IMPALA-8547: get_json_object fails to get value for numeric key
Allows numeric keys for JSON objects in get_json_object. This patch
makes Impala consistent with Hive and Postgres behavior for
get_json_object.

Queries such as "select get_json_object('{"1": 5}', '$.1');"
would fail before this patch. Now the query will return '5'.

Testing:
* Added tests to expr-test

Change-Id: I7df037ccf2c79da0ba86a46df1dd28ab0e9a45f4
Reviewed-on: http://gerrit.cloudera.org:8080/14905
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-08-25 22:29:45 +00:00
Tim Armstrong
b46ea7664c IMPALA-10103: upgrade jquery to 3.5.1
Testing:
Manually clicked through most of the web UI pages
and interacted with data tables, etc.

Change-Id: Icf0445163a6bf15c56de0c6ca10798e09e0a4fcb
Reviewed-on: http://gerrit.cloudera.org:8080/16355
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-08-25 21:22:15 +00:00
Gabor Kaszab
41065845e9 IMPALA-9962: Implement ds_kll_quantiles_as_string() function
This function is very similar to ds_kll_quantile() but this one can
receive any number of rank parameters and returns a comma separated
string that holds the results for all of the given ranks.
For more details about ds_kll_quantile() see IMPALA-9959.

Note, ds_kll_quantiles() should return an Array of floats as the result,
but for that we have to wait for complex type support. Until then, we
provide ds_kll_quantiles_as_string(), which can be deprecated once we have
array support. The tracking Jira for returning complex types from
functions is IMPALA-9520.

Change-Id: I76f6039977f4e14ded89a3ee4bc4e6ff855f5e7f
Reviewed-on: http://gerrit.cloudera.org:8080/16324
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-08-25 18:06:22 +00:00
stiga-huang
e0a6e942b2 IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
The minimum reservation requirement for a spillable operator is
((min_buffers - 2) * default_buffer_size) + 2 * max_row_size. In the min
reservation, we only reserve space for two large pages: one for reading,
the other for writing. However, to make the non-streaming
GroupingAggregator work correctly, we have to manage these extra
reservations carefully, so it won't run out of the min reservation when it
actually needs to spill a large page, or when it actually needs to read a
large page.

To be specific, for how to manage the large write page reservation,
depending on whether needs_serialize is true or false:
- If the aggregator needs to serialize the intermediate results when
  spilling a partition, we have to save a large page worth of
  reservation for the serialize stream, in case it needs to write large
  rows. This space can be restored when all the partitions are spilled
  so the serialize stream is not needed until we build/repartition a
  spilled partition and thus have pinned partitions again. If the large
  write page reservation is used, we save it back whenever possible
  after we spill or close a partition.
- If the aggregator doesn't need the serialize stream at all, we can
  restore the large write page reservation whenever we fail to add a
  large row, before spilling any partitions. Reclaim it whenever
  possible after we spill or close a partition.
A special case is when we are processing a large row and it's the last
row in building/repartitioning a spilled partition: the large write page
reservation can be restored for it no matter whether we need the serialize
stream, because partitions will be read out after this, so there is no
need for spilling.

For the large read page reservation, it's transferred to the spilled
BufferedTupleStream that we are reading when building/repartitioning a
spilled partition. The stream will restore some of it when reading a large
page, and reclaim it when the output row batch is reset. Note that the
stream is read in attach_on_read mode, so the large page will be attached
to the row batch's buffers and only gets freed when the row batch is
reset.

Tests:
- Add tests in test_spilling_large_rows (test_spilling.py) with
  different row sizes to reproduce the issue.
- One test in test_spilling_no_debug_action became flaky after this
  patch. Revised the query to make the UDF allocate larger strings so it
  consistently passes.
- Run CORE tests.

Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Reviewed-on: http://gerrit.cloudera.org:8080/16240
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2020-08-25 16:11:20 +00:00
Qifan Chen
2ebf554dfd IMPALA-7779 Parquet Scanner can write binary data into profile
This fix addresses the current limitation that an ill-formatted Parquet
version string is not properly sanitized before appearing in an error
message or impalad.INFO. With the fix, any such string is converted to a
hex string first. The hex string is a sequence of hex digit groups
separated by spaces, where each group is one or two hex digits, such as
"6c 65 2e a".
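
A sketch of the conversion (Python, illustrative; the actual fix is in the
C++ Parquet scanner):

    def to_hex_string(raw_bytes):
        # e.g. b'le.\n' -> '6c 65 2e a' (one group per byte)
        return ' '.join('%x' % b for b in raw_bytes)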

Testing:
 Ran "core" tests successfully.

Change-Id: I281d6fa7cb2f88f04588110943e3e768678b9cf1
Reviewed-on: http://gerrit.cloudera.org:8080/16331
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Sahil Takiar <stakiar@cloudera.com>
2020-08-25 15:42:01 +00:00
zhaorenhai
6390e7e1da IMPALA-9544 Replace Intel's SSE instructions with ARM's NEON instructions
Replace Intel's SSE instructions with ARM's NEON instructions.
Replace Intel's crc32 instructions with ARM's instructions.
Replace Intel's popcntq instruction with an ARM mechanism.
Replace Intel's pcmpestri and pcmpestrm instructions with an ARM mechanism.

Change-Id: Id7dfe17125b2910ece54e7dd18b4e4b25d7de8b9
Reviewed-on: http://gerrit.cloudera.org:8080/15531
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2020-08-24 16:49:15 +00:00
Shant Hovsepian
3f1b1476af IMPALA-10034: Add remaining TPC-DS queries to workload.
Adds the remaining TPC-DS queries to the testdata workload definition.

Q8 and Q38 were using non-standard variants; those have been replaced by
the official query versions. Q35 uses an official variant. A table alias
in Q90 had to be escaped because we treat 'AT' as a reserved keyword.

Change-Id: Id5436689390f149694f14e6da1df624de4f5f7ad
Reviewed-on: http://gerrit.cloudera.org:8080/16280
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2020-08-24 16:02:45 +00:00
Tim Armstrong
d65cb05bb8 IMPALA-7714: try to avoid be test crash in statestore
We didn't get to a clear root cause for this, so I'm going
to try two things.

First, under the theory that the problem is somehow the destruction of
the strings, convert them to const char*, which does not require
destruction on process teardown.

Second, add some logging if the map lookup fails so
we can better understand what may have happened.

Change-Id: Id4363a93addb8a808d292906cac44ebd25c16889
Reviewed-on: http://gerrit.cloudera.org:8080/16341
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-08-19 22:28:35 +00:00
Vihang Karajgaonkar
cd52932125 IMPALA-4364: Query option to refresh updated HMS partitions
This patch introduces a new boolean query option,
REFRESH_UPDATED_HMS_PARTITIONS. When this query option is set, the refresh
table command reloads the partitions which have been modified in HMS, in
addition to adding new partitions and removing dropped ones.

In order to do this, the refresh table command needs to fetch all the
partitions instead of just the partition names, which can cause the
performance of refresh table to degrade when the query option is set.
However, for certain use cases there is currently no way to detect changed
partitions using the refresh table command. For instance, if certain
partition locations have been changed, a refresh table will not update
those partitions.

Testing:
1. Added a new test which sets the query option and makes sure that the
   updated partitions from Hive are reloaded after the refresh table
   command.
2. Ran exhaustive tests with the patch.

Change-Id: I50e8680509f4eb0712e7bb3de44df5f2952179af
Reviewed-on: http://gerrit.cloudera.org:8080/16308
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-08-15 02:01:05 +00:00
wzhou-code
e0681615c2 IMPALA-10039 (part 2): Fixed Expr-test crash due to race condition
The root cause of the crash is that QueryState::Cancel() was called
before the thread-unsafe function QueryState::Init() had completed.
This patch fixes the race condition between QueryState::Cancel()
and QueryState::Init(), so that QueryState::Init() is safe to be called
at any time.

Testing:
 - The issue could be reproduced by running expr-test for 10-20
   iterations. Verified the fix by running expr-test for over 1000
   iterations without a crash.
 - Passed TestProcessFailures::test_kill_coordinator.
 - Passed core tests.

Change-Id: Ib0d3b9c59924a25b70fa20afeb6e8ca93016eca9
Reviewed-on: http://gerrit.cloudera.org:8080/16313
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
2020-08-14 22:32:55 +00:00