Commit Graph

1428 Commits

Author SHA1 Message Date
liuyao
18acca92ee IMPALA-10435: Extend 'compute incremental stats' syntax
to support a list of columns

Modified the parser to support COMPUTE INCREMENTAL STATS with a list of
columns. No changes are needed in other modules because they already
support per-column incremental stats.
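
An illustrative use of the extended syntax (hypothetical table and column
names; assumes the column list follows the table name, as it does for
COMPUTE STATS):

  COMPUTE INCREMENTAL STATS sales_db.orders (order_id, customer_id);
  -- Only the listed columns get incremental stats computed.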

Change-Id: I4dcc2d4458679c39581446f6d87bb7903803f09b
Reviewed-on: http://gerrit.cloudera.org:8080/16947
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2021-01-21 19:35:26 +00:00
Zoltan Borok-Nagy
90f3b2f491 IMPALA-10432: INSERT INTO Iceberg tables with partition transforms
This patch implements INSERT INTO for Iceberg tables that use partition
transforms. Partition transforms are functions that calculate partition
data from row data.

There are the following partition transforms in Iceberg:
https://iceberg.apache.org/spec/#partition-transforms

 * IDENTITY
 * BUCKET
 * TRUNCATE
 * YEAR
 * MONTH
 * DAY
 * HOUR

INSERT INTO identity-partitioned Iceberg tables is already supported.
This patch adds support for the rest of the transforms.

We create the partitioning expressions in InsertStmt. Based on these
expressions data are automatically shuffled and sorted by the backend
executors before rows are given to the table sink operators. The table
sink operator writes the partitions one-by-one and creates a
human-readable partition path for them.

In the end, we convert the partition paths to partition data and
create Iceberg DataFiles with information about the files written.
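
A hedged end-to-end sketch (hypothetical table/column names; the
transform syntax is assumed to follow the column, as in the
BUCKET/TRUNCATE examples elsewhere in this log):

  CREATE TABLE ice_events (id INT, ts TIMESTAMP, msg STRING)
  PARTITION BY SPEC (ts MONTH, id BUCKET 16)
  STORED AS ICEBERG;
  -- Rows are shuffled/sorted on the transform results and written
  -- partition by partition by the table sink:
  INSERT INTO ice_events SELECT id, ts, msg FROM staged_events;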

Testing:
 * added planner test
 * added e2e tests

Change-Id: I3edf02048cea78703837b248c55219c22d512b78
Reviewed-on: http://gerrit.cloudera.org:8080/16939
Reviewed-by: wangsheng <skyyws@163.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-01-18 18:46:42 +00:00
stiga-huang
9bb7157bf0 IMPALA-10387: Add missing overloads of mask functions used in Ranger default masking policies
The mask functions in Hive are implemented through GenericUDFs, which
can accept an arbitrary number of function signatures. Impala currently
doesn't support GenericUDFs, so we provide builtin mask functions with
a limited set of overloads.

This patch adds some missing overloads that could be used by Ranger
default masking policies, e.g. MASK_HASH, MASK_SHOW_LAST_4,
MASK_DATE_SHOW_YEAR, etc.
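
A hedged usage sketch of the newly covered overloads (hypothetical
column names; in practice Ranger applies these functions implicitly via
masking policies rather than through explicit calls):

  SELECT mask_hash(ssn),
         mask_show_last_4(phone_number),
         mask_date_show_year(birth_date)
  FROM customers;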

Tests:
 - Add test coverage on all default masking policies applied on all
   supported types.

Change-Id: Icf3e70fd7aa9f3b6d6b508b776696e61ec1fcc2e
Reviewed-on: http://gerrit.cloudera.org:8080/16930
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-01-15 13:01:53 +00:00
Zoltan Borok-Nagy
696dafed66 IMPALA-10426: Fix crash when inserting invalid timestamps
Insertion of invalid timestamps causes Impala to crash when it uses
the INT64 Parquet timestamp types.

This patch fixes the error by checking for null values in
Int64TimestampColumnWriterBase::ConvertValue().

Testing:
 * added e2e tests

Change-Id: I74fb754580663c99e1d8c3b73f8d62ea3305ac93
Reviewed-on: http://gerrit.cloudera.org:8080/16951
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-01-14 19:34:38 +00:00
skyyws
1093a563e6 IMPALA-10368: Support required/optional property when creating Iceberg table
This patch adds support for creating required/optional fields for
Iceberg tables. If we set the 'NOT NULL' property for an Iceberg table
column in SQL, Impala creates a required field via the Iceberg API;
'NULL' or the default creates an optional field.
Besides, 'DESCRIBE XXX' for an Iceberg table displays the 'nullable'
property like this:
+------+--------+---------+----------+
| name | type   | comment | nullable |
+------+--------+---------+----------+
| id   | int    |         | false    |
| name | string |         | true     |
| age  | int    |         | true     |
+------+--------+---------+----------+
And 'SHOW CREATE TABLE XXX' will also display the 'NULL'/'NOT NULL'
property for Iceberg tables.
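
A minimal DDL sketch for the nullability property (hypothetical table
name):

  CREATE TABLE ice_nullability_demo (
    id INT NOT NULL,
    name STRING NULL,
    age INT
  )
  STORED AS ICEBERG;
  -- 'id' becomes a required Iceberg field; 'name' and 'age' become optional.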

Tests:
 * added new test in iceberg-create.test
 * added new test in iceberg-negative.test
 * added new test in show-create-table.test
 * modify 'DESCRIBE XXX' result in iceberg-create.test
 * modify 'DESCRIBE XXX' result in iceberg-alter.test
 * modify create table result in show-create-table.test

Change-Id: I70b8014ba99f43df1b05149ff7a15cf06b6cd8d3
Reviewed-on: http://gerrit.cloudera.org:8080/16904
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-01-11 17:08:21 +00:00
stiga-huang
e7839c4530 IMPALA-10416: Add raw string mode for testfiles to verify non-ascii results
Currently, the result section of the testfile is required to use
escaped strings. Take the following result section as an example:
  --- RESULTS
  'Alice\nBob'
  'Alice\\nBob'
The first line is a string with a newline character. The second line is
a string with a '\' and an 'n' character. When comparing with the actual
query results, we need to escape the special characters in the actual
results, e.g. replace newline characters with '\n'. This is done by
invoking encode('unicode_escape') on the actual result strings. However,
the input type of this method is unicode instead of str. When calling it
on str vars, Python will implicitly convert the input vars to unicode
type. The default encoding, ascii, is used. This causes
UnicodeDecodeError when the str contains non-ascii bytes. To fix this,
this patch explicitly decodes the input str using 'utf-8' encoding.

After fixing the logic of escaping the actual result strings, the next
problem is that it's painful to write unicode-escaped expected results.
Here is an example:
  ---- QUERY
  select "你好\n你好"
  ---- RESULTS
  '\u4f60\u597d\n\u4f60\u597d'
  ---- TYPES
  STRING
It's painful to manually translate the unicode characters.

This patch adds a new comment, RAW_STRING, for the result section to use
raw strings instead of unicode-escaped strings. Here is an example:
  ---- QUERY
  select "你好"
  ---- RESULTS: RAW_STRING
  '你好'
  ---- TYPES
  STRING
If the result contains special characters, it's recommended to use the
default string mode. If the special characters only contain newline
characters, we can use RAW_STRING and the existing MULTI_LINE comment
together.

This patch also fixes the issue that pytest fails to report assertion
failures if any of the compared str values contain non-ascii bytes
(IMPALA-10419). However, pytest works if the compared values are both
in unicode type. So we explicitly convert the actual and expected str
values to unicode type.

Test:
 - Add tests in special-strings.test for raw string mode and the escaped
   string mode (default).
 - Run test_exprs.py::TestExprs::test_special_strings locally.

Change-Id: I7cc2ea3e5849bd3d973f0cb91322633bcc0ffa4b
Reviewed-on: http://gerrit.cloudera.org:8080/16919
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-01-06 04:39:56 +00:00
Tim Armstrong
1d5fe2771f IMPALA-6434: Add support to decode RLE_DICTIONARY encoded pages
The encoding is identical to the already-supported PLAIN_DICTIONARY
encoding but the PLAIN enum value is used for the dictionary pages
and the RLE_DICTIONARY enum value is used for the data pages.

A hidden option -write_new_parquet_dictionary_encodings is
added to turn on writing too, for test purposes only.

Testing:
* Added an automated test using a pregenerated test file.
* Ran core tests.
* Manually tested by writing out TPC-H lineitem with the new encoding
  and reading back in Impala and Hive.

Parquet-tools output for the generated test file:
$ hadoop jar ~/repos/parquet-mr/parquet-tools/target/parquet-tools-1.12.0-SNAPSHOT.jar meta /test-warehouse/att/824de2afebad009f-6f460ade00000003_643159826_data.0.parq
20/12/21 20:28:36 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
20/12/21 20:28:36 INFO hadoop.ParquetFileReader: reading another 1 footers
20/12/21 20:28:36 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
file:            hdfs://localhost:20500/test-warehouse/att/824de2afebad009f-6f460ade00000003_643159826_data.0.parq
creator:         impala version 4.0.0-SNAPSHOT (build 7b691c5d4249f0cb1ced8ddf01033fbbe10511d9)

file schema:     schema
--------------------------------------------------------------------------------
id:              OPTIONAL INT32 L:INTEGER(32,true) R:0 D:1
bool_col:        OPTIONAL BOOLEAN R:0 D:1
tinyint_col:     OPTIONAL INT32 L:INTEGER(8,true) R:0 D:1
smallint_col:    OPTIONAL INT32 L:INTEGER(16,true) R:0 D:1
int_col:         OPTIONAL INT32 L:INTEGER(32,true) R:0 D:1
bigint_col:      OPTIONAL INT64 L:INTEGER(64,true) R:0 D:1
float_col:       OPTIONAL FLOAT R:0 D:1
double_col:      OPTIONAL DOUBLE R:0 D:1
date_string_col: OPTIONAL BINARY R:0 D:1
string_col:      OPTIONAL BINARY R:0 D:1
timestamp_col:   OPTIONAL INT96 R:0 D:1
year:            OPTIONAL INT32 L:INTEGER(32,true) R:0 D:1
month:           OPTIONAL INT32 L:INTEGER(32,true) R:0 D:1

row group 1:     RC:8 TS:754 OFFSET:4
--------------------------------------------------------------------------------
id:               INT32 SNAPPY DO:4 FPO:48 SZ:74/73/0.99 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 0, max: 7, num_nulls: 0]
bool_col:         BOOLEAN SNAPPY DO:0 FPO:141 SZ:26/24/0.92 VC:8 ENC:RLE,PLAIN ST:[min: false, max: true, num_nulls: 0]
tinyint_col:      INT32 SNAPPY DO:220 FPO:243 SZ:51/47/0.92 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 0, max: 1, num_nulls: 0]
smallint_col:     INT32 SNAPPY DO:343 FPO:366 SZ:51/47/0.92 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 0, max: 1, num_nulls: 0]
int_col:          INT32 SNAPPY DO:467 FPO:490 SZ:51/47/0.92 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 0, max: 1, num_nulls: 0]
bigint_col:       INT64 SNAPPY DO:586 FPO:617 SZ:59/55/0.93 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 0, max: 10, num_nulls: 0]
float_col:        FLOAT SNAPPY DO:724 FPO:747 SZ:51/47/0.92 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: -0.0, max: 1.1, num_nulls: 0]
double_col:       DOUBLE SNAPPY DO:845 FPO:876 SZ:59/55/0.93 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: -0.0, max: 10.1, num_nulls: 0]
date_string_col:  BINARY SNAPPY DO:983 FPO:1028 SZ:74/88/1.19 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 0x30312F30312F3039, max: 0x30342F30312F3039, num_nulls: 0]
string_col:       BINARY SNAPPY DO:1143 FPO:1168 SZ:53/49/0.92 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 0x30, max: 0x31, num_nulls: 0]
timestamp_col:    INT96 SNAPPY DO:1261 FPO:1329 SZ:98/138/1.41 VC:8 ENC:RLE,RLE_DICTIONARY ST:[num_nulls: 0, min/max not defined]
year:             INT32 SNAPPY DO:1451 FPO:1470 SZ:47/43/0.91 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 2009, max: 2009, num_nulls: 0]
month:            INT32 SNAPPY DO:1563 FPO:1594 SZ:60/56/0.93 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 1, max: 4, num_nulls: 0]

Parquet-tools output for one of the lineitem files:
$ hadoop jar ~/repos/parquet-mr/parquet-tools/target/parquet-tools-1.12.0-SNAPSHOT.jar meta /test-warehouse/li2/4b4d9143c575dd71-3f69d3cf00000001_1879643220_data.0.parq
20/12/22 09:39:56 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
20/12/22 09:39:56 INFO hadoop.ParquetFileReader: reading another 1 footers
20/12/22 09:39:56 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
file:            hdfs://localhost:20500/test-warehouse/li2/4b4d9143c575dd71-3f69d3cf00000001_1879643220_data.0.parq
creator:         impala version 4.0.0-SNAPSHOT (build 7b691c5d4249f0cb1ced8ddf01033fbbe10511d9)

file schema:     schema
--------------------------------------------------------------------------------
l_orderkey:      OPTIONAL INT64 L:INTEGER(64,true) R:0 D:1
l_partkey:       OPTIONAL INT64 L:INTEGER(64,true) R:0 D:1
l_suppkey:       OPTIONAL INT64 L:INTEGER(64,true) R:0 D:1
l_linenumber:    OPTIONAL INT32 L:INTEGER(32,true) R:0 D:1
l_quantity:      OPTIONAL FIXED_LEN_BYTE_ARRAY L:DECIMAL(12,2) R:0 D:1
l_extendedprice: OPTIONAL FIXED_LEN_BYTE_ARRAY L:DECIMAL(12,2) R:0 D:1
l_discount:      OPTIONAL FIXED_LEN_BYTE_ARRAY L:DECIMAL(12,2) R:0 D:1
l_tax:           OPTIONAL FIXED_LEN_BYTE_ARRAY L:DECIMAL(12,2) R:0 D:1
l_returnflag:    OPTIONAL BINARY R:0 D:1
l_linestatus:    OPTIONAL BINARY R:0 D:1
l_shipdate:      OPTIONAL BINARY R:0 D:1
l_commitdate:    OPTIONAL BINARY R:0 D:1
l_receiptdate:   OPTIONAL BINARY R:0 D:1
l_shipinstruct:  OPTIONAL BINARY R:0 D:1
l_shipmode:      OPTIONAL BINARY R:0 D:1
l_comment:       OPTIONAL BINARY R:0 D:1

row group 1:     RC:1724693 TS:58432195 OFFSET:4
--------------------------------------------------------------------------------
l_orderkey:       INT64 SNAPPY DO:4 FPO:159797 SZ:2839537/13147604/4.63 VC:1724693 ENC:RLE,RLE_DICTIONARY,PLAIN ST:[min: 2142211, max: 6000000, num_nulls: 0]
l_partkey:        INT64 SNAPPY DO:2839640 FPO:3028619 SZ:8179566/13852808/1.69 VC:1724693 ENC:RLE,RLE_DICTIONARY,PLAIN ST:[min: 1, max: 200000, num_nulls: 0]
l_suppkey:        INT64 SNAPPY DO:11019308 FPO:11059413 SZ:3063563/3103196/1.01 VC:1724693 ENC:RLE,RLE_DICTIONARY ST:[min: 1, max: 10000, num_nulls: 0]
l_linenumber:     INT32 SNAPPY DO:14082964 FPO:14083007 SZ:412884/650550/1.58 VC:1724693 ENC:RLE,RLE_DICTIONARY ST:[min: 1, max: 7, num_nulls: 0]
l_quantity:       FIXED_LEN_BYTE_ARRAY SNAPPY DO:14495934 FPO:14496204 SZ:1298038/1297963/1.00 VC:1724693 ENC:RLE,RLE_DICTIONARY ST:[min: 1.00, max: 50.00, num_nulls: 0]
l_extendedprice:  FIXED_LEN_BYTE_ARRAY SNAPPY DO:15794062 FPO:16003224 SZ:9087746/10429259/1.15 VC:1724693 ENC:RLE,RLE_DICTIONARY,PLAIN ST:[min: 904.00, max: 104949.50, num_nulls: 0]
l_discount:       FIXED_LEN_BYTE_ARRAY SNAPPY DO:24881912 FPO:24881976 SZ:866406/866338/1.00 VC:1724693 ENC:RLE,RLE_DICTIONARY ST:[min: 0.00, max: 0.10, num_nulls: 0]
l_tax:            FIXED_LEN_BYTE_ARRAY SNAPPY DO:25748406 FPO:25748463 SZ:866399/866325/1.00 VC:1724693 ENC:RLE,RLE_DICTIONARY ST:[min: 0.00, max: 0.08, num_nulls: 0]
l_returnflag:     BINARY SNAPPY DO:26614888 FPO:26614918 SZ:421113/421069/1.00 VC:1724693 ENC:RLE,RLE_DICTIONARY ST:[min: 0x41, max: 0x52, num_nulls: 0]
l_linestatus:     BINARY SNAPPY DO:27036081 FPO:27036106 SZ:262209/270332/1.03 VC:1724693 ENC:RLE,RLE_DICTIONARY ST:[min: 0x46, max: 0x4F, num_nulls: 0]
l_shipdate:       BINARY SNAPPY DO:27298370 FPO:27309301 SZ:2602937/2627148/1.01 VC:1724693 ENC:RLE,RLE_DICTIONARY ST:[min: 0x313939322D30312D3032, max: 0x313939382D31322D3031, num_nulls: 0]
l_commitdate:     BINARY SNAPPY DO:29901405 FPO:29912079 SZ:2602680/2626308/1.01 VC:1724693 ENC:RLE,RLE_DICTIONARY ST:[min: 0x313939322D30312D3331, max: 0x313939382D31302D3331, num_nulls: 0]
l_receiptdate:    BINARY SNAPPY DO:32504185 FPO:32515219 SZ:2603040/2627498/1.01 VC:1724693 ENC:RLE,RLE_DICTIONARY ST:[min: 0x313939322D30312D3036, max: 0x313939382D31322D3330, num_nulls: 0]
l_shipinstruct:   BINARY SNAPPY DO:35107326 FPO:35107408 SZ:434968/434917/1.00 VC:1724693 ENC:RLE,RLE_DICTIONARY ST:[min: 0x434F4C4C45435420434F44, max: 0x54414B45204241434B2052455455524E, num_nulls: 0]
l_shipmode:       BINARY SNAPPY DO:35542401 FPO:35542471 SZ:650639/650580/1.00 VC:1724693 ENC:RLE,RLE_DICTIONARY ST:[min: 0x414952, max: 0x545255434B, num_nulls: 0]
l_comment:        BINARY SNAPPY DO:36193124 FPO:36711343 SZ:22240470/52696671/2.37 VC:1724693 ENC:RLE,RLE_DICTIONARY,PLAIN ST:[min: 0x20546972657369617320, max: 0x7A7A6C653F20626C697468656C792069726F6E69, num_nulls: 0]

Change-Id: I90942022edcd5d96c720a1bde53879e50394660a
Reviewed-on: http://gerrit.cloudera.org:8080/16893
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-01-05 23:30:35 +00:00
Aman Sinha
49680559b0 IMPALA-10182: Don't add inferred identity predicates to SELECT node
For an inferred equality predicate of the form c1 = c2, if both sides
refer to the same underlying tuple and slot, it is an identity
predicate that should not be evaluated by the SELECT node, since doing
so would incorrectly eliminate NULL rows. This patch fixes the behavior.
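
A hypothetical query shape that illustrates the issue (illustrative
table/column names, not taken from the patch):

  CREATE VIEW v AS SELECT id AS c1, id AS c2 FROM t2;
  SELECT t1.x, v.c1
  FROM t1 LEFT OUTER JOIN v ON t1.x = v.c1;
  -- The planner can infer v.c1 = v.c2; since both map to the same slot it
  -- is an identity predicate, and evaluating it in a SELECT node above the
  -- join would wrongly drop the NULL rows produced for unmatched t1 rows.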

Testing:
 - Added planner tests with base table and with outer join
 - Added runtime tests with base table and with outer join
 - Added planner test for IMPALA-9694 (same root cause)
 - Ran PlannerTest .. no other plans changed

Change-Id: I924044f582652dbc50085851cc639f3dee1cd1f4
Reviewed-on: http://gerrit.cloudera.org:8080/16917
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-01-05 23:04:25 +00:00
Zoltan Borok-Nagy
03af0b2c8c IMPALA-10422: EXPLAIN statements leak ACID transactions and locks
Currently EXPLAIN statements might open ACID transactions and
create locks on ACID tables.

This is not necessary since we won't modify the table. But the
real problem is that these transactions and locks are leaked and
open forever. They are even getting heartbeated while the
coordinator is still running.

The solution is to not consume any ACID resources for EXPLAIN
statements.

Testing:
* Added EXPLAIN INSERT OVERWRITE in front of an actual INSERT OVERWRITE
  in an e2e test

Change-Id: I05113b1fd9a3eb2d0dd6cf723df916457f3fbf39
Reviewed-on: http://gerrit.cloudera.org:8080/16923
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-01-05 21:31:05 +00:00
Fucun Chu
4099a60689 IMPALA-10317: Add query option that limits huge joins at runtime
This patch adds support for limiting the rows produced by a join node
such that runaway join queries can be prevented.

The limit is specified by a query option. Queries exceeding that limit
get terminated. The checking runs periodically, so the actual rows
produced may go somewhat over the limit.

JOIN_ROWS_PRODUCED_LIMIT is exposed as an advanced query option.

The query profile is updated to include query-wide and per-backend
metrics for rows produced (RowsReturned). Example from:
set JOIN_ROWS_PRODUCED_LIMIT = 10000000;
select count(*) from tpch_parquet.lineitem l1 cross join
(select * from tpch_parquet.lineitem l2 limit 5) l3;

NESTED_LOOP_JOIN_NODE (id=2):
   - InactiveTotalTime: 107.534ms
   - PeakMemoryUsage: 16.00 KB (16384)
   - ProbeRows: 1.02K (1024)
   - ProbeTime: 0.000ns
   - RowsReturned: 10.00M (10002025)
   - RowsReturnedRate: 749.58 K/sec
   - TotalTime: 13s337ms

Testing:
 Added tests for JOIN_ROWS_PRODUCED_LIMIT

Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Reviewed-on: http://gerrit.cloudera.org:8080/16706
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-12-22 06:10:39 +00:00
Fang-Yu Rao
1b863132c6 IMPALA-10211 (Part 1): Add support for role-related statements
This patch adds support for the following role-related statements
(a short usage sketch follows the list).
1. CREATE ROLE <role_name>.
2. DROP ROLE <role_name>.
3. GRANT ROLE <role_name> TO GROUP <group_name>.
4. REVOKE ROLE <role_name> FROM GROUP <group_name>.
5. GRANT <privilege> ON <resource> TO ROLE <role_name>.
6. REVOKE <privilege> ON <resource> FROM ROLE <role_name>.
7. SHOW GRANT ROLE <role_name> ON <resource>.
8. SHOW ROLES.
9. SHOW CURRENT ROLES.
10. SHOW ROLE GRANT GROUP <group_name>.
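
A usage sketch with hypothetical role/group/resource names (note that,
as described below, privileges must be revoked before a role can be
dropped when Ranger is the provider):

  CREATE ROLE analyst_role;
  GRANT ROLE analyst_role TO GROUP analysts;
  GRANT SELECT ON DATABASE sales_db TO ROLE analyst_role;
  SHOW GRANT ROLE analyst_role ON DATABASE sales_db;
  REVOKE SELECT ON DATABASE sales_db FROM ROLE analyst_role;
  REVOKE ROLE analyst_role FROM GROUP analysts;
  DROP ROLE analyst_role;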

To support the first 4 statements, we implemented the methods of
createRole()/dropRole(), and grantRoleToGroup()/revokeRoleFromGroup()
with their respective API calls provided by Ranger. To support the 5th
and 6th statements, we modified createGrantRevokeRequest() so that the
cases in which the grantee or revokee is a role could be processed. We
slightly extended getPrivileges() so as to include the case when the
principal is a role for the 7th statement. For the last 3 statements, to
make Impala's behavior consistent with that when Sentry was the
authorization provider, we based our implementation on
SentryImpaladAuthorizationManager#getRoles() at
https://gerrit.cloudera.org/c/15833/8/fe/src/main/java/org/apache/impala/authorization/sentry/SentryImpaladAuthorizationManager.java,
which was removed in IMPALA-9708 when we dropped the support for Sentry.

To test the implemented functionalities, we based our test cases on
those at
https://gerrit.cloudera.org/c/15833/8/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test.
We note that before our tests could be automatically run in a
Kerberized environment (IMPALA-9360), in order to run the statements of
CREATE/DROP ROLE <role_name>,
GRANT/REVOKE ROLE <role_name> TO/FROM GROUP <group_name>, and
SHOW ROLES, we revised security-applicationContext.xml, one of the files
needed when the Ranger server is started, so that the corresponding API
calls could be performed in a non-Kerberized environment.

During the process of adding test cases to grant_revoke.test, we found
the following differences in Impala's behavior between the case when
Ranger is the authorization provider and that when Sentry is the
authorization provider. Specifically, we have the following two major
differences.
1. Before dropping a role in Ranger, we have to remove all the
privileges granted to the role in advance, which is not the case when
Sentry is the authorization provider.
2. The resource has to be specified for the statement of
SHOW GRANT ROLE <role_name> ON <resource>, which is different when
Sentry is the authorization provider. This could be partly due to the
fact that there is no API provided by Ranger that allows Impala to
directly retrieve the list of all privileges granted to a specified
role.
Due to the differences in Impala's behavior described above, we had to
revise the test cases in grant_revoke.test accordingly.

On the other hand, to include as many test cases that were in the
original grant_revoke.test as possible, we had to explicitly add the
test section of 'USER' to specify the connecting user to Impala for some
queries that require the connecting user to be a Ranger administrator,
e.g., CREATE/DROP ROLE <role_name> and
GRANT/REVOKE <role_name> TO/FROM GROUP <group_name>. The user has to be
'admin' in the current grant_revoke.test, whereas it could be the
default user 'getuser()' in the original grant_revoke.test because
previously 'getuser()' was also a Sentry administrator.

Moreover, for some test cases, we had to explicitly alter the owner of a
resource in the original grant_revoke.test when we would like to prevent
the original owner of the resource, e.g., the creator of the resource,
from accessing the resource since the original grant_revoke.test was run
without object ownership being taken into consideration.

We also note that in this patch we added the decorator of
@pytest.mark.execute_serially to each test in test_ranger.py since we
have observed that in some cases, e.g., if we are only running the E2E
tests in the Jenkins environment, some tests do not seem to be executed
sequentially.

Testing:
 - Briefly verified that the implemented statements work as expected in
   a Kerberized cluster.
 - Verified that test_ranger.py passes in a local development
   environment.
 - Verified that the patch passes the exhaustive tests in the DEBUG
   build.

Change-Id: Ic2b204e62a1d8ae1932d955b4efc28be22202860
Reviewed-on: http://gerrit.cloudera.org:8080/16837
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-12-21 14:29:52 +00:00
Zoltan Borok-Nagy
296ed74d6f IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only
This patch adds support to INSERT INTO identity-partitioned
Iceberg tables.

Identity-partitioned Iceberg tables are similar to regular
partitioned tables, they are even stored in the same directory
structure. The difference is that the data files still store
the partitioning columns.

The INSERT INTO syntax is similar to the syntax for non-partitioned
tables, i.e.:

INSERT INTO <iceberg_tbl> VALUES (<val1>, <val2>, <val3>, ...);
Or,
INSERT INTO <iceberg_tbl> SELECT <val1>, <val2>, ... FROM <source_tbl>
(please note that we don't use the PARTITION keyword)

The values must be in column order corresponding to the table schema.
Impala will automatically create/find the partitions based on the
Iceberg partition spec.

Partitioned Iceberg tables are stored as non-partitioned tables
in the Hive Metastore (similarly to partitioned Kudu tables). However,
the InsertStmt still generates the partition expressions for them.
These partition expressions are used to shuffle and sort the input
data so we don't end up writing too many files. The HdfsTableSink
also uses the partition expressions to write the data files with
the proper partition paths.

Iceberg is able to parse the partition paths to generate the
corresponding metadata for the partitions. This happens at the
end in IcebergCatalogOpExecutor.

Testing:
 * added planner test to verify shuffling and sorting
 * added negative tests for unsupported features like PARTITION clause
   and non-identity partition transforms
 * e2e tests with partitioned inserts

Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4
Reviewed-on: http://gerrit.cloudera.org:8080/16825
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-12-17 08:54:51 +00:00
Zoltan Borok-Nagy
a7e71b4523 IMPALA-10358: Correct Iceberg type mappings
The Iceberg format spec defines what types to use for different file
formats, e.g.: https://iceberg.apache.org/spec/#parquet

Impala should follow the specification, so this patch
 * annotates strings with UTF8 in Parquet metadata
 * removes fixed(L) <-> CHAR(L) mapping
 * forbids INSERTs when the Iceberg schema has a TIMESTAMPTZ column

This patch also refactors the type/schema conversions as
Impala => Iceberg conversions were duplicated in
IcebergCatalogOpExecutor and IcebergUtil. I introduced the class
'IcebergSchemaConverter' to contain the code for conversions.

Testing:
 * added test to check CHAR and VARCHAR types are not allowed
 * test that INSERTs are not allowed when the table has TIMESTMAPTZ
 * added test to check that strings are annotated with UTF8

Change-Id: I652565f82708824f5cf7497139153b06f116ccd3
Reviewed-on: http://gerrit.cloudera.org:8080/16851
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-12-15 19:17:51 +00:00
Zoltan Borok-Nagy
87b95a5568 IMPALA-10386: Don't allow PARTITION BY SPEC for non-Iceberg tables
PARTITION BY SPEC is only valid for Iceberg tables so Impala should
raise an error when it is used for non-Iceberg tables.
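
A sketch of the now-rejected statement (hypothetical table name;
transform syntax as used elsewhere in this log):

  CREATE TABLE not_iceberg (i INT, p INT)
  PARTITION BY SPEC (p BUCKET 10)
  STORED AS PARQUET;  -- now raises an error: the table is not an Iceberg table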

Testing:
 * added e2e test

Change-Id: I6b3ec3e84476614cb11e801b6d89d84eb384dd43
Reviewed-on: http://gerrit.cloudera.org:8080/16846
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-12-14 20:47:10 +00:00
Zoltan Borok-Nagy
eb8b118db5 IMPALA-10384: Make partition names consistent between BE and FE
In the BE we build partition names with the trailing char '/'. In the FE
we build partition names without a trailing char. We should make this
consistent because this causes some annoying string adjustments in
the FE and can cause hidden bugs.

This patch creates partition names without the trailing '/' both in
the BE and the FE. This follows Hive's behavior that also prints
partition names without the trailing '/'.

Testing:
 * Ran exhaustive tests

Change-Id: I7e40111e2d1148aeb01ebc985bbb15db7d6a6012
Reviewed-on: http://gerrit.cloudera.org:8080/16850
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-12-11 19:51:28 +00:00
skyyws
a850cd3cc6 IMPALA-10361: Use field id to resolve columns for Iceberg tables
This patch adds support for resolving columns by field id for Iceberg
tables. Field ids are now used to resolve columns for Iceberg tables,
which means 'PARQUET_FALLBACK_SCHEMA_RESOLUTION' has no effect on
Iceberg tables.

Change-Id: I057bdc6ab2859cc4d40de5ed428d0c20028b8435
Reviewed-on: http://gerrit.cloudera.org:8080/16788
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
2020-12-10 19:01:08 +00:00
Aman Sinha
b5ba793227 IMPALA-10360: Allow simple limit to be treated as sampling hint
As a follow-up to IMPALA-10314, it is sometimes useful to consider
a simple limit as a way to sample from a table if a relevant hint
has been provided. Doing a sample instead of pure limit serves
dual purposes: (a) it still helps with reducing the planning time
since the scan ranges need be computed only for the sample files,
(b) it allows a sufficient number of files/rows to be read from
the table such that after applying filter conditions or joins with
another table, the query may still produce the N rows needed for
limit.

This functionality is especially useful if the query is against a
view. Note that TABLESAMPLE clause cannot be applied to a view and
embedding a TABLESAMPLE explicitly on a table within a view will
not work because we don't want to sample if there's no limit.

In this patch, a new table level hint, 'convert_limit_to_sample(n)'
is added. If this hint is attached to a table either in the main
query block or within a view/subquery and simple limit optimization
conditions are satisfied (according to IMPALA-10314), the limit
is converted to a table sample. The parameter 'n' in parenthesis is
required and specifies the sample percentage. It must be an integer
between 1 and 100. For example:

 set optimize_simple_limit = true;
 CREATE VIEW v1 as SELECT * FROM T [convert_limit_to_sample(5)]
    WHERE [always_true] <predicate>;
 SELECT * FROM v1 LIMIT 10;

In this case, the limit 10 is applied on top of a 5 percent sample
of T which is applied after partition pruning.

Testing:
 - Added a alltypes_date_partition_2 table where the date and
   timestamp values match (this helps with setting the
   'always_true' hint).
 - Added views with 'convert_limit_to_sample' and 'always_true'
   hints and added new tests against the views. Modified a few
   existing tests to reference the new table variant.
 - Added an end-to-end test.

Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b
Reviewed-on: http://gerrit.cloudera.org:8080/16792
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-12-10 07:15:36 +00:00
Tim Armstrong
f684ed72c5 IMPALA-10252: fix invalid runtime filters for outer joins
The planner generates runtime filters for non-join conjuncts
assigned to LEFT OUTER and FULL OUTER JOIN nodes. This is
correct in many cases where NULLs stemming from unmatched rows
would result in the predicate evaluating to false. E.g.
x = y is always false if y is NULL.

However, it is incorrect if the NULL returned from the unmatched
row can result in the predicate evaluating to true. E.g.
x = isnull(y, 1) can return true even if y is NULL.

The fix is to detect cases when the source expression from the
left input of the join returns non-NULL for null inputs and then
skip generating the filter.

Examples of expressions that may be affected by this change are
COALESCE and ISNULL.
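
A hypothetical query shape affected by the fix (illustrative names):

  SELECT count(*)
  FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id
  WHERE t1.x = isnull(t2.y, 1);
  -- Before the patch a runtime filter could be generated from this
  -- predicate, even though isnull(t2.y, 1) is non-NULL for unmatched rows,
  -- so the filter could wrongly drop t1 rows that still satisfy it.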

Testing:
Added regression tests:
* Planner tests for LEFT OUTER and FULL OUTER where the runtime
  filter was incorrectly generated before this patch.
* Enabled end-to-end test that was previously failing.
* Added a new runtime filter test that will execute on both
  Parquet and Kudu (which are subtly different because of nullability of
  slots).

Ran exhaustive tests.

Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Reviewed-on: http://gerrit.cloudera.org:8080/16622
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-12-08 03:15:44 +00:00
Zoltan Borok-Nagy
579f5c67e0 IMPALA-10364: Set the real location for external Iceberg tables stored in HadoopCatalog
Impala tries to come up with the table location of external Iceberg
tables stored in HadoopCatalog. The current method is not correct for
tables that are nested under multiple namespaces.

With this patch Impala loads the Iceberg table and retrieves the
location from it.

Testing:
 * added e2e test in iceberg-create.test

Change-Id: I04b75d219e095ce00b4c48f40b8dee872ba57b78
Reviewed-on: http://gerrit.cloudera.org:8080/16795
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-12-02 22:42:12 +00:00
Qifan Chen
63146103a7 IMPALA-9355: TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory limit
This patch reduces the memory limit for the following query in
test_exchange_mem_usage_scaling test from 170MB to 164MB
to reduce the chance of not detecting a memory allocation
failure.

set mem_limit=<limit_in_mb>
set num_scanner_threads=1;
select *
from tpch_parquet.lineitem l1
  join tpch_parquet.lineitem l2 on l1.l_orderkey = l2.l_orderkey and
      l1.l_partkey = l2.l_partkey and l1.l_suppkey = l2.l_suppkey
      and l1.l_linenumber = l2.l_linenumber
order by l1.l_orderkey desc, l1.l_partkey, l1.l_suppkey,
l1.l_linenumber limit 5;

In a test with 500 executions of the above query with the memory
limit set to 164MB, there were 500 memory allocation failures in
total (one in each execution), and a total of 266 of them from
Exchange Node #4.

Testing:
  Ran the query in question individually;
  Ran TestExchangeMemUsage.test_exchange_mem_usage_scaling test;
  Ran core tests.

Change-Id: Id945d7e37fac07beb7808e6ccf8530e667cbaad4
Reviewed-on: http://gerrit.cloudera.org:8080/16791
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-30 22:33:57 +00:00
Aman Sinha
5530b62539 IMPALA-10314: Optimize planning time for simple limits
This patch optimizes the planning time for simple limit
queries by only considering a minimal set of partitions
whose file descriptors add up to N (the specified limit).
Each file is conservatively estimated to contain 1 row.

This reduces the number of partitions processed by
HdfsScanNode.computeScanRangeLocations() which, according
to query profiling, has been the main contributor to the
planning time especially for large number of partitions.
Further, within each partition, we only consider the number
of non-empty files that brings the total to N.

This is an opt-in optimization. A new planner option
OPTIMIZE_SIMPLE_LIMIT enables this optimization. Further,
if there's a WHERE clause, it must have an 'always_true'
hint in order for the optimization to be considered. For
example:
  set optimize_simple_limit = true;
  SELECT * FROM T
    WHERE /* +always_true */ <predicate>
  LIMIT 10;

If there are too many empty files in the partitions, it is
possible that the query may produce fewer rows although
those are still valid rows.

Testing:
 - Added planner tests for the optimization
 - Ran query_test.py tests by enabling the optimize_simple_limit
 - Added an e2e test. Since result rows are non-deterministic,
   only simple count(*) query on top of subquery with limit
   was added.

Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Reviewed-on: http://gerrit.cloudera.org:8080/16723
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-28 07:30:06 +00:00
Fucun Chu
8ea49e9b02 IMPALA-10134: Implement ds_hll_union_f() function.
This function receives two strings that are serialized Apache DataSketches
HLL sketches, unions them, and returns the resulting sketch of the union.

Example:
select ds_hll_estimate(ds_hll_union_f(i_i, h_i))
from hll_sketches_impala_hive2;
+-------------------------------------------+
| ds_hll_estimate(ds_hll_union_f(i_i, h_i)) |
+-------------------------------------------+
| 7                                         |
+-------------------------------------------+

Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
Reviewed-on: http://gerrit.cloudera.org:8080/16711
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-25 14:33:10 +00:00
Tim Armstrong
c4e7977f5e IMPALA-10351,IMPALA-9812: enable mt_dop for DML by default
This allows setting mt_dop for any query with any configuration.
Before this patch it was not supported for DML.

--unlock_mt_dop and --mt_dop_auto_fallback are now ignored.
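
A minimal usage sketch (hypothetical table names) of what becomes
possible:

  SET MT_DOP=4;
  INSERT INTO target_tbl SELECT * FROM source_tbl;
  -- Before this patch, setting mt_dop was not supported for DML statements.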

Testing:
* Updated tests to reflect new behaviour.
* Removed irrelevant tests for fallback/validation.
* Ran exhaustive tests.

Change-Id: I66331481260fe4b69d9e95b0200029b14d230ade
Reviewed-on: http://gerrit.cloudera.org:8080/16775
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-25 03:14:37 +00:00
Zoltan Borok-Nagy
96decf535b IMPALA-10345: Impala hits DCHECK in parquet-column-stats.inline.h
During Parquet file writing, a DCHECK checks if row group stats have
copied the min/max string values into their internal buffers. This check
is at the finalization of each page. The copying of the string values
happened at the end of each row batch.

Thus, if a row batch spans over multiple pages then the min/max
string values don't get copied by the end of the page. Since the
memory is attached to the row batch this isn't really an error.

As a workaround this commit also copies the min/max string values
at the end of the page if they haven't been copied yet.

Testing:
 * Added e2e test

Change-Id: I4289bd743e951cc4c607d5a5ea75d27825a1c12b
Reviewed-on: http://gerrit.cloudera.org:8080/16771
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-24 18:35:07 +00:00
Gabor Kaszab
b66045c8a5 IMPALA-10288: Implement DESCRIBE HISTORY for Iceberg tables
The DESCRIBE HISTORY statement works for Iceberg tables and displays
the snapshot history of the table.

An example output:
DESCRIBE HISTORY iceberg_multi_snapshots;
+----------------------------+---------------------+---------------------+---------------------+
| creation_time              | snapshot_id         | parent_id           | is_current_ancestor |
+----------------------------+---------------------+---------------------+---------------------+
| 2020-10-13 14:01:07.234000 | 4400379706200951771 | NULL                | TRUE                |
| 2020-10-13 14:01:19.307000 | 4221472712544505868 | 4400379706200951771 | TRUE                |
+----------------------------+---------------------+---------------------+---------------------+

The purpose here was to have output from this new feature similar to
what SparkSql returns for "SELECT * from tablename.history".
See the "History" section of
https://iceberg.apache.org/spark/#inspecting-tables

Testing:
  - iceberg-negative.test was extended to check that DESCRIBE HISTORY
    is not applicable for non-Iceberg tables.
  - iceberg-table-history.test: Covers basic usage of DESCRIBE
    HISTORY. Tests on tables created with Impala and also with Spark.

Change-Id: I56a4b92c27e8e4a79109696cbae62735a00750e5
Reviewed-on: http://gerrit.cloudera.org:8080/16599
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Reviewed-by: wangsheng <skyyws@163.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-23 21:47:55 +00:00
Zoltan Borok-Nagy
4448b8755b IMPALA-10152: Add support for Iceberg HiveCatalog
HiveCatalog is one of Iceberg's catalog implementations. It uses
the Hive metastore and it is the recommended catalog implementation
when the table data is stored in object stores like S3.

This commit updates the Iceberg version to a newer one, and it also
retrieves Iceberg from the CDP distribution because that version of
Iceberg is built against Hive 3 (Impala is only compatible with
Hive 3).

This commit makes HiveCatalog the default Iceberg catalog in Impala
because it can be used in more environments (e.g. cloud stores),
and it is more featureful. Also, other engines that store their
table metadata in HMS will probably use HiveCatalog as well.

Tables stored in HiveCatalog are similar to Kudu tables with HMS
integration, i.e. modifying an Iceberg table via the Iceberg APIs
also modifies the HMS table. So in CatalogOpExecutor we handle
such Iceberg tables similarly to integrated Kudu tables.
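
A DDL sketch under these defaults (hypothetical table name; the explicit
property value is an assumption based on Impala's 'iceberg.catalog'
naming):

  -- With HiveCatalog as the default, no catalog property is needed:
  CREATE TABLE ice_hive_demo (i INT, s STRING) STORED AS ICEBERG;
  -- Roughly equivalent explicit form:
  --   TBLPROPERTIES ('iceberg.catalog'='hive.catalog')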

Testing:
 * Added e2e tests for creating, writing, and altering Iceberg
   tables
 * Added SHOW CREATE TABLE tests

Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Reviewed-on: http://gerrit.cloudera.org:8080/16721
Reviewed-by: wangsheng <skyyws@163.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-20 21:40:28 +00:00
Daniel Becker
301d7ebe75 IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.
List all file formats that a HdfsScanNode needs to process in any
fragment instance. It is possible that some file formats will not be
needed in all fragment instances.

This is a step towards sharing codegen between different impala
backends. Using the file formats provided in the thrift file, a backend
can codegen code for file formats that are not needed in its own process
but are needed in other fragment instances running on other backends,
and the resulting binary can be shared between multiple backends.

Codegenning for file formats will be done based on the thrift message
and not on what is needed for the actual backend. This leads to some
extra work in case a file format is not needed for the current backend
and codegen sharing is not available (at this point it is not
implemented). However, the overall number of such cases is low.

Also adding the file formats to the node's explain string at level 3.

Testing:
 - Added tests to verify that the file formats are present in the
   explain string at level 3.

Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d
Reviewed-on: http://gerrit.cloudera.org:8080/16728
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2020-11-20 17:54:08 +00:00
skyyws
5c91ff2737 IMPALA-10346: Rename Iceberg test tables' name with specific cases
We used some nondescript table names in Iceberg-related test cases,
such as iceberg_test1/iceberg_test2 and so on, which resulted in poor
readability. So we rename these Iceberg test tables after the specific
cases they cover.

Testing:
  - Renamed tables' name in iceberg-create.test
  - Renamed tables' name in iceberg-alter.test

Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5
Reviewed-on: http://gerrit.cloudera.org:8080/16753
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-20 16:24:56 +00:00
Zoltan Borok-Nagy
26fc6795ec IMPALA-10318: default_transactional_type shouldn't affect Iceberg tables
Query option 'default_transactional_type' shouldn't affect Iceberg
tables. Also, Iceberg tables shouldn't allow setting transactional
properties.

Testing:
 * Added e2e tests

Change-Id: I86d1ac82ecd01a7455a0881a9e84aeb193dd5385
Reviewed-on: http://gerrit.cloudera.org:8080/16742
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-18 22:24:08 +00:00
guojingfeng
1fd5e4279c IMPALA-10310: Fix couldn't skip rows in parquet file on NextRowGroup
In practice we recommend that the HDFS block size align with the
Parquet row group size. But for some compute engines like Spark, the
default Parquet row group size is 128MB, and if the ETL user doesn't
change the default property, Spark will generate row groups that are
smaller than the HDFS block size. The result is that a single HDFS
block may contain multiple Parquet row groups.

In the planner stage, the length of an Impala-generated scan range may
be bigger than the row group size, thus a single scan range contains
multiple row groups. In the current Parquet scanner, when moving to the
next row group, some of the internal state in the Parquet column
readers needs to be reset, e.g. num_buffered_values_, column chunk
metadata, and the internal state of the column chunk readers. But the
current_row_range_ offset is not reset currently, which causes errors
like "Couldn't skip rows in file hdfs://xxx" as IMPALA-10310 points out.

This patch simply resets current_row_range_ to 0 when moving to the
next row group in the Parquet column readers, fixing IMPALA-10310.

Testing:
* Add e2e test for parquet multi blocks per file and multi pages
  per block
* Ran all core tests offline.
* Manually tested all cases encountered in my production environment.

Change-Id: I964695cd53f5d5fdb6485a85cd82e7a72ca6092c
Reviewed-on: http://gerrit.cloudera.org:8080/16697
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-12 17:12:22 +00:00
Zoltan Borok-Nagy
8243a97ec2 Bump up CDP_BUILD_NUMBER to 6912987
This change bumps up the CDP_BUILD_NUMBER to 6912987. The new
CDP build includes Iceberg artifacts.

The new Hive version has a few bugs that cause existing tests
to fail. Unfortunately we can't expect them to be fixed soon
in CDP Hive, so I adjusted the tests and added some TODO comments.

Change-Id: Ide03d6b86043e72753485ff3d4056e0a1bb5c36f
Reviewed-on: http://gerrit.cloudera.org:8080/16701
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-11 01:39:01 +00:00
Joe McDonnell
4b654e7c97 IMPALA-10058: Use commit hash as version for Kudu java artifacts
This uses a new version of the native toolchain where Kudu
now uses the commit hash as the version for its jars.
This means that IMPALA_KUDU_VERSION is the same as
IMPALA_KUDU_JAVA_VERSION, so this consolidates everything
to use IMPALA_KUDU_VERSION. This also eliminates SNAPSHOT
versions for the Kudu jars.

Kudu changed one error message, so this updates the impacted
tests.

Testing:
 - Ran a core job

Change-Id: I1a6c9676f4521d6709393143d3e82533486164d3
Reviewed-on: http://gerrit.cloudera.org:8080/16686
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-03 20:05:20 +00:00
skyyws
3e06d600c2 IMPALA-10166 (part 1): ALTER TABLE for Iceberg tables
This patch mainly implements ALTER TABLE for Iceberg tables; we
currently support these statements:
  * ADD COLUMNS
  * RENAME TABLE
  * SET TBL_PROPERTIES
  * SET OWNER
We forbid DROP COLUMN/REPLACE COLUMNS/ALTER COLUMN in this patch,
since these statements may make Iceberg tables unreadable. We may
support column resolution by field id in the near future; after that,
we will support DROP COLUMN/REPLACE COLUMNS/ALTER COLUMN for Iceberg
tables.

Here are some things we still need to pay attention to:
1. RENAME TABLE is not supported for HadoopCatalog/HadoopTables, even
though we already implement the 'RENAME TABLE' statement, so we only
rename the table in the Hive Metastore for external tables.
2. We cannot ADD/DROP PARTITION now since there is no API for that in
Iceberg, but related work is already in progress in Iceberg.
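
A sketch of the supported statements (hypothetical table/column/owner
names):

  ALTER TABLE ice_demo ADD COLUMNS (note STRING);
  ALTER TABLE ice_demo SET TBLPROPERTIES ('some.key'='some.value');
  ALTER TABLE ice_demo SET OWNER USER new_owner;
  ALTER TABLE ice_demo RENAME TO ice_demo_renamed;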

Testing:
- Iceberg table alter test in test_iceberg.py
- Iceberg table negative test in test_scanners.py
- Rename tables in iceberg-negative.test

Change-Id: I5104cc47c7b42dacdb52983f503cd263135d6bfc
Reviewed-on: http://gerrit.cloudera.org:8080/16606
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-30 14:03:29 +00:00
Fucun Chu
193c2e773f IMPALA-10132: Implement ds_hll_estimate_bounds_as_string() function.
This function receives a string that is a serialized Apache DataSketches
HLL sketch and an optional kappa, which is the number of standard
deviations from the mean: 1, 2 or 3 (default 2). It returns the estimate
and bounds as a string with the values separated by commas.
The result is three values: estimate, lower bound and upper bound.

   ds_hll_estimate_bounds_as_string(sketch [, kappa])

Kappa:
 1 represents the 68.3% confidence bounds
 2 represents the 95.4% confidence bounds
 3 represents the 99.7% confidence bounds

Note: ds_hll_estimate_bounds() should return an array of doubles as
the result, but that has to wait for complex type support. Until then,
we provide ds_hll_estimate_bounds_as_string(), which can be deprecated
once we have array support. Tracking Jira for returning complex types
from functions is IMPALA-9520.

Example:
select ds_hll_estimate_bounds_as_string(ds_hll_sketch(int_col)) from
functional_parquet.alltypestiny;
+----------------------------------------------------------+
| ds_hll_estimate_bounds_as_string(ds_hll_sketch(int_col)) |
+----------------------------------------------------------+
| 2,2,2.0002                                               |
+----------------------------------------------------------+

Change-Id: I46bf8263e8fd3877a087b9cb6f0d1a2392bb9153
Reviewed-on: http://gerrit.cloudera.org:8080/16626
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-29 17:45:01 +00:00
Zoltan Borok-Nagy
981ef10465 IMPALA-10215: Implement INSERT INTO for non-partitioned Iceberg tables (Parquet)
This commit adds support for INSERT INTO statements against Iceberg
tables when the table is non-partitioned and the underlying file format
is Parquet.
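
A minimal INSERT sketch for such tables (hypothetical table names):

  INSERT INTO ice_parquet_tbl VALUES (1, 'a'), (2, 'b');
  INSERT INTO ice_parquet_tbl SELECT id, name FROM src_tbl;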

We still use Impala's HdfsParquetTableWriter to write the data files,
though they needed some modifications to conform to the Iceberg spec,
namely:
 * write Iceberg/Parquet 'field_id' for the columns
 * TIMESTAMPs are encoded as INT64 micros (without time zone)

We use DmlExecState to transfer information from the table sink
operators to the coordinator, then updateCatalog() invokes the
AppendFiles API to add files atomically. DmlExecState is encoded in
protobuf, communication with the Frontend uses Thrift. Therefore to
avoid defining Iceberg DataFile multiple times they are stored in
FlatBuffers.

The commit also does some corrections on Impala type <-> Iceberg type
mapping:
 * Impala TIMESTAMP is Iceberg TIMESTAMP (without time zone)
 * Impala CHAR is Iceberg FIXED

Testing:
 * Added INSERT tests to iceberg-insert.test
 * Added negative tests to iceberg-negative.test
 * I also did some manual testing with Spark. Spark is able to read
   Iceberg tables written by Impala unless we use TIMESTAMPs. In that
   case Spark rejects the data files because it only accepts TIMESTAMPs
   with time zone.
 * Added concurrent INSERT tests to test_insert_stress.py

Change-Id: I5690fb6c2cc51f0033fa26caf8597c80a11bcd8e
Reviewed-on: http://gerrit.cloudera.org:8080/16545
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-26 20:01:09 +00:00
Fang-Yu Rao
eda06f41ce IMPALA-9990: Support SET OWNER for Kudu tables
KUDU-3090 adds support for table ownership and exposes APIs for setting
the owner when creating and altering tables, which allows Impala to
also pass to Kudu the new owner of the Kudu table for the ALTER TABLE
SET OWNER statement.

Specifically, based on the API of AlterTableOptions#setOwner(), this
patch stores the ownership information of the Kudu table in the
corresponding instance of AlterTableOptions, which will then be passed
to Kudu via a KuduClient.

Testing:
- Added a FE test in AnalyzeKuduDDLTest.java to verify the statement
  could be correctly analyzed.
- Added an E2E test in kudu_alter.test to verify the statement could be
  correctly executed when the integration between Kudu and HMS is not
  enabled.
- Added an E2E test in kudu_hms_alter.test and verified that the
  statement could be correctly executed when the integration between
  Kudu and HMS is enabled after manually re-enabling
  TestKuduHMSIntegration::test_kudu_alter_table(). Note that this was
  not possible before IMPALA-10092 was resolved due to a bug in the
  class of CustomClusterTestSuite. In addition, we may need to delete
  the Kudu table 'simple' via a Kudu-Python client if the E2E test
  complains that the Kudu table already exists, which may be related to
  IMPALA-8751.
- Manually verified that the views of Kudu server and HMS are consistent
  for a synchronized Kudu table after the ALTER TABLE SET OWNER
  statement even though the Kudu table was once an external and
  non-synchronized table, meaning that the owner from Kudu's perspective
  could be different than that from HMS' perspective. Such a discrepancy
  could be created if we execute the ALTER TABLE SET OWNER statement for
  an external Kudu table with the property of 'external.table.purge'
  being false. The test is performed manually because currently the
  Kudu-Python client adopted in Impala's E2E tests is not up to date so
  that the field of 'owner' cannot be accessed in the E2E tests. On the
  other hand, to verify the owner of a Kudu table from Kudu's
  perspective, we used the latest Kudu-Python client as provided at
  github.com/apache/kudu/tree/master/examples/python/basic-python-example.
- Verified that the patch could pass the exhaustive tests in the DEBUG
  mode.

Change-Id: I29d641efc8db314964bc5ee9828a86d4a44ae95c
Reviewed-on: http://gerrit.cloudera.org:8080/16273
Reviewed-by: Vihang Karajgaonkar <vihang@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-23 04:01:26 +00:00
Zoltan Borok-Nagy
9384a18180 IMPALA-10257: Relax check for page filtering
HdfsParquetScanner::CheckPageFiltering() is a bit too strict. It checks
that all column readers agree on the top level rows. Column readers
have different strategies to read columns. One strategy reads ahead
the Parquet def/rep levels, the other strategy reads levels and
values simultaneously, i.e. no readahead of levels.

We calculate the ordinal of the top level row based on the repetition
level. This means when we readahead the rep level, the top level row
might point to the value to be processed next. While top level row
in the other strategy always points to the row that has been
completely processed last.

Because of this in CheckPageFiltering() we can allow a difference of
one between the 'current_row_' values of the column readers.

I also got rid of the DCHECK in CheckPageFiltering() and replaced it
with a more informative error report.

Testing:
* added a test to nested-types-parquet-page-index.test

Change-Id: I01a570c09eeeb9580f4aa4f6f0de2fe6c7aeb806
Reviewed-on: http://gerrit.cloudera.org:8080/16619
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-21 20:30:05 +00:00
Qifan Chen
6dbf1ca09c IMPALA-6628: Use unqualified table references in .test files run from test_queries.py
This fix modified the following tests launched from test_queries.py by
removing references to database 'functional' whenever possible. The
objective of the change is to allow more testing coverage with different
databases than the single 'functional' database. In the fix, no new
tables were added and no expected results were altered.

  empty.test
  inline-view-limit.test
  inline-view.test
  limit.test
  misc.test
  sort.test
  subquery-single-node.test
  subquery.test
  top-n.test
  union.test
  with-clause.test

It was determined that other tests in
testdata/workloads/functional-query/queries/QueryTest do not refer to
'functional' or the references are a must for some reason.

Testing:
   Ran query_tests on these changed tests with exhaustive exploration
   strategy.

Change-Id: Idd50eaaaba25e3bedc2b30592a314d2b6b83f972
Reviewed-on: http://gerrit.cloudera.org:8080/16603
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-21 05:20:33 +00:00
stiga-huang
198bbe280c IMPALA-10255: Fix TestInsertQueries.test_insert fails in exhaustive builds
The patch in IMPALA-10233 adds 3 insert statements in
testdata/workloads/functional-query/queries/QueryTest/insert.test. The
test has CREATE TABLE ... LIKE functional.alltypes; therefore it'll
create a TEXT table regardless of the test vector. But the compression
codec is determined by the test vector, and since Impala cannot write
compressed text, the test fails.

The created table should use the same table format as the one in the
test vector.

Tests:
 - Run TestInsertQueries.test_insert in exhaustive mode.

Change-Id: Id0912f751fa04015f1ffdc38f5c7207db7679896
Reviewed-on: http://gerrit.cloudera.org:8080/16609
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-20 06:47:25 +00:00
stiga-huang
faa2d398e6 IMPALA-10233: zorder sort node should output rows in lexical order of partition keys
When inserting into a partitioned HDFS table, the planner will add a
sort node on top of the plan, depending on the clustered/noclustered
plan hint and on the 'sort.columns' table property. If clustering is
enabled in the insert statement or additional columns are specified in
the 'sort.columns' table property, then the ordering columns will start
with the clustering columns, so that partitions can be written
sequentially in the table sink. Any additional non-clustering columns
specified by the 'sort.columns' property are appended to the ordering
columns after the clustering columns.

For the Z-order sort type, these ordering columns should be handled
separately: the clustering columns should still be sorted lexically,
and only the remaining ordering columns sorted in Z-order. This way
partitions can still be inserted one by one, avoiding the DCHECK
described in the JIRA.
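
A hypothetical example of the scenario (table and column names are
illustrative): a partitioned table declared with Z-order sort columns,
where an INSERT adds the sort node described above:

  CREATE TABLE zsorted_tbl (id INT, a INT, b INT)
  PARTITIONED BY (p INT)
  SORT BY ZORDER (a, b)
  STORED AS PARQUET;

  -- partitions are written one by one; within each partition the rows
  -- are ordered by the Z-order of (a, b)
  INSERT INTO zsorted_tbl PARTITION(p) SELECT id, a, b, p FROM src_tbl;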

Tests
 - Add tests for inserting to a partitioned table with zorder.

Change-Id: I30cbad711167b8b63c81837e497b36fd41be9b54
Reviewed-on: http://gerrit.cloudera.org:8080/16590
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-16 06:40:38 +00:00
skyyws
0c0985a825 IMPALA-10159: Supporting ORC file format for Iceberg table
This patch adds support for querying Iceberg tables that use the ORC
file format. The following SQL creates a table with the ORC file
format:
  CREATE TABLE default.iceberg_test (
    level string,
    event_time timestamp,
    message string
  )
  STORED AS ICEBERG
  LOCATION 'hdfs://xxx'
  TBLPROPERTIES ('iceberg.file_format'='orc', 'iceberg.catalog'='hadoop.tables');
Note that there are still some problems when scanning ORC files with
Timestamp columns; for more details please refer to IMPALA-9967. We may
add new tests with the Timestamp type after that JIRA is fixed.
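
For illustration, once created, the table can be queried like any other
Impala table (the query below is just an example):

  SELECT level, count(*) FROM default.iceberg_test GROUP BY level;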

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py

Change-Id: Ib579461aa57348c9893a6d26a003a0d812346c4d
Reviewed-on: http://gerrit.cloudera.org:8080/16568
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-14 19:19:19 +00:00
Gabor Kaszab
13a78fc1b0 IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables
This patch adds support for the Iceberg Bucket and Truncate partition
transforms. Both accept a parameter: the number of buckets and the
truncate width, respectively.

Usage:
CREATE TABLE tbl_name (i int, p1 int, p2 timestamp)
PARTITION BY SPEC (
  p1 BUCKET 10,
  p1 TRUNCATE 5
) STORED AS ICEBERG
TBLPROPERTIES ('iceberg.catalog'='hadoop.tables');

Testing:
  - Extended AnalyzerStmtsTest to cover creating partitioned Iceberg
    tables with the new partition transforms.
  - Extended ParserTest.
  - Extended iceberg-create.test to create Iceberg tables with the new
    partition transforms.
  - Extended show-create-table.test to check that the new partition
    transforms are displayed with their parameters in the SHOW CREATE
    TABLE output.

Change-Id: Idc75cd23045b274885607c45886319f4f6da19de
Reviewed-on: http://gerrit.cloudera.org:8080/16551
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-14 19:07:06 +00:00
wzhou-code
90b8b49554 IMPALA-10207: Avoid MD5 hash for lineage graph
To support FIPS, we have to avoid MD5 since it is not an approved
algorithm under FIPS, but we currently use an MD5 hash for the lineage
graph. This patch replaces MD5 with the non-cryptographic hash function
murmur3_128, which generates a hash value of the same length as MD5.

External dependency on the hash value:
Went through the Apache Atlas source code. The ImpalaQuery.getHash()
function (https://github.com/apache/atlas/blob/master/addons/impala-bridge/src/
main/java/org/apache/atlas/impala/model/ImpalaQuery.java#L60) is not
called anywhere, so there is no apparent dependency on the hash value
in Atlas.

Testing:
 - Passed test_lineage.py.
 - Passed core tests.

Change-Id: I22b1e91cf9d6c89a3c62749ae0fd88528ae69885
Reviewed-on: http://gerrit.cloudera.org:8080/16564
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-08 22:49:59 +00:00
Csaba Ringhofer
76f9b75c8b IMPALA-10172: Support Hive metastore managed locations for databases
This change lets the user set the managed location path in new
databases, e.g.
CREATE DATABASE db MANAGEDLOCATION 'some url';

This property sets the location where the database's tables with
table property 'transactional'='true' will be placed.

The change also adds managedlocation to DESCRIBE DATABASE's output.
Example:
DESCRIBE DATABASE db;
+------------------+-----------------------------------------+---------+
| name             | location                                | comment |
+------------------+-----------------------------------------+---------+
| db               | hdfs://localhost:20500/test-warehouse/a |         |
| managedlocation: | hdfs://localhost:20500/test-warehouse/b |         |
+------------------+-----------------------------------------+---------+
DESCRIBE DATABASE EXTENDED db;
+------------------+-----------------------------------------+---------+
| name             | location                                | comment |
+------------------+-----------------------------------------+---------+
| db               | hdfs://localhost:20500/test-warehouse/a |         |
| managedlocation: | hdfs://localhost:20500/test-warehouse/b |         |
| Owner:           |                                         |         |
|                  | csringhofer                             | USER    |
+------------------+-----------------------------------------+---------+

Note that Impala's output for DESCRIBE DATABASE (EXTENDED) differs
from Hive's: Hive adds a new column for each extra piece of
information, while Impala adds a new row to keep the 3-column format.
Changing to Hive's format would be preferable in my opinion, but it is
a potentially breaking change.
See IMPALA-6686 for further discussion.

Testing:
- added FE and EE tests
- ran relevant tests

Change-Id: I925632a43ff224f762031e89981896722e453399
Reviewed-on: http://gerrit.cloudera.org:8080/16529
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-07 18:32:51 +00:00
Zoltan Borok-Nagy
9817afbfd5 IMPALA-9952: Fix page index filtering for empty pages
As IMPALA-4371 and IMPALA-10186 point out, Impala might write
empty data pages. It usually does that when it has to write a page
bigger than the current page size. Whether we really need to write
empty data pages is a different question, but we need to handle them
correctly as such files already exist out there.

The Parquet offset index entries corresponding to empty data pages
are invalid PageLocation objects with 'compressed_page_size' = 0.
Before this commit Impala didn't ignore the empty page locations, but
generated a warning. Since an invalid page index doesn't fail a scan
by default, Impala continued scanning the file with semi-initialized
page filtering. This resulted in a 'Top level rows aren't in sync'
error, or a crash in DEBUG builds.

With this commit Impala ignores empty data pages and is still able to
filter the rest of the pages. Also, if the page index is corrupt
for some other reason, Impala correctly resets the page filtering
logic and falls back to regular scanning.

Testing:
* Added unit test for empty data pages
* Added e2e test for empty data pages
* Added e2e test for invalid page index

Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0
Reviewed-on: http://gerrit.cloudera.org:8080/16503
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-07 13:41:34 +00:00
skyyws
5912c47617 IMPALA-10221: Rename 'iceberg_file_format' to 'iceberg.file_format' as Iceberg table property
IMPALA-10164 introduced several new table properties, such as
'iceberg.catalog'. To keep these properties consistent, we rename
'iceberg_file_format' to 'iceberg.file_format'. When creating an
Iceberg table, we should now use SQL like this:
  CREATE TABLE default.iceberg_test (
    level string,
    event_time timestamp,
    message string
  )
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.file_format'='parquet',
    'iceberg.catalog'='hadoop.tables');

Change-Id: I722303fb765aca0f97a79bd6e4504765d355a623
Reviewed-on: http://gerrit.cloudera.org:8080/16550
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-06 16:58:04 +00:00
Gabor Kaszab
a5019eb12e IMPALA-10184: Add PARTITION BY SPEC to SHOW CREATE TABLE for Iceberg Tables
The SHOW CREATE TABLE output didn't contain the PARTITION BY SPEC
section for partitioned Iceberg tables. This patch addresses that
shortcoming.
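
A hypothetical illustration (table name, columns, and properties are
only examples, not taken from the patch): for a partitioned Iceberg
table, the SHOW CREATE TABLE output now includes a partition spec
section along the lines of

  CREATE TABLE default.ice_part_tbl (
    i INT,
    p1 INT
  )
  PARTITION BY SPEC (
    p1 BUCKET 10
  )
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.catalog'='hadoop.tables');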

Change-Id: Ie4c43b75057807ab513a220d348155be2487e714
Reviewed-on: http://gerrit.cloudera.org:8080/16512
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-06 01:42:30 +00:00
skyyws
5b720a4d18 IMPALA-10164: Supporting HadoopCatalog for Iceberg table
This patch adds support for creating Iceberg tables through
HadoopCatalog. Before this patch only the HadoopTables API was
supported; now HadoopCatalog can also be used to create Iceberg tables.
When creating a managed table, we can use SQL like this:
  CREATE TABLE default.iceberg_test (
    level string,
    event_time timestamp,
    message string
  )
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
    'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test');
Two values ('hadoop.catalog', 'hadoop.tables') are now supported for
'iceberg.catalog'. If this property is not specified in the SQL, the
default catalog type is 'hadoop.catalog'.
As for external Iceberg tables, you can use SQL like this:
  CREATE EXTERNAL TABLE default.iceberg_test_external
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
    'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test',
    'iceberg.table_identifier'='default.iceberg_test');
A table location cannot be set for either managed or external Iceberg
tables with 'hadoop.catalog', and 'SHOW CREATE TABLE' will not display
the table location yet. We need to use 'DESCRIBE FORMATTED/EXTENDED' to
get this location info.
'iceberg.catalog_location' is required for 'hadoop.catalog' tables. It
is the location where the Iceberg table metadata and data are stored,
and we use it to load the table metadata from Iceberg.
'iceberg.table_identifier' is used for the Iceberg TableIdentifier. If
this property is not specified in the SQL, Impala will use the database
and table name to load the Iceberg table, which is
'default.iceberg_test_external' in the SQL above. The property value is
split by '.', so you can also set a value like 'org.my_db.my_tbl'. This
property is valid for both managed and external tables.

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py
- Iceberg table show create table test in test_show_create_table.py

Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Reviewed-on: http://gerrit.cloudera.org:8080/16446
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-01 13:54:48 +00:00
Zoltan Borok-Nagy
d613d4c019 IMPALA-10143: TestAcid.test_full_acid_original_files is flaky
One of the test queries depended on the values of the row__id.rowid
field. In rare cases the files are written differently by Hive, which
leads to different 'rowid' values for the rows.

I took out the test for the 'rowid' values in that particular query.

We have other tests for the 'rowid' field on static data files (from
'testdata/data'), therefore we still have coverage for that and they
shouldn't be flaky.

Change-Id: I3d36bd23b8d3cc257bad9a83a4462f20e073d437
Reviewed-on: http://gerrit.cloudera.org:8080/16523
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-30 21:01:54 +00:00
Gabor Kaszab
ee9904bc50 IMPALA-10175: Extend error msg when CAST(FORMAT) fails for DATE
The previous error message contained only the input string, but
including the format string as well makes debugging easier.

Extended error message:
SELECT CAST('0;367' as date format 'YY;DDD');
String to Date parse failed. Input '0;367' doesn't match with
format 'YY;DDD'

Change-Id: I4e379f0f112e83e1511edb170bbe41f903972622
Reviewed-on: http://gerrit.cloudera.org:8080/16473
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
2020-09-28 14:02:35 +00:00