Commit Graph

2776 Commits

Peter Rozsa
a570ee866e IMPALA-10173: (Addendum) Fix substitution for unsafe expressions, column-level compatibility check
Expression substitution recreates cast expressions without considering
the compatibility level introduced by IMPALA-10173. In unsafe mode, the
recreation causes an IllegalStateException. This change fixes this
behavior by storing the compatibility level in each CastExpr and
reusing it when expression substitution recreates the cast
expression.

For example: 'select "1", "1" union select 1, "1"'

Also, set operations' common type calculations did not distinguish
compatibility levels for each column slot: if one column slot's common
type was considered unsafe, every other slot was treated as unsafe.
This change fixes this behavior by reinitializing the compatibility
level for every column slot, enabling cases where one column slot
contains unsafely cast constant values and another contains
non-constant expressions with regular casts.
Such queries failed before this change with an 'Unsafe implicit cast
is prohibited for non-const expression' error.

For example: 'select "1", 1 union select 1, int_col from unsafe_insert'

Tests:
 - test cases added to insert-unsafe.test

Change-Id: I39d13f177482f74ec39570118adab609444c6929
Reviewed-on: http://gerrit.cloudera.org:8080/20184
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-07-27 20:49:07 +00:00
Gergely Fürnstáhl
49d599c7f7 IMPALA-12233: Fixed PHJ hanging caused by cyclic barrier
A partitioned hash join with a limit could hang when using mt_dop>0
because the cyclic barrier in PHJBuilder was not cancelled properly.
Added the possibility to unregister threads from the synchronization,
and a call to it in PHJNode::Close(), so closing threads won't block
threads that are still processing.

Testing:
  - Added new unit tests covering new feature
  - Added e2e test to make sure the join does not hang

Change-Id: I8be75c7ce99c015964c8bbb547539e6619ba4f9b
Reviewed-on: http://gerrit.cloudera.org:8080/20179
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-07-20 20:08:35 +00:00
Joe McDonnell
e62043c1fa IMPALA-12287: Use old INSERT OVERWRITE TABLE syntax for Hive dataload
In dataload, we have some Hive statements that use the "INSERT OVERWRITE"
syntax rather than the "INSERT OVERWRITE TABLE" syntax. Older versions
of Hive do not support the shorter syntax. To keep the dataload code
as compatible as possible, this switches the "INSERT OVERWRITE"
statements to "INSERT OVERWRITE TABLE".
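
For illustration, a hedged sketch of the change (table names are made
up, not taken from the actual dataload files):

  -- Rejected by older versions of Hive:
  INSERT OVERWRITE functional.alltypes_copy
  SELECT * FROM functional.alltypes;

  -- Portable form used after this change:
  INSERT OVERWRITE TABLE functional.alltypes_copy
  SELECT * FROM functional.alltypes;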

Testing:
 - Ran a core job

Change-Id: I455641280166c39dcc42fb4187f728df8148cc70
Reviewed-on: http://gerrit.cloudera.org:8080/20198
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-07-14 23:33:01 +00:00
Peter Rozsa
b41339c22d IMPALA-12285: Use targetType in string-to-numeric literal conversion
This change fixes mismatched type problems when an implicitly casted
string literal gets converted to a numeric type. Example:

'INSERT INTO example(float_col) VALUES ("0"), (15629);'

After this change, StringLiteral's 'convertToNumber' method will
consider the targetType parameter when it creates a new NumericLiteral.

Test:
 - test case added to insert-unsafe.test

Change-Id: I2141e7ab164af55a7fa66dda05fe6dcbd7379b69
Reviewed-on: http://gerrit.cloudera.org:8080/20197
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-07-14 17:23:26 +00:00
Riza Suminto
c2ef633c60 IMPALA-12282: Refine correlation factor in AggregationNode
IMPALA-11842 implemented a crude correlation factor calculation that
simply includes all grouping expressions. This can be made more precise
by excluding literal expressions, such as string or NULL literals, which
often come up in ROLLUP queries.
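
A hypothetical example of the pattern (names are illustrative, loosely
modeled on TPC-DS channel queries): after the inline view is
substituted, the grouping expression 'channel' becomes the string
literal 'store', which should not inflate the NDV product:

  SELECT channel, i_brand_id, SUM(sales_amt)
  FROM (SELECT 'store' AS channel, i_brand_id, ss_net_paid AS sales_amt
        FROM store_sales JOIN item ON ss_item_sk = i_item_sk) t
  GROUP BY ROLLUP (channel, i_brand_id);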

Testing:
- Pass TpcdsPlannerTest

Change-Id: I4ffa9e82b83e7c0042bd918ac132668a47505688
Reviewed-on: http://gerrit.cloudera.org:8080/20194
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-07-14 04:25:46 +00:00
Joe McDonnell
57370fb06c IMPALA-12188: Avoid unnecessary output from sourcing bin/impala-config.sh
Many scripts source bin/impala-config.sh to get the necessary
environment variables. For those scripts, the print statements in
bin/impala-config.sh are not interesting and make the build logs
noisier.

This changes a variety of build scripts / utility scripts to
silence the output of sourcing bin/impala-config.sh. The output is
still printed for invocations of buildall.sh.

Testing:
 - Ran a build and looked at the output

Change-Id: Ib4e39f50c7efb8c42a6d3597be0e18c4c79457c5
Reviewed-on: http://gerrit.cloudera.org:8080/20098
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Yifan Zhang <chinazhangyifan@163.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-07-14 03:17:47 +00:00
Riza Suminto
9070895ed3 IMPALA-11842: Improve memory estimation for Aggregate
The planner often overestimates an aggregation node's memory since it
uses a simple multiplication of the NDVs of the contributing grouping
columns. This patch introduces new query options LARGE_AGG_MEM_THRESHOLD
and AGG_MEM_CORRELATION_FACTOR. If the estimated perInstanceDataBytes
from the NDV multiplication method exceeds LARGE_AGG_MEM_THRESHOLD,
perInstanceDataBytes is recomputed by comparing against the max(NDV) &
AGG_MEM_CORRELATION_FACTOR method.

perInstanceDataBytes is kept at a minimum of LARGE_AGG_MEM_THRESHOLD so
that a low max(NDV) will not negatively impact query execution. Unlike
PREAGG_BYTES_LIMIT, LARGE_AGG_MEM_THRESHOLD is evaluated on both
preaggregation and final aggregation, and does not cap the max memory
reservation of the aggregation node (it may still increase memory
allocation beyond the estimate if it is available). However, if a plan
node is a streaming preaggregation node and PREAGG_BYTES_LIMIT is set,
then PREAGG_BYTES_LIMIT will override the value of
LARGE_AGG_MEM_THRESHOLD as a threshold.
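
A usage sketch (values are illustrative only; LARGE_AGG_MEM_THRESHOLD is
assumed to take a memory spec like other Impala memory options):

  SET LARGE_AGG_MEM_THRESHOLD='512m';
  SET AGG_MEM_CORRELATION_FACTOR=0.5;
  -- grouping on correlated columns, where NDV(a) * NDV(b) would
  -- overestimate the true number of groups:
  SELECT a, b, COUNT(*) FROM t GROUP BY a, b;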

Testing:
- Run the patch with 10 nodes, MT_DOP=12, against TPC-DS 3TB scale.
  Among 103 queries, 20 queries have lower
  "Per-Host Resource Estimates", 11 have lower "Cluster Memory
  Admitted", and 3 have over 10% reduced latency. No significant
  regression in query latency was observed.
- Pass core tests.

Change-Id: Ia4b4b2e519ee89f0a13fdb62d0471ee4047f6421
Reviewed-on: http://gerrit.cloudera.org:8080/20104
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-07-13 05:33:42 +00:00
Daniel Becker
31137cc0e3 IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
If, in a VALUES clause, for the same column all of the values are CHAR
types but not all are of the same length, the common type chosen is
CHAR(max(lengths)). This means that shorter values are padded with
spaces. If the destination column is not CHAR but VARCHAR or STRING,
this produces different results than if the values in the column are
inserted individually, in separate statements. This behaviour is
suboptimal because information is lost.

For example:
  CREATE TABLE impala_char_insert (s STRING);

  -- all values are CHAR(N) with different N, but all will use the
  -- biggest N
  INSERT OVERWRITE impala_char_insert VALUES
    (CAST("1" AS CHAR(1))),
    (CAST("12" AS CHAR(2))),
    (CAST("123" AS CHAR(3)));

  SELECT length(s) FROM impala_char_insert;
  3
  3
  3

  -- if inserted individually, the result is
  SELECT length(s) FROM impala_char_insert;
  1
  2
  3

This patch adds the query option VALUES_STMT_AVOID_LOSSY_CHAR_PADDING
which, when set to true, fixes the problem by implicitly casting the
values to the VARCHAR type of the longest value if all values in a
column are CHAR types AND not all have the same length. This VARCHAR
type will be the common type of the column in the VALUES statement.
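
A sketch of the fixed behaviour with the option enabled, continuing the
example above:

  SET VALUES_STMT_AVOID_LOSSY_CHAR_PADDING=true;

  INSERT OVERWRITE impala_char_insert VALUES
    (CAST("1" AS CHAR(1))),
    (CAST("12" AS CHAR(2))),
    (CAST("123" AS CHAR(3)));

  -- the common type is now VARCHAR(3), so no padding is added
  SELECT length(s) FROM impala_char_insert;
  1
  2
  3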

The new behaviour is not turned on by default because it is a breaking
change.

Note that the behaviour in Hive is different from both behaviours in
Impala: Hive (and PostgreSQL) implicitly remove trailing spaces from
CHAR values when they are cast to other types, which is also lossy.

We choose VARCHAR instead of STRING as the common type because VARCHAR
can be converted to any VARCHAR type of shorter or equal length and also
to STRING, while STRING cannot safely be converted to VARCHAR because
its length is not bounded - we would therefore run into problems if the
common type were STRING and the destination column were VARCHAR.

Note: although the VALUES statement is implemented as a special UNION
operation under the hood, this patch doesn't change the behaviour of
explicit UNION statements; it only applies to VALUES statements.

Note: the new VALUES_STMT_AVOID_LOSSY_CHAR_PADDING query option and
ALLOW_UNSAFE_CASTS are not allowed to be used at the same time: if both
are set to true and the query contains set operation(s), an error is
returned.

Testing:
 - Added tests verifying that unneeded padding doesn't occur and the
   queries succeed in various situations, e.g. different destination
   column types and multi-column inserts. See
   testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test

Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Reviewed-on: http://gerrit.cloudera.org:8080/18999
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-07-11 15:48:40 +00:00
Gergely Fürnstáhl
d0fe4c604f IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
The IcebergDeleteNode and IcebergDeleteBuild classes are based on their
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while the other parts are kept very similar, so that features
of PartitionedHashJoin (partitioning, spilling) can be integrated if
needed.

ICEBERG_DELETE_JOIN is added as a join operator which is used only by
the IcebergDeleteNode.

IcebergDeleteBuild processes the data from the relevant delete files
and stores it in a {file_path: ordered row id vector} hash map.

IcebergDeleteNode tracks the file being processed and advances through
the row id vector in parallel with the probe batch to check whether a
row is deleted, or it hashes the probe row's file path and uses binary
search to find the closest row id when that is needed for the check.

Testing:
  - Duplicated related planner tests to run both with new operator and
hash join
  - Added a dimension for e2e tests to run both with new operator and
hash join
  - Added new multiblock tests to verify assumptions used in new
operator to optimize probing
  - Added new test with BATCH_SIZE=2 to verify in/out batch handling
with new operator

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Reviewed-on: http://gerrit.cloudera.org:8080/19850
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-07-05 20:32:23 +00:00
Gergely Fürnstáhl
04a930c4b3 IMPALA-11877: (Addendum) Fixed test regex for erasure coding
Tests fail in the case of existing erasure coding; relaxed the
row_regex match.

Tested:
  - Ran locally

Change-Id: I9efa3b75d1bcfd5a7aeddc3bf58aed404735ee08
Reviewed-on: http://gerrit.cloudera.org:8080/20160
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-07-05 16:52:28 +00:00
LPL
929b91ac64 IMPALA-11013: Support 'MIGRATE TABLE' for external Hive tables
This patch implements the migration from legacy Hive tables to Iceberg
tables. The target Iceberg tables inherit the location of the original
Hive tables. The Hive table has to be a non-transactional table.

To migrate a Hive format table stored in a distributed file system or
object store to an Iceberg table, use the command:

ALTER TABLE [dbname.]table_name CONVERT TO ICEBERG [TBLPROPERTIES(...)];

Currently only 'iceberg.catalog' is allowed as a table property.

For example:
     - ALTER TABLE hive_table CONVERT TO ICEBERG;
     - ALTER TABLE hive_table CONVERT TO ICEBERG TBLPROPERTIES(
       'iceberg.catalog' = 'hadoop.tables');

The HDFS table to be converted must meet these requirements:
     - table is not a transactional table
     - InputFormat must be either PARQUET, ORC, or AVRO

This is an in-place migration so the original data files of the legacy
Hive table are re-used and not moved, copied or re-created by this
operation. The new Iceberg table will have the 'external.table.purge'
property set to true after the migration.

The NUM_THREADS_FOR_TABLE_MIGRATION query option controls the maximum
number of threads used to execute the table conversion. The default
value is one, meaning that the conversion runs on a single thread. It
can be configured in the range [0, 1024]. Zero means the number of CPU
cores is used as the degree of parallelism. A value greater than zero
gives the number of threads used for the conversion, capped at the
number of CPU cores as the highest degree of parallelism.
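
A usage sketch (the thread count is illustrative):

  SET NUM_THREADS_FOR_TABLE_MIGRATION=8;
  ALTER TABLE legacy_db.hive_table CONVERT TO ICEBERG;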

Process of migration:
 - Step 1: Setting table properties,
           e.g. 'external.table.purge'=false on the HDFS table.
 - Step 2: Rename the HDFS table to a temporary table name using a name
           format of "<original_table_name>_tmp_<random_ID>".
 - Step 3: Refresh the renamed HDFS table.
 - Step 4: Create an external Iceberg table via the Iceberg API using
           the data of the HDFS table.
 - Step 5 (Optional): For an Iceberg table in Hadoop Tables, run a
           CREATE TABLE query to add the Iceberg table to HMS as well.
 - Step 6 (Optional): For an Iceberg table in the Hive catalog, run an
           INVALIDATE METADATA to make the new table available for all
           coordinators right after the conversion finishes.
 - Step 7 (Optional): For an Iceberg table in Hadoop Tables, set the
           'external.table.purge' property to true in an ALTER TABLE
           query.
 - Step 8: Drop the temporary HDFS table.

Testing:
 - Add e2e tests
 - Add FE UTs
 - Manually tested the runtime performance for a table that is
   unpartitioned and has 10k data files. The runtime is around 10-13s.

Co-authored-by: lipenglin <lipenglin@apache.org>

Change-Id: Iacdad996d680fe545cc9a45e6bc64a348a64cd80
Reviewed-on: http://gerrit.cloudera.org:8080/20077
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tamas Mate <tmater@apache.org>
2023-07-03 08:17:41 +00:00
Zoltan Borok-Nagy
d19f895c62 IMPALA-12237: Add information about the table type in the lineage log
Apache Atlas needs table type information to correctly build the lineage
graph. This patch set adds a new field to the metadata of the lineage
graph vertices: 'tableType'.

Table type can be:
 * hive
 * iceberg
 * kudu
 * hbase
 * view
 * virtual
 * external-datasource

Tests:
 * updated current tests with the new field
 * added new tests focusing on Iceberg

Change-Id: I13aeb256ff6b1d0e3c2eb43f7f75513ffc2cd20e
Reviewed-on: http://gerrit.cloudera.org:8080/20120
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-30 23:27:31 +00:00
Fang-Yu Rao
0c7f15e8b0 IMPALA-12248: Add three configuration properties after RANGER-2895
This patch adds 3 Ranger configuration properties that will be required
once we start using a build that includes RANGER-2895, so that Ranger's
HTTP server can be properly started.

On the other hand, recall that a Ranger configuration property,
ranger.jpa.jdbc.idleconnectiontestperiod, was deprecated in
RANGER-2895. Hence, we should also remove it when we start using a
build that contains RANGER-2895. This task will be tracked by
IMPALA-12250.

Testing:
 - Verified that with this patch Ranger's HTTP server could be started
   and that we will be able to update the users, groups, and policies on
   the Ranger server with the current scripts in the Impala repository
   whether or not we are using a build that has RANGER-2895.

Change-Id: I19a27e3fe3ab96a9f60566dc2c87bd72636b91ae
Reviewed-on: http://gerrit.cloudera.org:8080/20129
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-30 04:03:09 +00:00
Peter Rozsa
3247cc6839 IMPALA-10173: Allow implicit casts between numeric and string types when inserting into table
This patch adds an experimental query option called ALLOW_UNSAFE_CASTS
which allows implicit casting between some numeric types and string
types. A new type of compatibility is introduced for this purpose, and
the compatibility rule handling is also refactored. The new approach
uses an enum to differentiate the compatibility levels and to make it
easier to pass them through methods. The unsafe compatibility is used
in only two cases: for set operations and for insert statements. Insert
statements and set operations accept unsafe implicitly cast
expressions only when the source expressions are constant.

The following implicit type casts are enabled in unsafe mode:
  - String -> Float, Double
  - String -> Tinyint, Smallint, Int, Bigint
  - Float, Double -> String
  - Tinyint, Smallint, Int, Bigint -> String

The patch also covers IMPALA-3217, and adds two more rules to handle
implicit casting in set operations and insert statements between string
types:
  - String -> Char(n)
  - String -> Varchar(n)
Unsafe implicit casting requires the source expression to be constant
in this case as well.
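
A minimal sketch of the new behaviour (table and column names are
hypothetical):

  SET ALLOW_UNSAFE_CASTS=true;

  -- constant strings are implicitly cast to the numeric column type:
  INSERT INTO t(int_col) VALUES ('42'), ('43');

  -- a constant STRING is implicitly cast to a VARCHAR(n) column:
  INSERT INTO t2(varchar_col) VALUES ('abc');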

Tests:
  - tests added to AnalyzeExprsTest.java
  - new test class added to test_insert.py

Change-Id: Iee5db2301216c2e088b4b3e4f6cb5a1fd10600f7
Reviewed-on: http://gerrit.cloudera.org:8080/19881
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-29 18:45:38 +00:00
Zoltan Borok-Nagy
ce4202a70b IMPALA-11877: (part 2) Add support for DELETE statements for PARTITIONED Iceberg tables
This patch adds support for the DELETE operation for partitioned
Iceberg tables. It does so by writing position delete files
(merge-on-read strategy). The delete files contain the file path
and file position of the deleted records. The delete files must
reside in the same partition as the data files they are referring
to.

To execute the DELETE statement for a given table 'tbl', we are
basically doing an INSERT to the virtual DELETE table
'tbl-POSITION-DELETE':

from:
 DELETE FROM ice_t WHERE id = 42;

to:
 INSERT INTO ice_t-POSITION-DELETE
 SELECT INPUT__FILE__NAME, FILE__POSITION
 FROM ice_t
 WHERE id = 42;

The above rewrite was already the behaviour for unpartitioned Iceberg
tables.

If the table is partitioned, we need to shuffle the rows across
executors based on the partitions they belong to, then sort the rows
based on the partitions (and also on 'file_path' and 'pos'), so that
writers can work on partitions sequentially.

To do this, we need to select the partition columns as well from the
table. But in the case of partition evolution, there are different sets
of partition columns in each partition spec of the table. To overcome
this, this patch set introduces two additional virtual columns:

* PARTITION__SPEC__ID
* ICEBERG__PARTITION__SERIALIZED

PARTITION__SPEC__ID is an INT column that contains the Iceberg spec_id
for each row. ICEBERG__PARTITION__SERIALIZED is a BINARY column that
contains all partition values base64-encoded and dot-separated. E.g.:

select PARTITION__SPEC__ID, ICEBERG__PARTITION__SERIALIZED, * FROM ice_t
+---------------------+--------------------------------+---+---+
| partition__spec__id | iceberg__partition__serialized | i | j |
+---------------------+--------------------------------+---+---+
| 0                   | Mg==                           | 2 | 2 |
| 0                   | Mg==                           | 2 | 2 |
+---------------------+--------------------------------+---+---+

So for the INSERT we are shuffling the rows between executors based on
HASH(partition__spec__id, iceberg__partition__serialized) then each
writer fragment sorts the rows based on (partition__spec__id,
iceberg__partition__serialized, file_path, pos) before writing them out
to delete files. The IcebergDeleteSink has been made smarter so that
it creates a new delete file whenever it sees a row with a new
(partition__spec__id, iceberg__partition__serialized) pair.

Some refactoring was also involved in implementing this patch set.
A lot of common code between IcebergDeleteSink and HdfsTableSink has
been moved to the common base class TableSinkBase. In the Frontend this
patch set also moves some common code of InsertStmt and ModifyStmt to a
new common base class DmlStatementBase.

Testing:
  * planner tests
  * e2e tests (including interop with Hive)
  * Did manual stress test with a TPCDS_3000.store_sales
  ** Table had 8 Billion rows, partitioned by column (ss_sold_date_sk)
  ** Deleted 800 Million rows using 10 Impala hosts
  ** Operation was successful, finished under a minute
  ** Created minimum number of delete files, i.e. one per partition

Change-Id: I28b06f240c23c336a7c5b6ef22fe2ee0a21f7b60
Reviewed-on: http://gerrit.cloudera.org:8080/20078
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-29 18:11:50 +00:00
Zoltan Borok-Nagy
1decfaf606 IMPALA-12247: Add unimplemented methods to ModifyStmt
This patch adds the missing methods to ModifyStmt.

Testing:
 * added e2e tests

Change-Id: If00b4d9fb7c12b9eb63fe4e4dadbf349b633c31b
Reviewed-on: http://gerrit.cloudera.org:8080/20127
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-28 08:42:06 +00:00
Riza Suminto
7a94adbc30 IMPALA-12192: Fix scaling bug in scan fragment
IMPALA-12091 has a bug where scan fragment parallelism is always
limited solely by the ScanNode cost. If the ScanNode is colocated with
other plan operators that have higher processing costs, the Planner
will not scale it up beyond what is allowed by the ScanNode cost.

This patch fixes the problem in two aspects. The first is to allow a
scan fragment to scale up higher as long as it is within the total
fragment cost and the number of effective scan ranges. The second is to
add missing Math.max() in CostingSegment.java which causes lower
fragment parallelism even when the total fragment cost is high.

IMPALA-10287 optimization is re-enabled to reduce regression in TPC-DS
Q78. Ideally, the broadcast vs partitioned costing formula during
distributed planning should not rely on numInstance. But enabling this
optimization ensures consistent query plan shape when comparing against
MT_DOP plan. This optimization can still be disabled by specifying
USE_DOP_FOR_COSTING=false.

This patch also does some cleanup including:
- Fix "max-parallelism" value in explain string.
- Make a constant in ScanNode.rowMaterializationCost() into a backend
  flag named scan_range_cost_factor for experimental purposes.
- Replace all references to ProcessingCost.isComputeCost() to
  queryOptions.isCompute_processing_cost() directly.
- Add a Precondition in PlanFragment.getNumInstances() to verify that
  the fragment's instance count is not modified anymore after the
  costing algorithm finishes.

Testing:
- Manually run TPCDS Q84 over tpcds10_parquet and confirm that the
  leftmost scan fragment parallelism is raised from 12 (before the
  patch) to 18 (after the patch).
- Add test in PlannerTest.testProcessingCost that reproduces the issue.
- Update compute stats test in test_executor_groups.py to maintain test
  assertion.
- Pass core tests.

Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Reviewed-on: http://gerrit.cloudera.org:8080/20024
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-24 02:24:41 +00:00
Michael Smith
dced8ca27c IMPALA-12217: Update cgroup-util to handle cgroups v2
RedHat 9 and Ubuntu 22 switched to cgroups v2, which has a different
hierarchy than cgroups v1. Ubuntu 20 has a hybrid layout with both
cgroup and cgroup2 mounted, but the cgroup2 functionality is limited.

Updates cgroup-util to
- identify available cgroups in FindCGroupMounts, preferring v1 if
  available, as Ubuntu 20's hybrid layout provides only limited v2
  interfaces.
- refactor file reading to follow guidelines from
  https://gehrcke.de/2011/06/reading-files-in-c-using-ifstream-dealing-correctly-with-badbit-failbit-eofbit-and-perror/
  for clearer error handling. Specifically, failbit doesn't set errno,
  but we were printing it anyway (which produced misleading errors).
- update FindCGroupMemLimit to read memory.max for cgroups v2.
- update DebugString to print the correct property based on the cgroup
  version.

Removes unused cgroups test library.

Testing:
- proc-info-test CGroupInfo.ErrorHandling test on RHEL 9 and Ubuntu 20.
- verified no error messages related to reading cgroup present in logs
  on RHEL 9 and Ubuntu 20.

Change-Id: I8dc499bd1b490970d30ed6dcd2d16d14ab41ee8c
Reviewed-on: http://gerrit.cloudera.org:8080/20105
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-23 01:07:12 +00:00
Riza Suminto
b1467b1567 IMPALA-12200: Cap stats NDV from SetOperationStmt.createMetadata
The union operator creates merged ColumnStats in
SetOperationStmt.createMetadata, where it adds up the ColumnStats from
its input children. One of the stats being accumulated is the NDV
(number of distinct values). There is an opportunity to lower the
resulting NDV when all source expressions refer to the same column. The
lower NDV can benefit an aggregation node on top of the union node
because it lowers the cardinality and memory estimate of the
aggregation node.
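
A hypothetical query shape that benefits (names are invented): both
union branches refer to the same column, so the merged NDV can be
capped at NDV(t.c) instead of the sum over the children, lowering the
estimate of the aggregation on top:

  SELECT c, COUNT(*) FROM (
    SELECT c FROM t WHERE k < 10
    UNION ALL
    SELECT c FROM t WHERE k >= 10
  ) v
  GROUP BY c;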

Testing:
- Pass core tests.

Change-Id: Ic0bb2eff5005fdfb11adf31499214c63dd552c05
Reviewed-on: http://gerrit.cloudera.org:8080/20040
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-22 02:31:17 +00:00
Michael Smith
c75e8b28e6 IMPALA-12227: Configure Ozone datanode client port
Ozone with HDDS-8501 adds a new port to datanodes, configured via
hdds.datanode.client.port. Add it to our minicluster configuration to
avoid a BindException on startup.

Change-Id: Ifdd591c3e7d9755ddadf151650e5b477d9f492c8
Reviewed-on: http://gerrit.cloudera.org:8080/20086
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-21 22:52:38 +00:00
Peter Rozsa
6b571eb7e4 IMPALA-12184: Java UDF increment on an empty string is inconsistent
This change removes the Text-typed overload of BufferAlteringUDF to
avoid ambiguous function matches. It also changes the 2-parameter
function in BufferAlteringUDF to cover Text-typed arguments.

Tests:
 - test_udfs.py manually executed

Change-Id: I3a17240ce39fef41b0453f162ab5752f1c940f41
Reviewed-on: http://gerrit.cloudera.org:8080/20038
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-20 17:00:35 +00:00
stiga-huang
e294be7707 IMPALA-12128: Bump ORC C++ version to 1.7.9-p10
This bumps the ORC C++ version from 1.7.0-p14 to 1.7.9-p10 to add the
fixes of ORC-1041 and ORC-1304.

Tests:
 - Add e2e test for ORC-1304.
 - It's hard to add a test for ORC-1041 since it won't cause crashes
   when compiling with gcc-10.

Change-Id: I26c39fe5b15ab0bcbe6b2af6fe7a45e48eaec6eb
Reviewed-on: http://gerrit.cloudera.org:8080/20090
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-20 10:24:33 +00:00
ttttttz
919d08a84f IMPALA-12164: Avoid referencing non-materialized slots in analytic limit pushdown
When creating a single-node analytic plan, if the plan node is an
EmptySetNode, its tuple ids should not be considered. Also, when
registering conjuncts, if a constant FALSE conjunct is found, the
other conjuncts in the same list should be marked as assigned.

Tests:
 - Add FE and e2e regression tests

Change-Id: I9e078f48863c38062e1e624a1ff3e9317092466f
Reviewed-on: http://gerrit.cloudera.org:8080/19937
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-20 09:09:17 +00:00
Andrew Sherman
06eb62d3ef IMPALA-12197: Prevent assertion failures when isClusteringColumn() is called on an IcebergTimeTravelTable.
When using local catalog mode, if a runtime filter is being generated
for a time travel Iceberg table, then a query may fail with "ERROR:
IllegalArgumentException: null"

In the planner an Iceberg table that is being accessed with Time Travel
is represented by an IcebergTimeTravelTable object. This object
represents a time-based variation on a base table. The
IcebergTimeTravelTable may represent a different schema from the base
table; it does this by tracking its own set of Columns.
generating a runtime filter the isClusteringColumn() method is called
on the table. IcebergTimeTravelTable was delegating this call to the
base object. In local catalog mode this method is implemented by
LocalTable which has a Preconditions check (an assertion) that the
column parameter matches the stored column. In this case the check
fails as the base table and time travel table have their own distinct
set of column objects.

The fix is to have IcebergTimeTravelTable provide its own
isClusteringColumn() method. For Iceberg there are no clustering
columns, so this method simply returns false.
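
A hypothetical shape of the failing query (names invented): a runtime
filter produced by the join is applied to the time travel scan, which
previously hit the precondition in local catalog mode:

  SELECT COUNT(*)
  FROM ice_t FOR SYSTEM_TIME AS OF '2023-01-01 00:00:00'
  JOIN dim_t ON ice_t.id = dim_t.id
  WHERE dim_t.attr = 'x';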

TESTING
- Ran all end-to-end tests.
- Added test case for query that failed.

Change-Id: I51d04c8757fb48bd417248492d4615ac58085632
Reviewed-on: http://gerrit.cloudera.org:8080/20034
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-16 22:32:11 +00:00
Zoltan Borok-Nagy
c579212251 IMPALA-11877: (part 1) Add support for DELETE statements for UNPARTITIONED Iceberg tables
This patch adds support for DELETE statements on unpartitioned Iceberg
tables. Impala uses the 'merge-on-read' mode with position delete files.

The patch reuses the existing IcebergPositionDeleteTable as the target
table of the DELETE statements, because this table already has the same
schema as position delete files, even with correct Iceberg field IDs.

The patch basically rewrites DELETE statements to INSERT statements,
e.g.:

from:
 DELETE FROM ice_t WHERE id = 42;

to:
 INSERT INTO ice_t-POSITION-DELETE
 SELECT INPUT__FILE__NAME, FILE__POSITION
 FROM ice_t
 WHERE id = 42;

Position delete files need to be ordered by (file_path, pos), so
we add an extra SORT node before the table sink operator.

In the backend the patch adds a new table sink operator, the
IcebergDeleteSink. It writes the incoming rows (file_path, position) to
delete files. It reuses a lot of code from HdfsTableSink, so this patch
moves the common code to the new common base class: TableSinkBase.

The coordinator then collects the written delete files and invokes
UpdateCatalog to finalize the DELETE statement.

The Catalog then uses Iceberg APIs to create a new snapshot with the
created delete files. It also validates that no conflicting
data files were written since the operation started.

Testing:
 * added planer test
 * e2e tests
 * interop test between Impala and Hive

Change-Id: Ic933b2295abe54b46d2a736961219988ff42915b
Reviewed-on: http://gerrit.cloudera.org:8080/19776
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
2023-06-13 11:33:32 +00:00
stiga-huang
47309d14ca IMPALA-12204: Fix redundant codegen info added in subplan profiles
The SUBPLAN node will open its right child node many times in its
GetNext(), depending on how many rows are generated by its left child.
The right child of a SUBPLAN node is a subtree of operators. They
should not add codegen info to the profile in their Open() method
since it will be invoked repeatedly.

Currently, DataSink and UnionNode have such an issue. This patch fixes
them by adding the codegen info to the profile in Close() instead of
Open(), just as we did in IMPALA-11200.

Tests:
 - Add e2e tests

Change-Id: I99a0a842df63a03c61024e2b77d5118ca63a2b2d
Reviewed-on: http://gerrit.cloudera.org:8080/20037
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2023-06-13 07:05:41 +00:00
Riza Suminto
81a41fca6d IMPALA-12183: Fix cardinality clamping across aggregation phases
In the Impala planner, an aggregation node's cardinality is the sum of
all its aggregation class cardinalities. An aggregation class
cardinality is a simple multiplication of the NDVs of the contributing
grouping columns. Since this simple multiplication of NDVs can be
greater than the aggregation node's input cardinality, each aggregation
class cardinality is further clamped at the aggregation node's input
cardinality.

An aggregation operator can translate into a chain of multi-phase
aggregation plan nodes. The longest possible chain of aggregation
phases is as follows, from the bottom to the top:

1. FIRST
2. FIRST_MERGE
3. SECOND
4. SECOND_MERGE
5. TRANSPOSE

A FIRST_MERGE aggregation maintains its aggregation class cardinality
clamping at its corresponding FIRST aggregation's input
cardinality (and there is a similar relationship between SECOND_MERGE
and SECOND). However, the SECOND aggregation was clamped at the
FIRST_MERGE output cardinality instead of the FIRST input cardinality.
This cardinality mispropagation can cause cardinality explosion in the
later aggregation phases and the operators above them.

This patch fixes the clamping of multi-phase aggregations to always
look at the input cardinality of the FIRST aggregation node. An
exception is made for the TRANSPOSE phase of grouping set
aggregations (such as ROLLUP). In that case, cardinality clamping uses
the output cardinality of the child node right below it (either
FIRST_MERGE or SECOND_MERGE) because the output cardinality of the
whole aggregation chain can be higher than the input cardinality of
the FIRST phase.

Testing:
- Add test in card-agg.test
- Pass core tests.

Change-Id: I1d414fe56b027f887c7f901d8a6799a388b16b95
Reviewed-on: http://gerrit.cloudera.org:8080/20009
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-12 15:06:55 +00:00
Joe McDonnell
893c55b523 IMPALA-12198: Create $RANGER_LOG_DIR before stopping in kill-ranger-server.sh
If the $RANGER_LOG_DIR directory doesn't exist, then Ranger's
ranger-admin-services.sh will fail to issue the stop command
because it tries and fails to write to that directory. Ranger's
script believes that it has issued the stop to Ranger, so it
waits 30 seconds for Ranger to stop. When Ranger doesn't stop,
it kills Ranger. This is an unnecessary delay in stopping Ranger.
This situation is common if a developer runs bin/clean.sh after
starting Ranger.

This modifies kill-ranger-server.sh to create $RANGER_LOG_DIR
before running ranger-admin-services.sh with the stop command.
Since the directory exists, the stop command is successfully
issued and the script won't wait 30 seconds.

Testing:
 - Hand testing with starting Ranger, then running bin/clean.sh,
   then running kill-ranger-server.sh

Change-Id: I6ba5a90172affde3949e9f9a7618bde0dfa8c309
Reviewed-on: http://gerrit.cloudera.org:8080/20028
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-10 02:04:46 +00:00
Minghui Zhu
88275f327f IMPALA-12051: Propagate analytic tuple predicates of outer-joined InlineView
In some cases, directly pushing down predicates that reference the
analytic tuple into an inline view leads to incorrect query results.
While pushing down analytic predicates (e.g. row_number() < 10), we
should also divide them into two groups. Some of them can be migrated
into the view and are thus removed from the current scope. Others can
be copied into the view but still need to be evaluated in the current
scope, as demonstrated by the following query. The bug is that we
migrated all of them into the view.

  WITH detail_measure AS (
    SELECT
      *
    FROM
      (
        VALUES
          (
            1 AS `isqbiuar`,
            1 AS `bgsfrbun`,
            1 AS `result_type`,
            1 AS `bjuzzevg`
          ),
          (2, 2, 2, 2)
      ) a
  ),
  order_measure_sql0 AS (
    SELECT
      row_number() OVER (
        ORDER BY
          row_number_0 DESC NULLS LAST,
          isqbiuar ASC NULLS LAST
      ) AS `row_number_0`,
      `isqbiuar`
    FROM
      (
        VALUES
          (1 AS `row_number_0`, 1 AS `isqbiuar`),
          (2, 2)
      ) b
  )
  SELECT
    detail_measure.`isqbiuar` AS `isqbiuar`,
    detail_measure.`bgsfrbun` AS `bgsfrbun`,
    detail_measure.`result_type` AS `result_type`,
    detail_measure.`bjuzzevg` AS `bjuzzevg`,
    `row_number_0` AS `row_number_0`
  FROM
    detail_measure
    LEFT JOIN order_measure_sql0
    ON order_measure_sql0.isqbiuar = detail_measure.isqbiuar
  WHERE
    row_number_0 BETWEEN 1
    AND 1
  ORDER BY
    `row_number_0` ASC NULLS LAST,
    `bgsfrbun` ASC NULLS LAST

The current incorrect result is:
+----------+----------+-------------+----------+--------------+
| isqbiuar | bgsfrbun | result_type | bjuzzevg | row_number_0 |
+----------+----------+-------------+----------+--------------+
| 2        | 2        | 2           | 2        | 1            |
| 1        | 1        | 1           | 1        | NULL         |
+----------+----------+-------------+----------+--------------+

The correct result is:
+----------+----------+-------------+----------+--------------+
| isqbiuar | bgsfrbun | result_type | bjuzzevg | row_number_0 |
+----------+----------+-------------+----------+--------------+
| 2        | 2        | 2           | 2        | 1            |
+----------+----------+-------------+----------+--------------+

In the plan, the analytic predicate is pushed down to the TOP-N node,
but it is missing from the HASH JOIN node, which leads to incorrect
results.

  ...

  05:HASH JOIN [RIGHT OUTER JOIN]
  |  hash predicates: isqbiuar = isqbiuar
  |  row-size=14B cardinality=2

  ...

  02:TOP-N [LIMIT=1]
  |  order by: row_number_0 DESC NULLS LAST, isqbiuar ASC NULLS LAST
  |  source expr: row_number() <= CAST(1 AS BIGINT)
  |  row-size=2B cardinality=1

  ...

The HASH JOIN node should be:

  05:HASH JOIN [RIGHT OUTER JOIN]
  |  hash predicates: isqbiuar = isqbiuar
  |  other predicates: row_number() <= 1, row_number() >= 1
  |  row-size=14B cardinality=2

Tests:
* Add plan tests in analytic-rank-pushdown.test
* Add e2e tests in analytic-fns.test

Change-Id: If6c209b2a64bad37d893ba8b520342bf1f9a7513
Reviewed-on: http://gerrit.cloudera.org:8080/19768
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-06 19:06:24 +00:00
Michael Smith
683bef1ca4 IMPALA-11253: Support testing with Java 11 (take 2)
Adds new environment variable IMPALA_JDK_VERSION which can be 'system',
'8', or '11'.  The default is 'system', which uses the same logic as
before. If set to 8 or 11, it will ignore the system java and search for
java of that specific version (based on specific directories for Ubuntu
and Redhat). This is used by bin/bootstrap_system.sh to determine
whether to install java 8 or java 11 (other versions can come later). If
IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the opens
needed to deal with the ehcache issue.

This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of
bootstrap_system.sh. Instead, it provides a new environment variable
IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over
IMPALA_JDK_VERSION.

This also updates the versions of Maven plugins related to the build.

Source and target releases are still set to Java 8 compatibility.

Adds a verifier to the end of run-all-tests that
InaccessibleObjectException is not present in impalad logs. Tested with

  JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
    CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
    run-all-tests.sh

Testing: ran test suite with Java 11

This reverts the revert commit 1b6011c, restoring these changes minus
code to update IMPALA_JDK_VERSION based on $JAVA -version as that could
break subsequent sourcing of impala-config.sh.

Change-Id: Ie16504ad5738b1f228f97044afd3d9017ccc6c53
Reviewed-on: http://gerrit.cloudera.org:8080/19928
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-25 16:04:29 +00:00
Zoltan Borok-Nagy
ca5de24d6a IMPALA-12153: Parquet STRUCT reader should fill position slots
Before this patch the Parquet STRUCT reader didn't fill the
position slots: collection position and file position. When users
queried these virtual columns, Impala crashed or returned
incorrect results.
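
A sketch of an affected query (table and column names are
hypothetical; 'file__position' is the file position virtual column):

  -- selecting a struct column together with a position virtual column
  SELECT file__position, s.f1 FROM tbl;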

The ORC scanner already worked correctly, but there were no tests
written for it.

Test:
 * e2e tests for both ORC / Parquet

Change-Id: I32a808a11f4543cd404ed9f3958e9b4e971ca1f4
Reviewed-on: http://gerrit.cloudera.org:8080/19911
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-24 01:18:56 +00:00
Michael Smith
1b6011c6a0 Revert "IMPALA-11253: Support testing with Java 11"
This reverts commit ee6395db76 as it is
not flexible enough at detecting Java automatically in likely build
environments.

Change-Id: I836c9f7fd10740b15f7e40b2e7f889ac7ee61fc3
Reviewed-on: http://gerrit.cloudera.org:8080/19908
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2023-05-21 14:00:14 +00:00
Michael Smith
ee6395db76 IMPALA-11253: Support testing with Java 11
Adds new environment variable IMPALA_JDK_VERSION which can be 'system',
'8', or '11'.  The default is 'system', which uses the same logic as
before. If set to 8 or 11, it will ignore the system java and search for
java of that specific version (based on specific directories for Ubuntu
and Redhat). This is used by bin/bootstrap_system.sh to determine
whether to install java 8 or java 11 (other versions can come later). If
IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the opens
needed to deal with the ehcache issue.

This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of
bootstrap_system.sh. Instead, it provides a new environment variable
IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over
IMPALA_JDK_VERSION.

This also updates the versions of Maven plugins related to the build.

Source and target releases are still set to Java 8 compatibility.

Adds a verifier to the end of run-all-tests that
InaccessibleObjectException is not present in impalad logs. Tested with

  JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
    CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
    run-all-tests.sh

Testing: ran test suite with Java 11

Change-Id: I15d309e2092c12d7fdd2c99b727f3a8eed8bc07a
Reviewed-on: http://gerrit.cloudera.org:8080/19539
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-05-19 22:32:00 +00:00
Daniel Becker
8785270451 IMPALA-12147: Allow collections of fixed length types as non-passthrough children of unions
IMPALA-12019 implemented support for collections of fixed length types
in the sorting tuple. This was made possible by implementing the
materialisation of these collections.

Building on this, this change allows such collections as non-passthrough
children of UNION ALL operations. Note that plain UNIONs are not
supported for any collections for other reasons and this patch does not
affect them or any other set operation.
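
A hypothetical illustration (schema invented): 'int_arr' is an array of
a fixed length type, and the non-slot expression in the second child
makes that child non-passthrough, which is now allowed:

  SELECT id, int_arr FROM t1
  UNION ALL
  SELECT id + 1, int_arr FROM t2;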

Testing:
Tests in nested-array-in-select-list.test and
nested-map-in-select-list.test check that
 - the newly allowed cases work correctly and
 - the correct error message is given for collections of variable length
   types.

Change-Id: I14c13323d587e5eb8a2617ecaab831c059a0fae3
Reviewed-on: http://gerrit.cloudera.org:8080/19903
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-19 22:11:03 +00:00
Michael Smith
2fc4f74796 IMPALA-10186: Fix writing empty parquet page
Fixes writing an empty parquet page when a page fills (or reaches
parquet_page_row_count_limit) at the same time that its dictionary
fills.

When a page filled (or reached parquet_page_row_count_limit) at the same
time that the dictionary filled, Impala would first detect the page was
full and create a new page. It would then detect the dictionary is full
and create another page, resulting in an empty page.

Parquet readers like Hive raise an error if they encounter an empty page. This
patch attempts to make it impossible to generate an empty page by
reworking AppendRow and adding DCHECKs for empty pages. Dictionary size
is checked on FinalizeCurrentPage so whenever a page is written, we also
flush the dictionary if full.

Addresses clang-tidy by adding override in source files.

Testing:
- new test for full page size reached with full dictionary
- new test for parquet_page_row_count_limit with full dictionary
- new test for parquet_page_row_count_limit followed by large value.
  This seems useful as a theoretical corner-case; it currently writes
  the too-large value to the page anyway, but if we ever start checking
  whether the first value will fit the page this could become an issue.

Change-Id: I90d30d958f07c6289a1beba1b5df1ab3d7213799
Reviewed-on: http://gerrit.cloudera.org:8080/19898
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-19 12:02:42 +00:00
Daniel Becker
ff3d0c7984 IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
As a first stage of IMPALA-10939, this change implements support for
including, in the sorting tuple, top-level collections that contain
only fixed length types (including fixed length structs). For these
types the implementation is almost the same as the existing handling
of strings.
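
A hypothetical example of what this enables (names invented): a
fixed-length-type array in the select list of a sorted query:

  SELECT id, int_arr FROM t ORDER BY id;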

Another limitation is that structs that contain any type of collection
are not yet allowed in the sorting tuple.

Also refactored the RawValue::Write*() functions to have a clearer
interface.

Testing:
 - Added a new test table that contains many rows with arrays. This is
   queried in a new test added in test_sort.py, to ensure that we handle
   spilling correctly.
 - Added tests that have arrays and/or maps in the sorting tuple in
   test_queries.py::TestQueries::{test_sort,
       test_top_n,test_partitioned_top_n}.

Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Reviewed-on: http://gerrit.cloudera.org:8080/19660
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-18 09:56:55 +00:00
Csaba Ringhofer
4261225f65 IMPALA-6433: Add read support for PageHeaderV2
Parquet v2 means several changes in Parquet files compared to v1:

1. file version = 2 instead of 1

c185faf0c4/src/main/thrift/parquet.thrift (L1016)
Before this patch Impala rejected Parquet files with version!=1.

2. possible use of DataPageHeaderV2 instead of DataPageHeader

c185faf0c4/src/main/thrift/parquet.thrift (L561)

The main differences compared to the V1 DataPageHeader are:
a. rep/def levels are not compressed, so the compressed part contains
   only the actual encoded values
b. rep/def levels must be RLE encoded (Impala only supports RLE encoded
   levels even for V1 pages)
c. compression can be turned on/off per page (member is_compressed)
d. number of nulls (member num_nulls) is required - in v1 it was
   included in statistics which is optional
e. number of rows is required (member num_rows) which can help with
   matching collection items with the top level collection

The patch adds support for understanding v2 data pages but does not
implement some potential optimizations:

a. would allow an optimization for queries that need only the nullness
of a column but not the actual value: as the values are not needed the
decompression of the page data can be skipped. This optimization is not
implemented - currently Impala materializes both the null bit and the
value for all columns regardless of whether the value is actually
needed.

d. could also be used for optimizations / additional validity checks
but is not used currently

e. could make skipping rows easier but is not used, as the existing
scanner has to be able to skip rows efficiently also in v1 files so
it can't rely on num_rows

3. possible use of new encodings (e.g. DELTA_BINARY_PACKED)

No new encoding is added - when an unsupported encoding is encountered
Impala returns an error.

parquet-mr uses new encodings (DELTA_BINARY_PACKED, DELTA_BYTE_ARRAY)
for most types if the file version is 2, so with this patch Impala is
not yet able to read all v2 Parquet tables written by Hive.

4. Encoding PLAIN_DICTIONARY is deprecated and RLE_DICTIONARY is used
instead. The semantics of the two encodings are exactly the same.

Additional changes:
Some responsibilities are moved from ParquetColumnReader to
ParquetColumnChunkReader:
- ParquetColumnChunkReader decodes rep/def level sizes to hide v1/v2
  differences (see 2.a.)
- ParquetColumnChunkReader skips empty data pages in
  ReadNextDataPageHeader
- the state machine of ParquetColumnChunkReader is simplified by
  separating data page header reading / reading rest of the page

Testing:
- added 4 v2 Parquet test tables (written by Hive) to cover
  compressed / uncompressed and scalar/complex cases
- added EE and fuzz tests for the test tables above
- manual tested v2 Parquet files written by pyarrow
- ran core tests

Note that no test is added where some pages are compressed while
some are not. It would be tricky to create such files with existing
writers. The code should handle this case and it is very unlikely that
files like this will be encountered.

Change-Id: I282962a6e4611e2b662c04a81592af83ecaf08ca
Reviewed-on: http://gerrit.cloudera.org:8080/19793
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-12 18:31:03 +00:00
jasonmfehr
63d13a35f3 IMPALA-11880: Adds support for authenticating to Impala using JWTs.
This support was modeled after LDAP authentication.

If JWT authentication is used, the Impala shell enforces the use of the
hs2-http protocol since the JWT is sent via the "Authentication"
HTTP header.

The following flags have been added to the Impala shell:
* -j, --jwt: indicates that JWT authentication will be used
* --jwt_cmd: shell command to run to retrieve the JWT to use for
  authentication

Testing
New Python tests have been added:
* The shell tests ensure that the various command line arguments are
  handled properly. Situations such as allowing only a single
  authentication method, JWTs not being sent in clear text without the
  proper arguments, etc. are asserted.
* The Python custom cluster tests leverage a test JWKS and test JWTs.
  Then, a custom Impala cluster is started with the test JWKS. The
  Impala shell attempts to authenticate using a valid JWT, an expired
  (invalid) JWT, and a valid JWT signed by a different, untrusted JWKS.
  These tests also exercise the Impala JWT authentication mechanism and
  assert the prometheus JWT auth success and failure metrics are
  reported accurately.

Change-Id: I52247f9262c548946269fe5358b549a3e8c86d4c
Reviewed-on: http://gerrit.cloudera.org:8080/19837
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-11 23:22:05 +00:00
Riza Suminto
1d0b111bcf IMPALA-12091: Control scan parallelism by its processing cost
Before this patch, Impala still relied on the MT_DOP option to decide
the degree of parallelism of the scan fragment when a query ran with
COMPUTE_PROCESSING_COST=1. This patch adds the scan node's processing
cost as another consideration to raise scan parallelism beyond MT_DOP.

Scan node cost is now adjusted to also consider the number of effective
scan ranges. Each scan range is given a weight of (0.5% *
min_processing_per_thread), which roughly means that one scan node
instance can handle at most 200 scan ranges.

Query option MAX_FRAGMENT_INSTANCES_PER_NODE is added as an upper
bound on scan parallelism if COMPUTE_PROCESSING_COST=true. If the number
of scan ranges is fewer than the maximum parallelism allowed by the scan
node's processing cost, that processing cost will be clamped down
to (min_processing_per_thread / number of scan ranges). Lowering
MAX_FRAGMENT_INSTANCES_PER_NODE can also clamp down the scan processing
cost in a similar way. For interior fragments, a combination of
MAX_FRAGMENT_INSTANCES_PER_NODE, PROCESSING_COST_MIN_THREADS, and the
number of available cores per node is taken into account to determine
the maximum fragment parallelism per node. For scan fragments, only the
first two are considered, to encourage the Frontend to choose a larger
executor group as needed.
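
A usage sketch (the value 5 mirrors the testing note below):

  SET COMPUTE_PROCESSING_COST=true;
  SET MAX_FRAGMENT_INSTANCES_PER_NODE=5;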

Two new pieces of static state are added in exec-node.h:
is_mt_fragment_ and num_instances_per_node_. The backend code that
refers to the MT_DOP option is replaced with either is_mt_fragment_ or
num_instances_per_node_.

Two new criteria are added during effective parallelism calculation in
PlanFragment.adjustToMaxParallelism():

- If a fragment has a UnionNode, its parallelism is the maximum between
  its input fragments and its collocated ScanNode's expected
  parallelism.
- If a fragment only has a single ScanNode (and no UnionNode), its
  parallelism is calculated in the same fashion as the interior fragment
  but will not be lowered anymore since it will not have any child
  fragment to compare with.

Admission control slots remain unchanged. This may cause a query to
fail admission if the Planner selects scan parallelism that is higher
than the configured admission control slots value. Setting
MAX_FRAGMENT_INSTANCES_PER_NODE equal to or lower than the configured
admission control slots value can help lower scan parallelism and pass
the admission controller.

The previous workaround to control scan parallelism by IMPALA-12029 is
now removed. This patch also disables IMPALA-10287 optimization if
COMPUTE_PROCESSING_COST=true. This is because IMPALA-10287 relies on a
fixed number of fragment instances in DistributedPlanner.java. However,
effective parallelism calculation is done much later and may change the
final number of instances of the hash join fragment, rendering the
DistributionMode selected by IMPALA-10287 inaccurate.

This patch is benchmarked using single_node_perf_run.py with the
following parameters:

args="-gen_experimental_profile=true -default_query_options="
args+="mt_dop=4,compute_processing_cost=1,processing_cost_min_threads=1 "
./bin/single_node_perf_run.py --num_impalads=3 --scale=10 \
    --workloads=tpcds --iterations=5 --table_formats=parquet/none/none \
    --impalad_args="$args" \
    --query_names=TPCDS-Q3,TPCDS-Q14-1,TPCDS-Q14-2,TPCDS-Q23-1,TPCDS-Q23-2,TPCDS-Q49,TPCDS-Q76,TPCDS-Q78,TPCDS-Q80A \
    "IMPALA-12091~1" IMPALA-12091

The benchmark result is as follows:
+-----------+-------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+
| Workload  | Query       | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval  |
+-----------+-------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+
| TPCDS(10) | TPCDS-Q23-1 | parquet / none / none | 4.62   | 4.54        |   +1.92%   |   0.23%    |   1.59%        | 5     |   +2.32%       | 1.15    | 2.67  |
| TPCDS(10) | TPCDS-Q14-1 | parquet / none / none | 5.82   | 5.76        |   +1.08%   |   5.27%    |   3.89%        | 5     |   +2.04%       | 0.00    | 0.37  |
| TPCDS(10) | TPCDS-Q23-2 | parquet / none / none | 4.65   | 4.58        |   +1.38%   |   1.97%    |   0.48%        | 5     |   +0.81%       | 0.87    | 1.51  |
| TPCDS(10) | TPCDS-Q49   | parquet / none / none | 1.49   | 1.48        |   +0.46%   | * 36.02% * | * 34.95% *     | 5     |   +1.26%       | 0.58    | 0.02  |
| TPCDS(10) | TPCDS-Q14-2 | parquet / none / none | 3.76   | 3.75        |   +0.39%   |   1.67%    |   0.58%        | 5     |   -0.03%       | -0.58   | 0.49  |
| TPCDS(10) | TPCDS-Q78   | parquet / none / none | 2.80   | 2.80        |   -0.04%   |   1.32%    |   1.33%        | 5     |   -0.42%       | -0.29   | -0.05 |
| TPCDS(10) | TPCDS-Q80A  | parquet / none / none | 2.87   | 2.89        |   -0.51%   |   1.33%    |   0.40%        | 5     |   -0.01%       | -0.29   | -0.82 |
| TPCDS(10) | TPCDS-Q3    | parquet / none / none | 0.18   | 0.19        |   -1.29%   | * 15.26% * | * 15.87% *     | 5     |   -0.54%       | -0.87   | -0.13 |
| TPCDS(10) | TPCDS-Q76   | parquet / none / none | 1.08   | 1.11        |   -2.98%   |   0.92%    |   1.70%        | 5     |   -3.99%       | -2.02   | -3.47 |
+-----------+-------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+

Testing:
- Pass PlannerTest.testProcessingCost
- Pass test_executor_groups.py
- Reenable test_tpcds_q51a in TestTpcdsQueryWithProcessingCost with
  MAX_FRAGMENT_INSTANCES_PER_NODE set to 5
- Pass test_tpcds_queries.py::TestTpcdsQueryWithProcessingCost
- Pass core tests

Change-Id: If948e45455275d9a61a6cd5d6a30a8b98a7c729a
Reviewed-on: http://gerrit.cloudera.org:8080/19807
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-11 22:46:31 +00:00
Riza Suminto
bd4a817893 IMPALA-11123: Restore NumFileMetadataRead counter
The NumFileMetadataRead counter was lost with the revert of commit
f932d78ad0. This patch restores the
NumFileMetadataRead counter as well as the assertions in the impacted
Iceberg test files. Other impacted test files will be restored
gradually with the reimplementation of the optimized count star for
ORC.

Testing:
- Pass core tests.

Change-Id: Ib14576245d978a127f688e265cab2f4ff519600c
Reviewed-on: http://gerrit.cloudera.org:8080/19854
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-09 00:05:36 +00:00
Riza Suminto
7ca20b3c94 Revert "IMPALA-11123: Optimize count(star) for ORC scans"
This reverts commit f932d78ad0.

The commit is reverted because it caused a significant regression for
non-optimized count star queries in Parquet format.

Several conflicts needed to be resolved manually:
- Removed the assertion against the 'NumFileMetadataRead' counter that
  is lost with the revert.
- Adjusted the assertions in test_plain_count_star_optimization,
  test_in_predicate_push_down, and test_partitioned_insert of
  test_iceberg.py due to the missing improvement in the Parquet
  optimized count star code path.
- Kept the "override" specifier in hdfs-parquet-scanner.h to pass
  clang-tidy.
- Kept the python3 style of RuntimeError instantiation in
  test_file_parser.py to pass check-python-syntax.sh.

Change-Id: Iefd8fd0838638f9db146f7b706e541fe2aaf01c1
Reviewed-on: http://gerrit.cloudera.org:8080/19843
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
2023-05-06 22:55:05 +00:00
Riza Suminto
69da2ff86e IMPALA-12106: Fix overparallelization of Union fragment by 1
IMPALA-10973 has a bug where the backend scheduler can over-parallelize
a union fragment without a scan node by 1. It is reproducible by
running TPC-DS Q11 with MT_DOP=1. This patch additionally checks that
such a fragment does not have an input fragment before randomizing the
host assignment.
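
As a sketch only (t1 and t2 are hypothetical tables; the actual
reproduction was TPC-DS Q11), the affected plan shape is a scan-less
union fragment fed by exchanges, for example:

  SET MT_DOP=1;
  SELECT c, count(*) FROM t1 GROUP BY c
  UNION ALL
  SELECT c, count(*) FROM t2 GROUP BY c;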

Testing:
Add TPC-DS Q11 to test_mt_dop.py::TestMtDopScheduling::test_scheduling
and verify the number of fragment instances scheduled in the
ExecSummary.

Change-Id: Ic69e7c8c0cadb4b07ee398aff362fbc6513eb08d
Reviewed-on: http://gerrit.cloudera.org:8080/19816
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-03 03:16:29 +00:00
wzhou-code
296224a6fb IMPALA-12110: Create Kudu table in CTAS without specifying primary key
IMPALA-11809 added support for non-unique primary keys for Kudu
tables. It allows creating a Kudu table without specifying a primary
key, since partition columns can be promoted to a non-unique primary
key. But when creating a Kudu table in CTAS without specifying a
primary key, Impala returned a parsing error.

This patch fixes the parsing issue for creating a Kudu table in CTAS
without specifying a primary key.
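
For example, a CTAS of this shape (table and column names are
illustrative) now parses, with the hash-partition column promoted to a
non-unique primary key:

  CREATE TABLE kudu_ctas_tbl
  PARTITION BY HASH (id) PARTITIONS 3
  STORED AS KUDU
  AS SELECT id, name FROM source_tbl;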

Testing:
 - Added new test cases in parsing unit-test and end-to-end unit-test.
 - Passed core tests.

Change-Id: Ia7bb0cf1954e0a4c3d864a800e929a88de272dd5
Reviewed-on: http://gerrit.cloudera.org:8080/19825
Reviewed-by: Abhishek Chennaka <achennaka@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-02 02:47:46 +00:00
LPL
8b6f9273ce IMPALA-12097: WITH CLAUSE should be skipped when optimizing COUNT(*) query on Iceberg table
When optimizing a simple count star query on an Iceberg table, the
WITH clause should be skipped. That does not mean the SQL cannot be
optimized, though: once the WITH clause is inlined, the final statement
is optimized by the CountStarToConstRule.
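
For example (table name illustrative), a query of this shape is still
rewritten by the CountStarToConstRule once the WITH clause is inlined:

  WITH v AS (SELECT * FROM ice_tbl)
  SELECT count(*) FROM v;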

Testing:
 * Add e2e tests

Change-Id: I7b21cbea79be77f2ea8490bd7f7b2f62063eb0e4
Reviewed-on: http://gerrit.cloudera.org:8080/19811
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-04-29 02:35:45 +00:00
Michael Smith
0a42185d17 IMPALA-9627: Update utility scripts for Python 3 (part 2)
We're starting to see environments where the system Python ('python') is
Python 3. Updates utility and build scripts to work with Python 3, and
updates check-pylint-py3k.sh to check scripts that use system python.

Fixes other issues found during a full build and test run with Python
3.8 as the default for 'python'.

Fixes an impala-shell tip that was supposed to have been two tips (and
had no space after the period when they were printed).

Removes out-of-date deploy.py and various Python 2.6 workarounds.

Testing:
- Full build with /usr/bin/python pointed to python3
- run-all-tests passed with python pointed to python3
- ran push_to_asf.py

Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb
Reviewed-on: http://gerrit.cloudera.org:8080/19697
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-04-26 18:52:23 +00:00
Tamas Mate
422006a809 IMPALA-11950: Planner change for Iceberg metadata querying
This commit extends the planner with IcebergMetadataScanNode, which
will be used to scan Iceberg metadata tables (IMPALA-10947). The scan
node is only implemented on the frontend side in this patch; the
backend part will be developed in IMPALA-11996.

To avoid executing the plan, there is a hardcoded condition placed
after the explain phase, so the change remains testable with EXPLAIN
queries.
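
As an illustration (database, table, and metadata table names are
hypothetical), the kind of statement this enables planning for is:

  EXPLAIN SELECT * FROM ice_db.ice_tbl.history;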

Testing:
 - Added planner tests

Change-Id: I3675d7a57ca570bfec306798589b5ef6aa34b5c6
Reviewed-on: http://gerrit.cloudera.org:8080/19547
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-04-24 14:18:01 +00:00
hexianqing
4f1d8d4d39 IMPALA-11536: fix invalid predicates propagate for outer join simplification
When ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION is set to true, the
planner simplifies outer joins if the WHERE clause contains at least
one null-rejecting condition, and then removes the outer-joined tuple
id from the map GlobalState#outerJoinedTupleIds.
However, there may be false removals for right join or full join
simplification. This can lead to incorrect results, since it is
incorrect to propagate a non-null-rejecting predicate into a plan
subtree that is on the nullable side of an outer join.
GlobalState#outerJoinedTupleIds indicates whether a table is on the
nullable side of an outer join.

E.g.
SELECT COUNT(*)
FROM functional.nullrows t1
  FULL JOIN functional.nullrows t2 ON t1.id = t2.id
  FULL JOIN functional.nullrows t3 ON coalesce(t1.id, t2.id) = t3.id
WHERE t1.group_str = 'a'
  AND coalesce(t2.group_str, 'f') = 'f'
The predicate coalesce(t2.group_str, 'f') = 'f' will propagate into t2
if we remove t2 from GlobalState#outerJoinedTupleIds.

Testing:
- Add new plan tests in outer-to-inner-joins.test
- Add new query tests to verify the correctness on transformation

Change-Id: I6565c5bff0d2f24f30118ba47a2583383e83fff7
Reviewed-on: http://gerrit.cloudera.org:8080/19116
Reviewed-by: Qifan Chen <qfchen@hotmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-04-22 23:24:10 +00:00
Gabor Kaszab
7e0feb4a8e IMPALA-11701 Part1: Don't push down predicates to scanner if already applied by Iceberg
We push down predicates to Iceberg, which uses them to filter out files
when computing the results of planFiles(). Using the
FileScanTask.residual() function we can find out whether we have to use
the predicates to further filter the rows of the given files or whether
Iceberg has already performed all the filtering.
Basically, if we only filter on IDENTITY-partition columns, then
Iceberg can filter the files, and using these filters in Impala would
not filter any more rows from the output (assuming that no partition
evolution was performed on the table).

An additional benefit of not pushing down no-op predicates to the
scanner is that we can potentially materialize fewer slots.
For example:

SELECT count(1) from iceberg_tbl where part_col = 10;

Another benefit comes with count(*) queries. If none of the predicates
are pushed down to Impala's scanner for a count(*) query, then the
Parquet scanner can take an optimized path that uses stats instead of
reading actual data to answer the query.

In the above query, Iceberg filters the files using the predicate on
a partition column, so there is no need to materialize 'part_col' in
Impala, nor to push down the 'part_col = 10' predicate.

Note, this is an all-or-nothing approach: given N predicates, we
either push down all of them to the scanner or none of them. There is
room for improvement in identifying a subset of the predicates that
still has to be pushed down to the scanner. However, this would need a
mapping between Impala predicates and the predicates returned by
Iceberg's FileScanTask.residual() function, which would significantly
increase the complexity of the relevant code.

Testing:
  - Some existing tests needed extra care as they were checking for
    predicates being pushed down to the scanner, but with this patch
    not all of them are pushed down. For these tests I added extra
    predicates so that all of the predicates are pushed down to the
    scanner.
  - Added a new planner test suite for checking how predicate push down
    works with Iceberg tables.

Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b
Reviewed-on: http://gerrit.cloudera.org:8080/19534
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-04-21 15:22:17 +00:00
skyyws
563d5c50b8 IMPALA-7942 (part 2): Add query hints for predicate selectivities
Currently, Impala only uses simple estimation to compute selectivity.
For some predicates, this may lead to a worse query plan from the
cost-based optimizer (CBO).

This patch adds a new query hint: 'SELECTIVITY' to help specify a
selectivity value for a predicate.

The parser interprets an expression wrapped in () and followed by a
C-style comment /* <predicate hint> */ as carrying a predicate hint.
The predicate hint can currently be of the form +SELECTIVITY(f), where
'f' is a positive floating point number in the range (0, 1] to use as
the selectivity for the preceding expression.

Single predicate example:

  select col from t where (a=1) /* +SELECTIVITY(0.5) */;

Compound predicate example:

  select col from t where (a=1 or b=2) /* +SELECTIVITY(0.5) */;

As a limitation of this patch, selectivity hints on 'AND' compound
predicates, whether written in the original SQL query or generated
internally, are ignored. We may support this in the near future.

Testing:
- Added new fe tests in 'PlannerTest'
- Added new fe tests in 'AnalyzeStmtsTest' for negative cases

Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b
Reviewed-on: http://gerrit.cloudera.org:8080/18023
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Qifan Chen <qfchen@hotmail.com>
2023-04-20 10:14:08 +00:00
Daniel Becker
b73847f178 IMPALA-10851: Codegen for structs
IMPALA-9495 added support for struct types in SELECT lists but only with
codegen turned off. This commit implements codegen for struct types.
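
For example (table and column names illustrative), a query of this
shape, which previously ran only with codegen disabled, can now run
with codegen enabled:

  SELECT id, outer_struct.inner_struct
  FROM complextypes_nested_structs;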

To facilitate this, code generation for reading and writing 'AnyVal's
has been refactored. A new class, 'CodegenAnyValReadWriteInfo' is
introduced. This class is an interface between sources and destinations,
one of which is an 'AnyVal' object: sources generate an instance of this
class and destinations take that instance and use it to write the value.

The other side can be, for example, tuples we read from (in the case
of 'SlotRef') or tuples we write into (in the case of materialisation;
see Tuple::CodegenMaterializeExprs()). The main advantage is that
sources do not have to know how to write their destinations, only how
to read the values (and vice versa).

Before this change, many tests that involve structs ran only with
codegen turned off. Now that codegen is supported in these cases, these
tests are also run with codegen on.

Testing:
  - enabled tests for structs in the select list with codegen on in
    tests/query_test/test_nested_types.py
  - enabled codegen in other tests where it used to be disabled because
    it was not supported.

Change-Id: I5272c3f095fd9f07877104ee03c8e43d0c4ec0b6
Reviewed-on: http://gerrit.cloudera.org:8080/18526
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-04-14 13:46:59 +00:00