This commit adds a scan node for querying Iceberg metadata tables. The
scan node creates a Java scanner object that creates and scans the
metadata table. The scanner uses the Iceberg API to scan the table;
the scan node then fetches the rows one by one and materializes them
into RowBatches. The Iceberg row reader on the backend does the
translation between Iceberg and Impala types.
Only one fragment is created to query the Iceberg metadata table, and
it is executed on the coordinator node, which already has the Iceberg
table loaded. This way there is no need for further table loading on
the executor side.
This change does not cover nested column types; these slots are set to
NULL. Support for them will be added in IMPALA-12205.
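For illustration, the kind of query this enables (a sketch; the table
name is hypothetical and the metadata table is assumed to be addressed
as a suffix of the base table name):
  SELECT count(*) FROM ice_db.ice_tbl.all_manifests;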
Testing:
- Added e2e tests for querying metadata tables
- Updated planner tests
Performance testing:
Created a table and inserted ~5500 rows one by one, which generated
~270000 ALL_MANIFESTS metadata table records. This table is quite wide
and has a String column as well.
I only mention the count(*) test on ALL_MANIFESTS, because currently
every row is materialized in every scenario:
- Cold cache: 15.76s
- IcebergApiScanTime: 124.407ms
- MaterializeTupleTime: 8s368ms
- Warm cache: 7.56s
- IcebergApiScanTime: 3.646ms
- MaterializeTupleTime: 7s477ms
Change-Id: I0e943cecd77f5ef7af7cd07e2b596f2c5b4331e7
Reviewed-on: http://gerrit.cloudera.org:8080/20010
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-11068 added three new tests into
hdfs-scanner-thread-mem-scaling.test. The first one is failing
intermittently, most likely because the fragment right above the scan
does not pull row batches fast enough. This patch attempts to deflake
the tests by replacing it with a simple count star query. The three
test cases are now contained in their own
test_hdfs_scanner_thread_non_reserved_bytes and will be skipped for
sanitized builds.
Testing:
- Looped and passed test_hdfs_scanner_thread_non_reserved_bytes a
  hundred times.
Change-Id: I7c99b2ef70b71e148cedb19037e2d99702966d6e
Reviewed-on: http://gerrit.cloudera.org:8080/20593
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch uses the "external data source" mechanism in Impala to
implement data source for querying JDBC.
It has some limitations due to the restrictions of "external data
source":
- It is not distributed, i.e. the fragment is unpartitioned. The
queries are executed on the coordinator.
- Queries which read the following data types from external JDBC
tables are not supported:
BINARY, CHAR, DATETIME, and COMPLEX.
- Only binary predicates with the operators =, !=, <=, >=,
<, > are pushed down to the RDBMS.
- The following data types are not supported for predicates:
DECIMAL, TIMESTAMP, DATE, and BINARY.
- External tables with complex types of columns are not supported.
- Support is limited to the following databases:
MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.
- Catalog V2 is not supported (IMPALA-7131).
- DataSource objects are not persistent (IMPALA-12375).
Additional fixes are planned on top of this patch.
Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.
In order to query the RDBMS tables, the following steps should be
followed (note that existing data source tables will be rebuilt):
1. Make sure the Impala cluster has been started.
2. Copy the jar files of JDBC drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-ext-data-sources.sh
3. Create an `alltypes` table in the Postgres database.
${IMPALA_HOME}/testdata/bin/load-ext-data-sources.sh
4. Create data source tables (alltypes_jdbc_datasource and
alltypes_jdbc_datasource_2).
${IMPALA_HOME}/bin/impala-shell.sh -f\
${IMPALA_HOME}/testdata/bin/create-ext-data-source-table.sql
5. Queries can now be run against the data source tables created in
the last step. There is no need to restart the Impala cluster.
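For reference, a query against such a table goes through the usual
external data source DDL. A minimal sketch (the data source name, jar
path, class name and init-string keys below are illustrative, not
taken verbatim from this patch):
  CREATE DATA SOURCE jdbc_ds
  LOCATION 'hdfs:///user/impala/ext-data-source/jdbc-data-source.jar'
  CLASS 'org.apache.impala.extdatasource.jdbc.JdbcDataSource'
  API_VERSION 'V1';
  -- map a remote Postgres table and query it
  CREATE TABLE alltypes_jdbc (id INT, bool_col BOOLEAN)
  PRODUCED BY DATA SOURCE jdbc_ds(
    '{"database.type":"POSTGRES",
      "jdbc.url":"jdbc:postgresql://localhost:5432/functional",
      "jdbc.driver":"org.postgresql.Driver",
      "dbcp.username":"hiveuser",
      "dbcp.password":"password",
      "table":"alltypes"}');
  SELECT id, bool_col FROM alltypes_jdbc WHERE id > 10;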
Testing:
- Added unit-test for Postgres and ran unit-test with JDBC driver
postgresql-42.5.1.jar.
- Ran manual unit-test for MySQL with JDBC driver
mysql-connector-j-8.1.0.jar.
- Ran core tests successfully.
Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Reviewed-on: http://gerrit.cloudera.org:8080/17842
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Reviewed-by: Kurt Deschler <kdeschle@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Under heavy decompression workload, Impala running with scanner thread
parallelism (MT_DOP=0) can still hit OOM error due to launching too many
threads too soon. We have logic in ScannerMemLimiter to limit the number
of scanner threads by calculating the thread's memory requirement and
estimating the memory growth rate of all threads. However, it does not
prevent a scanner node from quickly launching many threads and
immediately reaching the memtracker's spare capacity. Even after
ScannerMemLimiter rejects a new thread launch, some existing threads
might continue increasing their non-reserved memory for decompression
work until the memory limit is exceeded.
IMPALA-7096 added the hdfs_scanner_thread_max_estimated_bytes flag as a
heuristic to account for non-reserved memory growth. Increasing this
flag value can help reduce the thread count, but might severely regress
other queries that do not have heavy decompression characteristics. The
same applies to lowering the NUM_SCANNER_THREADS query option.
This patch adds one more query option to mitigate OOM as an
alternative, called HDFS_SCANNER_NON_RESERVED_BYTES. This option is
intended to offer the same control as
hdfs_scanner_thread_max_estimated_bytes, but as a query option so that
tuning can be done at per-query granularity. If this query option is
not set, is set to 0, or has a negative value, the backend reverts to
the value of the hdfs_scanner_thread_max_estimated_bytes flag.
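A per-query tuning session might look like this (a sketch; the value is
illustrative and is in bytes, matching the flag it overrides):
  -- raise the non-reserved estimate for a decompression-heavy query
  SET HDFS_SCANNER_NON_RESERVED_BYTES=134217728;
  SELECT count(*) FROM heavily_compressed_tbl;  -- hypothetical table
  -- unset, 0 or negative falls back to the startup flag
  SET HDFS_SCANNER_NON_RESERVED_BYTES=0;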
Testing:
- Add test case in query-options-test.cc and
TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling.
Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Reviewed-on: http://gerrit.cloudera.org:8080/18126
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
If an Iceberg table is frequently updated/written to in small batches,
a lot of small files are created. This decreases read performance.
Similarly, frequent row-level deletes contribute to this problem
by creating delete files, which have to be merged on read.
So far INSERT OVERWRITE (rewriting the table with itself) has been used
to compact Iceberg tables.
However, it comes with some RESTRICTIONS:
- The table should not have multiple partition specs/partition evolution.
- The table should not contain complex types.
The OPTIMIZE statement offers a new syntax and a solution limited to
Iceberg tables to enhance read performance for subsequent operations.
See IMPALA-12293 for details.
Syntax: OPTIMIZE TABLE <table_name>;
This first patch introduces the new syntax, temporarily as an alias
for INSERT OVERWRITE.
Note that executing OPTIMIZE TABLE requires ALL privileges.
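Since the statement is currently an alias for INSERT OVERWRITE, its
effect can be pictured roughly as follows (a sketch; 'ice_tbl' is a
hypothetical Iceberg table):
  OPTIMIZE TABLE ice_tbl;
  -- for now behaves like rewriting the table with itself:
  INSERT OVERWRITE ice_tbl SELECT * FROM ice_tbl;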
Testing:
- negative tests
- FE planner test
- Ranger test
- E2E tests
Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Reviewed-on: http://gerrit.cloudera.org:8080/20405
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is mainly a revert of https://gerrit.cloudera.org/#/c/640/ but
some parts had to be updated due to changes in Impala.
See IMPALA-2201 for details about why this optimization was removed.
The patch can massively speed up the COMPUTE STATS statement when the
majority of partitions have no changes.
COMPUTE STATS tpcds_parquet.store_sales;
before: 12s
after: 1s
Besides the DDL speedup, the number of HMS events generated is also
reduced.
Testing:
- added test to verify COMPUTE STATS output
- correctness of cases when something is modified should be covered
by existing tests
- core tests passed
Change-Id: If2703e0790d5c25db98ed26f26f6d96281c366a3
Reviewed-on: http://gerrit.cloudera.org:8080/20505
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
IMPALA-6590 disabled expression rewrites for ValuesStmt. However,
CompoundVerticalBarExpr (||) cannot be executed directly without
rewrite. This is because it could either be an OR operation with boolean
arguments or CONCAT function call with string arguments.
The backend cannot evaluate a BetweenPredicate and relies on rewriting
it into a conjunctive or disjunctive CompoundPredicate.
This patch enables non-optional expression rewrites for ValuesStmt with
CompoundVerticalBarExpr or BetweenPredicate.
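For illustration, these are the kinds of VALUES clauses that need the
rewrites (a sketch; the target tables are hypothetical):
  -- '||' is rewritten to OR or concat() depending on argument types
  INSERT INTO msgs VALUES ('foo' || 'bar'), ('baz' || 'qux');
  -- BETWEEN is rewritten to a compound predicate before execution
  INSERT INTO flags VALUES (5 BETWEEN 1 AND 10), (0 BETWEEN 1 AND 10);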
Testing:
- Extended ExprRewriterTest and Planner test to have values clause
with || and Between predicate
Change-Id: I99b8b33bf6468d12b9e26f0a6e744feb7072619c
Reviewed-on: http://gerrit.cloudera.org:8080/18581
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
We currently depend on a snapshot version of Kudu between 1.16.0 and
1.17.0. This patch bumps the dependent Kudu version to 1.17.0.
Modified the expected error message when inserting to Auto-Incrementing
column (changed in KUDU-1945).
Tests:
- Verified kudu insert tests locally
Change-Id: I01ec8928adfbeb0788356b49f1341088dc132e19
Reviewed-on: http://gerrit.cloudera.org:8080/20452
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
This commit updates the iceberg_v2_delete_equality test table. The
previous table was a modified positional delete table, and its delete
files were actually positional delete files.
The new table was created with Flink, which is one of the few engines
that can currently write equality delete files.
Testing:
- Queried the table manually from Hive/Impala
- Ran the related E2E tests
Change-Id: I2d7e5928aff95ed09b1d7725b31a1698e3c31835
Reviewed-on: http://gerrit.cloudera.org:8080/20422
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is a prototype of HdfsJsonScanner implemented based on rapidjson,
which supports scanning data from split JSON files.
The scanning of JSON data is mainly completed by two parts working
together. The first part is the JsonParser responsible for parsing the
JSON object, which is implemented based on the SAX-style API of
rapidjson. It reads data from the char stream, parses it, and calls the
corresponding callback function when encountering the corresponding JSON
element. See the comments of the JsonParser class for more details.
The other part is the HdfsJsonScanner, which inherits from HdfsScanner
and provides callback functions for the JsonParser. The callback
functions are responsible for providing data buffers to the Parser and
converting and materializing the Parser's parsing results into RowBatch.
It should be noted that the parser returns numeric values as strings to
the scanner. The scanner uses the TextConverter class to convert the
strings to the desired types, similar to how the HdfsTextScanner works.
This is an advantage compared to using the numeric values provided by
rapidjson directly, as it eliminates concerns about inconsistencies in
converting decimals (e.g. losing precision).
Added a startup flag, enable_json_scanner, to be able to disable this
feature if we hit critical bugs in production.
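As a usage sketch (assuming the Hive-style STORED AS JSONFILE clause;
the table and columns are hypothetical):
  CREATE TABLE events_json (id BIGINT, payload STRING)
  STORED AS JSONFILE;
  SELECT count(*) FROM events_json WHERE id > 100;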
Limitations
- Multiline JSON objects are not fully supported yet. It is ok when
each file has only one scan range. However, when a file has multiple
scan ranges, there is a small probability of incomplete scanning of
multiline JSON objects that span ScanRange boundaries (in such cases,
parsing errors may be reported). For more details, please refer to
the comments in the 'multiline_json.test'.
- Compressed JSON files are not supported yet.
- Complex types are not supported yet.
Tests
- Most of the existing end-to-end tests can run on JSON format.
- Add TestQueriesJsonTables in test_queries.py for testing multiline,
malformed, and overflow in JSON.
Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569
Reviewed-on: http://gerrit.cloudera.org:8080/19699
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch forbids creating an EXTERNAL Iceberg table that points
to another Iceberg table in the Hive Catalog. I.e. the following should
be forbidden:
CREATE EXTERNAL TABLE ice_ext
STORED BY ICEBERG
TBLPROPERTIES ('iceberg.table_identifier'='db.tbl');
Loading such tables should also raise an error. Users need to query
the original Iceberg tables. Alternatively they can create VIEWs if
they want to query tables with a different name.
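For example, instead of the forbidden external table above, a view over
the original table achieves the same renaming (names are hypothetical):
  CREATE VIEW ice_alias AS SELECT * FROM db.tbl;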
Testing:
* added e2e tests for CREATE EXTERNAL TABLE
* added e2e test about loading such table
Change-Id: Ifb0d7f0e7ec40fba356bd58b43f68d070432de71
Reviewed-on: http://gerrit.cloudera.org:8080/20429
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In the filter table, PartialUpdates is intended to mark whether the
coordinator received only a partial update from contributing fragments.
This can be misleading for a LOCAL filter in the "Min value",
"Max value", and "In-list size" columns, because a LOCAL filter does not
aggregate in the coordinator anymore. Thus, the coordinator does not
receive any filter update.
This patch marks such column values as "LOCAL" if no global aggregation
is expected in the coordinator.
Testing:
- Pass core tests.
Change-Id: I56078a458799671246ff90b831e5ecebd04a78e8
Reviewed-on: http://gerrit.cloudera.org:8080/20397
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change adds the functions regr_slope(), regr_intercept() and
regr_r2().
The linear regression functions fit an ordinary-least-squares regression
line to a set of number pairs. They can be used both as aggregate and
analytic functions.
regr_slope() takes two arguments of numeric type and returns the slope
of the line.
regr_intercept() takes two arguments of numeric type and returns the
y-intercept of the regression line.
regr_r2() takes two arguments of numeric type and returns the
coefficient of determination (also called R-squared or goodness of fit)
for the regression.
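A usage sketch (the table and columns are hypothetical; the arguments
follow the usual regr_*(y, x) convention with the dependent expression
first):
  SELECT regr_slope(y, x)     AS slope,
         regr_intercept(y, x) AS intercept,
         regr_r2(y, x)        AS r_squared
  FROM points;
  -- as an analytic function:
  SELECT regr_slope(y, x) OVER (PARTITION BY grp) FROM points;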
Testing:
The functions are extensively tested and cross-checked with Hive. The
tests can be found in aggregation.test.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Reviewed-on: http://gerrit.cloudera.org:8080/19569
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit addresses an issue in the CastExpr class where the clone
constructor was not properly preserving compatibility settings. The
clone constructor assigned the default compatibility regardless of the
source expression, causing substitution errors for partitioned tables.
Example:
'insert into unsafe_insert_partitioned(int_col, string_col)
values("1", null), (null, "1")'
Throws:
ERROR: IllegalStateException: Failed analysis after expr substitution.
CAUSED BY: IllegalStateException: cast STRING to INT
Tests:
- new test case added to insert-unsafe.test
Change-Id: Iff64ce02539651fcb3a90db678f74467f582648f
Reviewed-on: http://gerrit.cloudera.org:8080/20385
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Prior to this patch we tried to load table
iceberg_lineitem_multiblock with HDFS block size 524288. This failed
in builds that use HDFS erasure coding, which requires a block size of
at least 1048576.
This patch increases the block size to 1048576. This also triggers
the bug that was fixed by IMPALA-12327. To have more tests with
multiblock tables this patch also adds table iceberg_lineitem_sixblocks
and a few tests with different MT_DOP settings.
Testing:
* tested in build with HDFS EC
Change-Id: Iad15a335407c12578eb822bb1cb4450647502e50
Reviewed-on: http://gerrit.cloudera.org:8080/20359
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
KuduPartitionExpr::CloseEvaluator() didn't call the base class's
CloseEvaluator(), so its child expressions were not closed
recursively.
Testing:
- added a regression test
Change-Id: Ifcf35721e16ba780d01c19e916d58fb2e77a0836
Reviewed-on: http://gerrit.cloudera.org:8080/20330
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The Iceberg delete node tries to do mini merge-joins between data
records and delete records. This works in BROADCAST mode, and most of
the time in PARTITIONED mode as well. However, the Iceberg delete node
made the wrong assumption that rows in a row batch that belong to the
same file come in ascending order: it relied on the previous delete
updating IcebergDeleteState to the next deleted row id and skipped the
binary search if that id was greater than or equal to the current probe
row id.
When PARTITIONED mode is used, we cannot rely on ascending row order,
not even inside row batches, not even when the previous file path is the
same as the current one. This is because files with multiple blocks can
be processed by multiple hosts in parallel, and the rows then get
hash-exchanged based on their file paths. The exchange receiver on
the LHS then coalesces the row batches from multiple senders, hence the
row IDs are unordered.
This patch fixes the issue by dropping these assumptions and doing a
binary search when we are in PARTITIONED mode and the position-based
difference between the current row and the previous row is not one.
Tests:
* added e2e tests
Change-Id: Ib89a53e812af8c3b8ec5bc27bca0a50dcac5d924
Reviewed-on: http://gerrit.cloudera.org:8080/20295
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
UnnestExpr can be used in a sort tuple, which won't have a resolved
path. In that case there is no parent tuple id, so we should not check
it in UnnestExpr.isBoundByTupleIds(). This fixes the
NullPointerException.
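A sketch of the kind of query that exercises this path (hypothetical
table with an array column 'arr'; ordering by the unnested items places
the UnnestExpr in the sort tuple):
  SELECT unnest(arr) AS item FROM tbl ORDER BY item;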
Tests
- Added FE and e2e tests for sorting on columns that come from
unnest().
- Moved test_zipping_unnest_from_view to a dedicated class that only has
the parquet file_format dimension to make sure it also runs in the
"core" exploration strategy. Before this it only ran in the
"exhaustive" strategy.
Change-Id: I43e1ef0467edfb8a4a23047a439426053c93ad72
Reviewed-on: http://gerrit.cloudera.org:8080/20274
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this patch, the NDV used for bloom filter sizing was based only
on the cardinality of the build side. This is ok for FK/PK joins but
can greatly overestimate the NDV if the build key column's NDV is
smaller than the number of rows.
This change takes the minimum of the NDV (not changed by selectivity)
and the cardinality (reduced by selectivity).
Testing:
- Adjust test_bloom_filters and test_row_filters, raising the NDV of
the test case such that the assertion is maintained.
- Add 8KB bloom filter test case in test_bloom_filters.
Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Reviewed-on: http://gerrit.cloudera.org:8080/19506
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
Expression substitution recreates cast expressions without considering
the compatibility level introduced by IMPALA-10173. In unsafe mode, the
recreation causes IllegalStateException. This change fixes this
behavior by storing the compatibility level in each CastExpr, and
reusing it when the expression substitution recreates the cast
expression.
For example: 'select "1", "1" union select 1, "1"'
Also, the set operation's common type calculations did not distinguish
compatibility levels for each column slot: if one column slot's common
type was considered unsafe, every other slot was treated as unsafe.
This change fixes this behavior by reinitializing the compatibility
level for every column slot, enabling cases where one column slot
contains unsafely casted constant values and another contains
non-constant expressions with regular casts.
These queries failed before this change with 'Unsafe implicit cast is
prohibited for non-const expression' error.
For example: 'select "1", 1 union select 1, int_col from unsafe_insert'
Tests:
- test cases added to insert-unsafe.test
Change-Id: I39d13f177482f74ec39570118adab609444c6929
Reviewed-on: http://gerrit.cloudera.org:8080/20184
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Partitioned Hash Join with a limit could hang when using mt_dop>0,
because the cyclic barrier in PHJBuilder is not cancelled properly.
Added the possibility to unregister threads from the synchronization
and a call to it in PHJNode::Close(), so closing threads won't block
threads that are still processing.
Testing:
- Added new unit tests covering new feature
- Added e2e test to make sure the join does not hang
Change-Id: I8be75c7ce99c015964c8bbb547539e6619ba4f9b
Reviewed-on: http://gerrit.cloudera.org:8080/20179
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change fixes mismatched type problems when an implicitly casted
string literal gets converted to a numeric type. Example:
'INSERT INTO example(float_col) VALUES ("0"), (15629);'
After this change, StringLiteral's 'convertToNumber' method will
consider the targetType parameter when creating a new NumericLiteral.
Test:
- test case added to insert-unsafe.test
Change-Id: I2141e7ab164af55a7fa66dda05fe6dcbd7379b69
Reviewed-on: http://gerrit.cloudera.org:8080/20197
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
If, in a VALUES clause, for the same column all of the values are CHAR
types but not all are of the same length, the common type chosen is
CHAR(max(lengths)). This means that shorter values are padded with
spaces. If the destination column is not CHAR but VARCHAR or STRING,
this produces different results than if the values in the column are
inserted individually, in separate statements. This behaviour is
suboptimal because information is lost.
For example:
CREATE TABLE impala_char_insert (s STRING);
-- all values are CHAR(N) with different N, but all will use the
-- biggest N
INSERT OVERWRITE impala_char_insert VALUES
(CAST("1" AS CHAR(1))),
(CAST("12" AS CHAR(2))),
(CAST("123" AS CHAR(3)));
SELECT length(s) FROM impala_char_insert;
3
3
3
-- if inserted individually, the result is
SELECT length(s) FROM impala_char_insert;
1
2
3
This patch adds the query option VALUES_STMT_AVOID_LOSSY_CHAR_PADDING
which, when set to true, fixes the problem by implicitly casting the
values to the VARCHAR type of the longest value if all values in a
column are CHAR types AND not all have the same length. This VARCHAR
type will be the common type of the column in the VALUES statement.
The new behaviour is not turned on by default because it is a breaking
change.
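With the option enabled, the example above keeps the original lengths
(a sketch of the expected behaviour based on the description above, not
verbatim output):
  SET VALUES_STMT_AVOID_LOSSY_CHAR_PADDING=true;
  INSERT OVERWRITE impala_char_insert VALUES
    (CAST("1" AS CHAR(1))),
    (CAST("12" AS CHAR(2))),
    (CAST("123" AS CHAR(3)));
  SELECT length(s) FROM impala_char_insert;
  1
  2
  3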
Note that the behaviour in Hive is different from both behaviours in
Impala: Hive (and PostgreSQL) implicitly remove trailing spaces from
CHAR values when they are cast to other types, which is also lossy.
We choose VARCHAR instead of STRING as the common type because VARCHAR
can be converted to any VARCHAR type shorter or the same length and also
to STRING, while STRING cannot safely be converted to VARCHAR because
its length is not bounded - we would therefore run into problems if the
common type were STRING and the destination column were VARCHAR.
Note: although the VALUES statement is implemented as a special UNION
operation under the hood, this patch doesn't change the behaviour of
explicit UNION statements, it only applies to VALUES statements.
Note: the new VALUES_STMT_AVOID_LOSSY_CHAR_PADDING query option and
ALLOW_UNSAFE_CASTS are not allowed to be used at the same time: if both
are set to true and the query contains set operation(s), an error is
returned.
Testing:
- Added tests verifying that unneeded padding doesn't occur and the
queries succeed in various situations, e.g. different destination
column types and multi-column inserts. See
testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test
Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Reviewed-on: http://gerrit.cloudera.org:8080/18999
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to integrate
features of PartitionedHashJoin if needed (partitioning, spilling).
ICEBERG_DELETE_JOIN is added as a join operator which is used only by
IcebergDeleteNode node.
IcebergDeleteBuild processes the data from the relevant delete files and
stores them in a {file_path: ordered row id vector} hash map.
IcebergDeleteNode tracks the processed file and progresses through the
row id vector in parallel with the probe batch to check if a row is
deleted, or hashes the probe row's file path and uses binary search to
find the closest row id if that is needed for the check.
Testing:
- Duplicated related planner tests to run both with new operator and
hash join
- Added a dimension for e2e tests to run both with new operator and
hash join
- Added new multiblock tests to verify assumptions used in new
operator to optimize probing
- Added new test with BATCH_SIZE=2 to verify in/out batch handling
with new operator
Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Reviewed-on: http://gerrit.cloudera.org:8080/19850
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch implements the migration from legacy Hive tables to Iceberg
tables. The target Iceberg tables inherit the location of the original
Hive tables. The Hive table has to be a non-transactional table.
To migrate a Hive format table stored in a distributed system or object
store to an Iceberg table use the command:
ALTER TABLE [dbname.]table_name CONVERT TO ICEBERG [TBLPROPERTIES(...)];
Currently only 'iceberg.catalog' is allowed as a table property.
For example
- ALTER TABLE hive_table CONVERT TO ICEBERG;
- ALTER TABLE hive_table CONVERT TO ICEBERG TBLPROPERTIES(
'iceberg.catalog' = 'hadoop.tables');
The HDFS table to be converted must meet these requirements:
- table is not a transactional table
- InputFormat must be either PARQUET, ORC, or AVRO
This is an in-place migration so the original data files of the legacy
Hive table are re-used and not moved, copied or re-created by this
operation. The new Iceberg table will have the 'external.table.purge'
property set to true after the migration.
The NUM_THREADS_FOR_TABLE_MIGRATION query option controls the maximum
number of threads used to execute the table conversion. The default
value is one, meaning that the table conversion runs on one thread. It
can be configured in the range [0, 1024]. Zero means that the number of
CPU cores will be the degree of parallelism. A value greater than zero
gives the number of threads used for the table conversion; however, the
number of CPU cores caps the degree of parallelism.
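A possible session combining the option with the new statement
(a sketch; the table name and thread count are illustrative):
  SET NUM_THREADS_FOR_TABLE_MIGRATION=8;
  ALTER TABLE legacy_db.hive_parquet_tbl CONVERT TO ICEBERG;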
Process of migration:
- Step 1: Setting table properties,
e.g. 'external.table.purge'=false on the HDFS table.
- Step 2: Rename the HDFS table to a temporary table name using a name
format of "<original_table_name>_tmp_<random_ID>".
- Step 3: Refresh the renamed HDFS table.
- Step 4: Create an external Iceberg table by Iceberg API using the
data of the Hdfs table.
- Step 5 (Optional): For an Iceberg table in Hadoop Tables, run a
CREATE TABLE query to add the Iceberg table to HMS as well.
- Step 6 (Optional): For an Iceberg table in Hive catalog, run an
INVALIDATE METADATA to make the new table available for all
coordinators right after the conversion finished.
- Step 7 (Optional): For an Iceberg table in Hadoop Tables, set the
'external.table.purge' property to true in an ALTER TABLE
query.
- Step 8: Drop the temporary HDFS table.
Testing:
- Add e2e tests
- Add FE UTs
- Manually tested the runtime performance for a table that is
unpartitioned and has 10k data files. The runtime is around 10-13s.
Co-authored-by: lipenglin <lipenglin@apache.org>
Change-Id: Iacdad996d680fe545cc9a45e6bc64a348a64cd80
Reviewed-on: http://gerrit.cloudera.org:8080/20077
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tamas Mate <tmater@apache.org>
Apache Atlas needs table type information to correctly build the lineage
graph. This patch set adds a new field to the metadata of the lineage
graph vertices: 'tableType'.
Table type can be:
* hive
* iceberg
* kudu
* hbase
* view
* virtual
* external-datasource
Tests:
* updated current tests with the new field
* added new tests focusing on Iceberg
Change-Id: I13aeb256ff6b1d0e3c2eb43f7f75513ffc2cd20e
Reviewed-on: http://gerrit.cloudera.org:8080/20120
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds an experimental query option called ALLOW_UNSAFE_CASTS
which allows implicit casting between some numeric types and string
types. A new type of compatibility is introduced for this purpose, and
the compatibility rule handling is also refactored. The new approach
uses an enum to differentiate the compatibility levels and to make it
easier to pass them through methods. The unsafe compatibility is used
only in two cases: for set operations and for insert statements. Insert
statements and set operations accept unsafe implicitly cast
expressions only when the source expressions are constant.
The following implicit type casts are enabled in unsafe mode:
- String -> Float, Double
- String -> Tinyint, Smallint, Int, Bigint
- Float, Double -> String
- Tinyint, Smallint, Int, Bigint -> String
The patch also covers IMPALA-3217, and adds two more rules to handle
implicit casting in set operations and insert statements between string
types:
- String -> Char(n)
- String -> Varchar(n)
The unsafe implicit casting requires that the source expression must be
constant in this case as well.
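A usage sketch (the target table is hypothetical; the string literals
are constant, so the unsafe casts are accepted):
  SET ALLOW_UNSAFE_CASTS=true;
  -- string constants are implicitly cast to the numeric column types
  INSERT INTO sales (price, quantity) VALUES ('9.99', '3');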
Tests:
- tests added to AnalyzeExprsTest.java
- new test class added to test_insert.py
Change-Id: Iee5db2301216c2e088b4b3e4f6cb5a1fd10600f7
Reviewed-on: http://gerrit.cloudera.org:8080/19881
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for the DELETE operation for partitioned
Iceberg tables. It does so by writing position delete files
(merge-on-read strategy). The delete files contain the file path
and file position of the deleted records. The delete files must
reside in the same partition as the data files they are referring
to.
To execute the DELETE statement for a given table 'tbl', we are
basically doing an INSERT to the virtual DELETE table
'tbl-POSITION-DELETE':
from:
DELETE FROM ice_t WHERE id = 42;
to:
INSERT INTO ice_t-POSITION-DELETE
SELECT INPUT__FILE__NAME, FILE__POSITION
FROM ice_t
WHERE id = 42;
The above was true for unpartitioned Iceberg tables.
If the table is partitioned, we need to shuffle the rows around
executors based on the partitions they belong, then sort the rows
based on the partitions (also based on 'file_path' and 'pos'), so
writers can work on partitions sequentially.
To do this, we need to select the partition columns as well from the
table. But in case of partition-evolution there are different sets
of partition columns in each partition spec of the table. To overcome
this, this patchset introduces two additional virtual columns:
* PARTITION__SPEC__ID
* ICEBERG__PARTITION__SERIALIZED
PARTITION__SPEC__ID is an INT column that contains the Iceberg spec_id
for each row. ICEBERG__PARTITION__SERIALIZED is a BINARY column that
contains all partition values base64-encoded and dot-separated. E.g.:
select PARTITION__SPEC__ID, ICEBERG__PARTITION__SERIALIZED, * FROM ice_t
+---------------------+--------------------------------+---+---+
| partition__spec__id | iceberg__partition__serialized | i | j |
+---------------------+--------------------------------+---+---+
| 0                   | Mg==                           | 2 | 2 |
| 0                   | Mg==                           | 2 | 2 |
+---------------------+--------------------------------+---+---+
So for the INSERT we are shuffling the rows between executors based on
HASH(partition__spec__id, iceberg__partition__serialized) then each
writer fragment sorts the rows based on (partition__spec__id,
iceberg__partition__serialized, file_path, pos) before writing them out
to delete files. The IcebergDeleteSink has been made smarter so that
it creates a new delete file whenever it sees a row with a new
(partition__spec__id, iceberg__partition__serialized) pair.
Some refactoring was also involved in implementing this patch set.
A lot of common code between IcebergDeleteSink and HdfsTableSink has
been moved to the common base class TableSinkBase. In the Frontend this
patch set also moves some common code of InsertStmt and ModifyStmt to a
new common base class DmlStatementBase.
Testing:
* planner tests
* e2e tests (including interop with Hive)
* Did manual stress test with a TPCDS_3000.store_sales
** Table had 8 Billion rows, partitioned by column (ss_sold_date_sk)
** Deleted 800 Million rows using 10 Impala hosts
** Operation was successful, finished under a minute
** Created minimum number of delete files, i.e. one per partition
Change-Id: I28b06f240c23c336a7c5b6ef22fe2ee0a21f7b60
Reviewed-on: http://gerrit.cloudera.org:8080/20078
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change removes the Text-typed overload for BufferAlteringUDF to
avoid ambiguous function matching. It also changes the 2-parameter
function in BufferAlteringUDF to cover Text-typed arguments.
Tests:
- test_udfs.py manually executed
Change-Id: I3a17240ce39fef41b0453f162ab5752f1c940f41
Reviewed-on: http://gerrit.cloudera.org:8080/20038
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This bumps the ORC C++ version from 1.7.0-p14 to 1.7.9-p10 to add the
fixes of ORC-1041 and ORC-1304.
Tests:
- Add e2e test for ORC-1304.
- It's hard to add a test for ORC-1041 since it won't cause crashes
when compiled with gcc-10.
Change-Id: I26c39fe5b15ab0bcbe6b2af6fe7a45e48eaec6eb
Reviewed-on: http://gerrit.cloudera.org:8080/20090
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When creating single-node analytic plan, if the plan node is an
EmptySetNode, its tuple ids should not be considered. Also, when
registering conjuncts, if a constant FALSE conjunct is found, the
other conjuncts in the same list should be marked as assigned.
Tests:
- Add FE and e2e regression tests
Change-Id: I9e078f48863c38062e1e624a1ff3e9317092466f
Reviewed-on: http://gerrit.cloudera.org:8080/19937
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When using local catalog mode, if a runtime filter is being generated
for a time travel Iceberg table, then a query may fail with "ERROR:
IllegalArgumentException: null"
In the planner an Iceberg table that is being accessed with Time Travel
is represented by an IcebergTimeTravelTable object. This object
represents a time-based variation on a base table. The
IcebergTimeTravelTable may represent a different schema from the base
table, it does this by tracking its own set of Columns. As part of
generating a runtime filter the isClusteringColumn() method is called
on the table. IcebergTimeTravelTable was delegating this call to the
base object. In local catalog mode this method is implemented by
LocalTable which has a Preconditions check (an assertion) that the
column parameter matches the stored column. In this case the check
fails as the base table and time travel table have their own distinct
set of column objects.
The fix is to have IcebergTimeTravelTable provide its own
isClusteringColumn() method. For Iceberg there are no clustering
columns, so this method simply returns false.
TESTING
- Ran all end-to-end tests.
- Added test case for query that failed.
Change-Id: I51d04c8757fb48bd417248492d4615ac58085632
Reviewed-on: http://gerrit.cloudera.org:8080/20034
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for DELETE statements on unpartitioned Iceberg
tables. Impala uses the 'merge-on-read' mode with position delete files.
The patch reuses the existing IcebergPositionDeleteTable as the target
table of the DELETE statements, because this table already has the same
schema as position delete files, even with correct Iceberg field IDs.
The patch basically rewrites DELETE statements to INSERT statements,
e.g.:
from:
DELETE FROM ice_t WHERE id = 42;
to:
INSERT INTO ice_t-POSITION-DELETE
SELECT INPUT__FILE__NAME, FILE__POSITION
FROM ice_t
WHERE id = 42;
Position delete files need to be ordered by (file_path, pos), so
we add an extra SORT node before the table sink operator.
In the backend the patch adds a new table sink operator, the
IcebergDeleteSink. It writes the incoming rows (file_path, position) to
delete files. It reuses a lot of code from HdfsTableSink, so this patch
moves the common code to the new common base class: TableSinkBase.
The coordinator then collects the written delete files and invokes
UpdateCatalog to finalize the DELETE statement.
The Catalog then uses Iceberg APIs to create a new snapshot with the
created delete files. It also validates that no conflicting data files
were written since the operation started.
Testing:
* added planner test
* e2e tests
* interop test between Impala and Hive
Change-Id: Ic933b2295abe54b46d2a736961219988ff42915b
Reviewed-on: http://gerrit.cloudera.org:8080/19776
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
The SUBPLAN node will open its right child node many times in its
GetNext(), depending on how many rows are generated by its left child.
The
right child of a SUBPLAN node is a subtree of operators. They should not
add codegen info into profile in their Open() method since it will be
invoked repeatedly.
Currently, DataSink and UnionNode have such an issue. This patch fixes
them by adding the codegen info to the profile in Close() instead of
Open(), just like we did in IMPALA-11200.
Tests:
- Add e2e tests
Change-Id: I99a0a842df63a03c61024e2b77d5118ca63a2b2d
Reviewed-on: http://gerrit.cloudera.org:8080/20037
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
In some cases, directly pushing down predicates that reference the
analytic tuple into an inline view leads to incorrect query results.
While pushing down analytic predicates (e.g. row_number() < 10), we
should divide them into two groups. Some of them can be migrated into
the view and so are removed in the current scope. Others can be copied
into the view but still need to be evaluated in the current scope, as
demonstrated by the following query. The bug is that we migrate all of
them into the view.
WITH detail_measure AS (
SELECT
*
FROM
(
VALUES
(
1 AS `isqbiuar`,
1 AS `bgsfrbun`,
1 AS `result_type`,
1 AS `bjuzzevg`
),
(2, 2, 2, 2)
) a
),
order_measure_sql0 AS (
SELECT
row_number() OVER (
ORDER BY
row_number_0 DESC NULLS LAST,
isqbiuar ASC NULLS LAST
) AS `row_number_0`,
`isqbiuar`
FROM
(
VALUES
(1 AS `row_number_0`, 1 AS `isqbiuar`),
(2, 2)
) b
)
SELECT
detail_measure.`isqbiuar` AS `isqbiuar`,
detail_measure.`bgsfrbun` AS `bgsfrbun`,
detail_measure.`result_type` AS `result_type`,
detail_measure.`bjuzzevg` AS `bjuzzevg`,
`row_number_0` AS `row_number_0`
FROM
detail_measure
LEFT JOIN order_measure_sql0
ON order_measure_sql0.isqbiuar = detail_measure.isqbiuar
WHERE
row_number_0 BETWEEN 1
AND 1
ORDER BY
`row_number_0` ASC NULLS LAST,
`bgsfrbun` ASC NULLS LAST
The current incorrect result is:
+----------+----------+-------------+----------+--------------+
| isqbiuar | bgsfrbun | result_type | bjuzzevg | row_number_0 |
+----------+----------+-------------+----------+--------------+
| 2 | 2 | 2 | 2 | 1 |
| 1 | 1 | 1 | 1 | NULL |
+----------+----------+-------------+----------+--------------+
The correct result is:
+----------+----------+-------------+----------+--------------+
| isqbiuar | bgsfrbun | result_type | bjuzzevg | row_number_0 |
+----------+----------+-------------+----------+--------------+
| 2 | 2 | 2 | 2 | 1 |
+----------+----------+-------------+----------+--------------+
In the plan, the analytic predicate is pushed down to the TOP-N node,
but not into the HASH JOIN node, which leads to incorrect results.
...
05:HASH JOIN [RIGHT OUTER JOIN]
| hash predicates: isqbiuar = isqbiuar
| row-size=14B cardinality=2
...
02:TOP-N [LIMIT=1]
| order by: row_number_0 DESC NULLS LAST, isqbiuar ASC NULLS LAST
| source expr: row_number() <= CAST(1 AS BIGINT)
| row-size=2B cardinality=1
...
The HASH JOIN node should be:
05:HASH JOIN [RIGHT OUTER JOIN]
| hash predicates: isqbiuar = isqbiuar
| other predicates: row_number() <= 1, row_number() >= 1
| row-size=14B cardinality=2
Tests:
* Add plan tests in analytic-rank-pushdown.test
* Add e2e tests in analytic-fns.test
Change-Id: If6c209b2a64bad37d893ba8b520342bf1f9a7513
Reviewed-on: http://gerrit.cloudera.org:8080/19768
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this patch the Parquet STRUCT reader didn't fill the
position slots: collection position and file position. When users
queried these virtual columns Impala crashed or returned
incorrect results.
The ORC scanner already worked correctly, but there were no tests
written for it.
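For illustration, the kind of query that was affected (a sketch; the
table and struct column are hypothetical, FILE__POSITION is the file
position virtual column mentioned above):
  SELECT file__position, my_struct.f1 FROM structs_tbl;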
Test:
* e2e tests for both ORC / Parquet
Change-Id: I32a808a11f4543cd404ed9f3958e9b4e971ca1f4
Reviewed-on: http://gerrit.cloudera.org:8080/19911
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-12019 implemented support for collections of fixed length types
in the sorting tuple. This was made possible by implementing the
materialisation of these collections.
Building on this, this change allows such collections as non-passthrough
children of UNION ALL operations. Note that plain UNIONs are not
supported for any collections for other reasons and this patch does not
affect them or any other set operation.
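As an illustration of what becomes possible, a UNION ALL whose children
carry an array of a fixed-length type (a sketch; the tables and the
array column are hypothetical, and the expression in the second child
makes it non-passthrough):
  SELECT id, int_arr FROM arrays_a
  UNION ALL
  SELECT id + 1, int_arr FROM arrays_b;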
Testing:
Tests in nested-array-in-select-list.test and
nested-map-in-select-list.test check that
- the newly allowed cases work correctly and
- the correct error message is given for collections of variable length
types.
Change-Id: I14c13323d587e5eb8a2617ecaab831c059a0fae3
Reviewed-on: http://gerrit.cloudera.org:8080/19903
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
As a first stage of IMPALA-10939, this change implements support for
including in the sorting tuple top-level collections that only contain
fixed length types (including fixed length structs). For these types the
implementation is almost the same as the existing handling of strings.
Another limitation is that structs that contain any type of collection
are not yet allowed in the sorting tuple.
Also refactored the RawValue::Write*() functions to have a clearer
interface.
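A sketch of a query this enables (the table is hypothetical; returning
the array together with an ORDER BY places the collection in the
sorting tuple):
  SELECT id, int_arr FROM arrays_tbl ORDER BY id;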
Testing:
- Added a new test table that contains many rows with arrays. This is
queried in a new test added in test_sort.py, to ensure that we handle
spilling correctly.
- Added tests that have arrays and/or maps in the sorting tuple in
test_queries.py::TestQueries::{test_sort,
test_top_n,test_partitioned_top_n}.
Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Reviewed-on: http://gerrit.cloudera.org:8080/19660
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Parquet v2 means several changes in Parquet files compared to v1:
1. file version = 2 instead of 1
c185faf0c4/src/main/thrift/parquet.thrift (L1016)
Before this patch Impala rejected Parquet files with version!=1.
2. possible use of DataPageHeaderV2 instead DataPageHeader
c185faf0c4/src/main/thrift/parquet.thrift (L561)
The main differences compared to V1 DataPageHeader:
a. rep/def levels are not compressed, so the compressed part contains
only the actual encoded values
b. rep/def levels must be RLE encoded (Impala only supports RLE encoded
levels even for V1 pages)
c. compression can be turned on/off per page (member is_compressed)
d. number of nulls (member num_nulls) is required - in v1 it was
included in statistics which is optional
e. number of rows is required (member num_rows) which can help with
matching collection items with the top level collection
The patch adds support for understanding v2 data pages but does not
implement some potential optimizations:
a. would allow an optimization for queries that need only the nullness
of a column but not the actual value: as the values are not needed the
decompression of the page data can be skipped. This optimization is not
implemented - currently Impala materializes both the null bit and the
value for all columns regardless of whether the value is actually
needed.
d. could be also used for optimizations / additional validity checks
but it is not used currently
e. could make skipping rows easier but is not used, as the existing
scanner has to be able to skip rows efficiently also in v1 files so
it can't rely on num_rows
3. possible use of new encodings (e.g. DELTA_BINARY_PACKED)
No new encoding is added - when an unsupported encoding is encountered
Impala returns an error.
parquet-mr uses new encodings (DELTA_BINARY_PACKED, DELTA_BYTE_ARRAY)
for most types if the file version is 2, so with this patch Impala is
not yet able to read all v2 Parquet tables written by Hive.
4. Encoding PLAIN_DICTIONARY is deprecated and RLE_DICTIONARY is used
instead. The semantics of the two encodings are exactly the same.
Additional changes:
Some responsibilities are moved from ParquetColumnReader to
ParquetColumnChunkReader:
- ParquetColumnChunkReader decodes rep/def level sizes to hide v1/v2
differences (see 2.a.)
- ParquetColumnChunkReader skips empty data pages in
ReadNextDataPageHeader
- the state machine of ParquetColumnChunkReader is simplified by
separating data page header reading / reading rest of the page
Testing:
- added 4 v2 Parquet test tables (written by Hive) to cover
compressed / uncompressed and scalar/complex cases
- added EE and fuzz tests for the test tables above
- manually tested v2 Parquet files written by pyarrow
- ran core tests
Note that no test is added where some pages are compressed while
some are not. It would be tricky to create such files with existing
writers. The code should handle this case and it is very unlikely that
files like this will be encountered.
Change-Id: I282962a6e4611e2b662c04a81592af83ecaf08ca
Reviewed-on: http://gerrit.cloudera.org:8080/19793
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The NumFileMetadataRead counter was lost with the revert of commit
f932d78ad0. This patch restores the
NumFileMetadataRead counter and also the assertions in the impacted
Iceberg test files. Other impacted test files will be gradually
restored with the reimplementation of the optimized count star for ORC.
Testing:
- Pass core tests.
Change-Id: Ib14576245d978a127f688e265cab2f4ff519600c
Reviewed-on: http://gerrit.cloudera.org:8080/19854
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This reverts commit f932d78ad0.
The commit is reverted because it caused a significant regression for
non-optimized count star queries in Parquet format.
There are several conflicts that need to be resolved manually:
- Removed assertion against 'NumFileMetadataRead' counter that is lost
with the revert.
- Adjusted the assertions in test_plain_count_star_optimization,
test_in_predicate_push_down, and test_partitioned_insert of
test_iceberg.py due to missing improvement in parquet optimized count
star code path.
- Keep the "override" specifier in hdfs-parquet-scanner.h to pass
clang-tidy
- Keep python3 style of RuntimeError instantiation in
test_file_parser.py to pass check-python-syntax.sh
Change-Id: Iefd8fd0838638f9db146f7b706e541fe2aaf01c1
Reviewed-on: http://gerrit.cloudera.org:8080/19843
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
IMPALA-10973 has a bug where a union fragment without a scan node can be
over-parallelized by the backend scheduler by 1. It is reproducible by
running TPC-DS Q11 with MT_DOP=1. This patch additionally checks that
such a fragment does not have an input fragment before randomizing the
host assignment.
Testing:
Add TPC-DS Q11 to test_mt_dop.py::TestMtDopScheduling::test_scheduling
and verify the number of fragment instances scheduled in the
ExecSummary.
Change-Id: Ic69e7c8c0cadb4b07ee398aff362fbc6513eb08d
Reviewed-on: http://gerrit.cloudera.org:8080/19816
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-11809 added support for non-unique primary keys for Kudu tables.
It allows creating a Kudu table without specifying a primary key, since
partition columns can be promoted to a non-unique primary key. But
when creating a Kudu table in CTAS without specifying a primary key,
Impala returned a parsing error.
This patch fixes the parsing issue for creating a Kudu table in CTAS
without specifying a primary key.
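A sketch of the now-accepted statement (table and column names are
hypothetical; the partition column is promoted to a non-unique primary
key):
  CREATE TABLE kudu_ctas
  PARTITION BY HASH (id) PARTITIONS 3
  STORED AS KUDU
  AS SELECT id, name FROM src_tbl;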
Testing:
- Added new test cases in parsing unit-test and end-to-end unit-test.
- Passed core tests.
Change-Id: Ia7bb0cf1954e0a4c3d864a800e929a88de272dd5
Reviewed-on: http://gerrit.cloudera.org:8080/19825
Reviewed-by: Abhishek Chennaka <achennaka@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When optimizing a simple count star query on an Iceberg table, the
WITH clause should be skipped, but that doesn't mean the SQL can't be
optimized, because once the WITH clause is inlined, the final statement
is optimized by the CountStarToConstRule.
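A sketch of the affected query shape ('ice_tbl' is a hypothetical
Iceberg table):
  WITH v AS (SELECT * FROM ice_tbl)
  SELECT count(*) FROM v;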
Testing:
* Add e2e tests
Change-Id: I7b21cbea79be77f2ea8490bd7f7b2f62063eb0e4
Reviewed-on: http://gerrit.cloudera.org:8080/19811
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION is set to true, the
planner
will simplify outer joins if the WHERE clause contains at least one
null rejecting condition and then remove the outer-joined tuple id
from the map of GlobalState#outerJoinedTupleIds.
However, there may be false removals for right join simplification or
full join simplification. This may lead to incorrect results since it
is incorrect to propagate a non null-rejecting predicate into a plan
subtree that is on the nullable side of an outer join.
GlobalState#outerJoinedTupleIds indicates whether a table is on the
nullable side of an outer join.
E.g.
SELECT COUNT(*)
FROM functional.nullrows t1
FULL JOIN functional.nullrows t2 ON t1.id = t2.id
FULL JOIN functional.nullrows t3 ON coalesce(t1.id, t2.id) = t3.id
WHERE t1.group_str = 'a'
AND coalesce(t2.group_str, 'f') = 'f'
The predicate coalesce(t2.group_str, 'f') = 'f' will propagate into t2
if we remove t2 from GlobalState#outerJoinedTupleIds.
Testing:
- Add new plan tests in outer-to-inner-joins.test
- Add new query tests to verify the correctness on transformation
Change-Id: I6565c5bff0d2f24f30118ba47a2583383e83fff7
Reviewed-on: http://gerrit.cloudera.org:8080/19116
Reviewed-by: Qifan Chen <qfchen@hotmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We push down predicates to Iceberg, which uses them to filter out files
when getting the results of planFiles(). Using the
FileScanTask.residual() function we can find out if we have to use
the predicates to further filter the rows of the given files or if
Iceberg has already performed all the filtering.
Basically if we only filter on IDENTITY-partition columns then Iceberg
can filter the files and using these filters in Impala wouldn't filter
any more rows from the output (assuming that no partition evolution was
performed on the table).
An additional benefit of not pushing down no-op predicates to the
scanner is that we can potentially materialize fewer slots.
For example:
SELECT count(1) from iceberg_tbl where part_col = 10;
Another additional benefit comes with count(*) queries. If all the
predicates are skipped from being pushed to Impala's scanner for a
count(*) query then the Parquet scanner can go to an optimized path
where it uses stats instead of reading actual data to answer the query.
In the above query Iceberg filters the files using the predicate on
a partition column and then there won't be any need to materialize
'part_col' in Impala, nor to push down the 'part_col = 10' predicate.
Note, this is an all-or-nothing approach: given N predicates, we either
push down all of them to the scanner or none of them. There is room for
improvement to identify a subset of the predicates that we still have
to push down to the scanner. However, for this we'd need a mapping
between Impala predicates and the predicates returned by Iceberg's
FileScanTask.residual() function, which would significantly increase
the complexity of the relevant code.
Testing:
- Some existing tests needed some extra care as they were checking
for predicates being pushed down to the scanner, but with this
patch not all of them are pushed down. For these tests I added some
extra predicates to achieve that all of the predicates are pushed
down to the scanner.
- Added a new planner test suite for checking how predicate push down
works with Iceberg tables.
Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b
Reviewed-on: http://gerrit.cloudera.org:8080/19534
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-9495 added support for struct types in SELECT lists but only with
codegen turned off. This commit implements codegen for struct types.
To facilitate this, code generation for reading and writing 'AnyVal's
has been refactored. A new class, 'CodegenAnyValReadWriteInfo' is
introduced. This class is an interface between sources and destinations,
one of which is an 'AnyVal' object: sources generate an instance of this
class and destinations take that instance and use it to write the value.
The other side can for example be tuples from which we read (in the case
of 'SlotRef') or tuples we write into (in case of materialisation, see
Tuple::CodegenMaterializeExprs()). The main advantage is that sources do
not have to know how to write their destinations, only how to read the
values (and vice versa).
Before this change, many tests that involve structs ran only with
codegen turned off. Now that codegen is supported in these cases, these
tests are also run with codegen on.
Testing:
- enabled tests for structs in the select list with codegen on in
tests/query_test/test_nested_types.py
- enabled codegen in other tests where it used to be disabled because
it was not supported.
Change-Id: I5272c3f095fd9f07877104ee03c8e43d0c4ec0b6
Reviewed-on: http://gerrit.cloudera.org:8080/18526
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
There is a bug when an Iceberg table has a string partition column and
Impala inserts special chars into this column that need to be URL
encoded. In this case the partition name is URL encoded so as not to
confuse the file paths for that partition. E.g. the partition name
'b=1/2' is converted to 'b=1%2F2'.
This is fine for path creation; however, for Iceberg tables the same
URL-encoded partition name is saved into the catalog as the partition
name that is also used for Iceberg column stats. This leads to
incorrect results when querying the table, as the URL-encoded values
are returned in a SELECT * query instead of what the user inserted.
Additionally, when adding a filter to the query, Iceberg will filter
out all the rows because it compares the non-encoded values to the
URL-encoded values.
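A sketch of the failing scenario based on the 'b=1/2' example above
(the table is hypothetical):
  CREATE TABLE ice_t (i INT, b STRING)
  PARTITIONED BY SPEC (b)
  STORED BY ICEBERG;
  INSERT INTO ice_t VALUES (1, '1/2');
  -- before the fix, SELECT * returned the URL-encoded value '1%2F2'
  SELECT * FROM ice_t;
  -- and a filter on the original value matched no rows
  SELECT * FROM ice_t WHERE b = '1/2';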
Testing:
- Added new tests to iceberg-partitioned-insert.test to cover this
scenario.
- Re-run the existing test suite.
Change-Id: I67edc3d04738306fed0d4ebc5312f3d8d4f14254
Reviewed-on: http://gerrit.cloudera.org:8080/19654
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>