We optimize plain count(*) queries on Iceberg tables the following way:
              AGGREGATE
              COUNT(*)
                  |
              UNION ALL
              /       \
             /         \
            /           \
     SCAN all        ANTI JOIN
     datafiles        /     \
     without         /       \
     deletes      SCAN       SCAN
                datafiles   deletes

                   ||
                 rewrite
                   ||
                   \/

     ArithmeticExpr: LHS + RHS
         /          \
        /            \
       /              \
  record_count     AGGREGATE
  of all           COUNT(*)
  datafiles            |
  without          ANTI JOIN
  deletes           /     \
                   /       \
                SCAN       SCAN
              datafiles   deletes
This optimization consists of two parts:
1 Rewriting the count(*) expression to count(*) + "record_count" (of data
files without deletes)
2 In IcebergScanPlanner we only need to construct the right side of
the original UNION ALL operator, i.e.:
      ANTI JOIN
       /     \
      /       \
   SCAN       SCAN
 datafiles   deletes
SelectStmt decides whether we can do the count(*) optimization, and if
so, does the following:
1: SelectStmt sets 'TotalRecordsNumV2' in the analyzer, then during the
expression rewrite phase the CountStarToConstRule rewrites the
count(*) to count(*) + record_count
2: SelectStmt sets "OptimizeCountStarForIcebergV2" in the query context
then IcebergScanPlanner creates plan accordingly.
This mechanism works for simple queries, but in complex queries it can turn
on the count(*) optimization in IcebergScanPlanner for all Iceberg V2
tables, even if only one subquery enabled the count(*) optimization during
analysis.
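For illustration, here is a minimal sketch of the two query shapes involved
(table and column names are made up):
```sql
-- A plain count(*) on an Iceberg V2 table: the target of the rewrite.
SELECT count(*) FROM ice_v2_tbl;

-- A complex query where only the subquery qualifies for the rewrite;
-- before this patch the planner-side optimization was still switched on
-- for every Iceberg V2 table referenced by the query.
SELECT count(*)
FROM ice_v2_other_tbl
WHERE col_a > 10
  AND col_b < (SELECT count(*) FROM ice_v2_tbl);
```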
With this patch the following things change:
1: We introduce IcebergV2CountStarAccumulator which we use instead of
the ArithmeticExpr. So after the rewrite we still know whether the count(*)
optimization should be enabled for the planner.
2: Instead of using the query context, we pass the information to the
IcebergScanPlanner via the TableRef object.
Testing
* e2e tests
Change-Id: I1940031298eb634aa82c3d32bbbf16bce8eaf874
Reviewed-on: http://gerrit.cloudera.org:8080/23705
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Previously, `BaseScalarColumnReader::levels_readahead_` was not reset
when the reader did not do page filtering. If a query selected the last
row containing a collection value in a row group, `levels_readahead_`
would be set and would not be reset when advancing to the next row
group without page filtering. As a result, trying to skip collection
values at the start of the next row group would cause a check failure.
This patch fixes the failure by resetting `levels_readahead_` in
`BaseScalarColumnReader::Reset()`, which is always called when advancing
to the next row group.
`levels_readahead_` is also moved out of the "Members used for page
filtering" section as the variable is also used in late materialization.
Testing:
- Added an E2E test for the fix.
Change-Id: Idac138ffe4e1a9260f9080a97a1090b467781d00
Reviewed-on: http://gerrit.cloudera.org:8080/23779
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit contains the simpler parts from
https://gerrit.cloudera.org/#/c/20602
This mainly means accessors for the header of the binary
format and a bounding box check (st_envIntersects).
New tests for not yet covered functions / overloads are also added.
For details of the binary format see be/src/exprs/geo/shape-format.h
Differences from the PR above:
Only a subset of functions are added. The criteria was:
1. the native function must be fully compatible with the Java version*
2. must not rely on (de)serializing the full geometry
3. the function must be tested
1 implies 2 because (de)serialization is not implemented yet in
the original patch for >2d geometries, which would break compatibility
with the Java version for XYZ/XYM/XYZM geometries.
*: there are 2 known differences:
1. NULL handling: the Java functions return an error instead of NULL
when getting a NULL parameter
2. st_envIntersects() doesn't check if the SRID matches - the Java
library looks inconsistent about this
Because the native functions are fairly safe replacements for the Java
ones, they are always used when geospatial_library=HIVE_ESRI.
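A hypothetical usage sketch of the bounding box check; only st_envIntersects()
is named above, the ESRI-style st_polygon()/st_point() constructors and the
geometries are assumptions:
```sql
-- Bounding-box check between a polygon and a point (made-up geometries);
-- assumes the cluster runs with geospatial_library=HIVE_ESRI as above.
SELECT st_envIntersects(st_polygon('polygon ((0 0, 0 2, 2 2, 2 0))'),
                        st_point(1, 1));
```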
Change-Id: I0ff950a25320549290a83a3b1c31ce828dd68e3c
Reviewed-on: http://gerrit.cloudera.org:8080/23700
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch mainly implements querying of Paimon data tables
through a JNI-based scanner.
Features implemented:
- support for column pruning.
Partition pruning and predicate push down will be submitted
as the third part of the patch.
We implemented this by treating the Paimon table as a normal
unpartitioned table. When querying a Paimon table:
- PaimonScanNode decides which Paimon splits need to be scanned,
and then transfers the splits to the BE to do the JNI-based scan operation.
- We also collect the required columns that need to be scanned,
and pass them to the scanner for column pruning. This is
implemented by passing the field ids of the columns to the BE,
instead of column positions, to support schema evolution.
- In the original implementation, PaimonJniScanner directly
passed Paimon row objects to the BE and called the corresponding
Paimon row field accessors, which are Java methods that convert row
fields to Impala row batch tuples. We found this to be slow due to the
overhead of JVM method calls.
To minimize the overhead, we reworked the implementation:
PaimonJniScanner converts the Paimon row batches to
Arrow record batches, which store data in the off-heap region of
the Impala JVM, and then passes the memory pointer of the Arrow
off-heap record batch to the BE.
The BE PaimonJniScanNode reads the data directly from the JVM off-heap
region and converts the Arrow record batch to an Impala row batch.
Benchmarks show the latter implementation is 2.x faster
than the original one.
The lifecycle of an Arrow record batch is mainly like this:
the Arrow record batch is generated in the FE and passed to the BE.
After the record batch is imported to the BE successfully,
the BE is in charge of freeing the row batch.
There are two free paths: the normal path and the
exception path. On the normal path, when the Arrow batch
is totally consumed by the BE, the BE calls JNI to fetch the next Arrow
batch. In this case, the Arrow batch is freed automatically.
The exception path happens when the query is cancelled or a memory
allocation fails. In these corner cases, the Arrow batch is freed in the
close method if it has not been totally consumed by the BE.
Currently supported Impala data types for querying include:
- BOOLEAN
- TINYINT
- SMALLINT
- INTEGER
- BIGINT
- FLOAT
- DOUBLE
- STRING
- DECIMAL(P,S)
- TIMESTAMP
- CHAR(N)
- VARCHAR(N)
- BINARY
- DATE
TODO:
- Patches pending submission:
- Support tpcds/tpch data-loading
for paimon data table.
- Virtual Column query support for querying
paimon data table.
- Query support with time travel.
- Query support for paimon meta tables.
- WIP:
- Snapshot incremental read.
- Complex type query support.
- Native paimon table scanner, instead of
jni based.
Testing:
- Create test tables in functional_schema_template.sql
- Add TestPaimonScannerWithLimit in test_scanners.py
- Add test_paimon_query in test_paimon.py.
- Already passed the tpcds/tpch tests for Paimon tables. The test
table data is currently generated by Spark; we have to do this
because generating it is not supported by Impala yet and Hive
doesn't support generating Paimon tables for dynamically partitioned
tables. We plan to submit a separate patch for tpcds/tpch data
loading and the associated tpcds/tpch query tests.
- JVM off-heap memory leak tests: ran looped tpch tests for
1 day; no obvious off-heap memory increase was observed, and
off-heap memory usage stayed within 10M.
Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384
Reviewed-on: http://gerrit.cloudera.org:8080/23613
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
This fixes an IllegalStateException in HdfsPartitionPruner when
evaluating 'IN' predicates whose operands consist of two compatible types,
for example DATE and STRING: date_col in (<date as string>).
Previously, 'canEvalUsingPartitionMd' did not check if the slot type
matched the literal type. This caused the frontend to attempt invalid
comparisons via 'LiteralExpr.compareTo', leading to
IllegalStateException or incorrect pruning.
The fix ensures 'canEvalUsingPartitionMd' returns false on type
mismatches, deferring evaluation to the backend where proper casting
occurs.
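A sketch of the previously failing pattern, with hypothetical table and
column names:
```sql
-- DATE partition column compared against STRING literals in an IN list;
-- the frontend now defers this to the backend instead of attempting an
-- invalid DATE-vs-STRING comparison during partition pruning.
SELECT count(*) FROM part_tbl
WHERE date_part_col IN ('2024-01-01', '2024-01-02');
```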
Testing:
- Added regression test in hdfs-partition-pruning.test.
Change-Id: Idc226a628c8df559329a060cb963b81e27e21eda
Reviewed-on: http://gerrit.cloudera.org:8080/23706
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In some cases users delete files directly from storage without
going through the Iceberg API, e.g. they remove old partitions.
This corrupts the table, and makes queries that try to read the
missing files fail.
This change introduces a repair statement that deletes the
dangling references of missing files from the metadata.
Note that the table cannot be repaired if there are missing
delete files, because Iceberg's DeleteFiles API, which is used
to execute the operation, allows removing only data files.
Testing:
- E2E
- HDFS
- S3, Ozone
- analysis
Change-Id: I514403acaa3b8c0a7b2581d676b82474d846d38e
Reviewed-on: http://gerrit.cloudera.org:8080/23512
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The main motivation is to evaluate expensive geospatial
functions (which are Java functions) last in predicates.
Java functions have a major overhead anyway from the JNI
call, so bumping all Java function costs seems beneficial.
Note that currently geospatial functions are the only
built-in Java functions.
Change-Id: I11d1652d76092ec60af18a33502dacc25b284fcc
Reviewed-on: http://gerrit.cloudera.org:8080/22733
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch implements partition filtering support for the SHOW FILES
statement on Iceberg tables, based on the functionality added in
IMPALA-12243. Prior to this change, the syntax resulted in a
NullPointerException.
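An illustrative example of the syntax (hypothetical table and partition
predicate; the predicate forms follow the Iceberg DROP PARTITION syntax from
IMPALA-12243):
```sql
-- List only the files belonging to the matching Iceberg partitions.
SHOW FILES IN ice_tbl PARTITION (p_col = 5);
```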
Key changes:
- Added ShowFilesStmt.analyzeIceberg() to validate and transform
partition expressions using IcebergPartitionExpressionRewriter and
IcebergPartitionPredicateConverter. After that, it collects matching
file paths using IcebergUtil.planFiles().
- Added FeIcebergTable.Utils.getIcebergTableFilesFromPaths() to
accept pre-filtered file lists from the analysis phase.
- Enhanced TShowFilesParams thrift struct with optional selected_files
field to pass pre-filtered file paths from frontend to backend.
Testing:
- Analyzer tests for negative cases: non-existent partitions, invalid
expressions, non-partition columns, unsupported transforms.
- Analyzer tests for positive cases: all transform types, complex
expressions.
- Authorization tests for non-filtered and filtered syntaxes.
- E2E tests covering every partition transform type with various
predicates.
- Schema evolution and rollback scenarios.
The implementation follows AlterTableDropPartition's pattern where the
analysis phase performs validation/metadata retrieval and the execution
phase handles result formatting and display.
Change-Id: Ibb9913e078e6842861bdbb004ed5d67286bd3152
Reviewed-on: http://gerrit.cloudera.org:8080/23455
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The initial implementation of KUDU-1261 (array column type) was recently
merged in the upstream Apache Kudu repository. This patch adds initial
Impala support for working with Kudu tables having array type columns.
Unlike rows, the elements of a Kudu array are stored in a different
format than in Impala: instead of a per-row bit flag for NULL info, values
and NULL bits are stored in separate arrays.
The following types of queries are not supported in this patch:
- (IMPALA-14538) Queries that reference an array column as a table, e.g.
```sql
SELECT item FROM kudu_array.array_int;
```
- (IMPALA-14539) Queries that create duplicate collection slots, e.g.
```sql
SELECT array_int FROM kudu_array AS t, t.array_int AS unnested;
```
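For contrast, a basic unnest of the array column (reusing the table and
column names from the examples above) is expected to work with this patch:
```sql
-- Unnest the Kudu array and return its elements, avoiding both
-- unsupported cases listed above.
SELECT unnested.item FROM kudu_array AS t, t.array_int AS unnested;
```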
Testing:
- Add some FE tests in AnalyzeDDLTest and AnalyzeKuduDDLTest.
- Add EE test test_kudu.py::TestKuduArray.
Since Impala does not support inserting complex types, including
arrays, the data insertion part of the test is achieved through
custom C++ code in kudu-array-inserter.cc that inserts into Kudu via
the Kudu C++ client. It would be great if we could migrate it to Python so
that it can be moved to the same file as the test (IMPALA-14537).
- Pass core tests.
Co-authored-by: Riza Suminto
Change-Id: I9282aac821bd30668189f84b2ed8fff7047e7310
Reviewed-on: http://gerrit.cloudera.org:8080/23493
Reviewed-by: Alexey Serbin <alexey@apache.org>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch prohibits unsupported operations against
Paimon tables. Checks for all unsupported operations are added
in the analysis stage in order to avoid
mis-operation. Currently only CREATE/DROP statements
are supported; the prohibitions will be removed later
once the corresponding operations are truly supported.
TODO:
- Patches pending submission:
- Support jni based query for paimon data table.
- Support tpcds/tpch data-loading
for paimon data table.
- Virtual Column query support for querying
paimon data table.
- Query support with time travel.
- Query support for paimon meta tables.
Testing:
- Add unit test for AnalyzeDDLTest.java.
- Add unit test for AnalyzerTest.java.
- Add test_paimon_negative and test_paimon_query in test_paimon.py.
Change-Id: Ie39fa4836cb1be1b1a53aa62d5c02d7ec8fdc9d7
Reviewed-on: http://gerrit.cloudera.org:8080/23530
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala crashes when it needs to write multiple delete files per
partition in a single DELETE operation. This is because
IcebergBufferedDeleteSink has its own DmlExecState object, but
sometimes the methods in TableSinkBase use the RuntimeState's
DmlExecState object, i.e. it can happen that we add a partition
to the IcebergBufferedDeleteSink's DmlExecState, but later we
expect to find it in the RuntimeState's DmlExecState.
This patch adds new methods to TableSinkBase that are specific
for writing delete files, and they always take a DmlExecState
object as a parameter. They are now used by IcebergBufferedDeleteSink.
Testing
* added e2e tests
Change-Id: I46266007a6356e9ff3b63369dd855aff1396bb72
Reviewed-on: http://gerrit.cloudera.org:8080/23537
Reviewed-by: Mihaly Szjatinya <mszjat@pm.me>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This changes the default behavior of the tuple cache to consider
cost when placing the TupleCacheNodes. It tries to pick the best
locations within a budget. First, it eliminates unprofitable locations
via a threshold. Next, it ranks the remaining locations by their
profitability. Finally, it picks the best locations in rank order
until it reaches the budget.
The threshold is based on the ratio of processing cost for regular
execution versus the processing cost for reading from the cache.
If the ratio is below the threshold, the location is eliminated.
The threshold is specified by the tuple_cache_required_cost_reduction_factor
query option. This defaults to 3.0, which means that the cost of
reading from the cache must be less than 1/3 the cost of computing
the value normally. A higher value makes this more restrictive
about caching locations, which pushes in the direction of lower
overhead.
The ranking is based on the cost reduction per byte. This is given
by the formula:
(regular processing cost - cost to read from cache) / estimated serialized size
This prefers locations with small results or high reduction in cost.
The budget is based on the estimated serialized size per node. This
limits the total caching that a query will do. A higher value allows more
caching, which can increase the overhead on the first run of a query. A lower
value is less aggressive and can limit the overhead at the expense of less
caching. This uses a per-node limit as the limit should scale based on the
size of the executor group as each executor brings extra capacity. The budget
is specified by the tuple_cache_budget_bytes_per_executor.
The old behavior to place the tuple cache at all eligible locations is
still available via the tuple_cache_placement_policy query option. The
default is the cost_based policy described above, but the old behavior
is available via the all_eligible policy. This is useful for correctness
testing (and the existing tuple cache test cases).
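A hypothetical session setup tying the knobs above together (the budget
value is just an example, not a recommended default):
```sql
-- Require the cached read to cost less than 1/3 of regular execution.
SET tuple_cache_required_cost_reduction_factor=3.0;
-- Cap the estimated serialized bytes cached per executor (example value).
SET tuple_cache_budget_bytes_per_executor=134217728;
-- Revert to placing the cache at all eligible locations, e.g. for
-- correctness testing.
SET tuple_cache_placement_policy=all_eligible;
```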
This changes the explain plan output:
- The hash trace is only enabled at VERBOSE level. This means that the regular
profile will not contain the hash trace, as the regular profile uses EXTENDED.
- This adds additional information at VERBOSE to display the cost information
for each plan node. This can help trace why a particular location was
not picked.
Testing:
- This adds a TPC-DS planner test with tuple caching enabled (based on the
existing TpcdsCpuCostPlannerTest)
- This modifies existing tests to adapt to changes in the explain plan output
Change-Id: Ifc6e7b95621a7937d892511dc879bf7c8da07cdc
Reviewed-on: http://gerrit.cloudera.org:8080/23219
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this change, query plans and profile reported only a single
partition even for partitioned Iceberg tables, which was misleading
for users.
Now we can display the number of scanned partitions correctly for
both partitioned and unpartitioned Iceberg tables. This is achieved by
extracting the partition values from the file descriptors and storing
them in the IcebergContentFileStore. Instead of storing this information
redundantly in all file descriptors, we store them in one place and
reference the partition metadata in the FDs with an id.
This also gives the opportunity to optimize memory consumption in the
Catalog and Coordinator as well as reduce network traffic between them
in the future.
Time travel is handled similarly to oldFileDescMap. In that case
we don't know the total number of partitions in the old snapshot,
so the output is [Num scanned partitions]/unknown.
Testing:
- Planner tests
- E2E tests
- partition transforms
- partition evolution
- DROP PARTITION
- time travel
Change-Id: Ifb2f654bc6c9bdf9cfafc27b38b5ca2f7b6b4872
Reviewed-on: http://gerrit.cloudera.org:8080/23113
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch mainly implements creation/dropping of Paimon tables
through Impala.
Supported impala data types:
- BOOLEAN
- TINYINT
- SMALLINT
- INTEGER
- BIGINT
- FLOAT
- DOUBLE
- STRING
- DECIMAL(P,S)
- TIMESTAMP
- CHAR(N)
- VARCHAR(N)
- BINARY
- DATE
Syntax for creating paimon table:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
(
[col_name data_type ,...]
[PRIMARY KEY (col1,col2)]
)
[PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
STORED AS PAIMON
[LOCATION 'hdfs_path']
[TBLPROPERTIES (
'primary-key'='col1,col2',
'file.format' = 'orc/parquet',
'bucket' = '2',
'bucket-key' = 'col3',
)];
Two types of paimon catalogs are supported.
(1) Create table with hive catalog:
CREATE TABLE paimon_hive_cat(userid INT,movieId INT)
STORED AS PAIMON;
(2) Create table with hadoop catalog:
CREATE [EXTERNAL] TABLE paimon_hadoop_cat
STORED AS PAIMON
TBLPROPERTIES('paimon.catalog'='hadoop',
'paimon.catalog_location'='/path/to/paimon_hadoop_catalog',
'paimon.table_identifier'='paimondb.paimontable');
SHOW TABLE STATS/SHOW COLUMN STATS/SHOW PARTITIONS/SHOW FILES
statements are also supported.
TODO:
- Patches pending submission:
- Query support for paimon data files.
- Partition pruning and predicate push down.
- Query support with time travel.
- Query support for paimon meta tables.
- WIP:
- Complex type query support.
- Virtual Column query support for querying
paimon data table.
- Native paimon table scanner, instead of
jni based.
Testing:
- Add unit test for paimon impala type conversion.
- Add unit test for ToSqlTest.java.
- Add unit test for AnalyzeDDLTest.java.
- Update default_file_format TestEnumCase in
be/src/service/query-options-test.cc.
- Update test case in
testdata/workloads/functional-query/queries/QueryTest/set.test.
- Add test cases in metadata/test_show_create_table.py.
- Add custom test test_paimon.py.
Change-Id: I57e77f28151e4a91353ef77050f9f0cd7d9d05ef
Reviewed-on: http://gerrit.cloudera.org:8080/22914
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
This patch modifies the string overload of
IcebergFunctions::TruncatePartitionTransform so that it always handles
strings as UTF-8-encoded ones, because the Iceberg specification states
that strings are UTF-8 encoded.
Also, for an Iceberg table UrlEncode is now called not in the
Hive-compatible way, but rather in the standard way, similar to Java's
URLEncoder.encode() (which the Iceberg API also uses), to conform with
existing practices by Hive, Spark and Trino. This included a change in
the set of characters which are not escaped, to follow the URL Standard's
application/x-www-form-urlencoded format. [1] Also renamed it from
ShouldNotEscape to IsUrlSafe for better readability.
Testing:
* add and extend e2e tests to check partitions with Unicode characters
* add be tests to coding-util-test.cc
[1]: https://url.spec.whatwg.org/#application-x-www-form-urlencoded-percent-encode-set
Change-Id: Iabb39727f6dd49b76c918bcd6b3ec62532555755
Reviewed-on: http://gerrit.cloudera.org:8080/23190
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Running exhaustive tests with env var IMPALA_USE_PYTHON3_TESTS=true
reveals some tests that require adjustment. This patch makes such
adjustments, which mostly revolve around encoding differences and string
vs bytes types in Python3. This patch also switches the default to run
pytest with Python3 by setting IMPALA_USE_PYTHON3_TESTS=true. The
following are the details:
Change hash() function in conftest.py to crc32() to produce
deterministic hash. Hash randomization is enabled by default since
Python 3.3 (see
https://docs.python.org/3/reference/datamodel.html#object.__hash__).
This causes test sharding (like --shard_tests=1/2) to produce an inconsistent
set of tests per shard. Always restart the minicluster during custom cluster
tests if --shard_tests argument is set, because test order may change
and affect test correctness, depending on whether running on fresh
minicluster or not.
Moved one test case from delimited-latin-text.test to
test_delimited_text.py for easier binary comparison.
Add bytes_to_str() as a utility function to decode bytes in Python3.
This is often needed when inspecting the return value of
subprocess.check_output() as a string.
Implement DataTypeMetaclass.__lt__ to substitute
DataTypeMetaclass.__cmp__ that is ignored in Python3 (see
https://peps.python.org/pep-0207/).
Fix WEB_CERT_ERR difference in test_ipv6.py.
Fix trivial integer parsing in test_restart_services.py.
Fix various encoding issues in test_saml2_sso.py,
test_shell_commandline.py, and test_shell_interactive.py.
Change timeout in Impala.for_each_impalad() from sys.maxsize to 2^31-1.
Switch to binary comparison in test_iceberg.py where needed.
Specify text mode when calling tempfile.NamedTemporaryFile().
Simplify create_impala_shell_executable_dimension to skip testing dev
and python2 impala-shell when IMPALA_USE_PYTHON3_TESTS=true. The reason
is that several UTF-8 related tests in test_shell_commandline.py break
in the Python3 pytest + Python2 impala-shell combo. This skipping already
happens automatically on build OSes without system Python2 available, like
RHEL9 (IMPALA_SYSTEM_PYTHON2 env var is empty).
Removed unused vector argument and fixed some trivial flake8 issues.
Several pieces of test logic require modification due to intermittent issues
in Python3 pytest. These include:
Add _run_query_with_client() in test_ranger.py to allow reusing a single
Impala client for running several queries. Ensure clients are closed
when the test is done. Mark several tests in test_ranger.py with
SkipIfFS.hive because they run queries through beeline + HiveServer2,
but Ozone and S3 build environment does not start HiveServer2 by
default.
Increase the sleep period from 0.1 to 0.5 seconds per iteration in
test_statestore.py and mark TestStatestore to execute serially. This is
because TServer appears to shut down more slowly when run concurrently
with other tests. Handle the deprecation of Thread.setDaemon() as well.
Always set force_restart=True for each test method in TestLoggingCore,
TestShellInteractiveReconnect, and TestQueryRetries to prevent them from
reusing the minicluster from a previous test method. Some of these tests
destroy the minicluster (kill impalad) and will produce minidumps if the
metrics verifier for the next tests fails to detect a healthy minicluster
state.
Testing:
Pass exhaustive tests with IMPALA_USE_PYTHON3_TESTS=true.
Change-Id: I401a93b6cc7bcd17f41d24e7a310e0c882a550d4
Reviewed-on: http://gerrit.cloudera.org:8080/23319
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Replaced allpairspy with a homemade pair finder that
seems to find a somewhat less optimal (larger) covering
vector set but works reliably with filters. For details
see tests/common/test_vector.py
Also fixes a few test issues uncovered. Some fixes are
copied from https://gerrit.cloudera.org/#/c/23319/
Added the possibility of shuffling vectors to get a
different test set (env var IMPALA_TEST_VECTOR_SEED).
By default the algorithm is deterministic so the test
set won't change between runs (similarly to allpairspy).
Added a new constraint to test only a single compression
per file format in some tests to reduce the number of
new vectors.
EE + custom_cluster test count in exhaustive runs:
before patch: ~11000
after patch: ~16000
without compression constraint: ~17000
Change-Id: I419c24659a08d8d6592fadbbd5b764ff73cbba3e
Reviewed-on: http://gerrit.cloudera.org:8080/23342
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The test expects a "SELECT..LIMIT 1" query on tpch.lineitem to finish in
2s. This could be impacted by other concurrent tests when memory
reservation is used up. This patch marks the test to run serially to
avoid the impact from other tests.
Change-Id: Ibbb2f1a34e24c83a3d2c69d2daa4dece8d94ec1e
Reviewed-on: http://gerrit.cloudera.org:8080/23351
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this patch, the coordinator just invalidates the catalog cache when
it witnesses catalog service id changes in DDL/DML responses or
statestore catalog updates. This is enough in the legacy catalog mode
since these are the only ways that coordinator gets metadata from
catalogd. However, in local catalog mode, coordinator sends
getPartialCatalogObject requests to fetch metadata from catalogd. If the
request is now served by a new catalogd (e.g. due to HA failover),
the coordinator should invalidate its catalog cache in case catalog versions
overlap on the same table and stale metadata is unintentionally reused.
To ensure performance, catalogServiceIdLock_ in CatalogdMetaProvider is
refactored to be a ReentrantReadWriteLock. Most of the usages on it just
need the read lock.
This patch also adds the catalog service id in the profile.
Tests:
- Ran test_warmed_up_metadata_failover_catchup 50 times.
- Ran FE tests: CatalogdMetaProviderTest and LocalCatalogTest.
- Ran CORE tests
Change-Id: I751e43f5d594497a521313579defc5b179dc06ce
Reviewed-on: http://gerrit.cloudera.org:8080/23236
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Quanlong Huang <huangquanlong@gmail.com>
The test didn't wait in wait_for_finished_timeout() long
enough and ignored its return value, so it could continue
execution before the query was actually finished.
Change-Id: I339bd338cfd3873cc4892f012066034a6f7d4e12
Reviewed-on: http://gerrit.cloudera.org:8080/23180
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, the tuple cache keys do not include partition
information in either the planner key or the fragment instance
key. However, the partition is actually important for correctness.
First, there are settings defined on the table and partition that
can impact the results. For example, for processing text files,
the separator, escape character, etc are specified at the table
level. This impacts the rows produced from a given file. There
are other such settings stored at the partition level (e.g.
the JSON binary format).
Second, it is possible to have two partitions pointed at the same
filesystem location. For example, scale_db.num_partitions_1234_blocks_per_partition_1
is a table that has all partitions pointing to the same
location. In that case, the cache can't tell the partitions
apart based on the files alone. This is an exotic configuration.
Incorporating an identifier of the partition (e.g. the partition
keys/values) allows the cache to tell the difference.
To fix this, we incorporate partition information into the
key. At planning time, when incorporating the scan range information,
we also incorporate information about the associated partitions.
This moves the code to HdfsScanNode and changes it to iterate over
the partitions, hashing both the partition information and the scan
ranges. At runtime, the TupleCacheNode looks up the partition
associated with a scan node and hashes the additional information
on the HdfsPartitionDescriptor.
This includes some test-only changes to make it possible to run the
TestBinaryType::test_json_binary_format test case with tuple caching.
ImpalaTestSuite::_get_table_location() (used by clone_table()) now
detects a fully-qualified table name and extracts the database from it.
It only uses the vector to calculate the database if the table is
not fully qualified. This allows a test to clone a table without
needing to manipulate its vector to match the right database. This
also changes _get_table_location() so that it does not switch into the
database. This required reworking test_scanners_fuzz.py to use absolute
paths for queries. It turns out that some tests in test_scanners_fuzz.py
were running in the wrong database and running against uncorrupted
tables. After this is corrected, some tests can crash Impala. This
xfails those tests until this can be fixed (tracked by IMPALA-14219).
Testing:
- Added a frontend test in TupleCacheTest for a table with
multiple partitions pointed at the same place.
- Added custom cluster tests testing both issues
Change-Id: I3a7109fcf8a30bf915bb566f7d642f8037793a8c
Reviewed-on: http://gerrit.cloudera.org:8080/23074
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
If an external table contains data files in subdirectories, and
recursive listing is enabled, Impala considers the files in the
subdirectories as part of the table. However, currently INSERT OVERWRITE
and TRUNCATE do not always delete these files, leading to data
corruption.
This change takes care of INSERT OVERWRITE.
Before this change, for unpartitioned external tables, only top-level
data files were deleted and data files in subdirectories (whether
hidden, ignored or normal) were kept.
After this change, directories are also deleted in addition to
(non-hidden) data files, with the exception of hidden and ignored
directories. (Note: for ignored directories, see
--ignored_dir_prefix_list).
Note that for partitioned tables, INSERT OVERWRITE completely removes
the partition directories that are affected, and this change does not
alter that.
Testing:
- extended the tests in test_recursive_listing.py::TestRecursiveListing
Change-Id: I1a40a22e18e6a384da982d300422ac8995ed0273
Reviewed-on: http://gerrit.cloudera.org:8080/23165
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Adds support to cancel a query during Frontend planning or metadata
operations. Frontend planning is handled by createExecRequest, so this
change registers Java Threads executing createExecRequest by their query ID
and provides cancelExecRequest to interrupt the Thread for a particular
query ID.
Cancellation is implemented by setting a boolean for the thread, and
calling Thread.interrupt to trigger InterruptedException from any wait
calls. Several ignored wait calls are updated to check the boolean and
throw an exception if the query has been cancelled, interrupting those
operations.
Adds periodic checks to the planning process to interrupt planning.
They're primarily useful when planning is waiting on catalogd/HMS. If
planning gets into an algorithmically complex operation, it will not be
interrupted.
Removes check_inflight, as we can now cancel a query before it's
inflight. In the case that cancellation doesn't happen immediately -
because we're in a busy frontend loop that can't be interrupted -
/cancel will block until the frontend reaches an interruption point and
returns to the backend to finalize the query.
When analysis returns, cancellation is finalized in the backend. The
/cancel_query request returns once the query is cancelled. Cancelling
a request can no longer fail, so additional checks for whether the
request has been cancelled before it started executing are added.
Removes setting UpdateQueryStatus when GetExecRequest returns because
that's already handled in ImpalaServer::Execute when it calls
UnregisterQuery in response to an error, and constitutes an update race
on the status with UnregisterQuery triggered by CancelQueryHandler. We
want to use the status from CancelQueryHandler in this case as it
provides more context (about who initiated the cancel); the result of
GetExecRequest is just UserCancelledException. Avoids calling
UnregisterQuery in Execute if the query is already finalized to avoid
redundant "Invalid or unknown query handle" logs.
Extends idle_query_statuses_ to save status for any query interrupted by
an external process - cancelled by a user or timeout - so they can be
handled consistently.
Testing:
- updates test_query_cancel_created to cancel a CREATED query
- added tests to cancel a query while metadata loading is delayed
- removes test_query_cancel_exception, as it no longer demonstrates
relevant behavior; cancelling a query that will encounter an exception
before the exception occurs is no different than other queries
- ran query_test/test_cancellation.py in exhaustive mode
- ran query_test/test_cancellation.py w/ DEFAULT_TEST_PROTOCOL=beeswax
- updates cancellation tests that expect INVALID_QUERY_HANDLE to accept
Cancelled, which is sometimes returned by interrupted query status.
Change-Id: I0d25d4c7fb0b8dcc7dad9510db1e8dca220eeb86
Reviewed-on: http://gerrit.cloudera.org:8080/21803
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_execute_remove_orphan_files failed in the Ozone build for not getting
the correct table path.
This patch fixes the issue by defining
TABLE_PATH = '{0}/{1}.db/{2}'.format(
WAREHOUSE, unique_database, tbl_name)
Other path assertions are also modified to use os.path.join.
Testing:
Run and pass test_execute_remove_orphan_files at Ozone.
Change-Id: I2aba4f464ef472ab8f9d8ff7db7b5fb31a735478
Reviewed-on: http://gerrit.cloudera.org:8080/23114
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When trying to doubly unnest a 2D array from an Iceberg table that has
delete files but not for every data file, we run into an error:
Filtering an unnested collection that comes from a UNION [ALL] is not
supported yet.
This is because there is a UNION node because of the Iceberg delete
files, and there is an added "not-empty" conjunct on the collections.
IMPALA-12753 describes a bug where a conjunct on an unnested collection
coming from a UNION ALL is only applied to the first UNION operand. To
avoid incorrectness, we disabled this case in the commit for
IMPALA-12695, but its unintended consequence is that it leads to this
error with Iceberg tables.
However, in this case with Iceberg deletes, the bug described in
IMPALA-12753 is not present because both sides of the UNION have the
same tuple id, so conjuncts are naturally applied to both sides.
This commit relaxes the check, which now does not fire if all UNION
operands have the same tuple ids.
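A sketch of the previously failing query shape, assuming a hypothetical
Iceberg table ice_tbl with a 2D array column arr_2d and delete files for
only some of its data files:
```sql
-- Doubly unnest the nested array; before this patch the planner rejected
-- this with the "Filtering an unnested collection ..." error.
SELECT a2.item
FROM ice_tbl t, t.arr_2d a1, a1.item a2;
```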
Testing:
- existing tests related to IMPALA-12753 pass
- added a regression test with an Iceberg table with DELETE files
Change-Id: Ifbc6f580586d4b337f33a2f32052aa07f6fca828
Reviewed-on: http://gerrit.cloudera.org:8080/23107
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch implements a remove-orphan-files query for Iceberg tables.
The following statement becomes available for Iceberg tables:
- ALTER TABLE <tbl> EXECUTE remove_orphan_files(<timestamp>)
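An illustrative invocation (hypothetical table name and timestamp value):
```sql
-- Remove dangling files under 'data' and 'metadata' that are older than
-- the given timestamp and not referenced by any valid snapshot.
ALTER TABLE ice_db.ice_tbl EXECUTE remove_orphan_files('2025-01-01 00:00:00');
```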
The bulk of the implementation copies Hive's implementation of the
org.apache.iceberg.actions.DeleteOrphanFiles interface (HIVE-27906,
6b2e21a93ef3c1776b689a7953fc59dbf52e4be4), which this patch renames to
ImpalaIcebergDeleteOrphanFiles.java. Upon execute(), the
ImpalaIcebergDeleteOrphanFiles instance gathers the URIs of all
valid data files and Iceberg metadata files using the Iceberg API. These
valid URIs are then compared to the recursive file listing obtained
through the Hadoop FileSystem API under the table's 'data' and 'metadata'
directories accordingly. Any unmatched URI from the FileSystem API listing
that has a modification time less than the 'olderThanTimestamp' parameter
is then removed via the Iceberg FileIO API of the given Iceberg table. Note
that this is a destructive query that will wipe out any files within the
Iceberg table's 'data' and 'metadata' directories that are not addressable
by any valid snapshot.
The execution happens in CatalogD via
IcebergCatalogOpExecutor.alterTableExecuteRemoveOrphanFiles(). CatalogD
supplies CatalogOpExecutor.icebergExecutorService_ as the executor service
to execute the Iceberg API planFiles and the FileIO API for deletion.
Also fixed toSql() implementation for all ALTER TABLE EXECUTE queries.
Testing:
- Add FE and EE tests.
Change-Id: I5979cdf15048d5a2c4784918533f65f32e888de0
Reviewed-on: http://gerrit.cloudera.org:8080/23042
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Since IMPALA-11591 Impala tries to avoid pushing down predicates to
Iceberg unless it is necessary (timetravel) or is likely to be useful
(at least 1 partition column is involved in predicates). While this
makes planning faster, it may miss opportunities to skip files during
planning.
This patch adds the table property impala.iceberg.push_down_hint that
expects a comma-separated list of column names and leads to pushing
predicates down to Iceberg when there is a predicate on any of these columns.
Users can set this manually, while in the future Impala or other tools
may be able to set it automatically, e.g. during COMPUTE STATS if
there are many files with non-overlapping min/max stats for a given
column.
Note that in most cases when Iceberg can skip files the Parquet/ORC
scanner would also skip most of the data based on stat filtering. The
benefit of doing it during planning is reading less footers and a
"smaller" query plan.
Change-Id: I8eb4ab5204c20b3991fdf305d7317f4023904a0f
Reviewed-on: http://gerrit.cloudera.org:8080/22995
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
TestIcebergV2Table.test_compute_stats_table_sampling was failing in
ARM release builds. However, COMPUTE STATS with TABLESAMPLE is
inherently non-deterministic due to its use of SAMPLED_NDV().
This patch completely rewrites the tests and moves them to
test_stats_extrapolation.py to test Iceberg tables similarly to
legacy tables.
'diff_perc' argument of appx_equals() method was also updated in
the tests, as with the previous value (1.0) it only reported errors
for negative estimates.
Change-Id: I98b07b156aad300827c9e1b7970b8dfacfc6d251
Reviewed-on: http://gerrit.cloudera.org:8080/23044
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This makes two changes to deflake test_tuple_cache_tpc_queries.py.
First, it increases the runtime filter wait time from 60 seconds to
600 seconds. The correctness verification slows down the path
that produces the runtime filter. The slowdown is dependent on
the speed of storage, so this can get very slow on test machines.
Second, this skips correctness checking for locations that are just
after streaming aggregations. Streaming aggregations can produce
variable output that the correctness checking can't handle.
For example a grouping aggregation computing a sum might have
a preaggregation produce either (A: 3) or (A: 2), (A: 1) or
(A: 1), (A: 1), (A: 1). The finalization sees these as equivalent.
This marks the nodes as variable starting with the preaggregation
and clears the mark at the finalize stage.
When skipping correctness checking, the tuple cache node does not
hit the cache normally. This guarantees that its children will run
and go through correctness checking.
Testing:
- Ran test_tuple_cache_tpc_queries.py locally
- Added a frontend test for this specific case
Change-Id: If5e1be287bdb489a89aea3b2d7bec416220feb9a
Reviewed-on: http://gerrit.cloudera.org:8080/23010
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
A regression for IMPALA-10319 for Ozone environment. A hardcoded
'/test-warehouse' in the path was causing some of the 'test_charcodec'
tests to fail. Turns out the 'makedir' part is not necessary.
Change-Id: If1f74b1ddc481a996d82843041f0f031580f14e5
Reviewed-on: http://gerrit.cloudera.org:8080/23004
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
COMPUTE STATS with TABLESAMPLE clause did a full scan on Iceberg
tables since IMPALA-13737, because before this patch ComputeStatsStmt
used FeFsTable.Utils.getFilesSample() which only works correctly on
FS tables that have the file descriptors loaded. Since IMPALA-13737
the internal FS table of an Iceberg table doesn't have file descriptor
information, therefore FeFsTable.Utils.getFilesSample() returned an
empty map which turned off table sampling for COMPUTE STATS.
We did not have proper testing for COMPUTE STATS with table sampling,
therefore we did not catch the regression.
This patch adds proper table sampling logic for Iceberg tables that
can be used for COMPUTE STATS. The algorithm previously found in
IcebergScanNode.getFilesSample() has been moved to
FeIcebergTable.Utils.getFilesSample().
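An illustrative statement exercising the fixed code path (hypothetical table
name and sampling percentage):
```sql
-- Compute stats from roughly 10% of the Iceberg table's data files.
COMPUTE STATS ice_tbl TABLESAMPLE SYSTEM(10);
```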
Testing
* added e2e tests
Change-Id: Ie59d5fc1374ab69209a74f2488bcb9a7d510b782
Reviewed-on: http://gerrit.cloudera.org:8080/22873
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this patch, Impala executes EXPIRE_SNAPSHOTS operation on a
single thread. It can be really slow on cloud storage systems,
especially if the operation needs to remove lots of files.
This patch adds CatalogOpExecutor.icebergExecutorService_ to parallelize
Iceberg API call that supports passing ExecutorService, such as
ExpireSnapshots.executeDeleteWith(). The number of threads for this executor
service is controlled by the CatalogD flag --iceberg_catalog_num_threads. It
defaults to 16, the same as the --num_metadata_loading_threads default value.
Rename ValidateMinProcessingPerThread to ValidatePositiveInt64 to match
with other validators in backend-gflag-util.cc.
Testing:
- Lower sleep time between insert queries from 5s to 1s in
test_expire_snapshots and test_describe_history_params to speed up
tests.
- Manually verify that 'IcebergCatalogThread' threads are visible in
/jvm-threadz page of CatalogD.
- Pass test_iceberg.py.
Change-Id: I6dcbf1e406e1732ef8829eb0cd627d932291d485
Reviewed-on: http://gerrit.cloudera.org:8080/22980
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
As proposed in Jira, this implements decoding and encoding of text
buffers for Impala/Hive text tables. Given a table with
'serialization.encoding' property set, similarly to Hive, Impala should
be able to encode the inserted data into the specified charset, consequently
saving it into a text file.
performed upon reading data buffers from text files. Both operations
employ boost::locale::conv library.
Since Hive doesn't encode line delimiters, charsets that would have
delimiters stored differently from ASCII are not allowed.
One difference from Hive is that Impala implements
'serialization.encoding' only as a per partition serdeproperty to avoid
confusion of allowing both serde and tbl properties. (See related
IMPALA-13748)
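A hypothetical setup matching the description above (table, partition and
charset are made up; the charset must keep delimiters ASCII-compatible):
```sql
-- Declare the encoding of the text data in one partition.
ALTER TABLE text_tbl PARTITION (p = 1)
SET SERDEPROPERTIES ('serialization.encoding'='GBK');
```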
Note: Due to precreated non-UTF-8 files present in the patch
'gerrit-code-review-checks' was performed locally. (See IMPALA-14100)
Change-Id: I787cd01caa52a19d6645519a6cedabe0a5253a65
Reviewed-on: http://gerrit.cloudera.org:8080/22049
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_parallel_checksum only needs to run over a single exec option
dimension and the text/none format. Leaving it in TestInsertQueries will
exercise test_parallel_checksum over the 'compression_codec' query
option (in exhaustive builds). The CTAS fails when
compression_codec != none since the target table is in text format
and writing to compressed text tables is not supported.
This patch moves test_parallel_checksum under
TestInsertNonPartitionedTable, which has such a limited test dimension.
Also adds an assertion that the CTAS query is successful.
Change-Id: I2b2bc34ae48a2355ee1e6f6e9e42da9076adf96b
Reviewed-on: http://gerrit.cloudera.org:8080/22948
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
TestDecimalFuzz uses division to calculate the number of
iterations for certain tests. On Python 3, division produces
a float and range() will not take a float as an argument.
In theory, the "from __future__ import division" was supposed
to produce the same behavior on Python 2 and 3, but in practice,
the "from builtins import range" allows a float argument to
range() on Python 2 but not Python 3.
This fixes the issue by explicitly casting to an integer.
Testing:
- Ran TestDecimalFuzz with Python 3
Change-Id: I4cd4daecde690bf41a4e412c02c23cbb6ae5a14c
Reviewed-on: http://gerrit.cloudera.org:8080/22955
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch enables late materialization for collections to avoid the
cost of materializing collections that will never be accessed by the
query.
For a collection column, late materialization takes effect only when the
collection column is not used in any predicate, including the `!empty()`
predicate added by the planner. Otherwise we need to read every row to
evaluate the predicate and cannot skip any. Therefore, this patch skips
registering the `!empty()` predicates if the query contains zipping
unnests. This can affect performance if the table contains many empty
collections, but should be noticeable only in very extreme cases.
The late materialization threshold is set to 1 in HdfsParquetScanner
when there is any collection that can be skipped.
This patch also adds the detail of `HdfsScanner::parse_status_` to the
error message returned by the HdfsParquetScanner to help figure out the
root cause.
Performance:
- Tests with the queries involving collection columns in table
`tpch_nested_parquet.customer` show that when the selectivity is low,
the single-threaded (1 impalad and MT_DOP=1) scanning time can be
reduced by about 50%, while when the selectivity is high, the scanning
time almost does not change.
- For queries not involving collections, performance A/B testing
shows no regression on TPC-H.
Testing:
- Added a runtime profile counter NumTopLevelValuesSkipped to record
the total number of top-level values skipped for all columns. The
counter only counts the values that are not skipped as a page.
- Added e2e test cases in test_parquet_late_materialization.py to ensure
that late materialization works using the new counter.
Change-Id: Ia21bdfa6811408d66d74367e0a9520e20951105f
Reviewed-on: http://gerrit.cloudera.org:8080/22662
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Refactors ParallelFileMetadataLoader to be usable for multiple types of
metadata. Uses it to collect checksums for new files in parallel.
Testing: adds test that multiple loading threads are used and checksum
does not take too long.
Change-Id: I314621104e4757620c0a90d41dd6875bf8855b51
Reviewed-on: http://gerrit.cloudera.org:8080/22872
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_scan_metrics_in_profile was querying pre-written Iceberg V2
tables. The position delete files of such tables contain hard-coded
URIs of data files, i.e. URIs that start with "hdfs://localhost...".
Therefore the test only worked well in HDFS builds.
This patch splits the test into two parts:
* test_scan_metrics_in_profile_basic: it works on all storage systems
as it only works on Iceberg tables that don't have delete files.
* test_scan_metrics_in_profile_with_deletes: uses Iceberg tables
that have delete files, therefore it is only executed on HDFS.
Change-Id: I80a7c6469a7f56b58254e1327a05ef7b3dc9c9ff
Reviewed-on: http://gerrit.cloudera.org:8080/22931
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_low_mem_limit_orderby_all is flaky if test_mem_limit equals 100
or 120 in the test vector. The minimum mem_limit to run this test is 120MB
+ 30MB = 150MB. Thus, these test vectors expect one of
MEM_LIMIT_ERROR_MSGS to be thrown because mem_limit (test_mem_limit)
is not enough.
The Parquet scan under this low mem_limit sometimes throws a "Couldn't skip
rows in column" error instead. This possibly indicates memory exhaustion
happening while reading the Parquet page index or during late
materialization (see IMPALA-5843, IMPALA-9873, IMPALA-11134). This patch
attempts to deflake the test by adding "Couldn't skip rows in column" into
MEM_LIMIT_ERROR_MSGS.
Change-Id: I43a953bc19b40256e3a8fe473b1498bbe477c54d
Reviewed-on: http://gerrit.cloudera.org:8080/22932
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is the final patch to move all Impala e2e and custom cluster tests
to use the HS2 protocol by default. Only beeswax-specific tests remain
testing against the beeswax protocol by default. We can remove them once
Impala officially removes beeswax support.
HS2 error message formatting in impala-hs2-server.cc is adjusted a bit
to match with formatting in impala-beeswax-server.cc.
Move TestWebPageAndCloseSession from webserver/test_web_pages.py to
custom_cluster/test_web_pages.py to disable glog log buffering.
Testing:
- Pass exhaustive tests, except for some known and unrelated flaky
tests.
Change-Id: I42e9ceccbba1e6853f37e68f106265d163ccae28
Reviewed-on: http://gerrit.cloudera.org:8080/22845
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Jason Fehr <jfehr@cloudera.com>
PartitionDeltaUpdater has two sub-classes, PartNameBasedDeltaUpdater and
PartBasedDeltaUpdater. They are used in reloading metadata of a table.
Their constructors invoke HMS RPCs which could be slow and should be
tracked in the catalog timeline.
This patch adds missing timeline items for those HMS RPCs.
Tests:
- Added e2e tests
Change-Id: Id231c2b15869aac2dae3258817954abf119da802
Reviewed-on: http://gerrit.cloudera.org:8080/22917
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
cancel_query_and_validate_state is a helper method used to test query
cancellation with concurrent fetch. It still uses the beeswax client by
default.
This patch changes the test method to use the HS2 protocol by default. The
changes include the following:
1. Set TGetOperationStatusResp.operationState to
TOperationState::ERROR_STATE if returning abnormally.
2. Use separate MinimalHS2Client for
(execute_async, fetch, get_runtime_profile) vs cancel vs close.
Cancellation through KILL QUERY still instantiates a new
ImpylaHS2Connection client.
3. Implement required missing methods in MinimalHS2Client.
4. Change MinimalHS2Client logging pattern to match with other clients.
Testing:
Pass test_cancellation.py and TestResultSpoolingCancellation in core
exploration mode. Also fix default_test_protocol to HS2 for these tests.
Change-Id: I626a1a06eb3d5dc9737c7d4289720e1f52d2a984
Reviewed-on: http://gerrit.cloudera.org:8080/22853
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
The maxRowsInHeaps calculation may overflow because it uses simple
multiplication. This patch fixes the bug by calculating it using
checkedMultiply(). A broader refactoring will be done by IMPALA-14071.
Testing:
Add ee test TestTopNHighNdv that exercises the issue.
Change-Id: Ic6712b94f4704fd8016829b2538b1be22baaf2f7
Reviewed-on: http://gerrit.cloudera.org:8080/22896
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-11604 (part 2) changes how many instances to create in
Scheduler::CreateInputCollocatedInstances. This works when the left
child fragment of a parent fragment is distributed across nodes.
However, if the left child fragment instance is limited to only 1
node (the case of UNPARTITIONED fragment), the scheduler might
over-parallelize the parent fragment by scheduling too many instances in
a single node.
This patch attempts to mitigate the issue in two ways. First, it adds
bounding logic in PlanFragment.traverseEffectiveParallelism() to lower
parallelism further if the left (probe) side of the child fragment is
not well distributed across nodes.
Second, it adds TQueryExecRequest.max_parallelism_per_node to relay
information from Analyzer.getMaxParallelismPerNode() to the scheduler.
With this information, the scheduler can do additional sanity checks to
prevent Scheduler::CreateInputCollocatedInstances from
over-parallelizing a fragment. Note that this sanity check can also cap
MAX_FS_WRITERS option under a similar scenario.
Added ScalingVerdict enum and TRACE log it to show the scaling decision
steps.
Testing:
- Add planner test and e2e test that exercise the corner case under
COMPUTE_PROCESSING_COST=1 option.
- Manually comment the bounding logic in traverseEffectiveParallelism()
and confirm that the scheduler's sanity check still enforces the
bounding.
Change-Id: I65223b820c9fd6e4267d57297b1466d4e56829b3
Reviewed-on: http://gerrit.cloudera.org:8080/22840
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch attempts to stabilize TestFetch by using HS2 as the test protocol.
test_rows_sent_counters is modified to use the default hs2_client.
test_client_fetch_time_stats and test_client_fetch_time_stats_incomplete
are modified to use MinimalHS2Connection, which has a simpler mechanism
in terms of fetching (ImpylaHS2Connection always fetches 10240 rows at a
time).
Implemented the minimal functions needed to wait for the finished state and
pull the runtime profile in MinimalHS2Connection.
Testing:
Loop the test 50 times and pass them all.
Change-Id: I52651df37a318357711d26d2414e025cce4185c3
Reviewed-on: http://gerrit.cloudera.org:8080/22847
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes an issue where EXEC_TIME_LIMIT_S was inaccurately
enforced by including the planning time in its countdown. The timer
for EXEC_TIME_LIMIT_S is now started only after the coordinator
reaches the "Ready to start on the backends" state, ensuring that
this time limit applies strictly to the execution phase.
This patch also adds a DebugAction PLAN_CREATE in the planning phase
for testing purposes.
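For example, with the option below the 60-second countdown now begins only
once the coordinator reports "Ready to start on the backends", not at
planning start:
```sql
SET exec_time_limit_s=60;
```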
Tests:
Passed core tests.
Adds an ee testcase query_test/test_exec_time_limit.py.
Change-Id: I825e867f1c9a39a9097d1c97ee8215281a009d7d
Reviewed-on: http://gerrit.cloudera.org:8080/22837
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, Impala always assumes that the data in the binary columns of
JSON tables is base64 encoded. However, before HIVE-21240, Hive wrote
binary data to JSON tables without base64 encoding it, instead writing
it as escaped strings. After HIVE-21240, Hive defaults to base64
encoding binary data when writing to JSON tables and introduces the
serde property 'json.binary.format' to indicate the encoding method of
binary data in JSON tables.
To maintain consistency with Hive and avoid correctness issues caused by
reading data in an incorrect manner, this patch also introduces the
serde property 'json.binary.format' to specify the reading method for
binary data in JSON tables. Currently, this property supports reading in
either base64 or rawstring formats, same as Hive.
Additionally, this patch introduces a query option 'json_binary_format'
to achieve the same effect. This query option will only take effect for
JSON tables where the serde property 'json.binary.format' is not set.
The reading format of binary columns in JSON tables can be configured
globally by setting the 'default_query_options'. It should be noted that
the default value of 'json_binary_format' is 'NONE', and impala will
prohibit reading binary columns of JSON tables that either have
"no 'json.binary.format' set and 'json_binary_format' is 'NONE'" or
"an invalid 'json.binary.format' value set", and will provide an error
message to avoid using an incorrect format without the user noticing.
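Hypothetical examples of the two ways to pick the format (table name is made
up):
```sql
-- Per-table: declare how binary data was written into the JSON files.
ALTER TABLE json_tbl SET SERDEPROPERTIES ('json.binary.format'='base64');
-- Per-query fallback, used only when the serde property is not set.
SET json_binary_format=rawstring;
```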
Testing:
- Enabled existing binary type E2E tests for JSON tables
- Added new E2E test for 'json.binary.format'
Change-Id: Idf61fa3afc0f33caa63fbc05393e975733165e82
Reviewed-on: http://gerrit.cloudera.org:8080/22289
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When calling planFiles() on an Iceberg table, it can give us some
metrics like total planning time, number of data/delete files and
manifests, how many of these could be skipped etc.
This change integrates these metrics into the query profile, under the
"Frontend" section. These metrics are per-table, so if multiple tables
are scanned for the query there will be multiple sections in the
profile.
Note that we only have these metrics for a table if Iceberg needs to be
used for planning for that table, e.g. if a predicate is pushed down to
Iceberg or if there is time travel. For tables where Iceberg was not
used in planning, the profile will contain a short note describing this.
To facilitate pairing the metrics with scans, the metrics header
references the plan node responsible for the scan. This will always be
the top level node for the scan, so it can be a SCAN node, a JOIN node
or a UNION node depending on whether the table has delete files.
Testing:
- added EE tests in iceberg-scan-metrics.tests
- added a test in PlannerTest.java that asserts on the number of
metrics; if it changes in a new Iceberg release, the test will fail
and we can update our reporting
Change-Id: I080ee8eafc459dad4d21356ac9042b72d0570219
Reviewed-on: http://gerrit.cloudera.org:8080/22501
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
This patch aims to extract existing verifications on DEBUG_ACTION query
option format onto pre-planner stage SetQueryOption(), in order to
prevent failures on execution stage. Also, it localizes verification
code for two existing types of debug actions.
There are two types of debug actions, global e.g. 'RECVR_ADD_BATCH:FAIL'
and ExecNode debug actions, e.g. '0:GETNEXT:FAIL'. Two types are
implemented independently in source code, both having verification code
intertwined with execution. In addition, global debug actions subdivide
into C++ and Java, the two being more or less synchronized though.
In case of global debug actions, most of the code inside existing
DebugActionImpl() consists of verification, therefore it makes sense to
make a wrapper around it for separating out the execution code.
Things are worse for ExecNode debug actions, where verification code
consists of two parts, one in DebugOptions() constructor and another one
in ExecNode::ExecDebugActionImpl(). Additionally, some verification in
constructor produces warnings, while ExecDebugActionImpl() verification
either fails on DCHECK() or (in a single case) returns an error. For
this case, a reasonable solution seems to be simply calling the
constructor for a temporary object and extracting verification code from
ExecNode::ExecDebugActionImpl(). This has the drawback of having the
same warning being produced two times.
Finally, having extracted verification code for both types, logic in
impala::SetQueryOption() combines the two verification mechanisms.
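For reference, the two flavors being validated look like this (values taken
from the examples above):
```sql
SET debug_action='RECVR_ADD_BATCH:FAIL';  -- global debug action
SET debug_action='0:GETNEXT:FAIL';        -- ExecNode debug action
```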
Note: In the long run, it is better to write a single verification
routine for both Global and ExecNode debug actions, ideally as part of a
general unification of the two existing debug_action mechanisms. With
this in mind, the current patch intends to preserve current behavior,
while avoiding complex refactoring.
Change-Id: I53816aba2c79b556688d3b916883fee7476fdbb5
Reviewed-on: http://gerrit.cloudera.org:8080/22734
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>