impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 18:12:08 -05:00

Author	SHA1	Message	Date
dependabot[bot]	4353e3670e	Bump idna from 2.8 to 3.7 in /infra/python/deps Bumps [idna](https://github.com/kjd/idna) from 2.8 to 3.7. - [Release notes](https://github.com/kjd/idna/releases) - [Changelog](https://github.com/kjd/idna/blob/master/HISTORY.rst) - [Commits](https://github.com/kjd/idna/compare/v2.8...v3.7) --- updated-dependencies: - dependency-name: idna dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2024-04-11 23:26:32 +00:00
Yida Wu	9837637d93	IMPALA-12920: Support ai_generate_text built-in function for OpenAI's chat completion API Added support for following built-in functions: - ai_generate_text_default(prompt) - ai_generate_text(ai_endpoint, prompt, ai_model, ai_api_key_jceks_secret, additional_params) 'ai_endpoint', 'ai_model' and 'ai_api_key_jceks_secret' are flagfile options. 'ai_generate_text_default(prompt)' syntax expects all these to be set to proper values. The other syntax, will try to use the provided input parameter values, but fallback to instance level values if the inputs are NULL or empty. Only public OpenAI (api.openai.com) and Azure OpenAI (openai.azure.com) API endpoints are currently supported. Exposed these functions in FunctionContext so that they can also be called from UDFs: - ai_generate_text_default(context, model) - ai_generate_text(context, ai_endpoint, prompt, ai_model, ai_api_key_jceks_secret, additional_params) Testing: - Added unit tests for AiGenerateTextInternal function - Added fe test for JniFrontend::getSecretFromKeyStore - Ran manual tests to make sure Impala can talk with OpenAI LLMs using 'ai_generate_text' built-in function. Example sql: select ai_generate_text("https://api.openai.com/v1/chat/completions", "hello", "gpt-3.5-turbo", "open-ai-key", '{"temperature": 0.9, "model": "gpt-4"}') - Tested using standalone UDF SDK and made sure that the UDFs can invoke BuiltInFunctions (ai_generate_text and ai_generate_text_default) Change-Id: Id4446957f6030bab1f985fdd69185c3da07d7c4b Reviewed-on: http://gerrit.cloudera.org:8080/21168 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-04-11 07:25:50 +00:00
Laszlo Gaal	408c119f7d	IMPALA-12564: Prevent Hive loading libfesupport.so in the minicluster during TSAN runs During TSAN runs all Impala binaries (including libfesupport.so) are built with TSAN options, which include a reference to the external symbol __tsan_init. This causes a problem for libfesupport.so when it is loaded into Hive during minicluster startup, because the Java VM running Hive's code cannot supply this symbol (the stock JVM is obviously not built with TSAN). Unfortunately this symbol resolution failure causes Hive's JVM simply to abort on Red Hat 8 (or later) and on Ubuntu 20.04 (or later). On earlier versions of the same platforms the JVM turned the same failure into an UnsatisfiedLinkError exception, which is actually handled by Hive. This patch prevents libfesupport.so from being loaded into Hive for TSAN runs so that the minicluster can actually be started. This is achieved by not adding the directory containing libfesupport.so to JAVA_LIBRARY_PATH, preventing the JVM from finding it. Change-Id: Ie030d9876c297d6e9dae80eba37e525ee2bccb20 Reviewed-on: http://gerrit.cloudera.org:8080/21191 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-04-10 20:37:02 +00:00
Gabor Kaszab	df7aac9517	IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans When a table is partitioned IcebergScanNode sorts the file descriptors for better scheduling. However, the list of file descriptors comes from IcebergContentFileStore and is shared between different select queries on the table. When another query tries to iterate the list of file descriptors and at the same time the IcebergScanNode sorts them we get a ConcurrentModificationException. To solve this IcebergScanNode now creates its own copy of the file descriptor list so as not to interfere with other queries. Manual testing: 300-400 SELECT * Iceberg queries were sent into Impala in a loop that reliably reproduced the original issue. With the fix the issue is gone. The queries used for the repro: 1: select * from functional_parquet.iceberg_v2_partitioned_position_deletes_orc a, functional_parquet.iceberg_partitioned_orc_external b where a.action = b.action and b.id=3; 2: select * from functional_parquet.iceberg_v2_equality_delete_schema_evolution; Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3 Reviewed-on: http://gerrit.cloudera.org:8080/21267 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-04-10 17:47:32 +00:00
Csaba Ringhofer	8ff51fbf74	IMPALA-5323: Support BINARY columns in Kudu tables The patch adds read and write support for BINARY columns in Kudu tables. Predicate push down is implemented, but is incomplete: a constant binary argument will be only pushed down if the constant folding never encounters non-ascii strings. Examples: - cast(unhex(hex("aa")) as binary) can be pushed down - cast(hex(unhex("aa")) as binary) can't be pushed down as unhex("aa") is not ascii (even though the final result is ascii) See IMPALA-10349 for more details on this limitation. The patch also changes casting BINARY <-> STRING from noop to calling an actual function. While this may add some small overhead it allows the backend to know whether an expression returns STRING or BINARY. Change-Id: Iff701a4b3a09ce7b6982c5d238e65f3d4f3d1151 Reviewed-on: http://gerrit.cloudera.org:8080/18868 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-04-10 16:17:15 +00:00
Michael Smith	6121c4f7d6	IMPALA-12905: Disk-based tuple caching This implements on-disk caching for the tuple cache. The TupleCacheNode uses the TupleFileWriter and TupleFileReader to write and read back tuples from local files. The file format uses RowBatch's standard serialization used for KRPC data streams. The TupleCacheMgr is the daemon-level structure that coordinates the state machine for cache entries, including eviction. When a writer is adding an entry, it inserts an IN_PROGRESS entry before starting to write data. This does not count towards cache capacity, because the total size is not known yet. This IN_PROGRESS entry prevents other writers from concurrently writing the same entry. If the write is successful, the entry transitions to the COMPLETE state and updates the total size of the entry. If the write is unsuccessful and a new execution might succeed, then the entry is removed. If the write is unsuccessful and won't succeed later (e.g. if the total size of the entry exceeds the max size of an entry), then it transitions to the TOMBSTONE state. TOMBSTONE entries avoid the overhead of trying to write entries that are too large. Given these states, when a TupleCacheNode is doing its initial Lookup() call, one of three things can happen: 1. It can find a COMPLETE entry and read it. 2. It can find an IN_PROGRESS/TOMBSTONE entry, which means it cannot read or write the entry. 3. It finds no entry and inserts its own IN_PROGRESS entry to start a write. The tuple cache is configured using the tuple_cache parameter, which is a combination of the cache directory and the capacity similar to the data_cache parameter. For example, /data/0:100GB uses directory /data/0 for the cache with a total capacity of 100GB. This currently supports a single directory, but it can be expanded to multiple directories later if needed. The cache eviction policy can be specified via the tuple_cache_eviction_policy parameter, which currently supports LRU or LIRS. 
The tuple_cache parameter cannot be specified if allow_tuple_caching=false. This contains contributions from Michael Smith, Yida Wu, and Joe McDonnell. Testing: - This adds basic custom cluster tests for the tuple cache. Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf Reviewed-on: http://gerrit.cloudera.org:8080/21171 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Kurt Deschler <kdeschle@cloudera.com>	2024-04-10 03:11:49 +00:00
Riza Suminto	4764b91f42	IMPALA-12965: Add debug query option RUNTIME_FILTER_IDS_TO_SKIP Runtime filters can still have a negative effect in certain scenarios, such as a long wait time that delays the scan or a cascading runtime filter chain that prevents parallel execution of fragments. Having a debug query option to simply skip a runtime filter id from being scheduled can help us investigate and test a solution early before implementing the improvement code. This patch adds the RUNTIME_FILTER_IDS_TO_SKIP option to do that. This patch also improves parsing of multi-value query options so that it does not split at a ',' char that is within two double quotes and ignores empty/whitespace values if they exist. Testing: - Add BE test in query-options-test.cc - Add FE test in runtime-filter-query-options.test Change-Id: I897e37685dd1ec279989b55560ec7616a00d2280 Reviewed-on: http://gerrit.cloudera.org:8080/21230 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-04-09 21:35:53 +00:00
Csaba Ringhofer	5c003cdcda	IMPALA-12978: Fix impala-shell`s live progress with older Impalas If the Impala server has an older version that does not contain IMPALA-12048 then TExecProgress.total_fragment_instances will be None, leading to error when checking total_fragment_instances > 0. Note that this issue only comes with Python 3, in Python 2 None > 0 returns False. Testing: - Manually checked with a modified Impala that doesn't set total_fragment_instances. Only the scanner progress bar is shown in this case. Change-Id: Ic6562ff6c908bfebd09b7612bc5bcbd92623a8e6 Reviewed-on: http://gerrit.cloudera.org:8080/21256 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Zihao Ye <eyizoha@163.com>	2024-04-09 02:23:05 +00:00
Xiang Yang	07218588a6	IMPALA-12362: (part-3/4) Add more binaries to packaging module. Add admissiond service and impala-profile-tool to packaging module. Testing: - Manually deploy packages on Ubuntu22.04 and verify it. Change-Id: I594742037a05d4d74d6a2bc011619713f7ca12e4 Reviewed-on: http://gerrit.cloudera.org:8080/20929 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-04-08 10:23:18 +00:00
Zoltan Borok-Nagy	fb3c379f39	IMPALA-12894: Addendum: Re-enable test_plain_count_star_optimization test_plain_count_star_optimization was disabled by IMPALA-12894 part 1, and part 2 didn't re-enable it. This patch re-enables it. Change-Id: I30629632742c0d402a6bb852a169359edac59eba Reviewed-on: http://gerrit.cloudera.org:8080/21249 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>	2024-04-08 09:36:17 +00:00
Xiang Yang	e74bb9d81b	IMPALA-12362: (part-2/4) Optimize default configurations for packaging module. To avoid absolute paths and keep things simple, optimize the default configurations for the packaging module by removing or changing some entries. At the same time, add a license header to 'package/conf/-site.xml' and rename them to '-site.xml.template' to force administrators to make configurations appropriate for their cluster. Testing: - Manually deploy packages on Ubuntu22.04 and verify it. Change-Id: Ifda229b779a3d6fca647bb81fe23dd61ad7e5d66 Reviewed-on: http://gerrit.cloudera.org:8080/20928 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-04-08 04:29:26 +00:00
Xiang Yang	b64dc110c1	IMPALA-12362: (part-1/4) Refactor service management scripts. Unify all service management scripts into 'bin/impala.sh'. Administrators can customize environment variables for their cluster in 'conf/impala-env.sh', and set flags in 'conf/impalad_flags...'. Administrators can usually override the environment variables in 'conf/impala-env.sh' with command-line arguments. The same is true for flags in 'conf/impalad_flags...'. This flexibility can be used in scenarios such as supporting multi-instance deployments. The directory structure has been adjusted as follows: - put Java libs in the 'lib/jars' directory. - put native libs in the 'lib/native' directory. - put the impalad binary in the 'sbin' directory. Testing: - Manually deploy packages on Ubuntu22.04 and verify it. Change-Id: I8f4dcad9cfa12d351d562e7ef8c0a8957d3ca147 Reviewed-on: http://gerrit.cloudera.org:8080/20921 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-04-06 11:22:26 +00:00
wzhou-code	e50bfa8376	IMPALA-12925: Fix decimal data type for external JDBC table Decimal type is a primitive data type for Impala. Current code returns wrong values for columns with decimal data type in external JDBC tables. This patch fixes wrong values returned from JDBC data source, and supports pushing down decimal type of predicates to remote database and remote Impala. The decimal precision and scale of the columns in external JDBC table must be no less than the decimal precision and scale of the corresponding columns in the table of remote database. Otherwise, Impala fails with an error since it may cause truncation of decimal data. Testing: - Added Planner test for pushing down decimal type of predicates. - Added end-to-end unit-tests for tables with decimal type of columns for Postgres, MySQL, and Impala-to-Impala. - Passed core-tests. Change-Id: I8c9d2e0667c42c0e52436b158e3dfe3ec14b9e3b Reviewed-on: http://gerrit.cloudera.org:8080/21218 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Abhishek Rawat <arawat@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-04-05 09:16:53 +00:00
Yida Wu	4be5fd8896	IMPALA-12960: Fix Incorrect RowsPassedThrough Metric in Streaming Aggregation This patch fixes a bug in the RowsPassedThrough metric within the query profile while using Streaming Aggregation. The issue is from the AddBatchStreaming() function's logic, where the number of rows in the output batch isn't necessarily initialized to 0, while the function uses num_rows() of the output batch directly to be the actual number of rows returned and passed through of this specific aggregator. This discrepancy can significantly impact the accuracy of the returned and passed through numbers, as well as the calculation of reduction rates during hash table expansion in Streaming Aggregation. Huge differences can be observed especially when using the rollup function. The solution is to calculate the actual number of rows added to the output batch within each round of the AddBatchStreaming() function. Tests: Passed exhaustive tests. Added a corresponding case in tpch-passthrough-aggregations.test. Change-Id: I59205a4b06824ee1607a25e906db1f96dc4eda9f Reviewed-on: http://gerrit.cloudera.org:8080/21235 Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-04-04 15:27:47 +00:00
Gabor Kaszab	da8704f90b	IMPALA-12612: SELECT * queries expand complex type columns from Iceberg metadata tables Similarly to how regular tables behave, the nested columns are omitted when we do a SELECT * on Iceberg metadata tables and the user needs to turn EXPAND_COMPLEX_TYPES on to include the nested columns into the result. This patch changes this behaviour to unconditionally include the nested columns from Iceberg metadata tables. Note, the behavior of handling nested columns from regular tables doesn't change with this patch. Testing: - Adjusted the SELECT * metadata table queries to add the nested columns into the results. - Added some new tests where both metadata tables and regular tables were queried in the same query. Change-Id: Ia298705ba54411cc439e99d5cb27184093541f02 Reviewed-on: http://gerrit.cloudera.org:8080/21236 Reviewed-by: Daniel Becker <daniel.becker@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-04-04 14:04:51 +00:00
Csaba Ringhofer	47389f715b	IMPALA-12969: Release JNI array if DeserializeThriftMsg failed Before this patch ReleaseByteArrayElements was not called in case the deserialization failed (e.g. by hitting Thrift's MaxMessageSize). This could potentially cause JVM/native heap leak, depending on how the JVM handled the array allocation. Change-Id: Id2c0335b12e9289ae851d0ec050765951a8ca6c7 Reviewed-on: http://gerrit.cloudera.org:8080/21234 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Daniel Becker <daniel.becker@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-04-04 08:58:25 +00:00
Riza Suminto	97adba5192	IMPALA-12881: Use getFkPkJoinCardinality to reduce scan cardinality IMPALA-12018 adds reduceCardinalityForScanNode to lower cardinality estimation when a runtime filter is involved. It calls JoinNode.computeGenericJoinCardinality(). However, if the originating join node has FK-PK conjunct, it should be possible to obtain a lower cardinality estimate by calling JoinNode.getFkPkJoinCardinality() instead. This patch adds that analysis and calls JoinNode.getFkPkJoinCardinality() when possible. It is, however, only limited to runtime filters that evaluate at the storage layer, such as partition filter and pushed-down Kudu filter. Row-level runtime filters that evaluate at scan node will continue using JoinNode.computeGenericJoinCardinality(). This distinction is because a storage layer filter is applied more consistently than a row-level filter. For example, a partition filter evaluate all partition_id and never disabled regardless of its precision (see HdfsScanNodeBase::PartitionPassesFilters). On the other hand, scan node can disable a row-level filter later on if it is deemed ineffective / not precise enough (see HdfsScanner::CheckFiltersEffectiveness, LocalFilterStats::enabled_for_row, and min_filter_reject_ratio flag). For the pushed-down Kudu filter, Impala will rely on Kudu to evaluate the filter. Runtime filters can arrive late as well. But for both storage layer filter and row-level filter, the scan node can stop waiting and start scanning after runtime_filter_wait_time_ms passed. Scan node will still evaluate a late runtime filter later on if the scan process is still ongoing. Also, note that this cardinality reduction algorithm is based only on highly selective runtime filters to increase its estimate confidence (see RuntimeFilter.isHighlySelective()). Testing: - Update TpcdsCpuCostPlannerTest. - Pass FE tests. 
Change-Id: I6efafffc8f96247a860b88e85d9097b2b4327f32 Reviewed-on: http://gerrit.cloudera.org:8080/21118 Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-04-04 04:46:19 +00:00
Daniel Becker	a623447206	IMPALA-12899: Temporary workaround for BINARY in complex types The BINARY type is currently not supported inside complex types and a cross-component decision is probably needed to support it (see IMPALA-11491). We would like to enable EXPAND_COMPLEX_TYPES for Iceberg metadata tables (IMPALA-12612), which requires that queries with BINARY inside complex types don't fail. Enabling EXPAND_COMPLEX_TYPES is a more prioritised issue than IMPALA-11491, so we have come up with a temporary solution. This change NULLs out BINARY values in complex types coming from Iceberg metadata tables and logs a warning. BINARYs in complex types from regular tables are not affected by this change. Testing: - Added test queries in iceberg-metadata-tables.test. Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203 Reviewed-on: http://gerrit.cloudera.org:8080/21219 Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-04-03 14:53:53 +00:00
Michael Smith	f05eac6476	IMPALA-12602: Unregister queries on idle timeout Queries cancelled due to idle_query_timeout/QUERY_TIMEOUT_S are now also Unregistered to free any remaining memory, as you cannot fetch results from a cancelled query. Adds a new structure - idle_query_statuses_ - to retain Status messages for queries closed this way so that we can continue to return a clear error message if the client returns and requests query status or attempts to fetch results. This structure must be global because HS2 server can only identify a session ID from a query handle, and the query handle no longer exists. SessionState tracks queries added to idle_query_statuses_ so they can be cleared when the session is closed. Also ensures MarkInactive is called in ClientRequestState when Wait() completes. Previously WaitInternal would only MarkInactive on success, leaving any failed requests in an active state until explicitly closed or the session ended. The beeswax get_log RPC will not return the preserved error message or any warnings for these queries. It's also possible the summary and profile are rotated out of query log as the query is no longer inflight. This is an acceptable outcome as a client will likely not look for a log/summary/profile after it times out. Testing: - updates test_query_expiration to verify number of waiting queries is only non-zero for queries cancelled by EXEC_TIME_LIMIT_S and not yet closed as an idle query - modified test_retry_query_timeout to use exec_time_limit_s because queries closed by idle_timeout_s don't work with get_exec_summary Change-Id: Iacfc285ed3587892c7ec6f7df3b5f71c9e41baf0 Reviewed-on: http://gerrit.cloudera.org:8080/21074 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-04-03 03:25:10 +00:00
stiga-huang	effc9df933	IMPALA-12782: Show info of the event processing in /events webUI The /events page of catalogd shows the metrics and status of the event-processor. This patch adds more info in this page, including - lag info - current event batch that's being processed See the screenshot attached in the JIRA for how it looks like. Also moves the error message to the top to highlight the error status. Fixes the issue of not updating latest event id when event processor is stopped. Also fixes the issue of error message not cleared after global INVALIDATE METADATA. Adds a debug action, catalogd_event_processing_delay, to inject a sleep while processing an event. So the web page can be captured more easily. Also adds a missing test for showing the error message of event-processing in the /events page. Tests: - Add e2e test to verify the content of the page. Change-Id: I2e7d4952c7fd04ae89b6751204499bf9dd99f57c Reviewed-on: http://gerrit.cloudera.org:8080/20986 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-04-02 18:40:26 +00:00
Daniel Becker	63f52807f0	IMPALA-12611: Add support to MAP type Iceberg Metadata table columns This change adds support for querying MAP types from Iceberg Metadata tables. The 'IcebergMetadataScanner.ArrayScanner' java class is renamed to 'CollectionScanner' and extended to be able to handle maps. For arrays the iteration returns the element as before, for maps it returns 'Map.Entry' objects. Note that collections in the FROM clause are still not supported. Testing: - Added E2E tests in iceberg-metadata-tables.test. Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a Reviewed-on: http://gerrit.cloudera.org:8080/21125 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-04-02 18:30:39 +00:00
Gabor Kaszab	18b9c08c52	IMPALA-12600: Schema evolution with equality delete files This patch adds test coverage for a table that has equality delete files and also schema evolution, where the schema changes didn't affect the primary key columns. Note, partition evolution on tables with equality deletes is still not supported. Testing: - Added a new test table for this use-case and some E2E tests on that table. Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395 Reviewed-on: http://gerrit.cloudera.org:8080/21210 Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-04-02 13:11:21 +00:00
Daniel Becker	72732da9d8	IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables After this change, the new SHOW METADATA TABLES IN statement can be used to list all the available metadata tables of an Iceberg table. Note that in contrast to querying the contents of Iceberg metadata tables, this does not require fully qualified paths, e.g. both SHOW METADATA TABLES IN functional_parquet.iceberg_query_metadata; and USE functional_parquet; SHOW METADATA TABLES IN iceberg_query_metadata; work. The available metadata tables for all Iceberg tables are the same, corresponding to the values of the enum "org.apache.iceberg.MetadataTableType", so there is actually no need to pass the name of the regular table for which the metadata table list is requested through Thrift. This change, however, does send the table name because this way - if we add support for metadata tables for other table formats, the table name/path will be necessary to determine the correct list of metadata tables - we could later add support for different authorisation policies for individual tables - we can check also at the point of generating the list of metadata tables that the table is an Iceberg table Testing: - added and updated tests in ParserTest, AnalyzeDDLTest, ToSqlTest and AuthorizationStmtTest - added a custom cluster test in test_authorization.py - added functional tests in iceberg-metadata-tables.test Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49 Reviewed-on: http://gerrit.cloudera.org:8080/21026 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>	2024-04-02 09:58:37 +00:00
zhangyifan27	23a14a249c	IMPALA-12852: Make Kudu service start and stop independent This patch decouples run-kudu.sh and kill-kudu.sh from run-mini-dfs.sh and kill-mini-dfs.sh. These scripts can be useful for setting up test environments that require no or only Kudu service. Testing: - Ran the modified and new scripts and checked they worked as expected. Change-Id: I9624aaa61353bb4520e879570e5688d5e3493201 Reviewed-on: http://gerrit.cloudera.org:8080/21090 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-04-02 08:26:59 +00:00
jasonmfehr	f55077007b	IMPALA-12426: Switches the duration fields to be stored in decimal seconds. The original implementation of the completed queries table stored durations in integer nanoseconds. This change modifies the duration fields to be stored as seconds with up to three digits of millisecond precision. Also reduces the default max number of queued queries to a number that will not consume as much memory. Existing sys.impala_query_log tables will need to be dropped. Testing was accomplished by modifying the python custom cluster tests. Change-Id: I842951a132b7b8eadccb09a3674f4c34ac42ff1b Reviewed-on: http://gerrit.cloudera.org:8080/21203 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-30 03:48:14 +00:00
jasonmfehr	83734a1220	IMPALA-12944: Fixes Workload Management Test Flakiness The custom cluster workload management tests are flaky because the tests can actually run before the completed queries table has been fully created by the Impala startup process. The table create sql runs asynchronously during startup and thus can take longer to finish than the custom cluster tests take to execute. This change adds checks at the beginning of each test to ensure the completed queries table sql has finished before any of the test code runs. Change-Id: I428702a210e024db95808dc2518da497426922f8 Reviewed-on: http://gerrit.cloudera.org:8080/21221 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-29 22:38:05 +00:00
jasonmfehr	deee153c76	IMPALA-12426: Skip Inserting HS2 Operation Queries into the Completed Queries Table Prevents queries associated with HS2 metadata operations from being written to the completed queries table. These queries are represented by the TMetadataOpcode enum. A Custom cluster test that makes an HS2 connection to Impala and runs these operations has been added. This test asserts that none of the operations have their queries written to the completed queries table. Change-Id: Ie19cf5953522fa85941e6c0b9c15a9c9ba9dc362 Reviewed-on: http://gerrit.cloudera.org:8080/21207 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-29 20:28:12 +00:00
Michael Smith	c529b855e9	IMPALA-12626: Add Tables Queried to profile/history Adds "Tables Queried" to the query profile, enumerating a comma-separated list of tables accessed during a query: Tables Queried: tpch.customer,tpch.lineitem Also adds "tables_queried" to impala_query_log and impala_query_live with the same content. Requires 'drop table sys.impala_query_log' to recreate it with the new column. Change-Id: I9c9c80b2adf7f3e44225a191fe8eb9df3c4bc5aa Reviewed-on: http://gerrit.cloudera.org:8080/20886 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-29 11:04:17 +00:00
Daniel Becker	9071030f7f	IMPALA-12809: Iceberg metadata table scanner should always be scheduled to the coordinator On clusters with dedicated coordinators and executors the Iceberg metadata scanner fragment(s) can be scheduled to executors, for example during a join. The fragment in this case will fail a precondition check, because either the 'frontend_' object or the table will not be present. This change forces Iceberg metadata scanner fragments to be scheduled on the coordinator. It is not enough to set the DataPartition type to UNPARTITIONED, because unpartitioned fragments can still be scheduled on executors. This change introduces a new flag in the TPlanFragment thrift struct - if it is true, the fragment is always scheduled on the coordinator. Testing: - Added a regression test in test_coordinators.py. - Added a new planner test with two metadata tables and a regular table joined together. Change-Id: Ib4397f64e9def42d2b84ffd7bc14ff31df27d58e Reviewed-on: http://gerrit.cloudera.org:8080/21138 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-29 04:40:31 +00:00
Andrew Sherman	1f9db9e05b	IMPALA-12264: Add limit on number of HS2 sessions per user. Add a new flag --max_hs2_sessions_per_user which sets a limit on the number of Hiveserver2 sessions that can be concurrently opened by a user on a coordinator. By default this value is -1, which disables the new feature. An attempt to open more sessions than the new flag value results in an error. This is implemented by maintaining a map of users to a count of HS2 sessions. If the per-user HS2 session counts are being limited in this way, then the per-user counts are visible on the /sessions page of the webui. Add a new test case in test_session_expiration.py. Change-Id: Idd28edc352102d89774f6ece5376e7c79ae41aa8 Reviewed-on: http://gerrit.cloudera.org:8080/21128 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-28 20:32:47 +00:00
jasonmfehr	c01986d9e8	IMPALA-12945: Fix Flaky Ticker Test The ticker test failed on an exhaustive release build. The failure was because the total elapsed time was longer than expected. This change bumps up the margin of error from 1% to 2% to give slightly more time on the more intensive builds. Change-Id: Ibc8cf03ae68fb3103c5bbc438c32f6565b8c406c Reviewed-on: http://gerrit.cloudera.org:8080/21214 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-28 20:06:59 +00:00
Michael Smith	9ac55828f3	IMPALA-12540: (Fixup) Add EventSequence arg to load Adds a new argument from IMPALA-12443 to Table#load. Change-Id: I46185e9c0095cc470178e0d2d45d10a1803bff99 Reviewed-on: http://gerrit.cloudera.org:8080/21222 Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Reviewed-by: Jason Fehr <jfehr@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2024-03-28 18:41:25 +00:00
Michael Smith	45995e6892	IMPALA-12540: Query Live Table Defines SystemTable which are in-memory tables that can provide access to Impala state. Adds the 'impala_query_live' to the database 'sys', which already exists for 'sys.impala_query_log'. Implements the 'impala_query_live' table to view active queries across all coordinators sharing the same statestore. SystemTables create new SystemTableScanNodes for their scan node implementation. When computing scan range locations, SystemTableScanNodes creates a scan range for each coordinator in the cluster (identified via ClusterMembershipMgr). This produces a plan that looks like: Query: explain select * from sys.impala_query_live +------------------------------------------------------------+ \| Explain String \| +------------------------------------------------------------+ \| Max Per-Host Resource Reservation: Memory=4.00MB Threads=2 \| \| Per-Host Resource Estimates: Memory=11MB \| \| WARNING: The following tables are missing relevant table \| \| and/or column statistics. \| \| sys.impala_query_live \| \| \| \| PLAN-ROOT SINK \| \| \| \| \| 01:EXCHANGE [UNPARTITIONED] \| \| \| \| \| 00:SCAN SYSTEM_TABLE [sys.impala_query_live] \| \| row-size=72B cardinality=20 \| +------------------------------------------------------------+ Impala's scheduler checks whether the query contains fragments that can be scheduled on coordinators, and if present includes an ExecutorGroup containing all coordinators. These are used to schedule scan ranges that are flagged as 'use_coordinator', allowing SystemTableScanNodes to be scheduled on dedicated coordinators and outside the selected executor group. Execution will pull data from ImpalaServer on the backend via a SystemTableScanner implementation based on table name. In the query profile, SYSTEM_TABLE_SCAN_NODE includes ActiveQueryCollectionTime and PendingQueryCollectionTime to track time spent collecting QueryState from ImpalaServer. 
Grants QueryScanner private access to ImpalaServer, identical to how ImpalaHttpHandler accesses internal server state. Adds custom cluster tests for impala_query_live, and unit tests for changes to planner and scheduler. Change-Id: Ie2f9a449f0e5502078931e7f1c5df6e0b762c743 Reviewed-on: http://gerrit.cloudera.org:8080/20762 Reviewed-by: Jason Fehr <jfehr@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-28 16:34:48 +00:00
Zoltan Borok-Nagy	b03cfcf2ad	IMPALA-12894: (part 2) Fix optimized count() for Iceberg tables with dangling delete files Impala can return incorrect results if a table has dangling delete files. Dangling delete files are delete files that are part of the snapshot but are not applicable to any of the data files. We can have such delete files after Spark's rewrite_data_files action. During analysis we check the existence of delete files based on the snapshot summary. If there are no delete files in the table, we just replace the count() expression with NumericLiteral($record_count). If there are delete files in the table (based on the summary), we set optimize_count_star_for_iceberg_v2 in the query context. Without optimize_count_star_for_iceberg_v2 in the query context, the IcebergScanPlanner would create the following plan. AGGREGATE COUNT() \| UNION ALL / \ / \ / \ SCAN all ANTI JOIN datafiles / \ without / \ deletes SCAN SCAN datafiles deletes with deletes With optimize_count_star_for_iceberg_v2 the final plan looks like the following: ArithmeticExpr(ADD) / \ / \ / \ record_count AGGREGATE of all COUNT() datafiles \| without ANTI JOIN deletes / \ / \ SCAN SCAN datafiles deletes with deletes The ArithmeticExpr(ADD) and its left child (record_count) are created by the analyzer; IcebergScanPlanner is responsible for creating the plan under AGGREGATE COUNT(). And if it has delete files and optimize_count_star_for_iceberg_v2 is true, it knows it can omit the original UNION ALL and its left child. However, IcebergScanPlanner checks delete file existence based on the result of planFiles(), hence dangling delete files are eliminated. And if there are no delete files, IcebergScanPlanner assumes that case is already handled by the Analyzer (i.e. it replaced count() with NumericLiteral($record_count)). So it will incorrectly create a normal SCAN plan of the table under COUNT(), i.e. 
we end up with this: ArithmeticExpr(ADD) / \ / \ / \ record_count AGGREGATE of all COUNT() datafiles \| without SCAN deletes datafiles without deletes Which means Impala will yield $record_count * 2 as a result. This patch fixes the FeIcebergTable.hasDeleteFiles() method, so it also ignores dangling delete files. Therefore, the analyzer will just substitute count() with NumericLiteral($record_count) if all deletes are dangling, i.e. no need to involve the IcebergScanPlanner at all. The patch also introduces a new query option, "iceberg_disable_count_star_optimization", so users can completely disable the statistic-based count()-optimization if necessary. Testing: * e2e tests * planner tests Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f Reviewed-on: http://gerrit.cloudera.org:8080/21190 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-28 15:17:40 +00:00
Gabor Kaszab	73171cb716	IMPALA-12729: Allow creating primary keys for Iceberg tables There are writer engines that use Iceberg's identifier-field-ids from the Iceberg schema to identify the columns to be written into the equality delete files (Flink, NiFi). So far Impala wasn't able to populate these identifier-field-ids. This patch introduces support for not enforced primary keys for Iceberg tables, where the primary key is going to be used for setting identifier-field-ids during Iceberg schema creation. Example syntax: CREATE TABLE ice_tbl ( i int NOT NULL, j int, s string NOT NULL primary key(i, s) not enforced) PARTITIONED BY SPEC (truncate(10, s)) STORED AS ICEBERG; There are some constraints with primary keys (PK) following the behavior of Flink: - Only NOT NULL columns can be in the PK. - PK is not allowed in the column definition level like 'i int NOT NULL PRIMARY KEY'. - If the table is partitioned then the partition columns have to be part of the PK. - Float and double columns are not allowed for the PK. - Not allowed to drop a column that is used as a PK. Testing: - New E2E tests added for different table creation scenarios. - Manual test to use NiFi for writing into a table with PK. Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6 Reviewed-on: http://gerrit.cloudera.org:8080/21149 Reviewed-by: Daniel Becker <daniel.becker@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-28 13:57:07 +00:00
jasonmfehr	3e4fdeece1	IMPALA-12824: Removes the prettyprint_duration Built-in Function The prettyprint_duration function was originally implemented in IMPALA-12824 to work with the workload management tables which stored durations in integer nanoseconds. These tables have changed to store decimal seconds. The prettyprint_duration function would have required a large investment of time to make it work with decimal values, and since the new format is more human readable anyways, this function has been removed. Change-Id: If2154c2ed9a7217ed4b7587adeae87df55ff03dc Reviewed-on: http://gerrit.cloudera.org:8080/21208 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-28 06:58:56 +00:00
Zoltan Borok-Nagy	c8d56425f8	IMPALA-12942: deflake test_virtual_column_file_position_generic Sometimes the runtime filters don't arrive in time in test test_virtual_column_file_position_generic. This patch increases RUNTIME_FILTER_WAIT_TIME_MS to 30 seconds. Change-Id: I4d7a23389a2dcdd92602c2de22a2fc8f09aa618c Reviewed-on: http://gerrit.cloudera.org:8080/21209 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>	2024-03-27 17:24:06 +00:00
Zoltan Borok-Nagy	580a477e69	IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table The following query throws an error for Iceberg tables: select * from ice_tbl where rand() < 0.001; It's because the predicate 'rand() < 0.001' doesn't involve any table columns. Because of a bug in IcebergScanPlanner.hasPartitionTransformType() the method throws an IndexOutOfBoundsException. This patch fixes the method to handle such predicates. Testing: * added e2e tests Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342 Reviewed-on: http://gerrit.cloudera.org:8080/21179 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>	2024-03-27 09:25:36 +00:00
Sai Hemanth Gantasala	52b11ab6aa	IMPALA-12487: Skip reloading file metadata for ALTER_TABLE events with trivial changes in StorageDescriptor IMPALA-11534 skips reloading file metadata for some trivial ALTER_TABLE events. However, ALTER_TABLE events that have trivial changes in StorageDescriptor are not handled in IMPALA-11534. The only changes that require file metadata reload are: location, rowformat, fileformat, serde, and storedAsSubDirectories. The file metadata reload can be skipped for all other changes in SD. Testing: 1) Manual testing by changing SD parameters in local environment. 2) Added unit tests for the same in MetastoreEventsProcessorTest class. Change-Id: I6fd9a9504bf93d2529dc7accbf436ad83e51d8ac Reviewed-on: http://gerrit.cloudera.org:8080/21019 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-27 08:27:29 +00:00
wzhou-code	0a077fe992	IMPALA-12928: Mask JDBC table property dbcp.password for DESC FORMATTED and SHOW CREATE TABLE The 'desc formatted' and 'show create table' commands show all table properties in clear text. For external JDBC tables, the dbcp.password table property value should be masked in the output of these two commands. This patch masks the dbcp.password property value in the output of the 'desc formatted' and 'show create table' commands. The dbcp.password table property could still be written into Impala and HMS log files with JDBC table creation statements. There is a generic tool in production environments with which users can set up regular expressions to detect and redact sensitive information within SQL statement text in log files. Testing: - Added end-to-end test cases. - Passed core tests. Change-Id: I83dc32c8d0fec1cdfdfe06e720561b2ae1adf5df Reviewed-on: http://gerrit.cloudera.org:8080/21187 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-27 05:09:04 +00:00
jasonmfehr	5835c9b994	IMPALA-12913: Refactor Workload Management Custom Cluster Tests The custom cluster tests that assert the workload management functionality to insert completed queries into the impala_query_log table were inefficient because they created their own database tables and added data to those tables. This patch updates these tests to use the existing tables in the functional database where possible. The few tests that need their own tables now have those tables set up in a database created by the pytest unique_database fixture instead of using the default database. A new table has also been added to the functional database. This table is named zipcode_timezones and contains two columns, the first having a few zipcodes and the second having their corresponding timezone. This table can be used to join the zipcode_incomes and alltimezones tables. This table is populated by a new csv file in the testdata directory. Change-Id: I1e3249a8f306cf43de0d6f6586711c779399e83b Reviewed-on: http://gerrit.cloudera.org:8080/21153 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-27 04:46:37 +00:00
stiga-huang	9dcd136df1	IMPALA-12699: Set timeout for catalog RPCs We have seen trivial GetPartialCatalogObject RPCs hanging on the coordinator side, e.g. IMPALA-11409. Due to the piggyback mechanism of fetching metadata in local-catalog mode (see comments in CatalogdMetaProvider#loadWithCaching()), a hanging RPC on shared metadata (e.g. db/table list) could block other queries on the same coordinator. Such lightweight requests don't need to acquire the table lock or trigger table loading in catalogd. The hangs are usually caused by network issues, e.g. a TCP connection becoming half-open after TCP retransmissions time out. Retrying the RPC helps to recover from such failures. Currently, the timeout for catalog RPCs is set to 0 by default. This prevents retries and lets the client wait indefinitely. This patch distinguishes the lightweight catalog RPCs and uses a dedicated catalogd client cache for them. They use a timeout of 30 mins, which is long enough to tolerate TCP retransmission timeouts. It also sets a timeout of 10 hours for other catalog RPCs. Operations that take longer than that are usually abnormal and hanging. Tests - Add e2e test to verify the lightweight RPC client cache is used. - Adjust TestRestart.test_catalog_connection_retries to use local catalog mode since in the legacy catalog mode, coordinator only sends PrioritizeLoad requests which are lightweight RPCs. This is a continuation of patch by Wenzhe Zhou <wzhou@cloudera.com> Change-Id: Iad39a79d0c89f2b04380f610a7e60558429e9c6e Reviewed-on: http://gerrit.cloudera.org:8080/21146 Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-26 05:54:52 +00:00
Joe McDonnell	88dcdfd466	IMPALA-12807: Add support for mold linker This adds support for using the mold linker. It changes the existing USE_GOLD_LINKER environment variable to IMPALA_LINKER, which accepts ld, gold, or mold as values. It defaults to 'gold' to match current behavior. Developers can override it in bin/impala-config-local.sh. Clang does not implement -gz properly until version 12: before that, it does not enable compressed debuginfo in the final binary. IMPALA_LINKER=mold doesn't work with IMPALA_COMPRESSED_DEBUG_INFO=true on Clang due to this. This patch detects Clang <12 and skips -gz as it is ineffective. Mold behaves similarly to LLD and requires --exclude-libs to use the full library name (i.e. liblz4.a rather than liblz4). Gold will happily accept the full library name, so this changes to use the full library name. Mold is much faster for incremental builds on my system: (e.g. touch be/src/scheduling/scheduler.cc && make -j8 impalad) gold: 15.8s mold: 2.6s Testing: - Ran builds with IMPALA_LINKER=mold on Centos 7, Redhat 8, and Ubuntu 20. Change-Id: Ia9e9accd06b6ecd182d200d81afaae09a885c241 Reviewed-on: http://gerrit.cloudera.org:8080/21121 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Andrew Sherman <asherman@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-25 22:53:34 +00:00
stiga-huang	0e8a9077b1	IMPALA-12915: Use libgtest.so when built with shared libs When building Impala with shared libraries, we currently link against the static library of GTest in both libkudu_test_util.so and unifiedbetests. libkudu_test_util.so is also linked by unifiedbetests dynamically. When unifiedbetests exits, both binaries try to delete the global variables of GTest, which leads to a double-free memory issue. To fix the double-free memory issue, instead of statically linking libgtest.a, all binaries should dynamically link libgtest.so. So when the process exits, only libgtest.so will delete its global variables. Tests - Verified the issue resolved locally Change-Id: I27d21217db219f52b072a4e5cfa1caaace35d1a2 Reviewed-on: http://gerrit.cloudera.org:8080/21163 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-25 15:02:45 +00:00
Zoltan Borok-Nagy	23c1f0d4e1	IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala Impala hits a segmentation fault when it queries the virtual column FILE__POSITION for TEXT or JSON tables. When the scanners that do not support the FILE__POSITION virtual column detect its presence, they try to report an error and close themselves. The segfault is in the scanners' Close() method when they try to dereference a NULL stream object. This patch simply adds NULL-checks in Close(). Alternatively we could detect the presence of FILE__POSITION during planning in the HdfsScanNode, but doing it in the scanners lets us handle more queries, e.g. queries that dynamically prune partitions and the surviving partitions all have file formats that support FILE__POSITION. Testing: * added negative tests to properly report the errors * added tests for mixed file format tables Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0 Reviewed-on: http://gerrit.cloudera.org:8080/21148 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>	2024-03-25 13:55:11 +00:00
wzhou-code	c0507c02cd	IMPALA-12896 (Part 2): JDBC table must be created as external table In some deployment environments, the default table type is transactional. In these scenarios, JDBC tables that are created as non-external tables are not accepted by HMS due to strict managed table check failures. This patch forces JDBC tables to be created as external tables, and requires at least 1 column for JDBC tables. Testing: - Updated frontend unit tests and end-to-end unit tests to create JDBC tables as external tables. - Passed core tests Change-Id: Ib5533b52434cdf1c430e30ac28a0146ab4d9d4b9 Reviewed-on: http://gerrit.cloudera.org:8080/21159 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-23 09:54:30 +00:00
Riza Suminto	266d7ec51c	IMPALA-4545: Simplify test dimension in test_decimal_casting.py This patch splits precision and scale as independent dimensions and then constrains them to yield a valid decimal type. With this split, core exploration will have the same test dimension as pairwise exploration, while exhaustive exploration still permutes all possible decimal types. Also did minor refactoring to reduce test skipping and pass flake8. After this patch, core exploration has 214 test items and exhaustive exploration has 12312 test items. Before, they were 408 and 12464 respectively. Testing: - Pass test_decimal_casting.py in core and exhaustive exploration. Change-Id: Ibe269e08a955097ad9e924d5d64b42438ad15be2 Reviewed-on: http://gerrit.cloudera.org:8080/21174 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-23 02:34:50 +00:00
Riza Suminto	39f5ec777b	IMPALA-12898: Tidy up test dimensions of test_scanner.py This patch tidies up the test dimensions of test_scanner.py. 'exec_option' initialization is moved to add_test_dimensions() method as much as possible. It ensures correct permutation and execution of test cases. After this patch, the total collected tests of test_scanner.py is 1242 for core/pairwise exploration and 7514 for exhaustive exploration. Before, they were 794 and 11864 respectively. The increase in test count after refactoring with core exploration is because exec option dimensions are now permuted correctly along with other default exec option dimensions. The reduction in exhaustive exploration is due to a reduction in the overall dimension to permute and a reduction in test skipping (the test was run, but only called pytest.skip()). Testing: - Pass query_test/test_scanner.py in exhaustive exploration. Change-Id: I5efd2b483338fb55b958d8e1a0acf6b365f8093e Reviewed-on: http://gerrit.cloudera.org:8080/21162 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-22 18:18:45 +00:00
stiga-huang	0d49c9d6cc	IMPALA-12929: Skip loading HDFS permissions in local-catalog mode HDFS file/dir permissions are not used at all in local catalog mode - in LocalFsTable, hasWriteAccessToBaseDir() always returns true and getFirstLocationWithoutWriteAccess() always returns null. However, in catalogd, we still load them (in a single thread per table!), which could dominate the table loading time when there are lots of partitions. Note that the table loading process in catalogd is the same no matter which catalog mode is in use. The difference between catalog modes is mainly in how coordinators get metadata from catalogd. Local catalog mode is turned on by setting --catalog_topic_mode=minimal on catalogd and --use_local_catalog=true on coordinators. This patch skips loading HDFS permissions on catalogd when running in local catalog mode. We can revisit it in IMPALA-7539. Tests: - Ran CORE tests Change-Id: I5baa9f6ab0d3888a78ff161ae5caa19e85bc983a Reviewed-on: http://gerrit.cloudera.org:8080/21178 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-22 14:17:58 +00:00
wzhou-code	74b6df7997	IMPALA-12930: Fix TestExtDataSources.test_jdbc_data_source failure The patch for IMPALA-12802 added some negative test cases for altering external JDBC tables. These test cases verify the error messages. One of the test cases failed on some test environments because the Postgres server returned a different error message. This patch fixes the unit-test failure by checking whether the error message matches one of two possible error messages. Testing: - Ran the unit-test on Jenkins with centos and ubuntu and verified the unit-test passed for different error messages. Change-Id: I84566f67751538d72a4d17da21e7ea907e1dcdd2 Reviewed-on: http://gerrit.cloudera.org:8080/21181 Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-22 01:11:26 +00:00
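
Note on IMPALA-12894 (b03cfcf2ad) above: the doubled count() result can be reproduced with a toy model. The sketch below is not Impala code; the DataFile class, the count_star function and its 'fixed' flag are hypothetical names used only to illustrate the arithmetic described in the commit message. It assumes the analyzer decides from the snapshot summary (before the fix) or from applicable delete files only (after the fix), while the planner only ever sees applicable delete files.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    rows: int          # record count of the data file
    deleted: int = 0   # rows removed by delete files that actually apply to it


def count_star(files, summary_delete_files, fixed):
    """Toy model of the optimized count() for Iceberg tables (hypothetical)."""
    total = sum(f.rows for f in files)
    # What the planner sees after planFiles(): dangling deletes are gone.
    applicable = any(f.deleted > 0 for f in files)
    # Before the fix the analyzer trusts the snapshot summary; after the fix
    # hasDeleteFiles() ignores dangling delete files as well.
    analyzer_sees_deletes = applicable if fixed else summary_delete_files > 0

    if not analyzer_sees_deletes:
        # Analyzer substitutes count() with NumericLiteral(record_count).
        return total

    # Analyzer emits: record_count(files without deletes) + COUNT(planner subtree).
    left = sum(f.rows for f in files if f.deleted == 0)
    if applicable:
        # Planner builds the ANTI JOIN over files that have deletes.
        right = sum(f.rows - f.deleted for f in files if f.deleted > 0)
    else:
        # Planner finds no deletes, assumes the analyzer handled it,
        # and emits a plain scan of the whole table under COUNT().
        right = total
    return left + right


dangling = [DataFile(100), DataFile(50)]       # summary reports 1 delete file
print(count_star(dangling, 1, fixed=False))    # double-counts the table
print(count_star(dangling, 1, fixed=True))     # literal record_count
```

With one dangling delete file in the summary, the pre-fix path adds the full record count to a full scan count, which is the "$record_count * 2" result the commit message describes; tables with applicable deletes are unaffected in either mode.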


11340 Commits