impala

mirror of https://github.com/apache/impala.git synced 2025-12-21 19:08:12 -05:00

Author	SHA1	Message	Date
Csaba Ringhofer	f98b697c7b	IMPALA-13929: Make 'functional-query' the default workload in tests This change adds get_workload() to ImpalaTestSuite and removes it from all test suites that already returned 'functional-query'. get_workload() is also removed from CustomClusterTestSuite which used to return 'tpch'. All other changes besides impala_test_suite.py and custom_cluster_test_suite.py are just mass removals of get_workload() functions. The behavior is only changed in custom cluster tests that didn't override get_workload(). By returning 'functional-query' instead of 'tpch', exploration_strategy() will no longer return 'core' in 'exhaustive' test runs. See IMPALA-3947 on why workload affected exploration_strategy. An example for affected test is TestCatalogHMSFailures which was skipped both in core and exhaustive runs before this change. get_workload() functions that return a different workload than 'functional-query' are not changed - it is possible that some of these also don't handle exploration_strategy() as expected, but individually checking these tests is out of scope in this patch. Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115 Reviewed-on: http://gerrit.cloudera.org:8080/22726 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Daniel Becker <daniel.becker@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-04-08 07:12:55 +00:00
Eyizoha	2f06a7b052	IMPALA-10798: Initial support for reading JSON files Prototype of HdfsJsonScanner implemented based on rapidjson, which supports scanning data from splitting json files. The scanning of JSON data is mainly completed by two parts working together. The first part is the JsonParser responsible for parsing the JSON object, which is implemented based on the SAX-style API of rapidjson. It reads data from the char stream, parses it, and calls the corresponding callback function when encountering the corresponding JSON element. See the comments of the JsonParser class for more details. The other part is the HdfsJsonScanner, which inherits from HdfsScanner and provides callback functions for the JsonParser. The callback functions are responsible for providing data buffers to the Parser and converting and materializing the Parser's parsing results into RowBatch. It should be noted that the parser returns numeric values as strings to the scanner. The scanner uses the TextConverter class to convert the strings to the desired types, similar to how the HdfsTextScanner works. This is an advantage compared to using number value provided by rapidjson directly, as it eliminates concerns about inconsistencies in converting decimals (e.g. losing precision). Added a startup flag, enable_json_scanner, to be able to disable this feature if we hit critical bugs in production. Limitations - Multiline json objects are not fully supported yet. It is ok when each file has only one scan range. However, when a file has multiple scan ranges, there is a small probability of incomplete scanning of multiline JSON objects that span ScanRange boundaries (in such cases, parsing errors may be reported). For more details, please refer to the comments in the 'multiline_json.test'. - Compressed JSON files are not supported yet. - Complex types are not supported yet. Tests - Most of the existing end-to-end tests can run on JSON format. - Add TestQueriesJsonTables in test_queries.py for testing multiline, malformed, and overflow in JSON. Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569 Reviewed-on: http://gerrit.cloudera.org:8080/19699 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-09-05 16:55:41 +00:00
Joe McDonnell	82bd087fb1	IMPALA-11973: Add absolute_import, division to all eligible Python files This takes steps to make Python 2 behave like Python 3 as a way to flush out issues with running on Python 3. Specifically, it handles two main differences: 1. Python 3 requires absolute imports within packages. This can be emulated via "from __future__ import absolute_import" 2. Python 3 changed division to "true" division that doesn't round to an integer. This can be emulated via "from __future__ import division" This changes all Python files to add imports for absolute_import and division. For completeness, this also includes print_function in the import. I scrutinized each old-division location and converted some locations to use the integer division '//' operator if it needed an integer result (e.g. for indices, counts of records, etc). Some code was also using relative imports and needed to be adjusted to handle absolute_import. This fixes all Pylint warnings about no-absolute-import and old-division, and these warnings are now banned. Testing: - Ran core tests Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b Reviewed-on: http://gerrit.cloudera.org:8080/19588 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
Michael Smith	f8443d9828	IMPALA-11697: Enable SkipIf.not_hdfs tests for Ozone Convert SkipIf.not_hdfs to SkipIf.not_dfs for tests that require filesystem semantics, adding more feature test coverage with Ozone. Creates a separate not_scratch_fs flag for scratch dir tests as they're not supported with Ozone yet. Filed IMPALA-11730 to address this. Preserves not_hdfs for a specific test that uses the dfsadmin CLI to put it in safemode. Adds sfs_ofs_unsupported for SmallFileSystem tests. This should work for many of our filesystems based on `ebb1e2fa99/ql/src/java/org/apache/hadoop/hive/ql/io/SingleFileSystem.java (L62-L87)`. Makes sfs tests work on S3. Adds hardcoded_uris for IcebergV2 tests where deletes are implemented as hardcoded URIs in parquet files. Adding a parquet read/write library for Python is beyond the scope of this patch. Change-Id: Iafc1dac52d013e74a459fdc4336c26891a256ef1 Reviewed-on: http://gerrit.cloudera.org:8080/19254 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-11-21 18:51:30 +00:00
Gergely Fürnstáhl	bb4903aeb0	IMPALA-10748: Remove enable_orc_scanner flag Impala has supported reading ORC files by default for quite some time. Removed the enable_orc_scanner flag and the related code and test; disabling ORC support is no longer possible. Removed notes on how to disable ORC support from docs. Change-Id: I7ff640afb98cbe3aa46bf03f9bff782574c998a5 Reviewed-on: http://gerrit.cloudera.org:8080/18188 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-02-03 03:13:41 +00:00
stiga-huang	ff52064823	IMPALA-7844: Default to not allow ordinals in HAVING clause Based on the discussion of the previous patch, we decided not to allow ordinals in the HAVING clause since 4.0. It's a non-standard feature that was unintentionally supported by Impala 3.x and earlier versions. This patch disables it by default and adds a feature flag to turn it on for users that do depend on it. Tests: - Modify existing FE tests to test on the flag. - Add custom cluster test to verify the flag works. Change-Id: I0a57b8b65b046fae483e485e8391f8222fa586a5 Reviewed-on: http://gerrit.cloudera.org:8080/17415 Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-05-10 15:17:35 +00:00
stiga-huang	3ba8d637cd	IMPALA-10256: Skip test_disable_incremental_metadata_updates on S3 tests IMPALA-10113 adds a test for disabling the incremental_metadata_updates flag to verify that metadata propagation still works correctly. The test invokes two test files which are used in metadata/test_ddl.py. One test file is about HDFS caching. It should only be run on the HDFS file system, so we should mark the test with "SkipIf.not_hdfs". Tests: - Run CORE test on S3 build. Change-Id: I0b922de84cff0a1e0771d5a8470bdd9f153f85f0 Reviewed-on: http://gerrit.cloudera.org:8080/16616 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-10-24 06:10:52 +00:00
stiga-huang	308d692a1b	IMPALA-10113: Add feature flag for incremental metadata updates This patch adds a feature flag, enable_incremental_metadata_updates, to turn off incremental metadata (i.e. partition level metadata) propagation from catalogd to coordinators. It defaults to true. When setting to false, catalogd will send metadata updates in table granularity (the legacy behavior). Also fixes a bug of logging an empty aggregated partition update log when no partitions are changed in a DDL. Tests: - Run CORE tests with this flag set to true and false. - Add tests with enable_incremental_metadata_updates=false. Change-Id: I98676fc8ca886f3d9f550f9b96fa6d6bff178ebb Reviewed-on: http://gerrit.cloudera.org:8080/16436 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-10-14 14:15:53 +00:00
Tim Armstrong	0e1f304e6b	IMPALA-7792: fix disabling of ORC scans The previous approach could lead to hangs or cryptic error messages because it removed the ORC data type from a lookup table. Instead check explicitly in the planner for ORC scans and throw a more helpful error message. Testing: Added custom cluster test to exercise code and check error message. Change-Id: I209e79b18745c48d0182800a916d6566083f4609 Reviewed-on: http://gerrit.cloudera.org:8080/11835 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-01 07:30:58 +00:00

9 Commits
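
The division change described in IMPALA-11973 can be illustrated with a small, self-contained sketch. The function names below are hypothetical and do not come from the Impala codebase; they only show how the `__future__` imports make `/` behave as true division on both Python 2 and Python 3, and why index or count calculations need the explicit `//` operator.

```python
from __future__ import absolute_import, division, print_function


def average_row_size(total_bytes, num_rows):
    # Hypothetical helper: with "division" imported, '/' is true division
    # on Python 2 as well as Python 3, so the result is a float.
    return total_bytes / num_rows


def middle_index(items):
    # Hypothetical helper: an index must be an integer, so floor
    # division '//' is used explicitly.
    return len(items) // 2


print(average_row_size(7, 2))
print(middle_index(["a", "b", "c", "d", "e"]))
```

Without the `division` import, Python 2 would evaluate `7 / 2` as `3`, which is the kind of silent difference the commit aims to flush out.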