Commit Graph

9515 Commits

Joe McDonnell
2357958e73 IMPALA-10304: Fix log level and format for pytests
Recent testing showed that the pytests are not
respecting the log level and format set in
conftest.py's configure_logging(). Instead, they
use the default log level of WARNING and the
default formatter.

The issue is that logging.basicConfig() is only
effective the first time it is called. The code
in lib/python/impala_py_lib/helpers.py does a
call to logging.basicConfig() at the global
level, and conftest.py imports that file. This
renders the call in configure_logging()
ineffective.

To avoid this type of confusion, logging.basicConfig()
should only be called from the main() functions for
libraries. This removes the call in lib/python/impala_py_lib
(as it is not needed for a library without a main function).
It also fixes up various other locations to move the
logging.basicConfig() call to the main() function.
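The first-call-wins behavior of logging.basicConfig() described above can
be demonstrated in isolation (a standalone Python sketch, not Impala code):

```python
import logging

# basicConfig() configures the root logger only if it has no handlers yet,
# so the FIRST call wins and later calls are silently ignored -- the exact
# pitfall described above.
logging.basicConfig(level=logging.WARNING, format="%(levelname)s %(message)s")
logging.basicConfig(level=logging.INFO)  # no-op: root logger already configured

root_level = logging.getLogger().level  # still WARNING, not INFO
```

Calling basicConfig() only from main() (or passing force=True on
Python 3.8+) avoids this silent no-op.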

Testing:
 - Ran the end to end tests and custom cluster tests
 - Confirmed the logging format
 - Added an assert in configure_logging() to test that
   the INFO log level is applied to the root logger.

Change-Id: I5d91b7f910b3606c50bcba4579179a0bc8c20588
Reviewed-on: http://gerrit.cloudera.org:8080/16679
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-30 15:32:21 +00:00
skyyws
3e06d600c2 IMPALA-10166 (part 1): ALTER TABLE for Iceberg tables
This patch mainly implements ALTER TABLE for Iceberg
tables. We currently support these statements:
  * ADD COLUMNS
  * RENAME TABLE
  * SET TBL_PROPERTIES
  * SET OWNER
We forbid DROP COLUMN/REPLACE COLUMNS/ALTER COLUMN in this
patch, since these statements may make Iceberg tables
unreadable. We may support column resolution by field id in the
near future; after that, we will support DROP COLUMN/REPLACE
COLUMNS/ALTER COLUMN for Iceberg tables.
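The supported/forbidden split above amounts to a simple allowlist; a
rough sketch (hypothetical helper, not Impala's analyzer code):

```python
# Hypothetical allowlist mirroring the ALTER TABLE operations listed
# above; Impala's real check lives in the frontend analyzer.
SUPPORTED_ICEBERG_ALTERS = {"ADD COLUMNS", "RENAME TABLE",
                            "SET TBL_PROPERTIES", "SET OWNER"}
FORBIDDEN_ICEBERG_ALTERS = {"DROP COLUMN", "REPLACE COLUMNS", "ALTER COLUMN"}

def iceberg_alter_allowed(op: str) -> bool:
    # Operations not on the allowlist are rejected at analysis time.
    return op.strip().upper() in SUPPORTED_ICEBERG_ALTERS
```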

A few things still need attention here:
1. RENAME TABLE is not supported for HadoopCatalog/HadoopTables,
even though we have already implemented the 'RENAME TABLE'
statement, so we only rename the table in the Hive Metastore
for external tables.
2. We cannot ADD/DROP PARTITION yet since there is no API for
that in Iceberg, but related work is already in progress in
Iceberg.

Testing:
- Iceberg table alter test in test_iceberg.py
- Iceberg table negative test in test_scanners.py
- Rename tables in iceberg-negative.test

Change-Id: I5104cc47c7b42dacdb52983f503cd263135d6bfc
Reviewed-on: http://gerrit.cloudera.org:8080/16606
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-30 14:03:29 +00:00
Qifan Chen
d164bf42f9 IMPALA-10294: Improvement to test_skew_reporting_in_runtime_profile
This fix improved the skew reporting test by lowering the threshold
to 0 and by taking care of the extreme case of no skews.

Testing:
1. Unit testing.

Change-Id: I7a36551f2507d724891707d26b7394fbe3a5657b
Reviewed-on: http://gerrit.cloudera.org:8080/16662
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-30 05:34:16 +00:00
wzhou-code
f4ed07c8eb IMPALA-10206: Avoid MD5 Digest Authorization in FIPS approved mode
To comply with FIPS requirements, we should use the OpenSSL
libraries for cryptographic hash functions instead of our own
implementations. This patch replaces the MD5 and SHA1 functions
in the Squeasel web server with OpenSSL APIs. It also forces
Digest Authorization off for the web server in FIPS approved
mode, since Digest Authorization uses an MD5 hash, which does
not comply with FIPS 140-2.

Testing:
 - Passed webserver-test.
 - Passed exhaustive tests.
 - Manually verified HTTP Digest Authorization could not be enabled
   by setting webserver_password_file on a FIPS enabled cluster.

Change-Id: Ie075389b3ab65c612d64ba58e16a10b19bdf4d6f
Reviewed-on: http://gerrit.cloudera.org:8080/16630
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-30 04:33:10 +00:00
Joe McDonnell
047906bd6b IMPALA-10302: Enable logging at the INFO level for test_scanners_fuzz.py
This changes test_scanners_fuzz.py to set the logging level
to INFO. By default, it is WARNING, so it was missing some useful
INFO log messages like the random seed used. This also fixes formatting
on one of the log lines.

Testing:
 - Ran test_scanners_fuzz.py locally and checked to make
   sure the INFO messages were present

Change-Id: Ida4a9cbed6572520998def9618a8b4189c1ba799
Reviewed-on: http://gerrit.cloudera.org:8080/16677
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-30 04:31:30 +00:00
Thomas Tauber-Marshall
01e1b4df80 IMPALA-10303: Fix warnings from impala-shell with --quiet
When the --quiet flag is used with impala-shell, the intention is that
if the query is successful then only the query results should be
printed.

This patch fixes two cases where --quiet was not being respected:
- When the HTTP transport is used and --client_connect_timeout_ms
  is set, a warning is printed that the timeout is not applied.
- When running in non-interactive mode, a warning is printed that
  --live_progress is automatically disabled. This warning is now
  also only printed if --live_progress is actually set.
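The gating logic for the second case can be sketched like this
(hypothetical names, not the actual impala-shell source):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ShellOptions:
    # Hypothetical stand-in for impala-shell's parsed command-line options.
    quiet: bool = False
    live_progress: bool = False
    interactive: bool = True

def live_progress_warning(opts: ShellOptions) -> Optional[str]:
    # Sketch of the fixed gating: warn only when --live_progress was
    # actually set, we are non-interactive, and --quiet is off.
    if opts.live_progress and not opts.interactive and not opts.quiet:
        return "WARNING: live_progress only applies in interactive mode"
    return None
```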

Testing:
- Added a test that runs a simple query with --quiet and confirms the
  output is as expected.

Change-Id: I1e94c9445ffba159725bacd6f6bc36f7c91b88fe
Reviewed-on: http://gerrit.cloudera.org:8080/16673
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-30 02:17:29 +00:00
wzhou-code
1682afcda6 IMPALA-10298: Change column mask hash as SHA512 in FIPS mode
The column masking API is called by Ranger during policy
evaluation. The Ranger team requires changing the column mask
hash to SHA-512 in FIPS mode without changing the API.
This patch changes MaskFunctions::MaskHash() for the string
type to use SHA-512 in FIPS mode.

Testing:
 - Passed exhaustive tests.
 - Manually test the API.

Change-Id: I422d4b11b31c3e6eb7963260a1da730579c4ca74
Reviewed-on: http://gerrit.cloudera.org:8080/16671
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-30 01:16:07 +00:00
Fucun Chu
193c2e773f IMPALA-10132: Implement ds_hll_estimate_bounds_as_string() function.
This function receives a string that is a serialized Apache
DataSketches HLL sketch and an optional kappa, the number of
standard deviations from the mean: 1, 2 or 3 (default 2). It
returns the estimate and bounds as comma-separated values.
The result is three values: estimate, lower bound and upper bound.

   ds_hll_estimate_bounds_as_string(sketch [, kappa])

Kappa:
 1 represents the 68.3% confidence bounds
 2 represents the 95.4% confidence bounds
 3 represents the 99.7% confidence bounds

Note, ds_hll_estimate_bounds() should return an array of
doubles as the result, but for that we have to wait for complex
type support. Until then, we provide
ds_hll_estimate_bounds_as_string(), which can be deprecated
once we have array support. The tracking Jira for returning
complex types from functions is IMPALA-9520.

Example:
select ds_hll_estimate_bounds_as_string(ds_hll_sketch(int_col)) from
functional_parquet.alltypestiny;
+----------------------------------------------------------+
| ds_hll_estimate_bounds_as_string(ds_hll_sketch(int_col)) |
+----------------------------------------------------------+
| 2,2,2.0002                                               |
+----------------------------------------------------------+
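Until array support arrives, a client has to unpack the comma-separated
string itself; a small sketch (hypothetical helper name):

```python
def parse_hll_bounds(result: str):
    # "2,2,2.0002" -> (estimate, lower_bound, upper_bound) as floats,
    # matching the value order documented above.
    estimate, lower, upper = (float(v) for v in result.split(","))
    return estimate, lower, upper
```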

Change-Id: I46bf8263e8fd3877a087b9cb6f0d1a2392bb9153
Reviewed-on: http://gerrit.cloudera.org:8080/16626
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-29 17:45:01 +00:00
Qifan Chen
1bd27a3ea8 IMPALA-7097 Print EC info in the query plan and profile
This fix added the functionality to show the number of erasure coded
files and the total size of such files in the scan node in the
query plan and profile. Shown below are two examples for the HDFS file
system.

Non-partitioned table:
00:SCAN HDFS [default.test_show_ec_nonpart, RANDOM]
   HDFS partitions=1/1 files=2 size=1.65KB
   erasure coded: files=2 size=1.65KB
   stored statistics:

Partitioned table:
00:SCAN HDFS [default.test_show_ec_part]
   HDFS partitions=4/4 files=4 size=2.36KB
   erasure coded: files=3 size=1.77KB
   row-size=12B cardinality=999

Testing:
1. Unit testing;
2. Ran Core tests successfully.

Change-Id: I6ea378914624a714fde820d290b3b9c43325c6a1
Reviewed-on: http://gerrit.cloudera.org:8080/16587
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-29 10:35:27 +00:00
Yida Wu
d62a04078d IMPALA-10102 Fix Impalad crashes when writing a parquet file with
large rows.

The crash happens when trying to dereference a null pointer
returned from a failed memory allocation from memory pool.

TryAllocate is used instead of Allocate and null check is added
for the large memory allocations such as buffer for dictionary
page and compressed dictionary page. The memory allocation is
most likely to fail for these large allocations when memory is
scarce.

This change fixes the crash in this particular code path;
however, in practice there could still be an OOM issue that
leads to the process being killed by the OS. The change does
not fix the OOM issue: users need to configure mem_limit
(a start-up option) properly to avoid the OOM crash.
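The pattern of the fix (checked allocation instead of dereferencing a
null result) can be sketched generically in Python, not the actual C++
writer code:

```python
class Pool:
    # Toy memory pool illustrating the TryAllocate pattern: return None
    # on failure so the caller must check before using the buffer.
    def __init__(self, capacity: int):
        self.available = capacity

    def try_allocate(self, nbytes: int):
        if nbytes > self.available:
            return None  # caller handles failure instead of crashing
        self.available -= nbytes
        return bytearray(nbytes)

def write_dictionary_page(pool: Pool, size: int) -> bool:
    # Hypothetical caller: propagate a mem-limit error on failed
    # allocation rather than touching a null buffer.
    buf = pool.try_allocate(size)
    if buf is None:
        return False
    # ... fill buf with page data ...
    return True
```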

Test:
Ran a script to redo the test mentioned in the Jira thirty
times; no crash happened.

Change-Id: I0dee474cceb0c370278d290eb900c05769b23dec
Reviewed-on: http://gerrit.cloudera.org:8080/16638
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-29 01:45:11 +00:00
Qifan Chen
09fdeb56f8 IMPALA-10267: Impala crashes in HdfsScanner::WriteTemplateTuples() with negative num_tuples
This fix enhances HdfsAvroScanner::ProcessRange() with a DCHECK
to help catch negative num_tuples. The DCHECK specifically
checks that num_records_in_block_ is always greater than
record_pos_.

Testing:
1. Unit testing.

Change-Id: If88fd3aa4c96a69e37d060031f7432d27d069c62
Reviewed-on: http://gerrit.cloudera.org:8080/16672
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2020-10-29 00:49:54 +00:00
Vihang Karajgaonkar
bd4a38ea33 IMPALA-10277: Fix test_catalogd_debug_actions on S3 builds
test_catalogd_debug_actions fails on S3 builds because loading
a table on S3 is faster than on HDFS. The fix reduces the
expected delay when the debug actions are set so that the test
works on S3 builds.

Testing:
1. Ran the test on the s3 build jenkins job.
2. Ran the test on HDFS build jenkins job.

Change-Id: I8348a33ff8e9c7812540e014f4de2c65636da64f
Reviewed-on: http://gerrit.cloudera.org:8080/16664
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-28 05:40:51 +00:00
stiga-huang
8ac382c784 IMPALA-10075: Reuse unchanged partition instances
Currently, we always replace the partition instance when we
reload a partition. If a partition remains the same after
reloading, we should reuse the old partition instance so that
we don't send redundant updates for these partitions. This
reduces the size of the catalog topic update: when a huge table
is REFRESHed, catalogd only propagates the changed partitions.
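The reuse rule can be sketched with a plain dictionary cache
(hypothetical helpers, not catalogd's classes):

```python
def reload_partition(cache: dict, name: str, load):
    # Reload a partition but keep the OLD instance if nothing changed,
    # so a later topic update can skip it by identity (sketch only).
    fresh = load(name)
    old = cache.get(name)
    if old is not None and old == fresh:
        return old  # unchanged: reuse instance, no redundant update
    cache[name] = fresh
    return fresh
```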

Tests:
 - Add tests to verify that partition instances are reused after some
   DDL/DMLs.

Change-Id: I2dd645c260d271291021e52fdac4b74924df1170
Reviewed-on: http://gerrit.cloudera.org:8080/16392
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-27 04:47:13 +00:00
Zoltan Borok-Nagy
981ef10465 IMPALA-10215: Implement INSERT INTO for non-partitioned Iceberg tables (Parquet)
This commit adds support for INSERT INTO statements against Iceberg
tables when the table is non-partitioned and the underlying file format
is Parquet.

We still use Impala's HdfsParquetTableWriter to write the data files,
though they needed some modifications to conform to the Iceberg spec,
namely:
 * write Iceberg/Parquet 'field_id' for the columns
 * TIMESTAMPs are encoded as INT64 micros (without time zone)
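The INT64-micros encoding mentioned above (a timestamp without time
zone, counted from the Unix epoch) can be sketched as follows; this is
an illustration of the encoding, not Impala's writer code:

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)

def timestamp_to_micros(ts: datetime) -> int:
    # Iceberg/Parquet stores TIMESTAMP as INT64 microseconds since the
    # Unix epoch, with no time zone attached.
    delta = ts - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
```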

We use DmlExecState to transfer information from the table sink
operators to the coordinator, then updateCatalog() invokes the
AppendFiles API to add files atomically. DmlExecState is encoded in
protobuf, communication with the Frontend uses Thrift. Therefore to
avoid defining Iceberg DataFile multiple times they are stored in
FlatBuffers.

The commit also does some corrections on Impala type <-> Iceberg type
mapping:
 * Impala TIMESTAMP is Iceberg TIMESTAMP (without time zone)
 * Impala CHAR is Iceberg FIXED

Testing:
 * Added INSERT tests to iceberg-insert.test
 * Added negative tests to iceberg-negative.test
 * I also did some manual testing with Spark. Spark is able to read
   Iceberg tables written by Impala until we use TIMESTAMPs. In that
   case Spark rejects the data files because it only accepts TIMESTAMPS
   with time zone.
 * Added concurrent INSERT tests to test_insert_stress.py

Change-Id: I5690fb6c2cc51f0033fa26caf8597c80a11bcd8e
Reviewed-on: http://gerrit.cloudera.org:8080/16545
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-26 20:01:09 +00:00
stiga-huang
3ba8d637cd IMPALA-10256: Skip test_disable_incremental_metadata_updates on S3 tests
IMPALA-10113 added a test for disabling the
incremental_metadata_updates flag to verify that metadata
propagation still works correctly. The test invokes two test
files that are also used in metadata/test_ddl.py. One of the
test files exercises HDFS caching, which can only run on the
HDFS file system, so we mark the test with "SkipIf.not_hdfs".

Tests:
 - Run CORE test on S3 build.

Change-Id: I0b922de84cff0a1e0771d5a8470bdd9f153f85f0
Reviewed-on: http://gerrit.cloudera.org:8080/16616
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-24 06:10:52 +00:00
stiga-huang
2a1d3acaf1 IMPALA-9870: impala-shell 'summary' to show original and retried queries
This patch extends the 'summary' command of impala-shell to support
retrieving the summary of the original query attempt. The new syntax is

SUMMARY [ALL | LATEST | ORIGINAL]

If 'ALL' is specified, both the latest and original summaries are
printed. If 'LATEST' is specified, only the summary of the latest query
attempt is printed. If 'ORIGINAL' is specified, only the summary of the
original query attempt is printed. The default option is 'LATEST'.
Support for this has only been added to HS2 given that Beeswax is being
deprecated soon.
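The new syntax can be sketched as a tiny argument parser (hypothetical
helper, not the shell's actual command dispatch):

```python
def parse_summary_command(arg: str) -> str:
    # SUMMARY [ALL | LATEST | ORIGINAL]; LATEST is the default when no
    # option is given, matching the behavior described above.
    choice = (arg or "LATEST").strip().upper()
    if choice not in ("ALL", "LATEST", "ORIGINAL"):
        raise ValueError("usage: SUMMARY [ALL | LATEST | ORIGINAL]")
    return choice
```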

Tests:
 - Add new tests in test_shell_interactive.py

Change-Id: I8605dd0eb2d3a2f64f154afb6c2fd34251c1fec2
Reviewed-on: http://gerrit.cloudera.org:8080/16502
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-24 05:11:06 +00:00
Bikramjeet Vig
fd1ea147b3 IMPALA-10210: Skip Authentication for connection from a trusted domain
Adds the ability to skip authentication for connection requests
originating from a trusted domain over the hs2 http endpoint and
the http webserver endpoint. The trusted domain can be specified
using the newly added "--trusted_domain" startup flag. Additionally,
if the startup flag "--trusted_domain_use_xff_header" is set to
true, impala will switch to using the 'X-Forwarded-For' HTTP
header to extract the origin address when checking whether the
connection originated from a trusted domain.

Other highlights:
- This still requires the client to specify a username via a basic
  auth header.
- To avoid looking up hostname for every http request, a cookie is
  returned on the first auth attempt which will then be subsequently
  used for further communication on the same connection.
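A minimal sketch of the origin-extraction logic described above
(hypothetical helper names; the real code is in Impala's C++ auth path):

```python
def origin_address(headers: dict, remote_addr: str, use_xff: bool) -> str:
    # With --trusted_domain_use_xff_header, take the client address from
    # the X-Forwarded-For header (first hop is the original client);
    # otherwise fall back to the socket peer address.
    xff = headers.get("X-Forwarded-For")
    if use_xff and xff:
        return xff.split(",")[0].strip()
    return remote_addr

def in_trusted_domain(hostname: str, trusted_domain: str) -> bool:
    # Suffix match of the resolved hostname against the trusted domain.
    return bool(trusted_domain) and hostname.endswith(trusted_domain)
```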

Testing:
Added tests for both the hs2 http endpoint and the webserver http endpoint

Change-Id: I09234078e2314dbc3177d0e869ae028e216ca699
Reviewed-on: http://gerrit.cloudera.org:8080/16542
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-24 02:30:51 +00:00
Riza Suminto
4b5c66f329 IMPALA-10266: Identify FileSystem type based on the protocol scheme.
Frontend identifies the type of FileSystem in two ways. The first is
done using the instanceof operator with subclasses of
org.apache.hadoop.fs.FileSystem. The second is by checking the
FileSystem protocol scheme. This patch standardizes the FileSystem
identification based on the scheme only.
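The scheme-based identification can be sketched in a few lines (a toy
Python version of the idea; the real logic lives in FileSystemUtil.java):

```python
from urllib.parse import urlparse

def fs_scheme(path: str, default: str = "hdfs") -> str:
    # Identify the file system purely by the path's protocol scheme,
    # e.g. 's3a://bucket/key' -> 's3a'. Paths without a scheme fall back
    # to the assumed default file system.
    return urlparse(path).scheme or default
```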

Testing:
- Add several tests in FileSystemUtilTest to check validity of some
  FileSystemUtil functions.
- Run and pass core tests.

Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b
Reviewed-on: http://gerrit.cloudera.org:8080/16628
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-24 01:14:37 +00:00
Joe McDonnell
cfa8a7a5e5 IMPALA-10278: Use full libraries for impalad_executor Docker container
This backs out the piece of IMPALA-10016 that used a pared-down
set of libraries for the impalad_executor. That pared-down
set was missing org.apache.impala.common.JniUtil, which
prevented the impalad_executor container from starting up.

Testing:
 - Ran a docker core job with one coord_exec and two executors
   and it was able to startup where it wouldn't before

Change-Id: Ieecca61cd3c11f446b922a04fdeb5fd0c90fc971
Reviewed-on: http://gerrit.cloudera.org:8080/16640
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-23 21:20:44 +00:00
Andrew Sherman
4159a8085c IMPALA-10244: Make non-scalable failures to dequeue observable.
One of the important ways to observe Impala throughput is by looking at
when queries are queued. This can be an indication that more resources
should be added to the cluster by adding more executor groups. This is
only a good strategy if adding more resources will help with the current
workload. In some situations the head of the query queue cannot be
executed because of resource constraints on the coordinator. In these
cases the coordinator is the bottleneck so adding more executor groups
will not help. This change is to make these cases observable by adding a
new counter which is incremented when a dequeue fails because of
resource constraints on the coordinator.

The two cases that cause the counter to be incremented are:
- when there are not enough admission control slots on the coordinator
- when there is not enough memory on the coordinator
but it is possible that other conditions may be added in future.

TESTING:
Added new unit tests.
Ran all end-to-end tests.

Change-Id: I3456396ac139c562ad9cd3ac1a624d8f35487518
Reviewed-on: http://gerrit.cloudera.org:8080/16613
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-23 06:35:54 +00:00
Bikramjeet Vig
d459b434b6 IMPALA-10245: Disable test_kudu_scanner when run with erasure coding
The test is disabled for erasure coding (EC) because the
erasure-coded files written during data load cause the expected
behavior to change. This was confirmed by loading data without
EC and turning EC on when running the test.

Change-Id: Ia383c1a788c6c0c66e2ef7c6494fe5fe643956df
Reviewed-on: http://gerrit.cloudera.org:8080/16623
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-23 06:11:04 +00:00
Fang-Yu Rao
eda06f41ce IMPALA-9990: Support SET OWNER for Kudu tables
KUDU-3090 adds the support for table ownership and exposes the API's of
setting owner on creating and altering tables, which allows Impala to
also pass to Kudu the new owner of the Kudu table for the ALTER TABLE
SET OWNER statement.

Specifically, based on the API of AlterTableOptions#setOwner(), this
patch stores the ownership information of the Kudu table in the
corresponding instance of AlterTableOptions, which will then be passed
to Kudu via a KuduClient.

Testing:
- Added a FE test in AnalyzeKuduDDLTest.java to verify the statement
  could be correctly analyzed.
- Added an E2E test in kudu_alter.test to verify the statement could be
  correctly executed when the integration between Kudu and HMS is not
  enabled.
- Added an E2E test in kudu_hms_alter.test and verified that the
  statement could be correctly executed when the integration between
  Kudu and HMS is enabled after manually re-enabling
  TestKuduHMSIntegration::test_kudu_alter_table(). Note that this was
  not possible before IMPALA-10092 was resolved due to a bug in the
  class of CustomClusterTestSuite. In addition, we may need to delete
  the Kudu table 'simple' via a Kudu-Python client if the E2E test
  complains that the Kudu table already exists, which may be related to
  IMPALA-8751.
- Manually verified that the views of Kudu server and HMS are consistent
  for a synchronized Kudu table after the ALTER TABLE SET OWNER
  statement even though the Kudu table was once an external and
  non-synchronized table, meaning that the owner from Kudu's perspective
  could be different than that from HMS' perspective. Such a discrepancy
  could be created if we execute the ALTER TABLE SET OWNER statement for
  an external Kudu table with the property of 'external.table.purge'
  being false. The test is performed manually because currently the
  Kudu-Python client adopted in Impala's E2E tests is not up to date so
  that the field of 'owner' cannot be accessed in the E2E tests. On the
  other hand, to verify the owner of a Kudu table from Kudu's
  perspective, we used the latest Kudu-Python client as provided at
  github.com/apache/kudu/tree/master/examples/python/basic-python-example.
- Verified that the patch could pass the exhaustive tests in the DEBUG
  mode.

Change-Id: I29d641efc8db314964bc5ee9828a86d4a44ae95c
Reviewed-on: http://gerrit.cloudera.org:8080/16273
Reviewed-by: Vihang Karajgaonkar <vihang@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-23 04:01:26 +00:00
Tim Armstrong
227e43f481 IMPALA-10216: add logging to help debug flaky test
This commit adds additional info to the assertions to
help debug it if it reoccurs.

Change-Id: I09984dd3cea686808115ca4cb8c88d24271d8cc1
Reviewed-on: http://gerrit.cloudera.org:8080/16620
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-22 23:44:33 +00:00
Qifan Chen
61a020d0f8 IMPALA-10007: Impala development environment does not support
Ubuntu 20.04

This is a minor amendment to a previously merged change with
ChangeId I4f592f60881fd8f34e2bf393a76f5a921505010a, to address
additional review comments. In particular, the original commit
referred to Ubuntu 20.4 whereas it should have used Ubuntu 20.04.

Change-Id: I7db302b4f1d57ec9aa2100d7589d5e814db75947
Reviewed-on: http://gerrit.cloudera.org:8080/16241
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-22 05:03:32 +00:00
Vihang Karajgaonkar
15c3b13e97 IMPALA-10219: Expose DEBUG_ACTION query option in catalog
This patch enables DEBUG_ACTION in the catalog service's
Java code. Specifically, the DEBUG_ACTION query option is now
added to TResetMetadataRequest and TExecDdlRequest so that we
can inject delays while executing refresh or DDL statements.

For example,
1. To inject a delay of 100ms per HDFS list operation
during refresh statement set the following query option:

set debug_action=catalogd_refresh_hdfs_listing_delay:SLEEP@100;

2. To inject a delay of 100ms in alter table recover
partitions statement:

set debug_action=catalogd_table_recover_delay:SLEEP@100;

3. To inject a delay of 100ms in compute stats statement

set debug_action=catalogd_update_stats_delay:SLEEP@100;

Note that this option only adds the delay during the
update_stats phase of the compute stats execution.
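The debug-action strings above follow a 'label:ACTION@arg' shape;
parsing them can be sketched as follows (hypothetical helper, not
Impala's DebugAction implementation):

```python
def parse_debug_action(option: str):
    # "catalogd_update_stats_delay:SLEEP@100" ->
    # ("catalogd_update_stats_delay", "SLEEP", 100), where the integer
    # is the sleep duration in milliseconds.
    label, action = option.split(":", 1)
    name, arg = action.split("@", 1)
    return label, name, int(arg)
```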

Testing:
1. Added a test which sets the query option and makes
sure that command takes more time than without query option.
2. Added unit tests for the debugAction implementation
logic.

Change-Id: Ia7196b1ce76415a5faf3fa8575a26d22b2bf50b1
Reviewed-on: http://gerrit.cloudera.org:8080/16548
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-22 01:58:47 +00:00
Zoltan Borok-Nagy
9384a18180 IMPALA-10257: Relax check for page filtering
HdfsParquetScanner::CheckPageFiltering() is a bit too strict:
it checks that all column readers agree on the top-level row.
Column readers have different strategies for reading columns.
One strategy reads ahead the Parquet def/rep levels; the other
reads levels and values simultaneously, i.e. with no readahead
of levels.

We calculate the ordinal of the top-level row based on the
repetition level. This means that when we read ahead the rep
levels, the top-level row might point to the value to be
processed next, while the top-level row in the other strategy
always points to the row that was completely processed last.
Because of this, CheckPageFiltering() can allow a difference of
one between the 'current_row_' values of the column readers.
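The relaxed agreement condition can be sketched as (a toy check, not
the scanner's actual code):

```python
def rows_agree(current_rows, tolerance: int = 1) -> bool:
    # Readers that read ahead the rep levels may be one top-level row
    # ahead of the others, so allow a spread of at most 'tolerance'
    # between the readers' current_row_ values.
    return max(current_rows) - min(current_rows) <= tolerance
```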

I also got rid of the DCHECK in CheckPageFiltering() and replaced it
with a more informative error report.

Testing:
* added a test to nested-types-parquet-page-index.test

Change-Id: I01a570c09eeeb9580f4aa4f6f0de2fe6c7aeb806
Reviewed-on: http://gerrit.cloudera.org:8080/16619
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-21 20:30:05 +00:00
stiga-huang
ee4043e1a0 IMPALA-10168: Expose JSON catalog objects in catalogd's debug page
Catalogd has a debug page at '/catalog_object' showing catalog
objects as thrift debug strings. It's inconvenient for tests to
parse the thrift string to extract the interesting information.

This patch extends the page to also support returning JSON
results, which makes it easier for tests to extract complex
information from the catalog objects, e.g. the partition ids of
an HDFS table. Just like getting JSON results from other pages,
the usage is adding a 'json' argument to the URL, e.g.
http://localhost:25020/catalog_object?json&object_type=TABLE&object_name=db1.tbl1

Implementation:
Csaba helped find that Thrift has a protocol,
TSimpleJSONProtocol, which can convert thrift objects to
human-readable JSON strings. This simplifies the implementation
a lot. However, TSimpleJSONProtocol is not implemented in C++
yet (THRIFT-2476), so we do the conversion in the FE to use its
Java implementation.

Tests:
 - Add tests to verify json fields existence.

Change-Id: I15f256b4e3f5206c7140746694106e03b0a4ad92
Reviewed-on: http://gerrit.cloudera.org:8080/16449
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-21 06:31:47 +00:00
Qifan Chen
6dbf1ca09c IMPALA-6628: Use unqualified table references in .test files run from test_queries.py
This fix modified the following tests launched from
test_queries.py by removing references to the 'functional'
database whenever possible. The objective of the change is to
allow more testing coverage with databases other than the
single 'functional' database. In this fix, no new tables were
added and no expected results were altered.

  empty.test
  inline-view-limit.test
  inline-view.test
  limit.test
  misc.test
  sort.test
  subquery-single-node.test
  subquery.test
  top-n.test
  union.test
  with-clause.test

It was determined that the other tests in
testdata/workloads/functional-query/queries/QueryTest either do
not refer to 'functional' or the references are required for
some reason.

Testing
   Ran query_tests on these changed tests with exhaustive exploration
   strategy.

Change-Id: Idd50eaaaba25e3bedc2b30592a314d2b6b83f972
Reviewed-on: http://gerrit.cloudera.org:8080/16603
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-21 05:20:33 +00:00
Qifan Chen
65a0325572 IMPALA-10178 Run-time profile shall report skews
This fix addresses a current limitation of the runtime profile:
skews in certain operators, such as in the rows read counter
(RowsRead) of the scan operators, are not reported. A skew
condition exists when the number of rows processed by each
operator instance is not about the same; it can be detected
through the coefficient of variation (CoV). A high CoV
(say > 1.0) usually implies the existence of skew.

With the fix, such skew is detected for the following counters
  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE
  2. ProbeRows and BuildRows in HASH_JOIN_NODE
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE

and reported as follows:
  1. In execution profile, a new skew summary that lists the names
     of the operators with skews;
  2. In the averaged profile for the corresponding operator, the list
     of values of the counter across all fragment instances in the
     backend processes;
  3. Skew detection formula: CoV > limit and mean > 5,000
  4. A new query option 'report_skew_limit'
     < 0: disable skew reporting
     >= 0: enable skew reporting and supply the CoV limit

Examples of skews reported for a hash join and an hdfs scan.

In execution profile:

  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)

  Per Node Peak Memory Usage: ...
  ... ...

In averaged profiles:

  HDFS_SCAN_NODE (id=2): ...
          Skew details: RowsRead ([2004992,1724693,2001351],
                                  CoV=0.07, mean=1910345)
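The CoV-based detection above can be reproduced with a short sketch
(assuming CoV = population standard deviation divided by the mean, which
matches the reported numbers):

```python
from statistics import mean, pstdev

def skew_stats(values):
    # Coefficient of variation (population stddev / mean) and the mean.
    m = mean(values)
    return (pstdev(values) / m if m else 0.0), m

def has_skew(values, cov_limit, mean_floor=5000):
    # Detection rule from the commit message: CoV > limit AND mean > 5,000.
    cov, m = skew_stats(values)
    return cov > cov_limit and m > mean_floor
```

Feeding in the RowsRead values from the averaged-profile example yields
CoV ~0.07 and mean ~1910345, matching the reported skew details.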

Testing:
1. Added test_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Reviewed-on: http://gerrit.cloudera.org:8080/16474
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-20 23:30:51 +00:00
stiga-huang
198bbe280c IMPALA-10255: Fix TestInsertQueries.test_insert fails in exhaustive builds
The patch in IMPALA-10233 added 3 insert statements to
testdata/workloads/functional-query/queries/QueryTest/insert.test.
The test has CREATE TABLE ... LIKE functional.alltypes;
therefore it creates a TEXT table regardless of the test
vector. But the compression codec is determined by the test
vector, and since Impala cannot write compressed text, the test
fails.

The created table should use the same table format as the one in the
test vector.

Tests:
 - Run TestInsertQueries.test_insert in exhaustive mode.

Change-Id: Id0912f751fa04015f1ffdc38f5c7207db7679896
Reviewed-on: http://gerrit.cloudera.org:8080/16609
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-20 06:47:25 +00:00
Joe McDonnell
ca4d6912be IMPALA-10261: Include org/apache/hive/com/google in impala-minimal-hive-exec
Newer versions of Hive shade guava, which means that they require
the presence of artifacts in org/apache/hive/com/google. To
support these newer versions, this adds that path to the inclusions
for impala-minimal-hive-exec.

Testing:
 - Tested with a newer version of Hive that has the shading
   and verified that Impala starts up and functions.

Change-Id: I87ac089fdacc6fc5089ed68be92dedce514050b9
Reviewed-on: http://gerrit.cloudera.org:8080/16614
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-20 04:15:57 +00:00
Csaba Ringhofer
1e2176c849 IMPALA-9918: ORC scanner hits DCHECK when GLOG_v=3
PrintPath assumed that all elements in the path are complex,
and hit a DCHECK if it contained a scalar element. This didn't
seem to cause problems in Parquet, but the ORC scanner called
this function with paths where the last element was scalar.
This problem was probably not discovered because no one tested
ORC scanning with v=3 logging + DEBUG builds.

Also added logging to the events when log levels are changed
through the webpage. In the case of ResetJavaLogLevelCallback
there was already a log line from GlogAppender.java.

Note that the cause of the original issue is still unknown,
as it occurred during custom cluster tests where no other tests
should change the log levels in parallel.

Testing:
- tested the log changes manually

Change-Id: I94e12d2a62ccab5eb5d21675d5f0138f04e622ac
Reviewed-on: http://gerrit.cloudera.org:8080/16611
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-20 01:32:42 +00:00
Joe McDonnell
e76010d628 IMPALA-10226: Change buildall.sh -notests to invoke a single Make target
This is a small cleanup to add specific targets in CMake for
buildall.sh -notests to invoke. Previously, it ran multiple
targets like:
make target1 target2 target3 ...
In manual tests, make builds each target separately, so it is
unable to overlap the builds of the multiple targets. Pushing
it into CMake simplifies the code and allows the targets to
build simultaneously.

Testing:
 - Ran buildall.sh -notests

Change-Id: Id881d6f481b32ba82501b16bada14b6630ba32d2
Reviewed-on: http://gerrit.cloudera.org:8080/16605
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-19 21:17:55 +00:00
stiga-huang
c7f118a860 IMPALA-10248: Fix test_column_storage_attributes date string errors
After IMPALA-10225 bumps the impyla version to 0.17a1, we should expect
impyla to return a datetime.date instead of a string for DATE type data.
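Tests written against the old behavior can normalize both types; a minimal sketch (normalize_date is a hypothetical helper, not part of Impala's test code):

```python
import datetime

def normalize_date(value):
    """Return DATE values as 'YYYY-MM-DD' strings, whether the client
    returned a string (older impyla) or a datetime.date (impyla >= 0.17a1)."""
    if isinstance(value, datetime.date):
        return value.isoformat()
    return value

print(normalize_date(datetime.date(2020, 10, 19)))  # 2020-10-19
print(normalize_date("2020-10-19"))                 # 2020-10-19
```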

Tests:
 - Run test_column_storage_attributes with
   --exploration_strategy=exhaustive to verify the fix.

Change-Id: I618a759a03213efc22a5e54e9a30fa09e8929023
Reviewed-on: http://gerrit.cloudera.org:8080/16608
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-19 10:24:43 +00:00
stiga-huang
faa2d398e6 IMPALA-10233: zorder sort node should output rows in lexical order of partition keys
When inserting to a partitioned hdfs table, the planner will add a sort
node on top of the plan, depending on the clustered/noclustered plan
hint and on the 'sort.columns' table property. If clustering is enabled
in insertStmt or additional columns are specified in the 'sort.columns'
table property, then the ordering columns will start with the clustering
columns, so that partitions can be written sequentially in the table
sink. Any additional non-clustering columns specified by the
'sort.columns' property will be added to the ordering columns and after
any clustering columns.

For Z-order sort type, we should deal with these ordering columns
separately. The clustering columns should still be sorted lexically, and
only the remaining ordering columns sorted in Z-order. So we can
still insert partitions one by one and avoid hitting the DCHECK as
described in the JIRA.
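The ordering described above — lexical on the clustering columns, Z-order only on the rest — can be sketched as a composite sort key. This is a toy model: interleave_bits is a simplified stand-in for Impala's real Z-order comparator and assumes small non-negative ints.

```python
def interleave_bits(values, bits=16):
    """Simplified Z-order key: interleave the bits of each value."""
    key = 0
    for b in range(bits - 1, -1, -1):
        for v in values:
            key = (key << 1) | ((v >> b) & 1)
    return key

def sort_key(row, num_clustering_cols):
    # Clustering columns stay in lexical order so partitions can be
    # written sequentially; only the remaining columns are Z-ordered.
    return (tuple(row[:num_clustering_cols]),
            interleave_bits(row[num_clustering_cols:]))

rows = [(2, 5, 9), (1, 7, 3), (2, 1, 2), (1, 2, 8)]
rows.sort(key=lambda r: sort_key(r, num_clustering_cols=1))
# All rows of partition 1 now precede all rows of partition 2.
```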

Tests
 - Add tests for inserting to a partitioned table with zorder.

Change-Id: I30cbad711167b8b63c81837e497b36fd41be9b54
Reviewed-on: http://gerrit.cloudera.org:8080/16590
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-16 06:40:38 +00:00
Zoltan Borok-Nagy
6542b6070d IMPALA-10243: ConcurrentModificationException during parallel INSERTs
Impala might throw a ConcurrentModificationException during a high
load of INSERTs to the same table. The exception happens during thrift
serialization of TUpdateCatalogResponse, which has a reference to the
metastore table. The serialization happens without a lock, so another
thread might modify the metastore table object in the meantime.

This can potentially happen in CatalogOpExecutor.updateCatalog() which
updates the catalog version and unsets table column statistics.

For some reason I only saw this error with local catalog.

The problem is that in Table.toThrift() we set a reference to the
metastore table object instead of deep copying it. So my fix is to deep
copy the metastore table, this prevents concurrent modifications.
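The deep-copy pattern — snapshotting shared mutable state under a lock so serialization can run without it — looks like this in Python terms (a generic sketch, not Impala's Java code):

```python
import copy
import json
import threading

table_meta = {"name": "t", "stats": {"rows": 100}}
lock = threading.Lock()

def to_response(meta):
    # Snapshot under the lock; serializing the copy afterwards is
    # immune to concurrent mutation of 'meta' by other threads.
    with lock:
        snapshot = copy.deepcopy(meta)
    return json.dumps(snapshot, sort_keys=True)

def mutator():
    with lock:
        table_meta["stats"]["rows"] += 1

print(to_response(table_meta))
```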

Testing
* added stress test 'test_insert_stress.py'

Change-Id: Ie656925d764d5eb26c318703ca425529ecf7a3a3
Reviewed-on: http://gerrit.cloudera.org:8080/16602
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-15 21:31:30 +00:00
Zoltan Borok-Nagy
05ad4ce05f IMPALA-10152: part 1: refactor Iceberg catalog handling
This patch refactors the code a bit to make it easier in the future
to add support for new Iceberg catalogs. We plan to add support for
HiveCatalog in the near future.

Iceberg has two main interfaces to manage tables: Tables and Catalog.
I created a new interface in Impala called 'IcebergCatalog' that
abstracts both Tables and Catalog. Currently there are two
implementations for IcebergCatalog:
* HadoopTablesCatalog for HadoopTables
* HadoopCatalog for HadoopCatalog

This patch also delegates dropTable() to the Iceberg catalogs. Until
this patch we let HMS drop the tables and delete the directories. It
worked fine with the filesystem-based catalogs, but might not work well
with other Iceberg catalogs like HiveCatalog.

Change-Id: Ie69dff6cd6b8b3dc0ba5f7671b8504a936032a85
Reviewed-on: http://gerrit.cloudera.org:8080/16575
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-15 21:28:57 +00:00
Joe McDonnell
97792c4bad IMPALA-10198 (part 2): Add support for mvn versions:set
This adds support for setting the version of Java
artifacts through "mvn versions:set". It changes
the modules to inherit the version from the parent
pom.

Previously, we used a mix of 0.1-SNAPSHOT and
1.0-SNAPSHOT. This now uses 4.0.0-SNAPSHOT across the
board. With each release, we can use "mvn versions:set"
to update the versions. The only exception is the
Hive UDF code that we build for testing. This remains
at version 1.0 to avoid test changes.

Testing:
 - Ran core job
 - Added build-all-flag-combinations.sh case that
   does "mvn versions:set" and runs a build

Change-Id: I661b32e1e445169bac2ffe4f9474f14090031743
Reviewed-on: http://gerrit.cloudera.org:8080/16559
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-15 19:30:13 +00:00
Joe McDonnell
97856478ec IMPALA-10198 (part 1): Unify Java in a single java/ directory
This changes all existing Java code to be submodules under
a single root pom. The root pom is impala-parent/pom.xml
with minor changes to add submodules.

This avoids most of the weird CMake/maven interactions,
because there is now a single maven invocation for all
the Java code.

This moves all the Java projects other than fe into
a top level java directory. fe is left where it is
to avoid disruption (but still is compiled via the
java directory's root pom). Various pieces of code
that reference the old locations are updated.

Based on research, there are two options for dealing
with the shaded dependencies. The first is to have an
entirely separate Maven project with a separate Maven
invocation. In this case, the consumers of the shaded
jars will see the reduced set of transitive dependencies.
The second is to have the shaded dependencies as modules
with a single Maven invocation. The consumer would see
all of the original transitive dependencies and need to
exclude them all. See MSHADE-206/MNG-5899. This chooses
the second.

This only moves code around and does not focus on version
numbers or making "mvn versions:set" work.

Testing:
 - Ran a core job
 - Verified existing maven commands from fe/ directory still work
 - Compared the *-classpath.txt files from fe and executor-deps
   and verified they are the same except for paths

Change-Id: I08773f4f9d7cb269b0491080078d6e6f490d8d7a
Reviewed-on: http://gerrit.cloudera.org:8080/16500
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2020-10-15 19:30:13 +00:00
Qifan Chen
398f17f710 IMPALA-9440 Typo in rpcz.tmpl for inbound connection metrics
This fix corrected a typo in rpcz.tmpl.

1. Unit testing.

Change-Id: Id0fcdcd8f81567bad3d9931f952a00ad815265fa
Reviewed-on: http://gerrit.cloudera.org:8080/16576
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-15 05:09:56 +00:00
Joe McDonnell
7b42e9a439 IMPALA-10127: Fix performance for enforcement of lirs_tombstone_multiple
When enforcing lirs_tombstone_multiple for the LIRS cache,
currently it needs to walk backwards through the recency
list to find the oldest tombstone entry to remove it.
This traversal can involve passing a large number of
non-tombstone entries. In pathological cases, it is
O(N) where N is the number of entries in the cache.

This modifies the LIRS implementation to use a combined
unprotected and tombstone list. The first half of the
list is unprotected entries. The second half is tombstone
entries. The tombstone portion of the list allows the
enforcement of lirs_tombstone_multiple to be O(1)
when finding the oldest tombstone entry.

The unprotected half of the list operates as it did
before, except that the combined list requires maintaining
a pointer to the front of the unprotected list (i.e. the
boundary between the unprotected portion and the tombstone
portion).

Using a combined list means that evicting an unprotected
entry to become a tombstone entry is only updating a
pointer. This is a common operation, so it keeps the
overhead of the tombstone list low.
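The combined-list bookkeeping can be modeled with a deque plus a boundary counter (a toy model of the idea, not the real cache code):

```python
from collections import deque

class CombinedList:
    """Toy model of LIRS's combined unprotected + tombstone list:
    newest entries at the left, tombstones at the right, with a counter
    marking the boundary between the two halves."""
    def __init__(self):
        self.entries = deque()     # newest -> oldest
        self.num_unprotected = 0   # size of the unprotected first half

    def add_unprotected(self, key):
        self.entries.appendleft(key)
        self.num_unprotected += 1

    def evict_oldest_unprotected(self):
        # O(1): the entry stays in place; moving the boundary turns it
        # into a tombstone (this is the common operation).
        assert self.num_unprotected > 0
        self.num_unprotected -= 1

    def trim_oldest_tombstone(self):
        # O(1): the oldest tombstone is always at the tail, so enforcing
        # lirs_tombstone_multiple no longer walks the recency list.
        if len(self.entries) > self.num_unprotected:
            return self.entries.pop()
        return None

c = CombinedList()
for k in "abc":
    c.add_unprotected(k)
c.evict_oldest_unprotected()      # 'a' becomes a tombstone
print(c.trim_oldest_tombstone())  # a
```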

Performance:
For all existing performance test cases in cache-bench,
lookups/sec are within 2% of the current implementation of LIRS.

This adds a new pathological case to cache-bench where
the cache hit rate is 0.2%. This causes extensive use
of the lirs_tombstone_multiple. This case sees a 2000x
improvement, going from 1.3K lookups/sec to 2.96M lookups/sec.

Testing:
 - lirs-cache-test passes (debug and release)
 - This also adds the DATA_CACHE_EVICTION_POLICY environment
   variable to run-all-tests.sh to allow easy testing with LIRS.
 - Ran a core job with 500MB cache using LIRS

Change-Id: I25b697f57c7daacccf8791a5a7b31878a6f7f1d2
Reviewed-on: http://gerrit.cloudera.org:8080/16597
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-15 04:22:39 +00:00
Qifan Chen
41813b27bc IMPALA-9754 buffer_pool_limit error message is confusing
This fix reworded the following two error messages for clarity.

1. "Invalid --buffer_pool_limit value, must be a percentage or positive
    bytes value or percentage:"
2. "Invalid --buffer_pool_clean_pages_limit value, must be a percentage
    or positive bytes value or percentage:"

The fix also enhanced the code to verify that the JVM max heap size is
less than the process memory limit when mem_limit_includes_jvm flag is
set to true, and raise a new error message otherwise:

"Invalid combination of --mem_limit_includes_jvm and JVM max
heap size $0, which must be smaller than process memory limit $1".

Testing:
1. Unit testing;
2. Ran Core tests successfully.

Change-Id: I15ce1cdcc168163b3f5b21e778f9bf6e6b7730d5
Reviewed-on: http://gerrit.cloudera.org:8080/16566
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-15 00:26:12 +00:00
Qifan Chen
63c435cac1 IMPALA-9232 Potential overflow in serializeThriftMsg
This fix added a sanity check to ensure the length of the buffer
holding a serialized object does not go over INT_MAX bytes.
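The kind of check being added can be sketched as follows (a generic sketch; serializeThriftMsg itself is C++ and the helper name here is hypothetical):

```python
INT_MAX = 2 ** 31 - 1  # lengths narrowed to a signed 32-bit int must fit here

def check_serialized_length(buf):
    """Sanity check before a buffer length is narrowed to a 32-bit int."""
    if len(buf) > INT_MAX:
        raise ValueError("serialized message too large: %d bytes" % len(buf))
    return len(buf)

print(check_serialized_length(b"abc"))  # 3
```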

Testing:
1. Unit testing;
2. Ran Core tests successfully.

Change-Id: Ie76028acea84dbe0e88518dae60aaf7e7ca55e9e
Reviewed-on: http://gerrit.cloudera.org:8080/16584
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-14 23:25:42 +00:00
Riza Suminto
c7581b5d8a IMPALA-10220: Fix negative value bug in RpcNetworkTime counter.
Total RPC time was incorrectly computed using
resp_.receiver_latency_ns() in function EndDataStreamCompleteCb(). This
patch fixes the bug by replacing it with eos_rsp_.receiver_latency_ns().
This patch also fixes a logging mistake in LogSlowRpc() to use its 'resp'
parameter instead of 'resp_' field member.

Testing:
- Manually ran the data loading query that exhibits the bug several times
  and verified that the Min value of RpcNetworkTime counter is always
  positive after the patch. The query used in testing is insert query to
  TPC-DS fact table store_sales of scale 10GB in single machine mini
  cluster.
- Add DCHECK to verify that total rpc time is greater than or equal to
  receiver_latency_ns.
- Run and pass core tests.

Change-Id: I2a4d65a3e0f88349bd4ee1b01290bd2c386acc69
Reviewed-on: http://gerrit.cloudera.org:8080/16552
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-14 22:20:06 +00:00
skyyws
0c0985a825 IMPALA-10159: Supporting ORC file format for Iceberg table
This patch implements querying Iceberg tables with the ORC
file format. We can use the following SQL to create a table with
the ORC file format:
  CREATE TABLE default.iceberg_test (
    level string,
    event_time timestamp,
    message string,
  )
  STORED AS ICEBERG
  LOCATION 'hdfs://xxx'
  TBLPROPERTIES ('iceberg.file_format'='orc', 'iceberg.catalog'='hadoop.tables');
Note that there are still some problems when scanning ORC files
with Timestamp; for more details please refer to IMPALA-9967. We may
add new tests with Timestamp type after this JIRA is fixed.

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py

Change-Id: Ib579461aa57348c9893a6d26a003a0d812346c4d
Reviewed-on: http://gerrit.cloudera.org:8080/16568
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-14 19:19:19 +00:00
Gabor Kaszab
13a78fc1b0 IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables
This patch adds support for Iceberg Bucket and Truncate partition
transforms. Both accept a parameter: number of buckets and width
respectively.

Usage:
CREATE TABLE tbl_name (i int, p1 int, p2 timestamp)
PARTITION BY SPEC (
  p1 BUCKET 10,
  p1 TRUNCATE 5
) STORED AS ICEBERG
TBLPROPERTIES ('iceberg.catalog'='hadoop.tables');
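The two transforms map a column value to a partition value roughly as follows (a sketch for ints only; real Iceberg buckets on a Murmur3 hash of the value's serialized form, not the stand-in hash used here):

```python
def truncate_transform(value, width):
    """TRUNCATE W maps an int to the lower bound of its width-W interval."""
    return value - (value % width)  # Python's % keeps this correct for negatives

def bucket_transform(value, num_buckets):
    """BUCKET N maps a value to one of N buckets via a hash (Iceberg
    specifies Murmur3; a multiplicative stand-in hash is used here)."""
    stand_in_hash = (value * 2654435761) % (2 ** 32)
    return stand_in_hash % num_buckets

print(truncate_transform(7, 5))   # 5
print(truncate_transform(-2, 5))  # -5
print(bucket_transform(42, 16) in range(16))  # True
```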

Testing:
  - Extended AnalyzerStmtsTest to cover creating partitioned Iceberg
    tables with the new partition transforms.
  - Extended ParserTest.
  - Extended iceberg-create.test to create Iceberg tables with the new
    partition transforms.
  - Extended show-create-table.test to check that the new partition
    transforms are displayed with their parameters in the SHOW CREATE
    TABLE output.

Change-Id: Idc75cd23045b274885607c45886319f4f6da19de
Reviewed-on: http://gerrit.cloudera.org:8080/16551
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-14 19:07:06 +00:00
Attila Jeges
3d067572dd IMPALA-10224: Add startup flag not to expose debug web url to clients
This patch introduces a new startup flag
--ping_expose_webserver_url (true by default) to control whether
PingImpalaService, PingImpalaHS2Service RPC calls should expose
the debug web url to the client or not.

This is necessary as the debug web UI is not something that
end-users will necessarily have access to.

If the flag is set to false, the RPC calls will return an empty
string instead of the real url signalling that the debug web ui
is not available.

Note that if the webserver is disabled (--enable_webserver flag
is set to false) the RPC calls will behave the same and return an
empty string for the url.

Change-Id: I7ec3e92764d712b8fee63c1f45b038c31c184cfc
Reviewed-on: http://gerrit.cloudera.org:8080/16573
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-14 14:39:39 +00:00
stiga-huang
308d692a1b IMPALA-10113: Add feature flag for incremental metadata updates
This patch adds a feature flag, enable_incremental_metadata_updates, to
turn off incremental metadata (i.e. partition level metadata)
propagation from catalogd to coordinators. It defaults to true. When
set to false, catalogd will send metadata updates in table
granularity (the legacy behavior).

Also fixes a bug of logging an empty aggregated partition update log
when no partitions are changed in a DDL.

Tests:
 - Run CORE tests with this flag set to true and false.
 - Add tests with enable_incremental_metadata_updates=false.

Change-Id: I98676fc8ca886f3d9f550f9b96fa6d6bff178ebb
Reviewed-on: http://gerrit.cloudera.org:8080/16436
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-14 14:15:53 +00:00
Zoltan Borok-Nagy
0fb234923c IMPALA-10055: Fix DCHECK hit on corrupt ORC file
Our ORC scanner could hit a DCHECK on corrupt ORC files. In
test_scanners_fuzz we randomly modify ORC files, so the this test
might hit a DCHECK occasionally.

I converted the DCHECK to a parse error. This way the fuzz test
won't crash the Impala daemon.

Testing:
Unfortunately I don't have an ORC file on which we hit the DCHECK.
So I manually changed the code to always raise this error and
executed the fuzz test to see if it still succeeds.

Change-Id: I18d9f56c3c37afd1a4898ee36f8cc2ddb5049972
Reviewed-on: http://gerrit.cloudera.org:8080/16591
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-14 13:33:21 +00:00
Joe McDonnell
481ea4ab0d IMPALA-9815: Update URL for cdh-releases-rcs maven repo
We use repository.cloudera.com to get some Cloudera-patched
dependencies required by the CDP Hadoop dependencies (e.g.
log4j, logredactor, etc). The URL for repository.cloudera.com
has changed from repository.cloudera.com/content/* to
repository.cloudera.com/artifactory/*. It is possible that the
old URL will be restored. To get things working, this updates
the cdh-releases-rcs to the new URL.

It turns out that cdh-releases-rcs contains all the
artifacts that we would otherwise get from the third-party
repository, so this replaces third-party with cdh-releases-rcs.

Testing:
 - Ran build-all-options-ub1604

Change-Id: I438305565a1e6b7515408a701e9f9e31f7cfd679
Reviewed-on: http://gerrit.cloudera.org:8080/16594
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-14 05:39:01 +00:00