Commit Graph

9515 Commits

Joe McDonnell
2357958e73 IMPALA-10304: Fix log level and format for pytests
Recent testing showed that the pytests are not
respecting the log level and format set in
conftest.py's configure_logging(). Instead, they
use the default log level of WARNING and the
default formatter.

The issue is that logging.basicConfig() is only
effective the first time it is called. The code
in lib/python/impala_py_lib/helpers.py does a
call to logging.basicConfig() at the global
level, and conftest.py imports that file. This
renders the call in configure_logging()
ineffective.

To avoid this type of confusion, logging.basicConfig()
should only be called from the main() functions for
libraries. This removes the call in lib/python/impala_py_lib
(as it is not needed for a library without a main function).
It also fixes up various other locations to move the
logging.basicConfig() call to the main() function.
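The first-call-wins behavior of logging.basicConfig() described above can
be demonstrated in isolation (a standalone Python sketch, not Impala code):

```python
import logging

# basicConfig() configures the root logger only if it has no handlers yet,
# so the FIRST call wins and later calls are silently ignored -- the exact
# pitfall described above.
logging.basicConfig(level=logging.WARNING, format="%(levelname)s %(message)s")
logging.basicConfig(level=logging.INFO)  # no-op: root logger already configured

root_level = logging.getLogger().level  # still WARNING, not INFO
```

Calling basicConfig() only from main() (or passing force=True on
Python 3.8+) avoids this silent no-op.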

Testing:
 - Ran the end to end tests and custom cluster tests
 - Confirmed the logging format
 - Added an assert in configure_logging() to test that
   the INFO log level is applied to the root logger.

Change-Id: I5d91b7f910b3606c50bcba4579179a0bc8c20588
Reviewed-on: http://gerrit.cloudera.org:8080/16679
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-30 15:32:21 +00:00
skyyws
3e06d600c2 IMPALA-10166 (part 1): ALTER TABLE for Iceberg tables
This patch mainly implements ALTER TABLE for Iceberg
tables. We currently support these statements:
  * ADD COLUMNS
  * RENAME TABLE
  * SET TBL_PROPERTIES
  * SET OWNER
We forbid DROP COLUMN/REPLACE COLUMNS/ALTER COLUMN in this
patch, since these statements may make Iceberg tables
unreadable. We may support column resolution by field id in the
near future; after that, we will support DROP COLUMN/REPLACE
COLUMNS/ALTER COLUMN for Iceberg tables.
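The supported/forbidden split above amounts to a simple allowlist; a
rough sketch (hypothetical helper, not Impala's analyzer code):

```python
# Hypothetical allowlist mirroring the ALTER TABLE operations listed
# above; Impala's real check lives in the frontend analyzer.
SUPPORTED_ICEBERG_ALTERS = {"ADD COLUMNS", "RENAME TABLE",
                            "SET TBL_PROPERTIES", "SET OWNER"}
FORBIDDEN_ICEBERG_ALTERS = {"DROP COLUMN", "REPLACE COLUMNS", "ALTER COLUMN"}

def iceberg_alter_allowed(op: str) -> bool:
    # Operations not on the allowlist are rejected at analysis time.
    return op.strip().upper() in SUPPORTED_ICEBERG_ALTERS
```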

A few things still need attention here:
1. RENAME TABLE is not supported for HadoopCatalog/HadoopTables,
even though we have already implemented the 'RENAME TABLE'
statement, so we only rename the table in the Hive Metastore
for external tables.
2. We cannot ADD/DROP PARTITION yet since there is no API for
that in Iceberg, but related work is already in progress in
Iceberg.

Testing:
- Iceberg table alter test in test_iceberg.py
- Iceberg table negative test in test_scanners.py
- Rename tables in iceberg-negative.test

Change-Id: I5104cc47c7b42dacdb52983f503cd263135d6bfc
Reviewed-on: http://gerrit.cloudera.org:8080/16606
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-30 14:03:29 +00:00
Qifan Chen
d164bf42f9 IMPALA-10294: Improvement to test_skew_reporting_in_runtime_profile
This fix improved the skew reporting test by lowering the threshold
to 0 and by taking care of the extreme case of no skews.

Testing:
1. Unit testing.

Change-Id: I7a36551f2507d724891707d26b7394fbe3a5657b
Reviewed-on: http://gerrit.cloudera.org:8080/16662
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-30 05:34:16 +00:00
wzhou-code
f4ed07c8eb IMPALA-10206: Avoid MD5 Digest Authorization in FIPS approved mode
To comply with FIPS requirements, we should use the OpenSSL
libraries for cryptographic hash functions instead of our own
implementations. This patch replaces the MD5 and SHA1 functions
in the Squeasel web server with OpenSSL APIs. It also forces
Digest Authorization off for the web server in FIPS approved
mode, since Digest Authorization uses an MD5 hash, which does
not comply with FIPS 140-2.

Testing:
 - Passed webserver-test.
 - Passed exhaustive tests.
 - Manually verified HTTP Digest Authorization could not be enabled
   by setting webserver_password_file on a FIPS enabled cluster.

Change-Id: Ie075389b3ab65c612d64ba58e16a10b19bdf4d6f
Reviewed-on: http://gerrit.cloudera.org:8080/16630
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-30 04:33:10 +00:00
Joe McDonnell
047906bd6b IMPALA-10302: Enable logging at the INFO level for test_scanners_fuzz.py
This changes test_scanners_fuzz.py to set the logging level
to INFO. By default, it is WARNING, so it was missing some useful
INFO log messages like the random seed used. This also fixes formatting
on one of the log lines.

Testing:
 - Ran test_scanners_fuzz.py locally and checked to make
   sure the INFO messages were present

Change-Id: Ida4a9cbed6572520998def9618a8b4189c1ba799
Reviewed-on: http://gerrit.cloudera.org:8080/16677
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-30 04:31:30 +00:00
Thomas Tauber-Marshall
01e1b4df80 IMPALA-10303: Fix warnings from impala-shell with --quiet
When the --quiet flag is used with impala-shell, the intention is that
if the query is successful then only the query results should be
printed.

This patch fixes two cases where --quiet was not being respected:
- When the HTTP transport is used and --client_connect_timeout_ms
  is set, a warning is printed that the timeout is not applied.
- When running in non-interactive mode, a warning is printed that
  --live_progress is automatically disabled. This warning is now
  also only printed if --live_progress is actually set.
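The gating logic for the second case can be sketched like this
(hypothetical names, not the actual impala-shell source):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ShellOptions:
    # Hypothetical stand-in for impala-shell's parsed command-line options.
    quiet: bool = False
    live_progress: bool = False
    interactive: bool = True

def live_progress_warning(opts: ShellOptions) -> Optional[str]:
    # Sketch of the fixed gating: warn only when --live_progress was
    # actually set, we are non-interactive, and --quiet is off.
    if opts.live_progress and not opts.interactive and not opts.quiet:
        return "WARNING: live_progress only applies in interactive mode"
    return None
```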

Testing:
- Added a test that runs a simple query with --quiet and confirms the
  output is as expected.

Change-Id: I1e94c9445ffba159725bacd6f6bc36f7c91b88fe
Reviewed-on: http://gerrit.cloudera.org:8080/16673
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-30 02:17:29 +00:00
wzhou-code
1682afcda6 IMPALA-10298: Change column mask hash as SHA512 in FIPS mode
The column masking API is called by Ranger during policy
evaluation. The Ranger team requires changing the column mask
hash to SHA-512 in FIPS mode without changing the API.
This patch changes MaskFunctions::MaskHash() for the string
type to use SHA-512 in FIPS mode.

Testing:
 - Passed exhaustive tests.
 - Manually test the API.

Change-Id: I422d4b11b31c3e6eb7963260a1da730579c4ca74
Reviewed-on: http://gerrit.cloudera.org:8080/16671
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-30 01:16:07 +00:00
Fucun Chu
193c2e773f IMPALA-10132: Implement ds_hll_estimate_bounds_as_string() function.
This function receives a string that is a serialized Apache
DataSketches HLL sketch and an optional kappa, the number of
standard deviations from the mean: 1, 2 or 3 (default 2). It
returns the estimate and bounds as comma-separated values.
The result is three values: estimate, lower bound and upper bound.

   ds_hll_estimate_bounds_as_string(sketch [, kappa])

Kappa:
 1 represents the 68.3% confidence bounds
 2 represents the 95.4% confidence bounds
 3 represents the 99.7% confidence bounds

Note, ds_hll_estimate_bounds() should return an array of
doubles as the result, but for that we have to wait for complex
type support. Until then, we provide
ds_hll_estimate_bounds_as_string(), which can be deprecated
once we have array support. The tracking Jira for returning
complex types from functions is IMPALA-9520.

Example:
select ds_hll_estimate_bounds_as_string(ds_hll_sketch(int_col)) from
functional_parquet.alltypestiny;
+----------------------------------------------------------+
| ds_hll_estimate_bounds_as_string(ds_hll_sketch(int_col)) |
+----------------------------------------------------------+
| 2,2,2.0002                                               |
+----------------------------------------------------------+
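Until array support arrives, a client has to unpack the comma-separated
string itself; a small sketch (hypothetical helper name):

```python
def parse_hll_bounds(result: str):
    # "2,2,2.0002" -> (estimate, lower_bound, upper_bound) as floats,
    # matching the value order documented above.
    estimate, lower, upper = (float(v) for v in result.split(","))
    return estimate, lower, upper
```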

Change-Id: I46bf8263e8fd3877a087b9cb6f0d1a2392bb9153
Reviewed-on: http://gerrit.cloudera.org:8080/16626
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-29 17:45:01 +00:00
Qifan Chen
1bd27a3ea8 IMPALA-7097 Print EC info in the query plan and profile
This fix added the functionality to show the number of erasure coded
files and the total size of such files in the scan node in the
query plan and profile. Shown below are two examples for the HDFS file
system.

Non-partitioned table:
00:SCAN HDFS [default.test_show_ec_nonpart, RANDOM]
   HDFS partitions=1/1 files=2 size=1.65KB
   erasure coded: files=2 size=1.65KB
   stored statistics:

Partitioned table:
00:SCAN HDFS [default.test_show_ec_part]
   HDFS partitions=4/4 files=4 size=2.36KB
   erasure coded: files=3 size=1.77KB
   row-size=12B cardinality=999

Testing:
1. Unit testing;
2. Ran Core tests successfully.

Change-Id: I6ea378914624a714fde820d290b3b9c43325c6a1
Reviewed-on: http://gerrit.cloudera.org:8080/16587
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-29 10:35:27 +00:00
Yida Wu
d62a04078d IMPALA-10102 Fix Impalad crashes when writing a parquet file with
large rows.

The crash happens when trying to dereference a null pointer
returned from a failed memory allocation from memory pool.

TryAllocate is used instead of Allocate and null check is added
for the large memory allocations such as buffer for dictionary
page and compressed dictionary page. The memory allocation is
most likely to fail for these large allocations when memory is
scarce.

This change fixes the crash in this particular code path;
however, in practice there could still be an OOM issue that
leads to the process being killed by the OS. The change does
not fix the OOM issue: users need to configure mem_limit
(a start-up option) properly to avoid the OOM crash.
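The pattern of the fix (checked allocation instead of dereferencing a
null result) can be sketched generically in Python, not the actual C++
writer code:

```python
class Pool:
    # Toy memory pool illustrating the TryAllocate pattern: return None
    # on failure so the caller must check before using the buffer.
    def __init__(self, capacity: int):
        self.available = capacity

    def try_allocate(self, nbytes: int):
        if nbytes > self.available:
            return None  # caller handles failure instead of crashing
        self.available -= nbytes
        return bytearray(nbytes)

def write_dictionary_page(pool: Pool, size: int) -> bool:
    # Hypothetical caller: propagate a mem-limit error on failed
    # allocation rather than touching a null buffer.
    buf = pool.try_allocate(size)
    if buf is None:
        return False
    # ... fill buf with page data ...
    return True
```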

Test:
Ran a script to redo the test mentioned in the Jira thirty
times; no crash happened.

Change-Id: I0dee474cceb0c370278d290eb900c05769b23dec
Reviewed-on: http://gerrit.cloudera.org:8080/16638
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-29 01:45:11 +00:00
Qifan Chen
09fdeb56f8 IMPALA-10267: Impala crashes in HdfsScanner::WriteTemplateTuples() with negative num_tuples
This fix enhances HdfsAvroScanner::ProcessRange() with a DCHECK
to help catch negative num_tuples. The DCHECK specifically
checks that num_records_in_block_ is always greater than
record_pos_.

Testing:
1. Unit testing.

Change-Id: If88fd3aa4c96a69e37d060031f7432d27d069c62
Reviewed-on: http://gerrit.cloudera.org:8080/16672
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2020-10-29 00:49:54 +00:00
Vihang Karajgaonkar
bd4a38ea33 IMPALA-10277: Fix test_catalogd_debug_actions on S3 builds
test_catalogd_debug_actions fails on S3 builds because loading
a table on S3 is faster than on HDFS. The fix reduces the
expected delay when the debug actions are set so that the test
works on S3 builds.

Testing:
1. Ran the test on the s3 build jenkins job.
2. Ran the test on HDFS build jenkins job.

Change-Id: I8348a33ff8e9c7812540e014f4de2c65636da64f
Reviewed-on: http://gerrit.cloudera.org:8080/16664
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-28 05:40:51 +00:00
stiga-huang
8ac382c784 IMPALA-10075: Reuse unchanged partition instances
Currently, we always replace the partition instance when we
reload a partition. If a partition remains the same after
reloading, we should reuse the old partition instance so that
we don't send redundant updates for these partitions. This
reduces the size of the catalog topic update: when a huge table
is REFRESHed, catalogd only propagates the changed partitions.
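The reuse rule can be sketched with a plain dictionary cache
(hypothetical helpers, not catalogd's classes):

```python
def reload_partition(cache: dict, name: str, load):
    # Reload a partition but keep the OLD instance if nothing changed,
    # so a later topic update can skip it by identity (sketch only).
    fresh = load(name)
    old = cache.get(name)
    if old is not None and old == fresh:
        return old  # unchanged: reuse instance, no redundant update
    cache[name] = fresh
    return fresh
```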

Tests:
 - Add tests to verify that partition instances are reused after some
   DDL/DMLs.

Change-Id: I2dd645c260d271291021e52fdac4b74924df1170
Reviewed-on: http://gerrit.cloudera.org:8080/16392
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-27 04:47:13 +00:00
Zoltan Borok-Nagy
981ef10465 IMPALA-10215: Implement INSERT INTO for non-partitioned Iceberg tables (Parquet)
This commit adds support for INSERT INTO statements against Iceberg
tables when the table is non-partitioned and the underlying file format
is Parquet.

We still use Impala's HdfsParquetTableWriter to write the data files,
though they needed some modifications to conform to the Iceberg spec,
namely:
 * write Iceberg/Parquet 'field_id' for the columns
 * TIMESTAMPs are encoded as INT64 micros (without time zone)
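The INT64-micros encoding mentioned above (a timestamp without time
zone, counted from the Unix epoch) can be sketched as follows; this is
an illustration of the encoding, not Impala's writer code:

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)

def timestamp_to_micros(ts: datetime) -> int:
    # Iceberg/Parquet stores TIMESTAMP as INT64 microseconds since the
    # Unix epoch, with no time zone attached.
    delta = ts - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
```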

We use DmlExecState to transfer information from the table sink
operators to the coordinator, then updateCatalog() invokes the
AppendFiles API to add files atomically. DmlExecState is encoded in
protobuf, communication with the Frontend uses Thrift. Therefore to
avoid defining Iceberg DataFile multiple times they are stored in
FlatBuffers.

The commit also does some corrections on Impala type <-> Iceberg type
mapping:
 * Impala TIMESTAMP is Iceberg TIMESTAMP (without time zone)
 * Impala CHAR is Iceberg FIXED

Testing:
 * Added INSERT tests to iceberg-insert.test
 * Added negative tests to iceberg-negative.test
 * I also did some manual testing with Spark. Spark is able to read
   Iceberg tables written by Impala until we use TIMESTAMPs. In that
   case Spark rejects the data files because it only accepts TIMESTAMPS
   with time zone.
 * Added concurrent INSERT tests to test_insert_stress.py

Change-Id: I5690fb6c2cc51f0033fa26caf8597c80a11bcd8e
Reviewed-on: http://gerrit.cloudera.org:8080/16545
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-26 20:01:09 +00:00
stiga-huang
3ba8d637cd IMPALA-10256: Skip test_disable_incremental_metadata_updates on S3 tests
IMPALA-10113 added a test for disabling the
incremental_metadata_updates flag to verify that metadata
propagation still works correctly. The test invokes two test
files that are also used in metadata/test_ddl.py. One of the
test files exercises HDFS caching, which can only run on the
HDFS file system, so we mark the test with "SkipIf.not_hdfs".

Tests:
 - Run CORE test on S3 build.

Change-Id: I0b922de84cff0a1e0771d5a8470bdd9f153f85f0
Reviewed-on: http://gerrit.cloudera.org:8080/16616
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-24 06:10:52 +00:00
stiga-huang
2a1d3acaf1 IMPALA-9870: impala-shell 'summary' to show original and retried queries
This patch extends the 'summary' command of impala-shell to support
retrieving the summary of the original query attempt. The new syntax is

SUMMARY [ALL | LATEST | ORIGINAL]

If 'ALL' is specified, both the latest and original summaries are
printed. If 'LATEST' is specified, only the summary of the latest query
attempt is printed. If 'ORIGINAL' is specified, only the summary of the
original query attempt is printed. The default option is 'LATEST'.
Support for this has only been added to HS2 given that Beeswax is being
deprecated soon.
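The new syntax can be sketched as a tiny argument parser (hypothetical
helper, not the shell's actual command dispatch):

```python
def parse_summary_command(arg: str) -> str:
    # SUMMARY [ALL | LATEST | ORIGINAL]; LATEST is the default when no
    # option is given, matching the behavior described above.
    choice = (arg or "LATEST").strip().upper()
    if choice not in ("ALL", "LATEST", "ORIGINAL"):
        raise ValueError("usage: SUMMARY [ALL | LATEST | ORIGINAL]")
    return choice
```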

Tests:
 - Add new tests in test_shell_interactive.py

Change-Id: I8605dd0eb2d3a2f64f154afb6c2fd34251c1fec2
Reviewed-on: http://gerrit.cloudera.org:8080/16502
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-24 05:11:06 +00:00
Bikramjeet Vig
fd1ea147b3 IMPALA-10210: Skip Authentication for connection from a trusted domain
Adds the ability to skip authentication for connection requests
originating from a trusted domain over the hs2 http endpoint and
the http webserver endpoint. The trusted domain can be specified
using the newly added "--trusted_domain" startup flag. Additionally,
if the startup flag "--trusted_domain_use_xff_header" is set to
true, impala will switch to using the 'X-Forwarded-For' HTTP
header to extract the origin address when checking whether the
connection originated from a trusted domain.

Other highlights:
- This still requires the client to specify a username via a basic
  auth header.
- To avoid looking up hostname for every http request, a cookie is
  returned on the first auth attempt which will then be subsequently
  used for further communication on the same connection.
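A minimal sketch of the origin-extraction logic described above
(hypothetical helper names; the real code is in Impala's C++ auth path):

```python
def origin_address(headers: dict, remote_addr: str, use_xff: bool) -> str:
    # With --trusted_domain_use_xff_header, take the client address from
    # the X-Forwarded-For header (first hop is the original client);
    # otherwise fall back to the socket peer address.
    xff = headers.get("X-Forwarded-For")
    if use_xff and xff:
        return xff.split(",")[0].strip()
    return remote_addr

def in_trusted_domain(hostname: str, trusted_domain: str) -> bool:
    # Suffix match of the resolved hostname against the trusted domain.
    return bool(trusted_domain) and hostname.endswith(trusted_domain)
```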

Testing:
Added tests for both the hs2 http endpoint and the webserver http endpoint

Change-Id: I09234078e2314dbc3177d0e869ae028e216ca699
Reviewed-on: http://gerrit.cloudera.org:8080/16542
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-24 02:30:51 +00:00
Riza Suminto
4b5c66f329 IMPALA-10266: Identify FileSystem type based on the protocol scheme.
Frontend identifies the type of FileSystem in two ways. The first is
done using the instanceof operator with subclasses of
org.apache.hadoop.fs.FileSystem. The second is by checking the
FileSystem protocol scheme. This patch standardizes the FileSystem
identification based on the scheme only.
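The scheme-based identification can be sketched in a few lines (a toy
Python version of the idea; the real logic lives in FileSystemUtil.java):

```python
from urllib.parse import urlparse

def fs_scheme(path: str, default: str = "hdfs") -> str:
    # Identify the file system purely by the path's protocol scheme,
    # e.g. 's3a://bucket/key' -> 's3a'. Paths without a scheme fall back
    # to the assumed default file system.
    return urlparse(path).scheme or default
```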

Testing:
- Add several tests in FileSystemUtilTest to check validity of some
  FileSystemUtil functions.
- Run and pass core tests.

Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b
Reviewed-on: http://gerrit.cloudera.org:8080/16628
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-24 01:14:37 +00:00
Joe McDonnell
cfa8a7a5e5 IMPALA-10278: Use full libraries for impalad_executor Docker container
This backs out the piece of IMPALA-10016 that used a pared-down
set of libraries for the impalad_executor. That pared-down
set was missing org.apache.impala.common.JniUtil, which
prevented the impalad_executor container from starting up.

Testing:
 - Ran a docker core job with one coord_exec and two executors
   and it was able to startup where it wouldn't before

Change-Id: Ieecca61cd3c11f446b922a04fdeb5fd0c90fc971
Reviewed-on: http://gerrit.cloudera.org:8080/16640
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-23 21:20:44 +00:00
Andrew Sherman
4159a8085c IMPALA-10244: Make non-scalable failures to dequeue observable.
One of the important ways to observe Impala throughput is by looking at
when queries are queued. This can be an indication that more resources
should be added to the cluster by adding more executor groups. This is
only a good strategy if adding more resources will help with the current
workload. In some situations the head of the query queue cannot be
executed because of resource constraints on the coordinator. In these
cases the coordinator is the bottleneck so adding more executor groups
will not help. This change is to make these cases observable by adding a
new counter which is incremented when a dequeue fails because of
resource constraints on the coordinator.

The two cases that cause the counter to be incremented are:
- when there are not enough admission control slots on the coordinator
- when there is not enough memory on the coordinator
but it is possible that other conditions may be added in future.

TESTING:
Added new unit tests.
Ran all end-to-end tests.

Change-Id: I3456396ac139c562ad9cd3ac1a624d8f35487518
Reviewed-on: http://gerrit.cloudera.org:8080/16613
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-23 06:35:54 +00:00
Bikramjeet Vig
d459b434b6 IMPALA-10245: Disable test_kudu_scanner when run with erasure coding
The test is disabled for erasure coding (EC) because the
erasure-coded files written during data load cause the expected
behavior to change. This was confirmed by loading data without
EC and turning EC on when running the test.

Change-Id: Ia383c1a788c6c0c66e2ef7c6494fe5fe643956df
Reviewed-on: http://gerrit.cloudera.org:8080/16623
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-23 06:11:04 +00:00
Fang-Yu Rao
eda06f41ce IMPALA-9990: Support SET OWNER for Kudu tables
KUDU-3090 adds the support for table ownership and exposes the API's of
setting owner on creating and altering tables, which allows Impala to
also pass to Kudu the new owner of the Kudu table for the ALTER TABLE
SET OWNER statement.

Specifically, based on the API of AlterTableOptions#setOwner(), this
patch stores the ownership information of the Kudu table in the
corresponding instance of AlterTableOptions, which will then be passed
to Kudu via a KuduClient.

Testing:
- Added a FE test in AnalyzeKuduDDLTest.java to verify the statement
  could be correctly analyzed.
- Added an E2E test in kudu_alter.test to verify the statement could be
  correctly executed when the integration between Kudu and HMS is not
  enabled.
- Added an E2E test in kudu_hms_alter.test and verified that the
  statement could be correctly executed when the integration between
  Kudu and HMS is enabled after manually re-enabling
  TestKuduHMSIntegration::test_kudu_alter_table(). Note that this was
  not possible before IMPALA-10092 was resolved due to a bug in the
  class of CustomClusterTestSuite. In addition, we may need to delete
  the Kudu table 'simple' via a Kudu-Python client if the E2E test
  complains that the Kudu table already exists, which may be related to
  IMPALA-8751.
- Manually verified that the views of Kudu server and HMS are consistent
  for a synchronized Kudu table after the ALTER TABLE SET OWNER
  statement even though the Kudu table was once an external and
  non-synchronized table, meaning that the owner from Kudu's perspective
  could be different than that from HMS' perspective. Such a discrepancy
  could be created if we execute the ALTER TABLE SET OWNER statement for
  an external Kudu table with the property of 'external.table.purge'
  being false. The test is performed manually because currently the
  Kudu-Python client adopted in Impala's E2E tests is not up to date so
  that the field of 'owner' cannot be accessed in the E2E tests. On the
  other hand, to verify the owner of a Kudu table from Kudu's
  perspective, we used the latest Kudu-Python client as provided at
  github.com/apache/kudu/tree/master/examples/python/basic-python-example.
- Verified that the patch could pass the exhaustive tests in the DEBUG
  mode.

Change-Id: I29d641efc8db314964bc5ee9828a86d4a44ae95c
Reviewed-on: http://gerrit.cloudera.org:8080/16273
Reviewed-by: Vihang Karajgaonkar <vihang@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-23 04:01:26 +00:00
Tim Armstrong
227e43f481 IMPALA-10216: add logging to help debug flaky test
This commit adds additional info to the assertions to
help debug it if it reoccurs.

Change-Id: I09984dd3cea686808115ca4cb8c88d24271d8cc1
Reviewed-on: http://gerrit.cloudera.org:8080/16620
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-22 23:44:33 +00:00
Qifan Chen
61a020d0f8 IMPALA-10007: Impala development environment does not support
Ubuntu 20.04

This is a minor amendment to a previously merged change with
ChangeId I4f592f60881fd8f34e2bf393a76f5a921505010a, to address
additional review comments. In particular, the original commit
referred to Ubuntu 20.4 whereas it should have used Ubuntu 20.04.

Change-Id: I7db302b4f1d57ec9aa2100d7589d5e814db75947
Reviewed-on: http://gerrit.cloudera.org:8080/16241
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-22 05:03:32 +00:00
Vihang Karajgaonkar
15c3b13e97 IMPALA-10219: Expose DEBUG_ACTION query option in catalog
This patch enables DEBUG_ACTION in the catalog service's
Java code. Specifically, the DEBUG_ACTION query option is now
added to TResetMetadataRequest and TExecDdlRequest so that we
can inject delays while executing refresh or DDL statements.

For example,
1. To inject a delay of 100ms per HDFS list operation
during refresh statement set the following query option:

set debug_action=catalogd_refresh_hdfs_listing_delay:SLEEP@100;

2. To inject a delay of 100ms in alter table recover
partitions statement:

set debug_action=catalogd_table_recover_delay:SLEEP@100;

3. To inject a delay of 100ms in compute stats statement

set debug_action=catalogd_update_stats_delay:SLEEP@100;

Note that this option only adds the delay during the
update_stats phase of the compute stats execution.
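The debug-action strings above follow a 'label:ACTION@arg' shape;
parsing them can be sketched as follows (hypothetical helper, not
Impala's DebugAction implementation):

```python
def parse_debug_action(option: str):
    # "catalogd_update_stats_delay:SLEEP@100" ->
    # ("catalogd_update_stats_delay", "SLEEP", 100), where the integer
    # is the sleep duration in milliseconds.
    label, action = option.split(":", 1)
    name, arg = action.split("@", 1)
    return label, name, int(arg)
```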

Testing:
1. Added a test which sets the query option and makes
sure that command takes more time than without query option.
2. Added unit tests for the debugAction implementation
logic.

Change-Id: Ia7196b1ce76415a5faf3fa8575a26d22b2bf50b1
Reviewed-on: http://gerrit.cloudera.org:8080/16548
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-22 01:58:47 +00:00
Zoltan Borok-Nagy
9384a18180 IMPALA-10257: Relax check for page filtering
HdfsParquetScanner::CheckPageFiltering() is a bit too strict:
it checks that all column readers agree on the top-level row.
Column readers have different strategies for reading columns.
One strategy reads ahead the Parquet def/rep levels; the other
reads levels and values simultaneously, i.e. with no readahead
of levels.

We calculate the ordinal of the top-level row based on the
repetition level. This means that when we read ahead the rep
levels, the top-level row might point to the value to be
processed next, while the top-level row in the other strategy
always points to the row that was completely processed last.
Because of this, CheckPageFiltering() can allow a difference of
one between the 'current_row_' values of the column readers.
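The relaxed agreement condition can be sketched as (a toy check, not
the scanner's actual code):

```python
def rows_agree(current_rows, tolerance: int = 1) -> bool:
    # Readers that read ahead the rep levels may be one top-level row
    # ahead of the others, so allow a spread of at most 'tolerance'
    # between the readers' current_row_ values.
    return max(current_rows) - min(current_rows) <= tolerance
```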

I also got rid of the DCHECK in CheckPageFiltering() and replaced it
with a more informative error report.

Testing:
* added a test to nested-types-parquet-page-index.test

Change-Id: I01a570c09eeeb9580f4aa4f6f0de2fe6c7aeb806
Reviewed-on: http://gerrit.cloudera.org:8080/16619
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-21 20:30:05 +00:00
stiga-huang
ee4043e1a0 IMPALA-10168: Expose JSON catalog objects in catalogd's debug page
Catalogd has a debug page at '/catalog_object' showing catalog
objects as thrift debug strings. It's inconvenient for tests to
parse the thrift string to extract the interesting information.

This patch extends the page to also support returning JSON
results, which makes it easier for tests to extract complex
information from the catalog objects, e.g. the partition ids of
an HDFS table. Just like getting JSON results from other pages,
the usage is adding a 'json' argument to the URL, e.g.
http://localhost:25020/catalog_object?json&object_type=TABLE&object_name=db1.tbl1

Implementation:
Csaba helped find that Thrift has a protocol,
TSimpleJSONProtocol, which can convert thrift objects to
human-readable JSON strings. This simplifies the implementation
a lot. However, TSimpleJSONProtocol is not implemented in C++
yet (THRIFT-2476), so we do the conversion in the FE to use its
Java implementation.

Tests:
 - Add tests to verify json fields existence.

Change-Id: I15f256b4e3f5206c7140746694106e03b0a4ad92
Reviewed-on: http://gerrit.cloudera.org:8080/16449
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-21 06:31:47 +00:00
Qifan Chen
6dbf1ca09c IMPALA-6628: Use unqualified table references in .test files run from test_queries.py
This fix modified the following tests launched from
test_queries.py by removing references to the 'functional'
database whenever possible. The objective of the change is to
allow more testing coverage with databases other than the
single 'functional' database. In this fix, no new tables were
added and no expected results were altered.

  empty.test
  inline-view-limit.test
  inline-view.test
  limit.test
  misc.test
  sort.test
  subquery-single-node.test
  subquery.test
  top-n.test
  union.test
  with-clause.test

It was determined that the other tests in
testdata/workloads/functional-query/queries/QueryTest either do
not refer to 'functional' or the references are required for
some reason.

Testing
   Ran query_tests on these changed tests with exhaustive exploration
   strategy.

Change-Id: Idd50eaaaba25e3bedc2b30592a314d2b6b83f972
Reviewed-on: http://gerrit.cloudera.org:8080/16603
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-21 05:20:33 +00:00
Qifan Chen
65a0325572 IMPALA-10178 Run-time profile shall report skews
This fix addresses a current limitation of the runtime profile:
skews in certain operators, such as in the rows read counter
(RowsRead) of the scan operators, are not reported. A skew
condition exists when the number of rows processed by each
operator instance is not about the same; it can be detected
through the coefficient of variation (CoV). A high CoV
(say > 1.0) usually implies the existence of skew.

With the fix, such skew is detected for the following counters
  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE
  2. ProbeRows and BuildRows in HASH_JOIN_NODE
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE

and reported as follows:
  1. In execution profile, a new skew summary that lists the names
     of the operators with skews;
  2. In the averaged profile for the corresponding operator, the list
     of values of the counter across all fragment instances in the
     backend processes;
  3. Skew detection formula: CoV > limit and mean > 5,000
  4. A new query option 'report_skew_limit'
     < 0: disable skew reporting
     >= 0: enable skew reporting and supply the CoV limit

Examples of skews reported for a hash join and an hdfs scan.

In execution profile:

  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)

  Per Node Peak Memory Usage: ...
  ... ...

In averaged profiles:

  HDFS_SCAN_NODE (id=2): ...
          Skew details: RowsRead ([2004992,1724693,2001351],
                                  CoV=0.07, mean=1910345)
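The CoV-based detection above can be reproduced with a short sketch
(assuming CoV = population standard deviation divided by the mean, which
matches the reported numbers):

```python
from statistics import mean, pstdev

def skew_stats(values):
    # Coefficient of variation (population stddev / mean) and the mean.
    m = mean(values)
    return (pstdev(values) / m if m else 0.0), m

def has_skew(values, cov_limit, mean_floor=5000):
    # Detection rule from the commit message: CoV > limit AND mean > 5,000.
    cov, m = skew_stats(values)
    return cov > cov_limit and m > mean_floor
```

Feeding in the RowsRead values from the averaged-profile example yields
CoV ~0.07 and mean ~1910345, matching the reported skew details.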

Testing:
1. Added test_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Reviewed-on: http://gerrit.cloudera.org:8080/16474
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-20 23:30:51 +00:00
stiga-huang
198bbe280c IMPALA-10255: Fix TestInsertQueries.test_insert fails in exhaustive builds
The patch in IMPALA-10233 added 3 insert statements to
testdata/workloads/functional-query/queries/QueryTest/insert.test.
The test has CREATE TABLE ... LIKE functional.alltypes;
therefore it creates a TEXT table regardless of the test
vector. But the compression codec is determined by the test
vector, and since Impala cannot write compressed text, the test
fails.

The created table should use the same table format as the one in the
test vector.

Tests:
 - Run TestInsertQueries.test_insert in exhaustive mode.

Change-Id: Id0912f751fa04015f1ffdc38f5c7207db7679896
Reviewed-on: http://gerrit.cloudera.org:8080/16609
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-20 06:47:25 +00:00
Joe McDonnell
ca4d6912be IMPALA-10261: Include org/apache/hive/com/google in impala-minimal-hive-exec
Newer versions of Hive shade guava, which means that they require
the presence of artifacts in org/apache/hive/com/google. To
support these newer versions, this adds that path to the inclusions
for impala-minimal-hive-exec.

Testing:
 - Tested with a newer version of Hive that has the shading
   and verified that Impala starts up and functions.

Change-Id: I87ac089fdacc6fc5089ed68be92dedce514050b9
Reviewed-on: http://gerrit.cloudera.org:8080/16614
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-20 04:15:57 +00:00
Csaba Ringhofer
1e2176c849 IMPALA-9918: ORC scanner hits DCHECK when GLOG_v=3
PrintPath assumed that all elements in the path are complex,
and hit a DCHECK if it contained a scalar element. This didn't
seem to cause problems in Parquet, but the ORC scanner called
this function with paths where the last element was scalar.
This problem was probably not discovered because no one tested
ORC scanning with v=3 logging + DEBUG builds.

Also added logging to the events when log levels are changed
through the webpage. In the case of ResetJavaLogLevelCallback
there was already a log line from GlogAppender.java.

Note that the cause of the original issue is still unknown,
as it occurred during custom cluster tests where no other tests
should change the log levels in parallel.

Testing:
- tested the log changes manually

Change-Id: I94e12d2a62ccab5eb5d21675d5f0138f04e622ac
Reviewed-on: http://gerrit.cloudera.org:8080/16611
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-20 01:32:42 +00:00
Joe McDonnell
e76010d628 IMPALA-10226: Change buildall.sh -notests to invoke a single Make target
This is a small cleanup to add specific targets in CMake for
buildall.sh -notests to invoke. Previously, it ran multiple
targets like:
make target1 target2 target3 ...
In manual tests, make builds each target separately, so it is
unable to overlap the builds of the multiple targets. Pushing
it into CMake simplifies the code and allows the targets to
build simultaneously.

Testing:
 - Ran buildall.sh -notests

Change-Id: Id881d6f481b32ba82501b16bada14b6630ba32d2
Reviewed-on: http://gerrit.cloudera.org:8080/16605
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-19 21:17:55 +00:00
stiga-huang
c7f118a860 IMPALA-10248: Fix test_column_storage_attributes date string errors
After IMPALA-10225 bumps the impyla version to 0.17a1, we should expect
impyla to return a datetime.date instead of a string for DATE type data.
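Tests written against the old behavior can normalize both types; a minimal sketch (normalize_date is a hypothetical helper, not part of Impala's test code):

```python
import datetime

def normalize_date(value):
    """Return DATE values as 'YYYY-MM-DD' strings, whether the client
    returned a string (older impyla) or a datetime.date (impyla >= 0.17a1)."""
    if isinstance(value, datetime.date):
        return value.isoformat()
    return value

print(normalize_date(datetime.date(2020, 10, 19)))  # 2020-10-19
print(normalize_date("2020-10-19"))                 # 2020-10-19
```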

Tests:
 - Run test_column_storage_attributes with
   --exploration_strategy=exhaustive to verify the fix.

Change-Id: I618a759a03213efc22a5e54e9a30fa09e8929023
Reviewed-on: http://gerrit.cloudera.org:8080/16608
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-19 10:24:43 +00:00
stiga-huang
faa2d398e6 IMPALA-10233: zorder sort node should output rows in lexical order of partition keys
When inserting to a partitioned hdfs table, the planner will add a sort
node on top of the plan, depending on the clustered/noclustered plan
hint and on the 'sort.columns' table property. If clustering is enabled
in insertStmt or additional columns are specified in the 'sort.columns'
table property, then the ordering columns will start with the clustering
columns, so that partitions can be written sequentially in the table
sink. Any additional non-clustering columns specified by the
'sort.columns' property will be added to the ordering columns and after
any clustering columns.

For Z-order sort type, we should deal with these ordering columns
separately. The clustering columns should still be sorted lexically, and
only the remaining ordering columns sorted in Z-order. So we can
still insert partitions one by one and avoid hitting the DCHECK as
described in the JIRA.
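The ordering described above — lexical on the clustering columns, Z-order only on the rest — can be sketched as a composite sort key. This is a toy model: interleave_bits is a simplified stand-in for Impala's real Z-order comparator and assumes small non-negative ints.

```python
def interleave_bits(values, bits=16):
    """Simplified Z-order key: interleave the bits of each value."""
    key = 0
    for b in range(bits - 1, -1, -1):
        for v in values:
            key = (key << 1) | ((v >> b) & 1)
    return key

def sort_key(row, num_clustering_cols):
    # Clustering columns stay in lexical order so partitions can be
    # written sequentially; only the remaining columns are Z-ordered.
    return (tuple(row[:num_clustering_cols]),
            interleave_bits(row[num_clustering_cols:]))

rows = [(2, 5, 9), (1, 7, 3), (2, 1, 2), (1, 2, 8)]
rows.sort(key=lambda r: sort_key(r, num_clustering_cols=1))
# All rows of partition 1 now precede all rows of partition 2.
```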

Tests
 - Add tests for inserting to a partitioned table with zorder.

Change-Id: I30cbad711167b8b63c81837e497b36fd41be9b54
Reviewed-on: http://gerrit.cloudera.org:8080/16590
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-16 06:40:38 +00:00
Zoltan Borok-Nagy
6542b6070d IMPALA-10243: ConcurrentModificationException during parallel INSERTs
Impala might throw a ConcurrentModificationException during a high
load of INSERTs to the same table. The exception happens during thrift
serialization of TUpdateCatalogResponse, which has a reference to the
metastore table. The serialization happens without a lock, so another
thread might modify the metastore table object in the meantime.

This can potentially happen in CatalogOpExecutor.updateCatalog() which
updates the catalog version and unsets table column statistics.

For some reason I only saw this error with local catalog.

The problem is that in Table.toThrift() we set a reference to the
metastore table object instead of deep copying it. So my fix is to deep
copy the metastore table, this prevents concurrent modifications.
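The deep-copy pattern — snapshotting shared mutable state under a lock so serialization can run without it — looks like this in Python terms (a generic sketch, not Impala's Java code):

```python
import copy
import json
import threading

table_meta = {"name": "t", "stats": {"rows": 100}}
lock = threading.Lock()

def to_response(meta):
    # Snapshot under the lock; serializing the copy afterwards is
    # immune to concurrent mutation of 'meta' by other threads.
    with lock:
        snapshot = copy.deepcopy(meta)
    return json.dumps(snapshot, sort_keys=True)

def mutator():
    with lock:
        table_meta["stats"]["rows"] += 1

print(to_response(table_meta))
```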

Testing
* added stress test 'test_insert_stress.py'

Change-Id: Ie656925d764d5eb26c318703ca425529ecf7a3a3
Reviewed-on: http://gerrit.cloudera.org:8080/16602
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-15 21:31:30 +00:00
Zoltan Borok-Nagy
05ad4ce05f IMPALA-10152: part 1: refactor Iceberg catalog handling
This patch refactors the code a bit to make it easier in the future
to add support for new Iceberg catalogs. We plan to add support for
HiveCatalog in the near future.

Iceberg has two main interfaces to manage tables: Tables and Catalog.
I created a new interface in Impala called 'IcebergCatalog' that
abstracts both Tables and Catalog. Currently there are two
implementations for IcebergCatalog:
* HadoopTablesCatalog for HadoopTables
* HadoopCatalog for HadoopCatalog

This patch also delegates dropTable() to the Iceberg catalogs. Until
this patch we let HMS drop the tables and delete the directories. It
worked fine with the filesystem-based catalogs, but might not work well
with other Iceberg catalogs like HiveCatalog.

Change-Id: Ie69dff6cd6b8b3dc0ba5f7671b8504a936032a85
Reviewed-on: http://gerrit.cloudera.org:8080/16575
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-15 21:28:57 +00:00
Joe McDonnell
97792c4bad IMPALA-10198 (part 2): Add support for mvn versions:set
This adds support for setting the version of Java
artifacts through "mvn versions:set". It changes
the modules to inherit the version from the parent
pom.

Previously, we used a mix of 0.1-SNAPSHOT and
1.0-SNAPSHOT. This now uses 4.0.0-SNAPSHOT across the
board. With each release, we can use "mvn versions:set"
to update the versions. The only exception is the
Hive UDF code that we build for testing. This remains
at version 1.0 to avoid test changes.

Testing:
 - Ran core job
 - Added build-all-flag-combinations.sh case that
   does "mvn versions:set" and runs a build

Change-Id: I661b32e1e445169bac2ffe4f9474f14090031743
Reviewed-on: http://gerrit.cloudera.org:8080/16559
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-15 19:30:13 +00:00
Joe McDonnell
97856478ec IMPALA-10198 (part 1): Unify Java in a single java/ directory
This changes all existing Java code to be submodules under
a single root pom. The root pom is impala-parent/pom.xml
with minor changes to add submodules.

This avoids most of the weird CMake/maven interactions,
because there is now a single maven invocation for all
the Java code.

This moves all the Java projects other than fe into
a top level java directory. fe is left where it is
to avoid disruption (but still is compiled via the
java directory's root pom). Various pieces of code
that reference the old locations are updated.

Based on research, there are two options for dealing
with the shaded dependencies. The first is to have an
entirely separate Maven project with a separate Maven
invocation. In this case, the consumers of the shaded
jars will see the reduced set of transitive dependencies.
The second is to have the shaded dependencies as modules
with a single Maven invocation. The consumer would see
all of the original transitive dependencies and need to
exclude them all. See MSHADE-206/MNG-5899. This chooses
the second.

This only moves code around and does not focus on version
numbers or making "mvn versions:set" work.

Testing:
 - Ran a core job
 - Verified existing maven commands from fe/ directory still work
 - Compared the *-classpath.txt files from fe and executor-deps
   and verified they are the same except for paths

Change-Id: I08773f4f9d7cb269b0491080078d6e6f490d8d7a
Reviewed-on: http://gerrit.cloudera.org:8080/16500
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2020-10-15 19:30:13 +00:00
Qifan Chen
398f17f710 IMPALA-9440 Typo in rpcz.tmpl for inbound connection metrics
This fix corrected a typo in rpcz.tmpl.

1. Unit testing.

Change-Id: Id0fcdcd8f81567bad3d9931f952a00ad815265fa
Reviewed-on: http://gerrit.cloudera.org:8080/16576
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-15 05:09:56 +00:00
Joe McDonnell
7b42e9a439 IMPALA-10127: Fix performance for enforcement of lirs_tombstone_multiple
When enforcing lirs_tombstone_multiple for the LIRS cache,
currently it needs to walk backwards through the recency
list to find the oldest tombstone entry to remove it.
This traversal can involve passing a large number of
non-tombstone entries. In pathological cases, it is
O(N) where N is the number of entries in the cache.

This modifies the LIRS implementation to use a combined
unprotected and tombstone list. The first half of the
list is unprotected entries. The second half is tombstone
entries. The tombstone portion of the list allows the
enforcement of lirs_tombstone_multiple to be O(1)
when finding the oldest tombstone entry.

The unprotected half of the list operates as it did
before, except that the combined list requires maintaining
a pointer to the front of the unprotected list (i.e. the
boundary between the unprotected portion and the tombstone
portion).

Using a combined list means that evicting an unprotected
entry to become a tombstone entry is only updating a
pointer. This is a common operation, so it keeps the
overhead of the tombstone list low.
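The combined-list bookkeeping can be modeled with a deque plus a boundary counter (a toy model of the idea, not the real cache code):

```python
from collections import deque

class CombinedList:
    """Toy model of LIRS's combined unprotected + tombstone list:
    newest entries at the left, tombstones at the right, with a counter
    marking the boundary between the two halves."""
    def __init__(self):
        self.entries = deque()     # newest -> oldest
        self.num_unprotected = 0   # size of the unprotected first half

    def add_unprotected(self, key):
        self.entries.appendleft(key)
        self.num_unprotected += 1

    def evict_oldest_unprotected(self):
        # O(1): the entry stays in place; moving the boundary turns it
        # into a tombstone (this is the common operation).
        assert self.num_unprotected > 0
        self.num_unprotected -= 1

    def trim_oldest_tombstone(self):
        # O(1): the oldest tombstone is always at the tail, so enforcing
        # lirs_tombstone_multiple no longer walks the recency list.
        if len(self.entries) > self.num_unprotected:
            return self.entries.pop()
        return None

c = CombinedList()
for k in "abc":
    c.add_unprotected(k)
c.evict_oldest_unprotected()      # 'a' becomes a tombstone
print(c.trim_oldest_tombstone())  # a
```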

Performance:
For all existing performance test cases in cache-bench,
lookups/sec are within 2% of the current implementation of LIRS.

This adds a new pathological case to cache-bench where
the cache hit rate is 0.2%. This causes extensive use
of the lirs_tombstone_multiple. This case sees a 2000x
improvement, going from 1.3K lookups/sec to 2.96M lookups/sec.

Testing:
 - lirs-cache-test passes (debug and release)
 - This also adds the DATA_CACHE_EVICTION_POLICY environment
   variable to run-all-tests.sh to allow easy testing with LIRS.
 - Ran a core job with 500MB cache using LIRS

Change-Id: I25b697f57c7daacccf8791a5a7b31878a6f7f1d2
Reviewed-on: http://gerrit.cloudera.org:8080/16597
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-15 04:22:39 +00:00
Qifan Chen
41813b27bc IMPALA-9754 buffer_pool_limit error message is confusing
This fix reworded the following two error messages for clarity.

1. "Invalid --buffer_pool_limit value, must be a percentage or positive
    bytes value or percentage:"
2. "Invalid --buffer_pool_clean_pages_limit value, must be a percentage
    or positive bytes value or percentage:"

The fix also enhanced the code to verify that the JVM max heap size is
less than the process memory limit when mem_limit_includes_jvm flag is
set to true, and raise a new error message otherwise:

"Invalid combination of --mem_limit_includes_jvm and JVM max
heap size $0, which must be smaller than process memory limit $1".

Testing:
1. Unit testing;
2. Ran Core tests successfully.

Change-Id: I15ce1cdcc168163b3f5b21e778f9bf6e6b7730d5
Reviewed-on: http://gerrit.cloudera.org:8080/16566
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-15 00:26:12 +00:00
Qifan Chen
63c435cac1 IMPALA-9232 Potential overflow in serializeThriftMsg
This fix added a sanity check to ensure the length of the buffer
holding a serialized object does not go over INT_MAX bytes.
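The kind of check being added can be sketched as follows (a generic sketch; serializeThriftMsg itself is C++ and the helper name here is hypothetical):

```python
INT_MAX = 2 ** 31 - 1  # lengths narrowed to a signed 32-bit int must fit here

def check_serialized_length(buf):
    """Sanity check before a buffer length is narrowed to a 32-bit int."""
    if len(buf) > INT_MAX:
        raise ValueError("serialized message too large: %d bytes" % len(buf))
    return len(buf)

print(check_serialized_length(b"abc"))  # 3
```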

Testing:
1. Unit testing;
2. Ran Core tests successfully.

Change-Id: Ie76028acea84dbe0e88518dae60aaf7e7ca55e9e
Reviewed-on: http://gerrit.cloudera.org:8080/16584
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-14 23:25:42 +00:00
Riza Suminto
c7581b5d8a IMPALA-10220: Fix negative value bug in RpcNetworkTime counter.
Total RPC time was incorrectly computed using
resp_.receiver_latency_ns() in function EndDataStreamCompleteCb(). This
patch fixes the bug by replacing it with eos_rsp_.receiver_latency_ns().
This patch also fixes a logging mistake in LogSlowRpc() to use its 'resp'
parameter instead of 'resp_' field member.

Testing:
- Manually ran the data loading query that exhibits the bug several times
  and verified that the Min value of RpcNetworkTime counter is always
  positive after the patch. The query used in testing is insert query to
  TPC-DS fact table store_sales of scale 10GB in single machine mini
  cluster.
- Add DCHECK to verify that total rpc time is greater than or equal to
  receiver_latency_ns.
- Run and pass core tests.

Change-Id: I2a4d65a3e0f88349bd4ee1b01290bd2c386acc69
Reviewed-on: http://gerrit.cloudera.org:8080/16552
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-14 22:20:06 +00:00
skyyws
0c0985a825 IMPALA-10159: Supporting ORC file format for Iceberg table
This patch implements querying Iceberg tables with the ORC
file format. We can use the following SQL to create a table with
the ORC file format:
  CREATE TABLE default.iceberg_test (
    level string,
    event_time timestamp,
    message string,
  )
  STORED AS ICEBERG
  LOCATION 'hdfs://xxx'
  TBLPROPERTIES ('iceberg.file_format'='orc', 'iceberg.catalog'='hadoop.tables');
Note that there are still some problems when scanning ORC files
with Timestamp; for more details please refer to IMPALA-9967. We may
add new tests with Timestamp type after this JIRA is fixed.

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py

Change-Id: Ib579461aa57348c9893a6d26a003a0d812346c4d
Reviewed-on: http://gerrit.cloudera.org:8080/16568
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-14 19:19:19 +00:00
Gabor Kaszab
13a78fc1b0 IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables
This patch adds support for Iceberg Bucket and Truncate partition
transforms. Both accept a parameter: number of buckets and width
respectively.

Usage:
CREATE TABLE tbl_name (i int, p1 int, p2 timestamp)
PARTITION BY SPEC (
  p1 BUCKET 10,
  p1 TRUNCATE 5
) STORED AS ICEBERG
TBLPROPERTIES ('iceberg.catalog'='hadoop.tables');
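The two transforms map a column value to a partition value roughly as follows (a sketch for ints only; real Iceberg buckets on a Murmur3 hash of the value's serialized form, not the stand-in hash used here):

```python
def truncate_transform(value, width):
    """TRUNCATE W maps an int to the lower bound of its width-W interval."""
    return value - (value % width)  # Python's % keeps this correct for negatives

def bucket_transform(value, num_buckets):
    """BUCKET N maps a value to one of N buckets via a hash (Iceberg
    specifies Murmur3; a multiplicative stand-in hash is used here)."""
    stand_in_hash = (value * 2654435761) % (2 ** 32)
    return stand_in_hash % num_buckets

print(truncate_transform(7, 5))   # 5
print(truncate_transform(-2, 5))  # -5
print(bucket_transform(42, 16) in range(16))  # True
```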

Testing:
  - Extended AnalyzerStmtsTest to cover creating partitioned Iceberg
    tables with the new partition transforms.
  - Extended ParserTest.
  - Extended iceberg-create.test to create Iceberg tables with the new
    partition transforms.
  - Extended show-create-table.test to check that the new partition
    transforms are displayed with their parameters in the SHOW CREATE
    TABLE output.

Change-Id: Idc75cd23045b274885607c45886319f4f6da19de
Reviewed-on: http://gerrit.cloudera.org:8080/16551
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-14 19:07:06 +00:00
Attila Jeges
3d067572dd IMPALA-10224: Add startup flag not to expose debug web url to clients
This patch introduces a new startup flag
--ping_expose_webserver_url (true by default) to control whether
PingImpalaService, PingImpalaHS2Service RPC calls should expose
the debug web url to the client or not.

This is necessary as the debug web UI is not something that
end-users will necessarily have access to.

If the flag is set to false, the RPC calls will return an empty
string instead of the real url signalling that the debug web ui
is not available.

Note that if the webserver is disabled (--enable_webserver flag
is set to false) the RPC calls will behave the same and return an
empty string for the url.

Change-Id: I7ec3e92764d712b8fee63c1f45b038c31c184cfc
Reviewed-on: http://gerrit.cloudera.org:8080/16573
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-14 14:39:39 +00:00
stiga-huang
308d692a1b IMPALA-10113: Add feature flag for incremental metadata updates
This patch adds a feature flag, enable_incremental_metadata_updates, to
turn off incremental metadata (i.e. partition level metadata)
propagation from catalogd to coordinators. It defaults to true. When
set to false, catalogd will send metadata updates in table
granularity (the legacy behavior).

Also fixes a bug of logging an empty aggregated partition update log
when no partitions are changed in a DDL.

Tests:
 - Run CORE tests with this flag set to true and false.
 - Add tests with enable_incremental_metadata_updates=false.

Change-Id: I98676fc8ca886f3d9f550f9b96fa6d6bff178ebb
Reviewed-on: http://gerrit.cloudera.org:8080/16436
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-14 14:15:53 +00:00
Zoltan Borok-Nagy
0fb234923c IMPALA-10055: Fix DCHECK hit on corrupt ORC file
Our ORC scanner could hit a DCHECK on corrupt ORC files. In
test_scanners_fuzz we randomly modify ORC files, so the this test
might hit a DCHECK occasionally.

I converted the DCHECK to a parse error. This way the fuzz test
won't crash the Impala daemon.

Testing:
Unfortunately I don't have an ORC file on which we hit the DCHECK.
So I manually changed the code to always raise this error and
executed the fuzz test to see if it still succeeds.

Change-Id: I18d9f56c3c37afd1a4898ee36f8cc2ddb5049972
Reviewed-on: http://gerrit.cloudera.org:8080/16591
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-14 13:33:21 +00:00
Joe McDonnell
481ea4ab0d IMPALA-9815: Update URL for cdh-releases-rcs maven repo
We use repository.cloudera.com to get some Cloudera-patched
dependencies required by the CDP Hadoop dependencies (e.g.
log4j, logredactor, etc). The URL for repository.cloudera.com
has changed from repository.cloudera.com/content/* to
repository.cloudera.com/artifactory/*. It is possible that the
old URL will be restored. To get things working, this updates
the cdh-releases-rcs to the new URL.

It turns out that cdh-releases-rcs contains all the
artifacts that we would otherwise get from the third-party
repository, so this replaces third-party with cdh-releases-rcs.

Testing:
 - Ran build-all-options-ub1604

Change-Id: I438305565a1e6b7515408a701e9f9e31f7cfd679
Reviewed-on: http://gerrit.cloudera.org:8080/16594
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-14 05:39:01 +00:00