impala/common/thrift/generate_error_codes.py
wzhou-code b5e2a0ce2e IMPALA-9224: Blacklist nodes with faulty disk for spilling
This patch extends the blacklist functionality by adding an executor
node to the blacklist if a query fails because of a disk failure during
spill-to-disk. It also classifies disk error codes and defines a set of
blacklistable error codes for non-transient disk errors. The coordinator
blacklists an executor only if that executor hit a blacklistable error
during spill-to-disk.
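
The decision rule described above can be sketched as follows. This is
only an illustration: the error code names and the function name are
hypothetical placeholders, not the identifiers used in the patch, and
the real check lives in the coordinator rather than in Python.

```python
# Hypothetical placeholder names for non-transient disk error codes.
# The actual blacklistable set is defined by the patch, not reproduced here.
BLACKLISTABLE_DISK_ERRORS = frozenset({
    "EXAMPLE_DISK_WRITE_ERROR",
    "EXAMPLE_DISK_OPEN_ERROR",
})


def should_blacklist_executor(error_code, during_spill_to_disk):
    """Return True only when both conditions from the description hold:
    the error is in the blacklistable set and it occurred while the
    executor was spilling to disk."""
    return during_spill_to_disk and error_code in BLACKLISTABLE_DISK_ERRORS
```

Under these assumptions, a blacklistable code reported outside of
spill-to-disk, or a non-blacklistable code reported during spilling,
leaves the executor off the blacklist.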

Adds a new debug action to simulate a disk write error during spill-to-
disk. To use it, specify it in the query options as:
  'debug_action': 'IMPALA_TMP_FILE_WRITE:<hostname>:<port>:<action>'

  where <hostname> and <port> identify the impalad that executes the
  fragment instances; <port> is the BE krpc port (default 27000).
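
As a rough sketch of how a test might assemble this option value, the
helper below only formats the string from the pattern given above. The
helper name, the example host name, and the action value passed in are
assumptions for illustration; the set of valid <action> values is not
listed in this message.

```python
DEFAULT_KRPC_PORT = 27000  # default BE krpc port, as stated above


def tmp_file_write_debug_action(hostname, action, port=DEFAULT_KRPC_PORT):
    """Format the 'debug_action' value for IMPALA_TMP_FILE_WRITE.

    hostname/port identify the impalad that executes the fragment
    instances; action is passed through unchanged (hypothetical helper).
    """
    if not hostname or ":" in hostname:
        raise ValueError("hostname must be non-empty and must not contain ':'")
    if not action:
        raise ValueError("action must be non-empty")
    return "IMPALA_TMP_FILE_WRITE:{0}:{1}:{2}".format(hostname, int(port), action)


# Example query options dict; 'example-host' and 'FAIL' are assumed values.
query_options = {
    "debug_action": tmp_file_write_debug_action("example-host", "FAIL"),
}
```

Keeping the port as a defaulted argument means only clusters that
override the BE krpc port need to pass it explicitly.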

Adds new test cases for blacklisting and query retry to cover the code
changes.

Testing:
 - Passed the new test cases.
 - Passed the exhaustive tests.
 - Manually simulated disk failures in scratch directories on nodes
   of a cluster and verified that the nodes were blacklisted as
   expected.

Change-Id: I04bfcb7f2e0b1ef24a5b4350f270feecd8c47437
Reviewed-on: http://gerrit.cloudera.org:8080/16949
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-02-04 05:12:42 +00:00

21 KiB
Executable File