Mirror of https://github.com/apache/impala.git (synced 2025-12-19 18:12:08 -05:00)
IMPALA-9224: Blacklist nodes with faulty disk for spilling
This patch extends the blacklist functionality: an executor node is added to the blacklist if a query fails because of a disk failure during spill-to-disk. It also classifies disk error codes and defines a blacklistable error set for non-transient disk errors. The coordinator blacklists an executor only if that executor hit a blacklistable error during spill-to-disk.

Adds a new debug action to simulate a disk write error during spill-to-disk. To use it, specify it in the query options as:
  'debug_action': 'IMPALA_TMP_FILE_WRITE:<hostname>:<port>:<action>'
where <hostname> and <port> identify the impalad that executes the fragment instances, and <port> is the BE krpc port (default 27000).

Adds new test cases for blacklist and query-retry to cover the code changes.

Testing:
- Passed new test cases.
- Passed exhaustive tests.
- Manually simulated disk failures in scratch directories on nodes of a cluster and verified that the nodes were blacklisted as expected.

Change-Id: I04bfcb7f2e0b1ef24a5b4350f270feecd8c47437
Reviewed-on: http://gerrit.cloudera.org:8080/16949
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
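The debug_action value described in the commit message is a colon-separated string. As a minimal sketch, assuming a Python test harness that passes query options as a dict, the hypothetical helpers below compose and split that string. `make_debug_action`, `parse_debug_action`, `TmpFileWriteAction`, and the `"FAIL"` action value are illustrative assumptions; the commit does not list which `<action>` values are accepted.

```python
# Hypothetical helpers (not part of Impala) for the IMPALA_TMP_FILE_WRITE
# debug action format: 'IMPALA_TMP_FILE_WRITE:<hostname>:<port>:<action>'.
from typing import NamedTuple

DEFAULT_KRPC_PORT = 27000  # default BE krpc port, per the commit message
LABEL = "IMPALA_TMP_FILE_WRITE"


class TmpFileWriteAction(NamedTuple):
    hostname: str  # impalad that executes the fragment instances
    port: int      # that impalad's BE krpc port
    action: str    # opaque here; valid values are an assumption


def make_debug_action(hostname: str, action: str,
                      port: int = DEFAULT_KRPC_PORT) -> str:
    """Build the debug_action string for one target impalad."""
    if ":" in hostname:
        raise ValueError("hostname must not contain ':'")
    return f"{LABEL}:{hostname}:{port}:{action}"


def parse_debug_action(value: str) -> TmpFileWriteAction:
    """Split a debug_action string back into its parts.

    maxsplit=3 keeps any ':' inside <action> intact; too few parts
    makes the tuple unpacking raise ValueError.
    """
    label, hostname, port, action = value.split(":", 3)
    if label != LABEL:
        raise ValueError(f"unexpected label: {label!r}")
    return TmpFileWriteAction(hostname, int(port), action)


# Query options as a test might pass them ("FAIL" is an assumed action).
query_options = {"debug_action": make_debug_action("localhost", "FAIL")}
```

Limiting the split to three separators means the hostname and port are always taken from the fixed leading fields, whatever the action text contains.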
Committed by Impala Public Jenkins
Parent: 91fd8fd130
Commit: b5e2a0ce2e
```diff
@@ -468,6 +468,9 @@ error_codes = (
    "Query $0 terminated due to join rows produced exceeds the limit of $1 "
    "at node with id $2. Unset or increase JOIN_ROWS_PRODUCED_LIMIT query option "
    "to produce more rows."),
+
+  ("LOCAL_DISK_FAULTY", 152,
+   "Query execution failure caused by local disk IO fatal error on backend: $0."),
 )
 
 import sys
```
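The new LOCAL_DISK_FAULTY error code goes together with the blacklisting rule from the commit message: the coordinator blacklists an executor only for a non-transient disk error hit during spill-to-disk. The sketch below models that rule in Python under stated assumptions. `ExecutorFailure`, `should_blacklist`, `substitute`, and `error_message` are hypothetical, and the errno values in `BLACKLISTABLE_DISK_ERRNOS` are an illustrative guess, not the classification the patch actually defines.

```python
# Hypothetical sketch of the blacklisting rule; not Impala's implementation.
import errno
import re
from dataclasses import dataclass

# Entry copied from the diff: (name, code, message template).
LOCAL_DISK_FAULTY = (
    "LOCAL_DISK_FAULTY", 152,
    "Query execution failure caused by local disk IO fatal error on backend: $0.")

# ASSUMPTION: illustrative non-transient disk errors. Transient errors such
# as EAGAIN or EINTR are deliberately left out.
BLACKLISTABLE_DISK_ERRNOS = frozenset(
    {errno.EIO, errno.EROFS, errno.ENODEV, errno.ENXIO})


@dataclass(frozen=True)
class ExecutorFailure:
    backend: str        # executor address, e.g. "host-1:27000" (assumed form)
    disk_errno: int     # errno reported by the failed scratch-file write
    during_spill: bool  # True if the error happened while spilling to disk


def substitute(template: str, *args: str) -> str:
    """Replace $0, $1, ... placeholders with positional arguments."""
    return re.sub(r"\$(\d+)", lambda m: args[int(m.group(1))], template)


def should_blacklist(failure: ExecutorFailure) -> bool:
    """Blacklist only for a blacklistable disk error hit during spill-to-disk."""
    return (failure.during_spill
            and failure.disk_errno in BLACKLISTABLE_DISK_ERRNOS)


def error_message(failure: ExecutorFailure) -> str:
    """Fill the LOCAL_DISK_FAULTY template with the failing backend."""
    _name, _code, template = LOCAL_DISK_FAULTY
    return substitute(template, failure.backend)
```

Requiring both conditions mirrors the commit's intent: a transient error, or a disk error outside spill-to-disk, fails the query without removing the executor from scheduling.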