When a task submitted via HdfsMonitoredOps fails, it raises a
TErrorCode::THREAD_POOL_TASK_TIMED_OUT or
TErrorCode::THREAD_POOL_SUBMIT_FAILED error. The generated error
calls GetDescription() to describe the failed HDFS operation.
This change appends the hostname to that description so the user
can easily identify the host whose connection to the NameNode has
reached a bad state.
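As a rough illustration of the idea (not the Impala C++ code), a
description helper that appends the hostname might look like the
Python sketch below. The function name get_description, its
parameters, and the message format are all hypothetical.

```python
import socket


def get_description(op_name, path, hostname=None):
    """Describe a failed HDFS operation and name the host it ran on.

    Hypothetical helper: the name, parameters, and message format are
    illustrative assumptions, not Impala's actual GetDescription().
    """
    # Fall back to the local hostname when the caller does not supply one.
    host = hostname if hostname is not None else socket.gethostname()
    return "{0} on {1} (host: {2})".format(op_name, path, host)
```

With the hostname in the description, an error reported by the
coordinator points directly at the host that could not reach the
NameNode.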
Testing:
Modified test_hdfs_timeout.py to verify that the hostname is
included in the error message.
Change-Id: Ief1e21560b6fb54965f2fb2793c32c2ba5176ba2
Reviewed-on: http://gerrit.cloudera.org:8080/12593
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_hdfs_timeout.py used subprocess.check_output, which does
not exist in Python 2.6; it was introduced in Python 2.7.
This switches test_hdfs_timeout.py to use exec_process() from
tests.util.shell_util.
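For reference, a Python 2.6-compatible way to capture a command's
output is to use subprocess.Popen directly. The run_command helper
below is a hypothetical stand-in written for illustration; it is not
the exec_process() implementation, whose exact signature is not
shown here.

```python
import subprocess
import sys


def run_command(args):
    """Run args and return (returncode, stdout, stderr).

    Hypothetical helper that avoids subprocess.check_output, which is
    only available from Python 2.7 onward.
    """
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    # communicate() waits for the process to exit and reads both pipes.
    stdout, stderr = proc.communicate()
    return proc.returncode, stdout, stderr


# Example: run a trivial child process and capture what it prints.
rc, out, err = run_command([sys.executable, "-c", "print('ok')"])
```

Unlike check_output, this returns the exit code instead of raising on
a non-zero status, so the caller decides how to handle failures.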
Change-Id: Ifde02778dfe74c5e7f56c6ae08cd0114bc3d3dca
Reviewed-on: http://gerrit.cloudera.org:8080/12059
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is part 1 of a push to add timeouts for all HDFS operations.
It adds timeouts for opening an HDFS file handle.
It introduces a new SynchronousThreadPool, which executes
an operation in a thread pool and waits up to a specified
timeout for the operation to complete. This type of thread
pool can accept any subclass of SynchronousWorkItem, and
a single thread pool can process different types of work
items. It is tested by a new test case in thread-pool-test.
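The submit-and-wait behavior can be sketched in Python with
concurrent.futures. This is only an analogy for the pattern: the
class name SynchronousPool and its methods are hypothetical, and the
real SynchronousThreadPool is C++ code that accepts
SynchronousWorkItem subclasses rather than Python callables.

```python
import concurrent.futures
import time


class SynchronousPool(object):
    """Hypothetical sketch: run work on a pool thread, wait up to a timeout."""

    def __init__(self, num_threads):
        self._executor = concurrent.futures.ThreadPoolExecutor(
            max_workers=num_threads)

    def run(self, work, timeout_sec):
        """Return (True, result) on completion or (False, None) on timeout.

        Any zero-argument callable is accepted, so one pool can serve
        different kinds of work. Exceptions raised by the work propagate
        to the caller.
        """
        future = self._executor.submit(work)
        try:
            return True, future.result(timeout=timeout_sec)
        except concurrent.futures.TimeoutError:
            # The caller stops waiting; the work itself may still be
            # running on the pool thread.
            return False, None

    def shutdown(self):
        # Waits for any still-running work before releasing the threads.
        self._executor.shutdown(wait=True)
```

Note that a timeout only unblocks the caller. In this sketch, the
stuck operation continues to occupy a pool thread until it returns.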
This also introduces a new HdfsMonitor which implements
timeouts for HDFS operations, currently limited to
hdfsOpenFile(). This is implemented using a SynchronousThreadPool.
The timeout for HDFS operations is specified by
hdfs_operation_timeout_sec, which defaults to 5 minutes.
Testing:
1. Added a test to thread-pool-test for the new
SynchronousThreadPool.
2. Core tests
3. Added a custom cluster test that sends "kill -STOP"
to the NameNode and verifies that a subsequent
hdfsOpenFile operation times out.
Change-Id: Ia14403ca5f3f19c6d5f61b9ab2306b0ad3267454
Reviewed-on: http://gerrit.cloudera.org:8080/11874
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>