impala

mirror of https://github.com/apache/impala.git synced 2025-12-22 19:35:22 -05:00

Author	SHA1	Message	Date
Tim Armstrong	cf224f8461	IMPALA-9128: part 2: dump traces for slow RPCs This adds trace events for data stream RPCs and dumps them when they take longer than --impala_slow_rpc_threshold_ms. I needed to modify the KRPC code to do this because it currently only dumps traces for RPCs with deadlines. I plan to add some version of this upstream in Kudu so that we don't diverge our KRPC implementation. Example output from test_exchange_small_buffer: I1111 08:38:53.732910 26509 rpcz_store.cc:265] Call impala.DataStreamService.TransmitData from 127.0.0.1:42434 (request call id 43) took 7799ms. Request Metrics: {} I1111 08:38:53.732928 26509 rpcz_store.cc:269] Trace: 1111 08:38:45.933412 (+ 0us) impala-service-pool.cc:167] Inserting onto call queue 1111 08:38:45.933449 (+ 37us) impala-service-pool.cc:254] Handling call 1111 08:38:45.933470 (+ 21us) krpc-data-stream-mgr.cc:227] Added early sender 1111 08:38:47.906542 (+1973072us) krpc-data-stream-recvr.cc:327] Enqueuing deferred RPC 1111 08:38:53.732858 (+5826316us) krpc-data-stream-recvr.cc:506] Processing deferred RPC 1111 08:38:53.732860 (+ 2us) krpc-data-stream-recvr.cc:399] Deserializing batch 1111 08:38:53.732888 (+ 28us) krpc-data-stream-recvr.cc:426] Enqueuing deserialized batch 1111 08:38:53.732895 (+ 7us) inbound_call.cc:162] Queueing success response Disabled +-clang-diagnostic-gnu-zero-variadic-macro-arguments because it had false positives on the TRACE_TO invocations. Testing: * Ran exhaustive and ASAN tests * Ran stress test Change-Id: Ic7af4b45c43ec731d742d3696112c5f800849947 Reviewed-on: http://gerrit.cloudera.org:8080/14668 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-11-14 20:24:58 +00:00
Tim Armstrong	4fb8e8e324	IMPALA-8816: reduce custom cluster test runtime in core This includes some optimisations and a bulk move of tests to exhaustive. Move a bunch of custom cluster tests to exhaustive. I selected these partially based on runtime (i.e. I looked most carefully at the tests that ran for over a minute) and the likelihood of them catching a precommit bug. Regression tests for specific edge cases and tests for parts of the code that are very stable were prime candidates. Remove an unnecessary cluster restart in test_breakpad. Merge test_scheduler_error into test_failpoints to avoid an unnecessary cluster restart. Speed up cluster starts by ensuring that the default statestore args are applied even when _start_impala_cluster() is called directly. This shaves a couple of seconds off each restart. We made the default args use a faster update frequency - see IMPALA-7185 - but they did not take effect in all tests. Change-Id: Ib2e3e7ebc9695baec4d69183387259958df10f62 Reviewed-on: http://gerrit.cloudera.org:8080/13967 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-08-06 21:34:26 +00:00
Michael Ho	ca89014eaa	IMPALA-8251: Run test_exchange_deferred_batches in dev builds only The startup flag stress_datastream_recvr_delay_ms is available only in development builds. Skip the test in non-development builds. Testing done: Ran the test with a release build and verified that it's skipped. Change-Id: I5caaa6fa39d6c97f313b675838c27740af9aa1d5 Reviewed-on: http://gerrit.cloudera.org:8080/12610 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-02-27 04:01:26 +00:00
Michael Ho	1718f9c07b	IMPALA-8239: Fix handling of failed deserialization of row batch Previously, when a row batch failed to be deserialized in the data stream receiver, we would return the error status to the sender of the row batch without inserting the row batch. The receiver would continue to operate without flagging any error. The assumption was that the sender would eventually cancel the query upon receiving the failed status. Normally, when a caller of GetBatch() successfully dequeues a row batch from the batch queue, it kicks off the draining of row batches from the deferred queue into the normal batch queue, which continues the cycle of draining the deferred queue upon the next call to GetBatch() until the deferred queue becomes empty. When an error is hit while deserializing a deferred batch to be inserted into the batch queue, the existing code simply does not insert the row batch or flag any error. This breaks the cycle of deferred queue draining, as the batch queue may remain empty forever. The caller of GetBatch() will block indefinitely until the query is cancelled. The existing code works fine as the expectation is that the query will be cancelled once the sender receives the error status from the RPC response. However, this behavior is still not ideal as it lets a query that has hit a fatal error hold on to resources for an extended period of time. This patch fixes the problem by explicitly recording any error during row batch insertion in an error status object. Callers of GetBatch() will now also poll for this status object while waiting for a row batch to show up and bail out early if there is any error. A new test case has been added to simulate the problematic case above. Change-Id: Iaa74937b046d95484887533be548249e96078755 Reviewed-on: http://gerrit.cloudera.org:8080/12567 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Reviewed-by: Thomas Marshall <tmarshall@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-02-26 03:38:09 +00:00
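
The fix described in IMPALA-8239 can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Impala's C++ implementation: the class `Receiver`, its methods, and the `deserialize` callable are hypothetical names invented for illustration. It shows the general idea from the commit message: an insertion failure is recorded in a shared error status, and a caller waiting for a batch also checks that status so it bails out instead of blocking forever.

```python
import threading
from collections import deque


class Receiver:
    """Toy receiver with a batch queue and a deferred queue (hypothetical names)."""

    def __init__(self, deserialize):
        # `deserialize` is an assumed callable that raises ValueError on bad input.
        self._deserialize = deserialize
        self._cv = threading.Condition()
        self._batches = deque()   # deserialized batches ready for the consumer
        self._deferred = deque()  # raw batches waiting for queue space
        self._error = None        # first error hit while inserting a batch

    def add_batch(self, raw):
        with self._cv:
            self._insert_locked(raw)

    def add_deferred(self, raw):
        with self._cv:
            self._deferred.append(raw)

    def _insert_locked(self, raw):
        # Caller must hold self._cv. Record failures instead of dropping them.
        try:
            batch = self._deserialize(raw)
        except ValueError as e:
            if self._error is None:
                self._error = e
        else:
            self._batches.append(batch)
        # Wake waiters in both cases: a new batch or a new error.
        self._cv.notify_all()

    def get_batch(self, timeout=None):
        with self._cv:
            ready = self._cv.wait_for(
                lambda: bool(self._batches) or self._error is not None, timeout
            )
            if self._error is not None:
                # Bail out early rather than waiting for a batch that never comes.
                raise RuntimeError("receiver failed") from self._error
            if not ready:
                raise TimeoutError("no batch available")
            batch = self._batches.popleft()
            # Dequeuing frees space, so drain one deferred batch.
            if self._deferred:
                self._insert_locked(self._deferred.popleft())
            return batch
```

Without the `_error` field, a failed deserialization during the drain step would leave `_batches` empty with nothing left to refill it, which is the indefinite block the commit message describes.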

4 Commits