This is the final patch to move all Impala e2e and custom cluster tests
to use the HS2 protocol by default. Only beeswax-specific tests remain
testing against the beeswax protocol by default. We can remove them once
Impala officially removes beeswax support.
HS2 error message formatting in impala-hs2-server.cc is adjusted
slightly to match the formatting in impala-beeswax-server.cc.
Move TestWebPageAndCloseSession from webserver/test_web_pages.py to
custom_cluster/test_web_pages.py to disable glog log buffering.
Testing:
- Pass exhaustive tests, except for some known and unrelated flaky
tests.
Change-Id: I42e9ceccbba1e6853f37e68f106265d163ccae28
Reviewed-on: http://gerrit.cloudera.org:8080/22845
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Jason Fehr <jfehr@cloudera.com>
cancel_query_and_validate_state is a helper method used to test query
cancellation with concurrent fetch. It still uses the beeswax client by
default.
This patch changes the test method to use the HS2 protocol by default.
The changes include the following:
1. Set TGetOperationStatusResp.operationState to
TOperationState::ERROR_STATE if returning abnormally.
2. Use separate MinimalHS2Client for
(execute_async, fetch, get_runtime_profile) vs cancel vs close.
Cancellation through KILL QUERY still instantiates a new
ImpylaHS2Connection client.
3. Implement required missing methods in MinimalHS2Client.
4. Change the MinimalHS2Client logging pattern to match the other
clients.
Testing:
Pass test_cancellation.py and TestResultSpoolingCancellation in core
exploration mode. Also set default_test_protocol to HS2 for these tests.
Change-Id: I626a1a06eb3d5dc9737c7d4289720e1f52d2a984
Reviewed-on: http://gerrit.cloudera.org:8080/22853
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
This change adds get_workload() to ImpalaTestSuite and removes it
from all test suites that already returned 'functional-query'.
get_workload() is also removed from CustomClusterTestSuite which
used to return 'tpch'.
All other changes besides impala_test_suite.py and
custom_cluster_test_suite.py are just mass removals of
get_workload() functions.
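A minimal sketch of the new base-class default (the decorator and exact
placement in impala_test_suite.py are assumptions):

    class ImpalaTestSuite(object):
        @classmethod
        def get_workload(cls):
            # Single project-wide default; suites that genuinely use a
            # different workload still override this.
            return 'functional-query'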
The behavior is only changed in custom cluster tests that didn't
override get_workload(). By returning 'functional-query' instead
of 'tpch', exploration_strategy() will no longer return 'core' in
'exhaustive' test runs. See IMPALA-3947 on why workload affected
exploration_strategy. An example of an affected test is
TestCatalogHMSFailures, which was skipped both in core and exhaustive
runs before this change.
get_workload() functions that return a workload other than
'functional-query' are not changed; it is possible that some of these
also don't handle exploration_strategy() as expected, but individually
checking those tests is out of scope for this patch.
Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115
Reviewed-on: http://gerrit.cloudera.org:8080/22726
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
With IMPALA-13682 merged, checking for query state can be done via
wait_for_impala_state(), wait_for_any_impala_state() and other helper
methods of ImpalaConnection. This patch removes all references to
protocol-specific states such as BeeswaxService.QueryState.
It also fixes flake8 errors and unused variables in the modified test
files.
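Call sites now look roughly like this ('client' and 'handle' come from
the test framework; the state strings and timeout argument are
assumptions):

    # Protocol-agnostic waits replace BeeswaxService.QueryState checks.
    client.wait_for_impala_state(handle, 'FINISHED', 60)
    client.wait_for_any_impala_state(handle, ['FINISHED', 'EXCEPTION'], 60)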
Testing:
- Run and pass all affected tests.
Change-Id: Id6b56024fbfcea1ff005c34cd146d16e67cb6fa1
Reviewed-on: http://gerrit.cloudera.org:8080/22586
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
1. Python 3 requires absolute imports within packages. This
can be emulated via "from __future__ import absolute_import"
2. Python 3 changed division to "true" division that doesn't
round to an integer. This can be emulated via
"from __future__ import division"
This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
I scrutinized each old-division location and converted those that
needed an integer result (e.g. for indices or counts of records) to the
integer division '//' operator. Some code was also using relative
imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
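As a concrete illustration of the two emulated behaviors (the variable
below is made up):

    from __future__ import absolute_import, division, print_function

    # With 'division' imported, / is true division on Python 2 as well.
    num_rows = 7
    print(num_rows / 2)   # 3.5 under true division
    print(num_rows // 2)  # 3; '//' is used where an integer is required,
                          # e.g. for indices or record counts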
Testing:
- Ran core tests
Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Two bugs happen in result spooling when we set two of its query options
to 0 (unbounded).
The first bug happens if max_spilled_result_spooling_mem =
0 (unbounded). max_unpinned_bytes_ in SpillableRowBatchQueue will be set
to 0, SpillableRowBatchQueue::IsFull() will then always return true, and
the query hangs. This patch fixes it by setting max_unpinned_bytes_ to
INT64_MAX if max_spilled_result_spooling_mem = 0.
The second bug happens if we set max_result_spooling_mem =
0 (unbounded). PlanRootSink.java will peg maxMemReservationBytes to
always equal minMemReservationBytes. This patch fixes this by
reverting to the default max_result_spooling_mem (100MB).
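In outline, the two fixes behave like this Python sketch (the real
changes are in the C++ SpillableRowBatchQueue and in PlanRootSink.java;
the constant names mirror the commit text):

    INT64_MAX = 2**63 - 1
    DEFAULT_MAX_RESULT_SPOOLING_MEM = 100 * 1024 * 1024  # 100MB default

    def effective_max_unpinned_bytes(max_spilled_result_spooling_mem):
        # Fix 1: 0 means unbounded, so substitute INT64_MAX rather than
        # let IsFull() compare against a limit of 0 and always pass.
        if max_spilled_result_spooling_mem == 0:
            return INT64_MAX
        return max_spilled_result_spooling_mem

    def effective_max_result_spooling_mem(max_result_spooling_mem):
        # Fix 2: 0 would peg maxMemReservationBytes to
        # minMemReservationBytes, so revert to the default instead.
        if max_result_spooling_mem == 0:
            return DEFAULT_MAX_RESULT_SPOOLING_MEM
        return max_result_spooling_mem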
Testing:
- Add test_unbounded_result_spooling_mem.
- Pass core tests.
Change-Id: If8f5e3668281bba8813f8082f45b4faa7721530e
Reviewed-on: http://gerrit.cloudera.org:8080/17187
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
TestResultSpoolingFetchSize.test_fetch has been flaky in
the ubuntu-16.04-dockerised environment for not reaching the finished
state within 10 seconds. This patch increases the test's timeout to 30
seconds.
Testing:
- Looped the test locally.
Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5
Reviewed-on: http://gerrit.cloudera.org:8080/16895
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
PlanRootSink can fail silently if result spooling is enabled and
maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. This happens
because results are spilled using a SpillableRowBatchQueue, which needs
two buffers (read and write) of at least MAX_ROW_SIZE bytes each.
This patch fixes this by setting a lower bound of 2 * MAX_ROW_SIZE while
computing the min reservation for the PlanRootSink.
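The bound amounts to the following sketch (names are assumed; the real
computation lives in the planner):

    def plan_root_sink_min_reservation(computed_min_reservation,
                                       max_row_size):
        # The spillable queue needs a read buffer and a write buffer,
        # each at least MAX_ROW_SIZE bytes, so never reserve less than
        # 2 * MAX_ROW_SIZE.
        return max(computed_min_reservation, 2 * max_row_size)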
Testing:
- Pass exhaustive tests.
- Add e2e TestResultSpoolingMaxReservation.
- Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to
an extremely low value. Also verify that PLAN_ROOT_SINK's
ReservationLimit remains unchanged after lowering the MAX_ROW_SIZE.
Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Reviewed-on: http://gerrit.cloudera.org:8080/16765
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Disable the tests TestResultSpooling::test_full_queue(_large_fetch)
until we figure out why they are flaky.
Replace the sleep in TestAdmissionController::test_release_backend with
assert_eventually to reduce flakiness.
Change-Id: I7ea6bf3d84f174745c8a0b1e0f2b55ce05ee618b
Reviewed-on: http://gerrit.cloudera.org:8080/14337
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
TestResultSpooling::_test_full_queue was flaky because there was a race
condition in the test where the result spooling queue would not fill up
quickly enough. The original way around this was to sleep for a fixed
amount of time in hope that the queue would fill up by the time the
thread woke up. The new approach periodically searches the runtime
profile for specific patterns that indicate the queue is full.
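The waiting pattern is roughly the following sketch, where
get_runtime_profile stands in for the test's actual profile accessor:

    import re
    import time

    def wait_for_profile_pattern(get_runtime_profile, pattern,
                                 timeout_s=30):
        """Poll the runtime profile until 'pattern' appears or time out."""
        end = time.time() + timeout_s
        while time.time() < end:
            if re.search(pattern, get_runtime_profile()):
                return True
            time.sleep(0.1)
        return False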
TestFetchAndSpooling.test_rows_sent_counters was flaky because the
RowsSentRate can be 0 if the results are spooled fast enough (because
the time spent spooling results is 0). The fix is to use the DEBUG_ACTION
BPRS_BEFORE_ADD_BATCH to introduce a delay when spooling results, so that
the RowsSentRate is guaranteed to be non-zero.
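In test terms this is an exec option on the query, roughly as below
(the LABEL:SLEEP@millis debug-action syntax is assumed from Impala's
usual convention):

    # Inject a delay before each AddBatch so spooling takes measurable
    # time and RowsSentRate is not computed over a zero-length interval.
    exec_options = {
        'spool_query_results': '1',
        'debug_action': 'BPRS_BEFORE_ADD_BATCH:SLEEP@500',
    }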
TestFetch.test_rows_sent_counters was flaky because ClientFetchWaitTimer
can be 0 if the Coordinator does not end up waiting any time for results
to be fetched. The fix is to wait until the query has 'FINISHED'
(results are available to fetch) and then sleep so that the
ClientFetchWaitTimer is a non-zero value.
Cleaned up a few other tests as well.
Testing:
* Looped both tests for a few hours without failure
Change-Id: I3042f592bc79785e43ebc7b09ac1270eae8ed66f
Reviewed-on: http://gerrit.cloudera.org:8080/14275
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds several "failpoint" tests to test_result_spooling.py. These tests
use debug_actions spread throughout buffered-plan-root-sink.cc to
trigger failures while result spooling is running. The tests validate
that all queries gracefully fail and do not cause any impalad crashes.
Fixed a few bugs that came up when adding these tests, as well as the
crash reported in IMPALA-8924 (which is now covered by the failpoint
tests added in this patch).
The first bug fixed was a DCHECK in SpillableRowBatchQueue::IsEmpty()
where the method was being called after the queue had been closed. The
fix is to only call IsEmpty() if IsOpen() returns true.
The second bug was an issue in the cancellation path where
BufferedPlanRootSink::GetNext would enter an infinite loop if the query
was cancelled and then GetNext was called. The fix is to check the
cancellation state in the outer while loop.
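A simplified Python rendering of the corrected GetNext control flow
(the real code is C++ in buffered-plan-root-sink.cc; names are
approximate):

    def get_next(sink, result_set, num_results):
        # The fix: cancellation is re-checked by the outer loop, so a
        # query cancelled mid-fetch exits instead of spinning forever.
        while not sink.is_cancelled():
            batch = sink.queue.get_batch()
            if batch is None:
                return 'EOS'  # queue closed and fully drained
            result_set.add_rows(batch, num_results)
            if result_set.size() >= num_results:
                return 'OK'
        return 'CANCELLED'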
Testing:
* Added new tests to test_result_spooling.py
* Ran core tests
Change-Id: Ib96f797bc8a5ba8baf9fb28abd1f645345bbe932
Reviewed-on: http://gerrit.cloudera.org:8080/14214
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
De-flake test_result_spooling.py::TestResultsSpooling::test_slow_query
by increasing the delay in RowBatch production. This patch makes two
fixes to ensure that RowBatchGetWaitTime is a non-zero value:
* Add the DELAY DEBUG_ACTION to the SCAN_NODE rather than the
EXCHANGE_NODE. Since the EXCHANGE_NODE only processes a few rows, adding
the delay to the SCAN_NODE decreases the rate at which results are
produced.
* Wait for all rows to be fetched before checking if RowBatchGetWaitTime
is in the profile. This fixes a possible race condition where the fetch
thread was not able to issue the fetch RPC before the test checked if
RowBatchGetWaitTime was in the runtime profile.
Testing:
* Looped test_slow_query for several hours with 0 failures
Change-Id: Idcb4a6b38f85a9497f80e2674e1c1fa512be5940
Reviewed-on: http://gerrit.cloudera.org:8080/14170
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds support for non-default fetch sizes when result spooling is enabled
(the default is to return BATCH_SIZE rows for each fetch request). When
result spooling is disabled, Impala can only return up to BATCH_SIZE
rows because it only buffers a single RowBatch at a time. When result
spooling is enabled, each fetch request returns exactly the number of
rows requested assuming there are that many rows left in the result set.
There is also an upper limit on the fetch size to prevent the resulting
QueryResultSet from getting too big.
Unlike the behavior when result spooling is disabled, fetches do not
break on RowBatch boundaries. For example, when result spooling is
disabled, if the fetch size is 10 and the batch size is 15, the second
fetch will return 5 rows. However, when result spooling is enabled the
second fetch will return 10 rows (assuming there is another RowBatch to
read).
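The boundary difference can be checked with a small simulation (a
sketch of the client-visible behavior, not the server code):

    def fetch_sizes(total_rows, batch_size, fetch_size, spooling_enabled):
        """Return the row count of each fetch a client would observe."""
        fetched, sizes = 0, []
        while fetched < total_rows:
            if spooling_enabled:
                # Fetches can span RowBatch boundaries.
                n = min(fetch_size, total_rows - fetched)
            else:
                # Fetches stop at the current RowBatch boundary.
                rows_left_in_batch = batch_size - (fetched % batch_size)
                n = min(fetch_size, rows_left_in_batch,
                        total_rows - fetched)
            sizes.append(n)
            fetched += n
        return sizes

    print(fetch_sizes(30, 15, 10, False))  # [10, 5, 10, 5]
    print(fetch_sizes(30, 15, 10, True))   # [10, 10, 10]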
Testing:
* Ran core tests
* Added new tests to test_result_spooling.py
* Added new tests to buffered-tuple-stream-test to validate writing to a
BufferedTupleStream before releasing row batches with 'attach_on_read'
set to true.
Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Reviewed-on: http://gerrit.cloudera.org:8080/14129
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
De-flake test_result_spooling.py::TestResultSpooling::test_spilling.
Bump up the timeout that controls how long the test waits for result
spooling to start spilling to disk (detected by the presence of
PeakUnpinnedBytes in the PLAN_ROOT_SINK section of the runtime profile).
Testing:
* Looped TestResultSpooling::test_spilling on an ASAN build for an hour
without any failure.
Change-Id: Iabac8d7273735079dca48a1e0ecd4f341ea690a0
Reviewed-on: http://gerrit.cloudera.org:8080/14155
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were
necessary for it to work with the spillable queue; however, all the
synchronization logic is the same.
SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and
a ReservationManager. It takes in a TBackendResourceProfile that
specifies the max / min memory reservation the BufferedTupleStream can
use to buffer rows. The 'max_unpinned_bytes' parameter limits the max
number of bytes that can be unpinned in the BufferedTupleStream. The
limit is a 'soft' limit because calls to AddBatch may push the amount of
unpinned memory over the limit. The queue is non-blocking and not thread
safe. It provides AddBatch and GetBatch methods. Calls to AddBatch spill
if the BufferedTupleStream does not have enough reservation to fit the
entire RowBatch.
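The soft limit can be modeled in a few lines (a Python sketch of the
C++ behavior, not the actual class):

    class SoftLimitQueueSketch(object):
        """Models why 'max_unpinned_bytes' is a soft limit."""
        def __init__(self, max_unpinned_bytes):
            self.max_unpinned_bytes = max_unpinned_bytes
            self.unpinned_bytes = 0

        def is_full(self):
            return self.unpinned_bytes >= self.max_unpinned_bytes

        def add_batch_spilling(self, batch_bytes):
            # An AddBatch that spills unpins the whole batch, which can
            # push unpinned_bytes past the limit; is_full() only stops
            # later calls. Hence the limit is 'soft'.
            self.unpinned_bytes += batch_bytes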
Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned
and unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned
data in the BufferedTupleStream to be unpinned. This is enforced in a
new method in QueryOptions called 'ValidateQueryOptions'.
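A sketch of the kind of check 'ValidateQueryOptions' performs (the
function signature here is an assumption):

    def validate_result_spooling_mem(max_pinned_bytes, max_unpinned_bytes):
        # All pinned data must be unpinnable, so the pinned bound may
        # not exceed the unpinned bound.
        if max_pinned_bytes > max_unpinned_bytes:
            raise ValueError(
                "MAX_PINNED_RESULT_SPOOLING_MEMORY must be <= "
                "MAX_UNPINNED_RESULT_SPOOLING_MEMORY")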
Planner Changes:
PlanRootSink.java now computes a full ResourceProfile if result spooling
is enabled. The min mem reservation is bounded by the size of the read and
write pages used by the BufferedTupleStream. The max mem reservation is
bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.
BufferedTupleStream Re-Factoring:
For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode specific. In order to
fix this, this patch introduces the concept of an ExecNode 'label' which
is a more generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift;
it was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes
are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').
Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time
is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext'
methods.
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics such as number of reserved and unpinned bytes.
Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not
handle requests where num_results < BATCH_SIZE.
Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran exhaustive tests.
Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Reviewed-on: http://gerrit.cloudera.org:8080/14039
Reviewed-by: Sahil Takiar <stakiar@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Prefix the query in TestResultSpooling::test_multi_batches with the
database name; the missing prefix was causing the Dockerized tests to
fail. I double checked what other tests do, and all the ones I saw
either switch to the appropriate database or prefix the table name with
the database name. The latter seemed more straightforward.
I was not able to reproduce this locally, and it's odd that this only
affected the Dockerized tests (even more odd is that it seems to either
be intermittent, or to affect only Dockerized tests triggered by
gerrit-verify-dryrun-external). Regardless, it is a straightforward fix
that makes TestResultSpooling::test_multi_batches consistent with the
rest of the tests.
Testing:
* Ran test_result_spooling.py locally using both bin/impala-py.test
and tests/run-tests.py.
Change-Id: I939eedba37003f5c720cea96e5c3532e2cc6312c
Reviewed-on: http://gerrit.cloudera.org:8080/14022
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds additional tests to test_result_spooling.py to cover various edge
cases when fetching query results (ensure all Impala types are returned
properly, UDFs are evaluated correctly, etc.). A new QueryTest file
result-spooling.test is added to encapsulate all these tests. Tests with
a decreased ROW_BATCH_SIZE are added as well to validate that
BufferedPlanRootSink buffers row batches correctly.
BufferedPlanRootSink requires careful synchronization of the producer
and consumer threads, especially when queries are cancelled. The
TestResultSpoolingCancellation class is dedicated to running
cancellation tests with SPOOL_QUERY_RESULTS = true. The implementation
is heavily borrowed from test_cancellation.py and some of the logic is
re-factored into a new utility class called cancel_utils.py to avoid
code duplication between test_cancellation.py and
test_result_spooling.py.
Testing:
* Looped test_result_spooling.py overnight with no failures
* Core tests passed
Change-Id: Ib3b3a1539c4a5fa9b43c8ca315cea16c9701e283
Reviewed-on: http://gerrit.cloudera.org:8080/13907
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Improves the encapsulation of RowBatchQueue by doing the following
re-factoring:
* Renames RowBatchQueue to BlockingRowBatchQueue, which is more
indicative of what the queue does
* Re-factors the timers managed by the scan-node into the
BlockingRowBatchQueue implementation
* Favors composition over inheritance by re-factoring
BlockingRowBatchQueue to own a BlockingQueue rather than extending one
The re-factoring lays the groundwork for introducing a generic
RowBatchQueue that all RowBatch queues inherit from.
Adds a new DequeRowBatchQueue which is a simple wrapper around a
std::deque that (1) stores unique_ptr to queued RowBatch-es and (2)
has a maximum capacity.
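In outline, DequeRowBatchQueue behaves like this Python sketch of the
C++ class:

    from collections import deque

    class DequeRowBatchQueue(object):
        """Sketch: bounded FIFO of batches, non-blocking, not thread safe."""
        def __init__(self, max_batches):
            self.max_batches = max_batches
            self.batches = deque()

        def add_batch(self, batch):
            if len(self.batches) >= self.max_batches:
                return False  # at capacity; the caller decides what to do
            self.batches.append(batch)
            return True

        def get_batch(self):
            return self.batches.popleft() if self.batches else None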
Implements BufferedPlanRootSink using the new DequeRowBatchQueue.
DequeRowBatchQueue is generic enough that replacing it with a
SpillableQueue (queue backed by a BufferedTupleStream) should be
straightforward. BufferedPlanRootSink is synchronized to protect access
to DequeRowBatchQueue since the queue is not thread safe.
BufferedPlanRootSink FlushFinal blocks until the consumer thread has
processed all RowBatches. This ensures that the coordinator fragment
stays alive until all results are fetched, but allows all other
fragments to be shut down immediately.
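The FlushFinal contract is essentially a condition-variable wait until
the consumer drains the queue; a Python threading sketch (not the
actual C++):

    import threading

    class BufferedSinkSketch(object):
        """Threading sketch of the FlushFinal contract."""
        def __init__(self):
            self.batches = []
            self.cv = threading.Condition()

        def send(self, batch):  # producer (fragment) thread
            with self.cv:
                self.batches.append(batch)
                self.cv.notify_all()

        def get_next(self):  # consumer (client fetch) thread
            with self.cv:
                batch = self.batches.pop(0) if self.batches else None
                self.cv.notify_all()
                return batch

        def flush_final(self):  # producer blocks until drained
            with self.cv:
                while self.batches:
                    self.cv.wait()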
Testing:
* Running core tests
* Updated tests/query_test/test_result_spooling.py
Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Reviewed-on: http://gerrit.cloudera.org:8080/13883
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Refactors PlanRootSink into a base class with two subclasses:
BlockingPlanRootSink and BufferedPlanRootSink. BlockingPlanRootSink
encapsulates the current implementation of PlanRootSink and
BufferedPlanRootSink encapsulates a new implementation that will buffer
RowBatches in memory until they are read by the client. The
implementation of BufferedPlanRootSink is left to future work.
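The resulting hierarchy, rendered as Python stubs for the C++ classes:

    class PlanRootSink(object):  # shared base
        pass

    class BlockingPlanRootSink(PlanRootSink):  # current behavior
        pass

    class BufferedPlanRootSink(PlanRootSink):  # buffers results; future work
        pass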
A new query option called SPOOL_QUERY_RESULTS controls whether a
BlockingPlanRootSink or a BufferedPlanRootSink is used as the DataSink.
SPOOL_QUERY_RESULTS is false by default.
Added a few more docs to PlanRootSink and BlockingPlanRootSink to make
the implementation easier to understand.
Testing:
* Added tests/query_test/test_result_spooling.py; currently only runs a
simple select limit 10 with SPOOL_QUERY_RESULTS = true and validates
that 0 rows are returned
* Ran core tests
Change-Id: I8786b1a9af68ab0a8a094970d8f955eb20d04bca
Reviewed-on: http://gerrit.cloudera.org:8080/13873
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>