Commit Graph

29 Commits

Author SHA1 Message Date
Joe McDonnell
8d5adfd0ba IMPALA-13123: Add option to run tests with Python 3
This introduces the IMPALA_USE_PYTHON3_TESTS environment variable
to select whether to run tests using the toolchain Python 3.
This is an experimental option, so it defaults to false,
continuing to run tests with Python 2.

This fixes a first batch of Python 2 vs 3 issues:
 - Deciding whether to open a file in bytes mode or text mode
 - Adapting to APIs that operate on bytes in Python 3 (e.g. codecs)
 - Eliminating 'basestring' and 'unicode' locations in tests/ by using
   the recommendations from future
   ( https://python-future.org/compatible_idioms.html#basestring and
     https://python-future.org/compatible_idioms.html#unicode )
 - Uses impala-python3 for bin/start-impala-cluster.py

All fixes leave the Python 2 path working normally.

Testing:
 - Ran an exhaustive run with Python 2 to verify nothing broke
 - Verified that the new environment variable works and that
   it uses Python 3 from the toolchain when specified

Change-Id: I177d9b8eae9b99ba536ca5c598b07208c3887f8c
Reviewed-on: http://gerrit.cloudera.org:8080/21474
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2024-12-17 07:28:51 +00:00
Xuebin Su
ad868b9947 IMPALA-13115: Add query id to error messages
This patch adds the query id to the error messages in both

- the result of the `get_log()` RPC, and
- the error message in an RPC response

before they are returned to the client, so that the users can easily
figure out the errored queries on the client side.

To achieve this, the query id of the thread debug info is set in the
RPC handler method, and is retrieved from the thread debug info each
time the error reporting function or `get_log()` gets called.

Due to the change of the error message format, some checks in the
impala-shell.py are adapted to keep them valid.

Testing:
- Added helper function `error_msg_expected()` to check whether an
  error message is expected. It is stricter than only using the `in`
  operator.
- Added helper function `error_msg_equal()` to check if two error
  messages are equal regardless of the query ids.
- Various test cases are adapted to match the new error message format.
- `ImpalaBeeswaxException`, which is used in tests only, is simplified
  so that it has the same error message format as the exceptions for
  HS2.
- Added an assertion to the case of killing and restarting a worker
  in the custom cluster test to ensure that the query id is in
  the error message in the client log retrieved with `get_log()`.

Change-Id: I67e659681e36162cad1d9684189106f8eedbf092
Reviewed-on: http://gerrit.cloudera.org:8080/21587
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-08-08 14:11:04 +00:00
Joe McDonnell
eb66d00f9f IMPALA-11974: Fix lazy list operators for Python 3 compatibility
Python 3 changes list operators such as range, map, and filter
to be lazy. Some code that expects the list operators to happen
immediately will fail. e.g.

Python 2:
range(0,5) == [0,1,2,3,4]
True

Python 3:
range(0,5) == [0,1,2,3,4]
False

The fix is to wrap locations with list(). i.e.

Python 3:
list(range(0,5)) == [0,1,2,3,4]
True

Since the base operators are now lazy, Python 3 also removes the
old lazy versions (e.g. xrange, ifilter, izip, etc). This uses
future's builtins package to convert the code to the Python 3
behavior (i.e. xrange -> future's builtins.range).

Most of the changes were done via these futurize fixes:
 - libfuturize.fixes.fix_xrange_with_import
 - lib2to3.fixes.fix_map
 - lib2to3.fixes.fix_filter

This eliminates the pylint warnings:
 - xrange-builtin
 - range-builtin-not-iterating
 - map-builtin-not-iterating
 - zip-builtin-not-iterating
 - filter-builtin-not-iterating
 - reduce-builtin
 - deprecated-itertools-function

Testing:
 - Ran core job

Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f
Reviewed-on: http://gerrit.cloudera.org:8080/19589
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
82bd087fb1 IMPALA-11973: Add absolute_import, division to all eligible Python files
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
 1. Python 3 requires absolute imports within packages. This
    can be emulated via "from __future__ import absolute_import"
 2. Python 3 changed division to "true" division that doesn't
    round to an integer. This can be emulated via
    "from __future__ import division"

This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.

I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.

Testing:
 - Ran core tests

Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Thomas Tauber-Marshall
ca884476bb IMPALA-9930 (part 2): Introduce new admission control rpc service
This patch introduces a new krpc service, AdmissionControlService,
which coordinators can use to submit queries for admission.

This patch adds some simple configuration flags that make it possible
to have coordinators use this service to submit their queries for
admission to other coordinators. These flags are only to make this
patch testable and will be replaced when the separate admission
control daemon is introduced in IMPALA-9975.

The interface consists of the following RPCs:
- AdmitQuery: takes a TQueryExecRequest and a TQueryOptions
  (serialized into sidecars), places the request on a queue to be
  processed by a thread pool and then immediately returns.
- GetQueryStatus: takes a query id and returns the current admission
  status, including the QuerySchedulePB if admission has completed
  successfully but the query has not been released yet.
- ReleaseQueryBackends: called when individual backends complete but
  the overall query is still running to release resources
  incrementally. This RPC will be called at most O(log(# backends))
  per query due to BackendResourceState, which batches backends to
  release together.
- ReleaseQuery: called when the query has completely finished.
  Releases all remaining resources.
- CancelAdmission: called if a query is cancelled before an admission
  decision has been made to indicate that it should no longer be
  considered for admission.

The majority of the patch consists of two classes:
- AdmissionControlClient: used to abstract whether admission is being
  performed locally or remotely. In the local case, it is basically
  just a wrapper around AdmissionController. In the remote case, it
  handles serializing/deserializing of RPC params, polling
  GetQueryStatus() until a decision has been made, etc.
- AdmissionControlService: exports the RPC interface and acts as a
  wrapper around AdmissionController.

Some notable changes involved:
- AdmissionController::SubmitForAdmission() no longer blocks while a
  query is queued. Instead, a new function WaitOnQueued() can be used
  to monitor the admission status of a queued query.
- Adding events to the query timeline is moved out of
  AdmissionController and into the AdmissionControlClient classes, so
  that it always happens on the coordinator.
- When a cluster is run in the new admission control service mode,
  only the impalad that is performing admission control exposes the
  /admission http endpoint. Observability will be cleaned up in a
  subsequent patch.

Testing:
- Modified existing admission control tests to run both with and
  without the admission control service enabled, including both the
  functional and stress tests. The 'num_queries' param in the stress
  test is modified to only use a single value to reduce the number of
  tests that are run and keep the running time reasonable.
- Ran tpch10 on a local minicluster and observed no significant
  regressions.

Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Reviewed-on: http://gerrit.cloudera.org:8080/16412
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-12-03 23:46:29 +00:00
Tim Armstrong
748e41ab41 IMPALA-9380: async query unregistration
This change improves query latency by doing much
of the heavyweight work of unregistering a query
asynchronously, instead of synchronously on the
RPC thread. The biggest win is to move the
profile serialization off the RPC thread.

Unregistration processing is done by a thread pool
with 4 threads by default. This is configurable by
--unregistration_thread_pool_size and
--unregistration_thread_pool_queue_depth.

This fixes a pre-existing bug where a
query was temporarily neither in the in-flight
queries nor the completed queries. It would be
much easier to hit this with async unregistration
because there is less synchronisation on the client
side. Now the query is briefly in both maps, but
this is handled as follows:
* All places that look up both the maps will check
  the in-flight map first, and return a reference to
  the ClientRequestState, i.e. ignoring the entry in
  the query log.
* The /queries page does not return completed queries
  if they were found in the in-flight queries map, so
  avoids duplicate results.

The thread safety story changes slightly.
Before this change, only one thread could
remove the query from the map and close it,
with only one thread "winning" the race to remove
the ClientRequestState from the map. Since we leave
the query in the map while being finalized, we
instead use an atomic in ClientRequestState to ensure
that only one thread does the finalization.

Some misc cleanup was done as a result of these changes:
* Fix a pre-existing TSAN race in RuntimeProfile that
  was revealed by the new concurrent unregister test.
* Consolidate the various unknown query handle errors into
  an error code so that we consistently return the same
  string.
* "Unregister query" should include flushing audit events.

Testing:
* Add a test that unregisters a query concurrent with other
  operations.
* Ran exhaustive tests

Perf:
Ran TPC-H 30 with mt_dop=4. No regressions and some improvements:
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 5.38    | -2.67%     | 4.02       | -2.01%         |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| TPCH(30) | TPCH-Q1  | parquet / none / none | 5.36   | 5.17        |   +3.61%   |   1.82%   |   1.17%        | 5     |   +3.73%       | 1.73    | 3.65   |
| TPCH(30) | TPCH-Q6  | parquet / none / none | 1.77   | 1.74        |   +1.48%   |   2.00%   |   2.50%        | 5     |   +2.89%       | 0.87    | 1.03   |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 3.02   | 3.00        |   +0.79%   |   2.18%   |   2.21%        | 5     |   +1.55%       | 0.00    | 0.57   |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 1.65   | 1.64        |   +0.81%   |   1.35%   |   0.03%        | 5     |   +0.07%       | 1.15    | 1.34   |
| TPCH(30) | TPCH-Q2  | parquet / none / none | 1.21   | 1.21        |   -0.07%   |   2.11%   |   2.14%        | 5     |   -0.04%       | -0.29   | -0.05  |
| TPCH(30) | TPCH-Q4  | parquet / none / none | 2.50   | 2.52        |   -0.49%   |   2.43%   |   3.34%        | 5     |   -0.09%       | -0.29   | -0.27  |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 2.86   | 2.90        |   -1.28%   |   2.30%   |   1.24%        | 5     |   -0.02%       | -0.58   | -1.11  |
| TPCH(30) | TPCH-Q3  | parquet / none / none | 4.35   | 4.40        |   -1.15%   |   1.76%   |   1.78%        | 5     |   -1.12%       | -0.87   | -1.03  |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 4.10   | 4.17        |   -1.80%   |   1.05%   |   1.31%        | 5     |   -1.25%       | -1.73   | -2.40  |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 3.20   | 3.25        |   -1.52%   |   0.79%   |   2.56%        | 5     |   -1.56%       | -0.58   | -1.26  |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 10.81  | 11.07       |   -2.34%   |   5.00%   |   7.01%        | 5     |   -1.40%       | -0.58   | -0.61  |
| TPCH(30) | TPCH-Q7  | parquet / none / none | 11.19  | 11.56       |   -3.18%   |   3.47%   |   6.02%        | 5     |   -0.90%       | -0.87   | -1.03  |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 19.91  | 20.32       |   -2.02%   |   0.66%   |   0.47%        | 5     |   -2.18%       | -2.31   | -5.64  |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 5.63   | 5.77        |   -2.40%   |   1.71%   |   2.01%        | 5     |   -1.84%       | -1.73   | -2.05  |
| TPCH(30) | TPCH-Q5  | parquet / none / none | 3.91   | 4.03        |   -2.74%   |   1.08%   |   1.86%        | 5     |   -2.45%       | -1.44   | -2.88  |
| TPCH(30) | TPCH-Q8  | parquet / none / none | 4.55   | 4.71        |   -3.48%   |   1.90%   |   3.53%        | 5     |   -2.35%       | -1.44   | -1.96  |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 1.93   | 2.01        |   -3.96%   |   0.05%   |   4.05%        | 5     |   -2.59%       | -2.31   | -2.19  |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 4.52   | 4.73        |   -4.26%   |   1.26%   |   2.43%        | 5     |   -3.40%       | -2.02   | -3.51  |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.02   | 1.05        |   -3.58%   |   3.94%   |   2.36%        | 5     |   -4.56%       | -1.44   | -1.79  |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 9.52   | 10.04       | I -5.24%   |   2.14%   |   0.56%        | 5     | I -4.67%       | -2.31   | -5.57  |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 3.49   | 3.68        | I -5.08%   |   0.07%   |   0.56%        | 5     | I -5.66%       | -2.31   | -20.08 |
| TPCH(30) | TPCH-Q9  | parquet / none / none | 11.92  | 12.71       | I -6.19%   |   0.57%   |   3.15%        | 5     | I -4.99%       | -2.31   | -4.33  |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+

Change-Id: I80027b1baeb4ab453938c0f6357b120f4035ba08
Reviewed-on: http://gerrit.cloudera.org:8080/15821
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-05 10:12:42 +00:00
stakiar
c47fca5960 IMPALA-8962: FETCH_ROWS_TIMEOUT_MS should apply before rows are available
IMPALA-7312 added the query option FETCH_ROWS_TIMEOUT_MS, but it only
applies to fetch requests against a query that has already transitioned
to the 'FINISHED' state. This patch changes the timeout so that it
applies to queries in the 'RUNNING' state as well. Before this patch,
fetch requests issued while a query was 'RUNNING' blocked until the query
transitioned to the 'FINISHED' state, and then it fetched results and
returned them. After this patch, fetch requests against queries in the
'RUNNING' state will block for 'FETCH_ROWS_TIMEOUT_MS' and then return.

For HS2 clients, fetch requests that return while a query is 'RUNNING'
set their TStatusCode to STILL_EXECUTING_STATUS. For Beeswax clients,
fetch requests that return while a query is 'RUNNING' set the 'ready'
flag to false. For both clients, hasMoreRows is set to true.

If the following sequence of events occurs:
* A fetch request is issued and blocks on a 'RUNNING' query
* The query transitions to the 'FINISHED' state
* The fetch request attempts to read multiple batches
Then the time spent waiting for the query to finish is deducted from
the timeout used when waiting for rows to be produced by the Coordinator
fragment.

Fixed a bug in the current usage of FETCH_ROWS_TIMEOUT_MS where the
time units for FETCH_ROWS_TIMEOUT_MS and MonotonicStopWatch were not
being converted properly.

Tests:
* Moved existing fetch timeout tests from hs2/test_fetch.py into a new
test file hs2/test_fetch_timeout.py.
* Added several new tests to hs2/test_fetch_timeout.py to validate that
the timeout is applied to 'RUNNING' queries and that the timeout applies
across a 'RUNNING' and 'FINISHED' query.
* Added new tests to query_test/test_fetch.py to validate the timeout
while using the Beeswax protocol.

Change-Id: I2cba6bf062dcc1af19471d21857caa797c1ea4a4
Reviewed-on: http://gerrit.cloudera.org:8080/14332
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-10-08 19:13:33 +00:00
Vihang Karajgaonkar
39613c8226 IMPALA-8627: Enable catalog-v2 in tests
This patch enables catalog-v2 by default in all the tests.

Test fixes:
1. Modified test_observability which fails on catalog-v2 since
the profile emits different metadata load events. The test now looks for
the right events on the profile depending on whether catalogv2 is
enabled or not.
2. TableName.java constructor allows non-lowercased
table and database names. This causes problems at the local catalog
cache which expects the tablenames to be always in lowercase. More
details on this failure are available in IMPALA-8627. The patch makes
sure that the loadTable requests in local catalog do a explicit
conversion of tablename to lowercase in order to get around the issue.
3. Fixes the JdbcTest which checks for existence of table comment in the
getTables metadata jdbc call. In catalog-v2 since the columns are not
requested, LocalTable is not loaded and hence the test needs to be
modified to check if catalog-v2 is enabled.
4. Skips test_sanity which creates a Hive db and issues a invalidate
metadata to make it visible in catalog. Unfortunately, in catalog-v2
currently there is no way to see a newly created database when event
polling is disabled.
5. Similar to above (4) test_metadata_query_statements.py creates a hive
db and issues a invalidate metadata. The test runs QueryTest/describe-db
which is split into two one for checking the hive-db and other contains
rest of the queries of the original describe-db. The split makes it
possible to only execute the test partially when catalog-v2 is enabled

Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d
Reviewed-on: http://gerrit.cloudera.org:8080/13933
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-08-07 01:41:15 +00:00
Tim Armstrong
ab908d54c2 IMPALA-8605: clean up HS2/beeswax session management
Session/operation secrets are part of the HS2 handle
but we haven't made use of them up until now. This
patch checks the value and treats it as part of the
session key, as originally intended. I.e. if the
secret is missing, the session lookup fails.

The operation secret is the same as the session secret
to save having to generate and store extra secrets
(there's no real benefit).

A secret is added to each Beeswax session. This secret is
internal to the server and not exposed. Adds validation
that client requests accessed via Beeswax belong to the
same user as the session.

We switch uuid_generator_ to use boost::random_device, which
uses /dev/urandom as its source of randomness to be more robust -
otherwise it's hard to be sure that we won't have collisions,
although it doesn't seem to be a problem in practice.

For requests - GetRuntimeProfile() and GetExecSummary()
that provide both a session and query ID, the code already
checks that the session's user matches the query.

An exception to the validation mechanisms above is added
for Close() and Cancel() beeswax operations, because impala-shell
and some administrative tools allow cancellation of
queries on different threads and from different tools.

We skip validating the session secret when cancelling queries
from the web UI, since web UI users don't have the secret.

Testing:
* Ran exhaustive tests.
* Add tests for all HS2 RPCs that provide invalid session secrets.
* Add tests for HS2 RPCs that provide both session and query
  ID to ensure that query belongs to the session.
* Add basic test for beeswax testing accessing a query from
  different connections.

Change-Id: I4c014d1a32e273275a773f842b9ed9793dbdba6b
Reviewed-on: http://gerrit.cloudera.org:8080/13585
Reviewed-by: Lars Volker <lv@cloudera.com>
Reviewed-by: Thomas Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-11 22:38:24 +00:00
Tim Armstrong
f46de21140 IMPALA-1760: Implement shutdown command
This is the same patch except with fixes for the test failures
on EC and S3 noted in the JIRA.

This allows graceful shutdown of executors and partially graceful
shutdown of coordinators (new operations fail, old operations can
continue).

Details:
* In order to allow future admin commands, this is implemented with
  function-like syntax and does not add any reserved words.
* ALL privilege is required on the server
* The coordinator impalad that the client is connected to can be shut
  down directly with ":shutdown()".
* Remote shutdown of another impalad is supported, e.g. with
  ":shutdown('hostname')", so that non-coordinators can be shut down
  and for the convenience of the client, which does not have to
  connect to the specific impalad. There is no assumption that the
  other impalad is registered in the statestore; just that the
  coordinator can connect to the other daemon's thrift endpoint.
  This simplifies things and allows shutdown in various important
  cases, e.g. statestore down.
* The shutdown time limit can be overridden to force a quicker or
  slower shutdown by specifying a deadline in seconds after the
  statement is executed.
* If shutting down, a banner is shown on the root debug page.

Workflow:
1. (if a coordinator) clients are prevented from submitting
  queries to this coordinator via some out-of-band mechanism,
  e.g. load balancer
2. the shutdown process is started via ":shutdown()"
3. a bit is set in the statestore and propagated to coordinators,
  which stop scheduling fragment instances on this daemon
  (if an executor).
4. the query startup grace period (which is ideally set to the AC
  queueing delay plus some additional leeway) expires
5. once the daemon is quiesced (i.e. no fragments, no registered
  queries), it shuts itself down.
6. If the daemon does not successfully quiesce (e.g. rogue clients,
  long-running queries), after a longer timeout (counted from the start
  of the shutdown process) it will shut down anyway.

What this does:
* Executors can be shut down without causing a service-wide outage
* Shutting down an executor will not disrupt any short-running queries
  and will wait for long-running queries up to a threshold.
* Coordinators can be shut down without query failures only if
  there is an out-of-band mechanism to prevent submission of more
  queries to the shut down coordinator. If queries are submitted to
  a coordinator after shutdown has started, they will fail.
* Long running queries or other issues (e.g. stuck fragments) will
  slow down but not prevent eventual shutdown.

Limitations:
* The startup grace period needs to be configured to be greater than
  the latency of statestore updates + scheduling + admission +
  coordinator startup. Otherwise a coordinator may send a
  fragment instance to the shutting down impalad. (We could
  automate this configuration as a follow-on)
* The startup grace period means a minimum latency for shutdown,
  even if the cluster is idle.
* We depend on the statestore detecting the process going down
  if queries are still running on that backend when the timeout
  expires. This may still be subject to existing problems,
  e.g. IMPALA-2990.

Tests:
* Added parser, analysis and authorization tests.
* End-to-end test of shutting down impalads.
* End-to-end test of shutting down then restarting an executor while
  queries are running.
* End-to-end test of shutting down a coordinator
  - New queries cannot be started on coord, existing queries continue to run
  - Exercises various Beeswax and HS2 operations.

Change-Id: I8f3679ef442745a60a0ab97c4e9eac437aef9463
Reviewed-on: http://gerrit.cloudera.org:8080/11484
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-26 01:28:36 +00:00
Tim Armstrong
16a04ce81b Revert "IMPALA-1760: Implement shutdown command"
This reverts commit fda44aed9d.

A couple of the tests broken on S3 and erasure coding. Reverting
to unblock testing until we can come up with a proper fix.

Change-Id: Icef47b3aa67bc056c40592d47e93c4ebc57be98c
Reviewed-on: http://gerrit.cloudera.org:8080/11435
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2018-09-14 01:12:22 +00:00
Tim Armstrong
fda44aed9d IMPALA-1760: Implement shutdown command
This allows graceful shutdown of executors and partially graceful
shutdown of coordinators (new operations fail, old operations can
continue).

Details:
* In order to allow future admin commands, this is implemented with
  function-like syntax and does not add any reserved words.
* ALL privilege is required on the server
* The coordinator impalad that the client is connected to can be shut
  down directly with ":shutdown()".
* Remote shutdown of another impalad is supported, e.g. with
  ":shutdown('hostname')", so that non-coordinators can be shut down
  and for the convenience of the client, which does not have to
  connect to the specific impalad. There is no assumption that the
  other impalad is registered in the statestore; just that the
  coordinator can connect to the other daemon's thrift endpoint.
  This simplifies things and allows shutdown in various important
  cases, e.g. statestore down.
* The shutdown time limit can be overridden to force a quicker or
  slower shutdown by specifying a deadline in seconds after the
  statement is executed.
* If shutting down, a banner is shown on the root debug page.

Workflow:
1. (if a coordinator) clients are prevented from submitting
  queries to this coordinator via some out-of-band mechanism,
  e.g. load balancer
2. the shutdown process is started via ":shutdown()"
3. a bit is set in the statestore and propagated to coordinators,
  which stop scheduling fragment instances on this daemon
  (if an executor).
4. the query startup grace period (which is ideally set to the AC
  queueing delay plus some additional leeway) expires
5. once the daemon is quiesced (i.e. no fragments, no registered
  queries), it shuts itself down.
6. If the daemon does not successfully quiesce (e.g. rogue clients,
  long-running queries), after a longer timeout (counted from the start
  of the shutdown process) it will shut down anyway.

What this does:
* Executors can be shut down without causing a service-wide outage
* Shutting down an executor will not disrupt any short-running queries
  and will wait for long-running queries up to a threshold.
* Coordinators can be shut down without query failures only if
  there is an out-of-band mechanism to prevent submission of more
  queries to the shut down coordinator. If queries are submitted to
  a coordinator after shutdown has started, they will fail.
* Long running queries or other issues (e.g. stuck fragments) will
  slow down but not prevent eventual shutdown.

Limitations:
* The startup grace period needs to be configured to be greater than
  the latency of statestore updates + scheduling + admission +
  coordinator startup. Otherwise a coordinator may send a
  fragment instance to the shutting down impalad. (We could
  automate this configuration as a follow-on)
* The startup grace period means a minimum latency for shutdown,
  even if the cluster is idle.
* We depend on the statestore detecting the process going down
  if queries are still running on that backend when the timeout
  expires. This may still be subject to existing problems,
  e.g. IMPALA-2990.

Tests:
* Added parser, analysis and authorization tests.
* End-to-end test of shutting down impalads.
* End-to-end test of shutting down then restarting an executor while
  queries are running.
* End-to-end test of shutting down a coordinator
  - New queries cannot be started on coord, existing queries continue to run
  - Exercises various Beeswax and HS2 operations.

Change-Id: I4d5606ccfec84db4482c1e7f0f198103aad141a0
Reviewed-on: http://gerrit.cloudera.org:8080/10744
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-11 23:57:20 +00:00
Bikramjeet Vig
2de9db8fc6 IMPALA-5216: Make admission control queuing async
Implement asynchronous admission control queuing. This is achieved by
running the admission control code-path in a separate thread. Major
changes include: propagating cancellation to the admission control
thread and dequeuing thread, and adding a new Query Operation State
called "PENDING" that represents the state between completion of
planning and starting of query execution.

Testing:
- Added a deterministic end to end test and a session expiry test.
- Ran multiple stress tests successfully with a cancellation probability
of 60% and with different values for the following parameters:
max_requests, queue_wait_timeout_ms. Ensured that the impalad was in a
valid state afterwards (no orphan fragments or wrong metrics).
- Ran all exhaustive tests and ASAN core tests successfully.
- Ran data load successfully.

Change-Id: I989cf5b259afb8f5bc5c35590c94961c81ce88bf
Reviewed-on: http://gerrit.cloudera.org:8080/10060
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-13 15:48:17 +00:00
Zoltan Borok-Nagy
ce65b43d47 IMPALA-2248: Make idle_session_timeout a query option
This commit makes idle_session_timeout a query option.

idle_session_timeout currently can be set as a command line
option, which will be the default timeout for sessions.
HS2 sessions can override it with a smaller value by setting
it in the configuration overlay of HS2 OpenSession().

However, we can't override idle_session_timeout for JDBC/ODBC
connections, because we cannot put this in the connection string.

This commit is a workaround for this problem, it allows JDBC/ODBC
connections to set the session timeout as a query option
with the SET statement.

After this commit, the session timeout can be overridden to
any value, i.e. the command line flag idle_session_timeout
doesn't limit this option anymore.

I created an automated test case in JdbcTest.java based on
test_hs2.py::test_concurrent_session_mixed_idle_timeout. I also
extended the test_session_expiration and test_set_and_unset
test suites.

Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e
Reviewed-on: http://gerrit.cloudera.org:8080/8490
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-06 01:47:47 +00:00
Philip Zeyliger
c9740b43d1 IMPALA-5908: Allow SET to unset modified query options.
The query 'SET <option>=""' will now unset an option within the session,
reverting it to its default state.

This change became necessary when "SET" started returning an empty
string for unset options which don't have a default. The test
infrastructure (impala_test_suite.py) resets options to what it thinks
is its defaults, and, when this broke, some ASAN builds started to fail,
presumably due to a timing issue with how we re-use connections between
tests.

Previously, SessionState copied over the default options from the server
when the session was created and then mutated that. To support unsetting
options at the session layer, this change keeps a pointer to the default
server settings, keeps separately the mutations, and overlays the
options each time they're requested. Similarly, for configuration
overlays that happen per-query, the overlay is now done explicitly,
because empty per-query overlay values (key=..., value="") now have no effect.

Because "set key=''" is ambiguous between "set to the empty string" and
"unset", it's now impossible to set to the empty string, at the session
layer, an option that is configured at a previous layer. In practice,
this is just debug_action and request_pool. debug_action is essentially
an internal tool. For request_pool, this means that setting the default
request_pool via impalad command line is now a bad idea, as it can't
be cleared at a per-session level. For request_pool, the correct
course of action for users is to use placement rules, and to have a
default placement rule.

Testing:
* Added a simple test that triggered this side-effect without this code.
  Specifically, "impala-python infra/python/env/bin/py.test tests/metadata/test_set.py -s"
  with the modified set.test triggers.
* Amended tests/custom_cluster/test_admission_controller.py; it was
  useful for testing these code paths.
* Added cases to query-options-test to check behavior for both
  defaulted and non-defaulted values.
* Added a custom cluster test that checks that overlays are
  working against
* Ran an ASAN build where this was triggering previously.

Change-Id: Ia8c383e68064f839cb5000118901dff77b4e5cb9
Reviewed-on: http://gerrit.cloudera.org:8080/8070
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-10-05 03:04:38 +00:00
Philip Zeyliger
eb11b46be6 Re-apply: IMPALA-5589: change "set" in impala-shell to show empty string for unset query options
(Re-applies reverted commit 387bde0639.
The commit broke ASAN tests due to a race in how test infrastructure
re-uses connections. The fix for that is in an adjacent commit.)

When converting TQueryOptions to a map<string,string>, we now convert
unset options to the empty string. Within TQueryOptions we have optional
options (like mt_dop or compression_codec) with no default specified. In
this case, the user was seeing 0 for numeric types and the first enum
option for enumeration types (e.g., "NONE" in the compression case).
This was confusing as the implementation handles this "null" case
differently (e.g., using SNAPPY as the default codec in the case
reported in the JIRA).

When running "set" in impala-shell, the difference is as
follows:

    -        BUFFER_POOL_LIMIT: [0]
    +        BUFFER_POOL_LIMIT: []
    -        COMPRESSION_CODEC: [NONE]
    +        COMPRESSION_CODEC: []
    -        MT_DOP: [0]
    +        MT_DOP: []
    -        RESERVATION_REQUEST_TIMEOUT: [0]
    +        RESERVATION_REQUEST_TIMEOUT: []
    -        SEQ_COMPRESSION_MODE: [0]
    +        SEQ_COMPRESSION_MODE: []
    -        V_CPU_CORES: [0]
    +        V_CPU_CORES: []

Obviously, the empty string is a valid value for a string-typed option, where
it will be impossible to tell the difference between "unset" and "set to empty
string." Today, there are two string-typed options: debug_string defaults to ""
and request_pool has no default. An alternative would have been to use
a special token like "_unset" or to introduce a new field in the beeswax
Thrift ConfigVariable struct. I think the empty string approach
is clearest.

The other users of this state, which I believe are HiveServer2's OpenSession()
call and HiveServer2's response to a "SET" query are affected. They
benefit from the same fix, and a new test has been added to test_hs2.py.

I did a mild refactoring in the HS2 tests to write a helper method
for the very common pattern of excecuting a query.

Testing:
* Manual testing with impala-shell
* Modified impala-shell tests to check this explicitly for one case.
* Modified HS2 test to check this as well as the SET k=v statement,
  which looked otherwise untested.

Change-Id: I29f5d8ab874cb1338077f16019a9537766cac0c4
Reviewed-on: http://gerrit.cloudera.org:8080/8096
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
2017-10-03 01:11:50 +00:00
Philip Zeyliger
f0e79314fe Revert "IMPALA-5589: change "set" in impala-shell to show empty string for unset query options"
Due to re-use of connections in the test infrastructure, this commit
is causing ASAN failures in the builds. That is being worked out
as part of IMPALA-5908, but, in the meanwhile, it's prudent
to revert.

This reverts commit 387bde0639.

Change-Id: I41bc8ab0f1df45bbd311030981a7c18989c2edc8
Reviewed-on: http://gerrit.cloudera.org:8080/8087
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
2017-09-16 04:06:53 +00:00
Philip Zeyliger
387bde0639 IMPALA-5589: change "set" in impala-shell to show empty string for unset query options
When converting TQueryOptions to a map<string,string>, we now convert
unset options to the empty string. Within TQueryOptions we have optional
options (like mt_dop or compression_codec) with no default specified. In
this case, the user was seeing 0 for numeric types and the first enum
option for enumeration types (e.g., "NONE" in the compression case).
This was confusing as the implementation handles this "null" case
differently (e.g., using SNAPPY as the default codec in the case
reported in the JIRA).

When running "set" in impala-shell, the difference is as
follows:

    -        BUFFER_POOL_LIMIT: [0]
    +        BUFFER_POOL_LIMIT: []
    -        COMPRESSION_CODEC: [NONE]
    +        COMPRESSION_CODEC: []
    -        MT_DOP: [0]
    +        MT_DOP: []
    -        RESERVATION_REQUEST_TIMEOUT: [0]
    +        RESERVATION_REQUEST_TIMEOUT: []
    -        SEQ_COMPRESSION_MODE: [0]
    +        SEQ_COMPRESSION_MODE: []
    -        V_CPU_CORES: [0]
    +        V_CPU_CORES: []

Obviously, the empty string is a valid value for a string-typed option, where
it will be impossible to tell the difference between "unset" and "set to empty
string." Today, there are two string-typed options: debug_string defaults to ""
and request_pool has no default. An alternative would have been to use
a special token like "_unset" or to introduce a new field in the beeswax
Thrift ConfigVariable struct. I think the empty string approach
is clearest.

The other users of this state, which I believe are HiveServer2's OpenSession()
call and HiveServer2's response to a "SET" query are affected. They
benefit from the same fix, and a new test has been added to test_hs2.py.

I did a mild refactoring in the HS2 tests to write a helper method
for the very common pattern of excecuting a query.

Testing:
* Manual testing with impala-shell
* Modified impala-shell tests to check this explicitly for one case.
* Modified HS2 test to check this as well as the SET k=v statement,
  which looked otherwise untested.

Change-Id: I86bc06a58d67b099da911293202dae9e844c439b
Reviewed-on: http://gerrit.cloudera.org:8080/7886
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
2017-09-06 19:43:57 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
   on the website.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files_txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Thomas Tauber-Marshall
5231301084 IMPALA-1633: GetOperationStatus should set errorMessage and sqlState
Currently, we never populate the errorMessage or sqlState
fields of TGetOperationStatusResp when the GetOperationStatus
HiveServer2 rpc is called. This patch checks if the query has
an error status and if so sets errorMessage and sqlState.

GetOperationStatus also now takes the QueryExecState lock since
QueryExecState::query_state_ and QueryExecState::query_status_
are supposed to be protected by it.

Additionally, this patch performs some cleanup and adds some
documentation around our behavior for updating
QueryExecState::query_state_/query_status_.

This also addresses IMPALA-3298: TGetOperationStatusResp missing
error message when data is expired

Change-Id: Icb792f88286779fcf2ce409828de818bc4e80bed
Reviewed-on: http://gerrit.cloudera.org:8080/3094
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Internal Jenkins
2016-06-01 19:32:39 -07:00
Casey Ching
074e5b4349 Remove hashbang from non-script python files
Many python files had a hashbang and the executable bit set though
they were not intended to be run a standalone script. That makes
determining which python files are actually scripts very difficult.
A future patch will update the hashbang in real python scripts so they
use $IMPALA_HOME/bin/impala-python.

Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba
Reviewed-on: http://gerrit.cloudera.org:8080/599
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2015-08-04 05:26:07 +00:00
Henry Robinson
79913b01e6 IMPALA-2064: Add effective_user() builtin
The user() builtin always returns the connecteduser. However, if the
client wants to see which user its queries are actually delegated to,
there was no easy way to do that.

This patch adds effective_user(), which returns the proxy delegated user
for authorization purposes. If no delegated user is set, the effective
user is the same as that returned from user().

The only way to test this is via a new custom cluster test, which sets
impala.doas.user so that the effective user might be different from the
connected one.

Change-Id: I7048c27c6808a6986dbe1246929816176dca9f76
Reviewed-on: http://gerrit.cloudera.org:8080/458
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
2015-06-16 23:42:40 +00:00
Henry Robinson
66295a8554 IMPALA-1264: Re-enable test_fetch_first
Since Impala may sometimes return fewer rows than asked for, it's not
safe to test for an exact size response from a single fetch() call
except in very particular cases (when either 0 or 1 rows are
expected). The HS2 fetch_first tests relied on the full request always
being honoured in a couple of places, and as a result are prone to
occasional failures due to 'underflow'. This patch changes the fetch()
call to use fetch_until() in all places where underflow could happen,
and removes the xfail restriction from those V1 and V6 tests.

Change-Id: Ia62f3624947530d516a87f84e706e305048b916f
Reviewed-on: http://gerrit.cloudera.org:8080/192
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2015-03-11 16:39:39 -07:00
Henry Robinson
c850e6c65c IMPALA-1453: Fix many bugs with HS2 FETCH_FIRST
This patch fixes a number of bugs with HS2 columnar result sets,
particularly around the computation of their size in memory. It also
fixes test_fetch_first.py to be robust to the possibility of a single
fetch not returning all expected rows (IMPALA-1264).

In order to properly account for the columnar result set's size in
memory, we need to cope with the fact that the size of two result sets
combined might be larger than a result set which the result of appending
one to the other. However, we estimate the expected size of the result
cache as its current size plus the size of the result set to be
added (we do this so that we don't perform the copy, then find out it
used too much memory). This patch adds logic to
QueryExecState::FetchRowsInternal() to correct the true memory usage
after the copy, in the case where memory was saved.

(This happens for example when we stitch together two null columns and
wind up saving a byte).

Change-Id: Ie172840ea0ff40e43370825ca8f671a86dde4f22
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5199
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
2014-11-21 16:40:41 -08:00
Henry Robinson
6bc411c890 Add support for HS2 protocol V6
This patch adds support for V6 of the HS2 protocol, which notably
includes columnar organisation of result sets. Clients that set their
protocol version to < V6 will receive result sets in the traditional row
orientation.

The performance of fetches over HS2 goes up significantly as a result,
since the V1 protocol had some pathologies in its deserialisation
performance.

 Beeswax
  Row materialisation: 455ms, client processing time: 523ms
 HS2 V6:
  Row materialisation: 444ms, client processing time: 1.8s
 HS2 V1:
  Row materialisation: 585ms, client processing time: 15.9s (!)

TODO: Add support for the CHAR datatype

The following patch is also included:

Fix wait-for-hiveserver2.py when Impala moves to HS2 V6

Due to HIVE-6050, older versions of Hive are not compatible with newer
clients (even those that try to use old protocol
versions). wait-for-hiveserver2.py uses HS2 to talk to the HiveServer2
service, but picks up the newer version from V6, and fails.

This patch temporarily re-adds cli_service.thrift (renaming the Thrift
service as LegacyTCLIService) only for wait-for-hiveserver2.py to
use. As soon as Impala's thirdparty Hive moves to HS2 V6, we can get rid
of this change.

Change-Id: I2cbe884345ae7e772620b80a29b6574bd6532940
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4402
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-09-18 20:17:18 -07:00
Henry Robinson
466cedb217 IMPALA-1162: Add GetExecSummary() and GetRuntimeProfile() to HS2
This patch adds RPCs for GetExecSummary() and GetRuntimeProfile() to the
Impala HS2 service, to get reporting parity with the Beeswax service
implementation.

Change-Id: I966a85854e5a657a26c226cc0e68e053fb2aa6c2
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3902
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3944
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-08-20 01:50:18 -07:00
Nong Li
8994f388db Update hs2 client API to the version hive uses in 5.1.
The driving motivation is to be able to return precision/scale for
decimal types.

Change-Id: I1b49c5a61b59a292bc09612c7945191078a22bf8
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3772
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-08-05 16:09:39 -07:00
Srinath Shankar
d193a1e8a5 IMPALA-963: Impala crash in ClearResultCache()
The issue is that Impala crashes in ClearResultCache() with result caching on
for parallel inserts. The reason is that the ClearResltCache() accesses the
coordinator RuntimeState to update the query mem tracker. However, for there is
no coordinator fragment (or RuntimeState) for parallel inserts.
The fix is to intiialize a query mem tracker to track memory usage in the coordinator
instance even if there is no coordinator fragment.

Change-Id: I3a2ef14860f683910c29ae19b931202ca6867b9f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2501
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
2014-05-19 12:40:12 -07:00
Alex Behm
6b769d011d Adds limited support for the FETCH_FIRST fetch orientation in HS2 client requests.
Adds a bounded query-result cache that clients can enable by setting
an 'impala.resultset.cache.size'  option in the HS2 confOverlay mapof the HS2 exec request.
Impala permits FETCH_FIRST for a particular stmt iff result caching is enabled.
FETCH_FIRST will succeed as long all previously fetched rows fit into the bounded
result cache. Regardless of whether a FETCH_FIRST succeeds or not, clients may
always resume fetching with FETCH_NEXT.

The FETCH_FIRST feature is intended to allow HUE users to export an entire
result set (to Excel, CSV, etc.) after browsing through a few pages of results,
without having ro re-run the query from scratch.

Change-Id: I71ab4794ddef30842594c5e1f7bc94724d6ce89f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1356
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1406
2014-01-30 14:58:46 -08:00