impala

mirror of https://github.com/apache/impala.git synced 2025-12-25 02:03:09 -05:00

Author	SHA1	Message	Date
Joe McDonnell	8d5adfd0ba	IMPALA-13123: Add option to run tests with Python 3 This introduces the IMPALA_USE_PYTHON3_TESTS environment variable to select whether to run tests using the toolchain Python 3. This is an experimental option, so it defaults to false, continuing to run tests with Python 2. This fixes a first batch of Python 2 vs 3 issues: - Deciding whether to open a file in bytes mode or text mode - Adapting to APIs that operate on bytes in Python 3 (e.g. codecs) - Eliminating 'basestring' and 'unicode' locations in tests/ by using the recommendations from future ( https://python-future.org/compatible_idioms.html#basestring and https://python-future.org/compatible_idioms.html#unicode ) - Uses impala-python3 for bin/start-impala-cluster.py All fixes leave the Python 2 path working normally. Testing: - Ran an exhaustive run with Python 2 to verify nothing broke - Verified that the new environment variable works and that it uses Python 3 from the toolchain when specified Change-Id: I177d9b8eae9b99ba536ca5c598b07208c3887f8c Reviewed-on: http://gerrit.cloudera.org:8080/21474 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2024-12-17 07:28:51 +00:00
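The Python 2 vs 3 issues listed in this commit can be illustrated with a minimal sketch. It assumes Python 3 only (the commit itself keeps the Python 2 path working through the future package's idioms), and the helper names `is_text` and `hex_encode` are hypothetical, not taken from the Impala tree.

```python
import codecs


def is_text(value):
    # Python 3 stand-in for isinstance(value, basestring): there is a single
    # text type, str, and bytes is a separate type.
    return isinstance(value, str)


def hex_encode(text):
    # In Python 3 the "hex" codec operates on bytes, so text is encoded
    # first and the resulting bytes are decoded back to str.
    return codecs.encode(text.encode("utf-8"), "hex").decode("ascii")
```

The same bytes/text distinction applies to files: a file opened with mode "rb" yields bytes, one opened with "r" yields str.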
Xuebin Su	ad868b9947	IMPALA-13115: Add query id to error messages This patch adds the query id to the error messages in both - the result of the `get_log()` RPC, and - the error message in an RPC response before they are returned to the client, so that the users can easily figure out the errored queries on the client side. To achieve this, the query id of the thread debug info is set in the RPC handler method, and is retrieved from the thread debug info each time the error reporting function or `get_log()` gets called. Due to the change of the error message format, some checks in the impala-shell.py are adapted to keep them valid. Testing: - Added helper function `error_msg_expected()` to check whether an error message is expected. It is stricter than only using the `in` operator. - Added helper function `error_msg_equal()` to check if two error messages are equal regardless of the query ids. - Various test cases are adapted to match the new error message format. - `ImpalaBeeswaxException`, which is used in tests only, is simplified so that it has the same error message format as the exceptions for HS2. - Added an assertion to the case of killing and restarting a worker in the custom cluster test to ensure that the query id is in the error message in the client log retrieved with `get_log()`. Change-Id: I67e659681e36162cad1d9684189106f8eedbf092 Reviewed-on: http://gerrit.cloudera.org:8080/21587 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-08-08 14:11:04 +00:00
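A comparison that ignores query ids, as the `error_msg_equal()` helper is described above, might look like the sketch below. The real helper's implementation is not shown in this log; the function names with a `_sketch` suffix and the assumed "Query <hex>:<hex>: " prefix format are illustrative assumptions only.

```python
import re

# ASSUMPTION: the query id appears as a "Query <hex>:<hex>: " prefix.
# The actual message format in Impala may differ.
QUERY_ID_RE = re.compile(r"Query [0-9a-f]+:[0-9a-f]+: ?")


def strip_query_ids(msg):
    # Remove every query id prefix so messages can be compared by content.
    return QUERY_ID_RE.sub("", msg)


def error_msg_equal_sketch(msg1, msg2):
    # Two messages are equal if they match once query ids are removed.
    return strip_query_ids(msg1) == strip_query_ids(msg2)
```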
Joe McDonnell	eb66d00f9f	IMPALA-11974: Fix lazy list operators for Python 3 compatibility Python 3 changes list operators such as range, map, and filter to be lazy. Some code that expects the list operators to happen immediately will fail. e.g. Python 2: range(0,5) == [0,1,2,3,4] True Python 3: range(0,5) == [0,1,2,3,4] False The fix is to wrap locations with list(). i.e. Python 3: list(range(0,5)) == [0,1,2,3,4] True Since the base operators are now lazy, Python 3 also removes the old lazy versions (e.g. xrange, ifilter, izip, etc). This uses future's builtins package to convert the code to the Python 3 behavior (i.e. xrange -> future's builtins.range). Most of the changes were done via these futurize fixes: - libfuturize.fixes.fix_xrange_with_import - lib2to3.fixes.fix_map - lib2to3.fixes.fix_filter This eliminates the pylint warnings: - xrange-builtin - range-builtin-not-iterating - map-builtin-not-iterating - zip-builtin-not-iterating - filter-builtin-not-iterating - reduce-builtin - deprecated-itertools-function Testing: - Ran core job Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f Reviewed-on: http://gerrit.cloudera.org:8080/19589 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
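The laziness described in this commit has a second consequence beyond the failed equality check: a lazy iterator can be consumed only once. A short sketch on Python 3:

```python
# map() returns a one-shot iterator in Python 3.
squares = map(lambda x: x * x, range(3))

first = list(squares)   # consumes the iterator
second = list(squares)  # nothing is left the second time

# range() is a lazy sequence, so it never compares equal to a list.
range_equals_list = range(0, 5) == [0, 1, 2, 3, 4]
wrapped_equals_list = list(range(0, 5)) == [0, 1, 2, 3, 4]
```

Wrapping the call in `list()` once, at the point where a concrete list is needed, avoids both problems.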
Joe McDonnell	82bd087fb1	IMPALA-11973: Add absolute_import, division to all eligible Python files This takes steps to make Python 2 behave like Python 3 as a way to flush out issues with running on Python 3. Specifically, it handles two main differences: 1. Python 3 requires absolute imports within packages. This can be emulated via "from __future__ import absolute_import" 2. Python 3 changed division to "true" division that doesn't round to an integer. This can be emulated via "from __future__ import division" This changes all Python files to add imports for absolute_import and division. For completeness, this also includes print_function in the import. I scrutinized each old-division location and converted some locations to use the integer division '//' operator if it needed an integer result (e.g. for indices, counts of records, etc). Some code was also using relative imports and needed to be adjusted to handle absolute_import. This fixes all Pylint warnings about no-absolute-import and old-division, and these warnings are now banned. Testing: - Ran core tests Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b Reviewed-on: http://gerrit.cloudera.org:8080/19588 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
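The division change in this commit can be shown with a small sketch. The function names are hypothetical; the `__future__` imports are the ones the commit adds, and they are no-ops on Python 3.

```python
from __future__ import absolute_import, division, print_function


def midpoint_index(n):
    # An index must be an integer, so use floor division explicitly.
    return n // 2


def average(values):
    # With "division" imported, "/" is true division on Python 2 as well.
    return sum(values) / len(values)
```

Under Python 2 without the import, `average([1, 2])` would return 1 instead of 1.5, which is the kind of silent difference the commit is trying to flush out.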
Thomas Tauber-Marshall	ca884476bb	IMPALA-9930 (part 2): Introduce new admission control rpc service This patch introduces a new krpc service, AdmissionControlService, which coordinators can use to submit queries for admission. This patch adds some simple configuration flags that make it possible to have coordinators use this service to submit their queries for admission to other coordinators. These flags are only to make this patch testable and will be replaced when the separate admission control daemon is introduced in IMPALA-9975. The interface consists of the following RPCs: - AdmitQuery: takes a TQueryExecRequest and a TQueryOptions (serialized into sidecars), places the request on a queue to be processed by a thread pool and then immediately returns. - GetQueryStatus: takes a query id and returns the current admission status, including the QuerySchedulePB if admission has completed successfully but the query has not been released yet. - ReleaseQueryBackends: called when individual backends complete but the overall query is still running to release resources incrementally. This RPC will be called at most O(log(# backends)) per query due to BackendResourceState, which batches backends to release together. - ReleaseQuery: called when the query has completely finished. Releases all remaining resources. - CancelAdmission: called if a query is cancelled before an admission decision has been made to indicate that it should no longer be considered for admission. The majority of the patch consists of two classes: - AdmissionControlClient: used to abstract whether admission is being performed locally or remotely. In the local case, it is basically just a wrapper around AdmissionController. In the remote case, it handles serializing/deserializing of RPC params, polling GetQueryStatus() until a decision has been made, etc. - AdmissionControlService: exports the RPC interface and acts as a wrapper around AdmissionController. 
Some notable changes involved: - AdmissionController::SubmitForAdmission() no longer blocks while a query is queued. Instead, a new function WaitOnQueued() can be used to monitor the admission status of a queued query. - Adding events to the query timeline is moved out of AdmissionController and into the AdmissionControlClient classes, so that it always happens on the coordinator. - When a cluster is run in the new admission control service mode, only the impalad that is performing admission control exposes the /admission http endpoint. Observability will be cleaned up in a subsequent patch. Testing: - Modified existing admission control tests to run both with and without the admission control service enabled, including both the functional and stress tests. The 'num_queries' param in the stress test is modified to only use a single value to reduce the number of tests that are run and keep the running time reasonable. - Ran tpch10 on a local minicluster and observed no significant regressions. Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae Reviewed-on: http://gerrit.cloudera.org:8080/16412 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-12-03 23:46:29 +00:00
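The remote case described above (submit with AdmitQuery, then poll GetQueryStatus until a decision is made) can be sketched as follows. This is a toy model, not the krpc interface: `FakeAdmissionService`, the status strings, and the `max_polls` limit are all assumptions made for illustration.

```python
import time


class FakeAdmissionService:
    """In-memory stand-in for the service; not the real RPC interface."""

    def __init__(self, polls_until_admitted):
        self._polls = polls_until_admitted
        self._remaining = {}

    def admit_query(self, query_id):
        # Queue the request and return immediately, like AdmitQuery.
        self._remaining[query_id] = self._polls

    def get_query_status(self, query_id):
        # Report QUEUED until the simulated decision has been made.
        if self._remaining[query_id] > 0:
            self._remaining[query_id] -= 1
            return "QUEUED"
        return "ADMITTED"


def submit_and_wait(service, query_id, poll_interval_s=0.01, max_polls=100):
    # Submit, then poll until the status is no longer QUEUED.
    service.admit_query(query_id)
    for _ in range(max_polls):
        status = service.get_query_status(query_id)
        if status != "QUEUED":
            return status
        time.sleep(poll_interval_s)
    return "TIMED_OUT"
```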
Tim Armstrong	748e41ab41	IMPALA-9380: async query unregistration This change improves query latency by doing much of the heavyweight work of unregistering a query asynchronously, instead of synchronously on the RPC thread. The biggest win is to move the profile serialization off the RPC thread. Unregistration processing is done by a thread pool with 4 threads by default. This is configurable by --unregistration_thread_pool_size and --unregistration_thread_pool_queue_depth. This fixes a pre-existing bug where a query was temporarily neither in the in-flight queries nor the completed queries. It would be much easier to hit this with async unregistration because there is less synchronisation on the client side. Now the query is briefly in both maps, but this is handled as follows: * All places that look up both the maps will check the in-flight map first, and return a reference to the ClientRequestState, i.e. ignoring the entry in the query log. * The /queries page does not return completed queries if they were found in the in-flight queries map, so avoids duplicate results. The thread safety story changes slightly. Before this change, only one thread could remove the query from the map and close it, with only one thread "winning" the race to remove the ClientRequestState from the map. Since we leave the query in the map while being finalized, we instead use an atomic in ClientRequestState to ensure that only one thread does the finalization. Some misc cleanup was done as a result of these changes: * Fix a pre-existing TSAN race in RuntimeProfile that was revealed by the new concurrent unregister test. * Consolidate the various unknown query handle errors into an error code so that we consistently return the same string. * "Unregister query" should include flushing audit events. Testing: * Add a test that unregisters a query concurrent with other operations. * Ran exhaustive tests Perf: Ran TPC-H 30 with mt_dop=4. 
No regressions and some improvements:
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 5.38    | -2.67%     | 4.02       | -2.01%         |
+----------+-----------------------+---------+------------+------------+----------------+
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| TPCH(30) | TPCH-Q1  | parquet / none / none | 5.36   | 5.17        | +3.61%     | 1.82%     | 1.17%          | 5     | +3.73%         | 1.73    | 3.65   |
| TPCH(30) | TPCH-Q6  | parquet / none / none | 1.77   | 1.74        | +1.48%     | 2.00%     | 2.50%          | 5     | +2.89%         | 0.87    | 1.03   |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 3.02   | 3.00        | +0.79%     | 2.18%     | 2.21%          | 5     | +1.55%         | 0.00    | 0.57   |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 1.65   | 1.64        | +0.81%     | 1.35%     | 0.03%          | 5     | +0.07%         | 1.15    | 1.34   |
| TPCH(30) | TPCH-Q2  | parquet / none / none | 1.21   | 1.21        | -0.07%     | 2.11%     | 2.14%          | 5     | -0.04%         | -0.29   | -0.05  |
| TPCH(30) | TPCH-Q4  | parquet / none / none | 2.50   | 2.52        | -0.49%     | 2.43%     | 3.34%          | 5     | -0.09%         | -0.29   | -0.27  |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 2.86   | 2.90        | -1.28%     | 2.30%     | 1.24%          | 5     | -0.02%         | -0.58   | -1.11  |
| TPCH(30) | TPCH-Q3  | parquet / none / none | 4.35   | 4.40        | -1.15%     | 1.76%     | 1.78%          | 5     | -1.12%         | -0.87   | -1.03  |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 4.10   | 4.17        | -1.80%     | 1.05%     | 1.31%          | 5     | -1.25%         | -1.73   | -2.40  |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 3.20   | 3.25        | -1.52%     | 0.79%     | 2.56%          | 5     | -1.56%         | -0.58   | -1.26  |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 10.81  | 11.07       | -2.34%     | 5.00%     | 7.01%          | 5     | -1.40%         | -0.58   | -0.61  |
| TPCH(30) | TPCH-Q7  | parquet / none / none | 11.19  | 11.56       | -3.18%     | 3.47%     | 6.02%          | 5     | -0.90%         | -0.87   | -1.03  |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 19.91  | 20.32       | -2.02%     | 0.66%     | 0.47%          | 5     | -2.18%         | -2.31   | -5.64  |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 5.63   | 5.77        | -2.40%     | 1.71%     | 2.01%          | 5     | -1.84%         | -1.73   | -2.05  |
| TPCH(30) | TPCH-Q5  | parquet / none / none | 3.91   | 4.03        | -2.74%     | 1.08%     | 1.86%          | 5     | -2.45%         | -1.44   | -2.88  |
| TPCH(30) | TPCH-Q8  | parquet / none / none | 4.55   | 4.71        | -3.48%     | 1.90%     | 3.53%          | 5     | -2.35%         | -1.44   | -1.96  |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 1.93   | 2.01        | -3.96%     | 0.05%     | 4.05%          | 5     | -2.59%         | -2.31   | -2.19  |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 4.52   | 4.73        | -4.26%     | 1.26%     | 2.43%          | 5     | -3.40%         | -2.02   | -3.51  |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.02   | 1.05        | -3.58%     | 3.94%     | 2.36%          | 5     | -4.56%         | -1.44   | -1.79  |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 9.52   | 10.04       | I -5.24%   | 2.14%     | 0.56%          | 5     | I -4.67%       | -2.31   | -5.57  |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 3.49   | 3.68        | I -5.08%   | 0.07%     | 0.56%          | 5     | I -5.66%       | -2.31   | -20.08 |
| TPCH(30) | TPCH-Q9  | parquet / none / none | 11.92  | 12.71       | I -6.19%   | 0.57%     | 3.15%          | 5     | I -4.99%       | -2.31   | -4.33  |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
Change-Id: I80027b1baeb4ab453938c0f6357b120f4035ba08 
Reviewed-on: http://gerrit.cloudera.org:8080/15821 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-05-05 10:12:42 +00:00
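The "only one thread does the finalization" rule from this commit can be sketched in Python. The real code uses an atomic in the C++ ClientRequestState; here a lock-guarded flag stands in for it, and every name below is hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class QueryState:
    """Toy stand-in for a per-query state object."""

    def __init__(self, query_id):
        self.query_id = query_id
        self._lock = threading.Lock()
        self._finalized = False
        self.finalize_count = 0

    def try_start_finalize(self):
        # Test-and-set under a lock: returns True for exactly one caller.
        with self._lock:
            if self._finalized:
                return False
            self._finalized = True
            return True


def unregister(state):
    # Callers that lose the race return without finalizing.
    if not state.try_start_finalize():
        return False
    state.finalize_count += 1  # stands in for the heavyweight work
    return True


def unregister_concurrently(state, attempts=8, pool_size=4):
    # Issue several unregister calls from a small thread pool.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(lambda _: unregister(state), range(attempts)))
```

However many threads call `unregister`, exactly one result is True and the finalization work runs once.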
stakiar	c47fca5960	IMPALA-8962: FETCH_ROWS_TIMEOUT_MS should apply before rows are available IMPALA-7312 added the query option FETCH_ROWS_TIMEOUT_MS, but it only applies to fetch requests against a query that has already transitioned to the 'FINISHED' state. This patch changes the timeout so that it applies to queries in the 'RUNNING' state as well. Before this patch, fetch requests issued while a query was 'RUNNING' blocked until the query transitioned to the 'FINISHED' state, and then it fetched results and returned them. After this patch, fetch requests against queries in the 'RUNNING' state will block for 'FETCH_ROWS_TIMEOUT_MS' and then return. For HS2 clients, fetch requests that return while a query is 'RUNNING' set their TStatusCode to STILL_EXECUTING_STATUS. For Beeswax clients, fetch requests that return while a query is 'RUNNING' set the 'ready' flag to false. For both clients, hasMoreRows is set to true. If the following sequence of events occurs: * A fetch request is issued and blocks on a 'RUNNING' query * The query transitions to the 'FINISHED' state * The fetch request attempts to read multiple batches Then the time spent waiting for the query to finish is deducted from the timeout used when waiting for rows to be produced by the Coordinator fragment. Fixed a bug in the current usage of FETCH_ROWS_TIMEOUT_MS where the time units for FETCH_ROWS_TIMEOUT_MS and MonotonicStopWatch were not being converted properly. Tests: * Moved existing fetch timeout tests from hs2/test_fetch.py into a new test file hs2/test_fetch_timeout.py. * Added several new tests to hs2/test_fetch_timeout.py to validate that the timeout is applied to 'RUNNING' queries and that the timeout applies across a 'RUNNING' and 'FINISHED' query. * Added new tests to query_test/test_fetch.py to validate the timeout while using the Beeswax protocol. 
Change-Id: I2cba6bf062dcc1af19471d21857caa797c1ea4a4 Reviewed-on: http://gerrit.cloudera.org:8080/14332 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-10-08 19:13:33 +00:00
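The timeout deduction and the unit-conversion fix described in this commit amount to simple arithmetic, sketched below. The function name is hypothetical, and the sketch assumes the stopwatch reports elapsed time in nanoseconds while the query option is in milliseconds.

```python
NANOS_PER_MILLI = 1000 * 1000


def remaining_timeout_ms(fetch_rows_timeout_ms, waited_ns):
    # Convert the stopwatch reading to milliseconds before subtracting,
    # so both operands use the same unit.
    waited_ms = waited_ns // NANOS_PER_MILLI
    # Time already spent waiting for the query to finish is deducted;
    # the remaining budget never goes below zero.
    return max(0, fetch_rows_timeout_ms - waited_ms)
```

For example, with a 1000 ms timeout and 400 ms spent waiting for the query to reach 'FINISHED', 600 ms remain for waiting on row batches.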
Vihang Karajgaonkar	39613c8226	IMPALA-8627: Enable catalog-v2 in tests This patch enables catalog-v2 by default in all the tests. Test fixes: 1. Modified test_observability which fails on catalog-v2 since the profile emits different metadata load events. The test now looks for the right events on the profile depending on whether catalog-v2 is enabled or not. 2. TableName.java constructor allows non-lowercased table and database names. This causes problems at the local catalog cache which expects the tablenames to be always in lowercase. More details on this failure are available in IMPALA-8627. The patch makes sure that the loadTable requests in local catalog do an explicit conversion of tablename to lowercase in order to get around the issue. 3. Fixes the JdbcTest which checks for existence of table comment in the getTables metadata jdbc call. In catalog-v2 since the columns are not requested, LocalTable is not loaded and hence the test needs to be modified to check if catalog-v2 is enabled. 4. Skips test_sanity which creates a Hive db and issues an invalidate metadata to make it visible in catalog. Unfortunately, in catalog-v2 currently there is no way to see a newly created database when event polling is disabled. 5. Similar to above (4) test_metadata_query_statements.py creates a hive db and issues an invalidate metadata. The test runs QueryTest/describe-db which is split into two: one for checking the hive-db, and the other containing the rest of the queries of the original describe-db. The split makes it possible to only execute the test partially when catalog-v2 is enabled. Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d Reviewed-on: http://gerrit.cloudera.org:8080/13933 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-08-07 01:41:15 +00:00
Tim Armstrong	ab908d54c2	IMPALA-8605: clean up HS2/beeswax session management Session/operation secrets are part of the HS2 handle but we haven't made use of them up until now. This patch checks the value and treats it as part of the session key, as originally intended. I.e. if the secret is missing, the session lookup fails. The operation secret is the same as the session secret to save having to generate and store extra secrets (there's no real benefit). A secret is added to each Beeswax session. This secret is internal to the server and not exposed. Adds validation that client requests accessed via Beeswax belong to the same user as the session. We switch uuid_generator_ to use boost::random_device, which uses /dev/urandom as its source of randomness to be more robust - otherwise it's hard to be sure that we won't have collisions, although it doesn't seem to be a problem in practice. For requests - GetRuntimeProfile() and GetExecSummary() that provide both a session and query ID, the code already checks that the session's user matches the query. An exception to the validation mechanisms above is added for Close() and Cancel() beeswax operations, because impala-shell and some administrative tools allow cancellation of queries on different threads and from different tools. We skip validating the session secret when cancelling queries from the web UI, since web UI users don't have the secret. Testing: * Ran exhaustive tests. * Add tests for all HS2 RPCs that provide invalid session secrets. * Add tests for HS2 RPCs that provide both session and query ID to ensure that query belongs to the session. * Add basic test for beeswax testing accessing a query from different connections. 
Change-Id: I4c014d1a32e273275a773f842b9ed9793dbdba6b Reviewed-on: http://gerrit.cloudera.org:8080/13585 Reviewed-by: Lars Volker <lv@cloudera.com> Reviewed-by: Thomas Marshall <tmarshall@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-11 22:38:24 +00:00
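Treating the secret as part of the session key, so that a missing or wrong secret makes the lookup fail, can be sketched as below. `SessionTable` and its methods are hypothetical; the sketch uses `os.urandom` as the randomness source, in the spirit of the switch to /dev/urandom described above.

```python
import hmac
import os
import uuid


class SessionTable:
    """Toy session registry; not the Impala implementation."""

    def __init__(self):
        self._secrets = {}

    def open_session(self):
        # Both the id and the secret come from the OS randomness source.
        session_id = uuid.uuid4().hex
        secret = os.urandom(16)
        self._secrets[session_id] = secret
        return session_id, secret

    def lookup(self, session_id, secret):
        # The lookup fails if the session is unknown or the secret is
        # missing; otherwise compare in constant time.
        expected = self._secrets.get(session_id)
        if expected is None or secret is None:
            return False
        return hmac.compare_digest(expected, secret)
```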
Tim Armstrong	f46de21140	IMPALA-1760: Implement shutdown command This is the same patch except with fixes for the test failures on EC and S3 noted in the JIRA. This allows graceful shutdown of executors and partially graceful shutdown of coordinators (new operations fail, old operations can continue). Details: * In order to allow future admin commands, this is implemented with function-like syntax and does not add any reserved words. * ALL privilege is required on the server * The coordinator impalad that the client is connected to can be shut down directly with ":shutdown()". * Remote shutdown of another impalad is supported, e.g. with ":shutdown('hostname')", so that non-coordinators can be shut down and for the convenience of the client, which does not have to connect to the specific impalad. There is no assumption that the other impalad is registered in the statestore; just that the coordinator can connect to the other daemon's thrift endpoint. This simplifies things and allows shutdown in various important cases, e.g. statestore down. * The shutdown time limit can be overridden to force a quicker or slower shutdown by specifying a deadline in seconds after the statement is executed. * If shutting down, a banner is shown on the root debug page. Workflow: 1. (if a coordinator) clients are prevented from submitting queries to this coordinator via some out-of-band mechanism, e.g. load balancer 2. the shutdown process is started via ":shutdown()" 3. a bit is set in the statestore and propagated to coordinators, which stop scheduling fragment instances on this daemon (if an executor). 4. the query startup grace period (which is ideally set to the AC queueing delay plus some additional leeway) expires 5. once the daemon is quiesced (i.e. no fragments, no registered queries), it shuts itself down. 6. If the daemon does not successfully quiesce (e.g. 
rogue clients, long-running queries), after a longer timeout (counted from the start of the shutdown process) it will shut down anyway. What this does: * Executors can be shut down without causing a service-wide outage * Shutting down an executor will not disrupt any short-running queries and will wait for long-running queries up to a threshold. * Coordinators can be shut down without query failures only if there is an out-of-band mechanism to prevent submission of more queries to the shut down coordinator. If queries are submitted to a coordinator after shutdown has started, they will fail. * Long running queries or other issues (e.g. stuck fragments) will slow down but not prevent eventual shutdown. Limitations: * The startup grace period needs to be configured to be greater than the latency of statestore updates + scheduling + admission + coordinator startup. Otherwise a coordinator may send a fragment instance to the shutting down impalad. (We could automate this configuration as a follow-on) * The startup grace period means a minimum latency for shutdown, even if the cluster is idle. * We depend on the statestore detecting the process going down if queries are still running on that backend when the timeout expires. This may still be subject to existing problems, e.g. IMPALA-2990. Tests: * Added parser, analysis and authorization tests. * End-to-end test of shutting down impalads. * End-to-end test of shutting down then restarting an executor while queries are running. * End-to-end test of shutting down a coordinator - New queries cannot be started on coord, existing queries continue to run - Exercises various Beeswax and HS2 operations. Change-Id: I8f3679ef442745a60a0ab97c4e9eac437aef9463 Reviewed-on: http://gerrit.cloudera.org:8080/11484 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-09-26 01:28:36 +00:00
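Steps 4 to 6 of the workflow above reduce to a small decision rule: wait out the startup grace period, exit once quiesced, and exit unconditionally at the deadline. The sketch below is a simplified model with hypothetical names and parameters, not the daemon's actual logic.

```python
def should_shut_down(elapsed_s, grace_period_s, deadline_s,
                     active_fragments, registered_queries):
    # Step 6: past the deadline the daemon shuts down regardless.
    if elapsed_s >= deadline_s:
        return True
    # Step 4: during the startup grace period new fragments may still
    # arrive, so the daemon keeps running even if it looks idle.
    if elapsed_s < grace_period_s:
        return False
    # Step 5: after the grace period, shut down once quiesced.
    return active_fragments == 0 and registered_queries == 0
```

The second branch is why an idle cluster still has a minimum shutdown latency, as noted under Limitations.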
Tim Armstrong	16a04ce81b	Revert "IMPALA-1760: Implement shutdown command" This reverts commit `fda44aed9d`. A couple of the tests broke on S3 and erasure coding. Reverting to unblock testing until we can come up with a proper fix. Change-Id: Icef47b3aa67bc056c40592d47e93c4ebc57be98c Reviewed-on: http://gerrit.cloudera.org:8080/11435 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2018-09-14 01:12:22 +00:00
Tim Armstrong	fda44aed9d	IMPALA-1760: Implement shutdown command This allows graceful shutdown of executors and partially graceful shutdown of coordinators (new operations fail, old operations can continue). Details: * In order to allow future admin commands, this is implemented with function-like syntax and does not add any reserved words. * ALL privilege is required on the server * The coordinator impalad that the client is connected to can be shut down directly with ":shutdown()". * Remote shutdown of another impalad is supported, e.g. with ":shutdown('hostname')", so that non-coordinators can be shut down and for the convenience of the client, which does not have to connect to the specific impalad. There is no assumption that the other impalad is registered in the statestore; just that the coordinator can connect to the other daemon's thrift endpoint. This simplifies things and allows shutdown in various important cases, e.g. statestore down. * The shutdown time limit can be overridden to force a quicker or slower shutdown by specifying a deadline in seconds after the statement is executed. * If shutting down, a banner is shown on the root debug page. Workflow: 1. (if a coordinator) clients are prevented from submitting queries to this coordinator via some out-of-band mechanism, e.g. load balancer 2. the shutdown process is started via ":shutdown()" 3. a bit is set in the statestore and propagated to coordinators, which stop scheduling fragment instances on this daemon (if an executor). 4. the query startup grace period (which is ideally set to the AC queueing delay plus some additional leeway) expires 5. once the daemon is quiesced (i.e. no fragments, no registered queries), it shuts itself down. 6. If the daemon does not successfully quiesce (e.g. rogue clients, long-running queries), after a longer timeout (counted from the start of the shutdown process) it will shut down anyway. 
What this does: * Executors can be shut down without causing a service-wide outage * Shutting down an executor will not disrupt any short-running queries and will wait for long-running queries up to a threshold. * Coordinators can be shut down without query failures only if there is an out-of-band mechanism to prevent submission of more queries to the shut down coordinator. If queries are submitted to a coordinator after shutdown has started, they will fail. * Long running queries or other issues (e.g. stuck fragments) will slow down but not prevent eventual shutdown. Limitations: * The startup grace period needs to be configured to be greater than the latency of statestore updates + scheduling + admission + coordinator startup. Otherwise a coordinator may send a fragment instance to the shutting down impalad. (We could automate this configuration as a follow-on) * The startup grace period means a minimum latency for shutdown, even if the cluster is idle. * We depend on the statestore detecting the process going down if queries are still running on that backend when the timeout expires. This may still be subject to existing problems, e.g. IMPALA-2990. Tests: * Added parser, analysis and authorization tests. * End-to-end test of shutting down impalads. * End-to-end test of shutting down then restarting an executor while queries are running. * End-to-end test of shutting down a coordinator - New queries cannot be started on coord, existing queries continue to run - Exercises various Beeswax and HS2 operations. Change-Id: I4d5606ccfec84db4482c1e7f0f198103aad141a0 Reviewed-on: http://gerrit.cloudera.org:8080/10744 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-09-11 23:57:20 +00:00
Bikramjeet Vig	2de9db8fc6	IMPALA-5216: Make admission control queuing async Implement asynchronous admission control queuing. This is achieved by running the admission control code-path in a separate thread. Major changes include: propagating cancellation to the admission control thread and dequeuing thread, and adding a new Query Operation State called "PENDING" that represents the state between completion of planning and starting of query execution. Testing: - Added a deterministic end to end test and a session expiry test. - Ran multiple stress tests successfully with a cancellation probability of 60% and with different values for the following parameters: max_requests, queue_wait_timeout_ms. Ensured that the impalad was in a valid state afterwards (no orphan fragments or wrong metrics). - Ran all exhaustive tests and ASAN core tests successfully. - Ran data load successfully. Change-Id: I989cf5b259afb8f5bc5c35590c94961c81ce88bf Reviewed-on: http://gerrit.cloudera.org:8080/10060 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-06-13 15:48:17 +00:00
Zoltan Borok-Nagy	ce65b43d47	IMPALA-2248: Make idle_session_timeout a query option This commit makes idle_session_timeout a query option. idle_session_timeout currently can be set as a command line option, which will be the default timeout for sessions. HS2 sessions can override it with a smaller value by setting it in the configuration overlay of HS2 OpenSession(). However, we can't override idle_session_timeout for JDBC/ODBC connections, because we cannot put this in the connection string. This commit is a workaround for this problem, it allows JDBC/ODBC connections to set the session timeout as a query option with the SET statement. After this commit, the session timeout can be overridden to any value, i.e. the command line flag idle_session_timeout doesn't limit this option anymore. I created an automated test case in JdbcTest.java based on test_hs2.py::test_concurrent_session_mixed_idle_timeout. I also extended the test_session_expiration and test_set_and_unset test suites. Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e Reviewed-on: http://gerrit.cloudera.org:8080/8490 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-06 01:47:47 +00:00
Philip Zeyliger	c9740b43d1	IMPALA-5908: Allow SET to unset modified query options. The query 'SET <option>=""' will now unset an option within the session, reverting it to its default state. This change became necessary when "SET" started returning an empty string for unset options which don't have a default. The test infrastructure (impala_test_suite.py) resets options to what it thinks is its defaults, and, when this broke, some ASAN builds started to fail, presumably due to a timing issue with how we re-use connections between tests. Previously, SessionState copied over the default options from the server when the session was created and then mutated that. To support unsetting options at the session layer, this change keeps a pointer to the default server settings, keeps separately the mutations, and overlays the options each time they're requested. Similarly, for configuration overlays that happen per-query, the overlay is now done explicitly, because empty per-query overlay values (key=..., value="") now have no effect. Because "set key=''" is ambiguous between "set to the empty string" and "unset", it's now impossible to set to the empty string, at the session layer, an option that is configured at a previous layer. In practice, this is just debug_action and request_pool. debug_action is essentially an internal tool. For request_pool, this means that setting the default request_pool via impalad command line is now a bad idea, as it can't be cleared at a per-session level. For request_pool, the correct course of action for users is to use placement rules, and to have a default placement rule. Testing: * Added a simple test that triggered this side-effect without this code. Specifically, "impala-python infra/python/env/bin/py.test tests/metadata/test_set.py -s" with the modified set.test triggers. * Amended tests/custom_cluster/test_admission_controller.py; it was useful for testing these code paths. 
* Added cases to query-options-test to check behavior for both defaulted and non-defaulted values. * Added a custom cluster test that checks that overlays are working against * Ran an ASAN build where this was triggering previously. Change-Id: Ia8c383e68064f839cb5000118901dff77b4e5cb9 Reviewed-on: http://gerrit.cloudera.org:8080/8070 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-10-05 03:04:38 +00:00
Philip Zeyliger	eb11b46be6	Re-apply: IMPALA-5589: change "set" in impala-shell to show empty string for unset query options (Re-applies reverted commit `387bde0639`. The commit broke ASAN tests due to a race in how test infrastructure re-uses connections. The fix for that is in an adjacent commit.) When converting TQueryOptions to a map<string,string>, we now convert unset options to the empty string. Within TQueryOptions we have optional options (like mt_dop or compression_codec) with no default specified. In this case, the user was seeing 0 for numeric types and the first enum option for enumeration types (e.g., "NONE" in the compression case). This was confusing as the implementation handles this "null" case differently (e.g., using SNAPPY as the default codec in the case reported in the JIRA). When running "set" in impala-shell, the difference is as follows: - BUFFER_POOL_LIMIT: [0] + BUFFER_POOL_LIMIT: [] - COMPRESSION_CODEC: [NONE] + COMPRESSION_CODEC: [] - MT_DOP: [0] + MT_DOP: [] - RESERVATION_REQUEST_TIMEOUT: [0] + RESERVATION_REQUEST_TIMEOUT: [] - SEQ_COMPRESSION_MODE: [0] + SEQ_COMPRESSION_MODE: [] - V_CPU_CORES: [0] + V_CPU_CORES: [] Obviously, the empty string is a valid value for a string-typed option, where it will be impossible to tell the difference between "unset" and "set to empty string." Today, there are two string-typed options: debug_string defaults to "" and request_pool has no default. An alternative would have been to use a special token like "_unset" or to introduce a new field in the beeswax Thrift ConfigVariable struct. I think the empty string approach is clearest. The other users of this state, which I believe are HiveServer2's OpenSession() call and HiveServer2's response to a "SET" query, are affected. They benefit from the same fix, and a new test has been added to test_hs2.py. I did a mild refactoring in the HS2 tests to write a helper method for the very common pattern of executing a query. 
Testing: * Manual testing with impala-shell * Modified impala-shell tests to check this explicitly for one case. * Modified HS2 test to check this as well as the SET k=v statement, which looked otherwise untested. Change-Id: I29f5d8ab874cb1338077f16019a9537766cac0c4 Reviewed-on: http://gerrit.cloudera.org:8080/8096 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-10-03 01:11:50 +00:00
Philip Zeyliger	f0e79314fe	Revert "IMPALA-5589: change "set" in impala-shell to show empty string for unset query options" Due to re-use of connections in the test infrastructure, this commit is causing ASAN failures in the builds. That is being worked out as part of IMPALA-5908, but, in the meanwhile, it's prudent to revert. This reverts commit `387bde0639`. Change-Id: I41bc8ab0f1df45bbd311030981a7c18989c2edc8 Reviewed-on: http://gerrit.cloudera.org:8080/8087 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-09-16 04:06:53 +00:00
Philip Zeyliger	387bde0639	IMPALA-5589: change "set" in impala-shell to show empty string for unset query options When converting TQueryOptions to a map<string,string>, we now convert unset options to the empty string. Within TQueryOptions we have optional options (like mt_dop or compression_codec) with no default specified. In this case, the user was seeing 0 for numeric types and the first enum option for enumeration types (e.g., "NONE" in the compression case). This was confusing as the implementation handles this "null" case differently (e.g., using SNAPPY as the default codec in the case reported in the JIRA). When running "set" in impala-shell, the difference is as follows: - BUFFER_POOL_LIMIT: [0] + BUFFER_POOL_LIMIT: [] - COMPRESSION_CODEC: [NONE] + COMPRESSION_CODEC: [] - MT_DOP: [0] + MT_DOP: [] - RESERVATION_REQUEST_TIMEOUT: [0] + RESERVATION_REQUEST_TIMEOUT: [] - SEQ_COMPRESSION_MODE: [0] + SEQ_COMPRESSION_MODE: [] - V_CPU_CORES: [0] + V_CPU_CORES: [] Obviously, the empty string is a valid value for a string-typed option, where it will be impossible to tell the difference between "unset" and "set to empty string." Today, there are two string-typed options: debug_string defaults to "" and request_pool has no default. An alternative would have been to use a special token like "_unset" or to introduce a new field in the beeswax Thrift ConfigVariable struct. I think the empty string approach is clearest. The other users of this state, which I believe are HiveServer2's OpenSession() call and HiveServer2's response to a "SET" query, are affected. They benefit from the same fix, and a new test has been added to test_hs2.py. I did a mild refactoring in the HS2 tests to write a helper method for the very common pattern of executing a query. Testing: * Manual testing with impala-shell * Modified impala-shell tests to check this explicitly for one case. * Modified HS2 test to check this as well as the SET k=v statement, which looked otherwise untested. 
Change-Id: I86bc06a58d67b099da911293202dae9e844c439b Reviewed-on: http://gerrit.cloudera.org:8080/7886 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Impala Public Jenkins	2017-09-06 19:43:57 +00:00
Dan Hecht	ffa7829b70	IMPALA-3918: Remove Cloudera copyrights and add ASF license header For files that have a Cloudera copyright (and no other copyright notice), make changes to follow the ASF source file header policy here: http://www.apache.org/legal/src-headers.html#headers Specifically: 1) Remove the Cloudera copyright. 2) Modify NOTICE.txt according to http://www.apache.org/legal/src-headers.html#notice to follow that format and add a line for Cloudera. 3) Replace or add the existing ASF license text with the one given on the website. Much of this change was automatically generated via: git grep -li 'Copyright.Cloudera' > modified_files.txt cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.Cloudera#i;' cat modified_files.txt | xargs fix_apache_license.py [1] Some manual fixups were performed following those steps, especially when license text was completely missing from the file. [1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor modification to ORIG_LICENSE to match Impala's license text. Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86 Reviewed-on: http://gerrit.cloudera.org:8080/3779 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-08-09 08:19:41 +00:00
Thomas Tauber-Marshall	5231301084	IMPALA-1633: GetOperationStatus should set errorMessage and sqlState Currently, we never populate the errorMessage or sqlState fields of TGetOperationStatusResp when the GetOperationStatus HiveServer2 rpc is called. This patch checks if the query has an error status and if so sets errorMessage and sqlState. GetOperationStatus also now takes the QueryExecState lock since QueryExecState::query_state_ and QueryExecState::query_status_ are supposed to be protected by it. Additionally, this patch performs some cleanup and adds some documentation around our behavior for updating QueryExecState::query_state_/query_status_. This also addresses IMPALA-3298: TGetOperationStatusResp missing error message when data is expired Change-Id: Icb792f88286779fcf2ce409828de818bc4e80bed Reviewed-on: http://gerrit.cloudera.org:8080/3094 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Tested-by: Internal Jenkins	2016-06-01 19:32:39 -07:00
Casey Ching	074e5b4349	Remove hashbang from non-script python files Many python files had a hashbang and the executable bit set though they were not intended to be run as a standalone script. That makes determining which python files are actually scripts very difficult. A future patch will update the hashbang in real python scripts so they use $IMPALA_HOME/bin/impala-python. Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba Reviewed-on: http://gerrit.cloudera.org:8080/599 Reviewed-by: Casey Ching <casey@cloudera.com> Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2015-08-04 05:26:07 +00:00
Henry Robinson	79913b01e6	IMPALA-2064: Add effective_user() builtin The user() builtin always returns the connected user. However, if the client wants to see which user its queries are actually delegated to, there was no easy way to do that. This patch adds effective_user(), which returns the proxy delegated user for authorization purposes. If no delegated user is set, the effective user is the same as that returned from user(). The only way to test this is via a new custom cluster test, which sets impala.doas.user so that the effective user might be different from the connected one. Change-Id: I7048c27c6808a6986dbe1246929816176dca9f76 Reviewed-on: http://gerrit.cloudera.org:8080/458 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Internal Jenkins	2015-06-16 23:42:40 +00:00
Henry Robinson	66295a8554	IMPALA-1264: Re-enable test_fetch_first Since Impala may sometimes return fewer rows than asked for, it's not safe to test for an exact size response from a single fetch() call except in very particular cases (when either 0 or 1 rows are expected). The HS2 fetch_first tests relied on the full request always being honoured in a couple of places, and as a result are prone to occasional failures due to 'underflow'. This patch changes the fetch() call to use fetch_until() in all places where underflow could happen, and removes the xfail restriction from those V1 and V6 tests. Change-Id: Ia62f3624947530d516a87f84e706e305048b916f Reviewed-on: http://gerrit.cloudera.org:8080/192 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2015-03-11 16:39:39 -07:00
Henry Robinson	c850e6c65c	IMPALA-1453: Fix many bugs with HS2 FETCH_FIRST This patch fixes a number of bugs with HS2 columnar result sets, particularly around the computation of their size in memory. It also fixes test_fetch_first.py to be robust to the possibility of a single fetch not returning all expected rows (IMPALA-1264). In order to properly account for the columnar result set's size in memory, we need to cope with the fact that the size of two result sets combined might be larger than a result set which is the result of appending one to the other. However, we estimate the expected size of the result cache as its current size plus the size of the result set to be added (we do this so that we don't perform the copy, then find out it used too much memory). This patch adds logic to QueryExecState::FetchRowsInternal() to correct the true memory usage after the copy, in the case where memory was saved. (This happens for example when we stitch together two null columns and wind up saving a byte). Change-Id: Ie172840ea0ff40e43370825ca8f671a86dde4f22 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5199 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins	2014-11-21 16:40:41 -08:00
Henry Robinson	6bc411c890	Add support for HS2 protocol V6 This patch adds support for V6 of the HS2 protocol, which notably includes columnar organisation of result sets. Clients that set their protocol version to < V6 will receive result sets in the traditional row orientation. The performance of fetches over HS2 goes up significantly as a result, since the V1 protocol had some pathologies in its deserialisation performance. Beeswax: Row materialisation: 455ms, client processing time: 523ms; HS2 V6: Row materialisation: 444ms, client processing time: 1.8s; HS2 V1: Row materialisation: 585ms, client processing time: 15.9s (!). TODO: Add support for the CHAR datatype. The following patch is also included: Fix wait-for-hiveserver2.py when Impala moves to HS2 V6 Due to HIVE-6050, older versions of Hive are not compatible with newer clients (even those that try to use old protocol versions). wait-for-hiveserver2.py uses HS2 to talk to the HiveServer2 service, but picks up the newer version from V6, and fails. This patch temporarily re-adds cli_service.thrift (renaming the Thrift service as LegacyTCLIService) only for wait-for-hiveserver2.py to use. As soon as Impala's thirdparty Hive moves to HS2 V6, we can get rid of this change. Change-Id: I2cbe884345ae7e772620b80a29b6574bd6532940 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4402 Tested-by: jenkins Reviewed-by: Henry Robinson <henry@cloudera.com>	2014-09-18 20:17:18 -07:00
Henry Robinson	466cedb217	IMPALA-1162: Add GetExecSummary() and GetRuntimeProfile() to HS2 This patch adds RPCs for GetExecSummary() and GetRuntimeProfile() to the Impala HS2 service, to get reporting parity with the Beeswax service implementation. Change-Id: I966a85854e5a657a26c226cc0e68e053fb2aa6c2 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3902 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3944 Reviewed-by: Henry Robinson <henry@cloudera.com>	2014-08-20 01:50:18 -07:00
Nong Li	8994f388db	Update hs2 client API to the version hive uses in 5.1. The driving motivation is to be able to return precision/scale for decimal types. Change-Id: I1b49c5a61b59a292bc09612c7945191078a22bf8 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3772 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-08-05 16:09:39 -07:00
Srinath Shankar	d193a1e8a5	IMPALA-963: Impala crash in ClearResultCache() The issue is that Impala crashes in ClearResultCache() with result caching on for parallel inserts. The reason is that ClearResultCache() accesses the coordinator RuntimeState to update the query mem tracker. However, there is no coordinator fragment (or RuntimeState) for parallel inserts. The fix is to initialize a query mem tracker to track memory usage in the coordinator instance even if there is no coordinator fragment. Change-Id: I3a2ef14860f683910c29ae19b931202ca6867b9f Reviewed-on: http://gerrit.ent.cloudera.com:8080/2501 Reviewed-by: Srinath Shankar <sshankar@cloudera.com> Tested-by: jenkins	2014-05-19 12:40:12 -07:00
Alex Behm	6b769d011d	Adds limited support for the FETCH_FIRST fetch orientation in HS2 client requests. Adds a bounded query-result cache that clients can enable by setting an 'impala.resultset.cache.size' option in the HS2 confOverlay map of the HS2 exec request. Impala permits FETCH_FIRST for a particular stmt iff result caching is enabled. FETCH_FIRST will succeed as long as all previously fetched rows fit into the bounded result cache. Regardless of whether a FETCH_FIRST succeeds or not, clients may always resume fetching with FETCH_NEXT. The FETCH_FIRST feature is intended to allow HUE users to export an entire result set (to Excel, CSV, etc.) after browsing through a few pages of results, without having to re-run the query from scratch. Change-Id: I71ab4794ddef30842594c5e1f7bc94724d6ce89f Reviewed-on: http://gerrit.ent.cloudera.com:8080/1356 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1406	2014-01-30 14:58:46 -08:00

29 Commits