9 Commits

Riza Suminto
4617c2370f IMPALA-13908: Remove reference to ImpalaBeeswaxException
This patch replaces ImpalaBeeswaxException references with
IMPALA_CONNECTION_EXCEPTION as much as possible.
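
For illustration, a minimal sketch of the migrated test pattern (the
module path and test body are assumptions for this sketch, not taken
from the patch):

  import pytest
  from tests.common.impala_connection import IMPALA_CONNECTION_EXCEPTION

  def test_bad_query(client):
      # Previously: pytest.raises(ImpalaBeeswaxException), which tied the
      # test to the Beeswax protocol. The protocol-agnostic alias works
      # for both Beeswax and HS2 connections.
      with pytest.raises(IMPALA_CONNECTION_EXCEPTION):
          client.execute("select * from nonexistent_table")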

Fix some easy flake8 issues caught through this command:
git show HEAD --name-only | grep '^tests.*py' \
  | xargs -I {} impala-flake8 {} \
  | grep -e U100 -e E111 -e E301 -e E302 -e E303 -e F...

Testing:
- Pass exhaustive tests.

Change-Id: I676a9954404613a1cc35ebbc9ffa73e8132f436a
Reviewed-on: http://gerrit.cloudera.org:8080/22701
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-30 00:15:43 +00:00
Joe McDonnell
82bd087fb1 IMPALA-11973: Add absolute_import, division to all eligible Python files
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
 1. Python 3 requires absolute imports within packages. This
    can be emulated via "from __future__ import absolute_import"
 2. Python 3 changed division to "true" division that doesn't
    round to an integer. This can be emulated via
    "from __future__ import division"

This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.

I scrutinized each old-division location and converted some locations
to use the integer division '//' operator where an integer result was
needed (e.g. for indices, counts of records, etc). Some code was also
using relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
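
As a small illustrative sketch of the combined pattern (the values are
made up for the example):

  from __future__ import absolute_import, division, print_function

  # With "true" division, / yields a float on Python 2 and 3 alike, so
  # code that needs an integer (indices, record counts) now uses //.
  num_rows = 10
  half = num_rows // 2    # 5 (int) on both Python versions
  ratio = num_rows / 3    # 3.333... (float) on both Python versions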

Testing:
 - Ran core tests

Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Tim Armstrong
2ca7f8e7c0 IMPALA-7995: part 1: fixes for e2e dockerised impala tests
This fixes all core e2e tests running on my local dockerised
minicluster build. I do not yet have a CI job or script running
but I wanted to get feedback on these changes sooner. The second
part of the change will include the CI script and any follow-on
fixes required for the exhaustive tests.

The following fixes were required:
* Detect docker_network from TEST_START_CLUSTER_ARGS (see the sketch
  after this list).
* get_webserver_port() no longer depends on the caller passing in
  the default webserver port. It previously failed because it
  relied on start-impala-cluster.py setting -webserver_port
  for *all* processes.
* Add SkipIf markers for tests that don't make sense or are
  non-trivial to fix for containerised Impala.
* Support loading Impala-lzo plugin from host for tests that depend on
  it.
* Fix some tests that had 'localhost' hardcoded - they should instead
  use $INTERNAL_LISTEN_HOST, which defaults to localhost.
* Fix bug with sorting impala daemons by backend port, which is
  the same for all dockerised impalads.
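
As a rough sketch, the docker_network detection from the first bullet
could look like the following (function name and parsing details are
assumptions, not the actual test-infra code):

  import os
  import shlex

  def detect_docker_network():
      """Extract --docker_network from TEST_START_CLUSTER_ARGS, if set."""
      args = shlex.split(os.environ.get("TEST_START_CLUSTER_ARGS", ""))
      for arg in args:
          if arg.startswith("--docker_network="):
              return arg.split("=", 1)[1]
      return None  # not running against a dockerised minicluster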

Testing:
I ran tests locally as follows after having set up a docker network and
starting other services:

  ./buildall.sh -noclean -notests -ninja
  ninja -j $IMPALA_BUILD_THREADS docker_images
  export TEST_START_CLUSTER_ARGS="--docker_network=impala-cluster"
  export FE_TEST=false
  export BE_TEST=false
  export JDBC_TEST=false
  export CLUSTER_TEST=false
  ./bin/run-all-tests.sh

Change-Id: Iee86cbd2c4631a014af1e8cef8e1cd523a812755
Reviewed-on: http://gerrit.cloudera.org:8080/12639
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-13 02:42:32 +00:00
Henry Robinson
5bb48ed71d IMPALA-4925: Cancel finstance if query has finished
This patch is a partial fix for the issue where a finst would not
detect that it should cancel if it had not itself hit the query limit.
It changes the UpdateExecStatus() RPC to return a cancelled status to a
finst if the query has finished because it hit a limit.

For certain queries, this allows them to finish much more quickly than
they otherwise would. However, there is still a few-second delay before
the finst picks up the cancellation signal, because the
UpdateExecStatus() RPC is only called every few seconds.

A complete fix would also call CancelInternal() when returned_all_results_
was set to true. That would be a much larger change. The improvement
here is to bound the delay between query completion and fragment
teardown to a few seconds.
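
A simplified Python model of the new behaviour (the real implementation
is C++; this only illustrates the control flow described above):

  class QueryState(object):
      def __init__(self):
          self.returned_all_results = False  # set when a limit is hit
          self.reports = []

      def update_exec_status(self, finst_report):
          # Answer a finst's periodic status report with CANCELLED once
          # the query has finished, so the finst tears itself down at the
          # next report instead of running to its natural completion.
          if self.returned_all_results:
              return "CANCELLED"
          self.reports.append(finst_report)
          return "OK"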

Change-Id: I59f45e64978c9ab9914b5c33e86009960b4a88c4
Reviewed-on: http://gerrit.cloudera.org:8080/5987
Tested-by: Impala Public Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2017-07-19 17:01:01 +00:00
Tim Armstrong
e2cde13a2b IMPALA-4519: increase timeout in TestFragmentLifecycle
Increase the timeout to over 120s to match datastream_sender_timeout_ms.
This should avoid spurious test failures if we are unlucky and a sender
gets stuck waiting for a receiver fragment that will never start.
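
In other words, the test's wait must strictly outlast the sender
timeout (the values here are illustrative):

  DATASTREAM_SENDER_TIMEOUT_S = 120  # datastream_sender_timeout_ms / 1000
  TEST_WAIT_TIMEOUT_S = 125          # must exceed the sender timeout so a
                                     # stuck sender fails before the test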

Testing:
Ran the test in a loop for a while to flush out any flakiness.

Change-Id: I9fe6e6c74538d0747e3eeb578cf0518494ff10c8
Reviewed-on: http://gerrit.cloudera.org:8080/5244
Tested-by: Impala Public Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2016-11-29 01:46:05 +00:00
Tim Armstrong
94ef5b19b9 IMPALA-4087: TestFragmentLifecycle.test_failure_in_prepare
The test previously got the "initial" value of the number of fragments
by reading the metric. This gave an incorrect non-zero result if there
were any leftover queries running on the cluster.

Avoid the problem and simplify the test by explicitly waiting for the
number of fragments to go to zero.
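
The waiting approach amounts to polling the metric until it drops to
zero, roughly like this (the metric name and helper are illustrative,
not the actual test code):

  import time

  def wait_for_zero_fragments(get_metric, timeout_s=60):
      """Poll the in-flight fragments metric until it reaches 0."""
      deadline = time.time() + timeout_s
      while time.time() < deadline:
          if get_metric("impala-server.num-fragments-in-flight") == 0:
              return
          time.sleep(1)
      raise AssertionError("fragments still in flight after %ds" % timeout_s)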

Change-Id: I112e502a25f075928b0f6ef376c7fd9c6376ef4d
Reviewed-on: http://gerrit.cloudera.org:8080/4325
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-09-07 23:33:36 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
   on the website.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Juan Yu
2ab130aa0a IMPALA-3575: Add retry to backend connection request and rpc timeout
This patch adds a configurable timeout for all backend client
RPCs to avoid query hangs.

Prior to this change, Impala did not set socket send/recv timeouts for
backend clients, so an RPC would wait forever for data. In extreme
cases, such as a bad network or a kernel panic on the destination host,
the sender would never get a response and the RPC would hang. Query
hangs are hard to detect. If the hang happened in ExecRemoteFragment()
or CancelPlanFragments(), the query could not be cancelled without
restarting the coordinator.

Added send/recv timeouts to all RPCs to avoid query hangs. For the
catalog client, the default timeout stays at 0 (no timeout) because
ExecDdl() can take a very long time if a table has many partitions,
mostly waiting on HMS API calls.

Added a wrapper, RetryRpcRecv(), to wait longer for the receiver's
response. Certain RPCs need this; for example, with TransmitData() from
DataStreamSender, the receiver may hold the response to apply back
pressure.

If an RPC fails, the connection is left in an unrecoverable state, so
we close the underlying connection instead of putting it back in the
cache. This makes sure a broken connection won't cause more RPC
failures.
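
A simplified Python model of the retry-then-close policy described
above (the real code is C++; conn is a stand-in for the Thrift client):

  def retry_rpc_recv(conn, max_retries):
      """Wait longer than one recv timeout for a deliberately held response."""
      for _ in range(max_retries):
          try:
              return conn.recv()
          except TimeoutError:
              continue  # receiver may be holding the response (back pressure)
          except OSError:
              break     # unrecoverable; fall through to close
      # A failed RPC leaves the connection in an unknown state, so close
      # it rather than returning it to the connection cache.
      conn.close()
      raise RuntimeError("RPC receive failed; connection closed")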

Added a retry for the CancelPlanFragment RPC. This reduces the chance
that a cancel request gets lost on an unstable network, but it can make
cancellation take longer and makes test_lifecycle.py more flaky: the
metric num-fragments-in-flight might not be 0 yet due to previous tests.
Modified the test to check the metric delta instead of comparing to 0
to reduce flakiness, though this might not capture some failures.

Besides the new EE test, I used the following iptables rules to
inject network failures and verify that RPCs never hang:
1. Block network traffic on a port completely:
  iptables -A INPUT -p tcp -m tcp --dport 22002 -j DROP
2. Randomly drop 5% of TCP packets to slow down the network:
  iptables -A INPUT -p tcp -m tcp --dport 22000 -m statistic --mode random --probability 0.05 -j DROP

Change-Id: Id6723cfe58df6217f4a9cdd12facd320cbc24964
Reviewed-on: http://gerrit.cloudera.org:8080/3343
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
2016-07-18 13:29:24 -07:00
Henry Robinson
89c3c61a92 IMPALA-1599: Make Prepare() async for plan fragments
When codegen is enabled, Prepare() for plan fragments can be relatively
expensive (the cost comes mostly from preparing and optimizing the LLVM
bitcode module). However, we call it on the critical path for query
setup, during the ExecRemoteFragment() RPC issued by the
coordinator. This can lead to high startup latencies for queries,
particularly those with many fragments.

This patch moves Prepare() to the fragment exec thread. This allows the
coordinator to start many fragments in quick succession. Doing so
complicates matters regarding cancellation, which originally could not
occur during Prepare() as the coordinator would wait for all RPCs to
finish before issuing any cancellation.

To address the new complexity, this patch loosens the existing contract
for concurrent calls of PlanFragmentExecutor::Prepare() and Cancel().
Previously, Cancel() could not be called before Prepare() had returned.
Now they may be called in any order, including concurrently, and the
two methods coordinate to ensure the correct execution order: Cancel()
always blocks until Prepare() has finished, unless it was called
strictly before Prepare().
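
The contract can be modelled with a started flag and an event (a Python
sketch of the ordering rules, not the actual C++ implementation):

  import threading

  class FragmentExecutor(object):
      def __init__(self):
          self._lock = threading.Lock()
          self._prepare_started = False
          self._prepare_done = threading.Event()
          self._cancelled = False

      def prepare(self):
          with self._lock:
              if self._cancelled:           # Cancel() came strictly first
                  self._prepare_done.set()
                  return "CANCELLED"
              self._prepare_started = True
          # ... expensive codegen/LLVM optimisation runs outside the lock ...
          self._prepare_done.set()
          return "OK"

      def cancel(self):
          with self._lock:
              self._cancelled = True
              if not self._prepare_started:
                  return                    # strictly before Prepare(): no wait
          # Prepare() is running or done: block until it has finished.
          self._prepare_done.wait()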

Making this change allows us to rework the previous logic where
Prepare() always had to be called before a fragment was registered (so
that any concurrent Cancel() calls could not 'find' the fragment). The
order is now register->prepare->exec->unregister. Cancellation may occur
any time after ExecPlanFragment() returns.

Change-Id: Ie39737dc419d7708dd881e68d1035e05d3256d19
Reviewed-on: http://gerrit.cloudera.org:8080/539
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
2016-02-02 20:10:02 +00:00