9 Commits

Riza Suminto
4617c2370f IMPALA-13908: Remove reference to ImpalaBeeswaxException
This patch replaces ImpalaBeeswaxException references with
IMPALA_CONNECTION_EXCEPTION as much as possible.
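
For illustration, a minimal sketch of the migrated test pattern (the
module path and test body are assumptions for this sketch, not taken
from the patch):

  import pytest
  from tests.common.impala_connection import IMPALA_CONNECTION_EXCEPTION

  def test_bad_query(client):
      # Previously: pytest.raises(ImpalaBeeswaxException), which tied the
      # test to the Beeswax protocol. The protocol-agnostic alias works
      # for both Beeswax and HS2 connections.
      with pytest.raises(IMPALA_CONNECTION_EXCEPTION):
          client.execute("select * from nonexistent_table")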

Fix some easy flake8 issues caught through this command:
git show HEAD --name-only | grep '^tests.*py' \
  | xargs -I {} impala-flake8 {} \
  | grep -e U100 -e E111 -e E301 -e E302 -e E303 -e F...

Testing:
- Pass exhaustive tests.

Change-Id: I676a9954404613a1cc35ebbc9ffa73e8132f436a
Reviewed-on: http://gerrit.cloudera.org:8080/22701
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-30 00:15:43 +00:00
Joe McDonnell
82bd087fb1 IMPALA-11973: Add absolute_import, division to all eligible Python files
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
 1. Python 3 requires absolute imports within packages. This
    can be emulated via "from __future__ import absolute_import"
 2. Python 3 changed division to "true" division that doesn't
    round to an integer. This can be emulated via
    "from __future__ import division"

This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.

I scrutinized each old-division location and converted some locations
to use the integer division '//' operator where an integer result was
needed (e.g. for indices, counts of records, etc). Some code was also
using relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
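
As a small illustrative sketch of the combined pattern (the values are
made up for the example):

  from __future__ import absolute_import, division, print_function

  # With "true" division, / yields a float on Python 2 and 3 alike, so
  # code that needs an integer (indices, record counts) now uses //.
  num_rows = 10
  half = num_rows // 2    # 5 (int) on both Python versions
  ratio = num_rows / 3    # 3.333... (float) on both Python versions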

Testing:
 - Ran core tests

Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Tim Armstrong
2ca7f8e7c0 IMPALA-7995: part 1: fixes for e2e dockerised impala tests
This fixes all core e2e tests running on my local dockerised
minicluster build. I do not yet have a CI job or script running
but I wanted to get feedback on these changes sooner. The second
part of the change will include the CI script and any follow-on
fixes required for the exhaustive tests.

The following fixes were required:
* Detect docker_network from TEST_START_CLUSTER_ARGS (see the sketch
  after this list).
* get_webserver_port() no longer depends on the caller passing in
  the default webserver port. It previously failed because it
  relied on start-impala-cluster.py setting -webserver_port
  for *all* processes.
* Add SkipIf markers for tests that don't make sense or are
  non-trivial to fix for containerised Impala.
* Support loading Impala-lzo plugin from host for tests that depend on
  it.
* Fix some tests that had 'localhost' hardcoded - they should instead
  use $INTERNAL_LISTEN_HOST, which defaults to localhost.
* Fix bug with sorting impala daemons by backend port, which is
  the same for all dockerised impalads.
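
As a rough sketch, the docker_network detection from the first bullet
could look like the following (function name and parsing details are
assumptions, not the actual test-infra code):

  import os
  import shlex

  def detect_docker_network():
      """Extract --docker_network from TEST_START_CLUSTER_ARGS, if set."""
      args = shlex.split(os.environ.get("TEST_START_CLUSTER_ARGS", ""))
      for arg in args:
          if arg.startswith("--docker_network="):
              return arg.split("=", 1)[1]
      return None  # not running against a dockerised minicluster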

Testing:
I ran tests locally as follows after having set up a docker network and
starting other services:

  ./buildall.sh -noclean -notests -ninja
  ninja -j $IMPALA_BUILD_THREADS docker_images
  export TEST_START_CLUSTER_ARGS="--docker_network=impala-cluster"
  export FE_TEST=false
  export BE_TEST=false
  export JDBC_TEST=false
  export CLUSTER_TEST=false
  ./bin/run-all-tests.sh

Change-Id: Iee86cbd2c4631a014af1e8cef8e1cd523a812755
Reviewed-on: http://gerrit.cloudera.org:8080/12639
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-13 02:42:32 +00:00
Henry Robinson
5bb48ed71d IMPALA-4925: Cancel finstance if query has finished
This patch is a partial fix for the issue where a finst would not
detect that it should cancel if it had not itself hit the query limit.
It changes the UpdateExecStatus() RPC to return a cancelled status to a
finst if the query has finished because it hit a limit.

For certain queries, this allows them to finish much more quickly than
they otherwise would. However, there is still a few-second delay before
the finst picks up the cancellation signal, because the
UpdateExecStatus() RPC is only called every few seconds.

A complete fix would also call CancelInternal() when returned_all_results_
was set to true. That would be a much larger change. The improvement
here is to bound the delay between query completion and fragment
teardown to a few seconds.
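
A simplified Python model of the new behaviour (the real implementation
is C++; this only illustrates the control flow described above):

  class QueryState(object):
      def __init__(self):
          self.returned_all_results = False  # set when a limit is hit
          self.reports = []

      def update_exec_status(self, finst_report):
          # Answer a finst's periodic status report with CANCELLED once
          # the query has finished, so the finst tears itself down at the
          # next report instead of running to its natural completion.
          if self.returned_all_results:
              return "CANCELLED"
          self.reports.append(finst_report)
          return "OK"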

Change-Id: I59f45e64978c9ab9914b5c33e86009960b4a88c4
Reviewed-on: http://gerrit.cloudera.org:8080/5987
Tested-by: Impala Public Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2017-07-19 17:01:01 +00:00
Tim Armstrong
e2cde13a2b IMPALA-4519: increase timeout in TestFragmentLifecycle
Increase the timeout to over 120s to match datastream_sender_timeout_ms.
This should avoid spurious test failures if we are unlucky and a sender
gets stuck waiting for a receiver fragment that will never start.
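
In other words, the test's wait must strictly outlast the sender
timeout (the values here are illustrative):

  DATASTREAM_SENDER_TIMEOUT_S = 120  # datastream_sender_timeout_ms / 1000
  TEST_WAIT_TIMEOUT_S = 125          # must exceed the sender timeout so a
                                     # stuck sender fails before the test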

Testing:
Ran the test in a loop for a while to flush out any flakiness.

Change-Id: I9fe6e6c74538d0747e3eeb578cf0518494ff10c8
Reviewed-on: http://gerrit.cloudera.org:8080/5244
Tested-by: Impala Public Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2016-11-29 01:46:05 +00:00
Tim Armstrong
94ef5b19b9 IMPALA-4087: TestFragmentLifecycle.test_failure_in_prepare
The test previously got the "initial" value of the number of fragments
by reading the metric. This gave an incorrect non-zero result if there
were any leftover queries running on the cluster.

Avoid the problem and simplify the test by explicitly waiting for the
number of fragments to go to zero.
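
The waiting approach amounts to polling the metric until it drops to
zero, roughly like this (the metric name and helper are illustrative,
not the actual test code):

  import time

  def wait_for_zero_fragments(get_metric, timeout_s=60):
      """Poll the in-flight fragments metric until it reaches 0."""
      deadline = time.time() + timeout_s
      while time.time() < deadline:
          if get_metric("impala-server.num-fragments-in-flight") == 0:
              return
          time.sleep(1)
      raise AssertionError("fragments still in flight after %ds" % timeout_s)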

Change-Id: I112e502a25f075928b0f6ef376c7fd9c6376ef4d
Reviewed-on: http://gerrit.cloudera.org:8080/4325
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-09-07 23:33:36 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
   on the website.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Juan Yu
2ab130aa0a IMPALA-3575: Add retry to backend connection request and rpc timeout
This patch adds a configurable timeout for all backend client
RPCs to avoid query hangs.

Prior to this change, Impala did not set socket send/recv timeouts for
backend clients, so an RPC would wait forever for data. In extreme
cases, such as a bad network or a kernel panic on the destination host,
the sender would never get a response and the RPC would hang. Query
hangs are hard to detect. If the hang happened in ExecRemoteFragment()
or CancelPlanFragments(), the query could not be cancelled without
restarting the coordinator.

Added send/recv timeouts to all RPCs to avoid query hangs. For the
catalog client, the default timeout stays at 0 (no timeout) because
ExecDdl() can take a very long time if a table has many partitions,
mostly waiting on HMS API calls.

Added a wrapper, RetryRpcRecv(), to wait longer for the receiver's
response. Certain RPCs need this; for example, with TransmitData() from
DataStreamSender, the receiver may hold the response to apply back
pressure.

If an RPC fails, the connection is left in an unrecoverable state, so
we close the underlying connection instead of putting it back in the
cache. This makes sure a broken connection won't cause more RPC
failures.
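
A simplified Python model of the retry-then-close policy described
above (the real code is C++; conn is a stand-in for the Thrift client):

  def retry_rpc_recv(conn, max_retries):
      """Wait longer than one recv timeout for a deliberately held response."""
      for _ in range(max_retries):
          try:
              return conn.recv()
          except TimeoutError:
              continue  # receiver may be holding the response (back pressure)
          except OSError:
              break     # unrecoverable; fall through to close
      # A failed RPC leaves the connection in an unknown state, so close
      # it rather than returning it to the connection cache.
      conn.close()
      raise RuntimeError("RPC receive failed; connection closed")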

Added a retry for the CancelPlanFragment RPC. This reduces the chance
that a cancel request gets lost on an unstable network, but it can make
cancellation take longer and makes test_lifecycle.py more flaky: the
metric num-fragments-in-flight might not be 0 yet due to previous tests.
Modified the test to check the metric delta instead of comparing to 0
to reduce flakiness, though this might not capture some failures.

Besides the new EE test, I used the following iptables rules to
inject network failures and verify that RPCs never hang:
1. Block network traffic on a port completely:
  iptables -A INPUT -p tcp -m tcp --dport 22002 -j DROP
2. Randomly drop 5% of TCP packets to slow down the network:
  iptables -A INPUT -p tcp -m tcp --dport 22000 -m statistic --mode random --probability 0.05 -j DROP

Change-Id: Id6723cfe58df6217f4a9cdd12facd320cbc24964
Reviewed-on: http://gerrit.cloudera.org:8080/3343
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
2016-07-18 13:29:24 -07:00
Henry Robinson
89c3c61a92 IMPALA-1599: Make Prepare() async for plan fragments
When codegen is enabled, Prepare() for plan fragments can be relatively
expensive (the cost comes mostly from preparing and optimizing the LLVM
bitcode module). However, we call it on the critical path for query
setup, during the ExecRemoteFragment() RPC issued by the
coordinator. This can lead to high startup latencies for queries,
particularly those with many fragments.

This patch moves Prepare() to the fragment exec thread. This allows the
coordinator to start many fragments in quick succession. Doing so
complicates matters regarding cancellation, which originally could not
occur during Prepare() as the coordinator would wait for all RPCs to
finish before issuing any cancellation.

To address the new complexity, this patch loosens the existing contract
for concurrent calls of PlanFragmentExecutor::Prepare() and Cancel().
Previously, Cancel() could not be called before Prepare() had returned.
Now they may be called in any order, including concurrently, and the
two methods coordinate to ensure the correct execution order: Cancel()
always blocks until Prepare() has finished, unless it was called
strictly before Prepare().
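
The contract can be modelled with a started flag and an event (a Python
sketch of the ordering rules, not the actual C++ implementation):

  import threading

  class FragmentExecutor(object):
      def __init__(self):
          self._lock = threading.Lock()
          self._prepare_started = False
          self._prepare_done = threading.Event()
          self._cancelled = False

      def prepare(self):
          with self._lock:
              if self._cancelled:           # Cancel() came strictly first
                  self._prepare_done.set()
                  return "CANCELLED"
              self._prepare_started = True
          # ... expensive codegen/LLVM optimisation runs outside the lock ...
          self._prepare_done.set()
          return "OK"

      def cancel(self):
          with self._lock:
              self._cancelled = True
              if not self._prepare_started:
                  return                    # strictly before Prepare(): no wait
          # Prepare() is running or done: block until it has finished.
          self._prepare_done.wait()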

Making this change allows us to rework the previous logic where
Prepare() always had to be called before a fragment was registered (so
that any concurrent Cancel() calls could not 'find' the fragment). The
order is now register->prepare->exec->unregister. Cancellation may occur
any time after ExecPlanFragment() returns.

Change-Id: Ie39737dc419d7708dd881e68d1035e05d3256d19
Reviewed-on: http://gerrit.cloudera.org:8080/539
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
2016-02-02 20:10:02 +00:00