128 Commits

Author SHA1 Message Date
yx91490
c7784bde55 IMPALA-1682: Support printing the output of a query (rows) vertically.
In vertical mode, impala-shell prints each row as follows: first a line
containing the row number, then the row's columns one per line, each
line starting with the column name and a colon.

To enable it, use the shell option '-E' or '--vertical', or run
'set VERTICAL=true' in interactive mode. To disable it in interactive
mode, run 'set VERTICAL=false'. NOTICE: vertical mode is disabled if the
'-B' option or 'set WRITE_DELIMITED=true' is specified.
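A minimal Python sketch of the vertical layout described above. The function name `format_vertical` and the exact text of the row-header line are assumptions for illustration; impala-shell's real header line may look different.

```python
def format_vertical(column_names, rows):
    """Render rows vertically (hypothetical helper): one header line per
    row, then one 'name: value' line per column."""
    lines = []
    for row_num, row in enumerate(rows, start=1):
        # Assumed header format; the real shell's header may differ.
        lines.append("***** %d. row *****" % row_num)
        for name, value in zip(column_names, row):
            lines.append("%s: %s" % (name, value))
    return "\n".join(lines)


print(format_vertical(["id", "name"], [(1, "a"), (2, "b")]))
```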

Tests:
Added test methods to test_shell_interactive.py and test_shell_commandline.py.

Change-Id: I5cee48d5a239d6b7c0f51331275524a25130fadf
Reviewed-on: http://gerrit.cloudera.org:8080/18549
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-06-13 15:41:07 +00:00
Joe McDonnell
c41e6941ca IMPALA-11332: Fix trailing whitespace for CSV output
The current CSV output is stripping trailing
whitespaces from the last line of CSV output. This
rstrip() was intended to remove an extra newline,
but it is matching other white space. This is a
problem for a SQL query like:
select 'Trailing whitespace          ';

This changes the rstrip() to rstrip('\n') to
avoid removing the other white space.
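The difference between the two calls can be seen in plain Python. The string below stands in for the last line of CSV output and is illustrative only.

```python
# Last line of CSV output: a value with trailing spaces, then a newline.
last_line = "Trailing whitespace          \n"

# rstrip() with no argument removes ALL trailing whitespace, including
# the spaces that are part of the query result.
stripped_all = last_line.rstrip()
assert stripped_all == "Trailing whitespace"

# rstrip('\n') removes only trailing newline characters.
stripped_newline = last_line.rstrip("\n")
assert stripped_newline == "Trailing whitespace          "
```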

Testing:
 - Current shell tests pass
 - Added a shell test that verifies trailing whitespace
   is not being stripped.

Change-Id: I69d032ca2f581587b0938d0878fdf402fee0d57e
Reviewed-on: http://gerrit.cloudera.org:8080/18580
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-06-02 09:36:50 +00:00
Joe McDonnell
ed0d9341d3 IMPALA-11325: Fix UnicodeDecodeError for shell file output
When using the --output_file commandline option for
impala-shell, the shell fails with UnicodeDecodeError
if the output contains Unicode characters.

For example, if running this command:
impala-shell -B -q "select '引'" --output_file=output.txt
This fails with:
UnicodeDecodeError : 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)

This is caused by an encode('utf-8') call in
OutputStream::write() on a string that is already UTF-8 encoded.
This changes the code to skip the encode('utf-8') call for Python 2.
Python 3 is using a string and still needs the encode call.

This is mostly a pragmatic fix to make the code a little bit
more functional, and there is more work to be done to have
clear contracts for the format() methods and clear points
of conversion to/from bytes.
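One way to express a "clear point of conversion to bytes" is a helper that encodes only text and passes bytes through unchanged. This is a sketch, not the patch itself (the patch skips the encode call on Python 2); `to_utf8_bytes` is a hypothetical name, and `io.BytesIO` stands in for the `--output_file` target.

```python
import io


def to_utf8_bytes(data):
    """Return UTF-8 bytes for data, encoding only text (hypothetical helper).

    On Python 2 a plain str is already a byte string; calling
    .encode('utf-8') on it first decodes it implicitly with the ASCII
    codec, which is what raises UnicodeDecodeError for non-ASCII content.
    """
    if isinstance(data, bytes):
        return data  # already encoded: pass through unchanged
    return data.encode("utf-8")


out = io.BytesIO()  # stands in for the output file
out.write(to_utf8_bytes(u"引"))  # the character from the example above
out.write(to_utf8_bytes(b"plain bytes"))
```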

Testing:
 - Ran shell tests with Python 2 and Python 3 on Ubuntu 18
 - Added a shell test that outputs a Unicode character
   to an output file. Without the fix, this test fails.

Change-Id: Ic40be3d530c2694465f7bd2edb0e0586ff0e1fba
Reviewed-on: http://gerrit.cloudera.org:8080/18576
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-06-02 01:53:09 +00:00
Joe McDonnell
6a199be854 IMPALA-11249: Fix add_test_dimensions() locations to call super()
The original issue is that the strict HS2 shell tests
are not running in precommit or nightly jobs, but they
do run in local developer environments. Investigating
this showed that the shell tests were running with a
weird set of test dimensions that includes
table_format_and_file_extension. That dimension is only
used in test_insert.py::TestInsertFileExtension.

What is happening is that the shell tests and other
locations are running add_test_dimensions() without
calling super(..., cls).add_test_dimensions(). The
behavior is unclear, but there is clearly cross-talk
between the different tests that do this.

This changes all add_test_dimensions() locations to
call super(..., cls).add_test_dimensions() if they
don't already. Each location has been tuned to run
the same set of tests as before (except the shell
tests which now run the strict HS2 tests).
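The general pattern is sketched below with plain classes. `BaseSuite` and `ShellSuite` are hypothetical stand-ins, not Impala's test classes; the point is that calling `super(..., cls).add_test_dimensions()` first lets the base class initialize state for the subclass itself, instead of leaving the subclass to reuse whatever another class left behind.

```python
class BaseSuite(object):
    """Stand-in for a shared test base class (hypothetical)."""
    dimensions = None

    @classmethod
    def add_test_dimensions(cls):
        # Give the calling class its own fresh list, so subclasses never
        # share (and mutate) a list owned by another class.
        cls.dimensions = ["exec_option"]


class ShellSuite(BaseSuite):
    @classmethod
    def add_test_dimensions(cls):
        # Call the parent first so the base dimensions are set up for
        # *this* class, then add the suite-specific ones.
        super(ShellSuite, cls).add_test_dimensions()
        cls.dimensions.append("protocol")


ShellSuite.add_test_dimensions()
print(ShellSuite.dimensions)  # ['exec_option', 'protocol']
print(BaseSuite.dimensions)   # None: the base class was not touched
```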

As part of this, several shell tests need to be
skipped or fixed for strict HS2.

Testing:
 - Ran core job
 - Ran tests locally to verify the set of tests
   didn't change.

Change-Id: Ib20fd479d3b91ed0ed89a0bc5623cd2a5a458614
Reviewed-on: http://gerrit.cloudera.org:8080/18557
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-05-26 03:42:51 +00:00
Joe McDonnell
0ee5f8084f IMPALA-11317/IMPALA-11316/IMPALA-11315: impala-shell Python 3 fixes
This fixes a few impala-shell Python 3 issues:
1. In ImpalaShell's do_history(), the decode() call needs to be
   avoided in Python 3, because in Python 3 the cmd is already
   a string and doesn't need further decoding. (IMPALA-11315)
2. TestImpalaShell.test_http_socket_timeout() gets a different
   error message in Python 3. It throws "BlockingIOError"
   rather than "socket.error". (IMPALA-11316)
3. ImpalaHttpClient.py's code to retrieve the body when
   handling an HTTP error needs to have a decode() call
   for the body. Otherwise, the body remains bytes and
   causes TestImpalaShellInteractive.test_http_interactions_extra()
   to fail. (IMPALA-11317)
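Items 1 and 3 both come down to decoding only when a value is actually bytes. A minimal sketch, assuming a hypothetical helper name `ensure_text`; the shell's actual fixes are separate, call-site-specific changes.

```python
def ensure_text(value, encoding="utf-8"):
    """Decode bytes to text; return text unchanged (hypothetical helper).

    Under Python 3 an HTTP error body arrives as bytes and must be
    decoded, while a command string is already str and has no .decode().
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


print(ensure_text(b"HTTP error body"))  # bytes are decoded
print(ensure_text("select 1"))          # str passes through
```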

Testing:
 - Ran shell tests in the standard way
 - Ran shell tests with the impala-shell executable coming from
   a Python 3 virtualenv using the PyPi package

Change-Id: Ie58380a17d7e011f4ce96b27d34717509a0b80a6
Reviewed-on: http://gerrit.cloudera.org:8080/18556
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-05-25 22:47:40 +00:00
Joe McDonnell
a11450db86 IMPALA-11313: Use Thrift 0.14.2 for impala-shell PyPi package
Thrift 0.11.0 has known issues where Unicode errors are
not handled properly, including one case where the client
can hang. The traditional form factor for impala-shell
uses a patched Thrift that fixes those issues, but the
PyPi package uses the unpatched Thrift 0.11.0.

This modifies the requirements.txt file to use Thrift 0.14.2,
which has fixes for these Unicode issues. Thrift 0.14.2 has
a slightly different error message, so this amends the
allowed error messages in test_utf8_decoding_error_handling().

This is a bit awkward, given that the Python code generation
continues to happen with Thrift 0.11.0. Comparing the
Python code for Thrift 0.11 vs Thrift 0.14, I didn't see
noticeable differences. Given that the client can hang,
this seems worth fixing ahead of the full conversion to
Thrift 0.14 for all of Impala.

Testing:
 - Ran the Unicode error handling tests with a PyPi
   impala-shell
 - Ran the shell tests normally

Change-Id: I63e0a5dda98df20c9184a347397118b1f3529603
Reviewed-on: http://gerrit.cloudera.org:8080/18560
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-05-24 21:09:27 +00:00
Steve Carlin
c1f99d1369 IMPALA-11213: Fixed impala-shell strict hs2 mode for large fetches
The strict hs2 protocol mode is broken when fetching large results.
The FetchResults.hasMoreRows field is always returned as false by Hive. When
there are no more results, Hive returns an empty batch with no rows.
HIVE-26108 has been filed to support the hasMoreRows field.

Added a framework test that retrieves 1M rows from tpcds. The default
number of rows returned from Hive is 10K so this should be more than
enough to ensure that multiple fetches are done.
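Because hasMoreRows cannot be trusted in this mode, the end of the result set has to be detected from an empty batch. A minimal sketch, assuming a hypothetical `fetch_batch` callable that returns `(rows, has_more_rows)`; it is not impala-shell's actual client code.

```python
def fetch_all_strict_hs2(fetch_batch):
    """Fetch until the server returns an empty batch.

    has_more_rows is ignored because Hive always reports False
    (HIVE-26108); an empty batch marks the end of the results.
    """
    results = []
    while True:
        rows, _has_more_rows = fetch_batch()
        if not rows:
            break
        results.extend(rows)
    return results


# Fake server: two non-empty batches, then an empty one.
batches = iter([([1, 2], False), ([3], False), ([], False)])
print(fetch_all_strict_hs2(lambda: next(batches)))  # [1, 2, 3]
```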

Change-Id: Ife436d91e7fe0c30bf020024e20a5d8ad89faa24
Reviewed-on: http://gerrit.cloudera.org:8080/18370
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2022-04-02 20:30:35 +00:00
Abhishek Rawat
8e755e7571 IMPALA-11126: impala-shell: Support configurable socket timeout for http client

In 'hs2-http' mode, the socket timeout is None, which could cause
hang-like symptoms in case of a problematic remote server.

Added support for configurable socket timeout using the new impala-shell
config option '--http_socket_timeout_s'. If a reasonable timeout is
set, impala-shell client can retry in case of connection issues, when
possible. The default value of '--http_socket_timeout_s' is set to None,
to prevent behavior changes for existing clients.

More details on socket timeout here:
https://docs.python.org/3/library/socket.html#socket-timeouts

Testing:
- Added tests for various timeout values in test_shell_commandline.py
- Ran e2e shell tests.

Change-Id: I29fa4ff96cdcf154c3aac7e43340af60d7d61e94
Reviewed-on: http://gerrit.cloudera.org:8080/18336
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2022-04-01 16:31:19 +00:00
Steve Carlin
5cfdff03f7 IMPALA-11095: Fix Impala-shell strict_hs2 mode inserts
The insert command was broken for impala-shell in strict_hs2
mode. close_dml should return two values.

The values returned by close_dml are the number of rows returned and
the number of error rows. These are not supported in strict hs2 mode
since the close does not return the TDmlResult structure, so the
message to the end user also had to be changed.

Change-Id: Ibe837c99e54d68d1e27b97f0025e17faf0a2cb9f
Reviewed-on: http://gerrit.cloudera.org:8080/18176
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2022-02-04 07:42:52 +00:00
Steve Carlin
bb9fb663ce IMPALA-10778: Allow impala-shell to connect directly to HS2
Impala-shell already uses HS2 protocol to connect to Impalad.
This commit allows impala-shell to connect to any server (for
example, Hive) using the hs2 protocol. This will be done via
the "--strict_hs2_protocol" option.

When the "--strict_hs2_protocol" option is turned on, only features
supported by hs2 will work. For instance, "runtime-profile" is an
impalad specific feature and will be disabled.

The "--strict_hs2_protocol" will only work on servers that abide
by the strict definition of what is supported by HS2. So one will
be able to connect to Hive in this mode, but connections to Impala
will not work. Any feature supported by Hive (e.g. kerberos
authentication) should work as well.

Note: While authentication should work, the test framework is not
set up to create an HS2 server that does authentication at this point
so this feature should be used with caution.

Change-Id: I674a45640a4a7b3c9a577830dbc7b16a89865a9e
Reviewed-on: http://gerrit.cloudera.org:8080/17660
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-08-27 09:45:59 +00:00
Bikramjeet Vig
a9c8166694 IMPALA-10783: Fixed flakiness in run_and_verify_query_cancellation_test
The issue was that if an error was encountered after impala-shell had
been started in a separate process, the process lingered, and a
long-running query could hold on to resources and potentially affect
other tests running on the Impala cluster.
This patch makes sure that the impala-shell process is killed
regardless of any errors encountered.
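The cleanup pattern is a try/finally around the child process. A minimal sketch with a hypothetical helper `run_with_cleanup`; a sleeping Python process stands in for impala-shell running a long query.

```python
import subprocess
import sys


def run_with_cleanup(check):
    """Start a long-running child, run check(proc), and always kill the
    child afterwards, even if check raises (hypothetical helper)."""
    # Stand-in for an impala-shell process running a long query.
    proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    try:
        check(proc)
    finally:
        if proc.poll() is None:  # still running
            proc.kill()
        proc.wait()  # reap the process so it does not linger
    return proc.returncode


def failing_check(proc):
    raise AssertionError("simulated test failure")


try:
    run_with_cleanup(failing_check)
except AssertionError:
    pass  # the child was killed before the error propagated
```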

Change-Id: I9f6d22d639921051cde5675fae1845bedb61c8cc
Reviewed-on: http://gerrit.cloudera.org:8080/17768
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-08-17 13:41:26 +00:00
stiga-huang
2dfc68d852 IMPALA-7712: Support Google Cloud Storage
This patch adds support for GCS(Google Cloud Storage). Using the
gcs-connector, the implementation is similar to other remote
FileSystems.

New flags for GCS:
 - num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16.

Follow-up:
 - Support for spilling to GCS will be addressed in IMPALA-10561.
 - Support for caching GCS file handles will be addressed in
   IMPALA-10568.
 - test_concurrent_inserts and test_failing_inserts in
   test_acid_stress.py are skipped due to slow file listing on
   GCS (IMPALA-10562).
 - Some tests are skipped due to issues introduced by /etc/hosts setting
   on GCE instances (IMPALA-10563).

Tests:
 - Compile and create hdfs test data on a GCE instance. Upload test data
   to a GCS bucket. Modify all locations in HMS DB to point to the GCS
   bucket. Remove some hdfs caching params. Run CORE tests.
 - Compile and load snapshot data to a GCS bucket. Run CORE tests.

Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Reviewed-on: http://gerrit.cloudera.org:8080/17121
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-13 11:20:08 +00:00
Andrew Sherman
3b763b5c32 IMPALA-10447: Add a newline when exporting shell output to a file.
Impala shell outputs a batch of rows using OutputStream. Inside
OutputStream, output to a file is handled slightly differently from
output that is written to stdout. When writing to stdout we use print()
(which appends a newline) while when writing to a file we use write()
(which adds nothing). This difference was introduced in IMPALA-3343 so
this bug may be a regression introduced then. To ensure that output is
the same in either case we need to add a newline after writing each
batch of rows to a file.
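A minimal sketch of the fix, using a hypothetical `OutputStreamSketch` class rather than impala-shell's actual OutputStream. Without the explicit newline, consecutive batches written to a file would run together (here `'1,a\n2,b3,c'`).

```python
import io


class OutputStreamSketch(object):
    """Hypothetical stand-in for impala-shell's OutputStream."""

    def __init__(self, stream=None):
        self.stream = stream  # file-like object; None means stdout

    def write(self, formatted_batch):
        if self.stream is None:
            print(formatted_batch)  # print() appends the newline itself
        else:
            # write() appends nothing, so add the newline explicitly to
            # keep file output identical to stdout output.
            self.stream.write(formatted_batch)
            self.stream.write("\n")


buf = io.StringIO()
out = OutputStreamSketch(buf)
out.write("1,a\n2,b")
out.write("3,c")
print(repr(buf.getvalue()))  # '1,a\n2,b\n3,c\n'
```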

TESTING:
    Added a new test for this case.

Change-Id: I078a06c54e0834bc1f898626afbfff4ded579fa9
Reviewed-on: http://gerrit.cloudera.org:8080/16966
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-01-26 08:32:29 +00:00
Tim Armstrong
ab6b7960db IMPALA-10027: configurable default anonymous user
A username can be determined for a session via two mechanisms:
* In a secure env, the user is authenticated by LDAP or Kerberos
* In an unsecure env, the client specifies the user name, either
  as a parameter to the OpenSession API (HS2) or as a parameter
  to the first query run (beeswax)

This patch affects what happens if neither of the above mechanisms
is used. Previously we would end up with the username being an
empty string, but this makes Ranger unhappy. Hive uses the name
"anonymous" in this situation, so we change Impala's behaviour too.

This is configurable by -anonymous_user_name. -anonymous_user_name=
reverts to the old behaviour.
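The resolution order described above can be sketched as a small function. `effective_user` and its parameters are hypothetical; in Impala this logic lives in the server, not in Python.

```python
def effective_user(authenticated_user=None, client_supplied_user=None,
                   anonymous_user_name="anonymous"):
    """Pick the session user following the rules above (hypothetical)."""
    if authenticated_user:      # LDAP or Kerberos in a secure env
        return authenticated_user
    if client_supplied_user:    # OpenSession (HS2) or first query (beeswax)
        return client_supplied_user
    # Neither mechanism supplied a name: fall back to the configurable
    # default. An empty string reproduces the old behaviour.
    return anonymous_user_name


print(effective_user())                        # anonymous
print(repr(effective_user(anonymous_user_name="")))  # ''
```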

Test
* Add an end-to-end test that exercises this via impala-shell for
  HS2, HS2-HTTP and beeswax protocols.
* Tweak a couple of existing tests that depended on the previous
  behavior.

Change-Id: I6db491231fa22484aed476062b8fe4c8f69130b0
Reviewed-on: http://gerrit.cloudera.org:8080/16902
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-01-09 00:15:25 +00:00
stiga-huang
cc8ecd0926 IMPALA-10333: Fix utf-8 test failures when impala-shell using older thrift versions
In some branches where impala-shell still uses an older version of thrift,
e.g. thrift-0.9.3-p8, test_utf8_decoding_error_handling will fail since
the internal string representation of thrift versions lower than 0.10.0
is still bytes. Strings won't be decoded to unicode, so there won't be
any decoding errors. The test expects bytes that can't be decoded
correctly to be replaced with U+FFFD, so it fails.

This patch improves the test by also expecting results from older thrift
versions, so it can be cherry-picked to older branches.
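The replacement behaviour the test relies on is Python's standard `errors="replace"` decoding, shown below on a made-up byte string. Thrift versions before 0.10.0 hand back raw bytes instead, so no replacement happens there.

```python
raw = b"abc\xffdef"  # 0xff can never appear in valid UTF-8

# Strict decoding (the default) raises UnicodeDecodeError.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD for the undecodable byte.
decoded = raw.decode("utf-8", errors="replace")
assert decoded == u"abc\ufffddef"
```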

Tests:
 - Verify the test in master branch and a downstream branch that still
   uses thrift-0.9.3-p8 in impala-shell.

Change-Id: Ieb0baa9b3a1480673af77f7cc35c05eacf4b449f
Reviewed-on: http://gerrit.cloudera.org:8080/16767
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-23 11:30:48 +00:00
Andrew Sherman
47dbfde0b2 IMPALA-10249: Fix the flaky TestImpalaShell.test_queries_closed test.
This test for IMPALA-897 is testing that queries run by Impala Shell
from a script file are closed correctly.  This is tested by an assertion
that there is one in-flight query during execution of a script
containing several queries. The test then closes the shell and checks
that there are no in-flight queries. This is the assertion which failed.
Change this assertion to instead wait for the number of in-flight
queries to be zero. This avoids whatever race was causing the flakiness.
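Waiting for a condition instead of asserting it once is a polling loop with a deadline. A minimal sketch with a hypothetical `wait_for` helper; it is not the test's actual code, and the in-flight counter below is simulated.

```python
import time


def wait_for(predicate, timeout_s=30.0, interval_s=0.1):
    """Poll predicate() until it is true or timeout_s elapses
    (hypothetical helper)."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(interval_s)
    return predicate()  # one last check at the deadline


in_flight = [2]  # simulated number of in-flight queries


def num_in_flight():
    in_flight[0] = max(0, in_flight[0] - 1)  # queries drain over time
    return in_flight[0]


assert wait_for(lambda: num_in_flight() == 0, timeout_s=5)
```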

Change-Id: Ib0485097c34282523ed0df6faa143fee6f74676d
Reviewed-on: http://gerrit.cloudera.org:8080/16743
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-19 03:19:55 +00:00
stiga-huang
9d6bf35090 IMPALA-10145,IMPALA-10299: Bump impala-shell thrift version to 0.11.0-p4
Currently, impala-shell depends on thrift-0.11.0-p2, while impala
servers depend on thrift-0.9.3-p8. After 0.10.0, thrift changed its
internal string representation from bytes to unicode (THRIFT-3503) to
support Python3. THRIFT-2087 and THRIFT-5303 are two patches for
specifying an error handling method in decoding utf-8 strings in thrift.
Without them, impala-shell may get an unexpected UnicodeDecodeError when
decoding thrift objects from impala servers. This patch bumps
impala-shell's thrift version to 0.11.0-p4 to include these two patches.

Tests:
 - This fixes a regression introduced when impala-shell's thrift version
   was bumped to 0.11. Added a test to avoid the regression in the future.

Change-Id: I0f9898639b5648658efc2d3c5c0ee4721fb85776
Reviewed-on: http://gerrit.cloudera.org:8080/16700
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-10 04:25:50 +00:00
Thomas Tauber-Marshall
01e1b4df80 IMPALA-10303: Fix warnings from impala-shell with --quiet
When the --quiet flag is used with impala-shell, the intention is that
if the query is successful then only the query results should be
printed.

This patch fixes two cases where --quiet was not being respected:
- When using the HTTP transport and --client_connect_timeout_ms is
  set, a warning is printed that the timeout is not applied.
- When running in non-interactive mode, a warning is printed that
  --live_progress is automatically disabled. This warning is now also
  only printed if --live_progress is actually set.

Testing:
- Added a test that runs a simple query with --quiet and confirms the
  output is as expected.

Change-Id: I1e94c9445ffba159725bacd6f6bc36f7c91b88fe
Reviewed-on: http://gerrit.cloudera.org:8080/16673
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-30 02:17:29 +00:00
Sahil Takiar
13f50eaec5 IMPALA-9229: impala-shell 'profile' to show original and retried queries
Currently, the impala-shell 'profile' command only returns the profile
for the most recent query attempt. There is no way to get the original
query profile (the profile of the first query attempt that failed) from
the impala-shell.

This patch modifies TGetRuntimeProfileReq and TGetRuntimeProfileResp to
add support for returning both the original and retried profiles for a
retried query. When a query is retried, TGetRuntimeProfileResp currently
contains the profile for the most recent query attempt.
TGetRuntimeProfileReq has a new field called 'include_query_attempts'
and when it is set to true, the TGetRuntimeProfileResp will include all
failed profiles in a new field called failed_profiles /
failed_thrift_profiles.

impala-shell has been modified so the 'profile' command has a new set of
options. The syntax is now:

PROFILE [ALL | LATEST | ORIGINAL]

If 'ALL' is specified, both the latest and original profiles are
printed. If 'LATEST' is specified, only the latest profile is printed.
If 'ORIGINAL' is specified, only the original profile is printed. The
default behavior is equivalent to specifying 'LATEST' (which is the
current behavior before this patch as well).
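The option handling can be sketched as two small functions. Both names are hypothetical; case-insensitive matching and printing the latest profile before the original for 'ALL' are assumptions, not confirmed shell behaviour.

```python
VALID_PROFILE_OPTIONS = ("ALL", "LATEST", "ORIGINAL")


def parse_profile_args(args):
    """Parse the PROFILE argument; empty defaults to 'LATEST'."""
    option = args.strip().upper()  # assumed case-insensitive
    if not option:
        return "LATEST"
    if option not in VALID_PROFILE_OPTIONS:
        raise ValueError("Usage: PROFILE [ALL | LATEST | ORIGINAL]")
    return option


def profiles_to_print(option, latest, original):
    """Select which profiles of a retried query to print."""
    if option == "ALL":
        return [latest, original]  # assumed order
    if option == "ORIGINAL":
        return [original]
    return [latest]


print(profiles_to_print(parse_profile_args("all"), "p2", "p1"))
```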

Support for this has only been added to HS2 given that Beeswax is being
deprecated soon. The new 'profile' options have no effect when the
Beeswax protocol is used.

Most of the code change is in impala-hs2-server and impala-server; a lot
of the GetRuntimeProfile code has been re-factored.

Testing:
* Added new impala-shell tests
* Ran core tests

Change-Id: I89cee02947b311e7bf9c7274f47dfc7214c1bb65
Reviewed-on: http://gerrit.cloudera.org:8080/16406
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-17 20:55:45 +00:00
Sahil Takiar
ea95691b77 IMPALA-9953: Shell should continue fetching even when 0 rows are returned
The Impala shell stops fetching rows if it receives a batch that
contains 0 rows. This is incorrect because a batch with 0 rows can be
returned if the fetch request hits a timeout. Instead, the shell should
rely on the value of has_rows / hasMoreRows to determine when to stop
issuing fetch requests.
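A minimal sketch of the corrected loop, assuming a hypothetical `fetch_batch` callable that returns `(rows, has_more_rows)`. An empty batch is simply skipped; only `has_more_rows` ends the loop.

```python
def fetch_all(fetch_batch):
    """Fetch until the server says there are no more rows.

    An empty batch is NOT treated as the end of the results, because a
    fetch that hits a timeout can legitimately return zero rows.
    """
    results = []
    has_more_rows = True
    while has_more_rows:
        rows, has_more_rows = fetch_batch()
        results.extend(rows)
    return results


# Fake server: the second fetch times out and returns no rows.
batches = iter([([1, 2], True), ([], True), ([3], False)])
print(fetch_all(lambda: next(batches)))  # [1, 2, 3]
```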

Tests:
* Added a regression test to test_shell_commandline.py
* Ran all shell tests

Change-Id: I5f8527aea9e433f8cf426435c0ba41355bbf9d88
Reviewed-on: http://gerrit.cloudera.org:8080/16222
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-07-22 23:28:10 +00:00
Tim Armstrong
6ec6aaae8e IMPALA-3695: Remove KUDU_IS_SUPPORTED
Testing:
Ran exhaustive tests.

Change-Id: I059d7a42798c38b570f25283663c284f2fcee517
Reviewed-on: http://gerrit.cloudera.org:8080/16085
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-06-18 01:11:18 +00:00
Sahil Takiar
3088ca8580 IMPALA-9818: Add fetch size as option to impala shell
Adds the option --fetch_size to the Impala shell. This new option allows
users to specify the fetch size used when issuing fetch RPCs to the
Impala Coordinator (e.g. TFetchResultsReq and BeeswaxService.fetch).
This parameter applies to all client protocols: beeswax, hs2, hs2-http.
The default --fetch_size is set to 10240 (10x the default batch size).

The new --fetch_size parameter is most effective when result spooling is
enabled. When result spooling is disabled, Impala can only return a
single row batch per fetch RPC (so 1024 rows by default). When result
spooling is enabled, Impala can return up to 100 row batches per fetch
request.
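The arithmetic behind "most effective when result spooling is enabled" can be worked through with a rough model. This ignores timeouts and empty fetches and is an illustration built from the numbers above, not a description of the coordinator's exact behaviour.

```python
import math

BATCH_SIZE = 1024          # default rows per row batch
MAX_SPOOLED_BATCHES = 100  # max row batches per fetch with result spooling


def num_fetch_rpcs(total_rows, fetch_size=10240, spooling=True):
    """Rough estimate of how many fetch RPCs a result set needs."""
    per_rpc_cap = BATCH_SIZE * (MAX_SPOOLED_BATCHES if spooling else 1)
    rows_per_rpc = min(fetch_size, per_rpc_cap)
    return int(math.ceil(total_rows / float(rows_per_rpc)))


print(num_fetch_rpcs(1000000, spooling=False))  # 977
print(num_fetch_rpcs(1000000, spooling=True))   # 98
```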

Removes some logic in the impala_client.py file that attempts to
simulate a fetch_size. The code would issue multiple fetch requests to
fulfill the given fetch_size. This logic is no longer needed now that
result spooling is available.

Testing:
* Ran core tests
* Added new tests in test_shell_client.py and test_shell_commandline.py

Change-Id: I8dc7962aada6b38795241d067a99bd94fabca57b
Reviewed-on: http://gerrit.cloudera.org:8080/16041
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Sahil Takiar <stakiar@cloudera.com>
2020-06-10 17:46:21 +00:00
Tim Armstrong
c43c03c5ee IMPALA-3926: part 2: avoid setting LD_LIBRARY_PATH
This removes LD_LIBRARY_PATH and LD_PRELOAD from the
developer's shell and cleans it up. With the preceding
change, toolchain utilities like clang can be run without
a special LD_LIBRARY_PATH.

This fixes a bug where libjvm.so was registered as a
static instead of a shared library, which adds it to the
RUNPATH variable in the binary, which provides a default
search location that can be overridden by LD_LIBRARY_PATH.

Impala binaries don't have the rpath baked in for some
libraries, including Impala-lzo, libgcc and libstdc++,
so we still need to set LD_LIBRARY_PATH when running
those. That is solved with wrapper scripts that set
the environment variables only when invoking those
binaries, e.g. starting a daemon or running a backend
test. I added three scripts because there were three sets
of environment variables. The scripts are:
* run-binary.sh: just sets LD_LIBRARY_PATH
* run-jvm-binary.sh: sets LD_LIBRARY_PATH and CLASSPATH
* start-daemon.sh: sets LD_LIBRARY_PATH and CLASSPATH and
  kerberos-related environment variables.
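The idea behind run-binary.sh (set the variable only for the child, never for the developer's shell) can be sketched in Python. `run_binary` and the library paths are hypothetical; the real wrappers are shell scripts.

```python
import os
import subprocess
import sys


def run_binary(cmd, lib_dirs):
    """Run cmd with LD_LIBRARY_PATH set only for the child process.

    The parent's os.environ is copied, not modified.
    """
    env = dict(os.environ)
    env["LD_LIBRARY_PATH"] = ":".join(lib_dirs)
    return subprocess.run(cmd, env=env, capture_output=True, text=True,
                          check=True).stdout


child_cmd = [sys.executable, "-c",
             "import os; print(os.environ['LD_LIBRARY_PATH'])"]
print(run_binary(child_cmd, ["/opt/toolchain/lib", "/opt/impala/lib"]))
```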

The binaries, in almost all cases, work fine without
those tweaks, because libstdc++ and libgcc are picked
up along with libkuduclient.so from the toolchain (they
are in the same directory). I decided to leave good enough
alone here. run-binary.sh and friends can be used in
any remaining edge cases to run binaries.

An alternative to the 3 scripts would be to have an
uber-script that set all the variables, but I felt
that it was better to be specific about what
each binary needed. Cleaning the LD_LIBRARY_PATH
mess up has given me a distaste for scattershot
setting of environment variables. I am open to
revisiting this.

Testing:
* Ran tests on centos 7
* Manually tested that my dev env with
 LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu continued
 to work (for now). All ubuntu 16.04 and 18.04 dev
 envs that were set up with bootstrap_development.sh
 will be in this state.

Change-Id: I61c83e6cca6debb87a12135e58ee501244bc9603
Reviewed-on: http://gerrit.cloudera.org:8080/14494
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-07 08:50:44 +00:00
Tim Armstrong
748e41ab41 IMPALA-9380: async query unregistration
This change improves query latency by doing much
of the heavyweight work of unregistering a query
asynchronously, instead of synchronously on the
RPC thread. The biggest win is to move the
profile serialization off the RPC thread.

Unregistration processing is done by a thread pool
with 4 threads by default. This is configurable by
--unregistration_thread_pool_size and
--unregistration_thread_pool_queue_depth.

This fixes a pre-existing bug where a
query was temporarily neither in the in-flight
queries nor the completed queries. It would be
much easier to hit this with async unregistration
because there is less synchronisation on the client
side. Now the query is briefly in both maps, but
this is handled as follows:
* All places that look up both the maps will check
  the in-flight map first, and return a reference to
  the ClientRequestState, i.e. ignoring the entry in
  the query log.
* The /queries page does not return completed queries
  if they were found in the in-flight queries map, so
  avoids duplicate results.

The thread safety story changes slightly.
Before this change, only one thread could
remove the query from the map and close it,
with only one thread "winning" the race to remove
the ClientRequestState from the map. Since we leave
the query in the map while being finalized, we
instead use an atomic in ClientRequestState to ensure
that only one thread does the finalization.
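A Python sketch of "only one thread does the finalization". `RequestStateSketch` is a hypothetical stand-in for ClientRequestState; since Python has no atomic compare-and-swap, a Lock guarding a flag is swapped in for the C++ atomic.

```python
import threading


class RequestStateSketch(object):
    """Hypothetical stand-in for ClientRequestState."""

    def __init__(self):
        self._lock = threading.Lock()
        self._finalized = False
        self.finalize_calls = 0

    def try_finalize(self):
        """Return True only for the single caller that wins the race."""
        with self._lock:
            if self._finalized:
                return False
            self._finalized = True
        # Only the winner reaches here; heavy work runs outside the lock.
        self.finalize_calls += 1
        return True


state = RequestStateSketch()
wins = []
threads = [threading.Thread(target=lambda: wins.append(state.try_finalize()))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(wins.count(True), state.finalize_calls)  # 1 1
```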

Some misc cleanup was done as a result of these changes:
* Fix a pre-existing TSAN race in RuntimeProfile that
  was revealed by the new concurrent unregister test.
* Consolidate the various unknown query handle errors into
  an error code so that we consistently return the same
  string.
* "Unregister query" should include flushing audit events.

Testing:
* Add a test that unregisters a query concurrent with other
  operations.
* Ran exhaustive tests

Perf:
Ran TPC-H 30 with mt_dop=4. No regressions and some improvements:
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 5.38    | -2.67%     | 4.02       | -2.01%         |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| TPCH(30) | TPCH-Q1  | parquet / none / none | 5.36   | 5.17        |   +3.61%   |   1.82%   |   1.17%        | 5     |   +3.73%       | 1.73    | 3.65   |
| TPCH(30) | TPCH-Q6  | parquet / none / none | 1.77   | 1.74        |   +1.48%   |   2.00%   |   2.50%        | 5     |   +2.89%       | 0.87    | 1.03   |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 3.02   | 3.00        |   +0.79%   |   2.18%   |   2.21%        | 5     |   +1.55%       | 0.00    | 0.57   |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 1.65   | 1.64        |   +0.81%   |   1.35%   |   0.03%        | 5     |   +0.07%       | 1.15    | 1.34   |
| TPCH(30) | TPCH-Q2  | parquet / none / none | 1.21   | 1.21        |   -0.07%   |   2.11%   |   2.14%        | 5     |   -0.04%       | -0.29   | -0.05  |
| TPCH(30) | TPCH-Q4  | parquet / none / none | 2.50   | 2.52        |   -0.49%   |   2.43%   |   3.34%        | 5     |   -0.09%       | -0.29   | -0.27  |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 2.86   | 2.90        |   -1.28%   |   2.30%   |   1.24%        | 5     |   -0.02%       | -0.58   | -1.11  |
| TPCH(30) | TPCH-Q3  | parquet / none / none | 4.35   | 4.40        |   -1.15%   |   1.76%   |   1.78%        | 5     |   -1.12%       | -0.87   | -1.03  |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 4.10   | 4.17        |   -1.80%   |   1.05%   |   1.31%        | 5     |   -1.25%       | -1.73   | -2.40  |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 3.20   | 3.25        |   -1.52%   |   0.79%   |   2.56%        | 5     |   -1.56%       | -0.58   | -1.26  |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 10.81  | 11.07       |   -2.34%   |   5.00%   |   7.01%        | 5     |   -1.40%       | -0.58   | -0.61  |
| TPCH(30) | TPCH-Q7  | parquet / none / none | 11.19  | 11.56       |   -3.18%   |   3.47%   |   6.02%        | 5     |   -0.90%       | -0.87   | -1.03  |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 19.91  | 20.32       |   -2.02%   |   0.66%   |   0.47%        | 5     |   -2.18%       | -2.31   | -5.64  |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 5.63   | 5.77        |   -2.40%   |   1.71%   |   2.01%        | 5     |   -1.84%       | -1.73   | -2.05  |
| TPCH(30) | TPCH-Q5  | parquet / none / none | 3.91   | 4.03        |   -2.74%   |   1.08%   |   1.86%        | 5     |   -2.45%       | -1.44   | -2.88  |
| TPCH(30) | TPCH-Q8  | parquet / none / none | 4.55   | 4.71        |   -3.48%   |   1.90%   |   3.53%        | 5     |   -2.35%       | -1.44   | -1.96  |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 1.93   | 2.01        |   -3.96%   |   0.05%   |   4.05%        | 5     |   -2.59%       | -2.31   | -2.19  |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 4.52   | 4.73        |   -4.26%   |   1.26%   |   2.43%        | 5     |   -3.40%       | -2.02   | -3.51  |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.02   | 1.05        |   -3.58%   |   3.94%   |   2.36%        | 5     |   -4.56%       | -1.44   | -1.79  |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 9.52   | 10.04       | I -5.24%   |   2.14%   |   0.56%        | 5     | I -4.67%       | -2.31   | -5.57  |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 3.49   | 3.68        | I -5.08%   |   0.07%   |   0.56%        | 5     | I -5.66%       | -2.31   | -20.08 |
| TPCH(30) | TPCH-Q9  | parquet / none / none | 11.92  | 12.71       | I -6.19%   |   0.57%   |   3.15%        | 5     | I -4.99%       | -2.31   | -4.33  |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+

Change-Id: I80027b1baeb4ab453938c0f6357b120f4035ba08
Reviewed-on: http://gerrit.cloudera.org:8080/15821
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-05 10:12:42 +00:00
David Knupp
bc9d7e063d IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.
This is the main patch for making the impala-shell cross-compatible with
python 2 and python 3. The goal is to wind up with a version of the shell that will
pass python e2e tests irrespective of the version of python used to launch the
shell, under the assumption that the test framework itself will continue to run
with python 2.7.x for the time being.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  Example usage:
  USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- The $IMPALA_HOME/bin/impala-shell.sh now sets up the impala-shell python
  environment independently from bin/set-pythonpath.sh. The default version
  of thrift is thrift-0.11.0 (See IMPALA-9489).

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell with no authentication using Python 3.7.5
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell with LDAP-based authentication using Python 2.7.12
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result, this inconsistency was reflected in the way we handled
  strings in the impala-shell code, but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this caused many, if not most, shell tests to fail
  under python 3.

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could, and try to
  implement what has colloquially been called a "unicode sandwich" -- namely,
  "bytes on the outside, unicode on the inside, encode/decode at the edges."

  The primary spot in the shell where we call decode() now is when sanitising
  input...

  args = self.sanitise_input(args.decode('utf-8'))

  ...and also whenever a library like re required it. Similarly, str.encode()
  is primarily used where a library like readline or csv requires it.

- PYTHONIOENCODING needs to be set to utf-8 to override the default setting for
  python 2. Without this, piping or redirecting stdout results in unicode errors.

- from __future__ import unicode_literals was added throughout
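A minimal Python 3 sketch of the "unicode sandwich" described above, assuming UTF-8 on the wire: raw bytes are decoded once at the input edge, all processing (here a re call, which would otherwise raise the TypeError shown earlier) happens on str, and text is encoded once at the output edge. The helper names are hypothetical, not impala-shell functions.

```python
import re


def decode_input(raw):
    """Input edge: turn raw bytes into text exactly once (assumes UTF-8)."""
    return raw.decode("utf-8") if isinstance(raw, bytes) else raw


def encode_output(text):
    """Output edge: turn text back into bytes exactly once."""
    return text.encode("utf-8")


def find_select_keywords(raw_query):
    # Inside the sandwich everything is str, so a str pattern is safe.
    query = decode_input(raw_query)
    return re.findall(r"\bselect\b", query, flags=re.IGNORECASE)


raw = "SELECT 'café' ; select 1".encode("utf-8")
print(find_select_keywords(raw))                # ['SELECT', 'select']
print(encode_output(decode_input(raw)) == raw)  # True
```

Keeping decode/encode at the edges means the inner code never has to guess whether a value is bytes or str.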

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and manually ran the tests against that.

  No effort has been made at this point to come up with a way to integrate
  testing of the shell in a python3 environment into our automated test
  processes.

Change-Id: Idb004d352fe230a890a6b6356496ba76c2fab615
Reviewed-on: http://gerrit.cloudera.org:8080/15524
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-18 05:13:50 +00:00
David Knupp
c26e3db4bd IMPALA-9362: Upgrade sqlparse 0.1.19 -> 0.3.1
Upgrades the impala-shell's bundled version of sqlparse to 0.3.1.
There were some API changes in 0.2.0+ that required a re-write of
the StripLeadingCommentFilter in impala_shell.py. A slight perf
optimization was also added to avoid using the filter altogether
if no leading comment is readily discernible.

As 0.1.19 was the last version of sqlparse to support python 2.6,
this patch also breaks Impala's compatibility with python 2.6.

No new tests were added, but all existing tests passed without
modification.

Change-Id: I77a1fd5ae311634a18ee04b8c389d8a3f3a6e001
Reviewed-on: http://gerrit.cloudera.org:8080/15642
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-17 05:04:23 +00:00
Tim Armstrong
35d2718d36 IMPALA-9547: retry accept in test_shell_commandline
This is a point solution to this particular socket.accept()
call failing. The more general problem is described in
https://www.python.org/dev/peps/pep-0475/ and fixed in
Python 3.5.
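A minimal sketch of this kind of point fix, assuming the failure is an accept() interrupted by a signal (EINTR). On Python 3.5+ PEP 475 already makes socket.accept() retry internally, so the loop mainly matters on Python 2, where the interruption surfaces as socket.error (an alias of OSError on Python 3). The helper name is hypothetical.

```python
import errno
import socket


def accept_with_retry(server_sock, max_attempts=10):
    """Call server_sock.accept(), retrying only if a signal interrupted it."""
    for attempt in range(max_attempts):
        try:
            return server_sock.accept()
        except socket.error as e:  # socket.error is OSError on Python 3
            # EINTR is transient; any other error, or running out of
            # attempts, is re-raised to the caller.
            if e.errno != errno.EINTR or attempt == max_attempts - 1:
                raise
```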

Change-Id: Icc9cab98b059042855ca9149427d079951471be0
Reviewed-on: http://gerrit.cloudera.org:8080/15541
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-03-24 20:31:19 +00:00
Thomas Tauber-Marshall
3fd6f60b22 IMPALA-9414 (part 2): Support the 'Expect: 100-continue' http header
The 'Expect: 100-continue' http header allows http clients to send
only the headers for their request, get a confirmation back from the
server that the headers are valid, and only then send the body of the
request, avoiding the overhead of sending large requests that will
ultimately fail.

This patch adds support for this in the HS2 HTTP server by having
THttpServer look for the header and, if it is present and the request
is valid, return a '100 Continue' response before reading the
body of the request.

It also adds support for using this header on large requests sent by
impala-shell.
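A minimal client-side sketch of the handshake over a plain connected socket, assuming HTTP/1.1 and a server that answers the headers with either an interim '100 Continue' or a final error status. It is not impala-shell's transport code; the function names and the '/cliservice' path used in the usage are illustrative. A production client would also fall back to sending the body if the server never answers the headers, which is omitted here.

```python
import socket


def _read_head(sock):
    """Read from sock until the end of an HTTP response head (blank line)."""
    buf = b""
    while b"\r\n\r\n" not in buf:
        chunk = sock.recv(1024)
        if not chunk:
            raise ConnectionError("connection closed before end of headers")
        buf += chunk
    return buf


def _status_code(head):
    # The first line looks like b"HTTP/1.1 100 Continue".
    return int(head.split(b"\r\n", 1)[0].split()[1])


def post_with_expect_continue(sock, host, path, body):
    """POST body, but only after the server has accepted the headers.

    Returns the final HTTP status code.
    """
    head = (
        "POST {} HTTP/1.1\r\n"
        "Host: {}\r\n"
        "Content-Length: {}\r\n"
        "Expect: 100-continue\r\n"
        "\r\n"
    ).format(path, host, len(body)).encode("ascii")
    sock.sendall(head)
    status = _status_code(_read_head(sock))
    if status != 100:
        # Headers were rejected (e.g. 401): the large body is never sent.
        return status
    sock.sendall(body)
    return _status_code(_read_head(sock))
```

Usage: post_with_expect_continue(sock, "localhost", "/cliservice", payload) returns 200 once the body has been accepted, or the rejection status without transmitting the payload.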

Testing:
- This case is covered by the existing test_large_sql, however that
  test was previously broken and passing spuriously. This patch fixes
  the test.
- Passed all other shell tests.

Change-Id: I4153968551acd58b25c7923c2ebf75ee29a7e76b
Reviewed-on: http://gerrit.cloudera.org:8080/15284
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
2020-03-13 17:00:42 +00:00
Alice Fan
e1d1428181 IMPALA-9384: Improve Impala shell usability by enabling live_progress in interactive mode
In order to improve usability, this patch makes Impala shell show query
processing status while the query is running. The patch enables shell
option live_progress by default when a user launches impala shell in
the interactive mode. The patch also adds a new command line flag
"--disable_live_progress", which allows a user to disable live_progress
at runtime. In the interactive mode, a user can disable live_progress
by either using the command line flag or setting the option to False in
the config file. In non-interactive mode (when the -q or -f
options are used), live reporting is not supported, so impala-shell
disables live_progress when it detects that mode.
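The precedence described above can be summarised as a small decision function. This is a hypothetical sketch, not the shell's actual option-handling code; the parameter names are assumptions.

```python
def resolve_live_progress(query=None, query_file=None,
                          disable_live_progress=False, config_value=None):
    """Decide whether live progress reporting should be enabled.

    query / query_file: values of -q / -f; either one means non-interactive.
    disable_live_progress: the --disable_live_progress command-line flag.
    config_value: live_progress from the config file, or None if absent.
    """
    interactive = query is None and query_file is None
    if not interactive:
        return False          # live reporting is not supported with -q / -f
    if disable_live_progress:
        return False          # explicit command-line opt-out
    if config_value is not None:
        return config_value   # the config file may turn it off
    return True               # new default in interactive mode
```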

Testing:
- Added and updated tests in test_shell_interactive.py and test_shell_commandline.py
- Successfully ran all shell related tests

Change-Id: I3765b775f663fa227e59728acffe4d5ea9a5e2d3
Reviewed-on: http://gerrit.cloudera.org:8080/15219
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2020-03-09 21:28:19 +00:00
wzhou-code
66e6879e8c IMPALA-9346: Fix TestImpalaShell.test_config_file failing issue
on CentOS6/Python 2.6

ImpalaShell.test_config_file failed in a negative test case, which
ran impala shell with a badly formatted config file (wrong option name and
wrong option value). The test code expected impala shell to return both
warning and error messages, but on CentOS6/Python 2.6 Impala shell
only returns the error message. To fix it, split the negative case into two
test cases that run Impala shell with two different config files.

Testing:
 - Passed all test cases in test_shell_commandline.py and
   test_shell_interactive.py.
 - Passed all core test in pre-review-test.
 - Passed EE tests in impala-private-parameterized with CentOS6.

Change-Id: Ief5e825aa3baead5519132d47efcf0d5300860fd
Reviewed-on: http://gerrit.cloudera.org:8080/15139
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-04 00:55:54 +00:00
wzhou-code
6a23ec6985 IMPALA-6393: Add support for live_summary and live_progress in impalarc
This patch adds support for live_summary and live_progress in impalarc.

Testing:
1) Added unit-test cases in test_shell_commandline.py and
   test_shell_interactive.py for live_summary and live_progress.
2) Successfully ran all other tests in test_shell_interactive.py and
   test_shell_commandline.py

Change-Id: If4549b775a7966ad89d661d0349cc78754e13a86
Reviewed-on: http://gerrit.cloudera.org:8080/14927
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-01-23 01:48:13 +00:00
Lars Volker
74c7b7e55f IMPALA-8863: Add support to run tests over HTTP/HS2
This change adds support to run backend tests over HTTP using a new
version of Impyla (0.16.1). It also adds a test that exercises
authentication over HTTP.

Change-Id: I7156558071781378fcb9c8941c0f4dd82eb0d018
Reviewed-on: http://gerrit.cloudera.org:8080/14059
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-11-26 22:46:40 +00:00
norbert.luksa
2114fc6155 IMPALA-4618: Fixing #Hosts and adding #Instances in exec summary
When mt_dop > 0, the summary is reporting the number of fragment
instances, instead of the number of hosts as the header would
imply.

This commit fixes the issue so the number of hosts will be shown
under the #Hosts column. The commit also adds an #Inst column
where the number of instances is shown (the current behaviour).
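The distinction between the two columns can be sketched as follows, assuming each fragment instance is reported with the host it runs on. The Instance record and summarize() are hypothetical illustrations, not Impala's summary code: with mt_dop > 0 several instances share a host, so counting instances overstates the number of hosts.

```python
from collections import namedtuple

# Hypothetical record for one fragment instance of a plan node.
Instance = namedtuple("Instance", ["host", "instance_id"])


def summarize(instances):
    """Return (#Hosts, #Inst) for the instances of one plan node."""
    num_hosts = len({inst.host for inst in instances})  # distinct hosts
    num_instances = len(instances)                      # every instance
    return num_hosts, num_instances
```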

Tests:
 * Changed profile tests with mt_dop > 0.
 * Updated benchmark tests and shell tests accordingly.

Change-Id: I3bdf9a06d9bd842b2397cd16c28294b6bec7af69
Reviewed-on: http://gerrit.cloudera.org:8080/14715
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-11-26 07:28:23 +00:00
Tim Armstrong
0364e5f8d4 IMPALA-8859: fix test_global_config_file for remote clusters
I think the bug is that necessary environment variables were
not passed in - the environment was clobbered instead of
just having the necessary variable added.
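The commit message does not show the fix itself. A common way to add one variable without clobbering the rest of the environment, sketched here as an assumption, is to copy os.environ and update the copy before passing it to subprocess as env=. The helper name and example path are hypothetical; IMPALA_SHELL_GLOBAL_CONFIG_FILE is the variable introduced by IMPALA-6042.

```python
import os


def env_with(**overrides):
    """Return a copy of the current environment with some variables set.

    Passing env={"VAR": "..."} to subprocess replaces the whole environment
    (dropping PATH and any IMPALA_* variables); copying os.environ first
    keeps them.
    """
    env = os.environ.copy()
    env.update(overrides)
    return env
```

For example, env_with(IMPALA_SHELL_GLOBAL_CONFIG_FILE="/tmp/global_impalarc") can be passed as the env argument of subprocess.Popen or subprocess.check_output.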

Change-Id: I448e5a7dfc0ab6fd53182a593e2fff1a12a10fd7
Reviewed-on: http://gerrit.cloudera.org:8080/14053
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-08-15 21:21:56 +00:00
Bharath Vissapragada
72c9370856 IMPALA-8717: impala-shell support for HS2 HTTP endpoint
Adds impala-shell support to connect to HiveServer2 HTTP endpoint.
Relies on toolchain change at https://gerrit.cloudera.org/#/c/13725/.

Use --protocol='hs2-http' to enable this behavior.

Example usages:
---------------
impala-shell --protocol='hs2-http'  (No auth)
impala-shell --protocol='hs2-http' --ldap -u..... (PLAIN auth)
impala-shell --protocol='hs2-http' --ssl --ca_cert... (TLS)
impala-shell --protocol='hs2-http' --ldap --ssl --ca_cert... (LDAP +
TLS)

Limitations:
-----------
- Does not support Kerberos (-k) due to lack of SPNEGO support.

Testing:
--------
- Parameterized existing shell tests to support this combination.
- Added shell test coverage for LDAP auth.

Change-Id: I8323950857dfe1c1dfd5377fde79f87bc2ce9534
Reviewed-on: http://gerrit.cloudera.org:8080/13746
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
2019-07-29 05:43:48 +00:00
Tim Armstrong
9ecbe7d3dc IMPALA-8553,IMPALA-8552: fix checks for remote cluster
Apparently IMPALA_REMOTE_URL is not generally used for remote cluster
tests: only --testing_remote_cluster is reliably set. Fix the
is_remote_cluster() implementation to take into account
REMOTE_DATA_LOAD and --testing_remote_cluster in addition to
IMPALA_REMOTE_URL. Consistently use is_remote_cluster() in
other tests instead of checking the pytest flag directly.

There were a few lifecycle headaches with how
ImpalaTestClusterProperties is used:
* common.environ is imported from conftest, which means that
  the top-level code in the file runs *before* pytest
  command-line arguments have been registered and parsed.
* ImpalaTestClusterProperties is used by various code,
  like build_flavor_timeout(), which runs before pytest
  command-line arguments have been parsed.
* ImpalaTestClusterProperties is called from non-pytest
  scripts like start-impala-cluster.py, so the command-line
  arguments are not available.

I dealt with the above challenges by making a few changes
to do the detection later:
* Lazily initializing a singleton ImpalaTestClusterProperties.
  This was not strictly necessary but makes the whole problem
  less sensitive to import order and module dependencies.
* Adding cluster_properties fixture to make ImpalaTestClusterProperties
  available in tests without additional boilerplate.
* Removing the caching of the local/remote build calculation.
  ImpalaTestClusterProperties is instantiated outside of python
  tests, but is_remote_cluster() is only called from python tests,
  so if we check flags in is_remote_cluster() we'll get the
  right results reliably.
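A minimal sketch of lazy singleton initialisation, with a hypothetical ClusterProperties class standing in for ImpalaTestClusterProperties: nothing is constructed at import time, so command-line flags parsed later are visible when the object is first requested. The double-checked lock is an extra precaution, not something the commit mentions.

```python
import threading


class ClusterProperties(object):
    """Hypothetical stand-in for ImpalaTestClusterProperties."""
    instances_created = 0

    def __init__(self):
        ClusterProperties.instances_created += 1


_instance = None
_lock = threading.Lock()


def get_cluster_properties():
    """Create the singleton on first use rather than at import time."""
    global _instance
    if _instance is None:
        with _lock:
            if _instance is None:  # re-check: another thread may have won
                _instance = ClusterProperties()
    return _instance
```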

As a workaround to unblock remote tests, also assume catalog_v1 if
accessing the web UI fails.

Testing:
Ran core tests against a regular minicluster.

Ran tests against a remote cluster

Change-Id: Ifa6b2a1391f53121d3d7c00c5cf0a57590899ce4
Reviewed-on: http://gerrit.cloudera.org:8080/13386
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-20 20:27:31 +00:00
Tim Armstrong
f1f3ae9ec2 IMPALA-7290: part 2: Add HS2 support to Impala shell
HS2 is added as an option via --protocol=hs2. The user-visible
differences in behaviour are minimal. Beeswax is still the
default and can be explicitly enabled via --protocol=beeswax
but will be deprecated. The default is unchanged because
changing the default could break certain workflows, e.g.
those that explicitly specify the port with -i or deployments
that hit --fe_service_threads for HS2 and somehow rely on
impala-shell not contributing to that limit. For most
workflows the change is transparent and we should change
the default in a major version change.

This support requires Impala-specific extensions to
the HS2 interface, similar to the existing extensions
to Beeswax. Thus the HS2 shell is only
forwards-compatible with newer Impala versions.
I considered trying to gracefully degrade when the
new extensions weren't present, but it didn't seem to be
worth the ongoing testing effort.

Differences between HS2 and Beeswax are abstracted into
ImpalaClient subclasses.
Here are the changes required to make it work:
* Switch to TBinaryProtocolAccelerated to avoid perf
  regression. The HS2 protocol requires decoding
  more primitive values (because it's not a string-per-row),
  which was slow with the pure python implementation of
  TBinaryProtocol.
* Added bitarray module to efficiently unpack null indicators
* Minimise invasiveness of changes by transposing and stringifying
  the columnar results into rows in impala_client.py. The transposition
  needs to happen before display anyway.
* Add PingImpalaHS2Service() to get back version string and webserver
  address.
* Add CloseImpalaOperation() extension to return DML row counts. This
  possibly addresses IMPALA-1789, although we need to confirm that
  this is a sufficient solution.
* Add is_closed member to query handles to avoid shell independently
  tracking whether the query handle was closed or not.
* Include query status in HS2 log to match beeswax.
* HS2 GetLog() command now includes query status error message for
  consistency with beeswax.
* "set"/"set all" uses the client request's options, not the session
  default. This captures the effective value of TIMEZONE, which
  was previously missing. This also requires test changes where
  the tests set non-default values, e.g. for ABORT_ON_ERROR.
* "set all" on the server side returns REMOVED query options - the
  shell needs to know these so it can correctly ignore them.
* Clean up self.orig_cmd/self.last_leading comment argument
  passing to avoid implicit parameter passing through multiple
  function calls.
* Clean up argument handling in shell tests to consistently pass
  around lists of arguments instead of strings that are subject
  to shell tokenisation rules.
* Consistently close connections in the shell to avoid leaking
  HS2 sessions. This is enforced by making ImpalaShell a context
  manager and also eliminating all sys.exit() calls that would
  bypass the explicit connection closing.
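The commit unpacks null indicators with the bitarray module; the sketch below swaps in plain bytearray bit arithmetic so it has no dependencies. It assumes an HS2-style column given as (values, nulls), where bit i of the nulls bitmap, counted least-significant-bit first within each byte, marks row i as NULL. Function names are hypothetical and the stringification is simplified to str(value), unlike the shell's type-specific formatting.

```python
def column_to_strings(values, nulls):
    """Stringify one column, using 'NULL' where the null bitmap bit is set."""
    bits = bytearray(nulls)  # indexes to ints on both Python 2 and 3
    out = []
    for row, value in enumerate(values):
        byte_index, bit = divmod(row, 8)
        # Rows beyond the end of the bitmap are treated as non-null.
        is_null = byte_index < len(bits) and (bits[byte_index] >> bit) & 1
        out.append("NULL" if is_null else str(value))
    return out


def columns_to_rows(columns):
    """Transpose a list of (values, nulls) columns into a list of rows."""
    string_columns = [column_to_strings(v, n) for v, n in columns]
    return [list(row) for row in zip(*string_columns)]
```

Stringifying column by column before transposing keeps the per-column null handling in one place.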

Testing:
* Shell tests can run with both protocols
* Add tests for formatting of all types and NULL values
* Added testing for floating point output formatting, which does
  change as a result of switching to server-side vs client-side
  formatting.
* Verified that newly-added tests were actually going through HS2
  by disabling hs2 on the minicluster and running tests.
* Add checks to test_verify_metrics.py to ensure that no sessions
  are left open at the end of tests.

Performance:
Baseline from beeswax shell for large extract is as follows:

  $ time impala-shell.sh -B -q 'select * from tpch_parquet.orders' > /dev/null
  real    0m6.708s
  user    0m5.132s
  sys     0m0.204s

After this change it is somewhat slower, but we generally don't consider
bulk extract performance through the shell to be perf-critical:
  real    0m7.625s
  user    0m6.436s
  sys     0m0.256s

Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12
Reviewed-on: http://gerrit.cloudera.org:8080/12884
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-20 10:23:28 +00:00
Ethan Xue
487547ec44 IMPALA-6042: Allow Impala shell to use a global impalarc config
Currently, impalarc files can be specified on a per-user basis
(stored in ~/.impalarc), and they aren't created by default. The
Impala shell should pick up /etc/impalarc as well, in addition
to the user-specific configurations.

The intent here is to allow a "global" configuration of the shell
by a system administrator. The default path of the global config
file can be changed by setting the $IMPALA_SHELL_GLOBAL_CONFIG_FILE
environment variable.

Note that the options set in the user config file take precedence
over those in the global config file.

Change-Id: I3a3179b6d9c9e3b2b01d6d3c5847cadb68782816
Reviewed-on: http://gerrit.cloudera.org:8080/13313
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-30 03:59:54 +00:00
Tim Armstrong
b55d905322 IMPALA-8515: port shell tests to use shell build
shell/make_shell_tarball.sh builds a tarball with all the
shell dependencies bundled. We should test the contents of
that tarball in the shell tests instead of using infra/python/env
and the libraries bundled there.

This tarball is one of the default targets (e.g. run by buildall.sh) so
this should not affect any typical development workflows.

Note that this means the shell tests now require the shell tarball to
be built locally, which doesn't necessarily happen for remote cluster
tests, so we preserve the old behaviour in that case.

Testing:
Ran core tests on CentOS 6 and CentOS 7.

Change-Id: I581363639b279a9c2ff1fd982bdb140260b24baa
Reviewed-on: http://gerrit.cloudera.org:8080/13267
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-14 01:32:47 +00:00
Tim Armstrong
0a9ea803d2 IMPALA-7290: part 1: clean up shell tests
This sets up the tests to be extensible to test shell
in both beeswax and HS2 modes.

Testing:
* Add test dimension containing only beeswax in preparation
  for HS2 dimension.
* Factor out hardcoded ports.
* Add tests for formatting of all types and NULL values.
* Merge date shell test into general type tests.
* Added testing for floating point output formatting, which does
  change as a result of switching to server-side vs client-side
  formatting.
* Use unique_database for tests that create tables.

Change-Id: Ibe5ab7f4817e690b7d3be08d71f8f14364b84412
Reviewed-on: http://gerrit.cloudera.org:8080/13083
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-30 11:30:45 +00:00
Fredy Wijaya
7adb411bd9 IMPALA-8330: Impala shell config file should support flag names
This patch updates the file format in Impala shell config file to accept
both short and long flag names in addition to optparse's dest names
(variable names to store flag values) for better user experience because
dest names are internal to Impala shell.

Format:
[impala]
flag_name=flag_value

Example:
[impala]
; This is long flag.
query=select 1
; This is short flag.
Q=DEFAULT_FILE_FORMAT=parquet
; Flags can be repeated with ,
var=msg1=hello,var=msg2=world
; The old format using internal variable name is still supported for
; backward compatibility.
keyval=msg3=foo,keyval=msg4=bar
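A sketch of how a config key could be resolved to an optparse dest, using optparse's documented OptionParser.has_option() and get_option(): try the key as a long flag, then as a short flag, then fall back to the old dest-name format. The parser below is a hypothetical three-option subset mirroring the example above, not impala-shell's real option table.

```python
import optparse


def build_parser():
    # Hypothetical subset of impala-shell's options, for illustration only.
    parser = optparse.OptionParser()
    parser.add_option("-q", "--query", dest="query")
    parser.add_option("-Q", "--query_option", dest="query_options",
                      action="append")
    parser.add_option("--var", dest="keyval", action="append")
    return parser


def config_key_to_dest(parser, key):
    """Map a config key (long flag, short flag or dest) to an optparse dest."""
    for opt_str in ("--" + key, "-" + key):
        if parser.has_option(opt_str):
            return parser.get_option(opt_str).dest
    # Old format: the key is already the internal dest name.
    dests = set(opt.dest for opt in parser.option_list if opt.dest)
    if key in dests:
        return key
    raise KeyError("unrecognized shell option: " + key)
```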

Testing:
- Ran all E2E shell tests on Python 2.6 and 2.7.

Change-Id: Ic43603c1b538af08fddcab1b2c1f6ad1af1a6cb9
Reviewed-on: http://gerrit.cloudera.org:8080/12823
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-03-29 09:38:24 +00:00
Fredy Wijaya
6853184234 IMPALA-8317: Add support for list type flags in Impala shell config file
This patch adds support for list type flags in the Impala shell config
file, i.e. those that use action="append", such as --var and
--query_option. To make it less error-prone, this patch also updates
the logic for bool flags in the config file to also look at the
correct type from the argument parser instead of relying on whether or
not the default values are set in impala_shell_config_defaults.py.
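A sketch of converting a raw config string by looking at the optparse Option's action and type rather than at a defaults table. Two assumptions are labelled in the code: repeated values are split on ",<key>=", inferred from the "var=msg1=hello,var=msg2=world" example in IMPALA-8330, and the accepted boolean spellings are illustrative. The function name is hypothetical.

```python
import optparse


def convert_config_value(option, key, raw):
    """Convert a raw config-file string using the optparse Option's action."""
    if option.action == "append":
        # Assumption: repeated flags are written as key=a,key=b.
        return raw.split("," + key + "=")
    if option.action in ("store_true", "store_false"):
        lowered = raw.strip().lower()
        if lowered in ("true", "1", "yes"):  # illustrative spellings
            return True
        if lowered in ("false", "0", "no"):
            return False
        raise ValueError("not a boolean for %s: %r" % (key, raw))
    if option.type == "int":
        return int(raw)
    return raw
```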

Testing:
- Added a new test for list type flags
- Ran all shell E2E tests

Change-Id: I824ca15b4e1064a391b13deef9cecd34c928ef73
Reviewed-on: http://gerrit.cloudera.org:8080/12781
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-03-21 10:29:43 +00:00
Fredy Wijaya
561158b306 IMPALA-3323: Fix unrecognizable shell option when --config_file is specified
Impala shell defines a dictionary of default values for some shell
options. Before this patch, the logic for --config_file checks if
a shell option exists by using the default value dictionary, which
does not contain the exhaustive list of shell options. This causes
a valid option in the Impala shell config file to be treated as
unrecognizable shell option due to the option not having a default
value. The patch fixes the issue by changing the logic that checks
for the existence of an option using the option list from optparse.
The patch also fixes the missing dest parameter for ldap_password_cmd
option.

Testing:
- Updated test_shell_commandline::test_config_file
- Ran all shell tests

Change-Id: Iff371d038fa77ba659e9b7c7a4ed5b374237f2ea
Reviewed-on: http://gerrit.cloudera.org:8080/12245
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-01-23 00:15:28 +00:00
Philip Zeyliger
a8d3b765d8 IMPALA-7666: Adding an opaque client identifier to query options.
We sometimes struggle to identify the client (e.g., a given version of a
JDBC driver, Tableau, Hue, etc.) for a given query. This commit adds a
User-Agent header style, called "Client Identifier", which clients can
set as a Query Option. Nothing is done with this header, but it's
written into logs and query profiles.

This commit includes changes to impala-shell to include the version of
impala shell with an associated test.

A future commit will serialize the name of the py.test being run into
this field, which is handy for figuring out where a query came from.

Change-Id: I0a7708492f05d33b2bc99fc3a03b461bbb6f3ea4
Reviewed-on: http://gerrit.cloudera.org:8080/12130
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-01-10 03:11:35 +00:00
Yongjun Zhang
7cc9092212 IMPALA-5474: Adding a trivial subquery turns error into warning
After adding a subquery to a query that fails with ERROR, it fails with WARNING.
The fix here makes it return ERROR.

Testing:
Added unit tests;
Done real cluster testing with reported cases.

Change-Id: Ibedb11dd3d50bcdb21d508f7d21691925491946e
Reviewed-on: http://gerrit.cloudera.org:8080/12022
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2019-01-04 21:51:48 +00:00
Fredy Wijaya
9c44853998 IMPALA-6591: Fix test_ssl flaky test
test_ssl has a logic that waits for the number of in-flight queries to
be 1. However, the logic for wait_for_num_in_flight_queries(1) only
waits for the condition to be true for a period of time and does not
throw an exception when the time has elapsed and the condition is not
met. In other words, the logic in test_ssl that loops while the number
of in-flight queries is 1 never gets executed. I was able to simulate
this issue by making Impala shell take much longer to start.

Prior to this patch, in the event that Impala shell took much longer to
start, the test started sending the commands to Impala shell even when
Impala shell was not ready to receive commands. The patch fixes the
issue by waiting until Impala shell is connected. The patch also adds
assert in other places that calls wait_for_num_in_flight_queries and
updates the default behavior for Impala shell to wait until it is
connected.
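The pitfall described above, a wait helper that returns silently on timeout, is easy to reproduce. A minimal sketch of a polling helper that reports the outcome so the caller can assert on it; wait_for is a hypothetical name, not Impala's wait_for_num_in_flight_queries.

```python
import time


def wait_for(predicate, timeout_s=60.0, interval_s=0.5):
    """Poll predicate() until it is true or timeout_s elapses.

    Returns True if the condition was met and False on timeout, so callers
    must check the result (e.g. with assert) instead of assuming success.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A test would then write assert wait_for(lambda: num_in_flight() == 1, timeout_s=30), where num_in_flight is a hypothetical metric accessor, so a timeout fails loudly instead of letting the test proceed.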

Testing:
- Ran core and exhaustive tests several times on CentOS 6 without any
  issue

Change-Id: I9805269d8b806aecf5d744c219967649a041d49f
Reviewed-on: http://gerrit.cloudera.org:8080/12047
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-12-12 22:44:34 +00:00
David Knupp
4c923b29d8 IMPALA-7783: Skip test_default_timezone when testing a real cluster.
test_shell_commandline.py::test_default_timezone assumes that the
cluster is running on the same platform as the test process, but
that's only guaranteed when testing a local minicluster. When
run against a real cluster, the test executor can be a completely
different OS.

Change-Id: Ia4d4c503d2c77136cedd8f3fd830b6ce70d4457f
Reviewed-on: http://gerrit.cloudera.org:8080/11820
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-30 19:47:08 +00:00
aphadke
2fb8ebaef2 IMPALA-7555: Set socket timeout in impala-shell
impala-shell does not set any socket timeout while connecting to the
impala server. This change sets a timeout on the socket before
connecting and resets it after successfully connecting. The default
timeout on this socket is 5 sec.
Usage: impala-shell --client_connect_timeout=<value in ms>
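A plain-socket sketch of the pattern described: bound only the connect phase by a timeout given in milliseconds, then return the socket to blocking mode so long-running queries are not cut off later. The shell applies this to its Thrift transport (including the TLS handshake, per the testing notes), which this sketch does not model; the function name is hypothetical.

```python
import socket


def connect_with_timeout(host, port, timeout_ms=5000):
    """Open a TCP connection whose connect phase is bounded by timeout_ms."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout_ms / 1000.0)
    try:
        sock.connect((host, port))
    except Exception:
        sock.close()
        raise
    # Back to blocking mode: the timeout only guards connection setup.
    sock.settimeout(None)
    return sock
```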

Testing:
1. Added a test where I create a random listening socket.
impala-shell (with ssl enabled) connects to this socket and
times out after 2 sec.

2. Created a kerberized impala cluster with ssl enabled and
connected to the impalad using an openssl client (blocking the
beeswax server thread from accepting new connections):

E.g. - openssl s_client -connect <IP Addr>:21000
Used impala-shell to connect to the same impalad later.
impala-shell timed out after the default of 5 sec. I verified
it manually.

Change-Id: I130fc47f7a83f591918d6842634b4e5787d00813
Reviewed-on: http://gerrit.cloudera.org:8080/11540
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-18 01:41:42 +00:00
Fredy Wijaya
31dfa3e28c IMPALA-7673: Support values from other variables in Impala shell --var
Prior to this patch, Impala shell --var could not accept values from
other variables unlike the one in Impala interactive shell with the SET
command.  This patch refactors the logic of variable substitution to
use the same logic in both interactive and command line shells.

Example:
$ impala-shell.sh \
    --var="msg1=1" \
    --var="msg2=\${var:msg1}2" \
    --var="msg3=\${var:msg1}\${var:msg2}"

[localhost:21000] default> select ${var:msg3};
Query: select 112
+-----+
| 112 |
+-----+
| 112 |
+-----+

Testing:
- Added a new shell test
- Ran all shell tests

Change-Id: Ib5b9fda329c45f2e5682f3cbc76d29ceca2e226a
Reviewed-on: http://gerrit.cloudera.org:8080/11623
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-16 00:50:26 +00:00
Thomas Tauber-Marshall
dccc2de86a IMPALA-7407: Fix test_cancellation failure on KeyboardInterrupt
test_cancellation runs a shell process, executes a query, sleeps,
sends a sigint to the process, and then checks that the query is
cancelled. If the sigint is sent prior to the shell installing its
signal handler, the test can fail with a KeyboardInterrupt.

This patch removes the reliance on the sleep being long enough by
actually reading the output of the shell and only cancelling the
query once the shell shows that it has started running.

Testing:
- Ran test_cancellation in a loop.

Change-Id: I65302ffb838d5185f77853bc2e53296f3a701d93
Reviewed-on: http://gerrit.cloudera.org:8080/11255
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Thomas Marshall <thomasmarshall@cmu.edu>
2018-08-20 19:56:11 +00:00