Commit Graph

271 Commits

Author SHA1 Message Date
gaurav1086
c3cbd79b56 IMPALA-13288: OAuth AuthN Support for Impala
This patch added OAuth support with following functionality:
 * Load and parse OAuth JWKS from configured JSON file or url.
 * Read the OAuth Access token from the HTTP Header which is
   the same format as JWT Authorization Bearer token.
 * Verify the OAuth's signature with public key in JWKS.
 * Get the username out of the payload of OAuth Access token.
 * If kerberos or ldap is enabled, then both jwt and oauth are
   supported together. Else only one of jwt or oauth is supported.
   This has been a pre existing flow for jwt. So OAuth will follow
   the same policy.
 * Impala Shell side changes: OAuth options -a and --oauth_cmd
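
As a rough illustration of the "get the username out of the payload" step only, the sketch below decodes the payload segment of a JWT-style access token. It deliberately skips signature verification against the JWKS, which a real server must never do. The helper name `username_from_token` and the claim name `username` are assumptions made for this example, not names taken from the patch.

```python
import base64
import json


def username_from_token(token, claim="username"):
    """Return one claim from a JWT-style token WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload_b64 = parts[1]
    # base64url payloads usually omit padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload[claim]


def _b64url(obj):
    """Encode a dict the way JWT segments are encoded (base64url, no padding)."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Build a fake, unsigned token purely for demonstration.
demo_token = ".".join(
    [_b64url({"alg": "RS256"}), _b64url({"username": "alice"}), "sig"]
)
print(username_from_token(demo_token))
```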

Testing:
 - Added 3 custom cluster be tests in test_shell_jwt_auth.py:
   - test_oauth_auth_valid: authenticate with valid token.
   - test_oauth_auth_expired: authentication failure with
     expired token.
   - test_oauth_auth_invalid_jwk: authentication failure with
     valid signature but expired.
 - Added 1 custom cluster fe test in JwtWebserverTest.java
   - testWebserverOAuthAuth: Basic tests for OAuth
 - Added 1 custom cluster fe test in LdapHS2Test.java
   - testHiveserver2JwtAndOAuthAuth: tests all combinations of
     jwt and oauth token verification with separate jwks keys.
 - Manually tested with a valid, invalid and expired oauth
   access token.
 - Passed core run.

Change-Id: I65dc8db917476b0f0d29b659b9fa51ebaf45b7a6
Reviewed-on: http://gerrit.cloudera.org:8080/21728
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-01-15 03:32:57 +00:00
Riza Suminto
aac375eb20 IMPALA-13556: Log GetRuntimeProfile and GetExecSummary at VLOG_QUERY
Calls to both of these RPC endpoints were previously logged at
VLOG_RPC (or VLOG(2)). This patch changes the log level to VLOG_QUERY (or
VLOG(1)). This is helpful because both RPCs are usually called after
query execution completes, while the query handle is not yet released.
They are also rarely called by clients, so they will not be too noisy.
The missing query driver log in GetAllQueryHandles is moved to its
caller, where the log message is clarified.

ImpalaShell._execute_stmt() is also modified to call get_runtime_profile()
only if the show_profile option is true.

Testing:
- Using impala-shell, run a TPC-DS query followed by 'profile' and
  summary command. Verify that logs are printed, both with beeswax and
  HS2 protocol.
- Pass core tests.

Change-Id: I90ef7d0fadd81c58ec1072e53430f51fea146cf1
Reviewed-on: http://gerrit.cloudera.org:8080/22085
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-11-22 05:23:29 +00:00
Riza Suminto
1f35747ea3 IMPALA-5792: Eliminate duplicate beeswax python code
This patch unifies duplicated exec summary code used by the Python beeswax
clients: one used by the shell in impala_shell.py and one used by tests
in impala_beeswax.py. The code that has progressed furthest is the one in
shell/impala_client.py, which is the one that can print a correct exec
summary table for MT_DOP>0 queries. It is made into a dedicated
build_exec_summary_table function in impala_client.py, and
impala_beeswax.py then imports it from impala_client.py.

This patch also fixes several flake8 issues around the modified files.

Testing:
- Manually run TPC-DS Q74 in impala-shell and then type "summary"
  command. Confirm that plan tree is displayed properly.
- Run single_node_perf_run.py over branches that produce different
  TPC-DS Q74 plan trees. Confirm that the plan trees are displayed
  correctly in performance_result.txt

Change-Id: Ica57c90dd571d9ac74d76d9830da26c7fe20c74f
Reviewed-on: http://gerrit.cloudera.org:8080/22060
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
2024-11-14 11:19:19 +00:00
Saurabh Katiyal
2535e79491 IMPALA-12216: Print timestamp for impala-shell errors
This change prints the timestamp of an exception or warning that
occurred during execution of a query via impala-shell.
The timestamp uses the timezone of the machine running impala-shell.

example:
Query submitted at: 2024-08-22 16:17:57 (Coordinator: http://host:25000)
Query state can be monitored at:
http://localhost:25000/query_plan?query_id=e04dcc55e560d1ee:11173fe800000000
^C Cancelling Query
Opened TCP connection to localhost:21050
2024-08-22 16:17:58 [Exception] type=<class 'socket.error'> in FetchResults.
[Errno 4] Interrupted system call
2024-08-22 16:17:58 [Warning]  Cancelling Query
2024-08-22 16:17:58 [Warning] close session RPC failed: <class
'shell_exceptions.QueryCancelledByShellException'>
Opened TCP connection to localhost:21050
[localhost:21050] default>

Change-Id: I4abbd02aa9f61210b0333495bf191e72c22a5944
Reviewed-on: http://gerrit.cloudera.org:8080/21426
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-10-12 01:43:59 +00:00
Peter Rozsa
a0aaf338ae IMPALA-12732: Add support for MERGE statements for Iceberg tables
MERGE statement is a DML command that allows users to perform
conditional insert, update, or delete operations on a target table based
on the results of a join with a source table. This change adds MERGE
statement parsing and Iceberg-specific semantic analysis, planning,
and execution. The parsing grammar follows the SQL standard: it accepts
the same syntax as Hive, Spark, and Trino by supporting an arbitrary
number of WHEN clauses, with or without conditions, and by accepting
inline views as the source.

Example:
'MERGE INTO target t USING source s ON t.id = s.id
WHEN MATCHED AND t.id < 100 THEN UPDATE SET column1 = s.column1
WHEN MATCHED AND t.id > 100 THEN DELETE
WHEN MATCHED THEN UPDATE SET column1 = "value"
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.column1);'

The Iceberg-specific analysis, planning, and execution are based on a
concept that was previously used for UPDATE: The analyzer creates a
SELECT statement with all target and source columns (including
Iceberg's virtual columns) and a 'row_present' column that defines
whether the source, the target, or both rows are present in the result
set after joining the two table references by the ON clause. The join
condition should be an equi-join, as it is a FULL OUTER JOIN, and Impala
currently supports only equi-joins in this case. The join order is
forced by a query hint, which guarantees that the target table is always
on the left side.

A new IcebergMergeNode is added in the planning phase; this node does the
row-level filtering for each MATCHED/NOT MATCHED case. The
'row_present' column decides which case group will be evaluated: if
both sides are available, the matched cases; if only the source side
is present, the not matched cases. Their filter expressions
are evaluated over the row. If one of the cases matches, then the
execution evaluates the result expressions into the output row batch,
and an auxiliary tuple stores the merge action. The merge action is
a flag for the newly added IcebergMergeSink; this sink routes each
incoming row from IcebergMergeNode to its respective destination. Each
row can go to the delete sink, the insert sink, or both sinks.
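
The case selection and routing described above can be sketched in Python as follows. This is a toy model, not the actual IcebergMergeNode or IcebergMergeSink code: the names `route_row` and `SINKS_FOR_ACTION` are hypothetical, cases are assumed to be tried in order with the first passing filter winning, and the action-to-sink mapping (an update goes to both sinks) is an assumption based on the merge-on-read description.

```python
BOTH, SOURCE_ONLY, TARGET_ONLY = "both", "source", "target"

# Hypothetical mapping from merge action to destination sinks.
SINKS_FOR_ACTION = {
    "update": ("delete", "insert"),
    "delete": ("delete",),
    "insert": ("insert",),
}


def route_row(row_present, row, matched_cases, not_matched_cases):
    """Return the sinks for one joined row, or () if no case applies."""
    if row_present == BOTH:
        cases = matched_cases
    elif row_present == SOURCE_ONLY:
        cases = not_matched_cases
    else:
        cases = []  # target-only rows are left untouched
    for predicate, action in cases:
        if predicate(row):
            return SINKS_FOR_ACTION[action]
    return ()


# Cases mirroring the MERGE example in this commit message.
matched = [
    (lambda r: r["id"] < 100, "update"),
    (lambda r: r["id"] > 100, "delete"),
    (lambda r: True, "update"),
]
not_matched = [(lambda r: True, "insert")]

print(route_row(BOTH, {"id": 150}, matched, not_matched))
```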

Target-side duplicate records are filtered during IcebergMergeNode's
execution. If a target table-side duplicate is detected, the whole
statement's execution is stopped and the error is reported back to the
user.

Added tests:
 - Parser tests
 - Analyzer tests
 - Unit test for WHEN NOT MATCHED INSERT column collation
 - Planner tests for partitioned/sorted cases
 - Authorization tests
 - E2E tests

Change-Id: I3416a79740eddc446c87f72bf1a85ed3f71af268
Reviewed-on: http://gerrit.cloudera.org:8080/21423
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-09-05 01:01:05 +00:00
Andrew Sherman
3bdf1d2648 IMPALA-13310 Add the value of the http 'X-Forwarded-For' header to the runtime profile
When using hs2-http protocol, http messages from Impala clients may pass
through one or more proxies before reaching the Impala coordinator.
This can make it harder to track the origin of the http messages. The
'X-Forwarded-For' header is added to or edited by HTTP proxies when
forwarding a request, so it may contain multiple source addresses. Add
the value of this header to the runtime profile so that it can be
observed.

Impala will truncate the 'X-Forwarded-For' header value at 8096
characters. Apart from this, Impala does not do any verification or
sanitization of this value, so its value should only be trusted if the
deployment environment protects against spoofing.
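
A minimal Python sketch of the truncation rule, assuming a plain character slice at the 8096 limit stated above. The function names are hypothetical, and splitting on commas reflects the general 'X-Forwarded-For' convention rather than anything Impala itself does with the value.

```python
MAX_FORWARDED_FOR_LEN = 8096  # limit stated in this commit message


def truncate_forwarded_for(value):
    """Keep at most the first 8096 characters; no other sanitization."""
    return value[:MAX_FORWARDED_FOR_LEN]


def forwarded_addresses(value):
    """Split a header value into the addresses appended by each proxy."""
    return [part.strip() for part in value.split(",") if part.strip()]


print(forwarded_addresses("203.0.113.7, 198.51.100.2"))
```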

A good reference for understanding the use of 'X-Forwarded-For' is
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Forwarded-For

This patch does not address the cases where http proxies insert
multiple 'X-Forwarded-For' headers. This issue is tracked in
IMPALA-13335.

TESTING: add an option '--hs2_x_forward' to impala-shell which will
set the 'X-Forwarded-For' header. Add tests which verify that the value
is set in the profile, and that a long value is truncated correctly.

Change-Id: I2e010cfb09674c5d043ef915347c3836696e03cf
Reviewed-on: http://gerrit.cloudera.org:8080/21700
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-08-28 05:56:27 +00:00
Xuebin Su
ad868b9947 IMPALA-13115: Add query id to error messages
This patch adds the query id to the error messages in both

- the result of the `get_log()` RPC, and
- the error message in an RPC response

before they are returned to the client, so that the users can easily
figure out the errored queries on the client side.

To achieve this, the query id of the thread debug info is set in the
RPC handler method, and is retrieved from the thread debug info each
time the error reporting function or `get_log()` gets called.

Due to the change of the error message format, some checks in the
impala-shell.py are adapted to keep them valid.

Testing:
- Added helper function `error_msg_expected()` to check whether an
  error message is expected. It is stricter than only using the `in`
  operator.
- Added helper function `error_msg_equal()` to check if two error
  messages are equal regardless of the query ids.
- Various test cases are adapted to match the new error message format.
- `ImpalaBeeswaxException`, which is used in tests only, is simplified
  so that it has the same error message format as the exceptions for
  HS2.
- Added an assertion to the case of killing and restarting a worker
  in the custom cluster test to ensure that the query id is in
  the error message in the client log retrieved with `get_log()`.

Change-Id: I67e659681e36162cad1d9684189106f8eedbf092
Reviewed-on: http://gerrit.cloudera.org:8080/21587
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-08-08 14:11:04 +00:00
Joe McDonnell
2b98e5fb95 IMPALA-13230: Dump stacktrace for impala-shell when it receives SIGUSR1
It can be useful to get a stacktrace for a running impala-shell
for debugging. This uses Python 3's faulthandler to handle the
SIGUSR1, so it prints a stacktrace for all threads when it
receives SIGUSR1.

This does not implement an equivalent functionality for Python 2.
Python 2 doesn't have the faulthandler library, and hand tests
showed that sending SIGUSR1 to Python 2 impala-shell can interrupt
network calls and abort a running query.
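
For reference, registering a stack dump on SIGUSR1 with Python 3's faulthandler can look roughly like this. The wrapper name `enable_stack_dump` is hypothetical and is not the function used in impala-shell; `faulthandler.register` is the standard library call, which is not available on Windows.

```python
import faulthandler
import signal
import sys


def enable_stack_dump(stream=sys.stderr):
    """Dump all thread stacks to `stream` on SIGUSR1.

    Returns False on platforms without SIGUSR1 or faulthandler.register.
    """
    if not hasattr(signal, "SIGUSR1") or not hasattr(faulthandler, "register"):
        return False
    # `stream` must stay open and must expose a real file descriptor.
    faulthandler.register(signal.SIGUSR1, file=stream, all_threads=True)
    return True
```

After calling `enable_stack_dump()`, running `kill -USR1 <pid>` against the process prints the stack of every thread without terminating it.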

Testing:
 - Added a test that verifies the stacktrace is printed and a
   running query succeeds.

Change-Id: If7dae2686b65a1a4f02488abadca3b3c90e48bf1
Reviewed-on: http://gerrit.cloudera.org:8080/21611
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2024-08-02 18:07:20 +00:00
Csaba Ringhofer
541fc5ee9e IMPALA-12990: Fix impala-shell handling of unset rows_deleted
The issue occurred in Python 3 when 0 rows were deleted from Iceberg.
It could also happen in other DMLs with older Impala servers where
TDmlResult.rows_deleted was not set. See the Jira for details of
the error.

Testing:
Extended shell tests for Kudu DML reporting to also cover Iceberg.

Change-Id: I5812b8006b9cacf34a7a0dbbc89a486d8b454438
Reviewed-on: http://gerrit.cloudera.org:8080/21284
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-04-17 18:52:25 +00:00
Csaba Ringhofer
5c003cdcda IMPALA-12978: Fix impala-shell`s live progress with older Impalas
If the Impala server is an older version that does not contain
IMPALA-12048, then TExecProgress.total_fragment_instances will be
None, leading to an error when checking total_fragment_instances > 0.

Note that this issue only occurs with Python 3; in Python 2, None > 0
returns False.
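
The Python 3 behavior and one possible guard can be sketched as below. The helper name `has_fragment_progress` is hypothetical; the actual fix in impala-shell may be written differently.

```python
def has_fragment_progress(total_fragment_instances):
    """True only if the server reported a positive instance count."""
    return total_fragment_instances is not None and total_fragment_instances > 0


try:
    None > 0  # Python 3 refuses to order None against an int
    raised = False
except TypeError:
    raised = True

print(raised, has_fragment_progress(None), has_fragment_progress(3))
```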

Testing:
- Manually checked with a modified Impala that doesn't set
  total_fragment_instances. Only the scanner progress bar is shown
  in this case.

Change-Id: Ic6562ff6c908bfebd09b7612bc5bcbd92623a8e6
Reviewed-on: http://gerrit.cloudera.org:8080/21256
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zihao Ye <eyizoha@163.com>
2024-04-09 02:23:05 +00:00
Zoltan Borok-Nagy
e326b3cc0d IMPALA-12313: (part 2) Limited UPDATE support for Iceberg tables
This patch adds limited UPDATE support for Iceberg tables. The
limitations mean users cannot update Iceberg tables if any of
the following is true:
 * UPDATE value of partitioning column
 * UPDATE table that went through partition evolution
 * Table has SORT BY properties

The above limitations will be resolved by part 3. The usual limitations
like writing non-Parquet files, using copy-on-write, modifying V1 tables
are out of scope of IMPALA-12313.

This patch implements UPDATEs with the merge-on-read technique. This
means the UPDATE statement writes both data files and delete files.
Data files contain the updated records, delete files contain the
position delete records of the old data records that have been
touched.

To achieve the above this patch introduces a new sink: MultiDataSink.
We can configure multiple TableSinks for a single MultiDataSink object.
During execution, the row batches sent to the MultiDataSink will be
forwarded to all the TableSinks that have been registered.

The UPDATE statement for an Iceberg table creates a source select
statement with all table columns and virtual columns INPUT__FILE__NAME
and FILE__POSITION. E.g. imagine we have a table 'tbl' with schema
(i int, s string, k int), and we update the table with:

  UPDATE tbl SET k = 5 WHERE i % 100 = 11;

 The generated source statement will be ==>

  SELECT i, s, 5, INPUT__FILE__NAME, FILE__POSITION
  FROM tbl WHERE i % 100 = 11;

Then we create two table sinks that refer to expressions from the above
source statement:

  Insert sink (i, s, 5)
  Delete sink (INPUT__FILE__NAME, FILE__POSITION)

The tuples in the rowbatch of MultiDataSink contain slots for all the
above expressions (i, s, 5, INPUT__FILE__NAME, FILE__POSITION).
MultiDataSink forwards each row batch to each registered TableSink.
They will pick their relevant expressions from the tuple and write
data/delete files. The tuples are sorted by INPUT__FILE__NAME and
FILE__POSITION because we need to write the delete records in this
order.
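
The forwarding idea can be modeled with a small Python toy, given here only as an illustration. `ColumnSink` and `MultiSink` are hypothetical stand-ins for the TableSinks and MultiDataSink, rows are plain dicts, and the constant 5 is stored under the column name `k`.

```python
class ColumnSink:
    """Toy table sink that keeps only the columns it was configured with."""

    def __init__(self, columns):
        self.columns = columns
        self.written = []

    def send(self, batch):
        for row in batch:
            self.written.append(tuple(row[c] for c in self.columns))


class MultiSink:
    """Forwards every row batch to all registered sinks."""

    def __init__(self):
        self.sinks = []

    def register(self, sink):
        self.sinks.append(sink)

    def send(self, batch):
        for sink in self.sinks:
            sink.send(batch)


insert_sink = ColumnSink(["i", "s", "k"])
delete_sink = ColumnSink(["INPUT__FILE__NAME", "FILE__POSITION"])
multi = MultiSink()
multi.register(insert_sink)
multi.register(delete_sink)

batch = [
    {"i": 111, "s": "b", "k": 5,
     "INPUT__FILE__NAME": "f2.parquet", "FILE__POSITION": 0},
    {"i": 11, "s": "a", "k": 5,
     "INPUT__FILE__NAME": "f1.parquet", "FILE__POSITION": 7},
]
# Delete records must be written in (file name, position) order.
batch.sort(key=lambda r: (r["INPUT__FILE__NAME"], r["FILE__POSITION"]))
multi.send(batch)
print(insert_sink.written, delete_sink.written)
```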

For partitioned tables we need to shuffle and sort the input tuples.
In this case we also add virtual columns "PARTITION__SPEC__ID" and
"ICEBERG__PARTITION__SERIALIZED" to the source statement and shuffle
and sort the rows based on them.

Data files and delete files are now separated in the DmlExecState, so
at the end of the operation we'll have two sets of files. We use these
two sets to create a new Iceberg snapshot.

Why does this patch have the limitations?
 - Because we are shuffling and sorting rows based on the delete
   records and their partitions. This means that the new data files
   might not get written in an efficient way, e.g. there will be
   too many of them, or we will need to keep too many open file
   handles during writing.
   Also, if the table has SORT BY properties, we cannot respect
   it as the input rows are ordered in a way to favor the position
   deletes.
   Part 3 will introduce a buffering writer for position delete
   files. This means we will shuffle and sort records based on
   the data records' partitions and SORT BY properties while
   delete records get buffered and written out at the end (sorted
   by file_path and position). In some edge cases the delete records
   might not get written efficiently, but it is a smaller problem
   than inefficient data files.

Testing:
 * negative tests
 * planner tests
 * update all supported data types
 * partitioned tables
 * Impala/Hive interop tests
 * authz tests
 * concurrent tests

Change-Id: Iff0ef6075a2b6ebe130d15daa389ac1a505a7a08
Reviewed-on: http://gerrit.cloudera.org:8080/20677
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-12-09 03:04:05 +00:00
Eyizoha
52ad12bc0c IMPALA-12544: Add additional query progress reporting for the shell
This patch modifies the dynamic query progress reporting in impala-shell
by adding an extra query progress bar below the scan progress bar.
The query progress is calculated using the number of completed fragment
instances divided by the total number of fragment instances. Compared to
the scan progress, which is calculated based on completed scan ranges
divided by the total scan ranges, the query progress provides a more
accurate reflection of the actual completion progress of the query.
Particularly for computationally intensive queries involving complex
aggregations or sorting, such as tpcds query78, there is often
additional computation time required after the scanning is complete. In
such cases, displaying only 100% scan progress would be inaccurate.
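
A small Python sketch of the two ratios, with made-up numbers. The function name `progress_percent` is hypothetical, and returning None for a missing or zero total is an assumption about how an unknown total could be handled.

```python
def progress_percent(completed, total):
    """Completed/total as a percentage, or None when the total is unknown."""
    if not total:
        return None
    return 100.0 * completed / total


# Scanning has finished, but most fragment instances are still running.
scan_progress = progress_percent(10, 10)
query_progress = progress_percent(3, 12)
print(scan_progress, query_progress)
```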

Change-Id: I11a704885505442b7499a026fcee3b86696cd064
Reviewed-on: http://gerrit.cloudera.org:8080/20672
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2023-11-10 16:18:51 +00:00
Joe McDonnell
bad064dbea IMPALA-12224: Improve error handling for shell interactive tests
Interactive shell tests can hang waiting for input if the
shell process hits errors or exits. For example, the problems
in the sasl package seen in IMPALA-12220 cause test_shell_interactive.py
to hang.

This improves the error detection/handling to avoid hangs for
most common shell errors. Specifically, it adds a check for
the impala-shell process exiting, and it adds a check for
a failure to connect to Impala. Both would previously result
in hangs.

Testing:
 - Verified test_shell_interactive.py doesn't hang with hand
   tests
 - Remove a vital import from impala-shell so it exits instantly
 - Simulate a connection problem by overwriting the port
   with a non-functional port
 - Test on Redhat 9 with the IMPALA-12220 issue

Change-Id: I7556fb687e06b41caa538d8c3231ec9f2ad98162
Reviewed-on: http://gerrit.cloudera.org:8080/20087
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-06-21 05:21:01 +00:00
Vincent Tran
9727b46f3b IMPALA-11435: Fixup - Suppress logging for 'thrift' in impala-shell
Commit cd9f3f578 aims to suppress logging for the 'thrift' library
within impala-shell. However, it does not work in all cases. This change
moves the fix into the 'main' function, which suppresses the unwanted
message.
Tested by connecting through impala-shell with Python2.7 and Python3.6
with SSL enabled.

Change-Id: I4de95b1b67abe9a0b4637910b0894addddda23d5
Reviewed-on: http://gerrit.cloudera.org:8080/20074
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-15 12:06:26 +00:00
Joe McDonnell
e9fb8e717c IMPALA-12114: Pull in fix for THRIFT-5705 and add test
This pulls in a new toolchain to get a Thrift with
the patch for THRIFT-5705. This fixes an issue where
idle clients using TLS are needlessly disconnected due
to a bug in the read retry count logic inside Thrift.

Tests:
 - This modifies test_thrift_socket.py to make it do
   more idle polls and check that ImpalaShell is not
   disconnected. It fails without the THRIFT-5705 patch
   and passes now.

Change-Id: Ifc7704cba032a91b9fd0d5d54d1e0a7e17fb10bb
Reviewed-on: http://gerrit.cloudera.org:8080/19962
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
2023-06-02 15:57:37 +00:00
Csaba Ringhofer
14035065fa IMPALA-12145: Fix profiles with non-ascii character in impala-shell (python2)
As __future__.unicode_literals is imported in impala-shell
concatenating an str with a literal leads to decoding the
string with 'ascii' codec which fails if there are non-ascii
characters. Converting the literal to str solves the issue.

Testing:
- added regression test + ran related EE tests

Change-Id: I99b72dd262fc7c382e8baee1dce7592880c84de2
Reviewed-on: http://gerrit.cloudera.org:8080/19893
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-25 00:33:34 +00:00
Joe McDonnell
451543a2e5 IMPALA-11785: Warn if Thrift fastbinary is not working for impala-shell
Thrift's fastbinary module provides native code that
accelerates the BinaryProtocol. It can make a large
performance difference when using the Hiveserver2
protocol with impala-shell. If the fastbinary is not
working, it silently falls back to interpreted code.
This can happen because the fastbinary couldn't load
a particular library, etc.

This adds a warning on impala-shell startup when
it detects that Thrift's fastbinary is not working.

When bin/impala-shell.sh is modified to use python3,
impala-shell outputs this error (shortened for legibility):
WARNING: Failed to load Thrift's fastbinary module. Thrift's
BinaryProtocol will not be accelerated, which can reduce performance.
Error was '{path to Python2 thrift fastbinary.so}: undefined symbol: _Py_ZeroStruct'

Testing:
 - Added a simple test that verifies the impala-shell
   does not output the warning
 - Outputs warning when Python 2 thrift used for Python 3 shell

Change-Id: Id5d0e5db5cfdf1db4521b00f912b4697a7f646e8
Reviewed-on: http://gerrit.cloudera.org:8080/19806
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-23 06:41:02 +00:00
jasonmfehr
63d13a35f3 IMPALA-11880: Adds support for authenticating to Impala using JWTs.
This support was modeled after the LDAP authentication.

If JWT authentication is used, the Impala shell enforces the use of the
hs2-http protocol since the JWT is sent via the "Authorization"
HTTP header.

The following flags have been added to the Impala shell:
* -j, --jwt: indicates that JWT authentication will be used
* --jwt_cmd: shell command to run to retrieve the JWT to use for
  authentication

Testing
New Python tests have been added:
* The shell tests ensure that the various command line arguments are
  handled properly. Situations such as allowing only a single
  authentication method and refusing to send JWTs in clear text without
  the proper arguments are asserted.
* The Python custom cluster tests leverage a test JWKS and test JWTs.
  Then, a custom Impala cluster is started with the test JWKS. The
  Impala shell attempts to authenticate using a valid JWT, an expired
  (invalid) JWT, and a valid JWT signed by a different, untrusted JWKS.
  These tests also exercise the Impala JWT authentication mechanism and
  assert the prometheus JWT auth success and failure metrics are
  reported accurately.

Change-Id: I52247f9262c548946269fe5358b549a3e8c86d4c
Reviewed-on: http://gerrit.cloudera.org:8080/19837
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-11 23:22:05 +00:00
Michael Smith
910c6ecc85 IMPALA-12094: Fix impala shell summary command
Fix various quality-of-life issues with the 'summary' command:
- update regex to correctly match query ID for handling "Query id ...
  not found" errors
- fail the command rather than exiting the shell when 'summary' is
  called with an incorrect argument (such as 'summary 1')
- provide a useful message rather than print an exception when 'summary
  original' is invoked with no failed queries

Testing:
- added new tests for the 'summary' command

Change-Id: I7523d45b27e5e63e1f962fb1f6ebb4f0adc85213
Reviewed-on: http://gerrit.cloudera.org:8080/19797
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-04-28 20:51:50 +00:00
Michael Smith
0a42185d17 IMPALA-9627: Update utility scripts for Python 3 (part 2)
We're starting to see environments where the system Python ('python') is
Python 3. Updates utility and build scripts to work with Python 3, and
updates check-pylint-py3k.sh to check scripts that use system python.

Fixes other issues found during a full build and test run with Python
3.8 as the default for 'python'.

Fixes an impala-shell tip that was supposed to have been two tips (and
had no space after period when they were printed).

Removes out-of-date deploy.py and various Python 2.6 workarounds.

Testing:
- Full build with /usr/bin/python pointed to python3
- run-all-tests passed with python pointed to python3
- ran push_to_asf.py

Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb
Reviewed-on: http://gerrit.cloudera.org:8080/19697
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-04-26 18:52:23 +00:00
Peter Rozsa
cd9f3f578e IMPALA-11435: Suppress logging for 'thrift' in impala-shell
This change removes the "No handlers could be found for logger
"thrift.transport.sslcompat" notification from impala-shell when SSL is
enabled, by adding a NullHandler to logger 'thrift'.
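
With the standard logging module, attaching a NullHandler to the 'thrift' logger is a one-liner. The sketch below shows the general technique and is not a copy of the impala-shell change.

```python
import logging

# A NullHandler on the parent logger gives every 'thrift.*' child logger a
# handler to propagate to, so Python 2 no longer prints the
# "No handlers could be found" notice.
logging.getLogger("thrift").addHandler(logging.NullHandler())

child = logging.getLogger("thrift.transport.sslcompat")
child.warning("this message is silently dropped")
```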

Change-Id: Idaa0871751969ec3a3aa8b44fe35f0743c03c547
Reviewed-on: http://gerrit.cloudera.org:8080/19671
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-04-04 07:23:54 +00:00
jasonmfehr
e17fd9a0d5 IMPALA-11850 Adds HTTP tracing headers when using the hs2-http protocol.
When using the hs2 protocol with the http transport, include several
tracing http headers by default.  These headers are:

  * X-Request-Id        -- client defined string that identifies the
                           http request, this string is meaningful only
                           to the client
  * X-Impala-Session-Id -- session id generated by the Impala backend,
                           will be omitted on http calls that occur
                           before this id has been generated
  * X-Impala-Query-Id   -- query id generated by the Impala backend,
                           will be omitted on http calls that occur
                           before this id has been generated

The Impala shell includes these headers by default.  The command
line argument --no_http_tracing has been added to remove these
headers.
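
A minimal Python sketch of how such headers could be assembled. The function name `tracing_headers` is hypothetical, and using a UUID for X-Request-Id is an assumption; the commit only says it is a client-defined string.

```python
import uuid


def tracing_headers(session_id=None, query_id=None, enabled=True):
    """Build the tracing headers for one HTTP call.

    The session and query ids are omitted until the backend has generated
    them; enabled=False models the --no_http_tracing option.
    """
    if not enabled:
        return {}
    headers = {"X-Request-Id": str(uuid.uuid4())}
    if session_id is not None:
        headers["X-Impala-Session-Id"] = session_id
    if query_id is not None:
        headers["X-Impala-Query-Id"] = query_id
    return headers


print(sorted(tracing_headers(session_id="abc")))
```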

The Impala backend logs out these headers if they are on the http
request.  The log messages are written out at log level 2 (RPC).

Testing:
  - manual testing (verified using debugging proxy and impala logs)
  - new python test

Change-Id: I7857eb5ec03eba32e06ec8d4133480f2e958ad2f
Reviewed-on: http://gerrit.cloudera.org:8080/19428
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-02-10 02:09:17 +00:00
jasonmfehr
f2f6b4b580 IMPALA-11375 Impala shell outputs details of each RPC
When the Impala shell is using the hs2 protocol, it makes multiple RPCs
to the Impala daemon.  These calls pass Thrift objects back and forth.
This change adds the '--show_rpc' which outputs the details of the RPCs
to stdout and the '--rpc_file' flag which outputs the RPC details to the
specified file path.

RPC details include:
- operation name
- request attempt count
- Impala session/query ids (if applicable)
- call duration
- call status (success/failure)
- request Thrift objects
- response Thrift objects

Certain information is not included in the RPC details:
- Thrift object attributes named 'secret' or 'password'
  are redacted.
- Thrift objects with a type of TRowSet or TGetRuntimeProfileResp
  are not included as the information contained within them is
  already available in the standard output from the Impala shell.
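
The redaction rule can be illustrated with a short Python sketch. Nested dicts and lists stand in for Thrift objects here, and both the function name `redact` and the placeholder string are assumptions, not what impala-shell actually prints.

```python
REDACTED_NAMES = {"secret", "password"}


def redact(obj):
    """Return a copy of `obj` with sensitive attribute values masked."""
    if isinstance(obj, dict):
        return {
            key: "*******" if key in REDACTED_NAMES else redact(value)
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        return [redact(item) for item in obj]
    return obj


request = {"username": "bob", "password": "hunter2",
           "sessionHandle": {"guid": "g1", "secret": "s3"}}
print(redact(request))
```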

Testing:
- Added new tests in the end-to-end test suite.

Change-Id: I36f8dbc96726aa2a573133acbe8a558299381f8b
Reviewed-on: http://gerrit.cloudera.org:8080/19388
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-01-12 23:31:14 +00:00
Michael Smith
b17446818c IMPALA-11755: Fix impala-shell --ldap_password_cmd with python 3
subprocess.Popen returns a byte string in Python 3, which serializes
incorrectly when sending it as the LDAP password and causes `endswith`
to error with
> first arg must be bytes or a tuple of bytes, not str

Fixes `impala-shell --ldap_password_cmd` run with Python 3 by decoding
bytes as unicode.
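
The bytes-versus-str issue can be reproduced and handled as sketched below. The helper name `run_password_cmd` is hypothetical, and decoding as UTF-8 is an assumption about the command's output encoding.

```python
import subprocess
import sys


def run_password_cmd(argv):
    """Run a command and return its stdout as text without the final newline."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    out, _ = proc.communicate()  # `out` is bytes in Python 3
    return out.decode("utf-8").rstrip("\r\n")


# Use the current interpreter as a portable stand-in for a password command.
password = run_password_cmd([sys.executable, "-c", "print('s3cret')"])
print(password.endswith("t"))  # str.endswith works once the bytes are decoded
```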

Testing: confirmed that I can successfully authenticate via LDAP with
impala-shell in Python 2.7 and Python 3.8.

Change-Id: I3638d6f8d3ed7184495dbe3512d9e5ceb0ee8c45
Reviewed-on: http://gerrit.cloudera.org:8080/19283
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-11-29 22:24:43 +00:00
wzhou-code
8e350d0a8a IMPALA-11304: impala-shell make the client retry attempts configurable
Currently, the maximum number of tries for connecting to the coordinator
is hard coded to 4 in hs2-http mode. It should be a configurable option,
especially in environments where the coordinator starts slowly.

This patch added support for configurable max tries in hs2-http mode
using the new impala-shell config option '--connect_max_tries'.
The default value of '--connect_max_tries' is set to 4.
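
A generic retry loop with a configurable limit might look like the Python sketch below. It is not the impala-shell implementation: `connect_with_retries`, the `delay_s` parameter, and retrying only on ConnectionError are all assumptions made for illustration.

```python
import time


def connect_with_retries(connect, max_tries=4, delay_s=0.0):
    """Call `connect` until it succeeds or `max_tries` attempts have failed."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    last_error = None
    for attempt in range(1, max_tries + 1):
        try:
            return connect()
        except ConnectionError as e:
            last_error = e
            if attempt < max_tries:
                time.sleep(delay_s)
    raise last_error


attempts = {"n": 0}


def flaky_connect():
    """Pretend the coordinator only comes up on the third attempt."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("coordinator not up yet")
    return "connected"


print(connect_with_retries(flaky_connect, max_tries=4))
```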

Testing:
 - Ran e2e shell tests.
 - Ran impala-shell with connect_max_tries as 100 before starting
   impala coordinator daemon, verified that impala-shell connects to
   coordinator after coordinator daemon was started.

Change-Id: I5f7caeb91a69e71a38689785fb1636094295fdb1
Reviewed-on: http://gerrit.cloudera.org:8080/19105
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-10-25 16:07:37 +00:00
Peter Rozsa
81e36d4584 IMPALA-10660: Impala shell prints DOUBLEs with less precision in HS2 than beeswax
This change adds a shell option called "hs2_fp_format"
which manipulates the print format of floating-point values in HS2.
It lets the user specify a Python-based format specification
expression (https://docs.python.org/2.7/library/string.html#formatspec)
which will get parsed and applied to floating-point
column values. The default value is None, in this case the
formatting is the same as the state before this change.
This option does not support the Beeswax protocol, because Beeswax
converts all of the column values to strings in its response.
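
Applying a Python format specification to a floating-point value uses the built-in `format()`. In the sketch below, the helper name `format_fp` is hypothetical, and falling back to `str()` when no format is given is an assumption standing in for "the same as before this change".

```python
def format_fp(value, fp_format=None):
    """Format a float with a Python format spec, or leave it as-is."""
    if fp_format is None:
        return str(value)
    return format(value, fp_format)


print(format_fp(0.1 + 0.2, ".3f"))
print(format_fp(1234.5, ",.1f"))
```

An invalid specification such as "q" makes `format()` raise ValueError, which is the case the "invalid formatting option" test targets.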

Tests: command line tests for various formatting options and
       for invalid formatting option

Change-Id: I424339266be66437941be8bafaa83fa0f2dfbd4e
Reviewed-on: http://gerrit.cloudera.org:8080/18990
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-09-23 14:34:52 +00:00
gaoxq
f5fc085733 IMPALA-11233: Unset all query option
When using a JDBC connection pool, a connection may set some query
options. After the query finishes, the connection is closed and put back
into the connection pool. When the connection is used again, the
previously set query options are still in effect. We need a way for a
statement to reset all query options without creating a new connection.

Support UNSET statements in the SQL dialect. UNSET ALL unsets all query
options.

Testing:
  - Added an UNSET ALL test for query options in test_hs2.py

Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250
Reviewed-on: http://gerrit.cloudera.org:8080/18430
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-06-23 05:59:02 +00:00
Joe McDonnell
7eb200abf1 IMPALA-11337: Flush row output before writing "Fetched X row(s)"
When redirecting stdout and stderr to a file, the
existing code can sometimes output the "Fetched X row(s)"
line before finishing the row output. e.g.
impala-shell -B -q "select 1" >> outfile.txt 2>> outfile.txt

The rows output goes to stdout while the control messages
like "Fetched X row(s)" go to stderr. Since stdout can buffer
output, that can delay the output. This adds a flush for
stdout before writing the "Fetched X row(s)" message.
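
The ordering fix can be sketched as follows. This is only an
illustration under assumptions: the function name and the out/err
parameters are hypothetical, and the real shell code is structured
differently.

```python
import sys


def print_rows_then_status(rows, out=sys.stdout, err=sys.stderr):
    """Write rows to `out`, flush, then write the status line to `err`."""
    for row in rows:
        out.write(row + "\n")
    # Without this flush, buffered row output can land in a shared file
    # after the status line written to stderr.
    out.flush()
    err.write("Fetched %d row(s)\n" % len(rows))
    err.flush()
```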

Testing:
 - Added a shell test that redirects stdout and stderr to
   a file and verifies the contents. This consistently
   fails without the flush.
 - Other shell tests pass

Change-Id: I83f89c110fd90d2d54331c7121e407d9de99146c
Reviewed-on: http://gerrit.cloudera.org:8080/18625
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-06-15 05:28:05 +00:00
yx91490
c7784bde55 IMPALA-1682: Support printing the output of a query (rows) vertically.
In vertical mode, impala-shell prints each row in the following format:
first a line containing the line number, then the row's columns line by
line, each column line starting with its name and a colon.

To enable it, use the shell option '-E' or '--vertical', or 'set
VERTICAL=true' in interactive mode. To disable it in interactive mode,
use 'set VERTICAL=false'. NOTE: it is disabled if the '-B' option or
'set WRITE_DELIMITED=true' is specified.
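
A rough sketch of the vertical layout described above. The function name
and the exact text of the per-row header line are assumptions; only the
"number line, then name: value lines" structure comes from the
description.

```python
def format_vertical(column_names, rows):
    """Render rows vertically: a numbered header, then one line per column."""
    lines = []
    for row_number, row in enumerate(rows, start=1):
        # Hypothetical header layout; the real shell may decorate it differently.
        lines.append("*** %d. row ***" % row_number)
        for name, value in zip(column_names, row):
            lines.append("%s: %s" % (name, value))
    return "\n".join(lines)


print(format_vertical(["id", "name"], [(1, "a"), (2, "b")]))
```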

Tests:
Added test methods to test_shell_interactive.py and test_shell_commandline.py.

Change-Id: I5cee48d5a239d6b7c0f51331275524a25130fadf
Reviewed-on: http://gerrit.cloudera.org:8080/18549
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-06-13 15:41:07 +00:00
Joe McDonnell
0ee5f8084f IMPALA-11317/IMPALA-11316/IMPALA-11315: impala-shell Python 3 fixes
This fixes a few impala-shell Python 3 issues:
1. In ImpalaShell's do_history(), the decode() call needs to be
   avoided in Python 3, because in Python 3 the cmd is already
   a string and doesn't need further decoding. (IMPALA-11315)
2. TestImpalaShell.test_http_socket_timeout() gets a different
   error message in Python 3. It throws "BlockingIOError"
   rather than "socket.error". (IMPALA-11316)
3. ImpalaHttpClient.py's code to retrieve the body when
   handling an HTTP error needs to have a decode() call
   for the body. Otherwise, the body remains bytes and
   causes TestImpalaShellInteractive.test_http_interactions_extra()
   to fail. (IMPALA-11317)

Testing:
 - Ran shell tests in the standard way
 - Ran shell tests with the impala-shell executable coming from
   a Python 3 virtualenv using the PyPi package

Change-Id: Ie58380a17d7e011f4ce96b27d34717509a0b80a6
Reviewed-on: http://gerrit.cloudera.org:8080/18556
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-05-25 22:47:40 +00:00
Riza Suminto
cf5eaae176 IMPALA-11305: Fix TypeError in impala-shell summary progress
impala-shell fails with a TypeError when installed with Python 3. This
is due to the behavior change of the division operator ('/') between
Python 2 and Python 3. This patch fixes the issue by replacing the
operator with floor division ('//'), which yields an integer for integer
operands, as described in https://peps.python.org/pep-0238/.
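
To illustrate the kind of failure involved, here is a hypothetical
progress-bar helper (not the actual impala-shell code). In Python 3,
'/' on two ints returns a float, and multiplying a string by a float
raises TypeError; '//' keeps the result an int.

```python
def progress_bar(completed, total, width=20):
    """Return a text progress bar such as '[#######             ]'."""
    # With '/' this would be 7.5 for (3, 8, 20) in Python 3, and
    # "#" * 7.5 raises TypeError. Floor division yields the int 7.
    filled = width * completed // total
    return "[" + "#" * filled + " " * (width - filled) + "]"


print(progress_bar(3, 8))
```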

Testing:
- Manually installed impala-shell from pip with Python 3 and verified
  that the fix works.

Change-Id: Ifbe4df6a7a4136e590f383fc6475e2283e35eadc
Reviewed-on: http://gerrit.cloudera.org:8080/18546
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-05-23 18:44:33 +00:00
wzhou-code
397d1d15a2 IMPALA-10745: Support Kerberos over HTTP for impala-shell
This patch ports the implementation of GSSAPI authentication over http
transport from Impyla (https://github.com/cloudera/impyla/pull/415) to
impala-shell.

The implementation adds a new dependency on 'kerberos' python module,
which is a pip-installed module distributed under Apache License Version
2.
When using impala-shell with Kerberos over http, it is assumed that the
host has a preexisting kinit-cached Kerberos ticket that impala-shell
can pass to the server automatically without the user having to reenter
the password.

Testing:
 - Passed exhaustive tests.
 - Tested manually on a real cluster with a full Kerberos setup.

Change-Id: Ia59ba4004490735162adbd468a00a962165c5abd
Reviewed-on: http://gerrit.cloudera.org:8080/18493
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-05-10 03:22:41 +00:00
Abhishek Rawat
8e755e7571 IMPALA-11126: impala-shell: Support configurable socket timeout for http
client

In 'hs2-http' mode, the socket timeout is None, which could cause
hang-like symptoms in case of a problematic remote server.

Added support for configurable socket timeout using the new impala-shell
config option '--http_socket_timeout_s'. If a reasonable timeout is
set, impala-shell client can retry in case of connection issues, when
possible. The default value of '--http_socket_timeout_s' is set to None,
to prevent behavior changes for existing clients.

More details on socket timeout here:
https://docs.python.org/3/library/socket.html#socket-timeouts
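
As a sketch of what a configurable socket timeout means at the Python
level, the standard library's http.client.HTTPConnection accepts a
timeout in seconds, and None means blocking sockets with no timeout.
The helper name and the example host and port are assumptions; this is
not the shell's actual HTTP client code.

```python
import http.client


def make_http_connection(host, port, http_socket_timeout_s=None):
    """Create (but do not open) an HTTP connection with the given timeout.

    None keeps the previous behavior: socket operations block indefinitely.
    """
    return http.client.HTTPConnection(host, port, timeout=http_socket_timeout_s)


conn = make_http_connection("localhost", 28000, http_socket_timeout_s=2.5)
print(conn.timeout)  # 2.5
```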

Testing:
- Added tests for various timeout values in test_shell_commandline.py
- Ran e2e shell tests.

Change-Id: I29fa4ff96cdcf154c3aac7e43340af60d7d61e94
Reviewed-on: http://gerrit.cloudera.org:8080/18336
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2022-04-01 16:31:19 +00:00
Steve Carlin
d34039ced9 IMPALA-11096: Strict_hs2 mode in impala-shell does not support get_summary
The get_summary() thrift call is not supported in strict_hs2 mode
on impala-shell. The live_progress and live_summary options are
disabled when the strict_hs2_protocol flag is set.

Change-Id: I6aee838a80b4659a13a0a0cb9eabffa2c8767c8f
Reviewed-on: http://gerrit.cloudera.org:8080/18177
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2022-02-07 11:37:44 +00:00
Steve Carlin
5cfdff03f7 IMPALA-11095: Fix Impala-shell strict_hs2 mode inserts
The insert command was broken for impala-shell in the strict_hs2
mode. close_dml should return two values.

The parameters returned by close_dml are rows returned and error
rows. These are not supported by strict hs2 mode since the close
does not return the TDmlResult structure. So the message to
the end user also had to be changed.

Change-Id: Ibe837c99e54d68d1e27b97f0025e17faf0a2cb9f
Reviewed-on: http://gerrit.cloudera.org:8080/18176
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2022-02-04 07:42:52 +00:00
Steve Carlin
bb9fb663ce IMPALA-10778: Allow impala-shell to connect directly to HS2
Impala-shell already uses HS2 protocol to connect to Impalad.
This commit allows impala-shell to connect to any server (for
example, Hive) using the hs2 protocol. This will be done via
the "--strict_hs2_protocol" option.

When the "--strict_hs2_protocol" option is turned on, only features
supported by hs2 will work. For instance, "runtime-profile" is an
impalad specific feature and will be disabled.

The "--strict_hs2_protocol" will only work on servers that abide
by the strict definition of what is supported by HS2. So one will
be able to connect to Hive in this mode, but connections to Impala
will not work. Any feature supported by Hive (e.g. kerberos
authentication) should work as well.

Note: While authentication should work, the test framework is not
set up to create an HS2 server that does authentication at this point
so this feature should be used with caution.
Change-Id: I674a45640a4a7b3c9a577830dbc7b16a89865a9e
Reviewed-on: http://gerrit.cloudera.org:8080/17660
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-08-27 09:45:59 +00:00
wzhou-code
2b815cbd51 IMPALA-10784: Add support for retaining cookies in impala-shell
IMPALA-10234 added support for cookie authentication for LDAP to
impala-shell. But it does not accept user input cookie name via
startup flags, and it retains only one cookie.

In some scenarios, a proxy may manage sessions using additional HTTP
cookies that the proxy adds.
This patch makes cookie support more generic for impala-shell.
It lets the user specify cookie names via a startup flag
"--http_cookie_names" and can retain more than one cookie.
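
The idea of retaining several named cookies can be sketched with the
standard library's http.cookies module. The helper names and the cookie
names used in the example are hypothetical; the real shell code may
parse and store cookies differently.

```python
from http.cookies import SimpleCookie


def retain_cookies(set_cookie_headers, cookie_names):
    """Keep only cookies whose names appear in cookie_names."""
    kept = {}
    for header in set_cookie_headers:
        parsed = SimpleCookie()
        parsed.load(header)
        for name, morsel in parsed.items():
            if name in cookie_names:
                kept[name] = morsel.value
    return kept


def cookie_header(kept):
    """Build the value of the Cookie request header from retained cookies."""
    return "; ".join("%s=%s" % (n, v) for n, v in sorted(kept.items()))


headers = ["impala.auth=abc; Max-Age=3600", "proxy.session=xyz; Path=/", "tracking=1"]
kept = retain_cookies(headers, ["impala.auth", "proxy.session"])
print(cookie_header(kept))  # impala.auth=abc; proxy.session=xyz
```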

Testing:
 - Manually tested the multiple cookies in HTTP headers with a
   customized Impala server which could send and receive multiple
   cookies.
 - Passed core test, including new test cases.

Change-Id: I193422d5ec891886a522d82ecb0e9d974132ff2a
Reviewed-on: http://gerrit.cloudera.org:8080/17667
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-07-13 00:07:11 +00:00
Csaba Ringhofer
94f67a3432 IMPALA-7825: Upgrade Thrift version to 0.11.0
Before this patch Impala mainly used Thrift 0.9.3, but it was
possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0
Thrift lib was already included in the toolchain.

Most of the changes are related to replacing boost:: with std::
shared_ptr-s in cpp code (this is a continuation of patch by Sahil).

The Thrift upgrade also needs an Impyla release with Thrift 0.11.0, as
Impala's test framework relies on Impyla. A thrift_sasl release is also
needed, because it currently pins Thrift version to 0.9.3 for Python 2.

The current patch uses alpha releases from Impyla and thrift_sasl that
use thrift 0.11.0.

Notable side effects:
- old logic to compile thrift for impala-shell with 0.11.0 was removed
- impala_shell's utf8 handling had to be updated as the new 0.11.0
  compilation happens with no_utf8strings. This also made things a
  bit faster, e.g. the following is ~0.22s instead of ~0.25s
  shell/impala_shell.py \
    -B -q "select * from functional_parquet.alltypes;" > /dev/null
- THRIFT-3921 changed the stream operators to print an enum's name
  instead of its number, leading to slightly different messages
  in some cases.
- "templates" was added to the Thrift generator's parameters to avoid
  a compilation issue (related to IMPALA-10600). I didn't notice any
  change in compilation time. This option generated .tcc files with
  templatized readers/writers for Thrift types. Currently we don't
  use these, but they could potentially speed up (de)serialization.

Testing:
- ran Impyla's test suite with Python 2 and 3
- ran core tests

Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Reviewed-on: http://gerrit.cloudera.org:8080/17170
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-04-27 13:36:54 +00:00
stiga-huang
d5f67fce41 IMPALA-10523: Fix impala-shell crash in printing error messages that contain UTF-8 characters
In Python2, print() converts all non-keyword arguments to strings like
str() does and writes them to the stream. str() on QueryStateException
returns its value (i.e. error message) which could be in unicode type.
Python2 will implicitly encode it to str type using the default
encoding, 'ascii'. This could result in UnicodeEncodeError when there
are non-ascii characters in the error message.

This patch explicitly encodes the error message using 'utf-8' encoding
if it's in unicode type and the shell is run in Python2.
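
The mechanism can be illustrated with a short sketch. The helper names
are hypothetical, and the snippet is written so that it also runs on
Python 3, where text is already unicode and no explicit encode is
needed before printing.

```python
import sys


def ascii_encode_fails(message):
    """Show why an implicit 'ascii' encode breaks on non-ASCII text."""
    try:
        message.encode("ascii")
        return False
    except UnicodeEncodeError:
        return True


def to_printable(message):
    """On Python 2, encode unicode text as UTF-8 bytes before printing."""
    if sys.version_info[0] == 2 and not isinstance(message, str):
        return message.encode("utf-8")
    return message


print(ascii_encode_fails(u"caf\xe9"))  # True
print(to_printable(u"caf\xe9"))
```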

Tests:
 - Add test in test_shell_interactive.py

Change-Id: Ie10f5b03ecc5877053c2fbada1afaf256b423a71
Reviewed-on: http://gerrit.cloudera.org:8080/17099
Reviewed-by: Tamas Mate <tmate@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-12 18:19:11 +00:00
stiga-huang
4c6cf4b2ef IMPALA-10434: Fix impala-shell's unicode regressions on Python2
To make impala-shell compatible for Python3, we explicitly distinguish
bytes and text in Python2 by decoding the bytes for all inputs.

Regression 1: multiple queries in one line with unicode chars will break

In precmd() of impala-shell, if there are multiple queries present in
one input line, we split it into individual queries (by
sqlparse.split()) and append them back to the 'cmdqueue'. They will be
passed to precmd() again. In our Python2 implementation, precmd()
expects them to be str type, and will decode them into unicode type.
However, the output type of sqlparse.split() is unicode which doesn't
have a decode() method. Calling decode() on a unicode var will let
Python2 implicitly encode it to str. This may cause UnicodeEncodeError
since the implicit encoding uses 'ascii'.

Regression 2: multi-line query with unicode chars will break when
command history is enabled

In _check_for_command_completion(), when calling
readline.replace_history_item in Python2, we encode the completed_cmd
into bytes. However, we shouldn't replace it since the return type is
expected to be unicode.

Tests:
 - Add tests for these two regressions in Python2.

Change-Id: Icc4a8d31311a5c59e5fc0e65fe09f770df41bea4
Reviewed-on: http://gerrit.cloudera.org:8080/16960
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-01-20 10:20:02 +00:00
Attila Jeges
1c72c5a8f9 IMPALA-10234: Add support for cookie authentication to impala-shell
IMPALA-8584 added support for cookie authentication to Impala.
This change adds cookie authentication support to impala-shell
as well when using 'hs2-http' protocol.

Testing:
- Unit tests were added to test cookie handling methods.
- Tested e2e manually with nginx HTTP proxy.
TODO:
- Test with Knox HTTP proxy as well.

Change-Id: Icb0bc6e0f58f236866ca9913a2e63d97d5148f51
Reviewed-on: http://gerrit.cloudera.org:8080/16660
Reviewed-by: Attila Jeges <attilaj@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-17 19:21:19 +00:00
Thomas Tauber-Marshall
01e1b4df80 IMPALA-10303: Fix warnings from impala-shell with --quiet
When the --quiet flag is used with impala-shell, the intention is that
if the query is successful then only the query results should be
printed.

This patch fixes two cases where --quiet was not being respected:
- When using the HTTP transport and --client_connect_timeout_ms is
  set, a warning is printed that the timeout is not applied.
- When running in non-interactive mode, a warning is printed that
  --live_progress is automatically disabled. This warning is now also
  only printed if --live_progress is actually set.

Testing:
- Added a test that runs a simple query with --quiet and confirms the
  output is as expected.

Change-Id: I1e94c9445ffba159725bacd6f6bc36f7c91b88fe
Reviewed-on: http://gerrit.cloudera.org:8080/16673
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-30 02:17:29 +00:00
stiga-huang
2a1d3acaf1 IMPALA-9870: impala-shell 'summary' to show original and retried queries
This patch extends the 'summary' command of impala-shell to support
retrieving the summary of the original query attempt. The new syntax is

SUMMARY [ALL | LATEST | ORIGINAL]

If 'ALL' is specified, both the latest and original summaries are
printed. If 'LATEST' is specified, only the summary of the latest query
attempt is printed. If 'ORIGINAL' is specified, only the summary of the
original query attempt is printed. The default option is 'LATEST'.
Support for this has only been added to HS2 given that Beeswax is being
deprecated soon.
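
A minimal sketch of how the optional argument could be interpreted. The
function name is hypothetical, and accepting the keyword
case-insensitively is an assumption, not something stated in the patch
description.

```python
def parse_attempt_option(args, default="LATEST"):
    """Map the text after SUMMARY to one of ALL, LATEST, or ORIGINAL."""
    valid = ("ALL", "LATEST", "ORIGINAL")
    option = args.strip().upper()  # assumption: case-insensitive keywords
    if not option:
        return default
    if option not in valid:
        raise ValueError("expected one of %s, got %r" % (", ".join(valid), args))
    return option


print(parse_attempt_option(""))          # LATEST
print(parse_attempt_option("original"))  # ORIGINAL
```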

Tests:
 - Add new tests in test_shell_interactive.py

Change-Id: I8605dd0eb2d3a2f64f154afb6c2fd34251c1fec2
Reviewed-on: http://gerrit.cloudera.org:8080/16502
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-24 05:11:06 +00:00
Attila Jeges
3d067572dd IMPALA-10224: Add startup flag not to expose debug web url to clients
This patch introduces a new startup flag
--ping_expose_webserver_url (true by default) to control whether
PingImpalaService, PingImpalaHS2Service RPC calls should expose
the debug web url to the client or not.

This is necessary as the debug web UI is not something that
end-users will necessarily have access to.

If the flag is set to false, the RPC calls will return an empty
string instead of the real url signalling that the debug web ui
is not available.

Note that if the webserver is disabled (--enable_webserver flag
is set to false) the RPC calls will behave the same and return an
empty string for the url.

Change-Id: I7ec3e92764d712b8fee63c1f45b038c31c184cfc
Reviewed-on: http://gerrit.cloudera.org:8080/16573
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-14 14:39:39 +00:00
Thomas Tauber-Marshall
179b14876d IMPALA-10074: Set impala-shell's default protocol to hs2
The beeswax interface has been marked deprecated for a while, but it
remains the default protocol for impala-shell because we felt that
changing the default protocol constituted a breaking change. Now that
4.0 is the next release, we can switch the default protocol to hs2.

This patch also adds a few more deprecation warnings around beeswax.

Change-Id: I65eb14ec03c1e1ef26782554aedd6670bbeedfe8
Reviewed-on: http://gerrit.cloudera.org:8080/16327
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-22 03:03:32 +00:00
Tamas Mate
3ef7756628 IMPALA-10051: impala-shell exits with ValueError with WITH clauses
When a query contains a WITH clause, impala-shell tries to identify
whether it is a DML query or not, so that later it can provide
appropriate result messages. Earlier shlex was used to create tokens and
assess the query type based on that. However, shlex can misinterpret
some query strings where whitespace characters are mixed with quotes,
because it splits the string based on whitespace characters. In some
scenarios a 'ValueError: No closing quotation' error can occur.

This change moves the tokenization from shlex to sqlparse.
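
The shlex failure mode is easy to reproduce with the standard library.
The query strings below are made-up examples, and an unbalanced
apostrophe in a comment is just one way to trigger the error; the
helper name is hypothetical.

```python
import shlex


def shlex_tokens(query):
    """Tokenize with shlex, reporting the error text instead of raising."""
    try:
        return shlex.split(query)
    except ValueError as e:
        return "ValueError: %s" % e


print(shlex_tokens("select 'a b' from t"))  # ['select', 'a b', 'from', 't']
print(shlex_tokens("select 1 -- don't"))    # ValueError: No closing quotation
```

shlex follows shell quoting rules rather than SQL rules, which is why a
SQL-aware tokenizer is a better fit for classifying queries.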

Testing:
 - Added unit test to cover queries that contain mixed whitespaces
   and strings

Change-Id: I442d3bc65b90a55c73c847948d5179a8586d71ad
Reviewed-on: http://gerrit.cloudera.org:8080/16389
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-18 04:06:22 +00:00
Sahil Takiar
13f50eaec5 IMPALA-9229: impala-shell 'profile' to show original and retried queries
Currently, the impala-shell 'profile' command only returns the profile
for the most recent query attempt. There is no way to get the original
query profile (the profile of the first query attempt that failed) from
the impala-shell.

This patch modifies TGetRuntimeProfileReq and TGetRuntimeProfileResp to
add support for returning both the original and retried profiles for a
retried query. When a query is retried, TGetRuntimeProfileResp currently
contains the profile for the most recent query attempt.
TGetRuntimeProfileReq has a new field called 'include_query_attempts'
and when it is set to true, the TGetRuntimeProfileResp will include all
failed profiles in a new field called failed_profiles /
failed_thrift_profiles.

impala-shell has been modified so the 'profile' command has a new set of
options. The syntax is now:

PROFILE [ALL | LATEST | ORIGINAL]

If 'ALL' is specified, both the latest and original profiles are
printed. If 'LATEST' is specified, only the latest profile is printed.
If 'ORIGINAL' is specified, only the original profile is printed. The
default behavior is equivalent to specifying 'LATEST' (which is the
current behavior before this patch as well).

Support for this has only been added to HS2 given that Beeswax is being
deprecated soon. The new 'profile' options have no effect when the
Beeswax protocol is used.

Most of the code change is in impala-hs2-server and impala-server; a lot
of the GetRuntimeProfile code has been re-factored.

Testing:
* Added new impala-shell tests
* Ran core tests

Change-Id: I89cee02947b311e7bf9c7274f47dfc7214c1bb65
Reviewed-on: http://gerrit.cloudera.org:8080/16406
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-17 20:55:45 +00:00
Adam Tamas
fe6e625747 IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix
Although ds_hll_sketch() generates a string value as output, the data
is not ASCII-encoded text but a bit sketch. Because of this, the shell
disconnects when it tries to decode the data.

The issue can be reproduced with a simple method like using unhex
with a wrong input.
Example: SELECT unhex("aa");

This patch contains a solution that replaces any characters that cannot
be decoded as UTF-8 if we run into a UnicodeDecodeError after fetching
the data.
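
The replacement strategy can be sketched with the built-in bytes.decode
error handler. The helper name is hypothetical, and the real patch works
on fetched column values rather than on a bare bytes object.

```python
def decode_lossy(raw):
    """Decode bytes as UTF-8, substituting U+FFFD for undecodable bytes."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("utf-8", errors="replace")


print(decode_lossy(b"abc"))          # abc
print(repr(decode_lossy(b"\xaa")))   # '\ufffd'
```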

This solution is working with the Thrift 0.9.3 autogenerated gen-py
but still fails with Thrift 0.11.0.

For Thrift 0.11.0 the error is caught and an error message is sent
(not working with beeswax protocol, because it generates a different
error (TypeError) which can come for other reasons too).

Testing:
- Manual testing with these protocols: 'hs2-http', 'hs2', 'beeswax'

Change-Id: I0c5f1290356e21aed8ca7f896f953541942aed05
Reviewed-on: http://gerrit.cloudera.org:8080/16418
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
2020-09-05 09:42:46 +00:00
Csaba Ringhofer
b7965d8240 Revert "IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix"
This reverts commit 75146c9138.

Change-Id: I57f790389a8c847877999d2b9b8185939b416c07
Reviewed-on: http://gerrit.cloudera.org:8080/16417
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>
2020-09-04 12:28:56 +00:00
Adam Tamas
75146c9138 IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix
Although ds_hll_sketch() generates a string value as output, the data
is not ASCII-encoded text but a bit sketch. Because of this, the shell
disconnects when it tries to decode the data.

The issue can be reproduced with a simple method like using unhex
with a wrong input.
Example: SELECT unhex("aa");

This patch contains a solution that replaces any characters that cannot
be decoded as UTF-8 if we run into a UnicodeDecodeError after fetching
the data.

This solution is working with the Thrift 0.9.3 autogenerated gen-py
but still fails with Thrift 0.11.0.

For Thrift 0.11.0 the error is caught and an error message is sent
(not working with beeswax protocol, because it generates a different
error (TypeError) which can come for other reasons too).

Testing:
- Manual testing with these protocols: 'hs2-http', 'hs2', 'beeswax'

Change-Id: Ic5cfb907871ca83e5f04a39ca9d7a8e138d711a8
Reviewed-on: http://gerrit.cloudera.org:8080/16305
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>
2020-09-04 12:18:28 +00:00