This patch changes the behaviour of the Impala shell to refuse to
attempt an LDAP-authenticated connection to Impala unless SSL/TLS is
configured.
A new flag --auth_creds_in_clear_ok is added to suppress this
behaviour. This is similar to Impala's --ldap_passwords_in_clear_ok
flag. The shell will also now print a warning if an insecure
configuration is used.
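A minimal sketch of the check described above, assuming the decision
depends only on three booleans. The function name, parameter names and
message texts are hypothetical and are not taken from the shell's code.

```python
def check_ldap_connection_allowed(use_ldap, use_ssl, creds_in_clear_ok):
    """Return a warning string, or None, if the connection may proceed.

    Raises ValueError when LDAP is requested without SSL/TLS and the
    override flag is not set. All names here are illustrative.
    """
    if not use_ldap or use_ssl:
        # Either no LDAP credentials are sent, or they are protected by TLS.
        return None
    if not creds_in_clear_ok:
        raise ValueError(
            "LDAP credentials may not be sent over insecure connections. "
            "Enable SSL or set --auth_creds_in_clear_ok")
    # The override is set: allow the connection, but warn the user.
    return "Warning: LDAP credentials will be sent in the clear"
```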
Change-Id: Ide25d8dd881a61b9f08900112466c430da64a038
Reviewed-on: http://gerrit.cloudera.org:8080/546
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This patch adds a 'tip-of-the-day' message to the shell's intro header,
and a TIP command that prints out a random tip. This might be a good way
to make advanced or little-known functionality of both Impala and the
shell better known to users.
Change-Id: I987b386f8c96f8a75ccd5cd9197a8f2981c8bf43
Reviewed-on: http://gerrit.cloudera.org:8080/586
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This patch adds dynamic progress reporting to the shell. There are two
new command line flags for the shell:
--live_progress - prints the completed vs. total number of scan ranges
--live_summary - prints an updated exec summary
In addition to the command line flags, these options can be set from
within the shell using:
set LIVE_SUMMARY=True
set LIVE_PROGRESS=True
The new options will be listed under shell options. Both reports are
updated at most once per second; for longer-running queries the interval
is adjusted to the time between two RPC calls that get the query status. To
provide this information in the ExecSummary, the Thrift structure for
the ExecSummary was extended to contain a progress indicator. The output
is printed to stderr and only available in interactive mode.
An example video is available here:
https://asciinema.org/a/5wi7ypckx4ol4ha1hlg3e3q1k
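The rate-limited reporting described above could be sketched as follows.
This is an illustration only: ProgressPrinter, format_progress and the
output format are hypothetical, and the real shell derives its numbers
from the ExecSummary rather than from explicit arguments.

```python
import sys
import time


def format_progress(completed, total):
    """Render completed vs. total scan ranges with a percentage."""
    pct = 100.0 * completed / total if total else 0.0
    return "[%d / %d scan ranges] %.0f%%" % (completed, total, pct)


class ProgressPrinter(object):
    """Rewrites a single progress line, at most once per interval."""

    def __init__(self, min_interval_s=1.0, out=sys.stderr, clock=time.time):
        self.min_interval_s = min_interval_s
        self.out = out          # progress goes to stderr by default
        self.clock = clock      # injectable so the sketch can be tested
        self.last = None        # time of the last printed update

    def update(self, completed, total):
        """Print the progress line; return False if the update was skipped."""
        now = self.clock()
        if self.last is not None and now - self.last < self.min_interval_s:
            return False
        self.last = now
        # A carriage return moves back to column 0 so the line is overwritten.
        self.out.write("\r" + format_progress(completed, total))
        self.out.flush()
        return True
```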
Change-Id: I70b2ab5fa74dc2ba5bc3b338ef13ddc6ccf367d2
Reviewed-on: http://gerrit.cloudera.org:8080/508
Tested-by: Internal Jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
The '-f' option in Impala shell is used to read from a file containing SQL queries.
Now, additional support is added to read from STDIN by using:
"-f -"
This change also fixes a bug in the test script test_shell_commandline.py where certain
tests (test_default_db and test_unsecure_message) would hang indefinitely because the
impala-shell subprocess was waiting for user input. The fix pipes STDIN to the subprocess,
which sends an implicit EOF that closes the impala-shell once the test is completed.
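The "-f -" convention can be sketched as below. read_query_text is a
hypothetical helper, not the shell's actual function; the optional stdin
parameter exists only so the sketch can be exercised without a real pipe.

```python
import sys


def read_query_text(path, stdin=None):
    """Return the SQL text for a '-f' argument; '-' means standard input."""
    if path == "-":
        stream = stdin if stdin is not None else sys.stdin
        # read() returns once the writer closes the pipe (EOF).
        return stream.read()
    with open(path) as query_file:
        return query_file.read()
```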
Change-Id: I9a2682e086a3345e089f3e9db7cc049ce3d2c19a
Reviewed-on: http://gerrit.cloudera.org:8080/479
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
This patch introduces the concept of error codes for errors that are
recorded in Impala and are going to be presented to the client. These
error codes are used to aggregate and group incoming error / warning
messages to reduce the spill on the shell and increase the usefulness of
the messages. By splitting the message string from the implementation,
it becomes possible to edit the string independently of the code and
pave the way for internationalization.
Error messages are defined as a combination of an enum value and a
string. Both are defined in the Error.thrift file that is automatically
generated using the script in common/thrift/generate_error_codes.py. The
goal of the script is to have a central understandable repository of
error messages. Adding new messages to this file will require rebuilding
the thrift part. The proxy class ErrorMessage is responsible for
representing an error and capturing the parameters that are used to format
the error message string.
When error messages are recorded they are recorded based on the
following algorithm:
- If an error message is of type GENERAL, do not aggregate this message
and simply add it to the total number of messages
- If an error message is of a specific type, record the first error
message as a sample and for all other occurrences increment the count.
- The coordinator will merge all error messages except the ones of type
GENERAL and display a count.
For example, in the case of the parquet file spanning multiple blocks
the output will look like:
Parquet files should not be split into multiple hdfs-blocks.
file=hdfs://localhost:20500/fid.parq (1 of 321 similar)
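The recording and merging rules above can be sketched in a few lines.
This is a simplified model, not the backend implementation: the ErrorLog
class, its methods and the error code used in the usage example are
hypothetical, and GENERAL messages are simply concatenated on merge.

```python
GENERAL = "GENERAL"


class ErrorLog(object):
    """Aggregates messages by error code, keeping one sample per code."""

    def __init__(self):
        self.general = []   # GENERAL messages, kept verbatim
        self.by_code = {}   # error code -> [sample message, count]

    def record(self, code, message):
        if code == GENERAL:
            self.general.append(message)
        elif code in self.by_code:
            self.by_code[code][1] += 1          # only bump the count
        else:
            self.by_code[code] = [message, 1]   # first occurrence is the sample

    def merge(self, other):
        """Coordinator-side merge of another backend's log into this one."""
        self.general.extend(other.general)
        for code, (sample, count) in other.by_code.items():
            if code in self.by_code:
                self.by_code[code][1] += count
            else:
                self.by_code[code] = [sample, count]

    def lines(self):
        out = list(self.general)
        for code in sorted(self.by_code):
            sample, count = self.by_code[code]
            if count > 1:
                out.append("%s (1 of %d similar)" % (sample, count))
            else:
                out.append(sample)
        return out
```

For example, recording the same specific code twice on one backend and
once on another, then merging, yields a single line ending in
"(1 of 3 similar)".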
All messages are always logged to VLOG. In the coordinator error
messages are merged across all backends to retain readability in the
case of large clusters.
The current version of this patch adds these new error codes to some of
the most important error messages as a reference implementation.
Change-Id: I1f1811631836d2dd6048035ad33f7194fb71d6b8
Reviewed-on: http://gerrit.cloudera.org:8080/39
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
Upgrading sqlparse ended up trading one bug for another. The new bug is
not fixed upstream; I sent a patch. The problem is that '\\' is not
considered a terminated string, and we use this in the phrase "fields
escaped by '\\'" when creating tables.
Change-Id: Id57081f5a96e997afd3aa9b26dca23f627488fc3
Reviewed-on: http://gerrit.cloudera.org:8080/117
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
The only thing this commit does is upgrade sqlparse. The upgrade was
done by downloading and extracting the tarball, nothing else (such as
patching). The older version of sqlparse would parse
SELECT
'
;
'
;
into two statements. Neither statement is complete due to the open quote
and this would cause an infinite loop. The bug is already fixed in the
newest version of sqlparse.
Change-Id: I7ce7c269769ae0cde3dc8ca386d0b0e11bea71c1
Reviewed-on: http://gerrit.cloudera.org:8080/102
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
The shell uses the thrift sasl client (thrift_sasl.py) for communication with
impalads when using ldap and/or kerberos. The sasl client currently makes calls
into sasl for every buffer repeatedly even when it is not necessary, resulting
in a significant performance degradation when using the impala-shell w/ kerberos
and/or ldap.
thrift_sasl.py was forked from hue at some point, and this change updates the
code to reflect the updated code as of hue commit a9898b4e815b3ec9918c5db65e0d9bd1d0ecdde0
which only calls into sasl when frames are encoded.
Change-Id: Ic2194d51c2c4470d48c617c054ba8e90053052f9
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5482
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
This patch sets the impala-shell's PYTHON_EGG_CACHE to a per-user temporary location. Not
doing this can sometimes lead to permission issues.
Change-Id: I6dda335e3b2a91b4d471f8794bed8c351d90c9ae
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5311
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
This patch adds a line to the signal handler that closes a
cancelled query if it is not already closed. It also improves
the cancellation test so it catches this problem in the future.
Change-Id: I1bb2a4a8fc3c3d40b8e4ba41f4b2bcf6d32bc297
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5303
Reviewed-by: Alex Leblang <alex.leblang@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5320
This patch makes sure that all error and warning messages are
fetched from Impala. Before this patch, error messages that occurred
after the first row was retrieved were not caught correctly.
Change-Id: I82a9a1c810e2ffabc7e56c996276082548ced805
(cherry picked from commit c357f530963673859be33726e8e5a896d77de097)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5263
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: jenkins
This patch enables the shell to pass query strings as-is to Impala's parser. In order to
preserve previous behaviour, we transform multi-line queries before writing them to
history. We replace EOL with an obscure ascii character (DLE), and re-apply the
transformation when reading it back from history.
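A minimal sketch of the history round trip, assuming a plain character
substitution. The helper names are hypothetical; DLE is ASCII 0x10.

```python
HISTORY_EOL = "\x10"  # ASCII DLE, unlikely to appear in a query


def to_history(query):
    """Collapse a multi-line query to one history line."""
    return query.replace("\n", HISTORY_EOL)


def from_history(entry):
    """Restore the original line breaks of a history entry."""
    return entry.replace(HISTORY_EOL, "\n")
```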
Change-Id: I021b9c3d50b03df73bea1afd6ce3ec6b413484e0
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4664
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Conflicts:
shell/impala_shell.py
socket.error is the only caught exception using the 'as e' syntax. This patch
fixes the syntax to be compatible with python2.4, which does not support it.
The shell cancellation tests exercise this path, so no tests need to be added.
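The commit does not say which replacement syntax was used. As an
illustration only, the sketch below shows one way to obtain the exception
object without 'as e', using sys.exc_info(), which works on old and new
interpreters alike. describe_socket_error is a hypothetical helper.

```python
import socket
import sys


def describe_socket_error(action):
    """Run action() and return a message if it raises socket.error."""
    try:
        action()
    except socket.error:
        # sys.exc_info()[1] is the exception instance currently being handled,
        # so neither 'except X as e' nor 'except X, e' is needed.
        err = sys.exc_info()[1]
        return "Socket error: %s" % (err,)
    return None
```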
Change-Id: I5588f25612953a28d0817081005d5770115ca106
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4638
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Earlier, if the user connected via the command line, the shell would only attempt to
create a new Impala Client instance if a connection did not exist. This resulted in the
shell not connecting to the user specified Impalad.
Change-Id: I74c291256d0c063f6324b01aa7336282e6969a4e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4392
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Follow up fix to IMPALA-1153. Ensure that the correct
CmdStatus is returned by the summary command. (ERROR for
invalid queries and SUCCESS even if summary is not available.)
Change-Id: Icf67164dc82f202ec15071541f6ed3b26e3ad7fb
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4089
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Restructured how the cmd control flow executes commands in postcmd,
removing the hack to make non-interactive mode work. Now there are
values to represent the different command execution statuses.
Change-Id: I149b65d8a64d63a978fed284f0ad0da95833149c
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3850
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
(cherry picked from commit b33dc4d10bcc3982dad43015343c91f1d277bb3f)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4096
Reviewed-by: Henry Robinson <henry@cloudera.com>
This is the first iteration of a kerberized development environment.
All the daemons start and use kerberos, with the sole exception of the
hive metastore. This is sufficient to test impala authentication.
When buildall.sh is run using '-kerberize', it will stop before
loading data or attempting to run tests.
Loading data into the cluster is known not to work at this time. The
root causes are that Beeline -> HiveServer2 -> MapReduce throws
errors, and Beeline -> HiveServer2 -> HBase has problems. These are
left for later work.
However, the impala daemons will happily authenticate using kerberos
both from clients (like the impala shell) and amongst each other.
This means that if you can get data into the mini-cluster, you could
query it.
Usage:
* Supply a '-kerberize' option to buildall.sh, or
* Supply a '-kerberize' option to create-test-configuration.sh, then
'run-all.sh -format', re-source impala-config.sh, and then start
impala daemons as usual. You must reformat the cluster because
kerberizing it will change all the ownership of all files in HDFS.
Notable changes:
* Added clean start/stop script for the llama-minikdc
* Creation of Kerberized HDFS - namenode and datanodes
* Kerberized HBase (and Zookeeper)
* Kerberized Hive (minus the MetaStore)
* Kerberized Impala
* Loading of data very nearly working
Still to go:
* Kerberize the MetaStore
* Get data loading working
* Run all tests
* The unknown unknowns
* Extensive testing
Change-Id: Iee3f56f6cc28303821fc6a3bf3ca7f5933632160
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4019
Reviewed-by: Michael Yoder <myoder@cloudera.com>
Tested-by: jenkins
An anti-join query in IMPALA-1175 revealed that the shell
mishandled the exception when the impalad would crash. This
fix prints the correct error message and sets the shell
state as disconnected.
Note there isn't a way to test this easily due to the difficulty
in coming up with a query that will crash the impalad.
Change-Id: I534674880db224a0d93dfd8bd2c081a12b65532b
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3998
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4080
This is a reorganization of the existing impala-shell.
The basic idea was to split up the shell into two components: one part
solely responsible for the CLI functionality, and another to represent
the impala client/connection that would interact with the Beeswax api and
execute queries, fetch results, etc.
One major change was to redo how the existing shell handled cancellation,
which was to create a thread for each rpc, so that Ctrl+C would not interrupt
the system calls and break the socket connection. In the new approach,
a new client instance is created to close the query and if the socket connection is
broken, the client reconnects. Cancellation currently works.
Change-Id: I0f371f68552c065b2317f967c6cf7483b44be3df
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3316
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4008
Commands with escaped single quotes would cause
the shell to enter an infinite loop while trying
to parse the command due to shlex not escaping single
quotes correctly. Once that change was implemented,
shlex would now ignore escaped single and double quotes
outside of closed quotes, so there needed to be a check for
that as well.
ALSO, implemented testing of commands in interactive mode.
Needed this to test these inputs, as command line input
cannot span multiple lines.
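To illustrate the underlying shlex behaviour (not the fix itself): in
POSIX mode the standard shlex module does not treat a backslash as an
escape inside single quotes, so an escaped single quote leaves a quote
open. is_complete is a hypothetical helper used only for this sketch.

```python
import shlex


def is_complete(command):
    """Return False when shlex reports an unterminated quote."""
    try:
        shlex.split(command)
    except ValueError:
        # shlex raises ValueError("No closing quotation") at end of input.
        return False
    return True
```

With this helper, is_complete(r"select 'it\'s'") is False, while the
same text in double quotes, where the backslash escape is honoured, is
reported as complete.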
Change-Id: Id67368944eeb9a73061bc3e90bd6cda73c9d9f64
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3408
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3893
Before, when the show_profiles -p option was enabled, the runtime profile
would be printed before the query was closed, preventing the query summary
from being printed. Now the profile is printed after the query is closed.
Change-Id: Icf7b10f7612d8016736aac70aa7b77265d391a98
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3770
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3821
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Optionally loads options from a file in the user's home directory, called
'.impalarc', though the path to another file can be passed in as a command-line
option. The file must have a case-sensitive [impala] header. Specifying
the option in the command line overwrites the config file's value
for the option for that instance of the shell. If an option is not
specified in the config file, its default value is used.
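The precedence described above (command line over config file over
defaults) can be sketched with the standard configparser module. This is
a Python 3 illustration under assumptions: load_options and the option
names in the usage example are hypothetical, and the text of the file is
passed in directly instead of being read from '.impalarc'.

```python
import configparser


def load_options(config_text, defaults, cli_overrides):
    """Resolve options: defaults, then [impala] section, then command line."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    options = dict(defaults)
    # Section names are case-sensitive, so '[IMPALA]' is not picked up.
    if parser.has_section("impala"):
        for key, value in parser.items("impala"):
            options[key] = value
    # Anything given on the command line wins for this shell instance.
    options.update(cli_overrides)
    return options
```

For example, with a file containing "[impala]" followed by
"impalad = host1:21000", the impalad value comes from the file unless
the same option is also passed on the command line.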
Change-Id: I218da2c1e10308c5b8729883fa625f0c284397a7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2956
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3629
There was an issue with the previous fix to IMPALA-1059
if the user tried to reconnect within the shell after
having passed in a database via the -d option. The
passed database would be doubly backticked. This makes
the backticking of the argument idempotent.
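Idempotent backticking can be sketched as below. The function name is
hypothetical and the sketch ignores names that themselves contain
backticks.

```python
def backtick(name):
    """Wrap name in backticks unless it is already wrapped."""
    if len(name) >= 2 and name.startswith("`") and name.endswith("`"):
        return name
    return "`%s`" % name
```

Applying the function twice gives the same result as applying it once,
so a reconnect cannot double the backticks.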
Change-Id: I6eaed997c2be73d8659a2a12046ce393b97ec82c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3467
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3502
If not backticked, arguments such as parquet are interpreted as
keywords, when it is possible a database by that name exists.
This could have been avoided via single quotes around backticks: -d '`parquet`'
Otherwise, -d `parquet` throws a commandline error.
In interactive mode, backticks alone (ex. use `parquet`) will pass the
name as an identifier rather than a keyword.
Change-Id: I24b43eeeb6b4bfda5388165856788a20b64bc2ba
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3307
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3500
Users can now type 'summary' in the Impala shell after a query executes
to get a breakdown of the work done by each part of the query plan.
Change-Id: Ia6a43429ffc7778f3c2c8fcbf45d83828263c2ab
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2963
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
(cherry picked from commit 9b98d42acb14d43a64832767528ee572eac4979b)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2995
Options displayed with 'set' command. Default values distinguished
from set values by square brackets.
Change-Id: Iacf0574555aab78aa0ba2008ceb8776d372a57a5
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2913
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Without this change, the shell would always print the error log twice
for successful non-insert queries (once in __execute_query() and once
in __fetch()).
Change-Id: I0ab038230df897559b30feaea34778ea72988bc3
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2815
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 095dba7f395491db03daf19ff3bff2e2b4640ee4)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2908
COMPUTE STATS is an async DDL command. When COMPUTE STATS fails it will set the
query status of the QueryExecState properly, but the original Beeswax::query() RPC
won't throw. The Impala shell sometimes did not pick up and display the
query status because no RPC actually threw. To fix this, I modified
Beeswax::get_log() to include the query status if it is not ok. The shell looks
for a special prefix to distinguish the query status from the runtime state error log.
Change-Id: I0d9dbf0801629a37de22ea4ebb6d2e5d53b836ef
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1899
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2063
The problem was that we were setting a flag marking the last_query_handle as closed, but
were not resetting the flag before the next query. This caused the first query to
be closed properly, but subsequent queries would not be closed. The fix is to change
where the flag is reset to the same place as where we assign last_query_handle.
Added a test case.
Change-Id: I870a96789489bfe4f388910b808409cd0584af8a
(cherry picked from commit 1439151af5b63112b0dd631fac9c7ab4d43bba37)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1976
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
The problem was that we were deleting the version.info file because the default
of gen_build_version.py recently changed from --noclean to --clean.
Also fixed a bug in the shell version generation and made debugging a bit easier
by dumping the contents of version.info whenever it is generated.
Change-Id: I764d01c9e46eed1bd39de79bf076c15afa599486
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1901
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
(cherry picked from commit fa673b4d3342fc825ee7fa942bd254234d222906)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1910
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
This change makes the fetch rpc interruptible. If the user cancels the query in
the middle of a fetch, the shell reconnects to the impalad and closes the
query. It also includes some code consolidation.
Change-Id: Iaaf0dfd4cba9ce2557e4a7d0447bc9c3ffda5e29
Reviewed-on: http://gerrit.ent.cloudera.com:8080/717
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Reindent is very expensive for large queries; removing it makes query parsing performant
with no loss of functionality.
Change-Id: I4d2bec6bbabaf949aa0f64193a6e2b3b3725407e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/966
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
The shell uses an external module called sqlparse to strip the comments from a query file.
When sqlparse.format() is invoked, it runs several grouping functions on the
tokenized query text; some of these methods are very slow, and not needed for comment
removal. This change restricts sqlparse to only invoke the grouping function for removing
comments.
Change-Id: I3a067187667fcd3cd331156a325960a3de2db9c2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/944
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins