This patch allows you to write SOURCE <file> or SRC <file>, and have the
shell read the file and execute all the queries in it.
Change-Id: Ib05df3e755cd12e9e9562de6b353857940eace03
Reviewed-on: http://gerrit.cloudera.org:8080/2663
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
The SET command has been extended with the following syntax, to allow
setting of variables in the Impala Shell:
SET VAR:<variable_name>=<value>
The UNSET command has also been modified to allow:
UNSET VAR:<variable_name>
This patch builds on the changes in IMPALA-2179. The main change for
this patch was to ensure that all SET commands are processed by the
shell, rather than being send to the front end as a query. For this
I had to modify the command sanitization function to remove comments
that happen in front of a SET command.
Comments can be a can of worms to parse, so I tried to be as strict
as possible to avoid collateral effects. Comments are only removed
if they happen right at the beginning of the line AND before a SET
command. NO other comments are touched, including comments before,
after or within queries.
Change-Id: I87e07385122187ab8d324346499896a3dfbbafe6
Reviewed-on: http://gerrit.cloudera.org:8080/679
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This patch adds the command line option `--var` to allow the user to set
variable to be used in commands within the shell. It does *not* implement the
setting of variables through the SET command, as Hive does. This extension will
be implemented separately on IMPALA-2180.
The syntax for specifying a parameter in the command line is --var=KEY=VAL, as
for example: --var=start_date=20150101
Variables are textually replaced by their value in the Impala shell commands.
The substitution work similarly for interactive sessions as well as for command
line queries and/or scripts (-q and -f options, respectively).
Variables can be referenced as ${VAR:VAR_NAME} (case-insensitive). The form
${HIVEVAR:VAR_NAME} can also be used for compatibility with Hive scripts.
To prevent any of the reference expressions above from being replaced you can
escape them with a backslash (e.g. \${VAR:VAR_NAME} and \${HIVEVAR:VAR_NAME}).
The Impala shell's SET command now also reports the set variables and their
values.
Change-Id: Ia491fae91256334bb60c9066d119fe9a1e9779dd
Reviewed-on: http://gerrit.cloudera.org:8080/611
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Adds a new option --ldap_password_cmd that takes a string which is
executed as a shell command. The stdout results are used as the LDAP
password for this shell session.
Tests are added for the negative case (where the command fails for some
reason), but without tests for successful LDAP connections we can't test
the case where the password is correct.
Change-Id: Ib0362be5e167ff752e764ad2152c4c4b679f83c2
Reviewed-on: http://gerrit.cloudera.org:8080/1542
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
The original error reporting relied on $0 being accessible from the
current working dir, which failed if a script changed the working dir
and $0 was relative. This updates the error reporting command to cd back
to the original dir before accessing $0.
Change-Id: I2185af66e35e29b41dbe1bb08de24200bacea8a1
Reviewed-on: http://gerrit.cloudera.org:8080/1666
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Use -q flag so that less output is produced when building the eggs.
Change-Id: Ic356d9a84b30d2b1d8ba02558b2565e22bbfcda2
Reviewed-on: http://gerrit.cloudera.org:8080/1742
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Remove a hack in impala-shell that suppressed any error messages that
included the string "Cancelled". This originally was necessary since
user-initiated cancellation messages were propagated back to the client,
but is no longer necessary. Certain errors that occur asynchronously
were suppressed, because the error is propagated by cancelling the
query.
Confirmed that no error messages are printed for manually cancelled
queries, and that error messages that were previously suppressed (from
my IMPALA-2298 patch) now show in the shell.
Change-Id: Iac53b1307768cbb07640ddc88b152ae71c71beab
Reviewed-on: http://gerrit.cloudera.org:8080/1529
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Changes:
1) Consistently use "set -euo pipefail".
2) When an error happens, print the file and line.
3) Consolidated some of the kill scripts.
4) Added better error messages to the load data script.
5) Changed use of #!/bin/sh to bash.
Change-Id: I14fef66c46c1b4461859382ba3fd0dee0fbcdce1
Reviewed-on: http://gerrit.cloudera.org:8080/1620
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Impala shell cannot get child query handle so it cannot
query live progress for COMPUTE STATS query. Disable live
progress callback for compute stats query.
Change-Id: I2d2f342a805905a4fa868686e7c9e9362c2c2223
Reviewed-on: http://gerrit.cloudera.org:8080/1109
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
This patch addresses the problem by not importing readline if the shell is used in
non-interactive mode. More information on the readline bug can be found
here: https://bugs.python.org/issue19884
Change-Id: Ia9dae6308c3807d3fd323449766bf7e7371f800f
Reviewed-on: http://gerrit.cloudera.org:8080/511
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
This patch calls PingImpalaService() at the beginning of each command
loop (if the shell is currently connected). If the call fails, the shell
will try and reconnect. Reconnecting is best-effort - if it fails, the
command is processed anyway so as not to interfere with any commands
that might still give useful output in a disconnected state.
Change-Id: I37cb2f4fc235fedff16d48ad5125b9a30bd7dfd0
Reviewed-on: http://gerrit.cloudera.org:8080/547
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
This patch changes the behaviour of the Impala shell to refuse to
attempt an LDAP-authenticated connection to Impala unless SSL/TLS is
configured.
A new flag --auth_creds_in_clear_ok is added to suppress this
behaviour. This is similar to Impala's --ldap_passwords_in_clear_ok
flag. The shell will also now print a warning if an insecure
configuration is used.
Change-Id: Ide25d8dd881a61b9f08900112466c430da64a038
Reviewed-on: http://gerrit.cloudera.org:8080/546
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This patch adds a 'tip-of-the-day' message to the shell's intro header,
and a TIP command that prints out a random tip. This might be a good way
to make advanced or little-known functionality of both Impala and the
shell better known to users.
Change-Id: I987b386f8c96f8a75ccd5cd9197a8f2981c8bf43
Reviewed-on: http://gerrit.cloudera.org:8080/586
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This patch adds a way to allow for dynamic progress reporting in the
shell. There are two new command line flags for the shell
--live_progress - will print the completed vs total # of scan ranges
--live_summary - prints an updated exec summary
In addition to the command line flags, these options can be set from
within the shell using:
set LIVE_SUMMARY=True
set LIVE_PROGRESS=True
The new options will be listed under shell options. Both reports will be
updated at most every second, for longer running queries it will be
adjusted to the time between two RPC calls to get the query status. To
provide this information in the ExecSummary, the Thrift structure for
the ExecSummary was extended to contain a progress indicator. The output
is printed to stderr and only available in interactive mode.
An example video is available here:
https://asciinema.org/a/5wi7ypckx4ol4ha1hlg3e3q1k
Change-Id: I70b2ab5fa74dc2ba5bc3b338ef13ddc6ccf367d2
Reviewed-on: http://gerrit.cloudera.org:8080/508
Tested-by: Internal Jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
The '-f' option in Impala shell is used to read from a file containing SQL queries.
Now, additional support is added to read from STDIN by using:
"-f -"
An additional bug was fixed in the test script test_shell_commandline.py where certain
tests(test_default_db and test_unsecure_message) would hang indefinitely due to the
subprocess(impala-shell) waiting for user input. Fixed by piping STDIN to the subprocess
which sends an implicit EOF that closes the impala-shell once the test is completed.
Change-Id: I9a2682e086a3345e089f3e9db7cc049ce3d2c19a
Reviewed-on: http://gerrit.cloudera.org:8080/479
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
This patch introduces the concept of error codes for errors that are
recorded in Impala and are going to be presented to the client. These
error codes are used to aggregate and group incoming error / warning
messages to reduce the spill on the shell and increase the usefulness of
the messages. By splitting the message string from the implementation,
it becomes possible to edit the string independently of the code and
pave the way for internationalization.
Error messages are defined as a combination of an enum value and a
string. Both are defined in the Error.thrift file that is automatically
generated using the script in common/thrift/generate_error_codes.py. The
goal of the script is to have a central understandable repository of
error messages. Adding new messages to this file will require rebuilding
the thrift part. The proxy class ErrorMessage is responsible to
represent an error and capture the parameters that are used to format
the error message string.
When error messages are recorded they are recorded based on the
following algorithm:
- If an error message is of type GENERAL, do not aggregate this message
and simply add it to the total number of messages
- If an error messages is of specific type, record the first error
message as a sample and for all other occurrences increment the count.
- The coordinator will merge all error messages except the ones of type
GENERAL and display a count.
For example, in the case of the parquet file spanning multiple blocks
the output will look like:
Parquet files should not be split into multiple hdfs-blocks.
file=hdfs://localhost:20500/fid.parq (1 of 321 similar)
All messages are always logged to VLOG. In the coordinator error
messages are merged across all backends to retain readability in the
case of large clusters.
The current version of this patch adds these new error codes to some of
the most important error messages as a reference implementation.
Change-Id: I1f1811631836d2dd6048035ad33f7194fb71d6b8
Reviewed-on: http://gerrit.cloudera.org:8080/39
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
Upgrading sqlparse ended up trading one bug for another. The new bug is
not fixed upstream, I sent a patch. The problem is '\\' is not
considered a terminated string and we use this in the phrase "fields
escaped by '\\'" when creating tables.
Change-Id: Id57081f5a96e997afd3aa9b26dca23f627488fc3
Reviewed-on: http://gerrit.cloudera.org:8080/117
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
The only thing this commit does is upgrade sqlparse. The upgrade was
done by downloading and extracting the tarball, nothing else (such as
patching). The older version of sqlparse would parse
SELECT
'
;
'
;
into two statements. Neither statement is complete due to the open quote
and this would cause an infinite loop. The bug is already fixed in the
newest version of sqlparse.
Change-Id: I7ce7c269769ae0cde3dc8ca386d0b0e11bea71c1
Reviewed-on: http://gerrit.cloudera.org:8080/102
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
The shell uses the thrift sasl client (thrift_sasl.py) for communication with
impalads when using ldap and/or kerberos. The sasl client currently makes calls
into sasl for every buffer repeatedly even when it is not necessary, resulting
in a significant performance degradation when using the impala-shell w/ kerberos
and/or ldap.
thrift_sasl.py was forked from hue at some point, and this change updates the
code to reflect the updated code as of hue commit a9898b4e815b3ec9918c5db65e0d9bd1d0ecdde0
which only calls into sasl when frames are encoded.
Change-Id: Ic2194d51c2c4470d48c617c054ba8e90053052f9
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5482
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
This patch sets the impala-shell's PYTHON_EGG_CACHE to a per-user temporary location. Not
doing this can sometimes lead to permission issues.
Change-Id: I6dda335e3b2a91b4d471f8794bed8c351d90c9ae
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5311
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
This patch adds a line to the signal handler that closes
queries that have been cancelled. This patch closes the
cancelled query in the signal handler if it is not already
closed. This patch also improves the cancellation test so
it catches this problem in the future.
Change-Id: I1bb2a4a8fc3c3d40b8e4ba41f4b2bcf6d32bc297
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5303
Reviewed-by: Alex Leblang <alex.leblang@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5320
This patch will make sure that all error and warning messages are
fetched from impala. Before this patch, error messages that would occur
after the first row was retrieved would not be catched correctly.
Change-Id: I82a9a1c810e2ffabc7e56c996276082548ced805
(cherry picked from commit c357f530963673859be33726e8e5a896d77de097)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5263
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: jenkins
This patch enables the shell to pass query strings as-is to Impala's parser. In order to
preserve previous behaviour, we transform multi-line queries before writing them to
history. We replace EOL with an obscure ascii character (DLE), and re-apply the
transformation when reading it back from history.
Change-Id: I021b9c3d50b03df73bea1afd6ce3ec6b413484e0
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4664
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Conflicts:
shell/impala_shell.py
socket.error is the only caught exception using the 'as e' syntax. This patch
fixes the syntax to be compatible with python2.4, which does not support it.
The shell cancellation tests exercise this path, so no tests need to be added.
Change-Id: I5588f25612953a28d0817081005d5770115ca106
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4638
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Earlier, if the user connected via the command line, the shell would only attempt to
create a new Impala Client instance if a connection did not exist. This resulted in the
shell not connecting to the user specified Impalad.
Change-Id: I74c291256d0c063f6324b01aa7336282e6969a4e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4392
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Follow up fix to IMPALA-1153. Ensure that the correct
CmdStatus is returned by the summary command. (ERROR for
invalid queries and SUCCESS even if summary is not available.)
Change-Id: Icf67164dc82f202ec15071541f6ed3b26e3ad7fb
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4089
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Restructured how the cmd control flow executes commands in postcmd,
removing the hack to make non-interactive mode work. Now there are
values to represent the different command execution statuses.
Change-Id: I149b65d8a64d63a978fed284f0ad0da95833149c
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3850
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
(cherry picked from commit b33dc4d10bcc3982dad43015343c91f1d277bb3f)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4096
Reviewed-by: Henry Robinson <henry@cloudera.com>
This is the first iteration of a kerberized development environment.
All the daemons start and use kerberos, with the sole exception of the
hive metastore. This is sufficient to test impala authentication.
When buildall.sh is run using '-kerberize', it will stop before
loading data or attempting to run tests.
Loading data into the cluster is known to not work at this time, the
root causes being that Beeline -> HiveServer2 -> MapReduce throws
errors, and Beeline -> HiveServer2 -> HBase has problems. These are
left for later work.
However, the impala daemons will happily authenticate using kerberos
both from clients (like the impala shell) and amongst each other.
This means that if you can get data into the mini-cluster, you could
query it.
Usage:
* Supply a '-kerberize' option to buildall.sh, or
* Supply a '-kerberize' option to create-test-configuration.sh, then
'run-all.sh -format', re-source impala-config.sh, and then start
impala daemons as usual. You must reformat the cluster because
kerberizing it will change all the ownership of all files in HDFS.
Notable changes:
* Added clean start/stop script for the llama-minikdc
* Creation of Kerberized HDFS - namenode and datanodes
* Kerberized HBase (and Zookeeper)
* Kerberized Hive (minus the MetaStore)
* Kerberized Impala
* Loading of data very nearly working
Still to go:
* Kerberize the MetaStore
* Get data loading working
* Run all tests
* The unknown unknowns
* Extensive testing
Change-Id: Iee3f56f6cc28303821fc6a3bf3ca7f5933632160
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4019
Reviewed-by: Michael Yoder <myoder@cloudera.com>
Tested-by: jenkins
An anti-join query in IMPALA-1175 revealed that the shell
mishandled the exception when the impalad would crash. This
fix prints the correct error message and sets the shell
state as disconnected.
Note there isn't a way to test this easily due to the difficulty
in coming up with a query that will crash the impalad.
Change-Id: I534674880db224a0d93dfd8bd2c081a12b65532b
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3998
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4080
This is a reorganization of the existing impala-shell.
The basic idea was to split up the shell into two components: one part
soley responsible for the CLI functionality, and another to represent
the impala client/connection that would interact with the Beeswax api and
execute queries, fetch results, etc.
One major change was to redo how the existing shell handled cancellation,
which was to create a thread for each rpc, so that Ctrl+C would not interrupt
the system calls and break the socket connection. In the new approach,
a new client instance is created to close the query and if the socket connection is
broken, the client reconnects. Cancellation currently works.
Change-Id: I0f371f68552c065b2317f967c6cf7483b44be3df
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3316
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4008
Commands with escaped single quotes would cause
the shell to enter an infinite loop while trying
to parse the command due to shlex not escaping single
quotes correctly. Once that change was implemented,
shlex would now ignore escaped single and double quotes
outside of closed quotes, so there needed to be a check for
that as well.
ALSO, implemented testing of commands in interactive mode.
Needed this to test these inputs, as command line input
cannot span multiple lines.
Change-Id: Id67368944eeb9a73061bc3e90bd6cda73c9d9f64
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3408
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3893
Before, when the show_profiles -p option was enabled, the runtime profile
would be printed before the query was closed, preventing the query summary
from being printed. Now the profile is printed after the query is closed.
Change-Id: Icf7b10f7612d8016736aac70aa7b77265d391a98
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3770
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3821
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Optionally loads options from a file in the user's home directory, called
'.impalarc', though the path to another file can be passed in as a command-line
option. The file must have a case-sensitive [impala] header. Specifying
the option in the command line overwrites the config file's value
for the option for that instance of the shell. If an option is not
specified in the config file, its default value is used.
Change-Id: I218da2c1e10308c5b8729883fa625f0c284397a7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2956
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3629
There was an issue with the previous fix to IMPALA-1059
if the user tried to reconnect within the shell after
having passed in a database via the -d option. The
passed database would be doubly backticked. This makes
the backticking of the argument idempotent.
Change-Id: I6eaed997c2be73d8659a2a12046ce393b97ec82c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3467
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3502
If not backticked, arguments such as parquet are interpreted as
keywords, when it is possible a database by that name exists.
This could have been avoided via single quotes around backticks: -d '`parquet`'
Otherwise, -d `parquet` throws a commandline error.
In interactive mode, backticks alone (ex. use `parquet`) will pass the
name as an identifier rather than a keyword.
Change-Id: I24b43eeeb6b4bfda5388165856788a20b64bc2ba
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3307
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3500
Users can now type 'summary' in the Impala shell after a query executes
to get a breakdown of the work done by each part of the query plan.
Change-Id: Ia6a43429ffc7778f3c2c8fcbf45d83828263c2ab
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2963
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
(cherry picked from commit 9b98d42acb14d43a64832767528ee572eac4979b)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2995