This change fixes a regression introduced by "IMPALA-2195 Improper
handling of comments in queries."
The Impala Shell parses input text into several strings using the
sqlparse library. One of the returned strings is the sql command, which
is used to determine the correct do_<command> method to call. Another of
the returned strings is the leading comment, which is a comment that
appears before legal sql text.
Python 2 has two string types, str (bytes) and unicode. The strings
returned from the sqlparse library are unicode objects. Impala Shell
encodes the sql command string as utf-8 before using it.
If the Impala Shell needs to send the sql command to an Impala
Coordinator then it (re)constructs the query out of the strings
returned by the sqlparse library. This query is sent to the Coordinator
via Beeswax protocol. The query is converted to an ascii string before
being sent. The conversion can fail if the leading comment string
contains Unicode characters, which can't be directly converted to ascii.
So the trigger for the bug is that the leading comment contains Unicode.
The fix is that the leading comment string should be converted to utf-8
in the same way as the sql command.
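A minimal sketch of the failure and the fix, written for Python 3, where
str plays the role of Python 2's unicode. The helper names to_utf8 and
build_query and the sample strings are hypothetical; they are not taken
from impala_shell.py.

```python
# Hypothetical illustration; these names are not from impala_shell.py.
def to_utf8(text):
    """Encode one parsed piece as UTF-8 bytes."""
    return text.encode("utf-8")


def build_query(leading_comment, sql_command):
    """Encode the comment the same way as the command, then join them."""
    pieces = [to_utf8(p) for p in (leading_comment, sql_command) if p]
    return b"\n".join(pieces)


comment = "-- caf\u00e9 report"  # leading comment with a non-ASCII character
command = "select 1"

# Converting straight to ASCII fails, which mirrors the regression.
try:
    (comment + "\n" + command).encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

query_bytes = build_query(comment, command)
```

Encoding every piece the same way means no implicit ASCII conversion is
needed when the pieces are joined.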
TESTING:
Ran all end-to-end tests.
Added two test cases to tests/shell/test_shell_interactive.py
Change-Id: I8633935b6e0ca33594afd32ad242779555e09944
Reviewed-on: http://gerrit.cloudera.org:8080/12812
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The FE has several repeated blocks of code to set up the lexer and
parser, to parse, and to handle errors. This patch moves this code
into a static function that can be used in place of the copies.
At the same time, provide a specific ParseException to replace the
generic Exception thrown by the parser to allow easier error handling.
Some of the uses of the parser assume the return value is Object,
others that the value is ParseNode and still others that it is
StatementBase. Since the actual return is StatementBase, declare
that as the return value of the new static method to clearly state the
actual output.
Testing: This is just a refactoring. Reran all FE tests to ensure no
regressions.
Change-Id: I174c59d38542ff311c6c3dc10cf3ad4e40f8b30e
Reviewed-on: http://gerrit.cloudera.org:8080/12016
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_reconnect launches a shell that connects to one impalad in the
minicluster then reconnects to a different impalad while checking that
the impalad's open session metric changes accordingly.
To do this, the test gets the number of open sessions at the start of
the test and then expects that the number of sessions will have
increased by 1 on the impalad that the shell is currently connected
to.
This can be a problem if there is a session left over from another
test that is still active when test_reconnect starts but exits while
it's running.
test_reconnect is already marked to run serially, so there shouldn't
be any other sessions open while it runs anyway. The solution is to
wait at the start of the test until any sessions left over from other
tests have exited.
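The wait described above could be sketched as a small polling helper.
This is an illustration under assumptions: wait_for_value and
open_sessions are hypothetical names, and a real test would read the
impalad's open session metric instead of the canned readings used here.

```python
import time


def wait_for_value(get_value, expected, timeout_s=10.0, interval_s=0.1):
    """Poll get_value() until it returns expected; fail loudly on timeout."""
    deadline = time.time() + timeout_s
    while True:
        actual = get_value()
        if actual == expected:
            return actual
        if time.time() >= deadline:
            raise AssertionError(
                "expected %r, last value was %r" % (expected, actual))
        time.sleep(interval_s)


# Stand-in for a metric lookup: a leftover session exits after a few polls.
_readings = iter([2, 1, 0])


def open_sessions():
    return next(_readings, 0)


# Only take the baseline once no sessions from earlier tests remain.
baseline = wait_for_value(open_sessions, 0, timeout_s=2.0, interval_s=0.01)
```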
Testing:
- Ran the test in an environment where the timing was previously
causing it to fail almost deterministically and it now passes.
Change-Id: I3017ca3bf7b4e33440cffb80e9a48a63bec14434
Reviewed-on: http://gerrit.cloudera.org:8080/12045
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_ssl has logic that waits for the number of in-flight queries to
be 1. However, the logic for wait_for_num_in_flight_queries(1) only
waits for the condition to be true for a period of time and does not
throw an exception when the time has elapsed and the condition is not
met. In other words, the logic in test_ssl that loops while the number
of in-flight queries is 1 never gets executed. I was able to simulate
this issue by making Impala shell take much longer to start.
Prior to this patch, in the event that Impala shell took much longer to
start, the test started sending the commands to Impala shell even when
Impala shell was not ready to receive commands. The patch fixes the
issue by waiting until Impala shell is connected. The patch also adds
asserts in the other places that call wait_for_num_in_flight_queries and
updates the default behavior for Impala shell to wait until it is
connected.
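The pitfall can be illustrated with a sketch of a wait helper that
returns a boolean instead of raising. The function name matches the one
mentioned above, but its signature here (a get_count callable plus
timeouts) is an assumption made to keep the example self-contained, and
require_in_flight is hypothetical.

```python
import time


def wait_for_num_in_flight_queries(get_count, expected, timeout_s=1.0,
                                   interval_s=0.01):
    """Return True once get_count() == expected, False if time runs out."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if get_count() == expected:
            return True
        time.sleep(interval_s)
    return False


def never_ready():
    return 0


# Pitfall: ignoring the return value lets the test continue silently.
wait_for_num_in_flight_queries(never_ready, 1, timeout_s=0.05)


def require_in_flight(get_count, expected, timeout_s=1.0):
    """Safer: assert on the result so a timeout fails the test."""
    assert wait_for_num_in_flight_queries(get_count, expected, timeout_s), \
        "timed out waiting for %d in-flight queries" % expected
```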
Testing:
- Ran core and exhaustive tests several times on CentOS 6 without any
issue
Change-Id: I9805269d8b806aecf5d744c219967649a041d49f
Reviewed-on: http://gerrit.cloudera.org:8080/12047
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The logic that checks whether a CTE is DML or SELECT uses shlex to
split the statement into tokens and checks whether any token matches
the DML regular expression. Before this patch, shlex was called with
posix=True, which means the quotes are stripped from the token, e.g.
select a from foo where a = 'update' becomes
['select', 'a', 'from', 'foo', 'where', 'a', '=', 'update'].
As a result, any token that contains "insert", "delete", "upsert", or
"update" is categorized as DML even though the token is part of a
string literal value.
This patch fixes the issue by setting posix=False in shlex, which
preserves the quotes. For example:
['select', 'a', 'from', 'foo', 'where', 'a', '=', "'update'"]
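The difference between the two shlex modes can be reproduced with the
standard library. In this sketch DML_RE and looks_like_dml are
hypothetical stand-ins for the shell's actual pattern and check; only
the shlex behavior itself is the library's.

```python
import re
import shlex

# Hypothetical stand-in for the shell's DML regular expression.
DML_RE = re.compile(r"^(insert|upsert|update|delete)$", re.IGNORECASE)


def looks_like_dml(statement, posix):
    """Return True if any whitespace-separated token matches DML_RE."""
    tokens = shlex.split(statement, posix=posix)
    return any(DML_RE.match(tok) for tok in tokens)


stmt = "select a from foo where a = 'update'"

# posix=True strips the quotes, so the literal looks like a keyword.
posix_tokens = shlex.split(stmt, posix=True)

# posix=False keeps the quotes as part of the token.
non_posix_tokens = shlex.split(stmt, posix=False)
```

With the quotes preserved, the literal no longer matches the anchored
pattern, while a real DML keyword in a CTE still does.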
Testing:
- Added a new shell test
- Ran all shell tests
Change-Id: I011b8e73a0477ac6b2357725452458f972785ae7
Reviewed-on: http://gerrit.cloudera.org:8080/12052
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes Ctrl+C handling so that cancelling a non-running
query behaves like it does in a Linux shell.
Before (pressing Ctrl+C does not do anything):
[localhost:21000] default> select
After (pressing Ctrl+C cancels the query and starts a new prompt):
[localhost:21000] default> select^C
[localhost:21000] default>
Testing:
- Added a new cancellation test
- Ran all shell E2E tests
Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d
Reviewed-on: http://gerrit.cloudera.org:8080/11990
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This fixes flakiness in test_multiline_queries_in_history wherein a
part of the shell prompt would be absorbed in a previous regex search
that would ultimately result in the failure of the subsequent regex
search that looks for the prompt.
Also fixed a few formatting issues flagged by flake8.
Change-Id: If7474f832a88bc29b321f21b050c9665294e63d5
Reviewed-on: http://gerrit.cloudera.org:8080/11175
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds an IMPALA_HISTFILE environment variable (and --history_file
argument) to the shell which overrides the default location of
~/.impalahistory for the shell history. The shell tests now override
this variable to /dev/null so they don't store history. The tests that
need history use a pytest fixture to use a temporary file for their
history. This allows them to run in parallel without stomping
on each other's history.
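A per-test history file could be set up along these lines. The sketch
uses a plain context manager rather than the pytest fixture mentioned
above so that it runs without pytest; temp_history_file is a
hypothetical name, while IMPALA_HISTFILE is the variable this change
introduces.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temp_history_file():
    """Point IMPALA_HISTFILE at a private temp file for one test."""
    previous = os.environ.get("IMPALA_HISTFILE")
    with tempfile.NamedTemporaryFile(suffix=".history") as hist:
        os.environ["IMPALA_HISTFILE"] = hist.name
        try:
            yield hist.name
        finally:
            # Restore whatever was set before the test ran.
            if previous is None:
                os.environ.pop("IMPALA_HISTFILE", None)
            else:
                os.environ["IMPALA_HISTFILE"] = previous
```

Because each test gets its own file, parallel runs never read or write
the same history.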
This also fixes a couple of flaky tests which were previously missing the
"execute_serially" annotation -- that annotation is no longer needed
after this fix.
A couple of the tests still need to be executed serially because they
look at metrics such as the number of executed or running queries, and
those metrics are unstable if other tests run in parallel.
I tested this by running:
./bin/impala-py.test tests/shell/test_shell_interactive.py \
-m 'not execute_serially' \
-n 80 \
--random
... several times in a row on an 88-core box. Prior to the change,
several would fail each time. Now they pass.
Change-Id: I1da5739276e63a50590dfcb2b050703f8e35fec7
Reviewed-on: http://gerrit.cloudera.org:8080/11045
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Todd Lipcon <todd@apache.org>
This change adds a new query option "timezone" which
defines the timezone used for utc<->local conversions.
The main goal is to simplify testing, but I think that
some users may also find it useful so it is added as a
"general" query option.
Examples:
set timezone=UTC;
set timezone="Europe/Budapest"
The timezones are validated, but as query options are not
sent to the coordinator immediately, the error checking
will only happen when running a query.
Leading/trailing " and ' characters are stripped because the
/ character cannot be entered unquoted in some contexts.
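The quote stripping amounts to the following, shown here in Python
purely as an illustration. normalize_timezone is a hypothetical name and
this is not the code that performs the stripping in Impala.

```python
def normalize_timezone(raw):
    """Strip surrounding whitespace and leading/trailing quote characters."""
    return raw.strip().strip("\"'")


quoted = normalize_timezone('"Europe/Budapest"')
plain = normalize_timezone("UTC")
```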
Currently the timezone has effect in the following cases:
-function now()
-conversions between unix time and timestamp if flag
use_local_tz_for_unix_timestamp_conversions is true
-reading parquet timestamps written by Hive if flag
convert_legacy_hive_parquet_utc_timestamps is true
In the near future, the Parquet timestamp isAdjustedToUTC
property will be supported, which will decide whether
to do utc->local conversion on a per file+column basis.
This conversion will also be affected.
Testing:
- Extended test_local_tz_conversion.py to actually
test utc<->local conversion. Until now the effect
of flag use_local_tz_for_unix_timestamp_conversions
was practically untested.
- Added a shell test to check that the default of the
query option is the system's timezone.
- Added a shell test to check timezone validation.
Change-Id: I73de86eff096e1c581d3b56a0d9330d686f77272
Reviewed-on: http://gerrit.cloudera.org:8080/11064
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change provides a way to modify command-line options like -B,
--output_file and --delimiter inside impala-shell without quitting
the shell and then restarting again.
Also fixed IMPALA-7286: command "unset" does not work for shell options
Testing:
Added tests for all new options in test_shell_interactive.py
Tested on Python 2.6 and Python 2.7
Change-Id: Id8d4487c24f24806223bfd5c54336914e3afd763
Reviewed-on: http://gerrit.cloudera.org:8080/10900
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch removes write support for unsupported formats like Sequence,
Avro and compressed text. Also, the related query options
ALLOW_UNSUPPORTED_FORMATS and SEQ_COMPRESSION_MODE have been migrated
to the REMOVED query options type.
Testing:
Ran exhaustive build.
Change-Id: I821dc7495a901f1658daa500daf3791b386c7185
Reviewed-on: http://gerrit.cloudera.org:8080/10823
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes an issue where parseline is unable to deduce the
correct command when a statement has a leading comment.
Before:
> -- comment
> insert into table t values(100);
Fetched 1 row(s) in 0.01s
After:
> -- comment
> insert into table t values(100);
Modified 1 row(s) in 0.01s
Before (FE syntax error):
> /*comment*/ help;
After (show help correctly):
> /*comment*/ help;
Testing:
- Added shell tests
- Ran end-to-end shell tests on Python 2.6 and Python 2.7
Change-Id: I7ac7cb5a30e6dda73ebe761d9f0eb9ba038e14a7
Reviewed-on: http://gerrit.cloudera.org:8080/9933
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
This patch fixes the issue where non-matching quotes inside comments
will cause the shell to not terminate.
The fix is to strip any SQL comments before sending to shlex since shlex
does not understand SQL comments and will raise an exception when it
sees unmatched quotes regardless of whether the quotes are in the
comments or not.
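The shlex behavior is easy to reproduce. In the sketch below,
strip_sql_comments is a naive, hypothetical regex-based stand-in for
whatever comment stripping the shell actually uses; it ignores the case
of comment markers inside string literals.

```python
import re
import shlex

# Naive patterns for illustration only: they do not account for comment
# markers that appear inside string literals.
_LINE_COMMENT = re.compile(r"--[^\n]*")
_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_sql_comments(sql):
    """Remove /* ... */ and -- comments before tokenizing."""
    return _LINE_COMMENT.sub("", _BLOCK_COMMENT.sub(" ", sql))


stmt = "select 1 -- it's a comment\nfrom t"

# shlex sees the apostrophe as an opening quote and never finds its match.
try:
    shlex.split(stmt)
    raw_ok = True
except ValueError:
    raw_ok = False

# With the comment removed, tokenizing succeeds.
tokens = shlex.split(strip_sql_comments(stmt))
```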
Testing:
- Added new shell tests
- Ran all end-to-end shell tests on Python 2.6 and Python 2.7
Change-Id: I2feae34026a7e63f3d31489f757f093a73ca5d2c
Reviewed-on: http://gerrit.cloudera.org:8080/10541
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes the issue where non-matching quotes inside comments
will cause the shell to not terminate.
The fix is to strip any SQL comments before sending to shlex since shlex
does not understand SQL comments and will raise an exception when it sees
unmatched quotes regardless of whether the quotes are in the comments or
not.
Testing:
- Added new shell tests
- Ran all end-to-end shell tests
Change-Id: Ic899fdddc182947f73101ddbc2e3c8caf97d9085
Reviewed-on: http://gerrit.cloudera.org:8080/10474
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes a bug in sqlparse where sqlparse incorrectly splits a
statement that has a new line inside double quotes. The bug in sqlparse
causes Impala shell to go into an infinite loop when a statement contains
a new line inside double quotes.
The patch in sqlparse is based on the upstream fix at
https://github.com/andialbrecht/sqlparse/pull/396
Testing:
- Added new end-to-end shell tests
- Ran end-to-end shell tests
Change-Id: I9142f21a888189d351f00ce09baeba123bc0959b
Reviewed-on: http://gerrit.cloudera.org:8080/9195
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We have seen this test fail because the fully-qualified domain name
differed between the python test process and the impala shell process
(see JIRA for details). The exact domain name is irrelevant to the test
- we only really care about whether the prompt appeared or not.
Change-Id: I24078ef97d56e5bb32fd866af861e3a1d19c8c44
Reviewed-on: http://gerrit.cloudera.org:8080/9831
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When passing command line options to a new instance of the
ImpalaShell, we usually transfer the options to member
variables of that new instance. We weren't doing that with
all of the LDAP-related options, even though we wanted to
access them later. In some environments and under certain
conditions, this could then lead to a NameError exception
being thrown.
This patch takes away any reliance on the original options
object returned by parse_args() beyond the __init__()
method of the ImpalaShell class, by transferring all LDAP
options to member variables. Also, a test has been added to
exercise the code path where the exception had been occurring.
Change-Id: I810850f569ef3f4487f7eeba81ca520dc955ac2e
Reviewed-on: http://gerrit.cloudera.org:8080/9744
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
The semicolon was in the wrong place in one of the test queries and
the failure was swallowed silently.
This meant that one fewer prompt was displayed than expected. This
didn't cause a test failure because the prompt regex also matched the
"Connected to host:port" message printed in the shell preamble. I'm
unsure why this would cause the test failure but my best theory is that
in the failure case, the "Connected" and prompt messages are both
buffered when we evaluate the first prompt regex, and the regex swallows
up the whole input, rather than just the first instance.
Testing:
Tightened up the prompt regex and checked that the query actually
executed successfully. With these improvements, the broken query
text caused a test failure.
I looped the test for a while to make sure it was robust.
Added a couple of related test cases to make sure we aren't losing
coverage.
Change-Id: If917bbc8e87b83c188b6d5e1acad912892b8c6fe
Reviewed-on: http://gerrit.cloudera.org:8080/9441
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
The patch is to remove any comments in a statement when checking if a
statement ends with a semicolon delimiter.
For example:
Before (semicolon delimiter is needed at the end):
select 1 + 1; -- comment\n;
After (semicolon delimiter is no longer needed):
select 1 + 1; -- comment
Testing:
- Ran end-to-end tests in shell
Change-Id: I54f9a8f65214023520eaa010fc462a663d02d258
Reviewed-on: http://gerrit.cloudera.org:8080/9191
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins
Adds a concept of a "removed" query option that has no effect but does
not return an error when a user attempts to set it. These options are
not returned by "set" or "set all" commands that are executed in
impala-shell or server-side.
These query options have been deprecated for several releases:
DEFAULT_ORDER_BY_LIMIT, ABORT_ON_DEFAULT_LIMIT_EXCEEDED,
V_CPU_CORES, RESERVATION_REQUEST_TIMEOUT, RM_INITIAL_MEM,
SCAN_NODE_CODEGEN_THRESHOLD, MAX_IO_BUFFERS
RM_INITIAL_MEM did still have an effect, but it was undocumented and
MEM_LIMIT should be used in preference.
DISABLE_CACHED_READS also had an effect but it was documented as
deprecated.
Otherwise the options had no effect at all.
Testing:
Ran exhaustive build.
Updated query option tests to reflect the new behaviour.
Cherry-picks: not for 2.x.
Change-Id: I9e742e9b0eca0e5c81fd71db3122fef31522fcad
Reviewed-on: http://gerrit.cloudera.org:8080/9118
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Impala shell can accidentally convert certain
literal strings to lowercase. Impala shell splits
each command into tokens and then converts the
first token to lowercase to figure out how it
should execute the command. The splitting is done
by spaces only. Thus, if the user types a TAB
after the SELECT, the first token after the split
becomes the SELECT plus whatever comes after it.
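The effect of splitting on spaces only can be seen in a short sketch.
The two helper names are hypothetical, and splitting on any whitespace
is shown as one possible remedy, not necessarily the fix that was
applied.

```python
def first_token_space_only(line):
    """Behavior as described: split on spaces only, then lowercase."""
    return line.split(" ")[0].lower()


def first_token_any_whitespace(line):
    """Split on any whitespace run so a TAB also ends the command word."""
    return line.split(None, 1)[0].lower()


line = "SELECT\t'MixedCase'"

# The literal is dragged into the first token and lowercased with it.
broken = first_token_space_only(line)

# Only the command word is lowercased.
fixed = first_token_any_whitespace(line)
```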
Testing:
TestImpalaShellInteractive.test_case_sensitive_command
TestImpalaShellInteractive.test_unexpected_conversion_for_literal_string_to_lowercase
TestImpalaShell.test_var_substitution
Change-Id: Ifdce9781d1d97596c188691b62a141b9bd137610
Reviewed-on: http://gerrit.cloudera.org:8080/8762
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Four display levels are introduced for each query option: REGULAR, ADVANCED,
DEVELOPMENT and DEPRECATED. When the query options are displayed in Impala
shell using SET then only the REGULAR and ADVANCED options are shown. A new
command called SET ALL shows all the options grouped by their option levels.
When the query options are displayed through the SET SQL statement then the
result set would contain an extra column indicating the level of each option.
Similarly to Impala shell here the SET command only displays the REGULAR and
ADVANCED options while SET ALL shows them all.
If the Impala shell connects to an Impala daemon that predates this change
then all the options would be displayed in the REGULAR group.
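The grouping could look roughly like this. It is a sketch under
assumptions: group_options, the tuple layout, and the placeholder
option names are hypothetical; only the four level names and the
REGULAR fallback come from the description above.

```python
from collections import OrderedDict

LEVELS = ["REGULAR", "ADVANCED", "DEVELOPMENT", "DEPRECATED"]
DEFAULT_VISIBLE = ("REGULAR", "ADVANCED")


def group_options(options, show_all=False):
    """Group (name, value, level-or-None) tuples by display level."""
    grouped = OrderedDict((level, []) for level in LEVELS)
    for name, value, level in options:
        # Older daemons report no level; treat those options as REGULAR.
        grouped[level or "REGULAR"].append((name, value))
    if not show_all:
        for level in LEVELS:
            if level not in DEFAULT_VISIBLE:
                del grouped[level]
    return grouped


# Placeholder option names, not real Impala query options.
options = [
    ("OPT_A", "0", "REGULAR"),
    ("OPT_B", "1", "ADVANCED"),
    ("OPT_C", "", "DEVELOPMENT"),
    ("OPT_D", "2", None),
]
```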
Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1
Reviewed-on: http://gerrit.cloudera.org:8080/8447
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
The ImpalaShell didn't issue the 'USE <current-db>' command after
reconnecting to the Impala daemon. Therefore the client session
used the default DB after reconnection, not the previously selected DB.
Setting the current DB is done by the _validate_database method.
Before this commit it appended the "use <db>" command to the
command queue of the Cmd class. But, at this point we might already
have commands in the command queue that will run before the
"use <db>" command. In case of reconnection, we want to invoke
the USE command right away.
Also, the command processed by the precmd() method can entirely skip
the command queue, therefore it is not enough to insert the USE
command to the front of the command queue. We need to issue the
USE command with the onecmd() method to execute it immediately.
I extended the _validate_database method with an "immediately" flag.
If this flag is true, _validate_database will use the onecmd() method.
Otherwise, it will append the USE command to the command queue to
maintain the previous behaviour.
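The difference between queueing and immediate execution can be shown
with the standard cmd module. DemoShell and validate_database are a toy
stand-in for ImpalaShell and _validate_database; only cmdqueue and
onecmd() are real cmd.Cmd members.

```python
import cmd


class DemoShell(cmd.Cmd):
    """Toy shell; only the queueing behavior mirrors the description."""

    def __init__(self):
        cmd.Cmd.__init__(self)
        self.current_db = "default"
        self.executed = []

    def do_use(self, db):
        self.executed.append("use %s" % db)
        self.current_db = db

    def validate_database(self, db, immediately=False):
        # Hypothetical counterpart of _validate_database.
        line = "use %s" % db
        if immediately:
            self.onecmd(line)           # run now, bypassing the queue
        else:
            self.cmdqueue.append(line)  # run when cmdloop() reaches it


shell = DemoShell()
shell.validate_database("tpch")
queued = list(shell.cmdqueue)

shell.validate_database("tpch", immediately=True)
```

After the second call the USE has already run, while the first call only
left an entry waiting in the queue.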
I added a new automated test suite named test_shell_interactive_reconnect.py
to the "custom cluster" tests. It sets the default database, and after
reconnection it checks if the shell set it again automatically.
One test case checks if the shell set the DB after manually reconnecting
to the impala daemon by issuing the CONNECT command.
The other test case checks if the shell set the DB after automatic
reconnection due to cluster restart.
I needed to back up the impala shell history file because I didn't
want to pollute it with the test cases (just like the way it is done in
tests/shell/test_shell_interactive.py). I created utility functions for
this in tests/shell/util.py and now test_shell_interactive.py and
the newly created test suite are using these utility functions.
Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e
Reviewed-on: http://gerrit.cloudera.org:8080/8368
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Query options can be set from the command line and from .impalarc as
key=value pairs, where key is case insensitive.
Examples:
command line:
impala-shell.sh -Q MT_DOP=1 --query_option=MAX_ERRORS=200
.impalarc:
[impala.query_options]
EXPLAIN_LEVEL=2
MT_DOP=2
The options set on the command line will update the ones
in .impalarc one by one, so the result of the example
above will be:
EXPLAIN_LEVEL=2
MT_DOP=1
MAX_ERRORS=200
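The precedence rule can be sketched with configparser. The function
merge_query_options is hypothetical and is not the shell's own parsing
code; the section name and the sample values are the ones from the
example above.

```python
import configparser

RC_TEXT = """
[impala.query_options]
EXPLAIN_LEVEL=2
MT_DOP=2
"""


def merge_query_options(rc_text, cli_pairs):
    """Start from the rc file, then let command line pairs override."""
    parser = configparser.ConfigParser()
    parser.read_string(rc_text)
    merged = {}
    if parser.has_section("impala.query_options"):
        for key, value in parser.items("impala.query_options"):
            # Keys are case insensitive, so normalize them to upper case.
            merged[key.upper()] = value
    for pair in cli_pairs:
        key, _, value = pair.partition("=")
        merged[key.strip().upper()] = value.strip()
    return merged


merged = merge_query_options(RC_TEXT, ["MT_DOP=1", "MAX_ERRORS=200"])
```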
Additional changes:
- 0 and 1 are accepted as bools in section [impala] to
make it more consistent with [impala.query_options]
- options that are expected to be bool but are not
0/1/true/false lead to error instead of warning
Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986
Reviewed-on: http://gerrit.cloudera.org:8080/8038
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
Impala-shell crashes with 2 source commands on the same line and runs
a command multiple times if it shares the same line with a source
command.
The bug is caused by a misuse of cmdqueue. The cmdqueue member of
cmd.Cmd is used to execute commands not directly from user input in an
event loop. When a 'source' is run, execute_query_list() is called which
also executes the commands in cmdqueue, causing them to be executed
twice.
The fix is for execute_query_list() to not run the commands in cmdqueue.
For the non-interactive case, where the event loop won't be run, we call
execute_query_list() with cmdqueue so that the commands get run.
A test case is added to test_shell_interactive.py.
Change-Id: I453af2d4694d47e184031cb07ecd2af259ba20f3
Reviewed-on: http://gerrit.cloudera.org:8080/8063
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This patch adds a new command "rerun" and a shortcut "@" to
impala-shell. Users can rerun a certain query by its index given by the
history command.
A valid index is an integer in [1, history_length] or
[-history_length, -1]. Negative values index history in reverse order.
For example, "@1;" or "rerun 1;" reruns the first query shown in history
and "@-1;" reruns the last query. The rerun command itself won't appear
in history. The history index is 1-based and increasing. Old entries
might be truncated when impala-shell starts, and the indexes will be
realigned to 1, so the same index may refer to different commands among
multiple impala-shell instances.
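The index rules translate into a small helper. The name
resolve_history_index and the sample history are hypothetical; the
valid ranges are the ones stated above.

```python
def resolve_history_index(index, history_length):
    """Map a 1-based or negative rerun index to a 0-based list position."""
    if 1 <= index <= history_length:
        return index - 1
    if -history_length <= index <= -1:
        return history_length + index
    raise ValueError("index %d is out of range" % index)


history = ["select 1", "show tables", "select 2"]

first = history[resolve_history_index(1, len(history))]
last = history[resolve_history_index(-1, len(history))]
```

Index 0 and anything beyond the history length are rejected, matching
the two closed intervals.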
Testing: A test case test_rerun is added to
shell/test_shell_interactive.py
Change-Id: Ifc28e8ce07845343267224c3b9ccb71b29a524d2
Reviewed-on: http://gerrit.cloudera.org:8080/7674
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Impala Public Jenkins
This change avoids printing blank lines when the Impala
shell fetches 0 rows from a statement.
Change-Id: I6e18ce36be07ee90a16b007b1e30d5255ef8a839
Reviewed-on: http://gerrit.cloudera.org:8080/7055
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
on the website.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Many of our test scripts have import statements that look like
"from xxx import *". It is a good practice to explicitly name what
needs to be imported. This commit implements this practice. Also,
unused import statements are removed.
Change-Id: I6a33bb66552ae657d1725f765842f648faeb26a8
Reviewed-on: http://gerrit.cloudera.org:8080/3444
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
To cancel a query, the shell will create a separate connection inside
its SIGINT handler, and send the cancellation RPC. However this
connection did not start a secure connection if it needed to, meaning
that the cancellation attempt would just hang.
A workaround is to kill the shell process, which I expect is what users
have been doing with this bug which has been around since 2014.
Testing:
I added a custom cluster test that starts Impala with SSL
enabled, and wrote two tests - one just to check SSL connectivity, and
the other to mimic the existing test_cancellation which sends SIGINT to
the shell process. In doing so I refactored the shell testing code a bit
so that all tests use a single ImpalaShell object, rather than rolling
their own Popen() based approaches when they needed to do something
unusual, like cancel a query.
In the cancellation test on my machine, SIGINT can take a few tries to
be effective. I'm not sure if this is a timing thing - perhaps the
Python interpreter doesn't correctly pass signals through to a handler
if it's in a blocking call, for example. The test reliably passes within
~5 tries on my machine, so the test tries 30 times, once per second.
Change-Id: If99085e75708d92a08dbecf0131a2234fedad33a
Reviewed-on: http://gerrit.cloudera.org:8080/3302
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
This patch allows you to write SOURCE <file> or SRC <file>, and have the
shell read the file and execute all the queries in it.
Change-Id: Ib05df3e755cd12e9e9562de6b353857940eace03
Reviewed-on: http://gerrit.cloudera.org:8080/2663
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
The SET command has been extended with the following syntax, to allow
setting of variables in the Impala Shell:
SET VAR:<variable_name>=<value>
The UNSET command has also been modified to allow:
UNSET VAR:<variable_name>
This patch builds on the changes in IMPALA-2179. The main change for
this patch was to ensure that all SET commands are processed by the
shell, rather than being sent to the front end as a query. For this
I had to modify the command sanitization function to remove comments
that happen in front of a SET command.
Comments can be a can of worms to parse, so I tried to be as strict
as possible to avoid collateral effects. Comments are only removed
if they happen right at the beginning of the line AND before a SET
command. NO other comments are touched, including comments before,
after or within queries.
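Handling the two forms in the shell could be sketched as follows. The
regular expressions and apply_var_command are hypothetical, and the
sketch deliberately leaves out the leading-comment handling described
above.

```python
import re

_SET_VAR = re.compile(r"^\s*set\s+var:(\w+)\s*=\s*(.*?)\s*;?\s*$",
                      re.IGNORECASE)
_UNSET_VAR = re.compile(r"^\s*unset\s+var:(\w+)\s*;?\s*$", re.IGNORECASE)


def apply_var_command(line, variables):
    """Update variables in place; return True if the shell handled line."""
    match = _SET_VAR.match(line)
    if match:
        variables[match.group(1).upper()] = match.group(2)
        return True
    match = _UNSET_VAR.match(line)
    if match:
        variables.pop(match.group(1).upper(), None)
        return True
    # Anything else would be passed on as a regular command or query.
    return False


variables = {}
handled = apply_var_command("SET VAR:start_date=20150101;", variables)
```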
Change-Id: I87e07385122187ab8d324346499896a3dfbbafe6
Reviewed-on: http://gerrit.cloudera.org:8080/679
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This patch adds the command line option `--var` to allow the user to set
variables to be used in commands within the shell. It does *not* implement the
setting of variables through the SET command, as Hive does. This extension will
be implemented separately on IMPALA-2180.
The syntax for specifying a parameter in the command line is --var=KEY=VAL, as
for example: --var=start_date=20150101
Variables are textually replaced by their value in the Impala shell commands.
The substitution works similarly for interactive sessions as well as for command
line queries and/or scripts (-q and -f options, respectively).
Variables can be referenced as ${VAR:VAR_NAME} (case-insensitive). The form
${HIVEVAR:VAR_NAME} can also be used for compatibility with Hive scripts.
To prevent any of the reference expressions above from being replaced you can
escape them with a backslash (e.g. \${VAR:VAR_NAME} and \${HIVEVAR:VAR_NAME}).
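Textual substitution with escaping can be sketched with one regular
expression. The names below are hypothetical. Dropping the backslash
from an escaped reference and leaving unknown variables untouched are
assumptions, since the text above does not specify either.

```python
import re

_VAR_REF = re.compile(r"(\\)?\$\{(?:VAR|HIVEVAR):([^}]+)\}", re.IGNORECASE)


def substitute_vars(text, variables):
    """Replace ${VAR:name} references; a leading backslash escapes one."""
    def repl(match):
        if match.group(1):
            # Escaped: drop the backslash and keep the reference text.
            return match.group(0)[1:]
        name = match.group(2).upper()
        # Assumption: unknown variables are left untouched.
        return variables.get(name, match.group(0))
    return _VAR_REF.sub(repl, text)


variables = {"START_DATE": "20150101"}
query = substitute_vars("select * from t where d >= ${var:start_date}",
                        variables)
```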
The Impala shell's SET command now also reports the set variables and their
values.
Change-Id: Ia491fae91256334bb60c9066d119fe9a1e9779dd
Reviewed-on: http://gerrit.cloudera.org:8080/611
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Remove a hack in impala-shell that suppressed any error messages that
included the string "Cancelled". This originally was necessary since
user-initiated cancellation messages were propagated back to the client,
but is no longer necessary. Certain errors that occur asynchronously
were suppressed, because the error is propagated by cancelling the
query.
Confirmed that no error messages are printed for manually cancelled
queries, and that error messages that were previously suppressed (from
my IMPALA-2298 patch) now show in the shell.
Change-Id: Iac53b1307768cbb07640ddc88b152ae71c71beab
Reviewed-on: http://gerrit.cloudera.org:8080/1529
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Allow Impala to start only with a running HMS (and no additional services like HDFS,
HBase, Hive, YARN) and use the local file system.
Skip all tests that need these services, use HDFS caching or assume that multiple impalads
are running.
To run Impala with the local filesystem, set TARGET_FILESYSTEM to 'local' and
WAREHOUSE_LOCATION_PREFIX to a location on the local filesystem where the current user has
permissions since this is the location where the test data will be extracted.
Test coverage (with core strategy) in comparison with HDFS and S3:
HDFS 1348 tests passed
S3 1157 tests passed
Local Filesystem 1161 tests passed
Change-Id: Ic9718c7e0307273382b1cc6baf203ff2fb2acd03
Reviewed-on: http://gerrit.cloudera.org:8080/1352
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Readability: Alex Behm <alex.behm@cloudera.com>
Impala shell cannot get the child query handle, so it cannot
query live progress for a COMPUTE STATS query. Disable the live
progress callback for compute stats queries.
Change-Id: I2d2f342a805905a4fa868686e7c9e9362c2c2223
Reviewed-on: http://gerrit.cloudera.org:8080/1109
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
Python tests and infra scripts will now use "python" from the virtualenv
via $IMPALA_HOME/bin/impala-python. Some scripts could be simplified now
that python 2.6 and a dependable set of third-party libraries are
available but that is not done as part of this commit.
Change-Id: If1cf96898d6350e78ea107b9026b12ba63a4162f
Reviewed-on: http://gerrit.cloudera.org:8080/603
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
This patch adds a 'tip-of-the-day' message to the shell's intro header,
and a TIP command that prints out a random tip. This might be a good way
to make advanced or little-known functionality of both Impala and the
shell better known to users.
Change-Id: I987b386f8c96f8a75ccd5cd9197a8f2981c8bf43
Reviewed-on: http://gerrit.cloudera.org:8080/586
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
The tests were doing unnecessary things. One such thing that stopped
working with the virtualenv patch was searching for the shell process to
get the pid. The search was never needed since the process was spawned
with Popen which provides the pid directly.
Change-Id: I2455e58de4fdba8fd2770f0489fac8cddf6b90a0
Reviewed-on: http://gerrit.cloudera.org:8080/555
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This patch adds a way to allow for dynamic progress reporting in the
shell. There are two new command line flags for the shell:
--live_progress - will print the completed vs total # of scan ranges
--live_summary - prints an updated exec summary
In addition to the command line flags, these options can be set from
within the shell using:
set LIVE_SUMMARY=True
set LIVE_PROGRESS=True
The new options will be listed under shell options. Both reports will be
updated at most every second; for longer running queries the interval is
adjusted to the time between two RPC calls to get the query status. To
provide this information in the ExecSummary, the Thrift structure for
the ExecSummary was extended to contain a progress indicator. The output
is printed to stderr and only available in interactive mode.
An example video is available here:
https://asciinema.org/a/5wi7ypckx4ol4ha1hlg3e3q1k
Change-Id: I70b2ab5fa74dc2ba5bc3b338ef13ddc6ccf367d2
Reviewed-on: http://gerrit.cloudera.org:8080/508
Tested-by: Internal Jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
The existing error message ("Error: could not match input") is not
helpful and too confusing. This accepts a trailing semicolon and
provides a better error message for other unacceptable chars.
Change-Id: I48ee90d109ce27c8a34e68f79d2e57e8426f1074
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5617
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: jenkins
This patch adds a line to the signal handler that closes a
cancelled query if it is not already closed. This patch also
improves the cancellation test so that it catches this
problem in the future.
Change-Id: I1bb2a4a8fc3c3d40b8e4ba41f4b2bcf6d32bc297
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5303
Reviewed-by: Alex Leblang <alex.leblang@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5320
This patch enables the shell to pass query strings as-is to Impala's parser. In order to
preserve previous behaviour, we transform multi-line queries before writing them to
history. We replace EOL with an obscure ascii character (DLE), and re-apply the
transformation when reading it back from history.
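The history transformation is a reversible character swap. In this
sketch to_history_line and from_history_line are hypothetical names;
DLE is ASCII code 0x10.

```python
DLE = "\x10"  # ASCII Data Link Escape, used as the stand-in for EOL


def to_history_line(query):
    """Collapse a multi-line query into one history line."""
    return query.replace("\n", DLE)


def from_history_line(line):
    """Restore the original line breaks when reading history back."""
    return line.replace(DLE, "\n")


query = "select 1\nfrom t\nwhere x = 2"
stored = to_history_line(query)
restored = from_history_line(stored)
```

The round trip is lossless as long as the query itself never contains
the DLE character.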
Change-Id: I021b9c3d50b03df73bea1afd6ce3ec6b413484e0
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4664
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Conflicts:
shell/impala_shell.py
Earlier, if the user connected via the command line, the shell would only attempt to
create a new Impala Client instance if a connection did not exist. This resulted in the
shell not connecting to the user specified Impalad.
Change-Id: I74c291256d0c063f6324b01aa7336282e6969a4e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4392
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins