After additional testing around IMPALA-2782, it was discovered
that impala-shell starts the session displaying the expected
hostname (as passed -i flag) on the prompt. This gives the
impression that the load balancer was bypassed, however the
actual TSSLSocket is still created with the hostname passed
in via the -b or --kerberos_host_fqdn flag.
This change ensures that the hostname used to create the
TSSLSocket will always be the one passed in via the -i flag
on impala-shell. This change is required by IMPALA-2782.
Testing:
Using netcat, we verified that the impala daemon host[:port]
value passed into the -i/--impalad option is indeed the one
impala-shell tries to connect to in both cases (with and
without -b)
Change-Id: Ibee05bd0dbe8c6ae108b890f0ae0f6900149773a
Reviewed-on: http://gerrit.cloudera.org:8080/10580
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes an issue where parseline is unable to deduce the
correct command when a statement has a leading comment.
Before:
> -- comment
> insert into table t values(100);
Fetched 1 row(s) in 0.01s
After:
> -- comment
> insert into table t values(100);
Modified 1 row(s) in 0.01s
Before (FE syntax error):
> /*comment*/ help;
After (show help correctly):
> /*comment*/ help;
Testing:
- Added shell tests
- Ran end-to-end shell tests on Python 2.6 and Python 2.7
Change-Id: I7ac7cb5a30e6dda73ebe761d9f0eb9ba038e14a7
Reviewed-on: http://gerrit.cloudera.org:8080/9933
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
This patch fixes the issue where non-matching quotes inside comments
will cause the shell to not terminate.
The fix is to strip any SQL comments before sending to shlex since shlex
does not understand SQL comments and will raise an exception when it
sees unmatched quotes regardless whether the quotes are in the comments or
not.
Testing:
- Added new shell tests
- Ran all end-to-end shell tests on Python 2.6 and Python 2.7
Change-Id: I2feae34026a7e63f3d31489f757f093a73ca5d2c
Reviewed-on: http://gerrit.cloudera.org:8080/10541
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes the issue where non-matching quotes inside comments
will cause the shell to not terminate.
The fix is to strip any SQL comments before sending to shlex since shlex
does not understand SQL comments and will raise an exception when it sees
unmatched quotes regardless whether the quotes are in the comments or
not.
Testing:
- Added new shell tests
- Ran all end-to-end shell tests
Change-Id: Ic899fdddc182947f73101ddbc2e3c8caf97d9085
Reviewed-on: http://gerrit.cloudera.org:8080/10474
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes a bug in sqlparse where sqlparse incorrectly splits a
statement that has a new line inside double quotes. The bug in sqlparse
causes Impala shell to go to infinite loop when a statement contains a
new line inside double quotes.
The patch in sqlparse is based on the upstream fix at
https://github.com/andialbrecht/sqlparse/pull/396
Testing:
- Added new end-to-end shell tests
- Ran end-to-end shell tests
Change-Id: I9142f21a888189d351f00ce09baeba123bc0959b
Reviewed-on: http://gerrit.cloudera.org:8080/9195
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The bug is that PrettyOutputFormatter.format() returned a unicode
object, and Python cannot automatically write unicode objects to
output streams where there is no default encoding.
The fix is to convert to UTF-8 encoded in a regular string, which
can be output to any output device. This makes the output type
consistent with DelimitedOutputFormatter.format().
Based on code by Marcell Szabo.
Testing:
Added a basic test.
Played around in an interactive shell to make sure that unicode
characters still work in interactive mode.
Change-Id: I9de641ecf767a2feef3b9f48b344ef2d55e17a7f
Reviewed-on: http://gerrit.cloudera.org:8080/9928
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this commit it was quite random which DDL oprations
returned a result set and which didn't.
With this commit, every DDL operations return a summary of
its execution. They declare their result set schema in
Frontend.java, and provide the summary in CalatogOpExecutor.java.
Updated the tests according to the new behavior.
Change-Id: Ic542fb8e49e850052416ac663ee329ee3974e3b9
Reviewed-on: http://gerrit.cloudera.org:8080/9090
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When passing comamnd line options to a new instance of the
ImpalaShell, we ususally transfer the options to member
variables of that new instance. We weren't doing that with
all of the LDAP-related options, even though we wanted to
access them later. In some environments and under certain
conditions, this could then lead to a NameError exception
being thrown.
This patch takes away any reliance on the original options
object returned by parse_args() beyond the __init__()
method of the ImpalaShell class, by tranferring all LDAP
options to member variables. Also, a test has been added to
exercise the code path where the exception had been occurring.
Change-Id: I810850f569ef3f4487f7eeba81ca520dc955ac2e
Reviewed-on: http://gerrit.cloudera.org:8080/9744
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
configured with load balancer and kerberos.
This change adds an impala-shell option -b / --kerberos_host_fqdn.
This allows user to optionally specify the load-balancer's host so
that impala-shell will accept a direct connection to impala daemons
in a kerberized cluster.
Change-Id: I4726226a7a3817421b133f74dd4f4cf8c52135f9
Reviewed-on: http://gerrit.cloudera.org:8080/7241
Reviewed-by: <andy@phdata.io>
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins
The value of LDAP password in Impala shell contains extra line break
causes authentication failure, but the user can't detect the cause of
the failure.
I fixed the issue by adding inspection to the password for common
pitfalls and issuing a warning in the shell when authentication fails.
Change-Id: Ie570166aea62af223905b7f0124e9efb15a88ac7
Reviewed-on: http://gerrit.cloudera.org:8080/9506
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Impala Public Jenkins
This patch is to show query submitted and query progress web links in
Impala shell for CTE queries.
Testing:
- Ran end-to-end shell tests
Change-Id: Ie3352406e3b048be395a20405c8e6b911e663164
Reviewed-on: http://gerrit.cloudera.org:8080/9537
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
The patch is to remove any comments in a statement when checking if a
statement ends with a semicolon delimiter.
For example:
Before (semicolon delimiter is needed at the end):
select 1 + 1; -- comment\n;
After (semicolon delimiter is no longer needed):
select 1 + 1; -- comment
Testing:
- Ran end-to-end tests in shell
Change-Id: I54f9a8f65214023520eaa010fc462a663d02d258
Reviewed-on: http://gerrit.cloudera.org:8080/9191
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins
Adds a concept of a "removed" query option that has no effect but does
not return an error when a user attempts to set it. These options are
not returned by "set" or "set all" commands that are executed in
impala-shell or server-side.
These query options have been deprecated for several releases:
DEFAULT_ORDER_BY_LIMIT, ABORT_ON_DEFAULT_LIMIT_EXCEEDED,
V_CPU_CORES, RESERVATION_REQUEST_TIMEOUT, RM_INITIAL_MEM,
SCAN_NODE_CODEGEN_THRESHOLD, MAX_IO_BUFFERS
RM_INITIAL_MEM did still have an effect, but it was undocumented and
MEM_LIMIT should be used in preference.
DISABLE_CACHED_READS also had an effect but it was documented as
deprecated.
Otherwise the options had no effect at all.
Testing:
Ran exhaustive build.
Updated query option tests to reflect the new behaviour.
Cherry-picks: not for 2.x.
Change-Id: I9e742e9b0eca0e5c81fd71db3122fef31522fcad
Reviewed-on: http://gerrit.cloudera.org:8080/9118
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This removes the deprecated option in time for 3.0.
Testing:
Ran core tests. Manually ran the shell with the argument to confirm that
it reported "no such option".
Cherry-picks: not for 2.x.
Change-Id: I8f430bad0578e150d5e80066b9e7572041af4a15
Reviewed-on: http://gerrit.cloudera.org:8080/9072
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Impala shell can accidentally convert certain
literal strings to lowercase. Impala shell splits
each command into tokens and then converts the
first token to lowercase to figure out how it
should execute the command. The splitting is done
by spaces only. Thus, if the user types a TAB
after the SELECT, the first token after the split
becomes the SELECT plus whatever comes after it.
Testing:
TestImpalaShellInteractive.test_case_sensitive_command
TestImpalaShellInteractive.test_unexpected_conversion_for_literal_string_to_lowercase
TestImpalaShell.test_var_substitution
Change-Id: Ifdce9781d1d97596c188691b62a141b9bd137610
Reviewed-on: http://gerrit.cloudera.org:8080/8762
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Issue 1: When query is cancelled via CTRL-C while being executed in Impala-shell
then an exception is thrown from Impala backend saying 'Invalid query handle'.
This is because one ImpalaClient was making RPC's while another ImpalaClient
cancelled the query on the backend. As a result RPC handlers in ImpalaServer
try to access a ClientRequestState that had been cleared from the backend. The
issue is confidently reproducable both in wait_to_finish and in fetch states of
the query.
As a solution the query cancellation is indicated to ImpalaClient via a bool
flag. Once a cancellation originated exception reaches Impala shell this flag
is checked to decide whether to suppress the error or not.
Issue 2: Every time a query was cancelled a 'use db' command was issued
automatically. This happened to historical reasons but is not needed anymore
(see Jira for more details).
Change-Id: I6cefaf1dae78baae238289816a7cb9d210fb38e2
Reviewed-on: http://gerrit.cloudera.org:8080/8549
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Four display levels are introduced for each query option: REGULAR, ADVANCED,
DEVELOPMENT and DEPRECATED. When the query options are displayed in Impala
shell using SET then only the REGULAR and ADVANCED options are shown. A new
command called SET ALL shows all the options grouped by their option levels.
When the query options are displayed through the SET SQL statement then the
result set would contain an extra column indicating the level of each option.
Similarly to Impala shell here the SET command only diplays the REGULAR and
ADVANCED options while SET ALL shows them all.
If the Impala shell connects to an Impala daemon that predates this change
then all the options would be displayed in the REGULAR group.
Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1
Reviewed-on: http://gerrit.cloudera.org:8080/8447
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
The ImpalaShell didn't issue the 'USE <current-db>' command after
reconnecting to the Impala daemon. Therefore the client session
used the default DB after reconnection, not the previously selected DB.
Setting the current DB is done by the _validate_database method.
Before this commit it appended the "use <db>" command to the
command queue of the Cmd class. But, at this point we might already
have commands in the command queue that will run before the
"use <db>" command. In case of reconnection, we want to invoke
the USE command right away.
Also, the command processed by the precmd() method can entirely skip
the command queue, therefore it is not enough to insert the USE
command to the front of the command queue. We need to issue the
USE command with the onecmd() method to execute it immediately.
I extended the _validate_database method with an "immediately" flag.
If this flag is true, _validate_database will use the onecmd() method.
Otherwise, it will append the USE command to the command queue to
maintain the previous behaviour.
I added a new automated test suite named test_shell_interactive_reconnect.py
to the "custom cluster" tests. It sets the default database, and after
reconnection it checks if the shell set it again automatically.
One test case checks if the shell set the DB after manually reconnecting
to the impala daemon by issuing the CONNECT command.
The other test case checks if the shell set the DB after automatic
reconnection due to cluster restart.
I needed to backup the impala shell history file because I didn't
want to pollute it by the test cases (just like the way it is done in
tests/shell/test_shell_interactive.py). I created utility functions for
this in tests/shell/util.py and now test_shell_interactive.py and
the newly created test suite are using these utility functions.
Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e
Reviewed-on: http://gerrit.cloudera.org:8080/8368
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Query options can be set from command line and impala rc as
key=value pairs, where key is case insensitive.
Examples:
command line:
impala-shell.sh -Q MT_DOP=1 --query_option=MAX_ERRORS=200
.impalarc:
[impala.query_options]
EXPLAIN_LEVEL=2
MT_DOP=2
The options set in command line will update the ones
in impalarc one by one, so the result of the example
above will be:
EXPLAIN_LEVEL=2
MT_DOP=1
MAX_ERRORS=200
Additional changes:
- 0 and 1 are accepted as bools in section [impala] to
make it more consistent with [impala.query_options]
- options that are expected to be bool but are not
0/1/true/false lead to error instead of warning
Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986
Reviewed-on: http://gerrit.cloudera.org:8080/8038
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
This adds a warning to impala-shell if -r/--refresh_after_connect is
used, in anticipation of us removing the feature in a future version.
Change-Id: Id297f80c0f596a69ef8ecde948812b82d2a5c0fa
Reviewed-on: http://gerrit.cloudera.org:8080/8381
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins
Impala-shell crashes with 2 source commands on the same line and runs
a command multiple times if it shares the same line with a source
command.
The bug is caused by a misuse of cmdqueue. The cmdqueue member of
cmd.Cmd is used to execute commands not directly from user input in an
event loop. When a 'source' is run, execute_query_list() is called which
also executes the commands in cmdqueue, causing them to be executed
twice.
The fix is for execute_query_list() to not run the commands in cmdqueue.
For the non-interactive case, where the event loop won't be run, we call
execute_query_list() with cmdqueue so that the commands get run.
A test case is added to test_shell_interactive.py.
Change-Id: I453af2d4694d47e184031cb07ecd2af259ba20f3
Reviewed-on: http://gerrit.cloudera.org:8080/8063
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This patch adds a new command "rerun" and a shortcut "@" to impala-shell
. Users can rerun a certain query by its index given by history command.
A valid index is an integer in [1, history_length] or
[-history_length, -1]. Negative values index history in reverse order.
For example, "@1;" or "rerun 1;" reruns the first query shown in history
and "@-1;" reruns the last query. The rerun command itself won't appear
in history. The history index is 1-based and increasing. Old entries
might be truncated when impala-shell starts, and the indexes will be
realigned to 1, so the same index may refer to different commands among
multiple impala-shell instances.
Testing: A test case test_rerun is added to
shell/test_shell_interactive.py
Change-Id: Ifc28e8ce07845343267224c3b9ccb71b29a524d2
Reviewed-on: http://gerrit.cloudera.org:8080/7674
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Impala Public Jenkins
The shell uses Thrift's TSSLSocket to negotiate secure connections to
Impala. This socket uses a variable SSL_VERSION to determine which SSL
and TLS protocol versions it will connect to.
SSL_VERSION was hardcoded to be PROTOCOL_TLSv1, which only supports
TLSv1 servers and no other protocol version. Change the allowed version
to be PROTOCOL_SSLv23, which supports any TLS or SSL protocol. We rely
on the server not to allow SSLv2 or v3 connections.
Testing: Added a new custom cluster test to confirm that the shell can
connect to a TLSv1.2 cluster. Confirmed that the test is correctly
skipped on machines with an old version of OpenSSL that does not support
TLSv1.2.
Change-Id: I5487f82d110676b9c3c7a5305931da00c7f68ca0
Reviewed-on: http://gerrit.cloudera.org:8080/7675
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
For some queries, the exec summary will not be completely filled
in even if the query is FINISHED. In particular, the exec_stats field
may not be set. This was causing an error in our test code that
converts the exec summary to a more usable format.
The situation is essentially deterministic for some queries, but
it was being hidden by testing code that caught the error and
discarded it in most situations, leading to flaky tests.
This patch removes the 'try' that was hiding the error and makes
the code check for the presence of exec_stats and handle it rather
than generating an error.
I filed IMPALA-5783 for followup work to be more rigorous about
when the exec summary should and shouldn't be fully present.
Testing:
- Ran the affected tests in a loop and they are no longer flaky.
Change-Id: Id52ac62da2b01f9e163e97cbe4590f8db6b663d2
Reviewed-on: http://gerrit.cloudera.org:8080/7627
Tested-by: Impala Public Jenkins
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Function print_to_stderr() has a syntax error when an error message is
displayed.
I solved this problem by exchanging the position of variable and the
subsequent strings in function print_to_stderr().
Change-Id: Ib883499a88f39d91b69bea4291f1ce5dd264ccf6
Reviewed-on: http://gerrit.cloudera.org:8080/7187
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins
Help information of KEYVAL option in impala-shell is not clear enough.
I fix this issue by adding clear description to help information of
KEYVAL option.
Change-Id: I68cfc16838c6c0e7813f03dd4296f9eb54ec4c63
Reviewed-on: http://gerrit.cloudera.org:8080/7179
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
When only with ldap_password_cmd option, impala-shell runs successfully.
I solved this problem by throwing an error when --ldap_password_cmd is
used without LDAP auth, that is, ldap_password_cmd option will only
take effect if ldap option presents.
Change-Id: I3711d8a0eca2fa8612e2943fa9121945db6b012e
Reviewed-on: http://gerrit.cloudera.org:8080/7188
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins
This change avoids printing blank lines when the Impala
shell fetches 0 rows from a statement.
Change-Id: I6e18ce36be07ee90a16b007b1e30d5255ef8a839
Reviewed-on: http://gerrit.cloudera.org:8080/7055
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Help information of query_file option in impala-shell misses stdin
description.
I fix this issue by adding stdin description to help information of
query_file option.
Change-Id: I0264174fd062497c72891b31cf9ac1ba6c00f716
Reviewed-on: http://gerrit.cloudera.org:8080/7178
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Impala Public Jenkins
The introduction has redundant '\' in impala-shell when using LDAP.
I fix this issue by removing extraneous '\' in introduction when
impala-shell using LDAP.
Change-Id: I30c601ab255a4882260f7be23b5763ef8ec76d28
Reviewed-on: http://gerrit.cloudera.org:8080/7166
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins
Bug:
When Sentry-based authorization is enabled, a user that isn't authorized
to EXPLAIN a statement that uses a view can still access unauthorized
information, such as view's definition, by running the statement and
asking for the query profile or the execution summary.
Fix:
During query compilation, determine if the user can access the the runtime
profile or the execution summary. Upon request for a runtime profile or
execution summary from a user, determine based on that information and
the user that is asking for the profile if the runtime profile
(or execution summary) will be returned or an authorization error.
The authorization rule enforced is the following:
- User A runs statement S, A asks for profile, A has profile access:
Runtime profile is returned
- User A runs statement S, A asks for profile, A doesn't have profile access:
Authorization error
- User A runs statement S, user B asks for profile:
Authorization error.
This patch doesn't enforce access to the runtime profile or execution summary
through the Web UI.
Change-Id: I2255d587367c2d328590ae8534a5406c4b0c9b15
Reviewed-on: http://gerrit.cloudera.org:8080/7064
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
Allow users to keep a longer history of queries if desired. I
personally find it useful to keep a long history of queries to
reference and want to bump this up to a very large value, but
keep the default reasonable. Also change the config loader
to not freak out over unknown parameters so as not to break
for users that end up with new options set running on older
shells.
Testing: Created .impalarc as follows, now getting more history saved.
Put broken things in .impalarc and make sure they are logged as
warnings.
[impala]
history_max=1000
Change-Id: Iaf65bbecb8fd7f1105aac62b6745d6125a603d7f
Reviewed-on: http://gerrit.cloudera.org:8080/6335
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
When using connections secured with SSL, a connection close comprises
of a bi-directional SSL_shutdown(). The second part of the
bi-directional shutdown requires that the client also close the socket
explicitly, and the server blocks till it gets the close
notification from the client.
This patch ensures that the above happens. Without this fix, the
impala-shell was found to hang over connections secured with SSL
when an error was encountered.
Change-Id: I814df93bbcd457ad3f96b4c1ef5d8b0ddd6d141f
Reviewed-on: http://gerrit.cloudera.org:8080/6587
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Impala Public Jenkins
The bug was in the third-party pkg_resources.py script. The version
check was broken because it matches any version with a "0.7" substring
instead of just versions starting with 0.7.
This is a known bug. setuptools even re-released 20.7.0 as version
20.8.0 to avoid it:
e5822f0d5b
Testing:
I was unable to reproduce this locally, but I think the fix is clear-cut
enough that this is ok.
Change-Id: I0565c0e6c1be7d82c3f35d2545ba044a684bb075
Reviewed-on: http://gerrit.cloudera.org:8080/5314
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Adds support in the shell to report the number of modified
rows for all DML operations, as well as the number of rows
with errors.
Testing: Added shell tests.
Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1
Reviewed-on: http://gerrit.cloudera.org:8080/5103
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This commit handles partition related DDL in a more general way. We can
now use compound predicates to specify a list of partitions in
statements like ALTER TABLE DROP PARTITION and COMPUTE INCREMENTAL
STATS, etc. It will also make sure some statements only accept one
partition at a time, such as PARTITION SET LOCATION and LOAD DATA. ALTER
TABLE ADD PARTITION remains using the old PartitionKeyValue's logic.
The changed partition related DDLs are as follows,
Table: p (i int) partitioned by (j int, k string)
Partitions:
+-------+---+-------+--------+------+--------------+-------------------+
| j | k | #Rows | #Files | Size | Bytes Cached | Cache Replication |
+-------+---+-------+--------+------+--------------+-------------------+
| 1 | a | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| 1 | b | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| 1 | c | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| 2 | d | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| 2 | e | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| 2 | f | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| Total | | -1 | 0 | 0B | 0B | |
+-------+---+-------+--------+------+--------------+-------------------+
1. show files in p partition (j<2, k='a');
2. alter table p partition (j<2, k in ("b","c") set cached in 'testPool';
// j can appear more than once,
3.1. alter table p partition (j<2, j>0, k<>"d") set uncached;
// it is the same as
3.2. alter table p partition (j<2 and j>0, not k="e") set uncached;
// we can also do 'or'
3.3. alter table p partition (j<2 or j>0, k like "%") set uncached;
// missing 'k' matches all values of k
4. alter table p partition (j<2) set fileformat textfile;
5. alter table p partition (k rlike ".*") set serdeproperties ("k"="v");
6. alter table p partition (j is not null) set tblproperties ("k"="v");
7. alter table p drop partition (j<2);
8. compute incremental stats p partition(j<2);
The remaining old partition related DDLs are as follows,
1. load data inpath '/path/from' into table p partition (j=2, k="d");
2. alter table p add partition (j=2, k="g");
3. alter table p partition (j=2, k="g") set location '/path/to';
4. insert into p partition (j=2, k="g") values (1), (2), (3);
General partition expressions or partially specified partition specs
allows partition predicates to return empty partition set no matter
'IF EXISTS' is specified.
Examples:
[localhost.localdomain:21000] >
alter table p drop partition (j=2, k="f");
Query: alter table p drop partition (j=2, k="f")
+-------------------------+
| summary |
+-------------------------+
| Dropped 1 partition(s). |
+-------------------------+
Fetched 1 row(s) in 0.78s
[localhost.localdomain:21000] >
alter table p drop partition (j=2, k<"f");
Query: alter table p drop partition (j=2, k<"f")
+-------------------------+
| summary |
+-------------------------+
| Dropped 2 partition(s). |
+-------------------------+
Fetched 1 row(s) in 0.41s
[localhost.localdomain:21000] >
alter table p drop partition (k="a");
Query: alter table p drop partition (k="a")
+-------------------------+
| summary |
+-------------------------+
| Dropped 1 partition(s). |
+-------------------------+
Fetched 1 row(s) in 0.25s
[localhost.localdomain:21000] > show partitions p;
Query: show partitions p
+-------+---+-------+--------+------+--------------+-------------------+
| j | k | #Rows | #Files | Size | Bytes Cached | Cache Replication |
+-------+---+-------+--------+------+--------------+-------------------+
| 1 | b | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| 1 | c | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| Total | | -1 | 0 | 0B | 0B | |
+-------+---+-------+--------+------+--------------+-------------------+
Fetched 3 row(s) in 0.01s
Change-Id: I2c9162fcf9d227b8daf4c2e761d57bab4e26408f
Reviewed-on: http://gerrit.cloudera.org:8080/3942
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
1.) IMPALA-4134: Use Kudu AUTO FLUSH
Improves performance of writes to Kudu up to 4.2x in
bulk data loading tests (load 200 million rows from
lineitem).
2.) IMPALA-3704: Improve errors on PK conflicts
The Kudu client reports an error for every PK conflict,
and all errors were being returned in the error status.
As a result, inserts/updates/deletes could return errors
with thousands errors reported. This changes the error
handling to log all reported errors as warnings and
return only the first error in the query error status.
3.) Improve the DataSink reporting of the insert stats.
The per-partition stats returned by the data sink weren't
useful for Kudu sinks. Firstly, the number of appended rows
was not being displayed in the profile. Secondly, the
'stats' field isn't populated for Kudu tables and thus was
confusing in the profile, so it is no longer printed if it
is not set in the thrift struct.
Testing: Ran local tests, including new tests to verify
the query profile insert stats. Manual cluster testing was
conducted of the AUTO FLUSH functionality, and that testing
informed the default mutation buffer value of 100MB which
was found to provide good results.
Change-Id: I5542b9a061b01c543a139e8722560b1365f06595
Reviewed-on: http://gerrit.cloudera.org:8080/4728
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
IMPALA-3002:
The shell prints an incorrect value for '#Rows' in the exec
summary for broadcast nodes due to incorrect logic around
whether to use max or agg stats. This patch makes the behavior
consistent with the way the be treats exec summaries in
summary-util.cc. This incorrect logic was also duplicated in
the impala_beeswax test framework.
IMPALA-1473:
When there is a merging exchange with a limit, we may copy rows
into the output batch beyond the limit. In this case, we currently
update the output batch's size to reflect the limit, but we also
need to update ExecNode::num_rows_returned_ or the exec summary
may show that the exchange node returned more rows than it really
did.
Additionally, PlanFragmentExecutor::GetNext does not update
rows_produced_counter_ in some cases, leading the runtime profile
to display an incorrect value for 'RowsProduced'.
Change-Id: I386719370386c9cff09b8b35d15dc712dc6480aa
Reviewed-on: http://gerrit.cloudera.org:8080/4679
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
The webserver address was always configured as 0.0.0.0 (meaning that
the webserver could be reached on any IP for that machine) unless
otherwise specified. This is not a correct value to dispay to the
user. This patch returns the hostname of the node, when requested,
if the webserver host address is 0.0.0.0.
This patch also does not print the coordinator link for very simple
queries, as it's not necessary and is unnecessarily verbose.
This patch also does away with pinging the impalad an extra time per
query for finding the host time and webserver address. It instead
remembers the webserver address at connect time and displays client
local time for every query instead.
Change-Id: I9d167b66f2dd8629e40a7094d21ea7ce6b43d23b
Reviewed-on: http://gerrit.cloudera.org:8080/3994
Tested-by: Internal Jenkins
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Sailesh Mukil <sailesh@cloudera.com>
Fix the error handling code and add a test.
Change-Id: Iebcf1dc8a1a08b400a2c769a9cff38ea02c8e525
Reviewed-on: http://gerrit.cloudera.org:8080/4022
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
TSSLSocketWithWildcardSAN.py was recently added to the impala-shell
as a part of IMPALA-3159. However, it was not exported as a part of
the shell tarball.
This change adds the file to the tarball.
Change-Id: I5a7ab8c20c0b20c21b7f8d008e39c940419e3c4d
Reviewed-on: http://gerrit.cloudera.org:8080/3872
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
This file snuck in with the old copyright header after I put together
the original change for this JIRA.
Change-Id: I311e493bec7e63ea6dd7229140045d486540612a
Reviewed-on: http://gerrit.cloudera.org:8080/3867
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
on the website.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files_txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins