Adds impala-shell support to connect to HiveServer2 HTTP endpoint.
Relies on toolchain change at https://gerrit.cloudera.org/#/c/13725/.
Use --protocol='hs2-http' to enable this behavior.
Example usages:
---------------
impala-shell --protocol='hs2-http' (No auth)
impala-shell --protocol='hs2-http' --ldap -u..... (PLAIN auth)
impala-shell --protocol='hs2-http' --ssl --ca_cert... (TLS)
impala-shell --protocol='hs2-http' --ldap --ssl --ca_cert... (LDAP + TLS)
Limitations:
-----------
- Does not support Kerberos (-k) due to lack of SPNEGO support.
Testing:
--------
- Parameterized existing shell tests to support this combination.
- Added shell test coverage for LDAP auth.
Change-Id: I8323950857dfe1c1dfd5377fde79f87bc2ce9534
Reviewed-on: http://gerrit.cloudera.org:8080/13746
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
HS2 is added as an option via --protocol=hs2. The user-visible
differences in behaviour are minimal. Beeswax is still the
default and can be explicitly enabled via --protocol=beeswax
but will be deprecated. The default is unchanged because
changing the default could break certain workflows, e.g.
those that explicitly specify the port with -i or deployments
that hit --fe_service_threads for HS2 and somehow rely on
impala-shell not contributing to that limit. For most
workflows the change is transparent and we should change
the default in a major version change.
This support requires Impala-specific extensions to
the HS2 interface, similar to the existing extensions
to Beeswax. Thus the HS2 shell is only
forwards-compatible with newer Impala versions.
I considered trying to gracefully degrade when the
new extensions weren't present, but it didn't seem to be
worth the ongoing testing effort.
Differences between HS2 and Beeswax are abstracted into
ImpalaClient subclasses.
Here are the changes required to make it work:
* Switch to TBinaryProtocolAccelerated to avoid perf
regression. The HS2 protocol requires decoding
more primitive values (because it's not a string-per-row),
which was slow with the pure-Python implementation of
TBinaryProtocol.
* Added bitarray module to efficiently unpack null indicators
* Minimise invasiveness of changes by transposing and stringifying
the columnar results into rows in impala_client.py. The transposition
needs to happen before display anyway.
* Add PingImpalaHS2Service() to get back version string and webserver
address.
* Add CloseImpalaOperation() extension to return DML row counts. This
possibly addresses IMPALA-1789, although we need to confirm that
this is a sufficient solution.
* Add is_closed member to query handles to avoid shell independently
tracking whether the query handle was closed or not.
* Include query status in HS2 log to match beeswax.
* HS2 GetLog() command now includes query status error message for
consistency with beeswax.
* "set"/"set all" uses the client request's options, not the session
default. This captures the effective value of TIMEZONE, which
was previously missing. This also requires test changes where
the tests set non-default values, e.g. for ABORT_ON_ERROR.
* "set all" on the server side returns REMOVED query options - the
shell needs to know these so it can correctly ignore them.
* Clean up self.orig_cmd/self.last_leading_comment argument
passing to avoid implicit parameter passing through multiple
function calls.
* Clean up argument handling in shell tests to consistently pass
around lists of arguments instead of strings that are subject
to shell tokenisation rules.
* Consistently close connections in the shell to avoid leaking
HS2 sessions. This is enforced by making ImpalaShell a context
manager and also eliminating all sys.exit() calls that would
bypass the explicit connection closing.
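A minimal sketch of the transpose-and-stringify step described above, assuming each HS2 column arrives as a list of values plus a packed null bitmap in which bit i (lowest bit of each byte first) marks row i as NULL. The shell itself uses the bitarray module for the unpacking; this sketch uses plain Python so it has no dependencies. `Column`, `null_flags`, `stringify` and `columns_to_rows` are hypothetical names, and the stringification rules are illustrative only.

```python
from collections import namedtuple

# Hypothetical stand-in for an HS2 column: values plus a packed null bitmap.
Column = namedtuple("Column", ["values", "nulls"])


def null_flags(nulls, num_rows):
    """Unpack a null bitmap (assumed little-endian bit order) into booleans."""
    packed = bytearray(nulls)
    flags = []
    for row in range(num_rows):
        byte_index = row // 8
        # A bitmap shorter than the column means the remaining rows are not NULL.
        flags.append(byte_index < len(packed)
                     and bool(packed[byte_index] & (1 << (row % 8))))
    return flags


def stringify(value):
    """Illustrative display formatting: booleans lower-cased, others via str()."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def columns_to_rows(columns, null_str="NULL"):
    """Transpose columnar results into rows of display strings."""
    num_rows = len(columns[0].values) if columns else 0
    flags = [null_flags(col.nulls, num_rows) for col in columns]
    return [[null_str if flags[c][r] else stringify(col.values[r])
             for c, col in enumerate(columns)]
            for r in range(num_rows)]
```

Transposing once, up front, lets the existing row-oriented display code stay unchanged, which is what keeps the change minimally invasive.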
Testing:
* Shell tests can run with both protocols
* Add tests for formatting of all types and NULL values
* Added testing for floating point output formatting, which does
change as a result of switching to server-side vs client-side
formatting.
* Verified that newly-added tests were actually going through HS2
by disabling hs2 on the minicluster and running tests.
* Add checks to test_verify_metrics.py to ensure that no sessions
are left open at the end of tests.
Performance:
Baseline from beeswax shell for large extract is as follows:
$ time impala-shell.sh -B -q 'select * from tpch_parquet.orders' > /dev/null
real 0m6.708s
user 0m5.132s
sys 0m0.204s
After this change it is somewhat slower, but we generally don't consider
bulk extract performance through the shell to be perf-critical:
real 0m7.625s
user 0m6.436s
sys 0m0.256s
Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12
Reviewed-on: http://gerrit.cloudera.org:8080/12884
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adding a subquery to a query that fails with ERROR made it fail with WARNING instead.
The fix here makes it return ERROR.
Testing:
Added unit tests.
Tested the reported cases on a real cluster.
Change-Id: Ibedb11dd3d50bcdb21d508f7d21691925491946e
Reviewed-on: http://gerrit.cloudera.org:8080/12022
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
As part of IMPALA-7555, we added a default socket timeout of 5 seconds
when connecting to an impalad. Under heavy load with Kerberos and SSL
enabled, we could hit this default timeout. This change bumps up the
timeout to 60 secs to make the impala-shell more robust.
Change-Id: Ifc40069e86cbf93634320804efba003fb5551afe
Reviewed-on: http://gerrit.cloudera.org:8080/12051
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
impala-shell does not set any socket timeout while connecting to the
impala server. This change sets a timeout on the socket before
connecting and unsets it after successfully connecting. The default
timeout on this socket is 5 sec.
Usage: impala-shell --client_connect_timeout=<value in ms>
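A minimal sketch of the "set a timeout for the connect, then unset it" pattern using only the standard `socket` module. `connect_with_timeout` is a hypothetical helper; it covers a plain TCP connect only, whereas the testing notes below suggest that in impala-shell the timeout also bounds the TLS handshake when SSL is enabled.

```python
import socket


def connect_with_timeout(host, port, timeout_ms):
    """Open a TCP connection whose connect phase is bounded by timeout_ms.

    Raises socket.timeout (an OSError subclass) if the connect takes too long.
    """
    sock = socket.create_connection((host, port), timeout=timeout_ms / 1000.0)
    # Unset the timeout once connected so that long-running RPCs on this
    # socket are not cut off by the connect timeout.
    sock.settimeout(None)
    return sock
```

Restoring blocking mode after the connect matters because a query can legitimately keep an RPC waiting far longer than any sensible connect timeout.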
Testing:
1. Added a test where I create a random listening socket.
impala-shell (with ssl enabled) connects to this socket and
times out after 2 sec.
2. Created a kerberized Impala cluster with SSL enabled and
connected to the impalad using an openssl client, which blocks the
beeswax server thread from accepting new connections, e.g.
openssl s_client -connect <IP Addr>:21000
Then used impala-shell to connect to the same impalad.
impala-shell timed out after the default of 5 sec. I verified
this manually.
Change-Id: I130fc47f7a83f591918d6842634b4e5787d00813
Reviewed-on: http://gerrit.cloudera.org:8080/11540
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
If the remote impalad died while the shell waited for a
command to complete, the shell disconnected. Previously
after restarting the remote impalad, we needed to run
"connect;" to reconnect; now the shell reconnects
automatically.
Testing:
Added test_auto_connect_after_impalad_died in
test_shell_interactive_reconnect.py
Change-Id: Ia13365a9696886f01294e98054cf4e7cd66ab712
Reviewed-on: http://gerrit.cloudera.org:8080/10992
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
After additional testing around IMPALA-2782, it was discovered
that impala-shell starts the session displaying the expected
hostname (as passed via the -i flag) on the prompt. This gives the
impression that the load balancer was bypassed, however the
actual TSSLSocket is still created with the hostname passed
in via the -b or --kerberos_host_fqdn flag.
This change ensures that the hostname used to create the
TSSLSocket will always be the one passed in via the -i flag
on impala-shell. This change is required by IMPALA-2782.
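A minimal sketch of the intended split between the two flags: the socket always targets the host from -i, while -b only changes the host name used for the Kerberos service principal. `resolve_connection` is a hypothetical helper, and the default port 21000 (the Beeswax port seen in the testing notes) is an assumption; IPv6 literals are not handled.

```python
def resolve_connection(impalad, kerberos_host_fqdn=None, default_port=21000):
    """Return (socket_host, port, kerberos_host) for an -i value and optional -b value."""
    host, sep, port = impalad.rpartition(":")
    if not sep:
        # No ":port" suffix: the whole value is the host name.
        host, port = impalad, ""
    port = int(port) if port else default_port
    # The TSSLSocket connects to `host`; -b only feeds the SASL/Kerberos host.
    return host, port, kerberos_host_fqdn or host
```

For example, `-i impalad-1.example.com:21000 -b lb.example.com` connects to impalad-1 while authenticating against the load balancer's principal.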
Testing:
Using netcat, we verified that the impala daemon host[:port]
value passed into the -i/--impalad option is indeed the one
impala-shell tries to connect to in both cases (with and
without -b)
Change-Id: Ibee05bd0dbe8c6ae108b890f0ae0f6900149773a
Reviewed-on: http://gerrit.cloudera.org:8080/10580
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this commit, it was inconsistent which DDL operations
returned a result set and which didn't.
With this commit, every DDL operation returns a summary of
its execution. They declare their result set schema in
Frontend.java and provide the summary in CatalogOpExecutor.java.
Updated the tests according to the new behavior.
Change-Id: Ic542fb8e49e850052416ac663ee329ee3974e3b9
Reviewed-on: http://gerrit.cloudera.org:8080/9090
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
configured with load balancer and kerberos.
This change adds an impala-shell option -b / --kerberos_host_fqdn.
This allows the user to optionally specify the load balancer's host so
that impala-shell can connect directly to impala daemons
in a kerberized cluster.
Change-Id: I4726226a7a3817421b133f74dd4f4cf8c52135f9
Reviewed-on: http://gerrit.cloudera.org:8080/7241
Reviewed-by: <andy@phdata.io>
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins
Issue 1: When a query was cancelled via CTRL-C while being executed in impala-shell,
the Impala backend threw an 'Invalid query handle' exception.
This is because one ImpalaClient was making RPCs while another ImpalaClient
cancelled the query on the backend. As a result, RPC handlers in ImpalaServer
tried to access a ClientRequestState that had already been cleared from the
backend. The issue is reliably reproducible in both the wait_to_finish and the
fetch states of the query.
As a solution, the query cancellation is indicated to ImpalaClient via a bool
flag. Once a cancellation-originated exception reaches the Impala shell, this flag
is checked to decide whether to suppress the error or not.
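A minimal sketch of the flag-based suppression described above. `RPCError`, `ImpalaClientSketch`, `is_query_cancelled`, `cancel_query` and `fetch` are hypothetical names, not the shell's actual classes; the point is only that an error is swallowed when, and only when, the user's own cancellation caused it.

```python
class RPCError(Exception):
    """Hypothetical error raised by an RPC, e.g. 'Invalid query handle'."""


class ImpalaClientSketch(object):
    """Hypothetical client that remembers whether the user cancelled."""

    def __init__(self):
        self.is_query_cancelled = False

    def cancel_query(self):
        # Called from the Ctrl-C handler. The cancel RPC itself goes through a
        # separate client; here we only record that cancellation happened.
        self.is_query_cancelled = True

    def fetch(self, rpc):
        """Run `rpc`, suppressing errors that our own cancellation caused."""
        try:
            return rpc()
        except RPCError:
            if self.is_query_cancelled:
                return None  # expected fallout of the cancellation
            raise
```

Genuine errors on a query that was never cancelled still propagate, so the suppression cannot hide unrelated failures.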
Issue 2: Every time a query was cancelled, a 'use db' command was issued
automatically. This happened for historical reasons but is no longer needed
(see the Jira for more details).
Change-Id: I6cefaf1dae78baae238289816a7cb9d210fb38e2
Reviewed-on: http://gerrit.cloudera.org:8080/8549
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Four display levels are introduced for each query option: REGULAR, ADVANCED,
DEVELOPMENT and DEPRECATED. When the query options are displayed in Impala
shell using SET then only the REGULAR and ADVANCED options are shown. A new
command called SET ALL shows all the options grouped by their option levels.
When the query options are displayed through the SET SQL statement, the
result set contains an extra column indicating the level of each option.
As in the Impala shell, the SET command only displays the REGULAR and
ADVANCED options while SET ALL shows them all.
If the Impala shell connects to an Impala daemon that predates this change
then all the options would be displayed in the REGULAR group.
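A minimal sketch of the SET / SET ALL grouping, assuming options arrive as (name, value, level) tuples and that a daemon predating this change reports no level (modelled here as `None`, a hypothetical representation). `group_options`, `LEVELS` and `SHOWN_BY_SET` are hypothetical names.

```python
from collections import OrderedDict

LEVELS = ["REGULAR", "ADVANCED", "DEVELOPMENT", "DEPRECATED"]
SHOWN_BY_SET = ("REGULAR", "ADVANCED")


def group_options(options, show_all=False):
    """Group (name, value, level) tuples by level, in LEVELS order.

    Options without a level (from an older daemon) fall into REGULAR.
    SET shows only REGULAR and ADVANCED; SET ALL (show_all=True) shows all.
    """
    groups = OrderedDict((level, []) for level in LEVELS)
    for name, value, level in options:
        groups[level or "REGULAR"].append((name, value))
    visible = LEVELS if show_all else SHOWN_BY_SET
    # Skip empty groups so the output contains only populated sections.
    return OrderedDict((lvl, sorted(groups[lvl])) for lvl in visible if groups[lvl])
```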
Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1
Reviewed-on: http://gerrit.cloudera.org:8080/8447
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
For some queries, the exec summary will not be completely filled
in even if the query is FINISHED. In particular, the exec_stats field
may not be set. This was causing an error in our test code that
converts the exec summary to a more usable format.
The situation is essentially deterministic for some queries, but
it was being hidden by testing code that caught the error and
discarded it in most situations, leading to flaky tests.
This patch removes the 'try' that was hiding the error and makes
the code check for the presence of exec_stats and handle it rather
than generating an error.
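A minimal sketch of "check for exec_stats instead of catching the error", using namedtuples as hypothetical stand-ins for the Thrift exec-summary structs. `ExecStats`, `PlanNodeSummary` and `summarize_node` are hypothetical names, and reporting `None` for missing data is an illustrative choice.

```python
from collections import namedtuple

# Hypothetical stand-ins for the Thrift exec-summary structs.
ExecStats = namedtuple("ExecStats", ["cardinality", "latency_ns"])
PlanNodeSummary = namedtuple("PlanNodeSummary", ["label", "exec_stats"])


def summarize_node(node):
    """Convert one plan node to a dict, tolerating missing exec_stats."""
    stats = node.exec_stats
    if not stats:
        # exec_stats may be unset even for a FINISHED query: report "no data"
        # explicitly rather than raising and relying on a caller's try/except.
        return {"label": node.label, "rows": None, "max_latency_ns": None}
    return {
        "label": node.label,
        "rows": sum(s.cardinality for s in stats),
        "max_latency_ns": max(s.latency_ns for s in stats),
    }
```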
I filed IMPALA-5783 for followup work to be more rigorous about
when the exec summary should and shouldn't be fully present.
Testing:
- Ran the affected tests in a loop and they are no longer flaky.
Change-Id: Id52ac62da2b01f9e163e97cbe4590f8db6b663d2
Reviewed-on: http://gerrit.cloudera.org:8080/7627
Tested-by: Impala Public Jenkins
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Adds support in the shell to report the number of modified
rows for all DML operations, as well as the number of rows
with errors.
Testing: Added shell tests.
Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1
Reviewed-on: http://gerrit.cloudera.org:8080/5103
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This commit handles partition-related DDL in a more general way. We can
now use compound predicates to specify a list of partitions in
statements like ALTER TABLE DROP PARTITION and COMPUTE INCREMENTAL
STATS. It also ensures that some statements, such as PARTITION SET
LOCATION and LOAD DATA, accept only one partition at a time. ALTER
TABLE ADD PARTITION continues to use the old PartitionKeyValue logic.
The changed partition-related DDL statements are as follows:
Table: p (i int) partitioned by (j int, k string)
Partitions:
+-------+---+-------+--------+------+--------------+-------------------+
| j | k | #Rows | #Files | Size | Bytes Cached | Cache Replication |
+-------+---+-------+--------+------+--------------+-------------------+
| 1 | a | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| 1 | b | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| 1 | c | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| 2 | d | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| 2 | e | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| 2 | f | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| Total | | -1 | 0 | 0B | 0B | |
+-------+---+-------+--------+------+--------------+-------------------+
1. show files in p partition (j<2, k='a');
2. alter table p partition (j<2, k in ("b","c")) set cached in 'testPool';
// j can appear more than once,
3.1. alter table p partition (j<2, j>0, k<>"d") set uncached;
// it is the same as
3.2. alter table p partition (j<2 and j>0, not k="d") set uncached;
// we can also do 'or'
3.3. alter table p partition (j<2 or j>0, k like "%") set uncached;
// missing 'k' matches all values of k
4. alter table p partition (j<2) set fileformat textfile;
5. alter table p partition (k rlike ".*") set serdeproperties ("k"="v");
6. alter table p partition (j is not null) set tblproperties ("k"="v");
7. alter table p drop partition (j<2);
8. compute incremental stats p partition(j<2);
The partition-related DDL statements that keep the old behavior are as follows:
1. load data inpath '/path/from' into table p partition (j=2, k="d");
2. alter table p add partition (j=2, k="g");
3. alter table p partition (j=2, k="g") set location '/path/to';
4. insert into p partition (j=2, k="g") values (1), (2), (3);
With general partition expressions or partially specified partition specs,
partition predicates may match an empty set of partitions without an error,
whether or not 'IF EXISTS' is specified.
Examples:
[localhost.localdomain:21000] >
alter table p drop partition (j=2, k="f");
Query: alter table p drop partition (j=2, k="f")
+-------------------------+
| summary |
+-------------------------+
| Dropped 1 partition(s). |
+-------------------------+
Fetched 1 row(s) in 0.78s
[localhost.localdomain:21000] >
alter table p drop partition (j=2, k<"f");
Query: alter table p drop partition (j=2, k<"f")
+-------------------------+
| summary |
+-------------------------+
| Dropped 2 partition(s). |
+-------------------------+
Fetched 1 row(s) in 0.41s
[localhost.localdomain:21000] >
alter table p drop partition (k="a");
Query: alter table p drop partition (k="a")
+-------------------------+
| summary |
+-------------------------+
| Dropped 1 partition(s). |
+-------------------------+
Fetched 1 row(s) in 0.25s
[localhost.localdomain:21000] > show partitions p;
Query: show partitions p
+-------+---+-------+--------+------+--------------+-------------------+
| j | k | #Rows | #Files | Size | Bytes Cached | Cache Replication |
+-------+---+-------+--------+------+--------------+-------------------+
| 1 | b | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| 1 | c | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| Total | | -1 | 0 | 0B | 0B | |
+-------+---+-------+--------+------+--------------+-------------------+
Fetched 3 row(s) in 0.01s
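A minimal sketch of the partition-selection semantics shown above, modelling each comma-separated entry of a PARTITION (...) clause as a predicate over one partition of the example table. `PARTITIONS` and `select_partitions` are hypothetical; the real evaluation happens in Impala's frontend against partition metadata.

```python
# The example table's partitions as (j, k) pairs.
PARTITIONS = [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (2, "e"), (2, "f")]


def select_partitions(partitions, predicates):
    """Return the partitions for which every predicate holds.

    Comma-separated entries are ANDed together, so a column that no
    predicate mentions matches every value. An empty result is returned
    as an empty list rather than treated as an error.
    """
    return [p for p in partitions if all(pred(p) for pred in predicates)]
```

For instance, `partition (j<2, k in ("b","c"))` corresponds to `select_partitions(PARTITIONS, [lambda p: p[0] < 2, lambda p: p[1] in ("b", "c")])`, which yields the two partitions (1, "b") and (1, "c").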
Change-Id: I2c9162fcf9d227b8daf4c2e761d57bab4e26408f
Reviewed-on: http://gerrit.cloudera.org:8080/3942
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
1.) IMPALA-4134: Use Kudu AUTO FLUSH
Improves performance of writes to Kudu up to 4.2x in
bulk data loading tests (load 200 million rows from
lineitem).
2.) IMPALA-3704: Improve errors on PK conflicts
The Kudu client reports an error for every PK conflict,
and all errors were being returned in the error status.
As a result, inserts/updates/deletes could return an error status
with thousands of errors reported. This changes the error
handling to log all reported errors as warnings and
return only the first error in the query error status.
3.) Improve the DataSink reporting of the insert stats.
The per-partition stats returned by the data sink weren't
useful for Kudu sinks. Firstly, the number of appended rows
was not being displayed in the profile. Secondly, the
'stats' field isn't populated for Kudu tables and thus was
confusing in the profile, so it is no longer printed if it
is not set in the thrift struct.
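A minimal sketch of the error-reporting policy from item 2: log every per-row error as a warning, but return only the first one for the query status. The actual change lives in Impala's backend; `report_write_errors` and the logger name are hypothetical, written in Python for illustration.

```python
import logging


def report_write_errors(errors, logger=logging.getLogger("kudu-sink")):
    """Log all per-row write errors as warnings; return the first (or None)."""
    for err in errors:
        logger.warning("Kudu write error: %s", err)
    # Only the first error goes into the query error status, keeping it short.
    return errors[0] if errors else None
```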
Testing: Ran local tests, including new tests to verify
the query profile insert stats. Manual cluster testing was
conducted of the AUTO FLUSH functionality, and that testing
informed the default mutation buffer value of 100MB which
was found to provide good results.
Change-Id: I5542b9a061b01c543a139e8722560b1365f06595
Reviewed-on: http://gerrit.cloudera.org:8080/4728
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
IMPALA-3002:
The shell prints an incorrect value for '#Rows' in the exec
summary for broadcast nodes due to incorrect logic around
whether to use max or agg stats. This patch makes the behavior
consistent with the way the backend treats exec summaries in
summary-util.cc. This incorrect logic was also duplicated in
the impala_beeswax test framework.
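A minimal sketch of the max-versus-aggregate choice for '#Rows', assuming the rule is: for a node that receives a broadcast, every instance sees the same rows, so take the max over instances; otherwise instances process disjoint data, so sum them. `NodeSummary` and `displayed_rows` are hypothetical names, and the rule is an assumption about summary-util.cc rather than a copy of it.

```python
from collections import namedtuple

# Hypothetical per-node summary: row counts reported by each instance.
NodeSummary = namedtuple("NodeSummary", ["is_broadcast", "instance_rows"])


def displayed_rows(node):
    """Rows to show in the exec summary's '#Rows' column (assumed rule)."""
    if not node.instance_rows:
        return 0
    if node.is_broadcast:
        # Summing would multiply the true count by the number of instances.
        return max(node.instance_rows)
    return sum(node.instance_rows)
```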
IMPALA-1473:
When there is a merging exchange with a limit, we may copy rows
into the output batch beyond the limit. In this case, we currently
update the output batch's size to reflect the limit, but we also
need to update ExecNode::num_rows_returned_ or the exec summary
may show that the exchange node returned more rows than it really
did.
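A minimal sketch of the accounting fix for the merging exchange: when a batch overshoots the limit, trim it and count only the rows actually returned. The real node is C++; `MergingExchangeSketch` and `get_next` are hypothetical Python stand-ins.

```python
class MergingExchangeSketch(object):
    """Hypothetical node that enforces a LIMIT on the batches it returns."""

    def __init__(self, limit):
        self.limit = limit
        self.num_rows_returned = 0

    def get_next(self, batch):
        remaining = self.limit - self.num_rows_returned
        if len(batch) > remaining:
            # Rows were copied past the limit: trim the batch to the limit.
            batch = batch[:max(remaining, 0)]
        # Count only the rows actually returned so the exec summary agrees.
        self.num_rows_returned += len(batch)
        return batch
```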
Additionally, PlanFragmentExecutor::GetNext does not update
rows_produced_counter_ in some cases, leading the runtime profile
to display an incorrect value for 'RowsProduced'.
Change-Id: I386719370386c9cff09b8b35d15dc712dc6480aa
Reviewed-on: http://gerrit.cloudera.org:8080/4679
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
The webserver address was always configured as 0.0.0.0 (meaning that
the webserver could be reached on any IP for that machine) unless
otherwise specified. This is not a correct value to display to the
user. This patch returns the hostname of the node, when requested,
if the webserver host address is 0.0.0.0.
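A minimal sketch of the substitution, using the standard `socket.gethostname()`. The actual change returns the hostname from the impalad; `display_webserver_address` is a hypothetical client-side helper, and port 25000 in the example below is only sample data.

```python
import socket


def display_webserver_address(host, port):
    """Return a user-facing web UI address, replacing the wildcard bind."""
    if host == "0.0.0.0":
        # 0.0.0.0 means "all interfaces" and is not a useful address to show
        # a user, so display this node's hostname instead.
        host = socket.gethostname()
    return "http://%s:%d" % (host, port)
```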
This patch also does not print the coordinator link for very simple
queries, as it's not necessary and is unnecessarily verbose.
This patch also does away with pinging the impalad an extra time per
query for finding the host time and webserver address. It instead
remembers the webserver address at connect time and displays client
local time for every query instead.
Change-Id: I9d167b66f2dd8629e40a7094d21ea7ce6b43d23b
Reviewed-on: http://gerrit.cloudera.org:8080/3994
Tested-by: Internal Jenkins
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Sailesh Mukil <sailesh@cloudera.com>
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace the existing license text with the ASF license text given
on the website, or add it where it is missing.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
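For reference, the perl filter in the second step ("print each line unless it matches Copyright.*Cloudera, case-insensitively") can be expressed in Python as below. `strip_cloudera_copyright` is a hypothetical helper that operates on a string rather than editing files in place.

```python
import re

CLOUDERA_COPYRIGHT = re.compile(r"Copyright.*Cloudera", re.IGNORECASE)


def strip_cloudera_copyright(text):
    """Drop every line matching Copyright.*Cloudera (case-insensitive)."""
    # splitlines(True) keeps line endings, so the output is otherwise unchanged.
    return "".join(line for line in text.splitlines(True)
                   if not CLOUDERA_COPYRIGHT.search(line))
```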
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Previously, impala-shell could not accept wildcard or SAN certificates
because the Thrift library it depended on did not support them.
This patch subclasses TSSLSocket and adds the logic to take care of
the above mentioned cases by introducing the new
TSSLSocketWithWildcardSAN class.
The certificate matching logic is based on the python-ssl source code.
Added custom cluster tests to test both wildcard matching and SAN
matching.
Added be/src/testutil/certificates-info.txt which contains all the
information about the certificates which are added for the tests.
This has been tested with Python2.4 and Python2.6.
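A simplified sketch of hostname matching in the spirit of Python's `ssl.match_hostname`, operating on a certificate dict in the format returned by `SSLSocket.getpeercert()`. It only honours a wildcard that forms the entire leftmost label (matching exactly one label); Python's own implementation handles more cases. `dnsname_match` and `match_hostname` here are hypothetical helpers, not the TSSLSocketWithWildcardSAN code.

```python
def dnsname_match(pattern, hostname):
    """Match one certificate DNS name against a hostname (simplified rules)."""
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False  # "*" never spans more than one label
    if pattern_labels[1:] != host_labels[1:]:
        return False
    if pattern_labels[0] == "*":
        return host_labels[0] != ""
    return pattern_labels[0] == host_labels[0]


def match_hostname(cert, hostname):
    """Check a getpeercert()-style dict against hostname.

    DNS subjectAltName entries take precedence; the subject commonName is
    consulted only when the certificate has no DNS SANs.
    """
    sans = [value for key, value in cert.get("subjectAltName", ()) if key == "DNS"]
    if sans:
        return any(dnsname_match(name, hostname) for name in sans)
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName" and dnsname_match(value, hostname):
                return True
    return False
```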
Change-Id: I75e37012eeeb0bcf87a5edf875f0ff915daf8b89
Reviewed-on: http://gerrit.cloudera.org:8080/3765
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
To help supportability and debugging, it's helpful to have the impala
shell print out the coordinator time and the link to the coordinator
web UI once the query is submitted.
This is done by calling the PingImpalaService() routine every time a
query is submitted, which returns the coordinator's hostname,
webserver port and the coordinator epoch time at that moment, which the
shell then formats and prints out.
Added tests to verify these new messages.
Change-Id: I704eb64546e27c367830120241311fea6091266b
Reviewed-on: http://gerrit.cloudera.org:8080/3507
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
This patch calls PingImpalaService() at the beginning of each command
loop (if the shell is currently connected). If the call fails, the shell
will try to reconnect. Reconnecting is best-effort: if it fails, the
command is processed anyway so as not to interfere with any commands
that might still give useful output in a disconnected state.
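A minimal sketch of the per-command check: ping only if connected, attempt a reconnect on failure, and never let a failed reconnect block the command. `prepare_for_command`, the `client.connected` attribute, `client.ping()` and the `reconnect` callable are hypothetical names standing in for the shell's real objects.

```python
def prepare_for_command(client, reconnect):
    """Run before each command; return True if connected afterwards.

    `reconnect` is expected to mark the client connected on success.
    """
    if not client.connected:
        return False  # only ping when the shell believes it is connected
    try:
        client.ping()
        return True
    except Exception:
        client.connected = False
    try:
        reconnect()
    except Exception:
        pass  # best effort: the command is processed anyway
    return client.connected
```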
Change-Id: I37cb2f4fc235fedff16d48ad5125b9a30bd7dfd0
Reviewed-on: http://gerrit.cloudera.org:8080/547
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
This patch changes the behaviour of the Impala shell to refuse to
attempt an LDAP-authenticated connection to Impala unless SSL/TLS is
configured.
A new flag --auth_creds_in_clear_ok is added to suppress this
behaviour. This is similar to Impala's --ldap_passwords_in_clear_ok
flag. The shell will also now print a warning if an insecure
configuration is used.
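A minimal sketch of the startup check: LDAP without SSL/TLS is refused unless --auth_creds_in_clear_ok is given, in which case the shell proceeds with a warning. `check_ldap_transport` is a hypothetical helper, and the message wording is illustrative rather than the shell's actual text.

```python
def check_ldap_transport(use_ldap, use_ssl, auth_creds_in_clear_ok):
    """Return (allowed, message) for the requested auth/transport combination."""
    if not use_ldap or use_ssl:
        return True, None  # nothing sensitive is sent in the clear
    if auth_creds_in_clear_ok:
        return True, ("Warning: LDAP credentials will be sent in the clear "
                      "because SSL/TLS is not enabled.")
    return False, ("LDAP authentication requires SSL/TLS; pass "
                   "--auth_creds_in_clear_ok to override.")
```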
Change-Id: Ide25d8dd881a61b9f08900112466c430da64a038
Reviewed-on: http://gerrit.cloudera.org:8080/546
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This patch adds a way to allow for dynamic progress reporting in the
shell. There are two new command line flags for the shell
--live_progress: prints the completed vs. total number of scan ranges
--live_summary: prints an updated exec summary
In addition to the command line flags, these options can be set from
within the shell using:
set LIVE_SUMMARY=True
set LIVE_PROGRESS=True
The new options are listed under shell options. Both reports are
updated at most once per second; for longer-running queries, the interval is
adjusted to the time between two RPC calls to get the query status. To
provide this information in the ExecSummary, the Thrift structure for
the ExecSummary was extended to contain a progress indicator. The output
is printed to stderr and only available in interactive mode.
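A minimal sketch of the two ideas above: rendering completed vs. total scan ranges, and choosing a refresh interval of at least one second that stretches to the status-RPC interval. The bar layout and the names `format_progress` and `refresh_interval` are hypothetical; the shell's actual output format is not described here.

```python
def format_progress(completed, total, width=50):
    """Render completed vs. total scan ranges as a text bar."""
    if total <= 0:
        return ""
    completed = min(completed, total)
    filled = width * completed // total
    return "[%s%s] %d%%" % ("#" * filled, " " * (width - filled),
                            100 * completed // total)


def refresh_interval(rpc_interval_s):
    """Update at most once per second, or less often if status RPCs are slower."""
    return max(1.0, rpc_interval_s)
```

For example, `format_progress(5, 10, width=10)` gives `[#####     ] 50%`.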
An example video is available here:
https://asciinema.org/a/5wi7ypckx4ol4ha1hlg3e3q1k
Change-Id: I70b2ab5fa74dc2ba5bc3b338ef13ddc6ccf367d2
Reviewed-on: http://gerrit.cloudera.org:8080/508
Tested-by: Internal Jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
This patch introduces the concept of error codes for errors that are
recorded in Impala and are going to be presented to the client. These
error codes are used to aggregate and group incoming error / warning
messages to reduce clutter in the shell output and increase the usefulness of
the messages. By splitting the message string from the implementation,
it becomes possible to edit the string independently of the code and
pave the way for internationalization.
Error messages are defined as a combination of an enum value and a
string. Both are defined in the Error.thrift file that is automatically
generated using the script in common/thrift/generate_error_codes.py. The
goal of the script is to have a central understandable repository of
error messages. Adding new messages to this file will require rebuilding
the Thrift part. The proxy class ErrorMessage is responsible for
representing an error and capturing the parameters that are used to format
the error message string.
When error messages are recorded they are recorded based on the
following algorithm:
- If an error message is of type GENERAL, do not aggregate this message
and simply add it to the total number of messages
- If an error message is of a specific type, record the first error
message as a sample and for all other occurrences increment the count.
- The coordinator will merge all error messages except the ones of type
GENERAL and display a count.
For example, in the case of the parquet file spanning multiple blocks
the output will look like:
Parquet files should not be split into multiple hdfs-blocks.
file=hdfs://localhost:20500/fid.parq (1 of 321 similar)
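A minimal sketch of the aggregation algorithm above: GENERAL messages are kept individually, other codes keep the first message as a sample plus a count, and the coordinator merges per-backend logs by summing counts. `ErrorLog`, `add`, `merge` and `render` are hypothetical names; omitting the suffix when a code occurred only once is an assumption.

```python
from collections import OrderedDict

GENERAL = "GENERAL"


class ErrorLog(object):
    """Hypothetical aggregation of error messages keyed by error code."""

    def __init__(self):
        self.general = []             # GENERAL messages, never aggregated
        self.by_code = OrderedDict()  # code -> [sample message, count]

    def add(self, code, message):
        if code == GENERAL:
            self.general.append(message)
        elif code in self.by_code:
            self.by_code[code][1] += 1  # keep the first sample, bump the count
        else:
            self.by_code[code] = [message, 1]

    def merge(self, other):
        """Coordinator-side merge of another backend's log into this one."""
        self.general.extend(other.general)
        for code, (sample, count) in other.by_code.items():
            if code in self.by_code:
                self.by_code[code][1] += count
            else:
                self.by_code[code] = [sample, count]

    def render(self):
        lines = list(self.general)
        for sample, count in self.by_code.values():
            lines.append(sample if count == 1
                         else "%s (1 of %d similar)" % (sample, count))
        return "\n".join(lines)
```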
All messages are always logged to VLOG. In the coordinator error
messages are merged across all backends to retain readability in the
case of large clusters.
The current version of this patch adds these new error codes to some of
the most important error messages as a reference implementation.
Change-Id: I1f1811631836d2dd6048035ad33f7194fb71d6b8
Reviewed-on: http://gerrit.cloudera.org:8080/39
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
This is a reorganization of the existing impala-shell.
The basic idea was to split up the shell into two components: one part
solely responsible for the CLI functionality, and another to represent
the impala client/connection that would interact with the Beeswax api and
execute queries, fetch results, etc.
One major change was to redo how the existing shell handled cancellation,
which was to create a thread for each rpc, so that Ctrl+C would not interrupt
the system calls and break the socket connection. In the new approach,
a new client instance is created to close the query and if the socket connection is
broken, the client reconnects. Cancellation currently works.
Change-Id: I0f371f68552c065b2317f967c6cf7483b44be3df
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3316
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4008