Restructured how the cmd control flow executes commands in postcmd,
removing the hack to make non-interactive mode work. Now there are
values to represent the different command execution statuses.
Change-Id: I149b65d8a64d63a978fed284f0ad0da95833149c
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3850
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
(cherry picked from commit b33dc4d10bcc3982dad43015343c91f1d277bb3f)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4096
Reviewed-by: Henry Robinson <henry@cloudera.com>
This is the first iteration of a kerberized development environment.
All the daemons start and use kerberos, with the sole exception of the
hive metastore. This is sufficient to test impala authentication.
When buildall.sh is run using '-kerberize', it will stop before
loading data or attempting to run tests.
Loading data into the cluster is known to not work at this time, the
root causes being that Beeline -> HiveServer2 -> MapReduce throws
errors, and Beeline -> HiveServer2 -> HBase has problems. These are
left for later work.
However, the impala daemons will happily authenticate using kerberos
both from clients (like the impala shell) and amongst each other.
This means that if you can get data into the mini-cluster, you could
query it.
Usage:
* Supply a '-kerberize' option to buildall.sh, or
* Supply a '-kerberize' option to create-test-configuration.sh, then
'run-all.sh -format', re-source impala-config.sh, and then start
impala daemons as usual. You must reformat the cluster because
kerberizing it will change all the ownership of all files in HDFS.
Notable changes:
* Added clean start/stop script for the llama-minikdc
* Creation of Kerberized HDFS - namenode and datanodes
* Kerberized HBase (and Zookeeper)
* Kerberized Hive (minus the MetaStore)
* Kerberized Impala
* Loading of data very nearly working
Still to go:
* Kerberize the MetaStore
* Get data loading working
* Run all tests
* The unknown unknowns
* Extensive testing
Change-Id: Iee3f56f6cc28303821fc6a3bf3ca7f5933632160
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4019
Reviewed-by: Michael Yoder <myoder@cloudera.com>
Tested-by: jenkins
An anti-join query in IMPALA-1175 revealed that the shell
mishandled the exception when the impalad would crash. This
fix prints the correct error message and sets the shell
state as disconnected.
Note there isn't a way to test this easily due to the difficulty
in coming up with a query that will crash the impalad.
Change-Id: I534674880db224a0d93dfd8bd2c081a12b65532b
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3998
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4080
This is a reorganization of the existing impala-shell.
The basic idea was to split up the shell into two components: one part
soley responsible for the CLI functionality, and another to represent
the impala client/connection that would interact with the Beeswax api and
execute queries, fetch results, etc.
One major change was to redo how the existing shell handled cancellation,
which was to create a thread for each rpc, so that Ctrl+C would not interrupt
the system calls and break the socket connection. In the new approach,
a new client instance is created to close the query and if the socket connection is
broken, the client reconnects. Cancellation currently works.
Change-Id: I0f371f68552c065b2317f967c6cf7483b44be3df
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3316
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4008
Commands with escaped single quotes would cause
the shell to enter an infinite loop while trying
to parse the command due to shlex not escaping single
quotes correctly. Once that change was implemented,
shlex would now ignore escaped single and double quotes
outside of closed quotes, so there needed to be a check for
that as well.
ALSO, implemented testing of commands in interactive mode.
Needed this to test these inputs, as command line input
cannot span multiple lines.
Change-Id: Id67368944eeb9a73061bc3e90bd6cda73c9d9f64
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3408
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3893
Before, when the show_profiles -p option was enabled, the runtime profile
would be printed before the query was closed, preventing the query summary
from being printed. Now the profile is printed after the query is closed.
Change-Id: Icf7b10f7612d8016736aac70aa7b77265d391a98
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3770
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3821
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Optionally loads options from a file in the user's home directory, called
'.impalarc', though the path to another file can be passed in as a command-line
option. The file must have a case-sensitive [impala] header. Specifying
the option in the command line overwrites the config file's value
for the option for that instance of the shell. If an option is not
specified in the config file, its default value is used.
Change-Id: I218da2c1e10308c5b8729883fa625f0c284397a7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2956
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3629
There was an issue with the previous fix to IMPALA-1059
if the user tried to reconnect within the shell after
having passed in a database via the -d option. The
passed database would be doubly backticked. This makes
the backticking of the argument idempotent.
Change-Id: I6eaed997c2be73d8659a2a12046ce393b97ec82c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3467
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3502
If not backticked, arguments such as parquet are interpreted as
keywords, when it is possible a database by that name exists.
This could have been avoided via single quotes around backticks: -d '`parquet`'
Otherwise, -d `parquet` throws a commandline error.
In interactive mode, backticks alone (ex. use `parquet`) will pass the
name as an identifier rather than a keyword.
Change-Id: I24b43eeeb6b4bfda5388165856788a20b64bc2ba
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3307
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3500
Users can now type 'summary' in the Impala shell after a query executes
to get a breakdown of the work done by each part of the query plan.
Change-Id: Ia6a43429ffc7778f3c2c8fcbf45d83828263c2ab
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2963
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
(cherry picked from commit 9b98d42acb14d43a64832767528ee572eac4979b)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2995
Options displayed with 'set' command. Default values distinguished
from set values by square brackets.
Change-Id: Iacf0574555aab78aa0ba2008ceb8776d372a57a5
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2913
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Without this change, the shell would always print the error log twice
for successful non-insert queries (once in __execute_query() and once
in __fetch()).
Change-Id: I0ab038230df897559b30feaea34778ea72988bc3
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2815
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 095dba7f395491db03daf19ff3bff2e2b4640ee4)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2908
COMPUTE STATS is an async DDL command. When COMPUTE STATS fails it will set the
query status of the QueryExecState properly, but the original Beeswax::query() RPC
won't throw. The Impala shell sometimes did not pick up and display the
query status because no RPC actually threw. To fix this, I modified
Beeswax::get_log() to include the query status if it is not ok. The shell looks
for a special prefix to distinguish the query status from the runtime state error log.
Change-Id: I0d9dbf0801629a37de22ea4ebb6d2e5d53b836ef
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1899
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2063
The problem was that we were setting a flag marking the last_query_handle as closed, but
were not resetting the flag before the next query. This caused the first query to
be closed properly, but subsequent queries would not be closed. The fix is to change
where the flag is reset to the same place as where we assign last_query_handle.
Added a test case.
Change-Id: I870a96789489bfe4f388910b808409cd0584af8a
(cherry picked from commit 1439151af5b63112b0dd631fac9c7ab4d43bba37)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1976
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
This change makes the fetch rpc interruptable. If the user cancels the query in
the middle of a fetch, the shell reconnects to the impalad and closes the
query. It also includes some code consolidation.
Change-Id: Iaaf0dfd4cba9ce2557e4a7d0447bc9c3ffda5e29
Reviewed-on: http://gerrit.ent.cloudera.com:8080/717
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Reindent is very expensive for large queries, removing it makes query parsing performant
for no loss of functionality.
Change-Id: I4d2bec6bbabaf949aa0f64193a6e2b3b3725407e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/966
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
This patch reworks our Kerberos authentication layer to support multiple
authentication protocols, particularly PLAIN/SASL to support external
LDAP authentication.
There is now a system-wide AuthManager object, initialised by InitAuth()
which occurs during the usual InitCommonRuntime() setup. The AuthManager
is responsible for supplying AuthProvider objects to ThriftServers and
ThriftClients. The AuthProvider in turn generates Thrift transport
objects which are usually SASL-enabled, and which either employ GSSAPI
or PLAIN mechanisms.
In miscellaneous changes:
* Cyrus SASL now builds both with LDAP and the dummy '--enable-true'
external authentication mechanisms enabled.
* To test PLAIN/SASL authentication, you must now include
$IMPALA_HOME/thirdparty/${IMPALA_CYRUS_SASL_VERSION}/build/lib/sasl2 in
FLAGS_sasl_path.
* The shell now has an option to authenticate using LDAP, and will
prompt for a password at startup before doing so.
* Since the authentication code is almost entirely Thrift-specific, it
has been moved to the rpc lib.
Change-Id: I771de50f05630efdf1606ab9f0f48146ad54595e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/716
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
quotation ending in a semi-colon.
Currently, the shell cannot parse a line which has an open quotation ending in a
semi-colon. It treats the semi-colon as a command terminator. With this change, the shell
will be able to detect an quotation which has not been closed, and parse the line
properly.
Change-Id: If0107952b9ae3144fb590f734a965d12c950c137
Reviewed-on: http://gerrit.ent.cloudera.com:8080/566
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
Rhel5x doesn't have the python ssl module installed by default. Thrift's TSSLSocket
module tries to import the ssl module. Currently, if the import fails, the exception
is not caught, making the shell unusable on Rhel5x. This change attempts to only import
TSSLSocket when the user wants an SSL-secured connection; If it's not found, the shell
exits with a warning.
Change-Id: I5cf30b2d0533b91d207a1aadb9ded7e753d7b01b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/648
Tested-by: jenkins
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
exist or is uneditable.
Currently, the shell warns the user that it's unable to load the command history if the
command history file (~/.impalahistory) is not found. Moreover, if the file is not
editable, then an error is thrown after each the execution of each command. This change
disables readline if the history file is not editable instead of throwing repeated
errors, and removes the warning if the history file does not exist.
Change-Id: Ie4c94629431f2407b0679a7721a6bdf28907437f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/532
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
This change has the following additions:
- If the user's connecting to a kerberized impalad, the Impala shell will check
whether a valid ticket exists by running 'klist -s'. If a valid ticket is not found,
then the shell will exit with an appropriate error message on the commandline.
- If the user's connecting to a kerberized impalad without the '-k' option, the Impala
Shell will issue a 'klist -s' to check if there are valid kerberos tickets in the
credentials cache. If a valid ticket is found, it will retry the connection with
kerberos enabled.
- The Impala shell encodes strings entered on the commandline as unicode. The sasl
module expects ascii strings as arguments. Explcitly encode any string sent to the
sasl module to ascii.
Change-Id: I1799b1e7988a19fa513b683afe1e3b66b68c1ffc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/535
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
This patch allows Impala to start either Beeswax or HS2 on an
SSL-secured port. SSL is a certificate-based authentication scheme,
where the server provides a certificate to the client as part of the
handshake process. The client verifies that certificate, either by
contacting a trusted third-party certificate authority (CA), or by
accepting a 'self-signed' certificate from the server that is also
provided to the client out-of-band; the client simply compares the two
certificate copies.
Once the certificate is verified, the client and server negotiate an
encryption key for the session, using a public key provided by the
server to encrypt that negotiation. Therefore the server has to have
access to a private key in order to decrypt the encryption key.
Both certificate and key are stored in industry standard .PEM
format. Impala uses the same certificate and key for both Beeswax and
HS2, and the files containing the certificate and key are provided via
--ssl_server_certificate and --ssl_private_key. If either are non-blank,
SSL is enabled for Beeswax and HS2.
The Python shell supports SSL as of this patch via new --ssl and
--ca_cert flags.
Finally, this patch also adds support for Impala's ThriftClients to use
SSL, paving the way for having the backend service use encryption on the
wire as well (although such a configuration is not used by this
patch). The client SSL support is only currently used for the new test
case.
This patch does not enable 'mutual' authentication, where clients
provide certificates to the server in order to authenticate
themselves. Impala has other authentication mechanisms for that purpose.
Change-Id: I3942aa0d21b34b7cda748292f04a9523f35ee6d4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/514
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
This adds support for CREATE TABLE AS SELECT to Impala. It supports all functionality a
regular CREATE TABLE statement includes, except it does not allow for for specifying
partition columns. Hive also has this limitation and it wouldn't be too hard to support
in the future.
Change-Id: I4ca3c3b8f1576441b8bb5ed9dc521d7dfa96ab74
Reviewed-on: http://gerrit.ent.cloudera.com:8080/157
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
This change adds Impala support for LOAD DATA statements. This allows the user
to load one or more files into a table or partition from a given HDFS location. The
load operation only moves files, it does not convert data to match the target
table/partition's file format.