impala

mirror of https://github.com/apache/impala.git synced 2025-12-22 03:18:15 -05:00

Author	SHA1	Message	Date
Bharath Vissapragada	72c9370856	IMPALA-8717: impala-shell support for HS2 HTTP endpoint Adds impala-shell support to connect to HiveServer2 HTTP endpoint. Relies on toolchain change at https://gerrit.cloudera.org/#/c/13725/. Use --protocol='hs2-http' to enable this behavior. Example usages: --------------- impala-shell --protocol='hs2-http' (No auth) impala-shell --protocol='hs2-http' --ldap -u..... (PLAIN auth) impala-shell --protocol='hs2-http' --ssl --ca_cert... (TLS) impala-shell --protocol='hs2-http' --ldap --ssl --ca_cert... (LDAP + TLS) Limitations: ----------- - Does not support Kerberos (-k) due to lack of SPNEGO support. Testing: -------- - Parameterized existing shell tests to support this combination. - Added shell test coverage for LDAP auth. Change-Id: I8323950857dfe1c1dfd5377fde79f87bc2ce9534 Reviewed-on: http://gerrit.cloudera.org:8080/13746 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>	2019-07-29 05:43:48 +00:00
Tim Armstrong	f1f3ae9ec2	IMPALA-7290: part 2: Add HS2 support to Impala shell HS2 is added as an option via --protocol=hs2. The user-visible differences in behaviour are minimal. Beeswax is still the default and can be explicitly enabled via --protocol=beeswax but will be deprecated. The default is unchanged because changing the default could break certain workflows, e.g. those that explicitly specify the port with -i or deployments that hit --fe_service_threads for HS2 and somehow rely on impala-shell not contributing to that limit. For most workflows the change is transparent and we should change the default in a major version change. This support requires Impala-specific extensions to the HS2 interface, similar to the existing extensions to Beeswax. Thus the HS2 shell is only forwards-compatible with newer Impala versions. I considered trying to gracefully degrade when the new extensions weren't present, but it didn't seem to be worth the ongoing testing effort. Differences between HS2 and Beeswax are abstracted into ImpalaClient subclasses. Here are the changes required to make it work: * Switch to TBinaryProtocolAccelerated to avoid perf regression. The HS2 protocol requires decoding more primitive values (because it's not a string-per-row), which was slow with the pure python implementation of TBinaryProtocol. * Added bitarray module to efficiently unpack null indicators * Minimise invasiveness of changes by transposing and stringifying the columnar results into rows in impala_client.py. The transposition needs to happen before display anyway. * Add PingImpalaHS2Service() to get back version string and webserver address. * Add CloseImpalaOperation() extension to return DML row counts. This possibly addresses IMPALA-1789, although we need to confirm that this is a sufficient solution. * Add is_closed member to query handles to avoid shell independently tracking whether the query handle was closed or not. * Include query status in HS2 log to match beeswax. 
* HS2 GetLog() command now includes query status error message for consistency with beeswax. * "set"/"set all" uses the client requests options, not the session default. This captures the effective value of TIMEZONE, which was previously missing. This also requires test changes where the tests set non-default values, e.g. for ABORT_ON_ERROR. * "set all" on the server side returns REMOVED query options - the shell needs to know these so it can correctly ignore them. * Clean up self.orig_cmd/self.last_leading comment argument passing to avoid implicit parameter passing through multiple function calls. * Clean up argument handling in shell tests to consistently pass around lists of arguments instead of strings that are subject to shell tokenisation rules. * Consistently close connections in the shell to avoid leaking HS2 sessions. This is enforced by making ImpalaShell a context manager and also eliminating all sys.exit() calls that would bypass the explicit connection closing. Testing: * Shell tests can run with both protocols * Add tests for formatting of all types and NULL values * Added testing for floating point output formatting, which does change as a result of switching to server-side vs client-side formatting. * Verified that newly-added tests were actually going through HS2 by disabling hs2 on the minicluster and running tests. * Add checks to test_verify_metrics.py to ensure that no sessions are left open at the end of tests. 
Performance: Baseline from beeswax shell for large extract is as follows: $ time impala-shell.sh -B -q 'select * from tpch_parquet.orders' > /dev/null real 0m6.708s user 0m5.132s sys 0m0.204s After this change it is somewhat slower, but we generally don't consider bulk extract performance through the shell to be perf-critical: real 0m7.625s user 0m6.436s sys 0m0.256s Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12 Reviewed-on: http://gerrit.cloudera.org:8080/12884 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-20 10:23:28 +00:00
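The IMPALA-7290 message above mentions unpacking null indicators and transposing columnar HS2 results into stringified rows. A minimal sketch of that idea follows, using only the standard library rather than the bitarray module. The function names, the `(values, null_bitmap)` column shape, and the least-significant-bit-first layout are assumptions made for illustration, not the code in impala_client.py.

```python
def unpack_nulls(null_bitmap, num_rows):
    """Return a list of booleans, True where the row's value is NULL.

    Assumption: bit i of the bitmap (least-significant bit first within
    each byte) marks row i as NULL; rows past the end of a short bitmap
    are treated as not NULL.
    """
    bitmap = bytearray(null_bitmap)
    flags = []
    for row in range(num_rows):
        byte_index, bit_index = divmod(row, 8)
        if byte_index < len(bitmap):
            flags.append(bool((bitmap[byte_index] >> bit_index) & 1))
        else:
            flags.append(False)
    return flags


def columns_to_rows(columns, null_token="NULL"):
    """Transpose hypothetical (values, null_bitmap) columns into string rows."""
    if not columns:
        return []
    num_rows = len(columns[0][0])
    stringified = []
    for values, null_bitmap in columns:
        nulls = unpack_nulls(null_bitmap, num_rows)
        stringified.append(
            [null_token if is_null else str(value)
             for value, is_null in zip(values, nulls)])
    # zip(*columns) turns a list of columns into a sequence of rows.
    return [list(row) for row in zip(*stringified)]
```

Stringifying each column before the transpose keeps the per-value work in a simple loop over one column at a time, which matches the message's point that the transposition has to happen before display anyway.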
Yongjun Zhang	7cc9092212	IMPALA-5474: Adding a trivial subquery turns error into warning After adding a subquery to a query that fails with ERROR, it fails with WARNING. The fix here makes it return ERROR. Testing: Added unit tests; Done real cluster testing with reported cases. Change-Id: Ibedb11dd3d50bcdb21d508f7d21691925491946e Reviewed-on: http://gerrit.cloudera.org:8080/12022 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2019-01-04 21:51:48 +00:00
Tim Armstrong	d8792c21c5	IMPALA-1048: show sinks in exec summary The exec summary now includes the total time taken and memory consumed by the data sink at the root of each fragment. Previously the exec summary could hide where time and memory went while executing a query. The high-level changes are: * Generalising logic in the exec summary and runtime profile to handle data sinks, not just plan nodes, including adding richer metadata to runtime profile nodes. * Threading through metadata about the data sinks, like names and estimates, so that it can appear in the exec summary. The major potential downside is that the new timings reported for data stream sender can overlap with the receiver's time and potentially cause confusion. [localhost:21000] default> select count(distinct l_comment) from tpch_parquet.lineitem; summary; Query: select count(distinct l_comment) from tpch_parquet.lineitem Query submitted at: 2018-11-20 16:47:03 (Coordinator: http://tarmstrong-box:25000) Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=f5464383a3bb6878:54b5252b00000000 +---------------------------+ \| count(distinct l_comment) \| +---------------------------+ \| 4580667 \| +---------------------------+ Fetched 1 row(s) in 4.53s +---------------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------+ \| Operator \| #Hosts \| Avg Time \| Max Time \| #Rows \| Est. #Rows \| Peak Mem \| Est. 
Peak Mem \| Detail \| +---------------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------+ \| F02:ROOT \| 1 \| 50.56us \| 50.56us \| \| \| 0 B \| 0 B \| \| \| 06:AGGREGATE \| 1 \| 275.89us \| 275.89us \| 1 \| 1 \| 16.00 KB \| 10.00 MB \| FINALIZE \| \| 05:EXCHANGE \| 1 \| 49.08us \| 49.08us \| 3 \| 1 \| 32.00 KB \| 16.00 KB \| UNPARTITIONED \| \| F01:EXCHANGE SENDER \| 3 \| 100.06us \| 113.49us \| \| \| 16.00 KB \| 0 B \| \| \| 02:AGGREGATE \| 3 \| 19.32ms \| 19.57ms \| 3 \| 1 \| 16.00 KB \| 10.00 MB \| \| \| 04:AGGREGATE \| 3 \| 1.29s \| 1.43s \| 4.58M \| 4.65M \| 98.02 MB \| 62.63 MB \| \| \| 03:EXCHANGE \| 3 \| 241.64ms \| 246.54ms \| 5.01M \| 4.65M \| 9.05 MB \| 10.12 MB \| HASH(l_comment) \| \| F00:EXCHANGE SENDER \| 3 \| 2.43s \| 2.58s \| \| \| 337.53 KB \| 0 B \| \| \| 01:AGGREGATE \| 3 \| 1.26s \| 1.46s \| 5.01M \| 4.65M \| 97.20 MB \| 121.17 MB \| STREAMING \| \| 00:SCAN HDFS \| 3 \| 39.87ms \| 41.36ms \| 6.00M \| 6.00M \| 27.87 MB \| 80.00 MB \| tpch_parquet.lineitem \| +---------------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------+ Testing: Added a basic observability test. Change-Id: I3fdf7bacae8ff597b255da65af453e174ba53544 Reviewed-on: http://gerrit.cloudera.org:8080/11967 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-12-07 07:33:01 +00:00
aphadke	a18809ba1f	IMPALA-7943: Bump the default client timeout set on impala-shell As part of IMPALA-7555, we added a default socket timeout of 5 seconds when connecting to an impalad. Under heavy load with kerberos and SSL enabled, we could hit this default timeout. This change bumps up the timeout to 60 secs to make the impala-shell more robust. Change-Id: Ifc40069e86cbf93634320804efba003fb5551afe Reviewed-on: http://gerrit.cloudera.org:8080/12051 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-12-07 06:57:24 +00:00
aphadke	2fb8ebaef2	IMPALA-7555: Set socket timeout in impala-shell impala-shell does not set any socket timeout while connecting to the impala server. This change sets a timeout on the socket before connecting and unsets it after successfully connecting. The default timeout on this socket is 5 sec. Usage: impala-shell --client_connect_timeout=<value in ms> Testing: 1. Added a test where I create a random listening socket. impala-shell (with ssl enabled) connects to this socket and times out after 2 sec. 2. Created a kerberized impala cluster with ssl enabled and connected to the impalad using an openssl client (block the beeswax server thread to accept new connection) - E.g. - openssl s_client -connect <IP Addr>:21000 Used impala-shell to connect to the same impalad later. impala-shell timed out after the default of 5 sec. I verified it manually. Change-Id: I130fc47f7a83f591918d6842634b4e5787d00813 Reviewed-on: http://gerrit.cloudera.org:8080/11540 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-10-18 01:41:42 +00:00
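The pattern described in the IMPALA-7555 message (set a timeout before connecting, unset it once connected) can be sketched with a plain Python socket. The helper name `connect_with_timeout` is hypothetical, and the real shell goes through Thrift transports rather than a raw socket, so this only illustrates the timeout handling.

```python
import socket


def connect_with_timeout(host, port, timeout_ms):
    """Connect with a temporary timeout, then restore blocking mode.

    Hypothetical helper: timeout_ms mirrors the millisecond unit of
    --client_connect_timeout mentioned in the commit message.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Python socket timeouts are expressed in seconds.
    sock.settimeout(timeout_ms / 1000.0)
    try:
        sock.connect((host, port))
    except Exception:
        # Do not leak the socket if the connect times out or is refused.
        sock.close()
        raise
    # Unset the timeout after a successful connect, as the message describes.
    sock.settimeout(None)
    return sock
```

A caller would treat `socket.timeout` raised by `connect` as the "server did not answer in time" case and report it, instead of hanging indefinitely.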
Nghia Le	72db58acd0	IMPALA-6490: Reconnect shell when remote restarts If the remote impalad died while the shell waited for a command to complete, the shell disconnected. Previously after restarting the remote impalad, we needed to run "connect;" to reconnect, now the shell will automatically reconnect. Testing: Added test_auto_connect_after_impalad_died in test_shell_interactive_reconnect.py Change-Id: Ia13365a9696886f01294e98054cf4e7cd66ab712 Reviewed-on: http://gerrit.cloudera.org:8080/10992 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-07-31 21:50:33 +00:00
Vincent Tran	e7d5a25a45	IMPALA-7130: impala-shell -b / --kerberos_host_fqdn flag overrides value passed in via -i / --impalad After additional testing around IMPALA-2782, it was discovered that impala-shell starts the session displaying the expected hostname (as passed -i flag) on the prompt. This gives the impression that the load balancer was bypassed, however the actual TSSLSocket is still created with the hostname passed in via the -b or --kerberos_host_fqdn flag. This change ensures that the hostname used to create the TSSLSocket will always be the one passed in via the -i flag on impala-shell. This change is required by IMPALA-2782. Testing: Using netcat, we verified that the impala daemon host[:port] value passed into the -i/--impalad option is indeed the one impala-shell tries to connect to in both cases (with and without -b) Change-Id: Ibee05bd0dbe8c6ae108b890f0ae0f6900149773a Reviewed-on: http://gerrit.cloudera.org:8080/10580 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-06-17 06:45:38 +00:00
Zoltan Borok-Nagy	2ee914d5b3	IMPALA-5903: Inconsistent specification of result set and result set metadata Before this commit it was quite random which DDL operations returned a result set and which didn't. With this commit, every DDL operation returns a summary of its execution. They declare their result set schema in Frontend.java, and provide the summary in CatalogOpExecutor.java. Updated the tests according to the new behavior. Change-Id: Ic542fb8e49e850052416ac663ee329ee3974e3b9 Reviewed-on: http://gerrit.cloudera.org:8080/9090 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-04-11 02:21:48 +00:00
Vincent Tran	2c1fbecc9f	IMPALA-2782: Allow impala-shell to connect directly to impalad when configured with load balancer and kerberos. This change adds an impala-shell option -b / --kerberos_host_fqdn. This allows user to optionally specify the load-balancer's host so that impala-shell will accept a direct connection to impala daemons in a kerberized cluster. Change-Id: I4726226a7a3817421b133f74dd4f4cf8c52135f9 Reviewed-on: http://gerrit.cloudera.org:8080/7241 Reviewed-by: <andy@phdata.io> Reviewed-by: Philip Zeyliger <philip@cloudera.com> Tested-by: Impala Public Jenkins	2018-03-21 20:45:48 +00:00
Gabor Kaszab	6d9da17288	IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C Issue 1: When a query is cancelled via CTRL-C while being executed in Impala-shell, an exception is thrown from the Impala backend saying 'Invalid query handle'. This is because one ImpalaClient was making RPCs while another ImpalaClient cancelled the query on the backend. As a result RPC handlers in ImpalaServer try to access a ClientRequestState that had been cleared from the backend. The issue is reliably reproducible both in the wait_to_finish and in the fetch states of the query. As a solution the query cancellation is indicated to ImpalaClient via a bool flag. Once an exception originating from a cancellation reaches Impala shell, this flag is checked to decide whether to suppress the error or not. Issue 2: Every time a query was cancelled a 'use db' command was issued automatically. This happened for historical reasons but is not needed anymore (see Jira for more details). Change-Id: I6cefaf1dae78baae238289816a7cb9d210fb38e2 Reviewed-on: http://gerrit.cloudera.org:8080/8549 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-29 03:44:51 +00:00
Gabor Kaszab	88cb68cfbe	IMPALA-2181: Add query option levels for display Four display levels are introduced for each query option: REGULAR, ADVANCED, DEVELOPMENT and DEPRECATED. When the query options are displayed in Impala shell using SET then only the REGULAR and ADVANCED options are shown. A new command called SET ALL shows all the options grouped by their option levels. When the query options are displayed through the SET SQL statement then the result set would contain an extra column indicating the level of each option. Similarly to Impala shell, here the SET command only displays the REGULAR and ADVANCED options while SET ALL shows them all. If the Impala shell connects to an Impala daemon that predates this change then all the options would be displayed in the REGULAR group. Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1 Reviewed-on: http://gerrit.cloudera.org:8080/8447 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-28 00:31:15 +00:00
Thomas Tauber-Marshall	6757b6235c	IMPALA-5708: Test failure with invalid exec summary For some queries, the exec summary will not be completely filled in even if the query is FINISHED. In particular, the exec_stats field may not be set. This was causing an error in our test code that converts the exec summary to a more usable format. The situation is essentially deterministic for some queries, but it was being hidden by testing code that caught the error and discarded it in most situations, leading to flaky tests. This patch removes the 'try' that was hiding the error and makes the code check for the presence of exec_stats and handle it rather than generating an error. I filed IMPALA-5783 for followup work to be more rigorous about when the exec summary should and shouldn't be fully present. Testing: - Ran the affected tests in a loop and they are no longer flaky. Change-Id: Id52ac62da2b01f9e163e97cbe4590f8db6b663d2 Reviewed-on: http://gerrit.cloudera.org:8080/7627 Tested-by: Impala Public Jenkins Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>	2017-08-14 19:35:12 +00:00
Matthew Jacobs	77a2941a42	IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting Adds support in the shell to report the number of modified rows for all DML operations, as well as the number of rows with errors. Testing: Added shell tests. Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1 Reviewed-on: http://gerrit.cloudera.org:8080/5103 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-17 04:13:25 +00:00
Amos Bird	628685ae74	IMPALA-1654: General partition exprs in DDL operations. This commit handles partition related DDL in a more general way. We can now use compound predicates to specify a list of partitions in statements like ALTER TABLE DROP PARTITION and COMPUTE INCREMENTAL STATS, etc. It will also make sure some statements only accept one partition at a time, such as PARTITION SET LOCATION and LOAD DATA. ALTER TABLE ADD PARTITION remains using the old PartitionKeyValue's logic. The changed partition related DDLs are as follows, Table: p (i int) partitioned by (j int, k string) Partitions: +-------+---+-------+--------+------+--------------+-------------------+ \| j \| k \| #Rows \| #Files \| Size \| Bytes Cached \| Cache Replication \| +-------+---+-------+--------+------+--------------+-------------------+ \| 1 \| a \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| 1 \| b \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| 1 \| c \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| 2 \| d \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| 2 \| e \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| 2 \| f \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| Total \| \| -1 \| 0 \| 0B \| 0B \| \| +-------+---+-------+--------+------+--------------+-------------------+ 1. show files in p partition (j<2, k='a'); 2. alter table p partition (j<2, k in ("b","c") set cached in 'testPool'; // j can appear more than once, 3.1. alter table p partition (j<2, j>0, k<>"d") set uncached; // it is the same as 3.2. alter table p partition (j<2 and j>0, not k="e") set uncached; // we can also do 'or' 3.3. alter table p partition (j<2 or j>0, k like "%") set uncached; // missing 'k' matches all values of k 4. alter table p partition (j<2) set fileformat textfile; 5. alter table p partition (k rlike ".*") set serdeproperties ("k"="v"); 6. alter table p partition (j is not null) set tblproperties ("k"="v"); 7. alter table p drop partition (j<2); 8. 
compute incremental stats p partition(j<2); The remaining old partition related DDLs are as follows, 1. load data inpath '/path/from' into table p partition (j=2, k="d"); 2. alter table p add partition (j=2, k="g"); 3. alter table p partition (j=2, k="g") set location '/path/to'; 4. insert into p partition (j=2, k="g") values (1), (2), (3); General partition expressions or partially specified partition specs allows partition predicates to return empty partition set no matter 'IF EXISTS' is specified. Examples: [localhost.localdomain:21000] > alter table p drop partition (j=2, k="f"); Query: alter table p drop partition (j=2, k="f") +-------------------------+ \| summary \| +-------------------------+ \| Dropped 1 partition(s). \| +-------------------------+ Fetched 1 row(s) in 0.78s [localhost.localdomain:21000] > alter table p drop partition (j=2, k<"f"); Query: alter table p drop partition (j=2, k<"f") +-------------------------+ \| summary \| +-------------------------+ \| Dropped 2 partition(s). \| +-------------------------+ Fetched 1 row(s) in 0.41s [localhost.localdomain:21000] > alter table p drop partition (k="a"); Query: alter table p drop partition (k="a") +-------------------------+ \| summary \| +-------------------------+ \| Dropped 1 partition(s). 
\| +-------------------------+ Fetched 1 row(s) in 0.25s [localhost.localdomain:21000] > show partitions p; Query: show partitions p +-------+---+-------+--------+------+--------------+-------------------+ \| j \| k \| #Rows \| #Files \| Size \| Bytes Cached \| Cache Replication \| +-------+---+-------+--------+------+--------------+-------------------+ \| 1 \| b \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| 1 \| c \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| Total \| \| -1 \| 0 \| 0B \| 0B \| \| +-------+---+-------+--------+------+--------------+-------------------+ Fetched 3 row(s) in 0.01s Change-Id: I2c9162fcf9d227b8daf4c2e761d57bab4e26408f Reviewed-on: http://gerrit.cloudera.org:8080/3942 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-15 03:27:36 +00:00
Matthew Jacobs	99ed6dc67a	IMPALA-4134,IMPALA-3704: Kudu INSERT improvements 1.) IMPALA-4134: Use Kudu AUTO FLUSH Improves performance of writes to Kudu up to 4.2x in bulk data loading tests (load 200 million rows from lineitem). 2.) IMPALA-3704: Improve errors on PK conflicts The Kudu client reports an error for every PK conflict, and all errors were being returned in the error status. As a result, inserts/updates/deletes could return errors with thousands errors reported. This changes the error handling to log all reported errors as warnings and return only the first error in the query error status. 3.) Improve the DataSink reporting of the insert stats. The per-partition stats returned by the data sink weren't useful for Kudu sinks. Firstly, the number of appended rows was not being displayed in the profile. Secondly, the 'stats' field isn't populated for Kudu tables and thus was confusing in the profile, so it is no longer printed if it is not set in the thrift struct. Testing: Ran local tests, including new tests to verify the query profile insert stats. Manual cluster testing was conducted of the AUTO FLUSH functionality, and that testing informed the default mutation buffer value of 100MB which was found to provide good results. Change-Id: I5542b9a061b01c543a139e8722560b1365f06595 Reviewed-on: http://gerrit.cloudera.org:8080/4728 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-10-25 02:06:10 +00:00
Thomas Tauber-Marshall	7fad3e5dc3	IMPALA-3002/IMPALA-1473: Cardinality observability cleanup IMPALA-3002: The shell prints an incorrect value for '#Rows' in the exec summary for broadcast nodes due to incorrect logic around whether to use max or agg stats. This patch makes the behavior consistent with the way the be treats exec summaries in summary-util.cc. This incorrect logic was also duplicated in the impala_beeswax test framework. IMPALA-1473: When there is a merging exchange with a limit, we may copy rows into the output batch beyond the limit. In this case, we currently update the output batch's size to reflect the limit, but we also need to update ExecNode::num_rows_returned_ or the exec summary may show that the exchange node returned more rows than it really did. Additionally, PlanFragmentExecutor::GetNext does not update rows_produced_counter_ in some cases, leading the runtime profile to display an incorrect value for 'RowsProduced'. Change-Id: I386719370386c9cff09b8b35d15dc712dc6480aa Reviewed-on: http://gerrit.cloudera.org:8080/4679 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-10-15 01:25:51 +00:00
Sailesh Mukil	c23bf38a20	IMPALA-3893, IMPALA-3901: impala-shell prints incorrect coordinator address, overly verbose The webserver address was always configured as 0.0.0.0 (meaning that the webserver could be reached on any IP for that machine) unless otherwise specified. This is not a correct value to display to the user. This patch returns the hostname of the node, when requested, if the webserver host address is 0.0.0.0. This patch also does not print the coordinator link for very simple queries, as it's not necessary and is unnecessarily verbose. This patch also does away with pinging the impalad an extra time per query for finding the host time and webserver address. It instead remembers the webserver address at connect time and displays client local time for every query instead. Change-Id: I9d167b66f2dd8629e40a7094d21ea7ce6b43d23b Reviewed-on: http://gerrit.cloudera.org:8080/3994 Tested-by: Internal Jenkins Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Sailesh Mukil <sailesh@cloudera.com>	2016-08-23 18:25:06 +00:00
Dan Hecht	ffa7829b70	IMPALA-3918: Remove Cloudera copyrights and add ASF license header For files that have a Cloudera copyright (and no other copyright notice), make changes to follow the ASF source file header policy here: http://www.apache.org/legal/src-headers.html#headers Specifically: 1) Remove the Cloudera copyright. 2) Modify NOTICE.txt according to http://www.apache.org/legal/src-headers.html#notice to follow that format and add a line for Cloudera. 3) Replace or add the existing ASF license text with the one given on the website. Much of this change was automatically generated via: git grep -li 'Copyright.Cloudera' > modified_files.txt cat modified_files.txt \| xargs perl -n -i -e 'print unless m#Copyright.Cloudera#i;' cat modified_files.txt \| xargs fix_apache_license.py [1] Some manual fixups were performed following those steps, especially when license text was completely missing from the file. [1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor modification to ORIG_LICENSE to match Impala's license text. Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86 Reviewed-on: http://gerrit.cloudera.org:8080/3779 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-08-09 08:19:41 +00:00
Sailesh Mukil	45ff0f9e67	IMPALA-3159: impala-shell does not accept wildcard or SAN certificates The impala-shell could not accept wildcard or SAN certificates previously as the thrift library it depended on did not support them. This patch subclasses TSSLSocket and adds the logic to take care of the above mentioned cases by introducing the new TSSLSocketWithWildcardSAN class. The certificate matching logic is based on the python-ssl source code. Added custom cluster tests to test both wildcard matching and SAN matching. Added be/src/testutil/certificates-info.txt which contains all the information about the certificates which are added for the tests. This has been tested with Python2.4 and Python2.6. Change-Id: I75e37012eeeb0bcf87a5edf875f0ff915daf8b89 Reviewed-on: http://gerrit.cloudera.org:8080/3765 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2016-07-26 02:44:25 +00:00
Sailesh Mukil	900f148078	IMPALA-1671: Print time and link to coordinator web UI once query is submitted in shell To help supportability and debugging, it's helpful to have the impala shell print out the coordinator time and the link to the coordinator web UI once the query is submitted. This is done by calling the PingImpalaService() routine everytime a query is submitted, which returns the coordinator's hostname, webserver port and the coordinator epoch time at that moment which the shell then formats and prints out. Added tests to verify these new messages. Change-Id: I704eb64546e27c367830120241311fea6091266b Reviewed-on: http://gerrit.cloudera.org:8080/3507 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2016-07-14 19:04:45 +00:00
Henry Robinson	c1fd862238	IMPALA-1975: Automatically reconnect failed connections from the shell This patch calls PingImpalaService() at the beginning of each command loop (if the shell is currently connected). If the call fails, the shell will try and reconnect. Reconnecting is best-effort - if it fails, the command is processed anyway so as not to interfere with any commands that might still give useful output in a disconnected state. Change-Id: I37cb2f4fc235fedff16d48ad5125b9a30bd7dfd0 Reviewed-on: http://gerrit.cloudera.org:8080/547 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Internal Jenkins	2015-08-05 01:00:54 +00:00
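The best-effort reconnect described in the IMPALA-1975 message can be sketched as a hook that runs before each command. `FakeClient` and `before_command` are hypothetical stand-ins invented for this illustration; the real shell calls PingImpalaService() over Thrift, which is not modelled here.

```python
class FakeClient(object):
    """Hypothetical stand-in for the shell's client, for illustration only."""

    def __init__(self):
        self.connected = True    # the shell believes it is connected
        self.server_up = False   # but the remote end has gone away
        self.reconnects = 0

    def ping(self):
        if not self.server_up:
            raise IOError("ping failed")

    def connect(self):
        self.reconnects += 1
        self.server_up = True
        self.connected = True


def before_command(client):
    """Ping at the start of the command loop; reconnect on failure."""
    if not client.connected:
        # Only ping when the shell is currently connected.
        return
    try:
        client.ping()
    except IOError:
        try:
            client.connect()
        except IOError:
            # Best effort: the command is still processed afterwards.
            pass
```

Swallowing the reconnect failure is deliberate in this sketch: as the message notes, some commands can still give useful output while disconnected, so the hook never blocks the command itself.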
Henry Robinson	621205ebbc	IMPALA-2143: Avoid sending auth credentials over insecure connections This patch changes the behaviour of the Impala shell to refuse to attempt an LDAP-authenticated connection to Impala unless SSL/TLS is configured. A new flag --auth_creds_in_clear_ok is added to suppress this behaviour. This is similar to Impala's --ldap_passwords_in_clear_ok flag. The shell will also now print a warning if an insecure configuration is used. Change-Id: Ide25d8dd881a61b9f08900112466c430da64a038 Reviewed-on: http://gerrit.cloudera.org:8080/546 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-07-30 07:15:29 +00:00
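The rule in the IMPALA-2143 message reduces to a small decision: refuse LDAP over a non-TLS connection unless the override flag is set, and warn when the override is used. The function name, its parameters, and the message strings below are hypothetical; only the policy comes from the commit message.

```python
def check_ldap_allowed(use_ldap, use_ssl, creds_in_clear_ok):
    """Return (allowed, warning) for a requested connection setup.

    Hypothetical helper: creds_in_clear_ok plays the role of the
    --auth_creds_in_clear_ok flag named in the commit message.
    """
    if not use_ldap or use_ssl:
        # No credentials are sent, or they are protected by SSL/TLS.
        return True, None
    if creds_in_clear_ok:
        return True, "Warning: LDAP credentials will be sent unencrypted."
    return False, "LDAP credentials may not be sent over insecure connections."
```

Returning the warning text alongside the decision lets the caller print it to the user in the allowed-but-insecure case without duplicating the condition.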
Martin Grund	ed18dd4a8b	IMPALA-80: Dynamic progress reporting for the shell This patch adds a way to allow for dynamic progress reporting in the shell. There are two new command line flags for the shell --live_progress - will print the completed vs total # of scan ranges --live_summary - prints an updated exec summary In addition to the command line flags, these options can be set from within the shell using: set LIVE_SUMMARY=True set LIVE_PROGRESS=True The new options will be listed under shell options. Both reports will be updated at most every second, for longer running queries it will be adjusted to the time between two RPC calls to get the query status. To provide this information in the ExecSummary, the Thrift structure for the ExecSummary was extended to contain a progress indicator. The output is printed to stderr and only available in interactive mode. An example video is available here: https://asciinema.org/a/5wi7ypckx4ol4ha1hlg3e3q1k Change-Id: I70b2ab5fa74dc2ba5bc3b338ef13ddc6ccf367d2 Reviewed-on: http://gerrit.cloudera.org:8080/508 Tested-by: Internal Jenkins Reviewed-by: Martin Grund <mgrund@cloudera.com>	2015-07-17 17:59:29 +00:00
Martin Grund	b582cdc22b	IMPALA-1598: Adding Error Codes to Log Messages This patch introduces the concept of error codes for errors that are recorded in Impala and are going to be presented to the client. These error codes are used to aggregate and group incoming error / warning messages to reduce the spill on the shell and increase the usefulness of the messages. By splitting the message string from the implementation, it becomes possible to edit the string independently of the code and pave the way for internationalization. Error messages are defined as a combination of an enum value and a string. Both are defined in the Error.thrift file that is automatically generated using the script in common/thrift/generate_error_codes.py. The goal of the script is to have a central understandable repository of error messages. Adding new messages to this file will require rebuilding the thrift part. The proxy class ErrorMessage is responsible to represent an error and capture the parameters that are used to format the error message string. When error messages are recorded they are recorded based on the following algorithm: - If an error message is of type GENERAL, do not aggregate this message and simply add it to the total number of messages - If an error messages is of specific type, record the first error message as a sample and for all other occurrences increment the count. - The coordinator will merge all error messages except the ones of type GENERAL and display a count. For example, in the case of the parquet file spanning multiple blocks the output will look like: Parquet files should not be split into multiple hdfs-blocks. file=hdfs://localhost:20500/fid.parq (1 of 321 similar) All messages are always logged to VLOG. In the coordinator error messages are merged across all backends to retain readability in the case of large clusters. 
The current version of this patch adds these new error codes to some of the most important error messages as a reference implementation. Change-Id: I1f1811631836d2dd6048035ad33f7194fb71d6b8 Reviewed-on: http://gerrit.cloudera.org:8080/39 Reviewed-by: Martin Grund <mgrund@cloudera.com> Tested-by: Internal Jenkins	2015-03-01 03:37:32 +00:00
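The recording and merging rules listed in the IMPALA-1598 message can be sketched as follows. `ErrorLog`, its method names, and the `EXAMPLE_CODE` error code in the usage note are hypothetical; the actual implementation lives in Impala's backend and the generated Thrift definitions, and this sketch only mirrors the rules as the message states them.

```python
GENERAL = "GENERAL"


class ErrorLog(object):
    """Hypothetical per-backend error log keyed by error code."""

    def __init__(self):
        self.general = []   # GENERAL messages are kept individually
        self.by_code = {}   # code -> [first message seen as sample, count]

    def record(self, code, message):
        if code == GENERAL:
            # GENERAL errors are never aggregated.
            self.general.append(message)
        elif code in self.by_code:
            # Later occurrences of a specific code only bump the count.
            self.by_code[code][1] += 1
        else:
            self.by_code[code] = [message, 1]

    def merge(self, other):
        """Fold another backend's log into this one (coordinator side)."""
        self.general.extend(other.general)
        for code, (sample, count) in other.by_code.items():
            if code in self.by_code:
                self.by_code[code][1] += count
            else:
                self.by_code[code] = [sample, count]

    def lines(self):
        out = list(self.general)
        for code in sorted(self.by_code):
            sample, count = self.by_code[code]
            if count > 1:
                out.append("%s (1 of %d similar)" % (sample, count))
            else:
                out.append(sample)
        return out
```

For example, recording the same hypothetical `EXAMPLE_CODE` twice on one backend and once on another, then merging, yields a single line ending in "(1 of 3 similar)", the same shape as the Parquet example quoted in the message.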
Nong Li	8f09a1b4ad	Fix exec summary printing to match the plan exactly. Change-Id: I92f234a6f7adf4061d82ac4a32d220af17fe152d Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4440 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-09-23 07:26:56 -07:00
Abdullah Yousufi	a80506ff3c	Refactored impala-shell This is a reorganization of the existing impala-shell. The basic idea was to split up the shell into two components: one part solely responsible for the CLI functionality, and another to represent the impala client/connection that would interact with the Beeswax api and execute queries, fetch results, etc. One major change was to redo how the existing shell handled cancellation, which was to create a thread for each rpc, so that Ctrl+C would not interrupt the system calls and break the socket connection. In the new approach, a new client instance is created to close the query and if the socket connection is broken, the client reconnects. Cancellation currently works. Change-Id: I0f371f68552c065b2317f967c6cf7483b44be3df Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3316 Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4008	2014-08-22 20:13:04 -07:00


77 Commits