The strict hs2 protocol mode is broken when fetching large results.
The FetchResults.hasMoreRows field is always returned as false. When
there are no more results, Hive returns an empty batch with no rows.
HIVE-26108 has been filed to support the hasMoreRows field.
Added a framework test that retrieves 1M rows from tpcds. The default
number of rows returned from Hive is 10K so this should be more than
enough to ensure that multiple fetches are done.
Change-Id: Ife436d91e7fe0c30bf020024e20a5d8ad89faa24
Reviewed-on: http://gerrit.cloudera.org:8080/18370
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
client
In 'hs2-http' mode, the socket timeout is None, which could cause
hang like symptoms in case of a problematic remote server.
Added support for configurable socket timeout using the new impala-shell
config option '--http_socket_timeout_s'. If a reasonable timeout is
set, impala-shell client can retry in case of connection issues, when
possible. The default value of '--http_socket_timeout_s' is set to None,
to prevent behavior changes for existing clients.
More details on socket timeout here:
https://docs.python.org/3/library/socket.html#socket-timeouts
Testing:
- Added tests for various timeout values in test_shell_commandline.py
- Ran e2e shell tests.
Change-Id: I29fa4ff96cdcf154c3aac7e43340af60d7d61e94
Reviewed-on: http://gerrit.cloudera.org:8080/18336
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
The get_summary() thrift call is not supported in strict_hs2 mode
on impala-shell. The live_progress and live_summary options are
disabled when the strict_hs2_protocol flag is set.
Change-Id: I6aee838a80b4659a13a0a0cb9eabffa2c8767c8f
Reviewed-on: http://gerrit.cloudera.org:8080/18177
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
The insert command was broken for impala-shell in the strict_hs2
mode. The return parameter for close_dml should return two parameters.
The parameters returned by close_dml are rows returned and error
rows. These are not supported by strict hs2 mode since the close
does not return the TDmlResult structure. So the message to
the end user also had to be changed.
Change-Id: Ibe837c99e54d68d1e27b97f0025e17faf0a2cb9f
Reviewed-on: http://gerrit.cloudera.org:8080/18176
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Impala-shell already uses HS2 protocol to connect to Impalad.
This commit allows impala-shell to connect to any server (for
example, Hive) using the hs2 protocol. This will be done via
the "--strict_hs2_protocol" option.
When the "--strict_hs2_protocol" option is turned on, only features
supported by hs2 will work. For instance, "runtime-profile" is an
impalad specific feature and will be disabled.
The "--strict_hs2_protocol" will only work on servers that abide
by the strict definition of what is supported by HS2. So one will
be able to connect to Hive in this mode, but connections to Impala
will not work. Any feature supported by Hive (e.g. kerberos
authentication) should work as well.
Note: While authentication should work, the test framework is not
set up to create an HS2 server that does authentication at this point
so this feature should be used with caution.
Change-Id: I674a45640a4a7b3c9a577830dbc7b16a89865a9e
Reviewed-on: http://gerrit.cloudera.org:8080/17660
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Previous patch added support for retaining cookies in impala-shell.
Some issues were found when porting the patch to impyla, like expiry
time of cookie could be none, coding not as Pythonic coding.
This patch fixs those issues for impala-shell.
Testing:
- Passed core test.
Change-Id: I65432b952929c1c96a081bb87fd4a096624d711b
Reviewed-on: http://gerrit.cloudera.org:8080/17796
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The issue was that after the impala-shell is started in a seperate
process and an error is encountered then the process lingers on
and a long running query can hold on to resources and potentially
affect other tests running on the impala cluster.
This patch just makes sure that the impala-shell process is killed
regardless of any errors encountered.
Change-Id: I9f6d22d639921051cde5675fae1845bedb61c8cc
Reviewed-on: http://gerrit.cloudera.org:8080/17768
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-10234 added support for cookie authentication for LDAP to
impala-shell. But it does not accept user input cookie name via
startup flags, and it retains only one cookie.
In some scenarios, we could use proxy to manage the sessions with
additional HTTP cookies added by proxy.
This patch made cookie support more generic for impala-shell.
It lets the user specify cookie names via a startup flag
"--http_cookie_names" and could retain more than one cookies.
Testing:
- Manualy tested the multiple cookies in HTTP headers with a
customized Impala server which could send and receive multiple
cookies.
- Passed core test, including new test cases.
Change-Id: I193422d5ec891886a522d82ecb0e9d974132ff2a
Reviewed-on: http://gerrit.cloudera.org:8080/17667
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this patch Impala mainly used Thrift 0.9.3, but it was
possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0
Thrift lib was already included in the toolchain.
Most of the changes are related to replacing boost:: with std::
shared_ptr-s in cpp code (this is a continuation of patch by Sahil).
The Thrift upgrade also needs an Impyla release with Thrift 0.11.0, as
Impala's test framework relies on Impyla. A thrift_sasl release is also
needed, because it currently pins Thrift version to 0.9.3 for Python 2.
The current patch uses alpha releases from Impyla and thrift_sasl that
use thrift 0.11.0.
Notable side effects:
- old logic to compile thrift for impala-shell with 0.11.0 was removed
- impala_shell's utf8 handling had to be updated as the new 0.11.0
compilation happens with no_utf8strings. This also made things a
bit faster, e.g the following is ~0.22s instead of ~0.25
shell/impala_shell.py \
-B -q "select * from functional_parquet.alltypes;" > /dev/null
- THRIFT-3921 changed the stream operators to print an enum's name
instead of its number, leading to slightly different messages
in some cases.
- "templates" was added to the thift generator's parameters to avoid
a compilation issue (related to IMPALA-10600). I didn't notice any
change in compilation time. This option generated .tcc files with
templetized readers/writers for Thrift types. Currently we don't
use these, but they could potentially speed up (de)serialization.
Testing:
- ran Impyla's test suite with Python 2 and 3
- ran core tests
Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Reviewed-on: http://gerrit.cloudera.org:8080/17170
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for GCS(Google Cloud Storage). Using the
gcs-connector, the implementation is similar to other remote
FileSystems.
New flags for GCS:
- num_gcs_io_threads: Number of GCS I/O threads. Defaults to be 16.
Follow-up:
- Support for spilling to GCS will be addressed in IMPALA-10561.
- Support for caching GCS file handles will be addressed in
IMPALA-10568.
- test_concurrent_inserts and test_failing_inserts in
test_acid_stress.py are skipped due to slow file listing on
GCS (IMPALA-10562).
- Some tests are skipped due to issues introduced by /etc/hosts setting
on GCE instances (IMPALA-10563).
Tests:
- Compile and create hdfs test data on a GCE instance. Upload test data
to a GCS bucket. Modify all locations in HMS DB to point to the GCS
bucket. Remove some hdfs caching params. Run CORE tests.
- Compile and load snapshot data to a GCS bucket. Run CORE tests.
Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Reviewed-on: http://gerrit.cloudera.org:8080/17121
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In Python2, print() converts all non-keyword arguments to strings like
str() does and writes them to the stream. str() on QueryStateException
returns its value(i.e. error message) which could be in unicode type.
Python2 will implicitly encode it to str type using the default
encoding, 'ascii'. This could result in UnicodeEncodeError when there
are non-ascii characters in the error message.
This patch explicitly encodes the error message using 'utf-8' encoding
if it's in unicode type and the shell is run in Python2.
Tests:
- Add test in test_shell_interactive.py
Change-Id: Ie10f5b03ecc5877053c2fbada1afaf256b423a71
Reviewed-on: http://gerrit.cloudera.org:8080/17099
Reviewed-by: Tamas Mate <tmate@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Result spooling has been relatively stable since it was introduced, and
it has several benefits described in IMPALA-8656. This patch enable
result spooling (SPOOL_QUERY_RESULTS) query options by default.
Furthermore, some tests need to be adjusted to account for result
spooling by default. The following are the adjustment categories and
list of tests that fall under such category.
Change in assertions:
PlannerTest#testAcidTableScans
PlannerTest#testBloomFilterAssignment
PlannerTest#testConstantFolding
PlannerTest#testFkPkJoinDetection
PlannerTest#testFkPkJoinDetectionWithHDFSNumRowsEstDisabled
PlannerTest#testKuduSelectivity
PlannerTest#testMaxRowSize
PlannerTest#testMinMaxRuntimeFilters
PlannerTest#testMinMaxRuntimeFiltersWithHDFSNumRowsEstDisabled
PlannerTest#testMtDopValidation
PlannerTest#testParquetFiltering
PlannerTest#testParquetFilteringDisabled
PlannerTest#testPartitionPruning
PlannerTest#testPreaggBytesLimit
PlannerTest#testResourceRequirements
PlannerTest#testRuntimeFilterQueryOptions
PlannerTest#testSortExprMaterialization
PlannerTest#testSpillableBufferSizing
PlannerTest#testTableSample
PlannerTest#testTpch
PlannerTest#testKuduTpch
PlannerTest#testTpchNested
PlannerTest#testUnion
TpcdsPlannerTest
custom_cluster/test_admission_controller.py::TestAdmissionController::test_dedicated_coordinator_planner_estimates
custom_cluster/test_admission_controller.py::TestAdmissionController::test_memory_rejection
custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_mem_limit_configs
metadata/test_explain.py::TestExplain::test_explain_level2
metadata/test_explain.py::TestExplain::test_explain_level3
metadata/test_stats_extrapolation.py::TestStatsExtrapolation::test_stats_extrapolation
Increase BUFFER_POOL_LIMIT:
query_test/test_queries.py::TestQueries::test_analytic_fns
query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filter_reservation
query_test/test_sort.py::TestQueryFullSort::test_multiple_mem_limits_full_output
query_test/test_spilling.py::TestSpillingBroadcastJoins::test_spilling_broadcast_joins
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_aggs
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive
query_test/test_udfs.py::TestUdfExecution::test_mem_limits
Increase MEM_LIMIT:
query_test/test_mem_usage_scaling.py::TestExchangeMemUsage::test_exchange_mem_usage_scaling
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling
Increase MAX_ROW_SIZE:
custom_cluster/test_parquet_max_page_header.py::TestParquetMaxPageHeader::test_large_page_header_config
query_test/test_insert.py::TestInsertQueries::test_insert_large_string
query_test/test_query_mem_limit.py::TestQueryMemLimit::test_mem_limit
query_test/test_scanners.py::TestTextSplitDelimiters::test_text_split_across_buffers_delimiter
query_test/test_scanners.py::TestWideRow::test_wide_row
Disable result spooling to maintain assertion:
custom_cluster/test_admission_controller.py::TestAdmissionController::test_set_request_pool
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_host_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_pool_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_queue_reasons_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_config_change_while_queued
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_fetched_rows
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_finished_query
custom_cluster/test_scratch_disk.py::TestScratchDir::test_no_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_existing_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_writable_dirs
query_test/test_insert.py::TestInsertQueries::test_insert_large_string (the last query only)
query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_kudu_scan_mem_usage
query_test/test_queries.py::TestQueriesParquetTables::test_very_large_strings
query_test/test_query_mem_limit.py::TestCodegenMemLimit::test_codegen_mem_limit
shell/test_shell_client.py::TestShellClient::test_fetch_size
Testing:
- Pass exhaustive tests.
Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4
Reviewed-on: http://gerrit.cloudera.org:8080/16755
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala shell outputs a batch of rows using OutputStream. Inside
OutputStream, output to a file is handled slightly differently from
output that is written to stdout. When writing to stdout we use print()
(which appends a newline) while when writing to a file we use write()
(which adds nothing). This difference was introduced in IMPALA-3343 so
this bug may be a regression introduced then. To ensure that output is
the same in either case we need to add a newline after writing each
batch of rows to a file.
TESTING:
Added a new test for this case.
Change-Id: I078a06c54e0834bc1f898626afbfff4ded579fa9
Reviewed-on: http://gerrit.cloudera.org:8080/16966
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
To make impala-shell compatible for Python3, we explicitly distinguish
bytes and text in Python2 by decoding the bytes for all inputs.
Regression 1: multiple queries in one line with unicode chars will break
In precmd() of impala-shell, if there are multiple queries present in
one input line, we split it into individual queries (by
sqlparse.split()) and append them back to the 'cmdqueue'. They will be
passed to precmd() again. In our Python2 implementation, precmd()
expects them to be str type, and will decode them into unicode type.
However, the output type of sqlparse.split() is unicode which doesn't
have a decode() method. Calling decode() on a unicode var will let
Python2 implicitly encode it to str. This may cause UnicodeEncodeError
since implicitly encoding use 'ascii'.
Regression 2: multi-line query with unicode chars will break when
command history is enabled
In _check_for_command_completion(), when calling
readline.replace_history_item in Python2. We encode the completed_cmd
into bytes. However, we shouldn't replace it since the return type is
expected to be unicode.
Tests:
- Add tests for these two regressions in Python2.
Change-Id: Icc4a8d31311a5c59e5fc0e65fe09f770df41bea4
Reviewed-on: http://gerrit.cloudera.org:8080/16960
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
A username can be determined for a session via two mechanisms:
* In a secure env, the user is authenticated by LDAP or Kerberos
* In an unsecure env, the client specifies the user name, either
as a parameter to the OpenSession API (HS2) or as a parameter
to the first query run (beeswax)
This patch affects what happens if neither of the above mechanisms
is used. Previously we would end up with the username being an
empty string, but this makes Ranger unhappy. Hive uses the name
"anonymous" in this situation, so we change Impala's behaviour too.
This is configurable by -anonymous_user_name. -anonymous_user_name=
reverts to the old behaviour.
Test
* Add an end-to-end test that exercises this via impala-shell for
HS2, HS2-HTTP and beeswax protocols.
* Tweak a couple of existing tests that depended on the previous
behavior.
Change-Id: I6db491231fa22484aed476062b8fe4c8f69130b0
Reviewed-on: http://gerrit.cloudera.org:8080/16902
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change adds additional synchronisation to fix the flaky test. The
test failures were happening because the test did not wait for the
output of the SIGINT (^C) to arrive. When this was delayed it cluttered
the impala-shell output and other expect calls could fail.
Testing:
- executed the test locally 250 times without failures, without this
fix there were about 3 failures in a 100 execution
Change-Id: Ief384ce59f3ce24f1ab2dfb5fbaf7c9a39b434e0
Reviewed-on: http://gerrit.cloudera.org:8080/16847
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In some branches that impala-shell still uses older version of thrift,
e.g. thrift-0.9.3-p8, test_utf8_decoding_error_handling will fail since
the internal string representation of thrift versions lower than 0.10.0
is still bytes. Strings won't be decoded to unicodes so there won't be
any decoding errors. The test expects some bytes that can't be decoded
correctly be replaced with U+FFFD so fails.
This patch improve the test by also expecting results from older thrift
versions. So it can be cherry-picked to older branches.
Tests:
- Verify the test in master branch and a downstream branch that still
uses thrift-0.9.3-p8 in impala-shell.
Change-Id: Ieb0baa9b3a1480673af77f7cc35c05eacf4b449f
Reviewed-on: http://gerrit.cloudera.org:8080/16767
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This test for IMPALA-897 is testing that queries run by Impala Shell
from a script file are closed correctly. This is tested by an assertion
that there is one in-flight query during execution of a script
containing several queries. The test then closes the shell and checks
that there are no in-flight queries. This is the assertion which failed.
Change this assertion to instead wait for the number of in-flight
queries to be zero. This avoids whatever race was causing the flakiness.
Change-Id: Ib0485097c34282523ed0df6faa143fee6f74676d
Reviewed-on: http://gerrit.cloudera.org:8080/16743
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-8584 added support for cookie authentication to Impala.
This change adds cookie authentication support to impala-shell
as well when using 'hs2-http' protocol.
Testing:
- Unit tests were added to test cookie handling methods.
- Tested e2e manually with nginx HTTP proxy.
TODO:
- Test with Knox HTTP proxy as well.
Change-Id: Icb0bc6e0f58f236866ca9913a2e63d97d5148f51
Reviewed-on: http://gerrit.cloudera.org:8080/16660
Reviewed-by: Attila Jeges <attilaj@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, impala-shell depends on thrift-0.11.0-p2, while impala
servers depend on thrift-0.9.3-p8. After 0.10.0, thrift changes its
internal strings representation from bytes to unicode (THRIFT-3503) to
support Python3. THRIFT-2087 and THRIFT-5303 are two patches for
specifying an error handling method in decoding utf-8 strings in thrift.
Without them, impala-shell may get an unexpected UnicodeDecodeError when
decoding thrift objects from impala servers. This patch bumps
impala-shell's thrift version to 0.11.0-p4 to include these two patches.
Tests:
- This is a regression after we bump impala-shell's thrift version to
0.11. Added a test to avoid the regression in the future.
Change-Id: I0f9898639b5648658efc2d3c5c0ee4721fb85776
Reviewed-on: http://gerrit.cloudera.org:8080/16700
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The legacy Thrift based Impala internal service has been removed so
the backend port 22000 can be freed up.
This patch set flag be_port as a REMOVED_FLAG and all infrastructures
around it are cleaned up. StatestoreSubscriber::subscriber_id is set
as hostname + krpc_port.
Testing:
- Passed the exhaustive test.
Change-Id: Ic6909a8da449b4d25ee98037b3eb459af4850dc6
Reviewed-on: http://gerrit.cloudera.org:8080/16533
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Recent testing showed that the pytests are not
respecting the log level and format set in
conftest.py's configure_logging(). It is using
the default log level of WARNING and the
default formatter.
The issue is that logging.basicConfig() is only
effective the first time it is called. The code
in lib/python/impala_py_lib/helpers.py does a
call to logging.basicConfig() at the global
level, and conftest.py imports that file. This
renders the call in configure_logging()
ineffective.
To avoid this type of confusion, logging.basicConfig()
should only be called from the main() functions for
libraries. This removes the call in lib/python/impala_py_lib
(as it is not needed for a library without a main function).
It also fixes up various other locations to move the
logging.basicConfig() call to the main() function.
Testing:
- Ran the end to end tests and custom cluster tests
- Confirmed the logging format
- Added an assert in configure_logging() to test that
the INFO log level is applied to the root logger.
Change-Id: I5d91b7f910b3606c50bcba4579179a0bc8c20588
Reviewed-on: http://gerrit.cloudera.org:8080/16679
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When the --quiet flag is used with impala-shell, the intention is that
if the query is successful then only the query results should be
printed.
This patch fixes two cases where --quiet was not being respected:
- When using the HTTP transport and --client_connect_timeout_ms is
set, a warning is printed that the timeout is not applied.
- When running in non-interactive mode, a warning is printed that
--live_progress is automatically disabled. This warning is now also
only printed if --live_progress is actually set.
Testing:
- Added a test that runs a simple query with --quiet and confirms the
output is as expected.
Change-Id: I1e94c9445ffba159725bacd6f6bc36f7c91b88fe
Reviewed-on: http://gerrit.cloudera.org:8080/16673
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When a query contains WITH clause impala-shell tries to identify whether
it is a DML query or not, so that later it can provide appropriate
result messages. Earlier shlex was used to create tokens and assess the
query type based on that. However shlex can misinterpret some query
strings where whitespace charachters are mixed with quotes, because it
splits the string based on whitespace charachters. In some scenarios
'ValueError: No closing quotation' error can occur.
This change moves the tokenization from shlex to sqlparse.
Testing:
- Added unit test to cover queries that contain mixed whitespaces
and strings
Change-Id: I442d3bc65b90a55c73c847948d5179a8586d71ad
Reviewed-on: http://gerrit.cloudera.org:8080/16389
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, the impala-shell 'profile' command only returns the profile
for the most recent profile attempt. There is no way to get the original
query profile (the profile of the first query attempt that failed) from
the impala-shell.
This patch modifies TGetRuntimeProfileReq and TGetRuntimeProfileResp to
add support for returning both the original and retried profiles for a
retried query. When a query is retried, TGetRuntimeProfileResp currently
contains the profile for the most recent query attempt.
TGetRuntimeProfileReq has a new field called 'include_query_attempts'
and when it is set to true, the TGetRuntimeProfileResp will include all
failed profiles in a new field called failed_profiles /
failed_thrift_profiles.
impala-shell has been modified so the 'profile' command has a new set of
options. The syntax is now:
PROFILE [ALL | LATEST | ORIGINAL]
If 'ALL' is specified, both the latest and original profiles are
printed. If 'LATEST' is specified, only the latest profile is printed.
If 'ORIGINAL' is printed, only the original profile is printed. The
default behavior is equivalent to specifying 'LATEST' (which is the
current behavior before this patch as well).
Support for this has only been added to HS2 given that Beeswax is being
deprecated soon. The new 'profile' options have no affect when the
Beeswax protocol is used.
Most of the code change is in impala-hs2-server and impala-server; a lot
of the GetRuntimeProfile code has been re-factored.
Testing:
* Added new impala-shell tests
* Ran core tests
Change-Id: I89cee02947b311e7bf9c7274f47dfc7214c1bb65
Reviewed-on: http://gerrit.cloudera.org:8080/16406
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The flaky test was
TestImpalaShellInteractive.test_history_does_not_duplicate_on_interrupt
The test failed with timeout error when the interrupt signal arrived
later after the next test query was started. The impala-shell output was
^C instead of the expected query result.
This change adds an additional blocking expect call to wait for the
interrupt signal to arrive before sending in the next query.
Change-Id: I242eb47cc8093c4566de206f46b75b3feab1183c
Reviewed-on: http://gerrit.cloudera.org:8080/16391
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Make Impala Shell closer to Impyla by printing the body of any http
error code message received when using hs2-over-http. The common case is
that there is nothing in the body, in which case the behavior is
unchanged.
TESTING
Added a test for the new functionality.
Ran all end-to-end tests.
Change-Id: Iabc45eda0b87ca694b8359148cda6a7c1d5a8fff
Reviewed-on: http://gerrit.cloudera.org:8080/16269
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The Impala shell stops fetching rows if it receives a batch that
contains 0 rows. This is incorrect because a batch with 0 rows can be
returned if the fetch request hits a timeout. Instead, the shell should
rely on the value of has_rows / hasMoreRows to determine when to stop
issuing fetch requests.
Tests:
* Added a regression test to test_shell_commandline.py
* Ran all shell tests
Change-Id: I5f8527aea9e433f8cf426435c0ba41355bbf9d88
Reviewed-on: http://gerrit.cloudera.org:8080/16222
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Many http servers will not accept an http request that has multiple
copies of the "Host" header. A recent toolchain change patches
Thrift so that will not send the extraneous header (in THttpClient).
This change tests that the duplicate headers are not sent,
TESTING:
Ran all end-to-end tests.
Rewrote an existing Shell test to check that only one "Host" header
is sent.
Change-Id: I82996015d0205923e854dac8bb88604778684c46
Reviewed-on: http://gerrit.cloudera.org:8080/15752
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds the option --fetch_size to the Impala shell. This new option allows
users to specify the fetch size used when issuing fetch RPCs to the
Impala Coordinator (e.g. TFetchResultsReq and BeeswaxService.fetch).
This parameter applies for all client protocols: beeswax, hs2, hs2-http.
The default --fetch_size is set to 10240 (10x the default batch size).
The new --fetch_size parameter is most effective when result spooling is
enabled. When result spooling is disabled, Impala can only return a
single row batch per fetch RPC (so 1024 rows by default). When result
spooling is enabled, Impala can return up to 100 row batches per fetch
request.
Removes some logic in the the impala_client.py file that attempts to
simulate a fetch_size. The code would issue multiple fetch requests to
fullfill the given fetch_size. This logic is no longer needed now that
result spooling is available.
Testing:
* Ran core tests
* Added new tests in test_shell_client.py and test_shell_commandline.py
Change-Id: I8dc7962aada6b38795241d067a99bd94fabca57b
Reviewed-on: http://gerrit.cloudera.org:8080/16041
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Sahil Takiar <stakiar@cloudera.com>
This removes LD_LIBRARY_PATH and LD_PRELOAD from the
developer's shell and cleans it up. With the preceding
change, toolchain utilities like clang can be run without
a special LD_LIBRARY_PATH.
This fixes a bug where libjvm.so was registered as a
static instead of a shared library, which adds it to the
RUNPATH variable in the binary, which provides a default
search location that can be overriden by LD_LIBRARY_PATH.
Impala binaries don't have the rpath baked in for some
libraries, including Impala-lzo, libgcc and libstdc++.
, so we still need to set LD_LIBRARY_PATH when running
those. That is solved with wrapper scripts that sets
the environment variables only when invoking those
binaries, e.g. starting a daemon or running a backend
test. I added three scripts because there were 3 sets
of environment variables. The scripts are:
* run-binary.sh: just sets LD_LIBRARY_PATH
* run-jvm-binary.sh: sets LD_LIBRARY_PATH and CLASSPATH
* start-daemon.sh: sets LD_LIBRARY_PATH and CLASSPATH and
kerberos-related environment variables.
The binaries, in almost all cases, work fine without
those tweaks, because libstdc++ and libgcc are picked
up along with libkuduclient.so from the toolchain (they
are in the same directory). I decided to leave good enough
alone here. run-binary.sh and friends can be used in
any remaining edge cases to run binaries.
An alternative to the 3 scripts would be to have an
uber-script that set all the variables, but I felt
that it was better to be specific about what
each binary needed. Cleaning the LD_LIBRARY_PATH
mess up has given me a distaste for scattershot
setting of environment variables. I am open to
revisiting this.
Testing:
* Ran tests on centos 7
* Manually tested that my dev env with
LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu continued
to work (for now). All ubuntu 16.04 and 18.04 dev
envs that were set up with bootstrap_development.sh
will be in this state.
Change-Id: I61c83e6cca6debb87a12135e58ee501244bc9603
Reviewed-on: http://gerrit.cloudera.org:8080/14494
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change adds a new condition to avoid re-reading the impala-shell
history when the cmdloop is broken. The loop can break due to exceptions
such as KeyboardInterrupt.
Testing:
- The change was tested manually on local dev env
- Added a new EE shell test to verify the history after SIGINT
Change-Id: If4faf46134f44d91e56748642f47d448707db53c
Reviewed-on: http://gerrit.cloudera.org:8080/15345
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is the main patch for making the the impala-shell cross-compatible with
python 2 and python 3. The goal is wind up with a version of the shell that will
pass python e2e tests irrepsective of the version of python used to launch the
shell, under the assumption that the test framework itself will continue to run
with python 2.7.x for the time being.
Notable changes for reviewers to consider:
- With regard to validating the patch, my assumption is that simply passing
the existing set of e2e shell tests is sufficient to confirm that the shell
is functioning properly. No new tests were added.
- A new pytest command line option was added in conftest.py to enable a user
to specify a path to an alternate impala-shell executable to test. It's
possible to use this to point to an instance of the impala-shell that was
installed as a standalone python package in a separate virtualenv.
Example usage:
USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py
The target virtualenv may be based on either python3 or python2. However,
this has no effect on the version of python used to run the test framework,
which remains tied to python 2.7.x for the foreseeable future.
- The $IMPALA_HOME/bin/impala-shell.sh now sets up the impala-shell python
environment independenty from bin/set-pythonpath.sh. The default version
of thrift is thrift-0.11.0 (See IMPALA-9489).
- The wording of the header changed a bit to include the python version
used to run the shell.
Starting Impala Shell with no authentication using Python 3.7.5
Opened TCP connection to localhost:21000
...
OR
Starting Impala Shell with LDAP-based authentication using Python 2.7.12
Opened TCP connection to localhost:21000
...
- By far, the biggest hassle has been juggling str versus unicode versus
bytes data types. Python 2.x was fairly loose and inconsistent in
how it dealt with strings. As a quick demo of what I mean:
Python 2.7.12 (default, Nov 12 2018, 14:36:49)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> d = 'like a duck'
>>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
True
...and yet there are weird unexpected gotchas.
>>> d.decode('utf-8') == d.encode('utf-8')
True
>>> d.encode('utf-8') == bytearray(d, 'utf-8')
True
>>> d.decode('utf-8') == bytearray(d, 'utf-8') # fails the eq property?
False
As a result, this was inconsistency was reflected in the way we handled
strings in the impala-shell code, but things still just worked.
In python3, there's a much clearer distinction between strings and bytes, and
as such, much tighter type consistency is expected by standard libs like
subprocess, re, sqlparse, prettytable, etc., which are used throughout the
shell. Even simple calls that worked in python 2.x:
>>> import re
>>> re.findall('foo', b'foobar')
['foo']
...can throw exceptions in python 3.x:
>>> import re
>>> re.findall('foo', b'foobar')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
return _compile(pattern, flags).findall(string)
TypeError: cannot use a string pattern on a bytes-like object
Exceptions like this resulted in a many, if not most shell tests failing
under python 3.
What ultimately seemed like a better approach was to try to weed out as many
existing spurious str.encode() and str.decode() calls as I could, and try to
implement what is has colloquially been called a "unicode sandwich" -- namely,
"bytes on the outside, unicode on the inside, encode/decode at the edges."
The primary spot in the shell where we call decode() now is when sanitising
input...
args = self.sanitise_input(args.decode('utf-8'))
...and also whenever a library like re required it. Similarly, str.encode()
is primarily used where a library like readline or csv requires is.
- PYTHONIOENCODING needs to be set to utf-8 to override the default setting for
python 2. Without this, piping or redirecting stdout results in unicode errors.
- from __future__ import unicode_literals was added throughout
Testing:
To test the changes, I ran the e2e shell tests the way we always do (against
the normal build tarball), and then I set up a python 3 virtual env with the
shell installed as a package, and manually ran the tests against that.
No effort has been made at this point to come up with a way to integrate
testing of the shell in a python3 environment into our automated test
processes.
Change-Id: Idb004d352fe230a890a6b6356496ba76c2fab615
Reviewed-on: http://gerrit.cloudera.org:8080/15524
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Upgrades the impala-shell's bundled version of sqlparse to 0.3.1.
There were some API changes in 0.2.0+ that required a re-write of
the StripLeadingCommentFilter in impala_shell.py. A slight perf
optimization was also added to avoid using the filter altogether
if no leading comment is readily discernible.
As 0.1.19 was the last version of sqlparse to support python 2.6,
this patch also breaks Impala's compatibility with python 2.6.
No new tests were added, but all existing tests passed without
modification.
Change-Id: I77a1fd5ae311634a18ee04b8c389d8a3f3a6e001
Reviewed-on: http://gerrit.cloudera.org:8080/15642
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The 'Expect: 100-continue' http header allows http clients to send
only the headers for their request, get a confirmation back from the
server that the headers are valid, and only then send the body of the
request, avoiding the overhead of sending large requests that will
ultimately fail.
This patch adds support for this in the HS2 HTTP server by having
THttpServer look for the header, and if it's present and the request
is validated returning a '100 Continue' response before reading the
body of the request.
It also adds supports for using this header on large requests sent by
impala-shell.
Testing:
- This case is covered by the existing test_large_sql, however that
test was previously broken and passing spuriously. This patch fixes
the test.
- Passed all other shell tests.
Change-Id: I4153968551acd58b25c7923c2ebf75ee29a7e76b
Reviewed-on: http://gerrit.cloudera.org:8080/15284
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
In order to improve usability, this patch makes Impala shell show query
processing status while the query is running. The patch enables shell
option live_progress by default when a user launches impala shell in
the interactive mode. The patch also adds a new command line flag
"--disable_live_progress", which allows a user to disable live_progress
at runtime. In the interactive mode, a user can disable live_progress
by either using the command line flag or setting the option as False in
the config file. As for in the non-interactive mode (when the -q or -f
options are used), live reporting is not supported. Impala-shell will
disable live_progress if the mode is detected.
Testing:
- Added and updated tests in test_shell_interactive.py and test_shell_commandline.py
- Successfully ran all shell related tests
Change-Id: I3765b775f663fa227e59728acffe4d5ea9a5e2d3
Reviewed-on: http://gerrit.cloudera.org:8080/15219
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Modified the '_signal_handler()' in impala-shell.py so when a user
cancels a multiline query by hitting CTRL+C it will cancel the
query, instead of just the current line.
Testing:
-Added 'test_cancellation_mid_command()' to test_shell_interactive.py
to test if it really cancels the partial commands.
-Manually tested by giving partial commands then cancelling them.
Change-Id: Id8d8bdaee929e2655eb66e886ae92a02d3fbd83f
Reviewed-on: http://gerrit.cloudera.org:8080/15233
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
on CentOS6/Python 2.6
ImpalaShell.test_config_file failed in negative test case, which
ran impala shell with bad format config file - wrong option name and
wrong option value. The testing code expect impala shell return both
warning and error messages. But on CentOS6/Python 2.6, Impala shell
only return error message. To fix it, separate the test cases as two
test cases by running Impala shell in two different config file.
Testing:
- Passed all test cases in test_shell_commandline.py and
test_shell_interactive.py.
- Passed all core test in pre-review-test.
- Passed EE tests in impala-private-parameterized with CentOS6.
Change-Id: Ief5e825aa3baead5519132d47efcf0d5300860fd
Reviewed-on: http://gerrit.cloudera.org:8080/15139
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for live_summary and live_progress in impalarc.
Testing:
1) Added unit-test cases in test_shell_commandline.py and
test_shell_interactive.py for live_summary and live_progress.
2) Successfully ran all other tests in test_shell_interactive.py and
test_shell_commandline.py
Change-Id: If4549b775a7966ad89d661d0349cc78754e13a86
Reviewed-on: http://gerrit.cloudera.org:8080/14927
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this change Impala Shell is not checking HTTP return codes when
using the hs2-http protocol. The shell is sending a request message
(e.g. send_CloseOperation) but the HTTP call to send this message may
fail. This will result in a failure when reading the reply (e.g. in
recv_CloseOperation) as there is no reply data to read. This will
typically result in an 'EOFError'.
In code that overrides THttpClient.flush(), check the HTTP code that is
returned after the HTTP call is made. If the code is not 1XX
(informational response) or 2XX (successful) then throw an RPCException.
This change does not contain any attempt to recover from an HTTP failures
but it does allow the failure to be detected and a message to be
printed.
In future it may be possible to retry after certain HTTP errors.
Testing:
- Add a new test for impala-shell that tries to connect to an HTTP
server that always returns a 503 error. Check that an appropriate
error message is printed.
Change-Id: I3c105f4b8237b87695324d759ffff81821c08c43
Reviewed-on: http://gerrit.cloudera.org:8080/14924
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When mt_dop > 0, the summary is reporting the number of fragment
instances, instead of the number of hosts as the header would
imply.
This commit fixes the issue so the number of hosts will be shown
under the #Hosts column. The commit also adds an #Inst column
where the number of instances are shown (current behaviour).
Tests:
* Changed profile tests with mt_dop > 0.
* Updated benchmark tests and shell tests accordingly.
Change-Id: I3bdf9a06d9bd842b2397cd16c28294b6bec7af69
Reviewed-on: http://gerrit.cloudera.org:8080/14715
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When the impala-shell is disconnected, it will try to reconnect
for any command that a user runs (as part of ImpalaShell's precmd()).
This doesn't make sense when the user is trying to quit the shell
(i.e. by typing 'quit' or 'exit' or hitting Ctrl-D).
This skips the attempt to reconnect when quitting the shell.
Testing:
- Added test in test_shell_interactive.py
- Verified by hand
Change-Id: I6a76bc515db609498fa8772e9f0b0c547b82c09e
Reviewed-on: http://gerrit.cloudera.org:8080/14391
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds impala-shell support to connect to HiveServer2 HTTP endpoint.
Relies on toolchain change at https://gerrit.cloudera.org/#/c/13725/.
Use --protocol='hs2-http' to enable this behavior.
Example usages:
---------------
impala-shell --protocol='hs2-http' (No auth)
impala-shell --protocol='hs2-http' --ldap -u..... (PLAIN auth)
impala-shell --protocol-'hs2-http' --ssl --ca_cert... (TLS)
impala-shell --protocol='hs2-http' --ldap --ssl --ca_cert... (LDAP +
TLS)
Limitations:
-----------
- Does not support Kerberos (-k) due to lack ot SPNEGO support.
Testing:
--------
- Parameterized existing shell tests to support this combination.
- Added shell test coverage for LDAP auth.
Change-Id: I8323950857dfe1c1dfd5377fde79f87bc2ce9534
Reviewed-on: http://gerrit.cloudera.org:8080/13746
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Apparently IMPALA_REMOTE_URL is not generally used for remote cluster
tests: only --testing_remote_cluster is reliably set. Fix the
is_remote_cluster() implementation to take into account
REMOTE_DATA_LOAD and --testing_remote_cluster in addition to
IMPALA_REMOTE_URL. Consistently use is_remote_cluster() in
other tests instead of checking the pytest flag directly.
There were a few lifecycle headaches with how
ImpalaTestClusterProperties is used:
* common.environ is imported from conftest, which means that
the top-level code in the file runs *before* pytest
command-line arguments have been registered and parsed.
* ImpalaTestClusterProperties is used by various code,
like build_flavor_timeout(), which runs before pytest
command-line arguments have been parsed.
* ImpalaTestClusterProperties is called from non-pytest
scripts like start-impala-cluster.py, so the command-line
arguments are not available.
I dealt with the above challenges by making a few changes
to do the detection later:
* Lazily initializing a singleton ImpalaTestClusterProperties.
This was not strictly necessary but makes the whole problem
less sensitive to import order and module dependencies.
* Adding cluster_properties fixture to make ImpalaTestClusterProperties
available in tests without additional boilerplate.
* Removing the caching of the local/remote build calculation.
ImpalaTestClusterProperties is instantiated outside of python
tests, but is_remote_cluster() is only called from python tests,
so if we check flags in is_remote_cluster() we'll get the
right results reliably.
As a workaround to unblock remote tests, also assume catalog_v1 if
accessing the web UI fails.
Testing:
Ran core tests against a regular minicluster.
Ran tests against a remote cluster
Change-Id: Ifa6b2a1391f53121d3d7c00c5cf0a57590899ce4
Reviewed-on: http://gerrit.cloudera.org:8080/13386
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>