76 Commits

Author SHA1 Message Date
Riza Suminto
28cff4022d IMPALA-14333: Run impala-py.test using Python3
Running exhaustive tests with env var IMPALA_USE_PYTHON3_TESTS=true
reveals some tests that require adjustment. This patch made such
adjustment, which mostly revolves around encoding differences and string
vs bytes type in Python3. This patch also switch the default to run
pytest with Python3 by setting IMPALA_USE_PYTHON3_TESTS=true. The
following are the details:

Change hash() function in conftest.py to crc32() to produce
deterministic hash. Hash randomization is enabled by default since
Python 3.3 (see
https://docs.python.org/3/reference/datamodel.html#object.__hash__).
This cause test sharding (like --shard_tests=1/2) produce inconsistent
set of tests per shard. Always restart minicluster during custom cluster
tests if --shard_tests argument is set, because test order may change
and affect test correctness, depending on whether running on fresh
minicluster or not.

Moved one test case from delimited-latin-text.test to
test_delimited_text.py for easier binary comparison.

Add bytes_to_str() as a utility function to decode bytes in Python3.
This is often needed when inspecting the return value of
subprocess.check_output() as a string.

Implement DataTypeMetaclass.__lt__ to substitute
DataTypeMetaclass.__cmp__ that is ignored in Python3 (see
https://peps.python.org/pep-0207/).

Fix WEB_CERT_ERR difference in test_ipv6.py.

Fix trivial integer parsing in test_restart_services.py.

Fix various encoding issues in test_saml2_sso.py,
test_shell_commandline.py, and test_shell_interactive.py.

Change timeout in Impala.for_each_impalad() from sys.maxsize to 2^31-1.

Switch to binary comparison in test_iceberg.py where needed.

Specify text mode when calling tempfile.NamedTemporaryFile().

Simplify create_impala_shell_executable_dimension to skip testing dev
and python2 impala-shell when IMPALA_USE_PYTHON3_TESTS=true. The reason
is that several UTF-8 related tests in test_shell_commandline.py break
in Python3 pytest + Python2 impala-shell combo. This skipping already
happen automatically in build OS without system Python2 available like
RHEL9 (IMPALA_SYSTEM_PYTHON2 env var is empty).

Removed unused vector argument and fixed some trivial flake8 issues.

Several test logic require modification due to intermittent issue in
Python3 pytest. These include:

Add _run_query_with_client() in test_ranger.py to allow reusing a single
Impala client for running several queries. Ensure clients are closed
when the test is done. Mark several tests in test_ranger.py with

SkipIfFS.hive because they run queries through beeline + HiveServer2,
but Ozone and S3 build environment does not start HiveServer2 by
default.

Increase the sleep period from 0.1 to 0.5 seconds per iteration in
test_statestore.py and mark TestStatestore to execute serially. This is
because TServer appears to shut down more slowly when run concurrently
with other tests. Handle the deprecation of Thread.setDaemon() as well.

Always force_restart=True each test method in TestLoggingCore,
TestShellInteractiveReconnect, and TestQueryRetries to prevent them from
reusing minicluster from previous test method. Some of these tests
destruct minicluster (kill impalad) and will produce minidump if metrics
verifier for next tests fail to detect healthy minicluster state.

Testing:
Pass exhaustive tests with IMPALA_USE_PYTHON3_TESTS=true.

Change-Id: I401a93b6cc7bcd17f41d24e7a310e0c882a550d4
Reviewed-on: http://gerrit.cloudera.org:8080/23319
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-03 10:01:29 +00:00
stiga-huang
73de6517a4 IMPALA-14280: Deflake catalogd HA failover tests
Several tests on catalogd HA failover have a loop of the following
pattern:
 - Do some operations
 - Kills the active catalogd
 - Verifies some results
 - Starts the killed catalogd
After starting the killed catalogd, the test gets the new active and
standby catalogds and check their /healthz pages immediately. This could
fail if the web pages are not registered yet. The cause is when starting
catalogd, we just wait for its 'statestore-subscriber.connected' to be
True. This doesn't guarantee that the web pages are initialized. This
patch adds a wait for this, i.e. when getting the web pages hits 404
(Not Found) error, wait and retry.

Another flaky issue of these failover tests is cleanup unique_database
could fail due to impalad still using the old active catalogd address
even in RPC failure retries (IMPALA-14228). This patch adds a retry on
the DROP DATABASE statement to work around this.

Sets disable_log_buffering to True so the killed catalogd has complete
logs.

Sets catalog_client_connection_num_retries to 2 to save time in
coordinator retrying RPCs to the killed catalogd. This reduce the
duration of test_warmed_up_metadata_failover_catchup from 100s to 50s.

Tests:
 - Ran all (15) failover tests in test_catalogd_ha.py 10 times (each
   round takes 450s).

Change-Id: Iad42a55ed7c357ed98d85c69e16ff705a8cae89d
Reviewed-on: http://gerrit.cloudera.org:8080/23235
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Quanlong Huang <huangquanlong@gmail.com>
2025-08-04 09:12:30 +00:00
Riza Suminto
f28a32fbc3 IMPALA-13916: Change BaseTestSuite.default_test_protocol to HS2
This is the final patch to move all Impala e2e and custom cluster tests
to use HS2 protocol by default. Only beeswax-specific test remains
testing against beeswax protocol by default. We can remove them once
Impala officially remove beeswax support.

HS2 error message formatting in impala-hs2-server.cc is adjusted a bit
to match with formatting in impala-beeswax-server.cc.

Move TestWebPageAndCloseSession from webserver/test_web_pages.py to
custom_cluster/test_web_pages.py to disable glog log buffering.

Testing:
- Pass exhaustive tests, except for some known and unrelated flaky
  tests.

Change-Id: I42e9ceccbba1e6853f37e68f106265d163ccae28
Reviewed-on: http://gerrit.cloudera.org:8080/22845
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Jason Fehr <jfehr@cloudera.com>
2025-05-20 14:32:10 +00:00
Riza Suminto
6f2ac8a406 IMPALA-13822: Add more detail in impala_connection.py logs
This patch make impala_connection.py to use the same log format as
declared in conftest.py. Connection specific logs will have the protocol
name printed. Modified set_configuration() and
set_configuration_option() to make option related logging more concise.

Moved LOG_FORMAT from conftest.py to patterns.py for reuse in
impala_connection.py.

Testing:
- Run TestExprLimits locally and confirm that the log lines printed at
  logs/ee_tests/results/TEST-impala-parallel.xml is OK.

Change-Id: I44ea7fbec15684ac5379703f781a400b4f17da8d
Reviewed-on: http://gerrit.cloudera.org:8080/22577
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2025-03-07 12:28:58 +00:00
Riza Suminto
9cb9bae84e IMPALA-13758: Use context manager in ImpalaTestSuite.change_database
ImpalaTestSuite.change_database is responsible to point impala client to
database under test. However, it left client pointing to that database
after the test without reverting them back to default database. This
patch does the reversal by changing ImpalaTestSuite.change_database to
use context manager.

This patch change the behavior of execute_query_using_client() and
execute_query_async_using_client(). They used to change database
according to the given vector parameter, but not anymore after this
patch. In practice, this behavior change does not affect many tests
because most queries going through these functions already use fully
qualified table name. Going forward, querying through function other
than run_test_case() should try to use fully qualified table name as
much as possible.

Retain behavior of ImpalaTestSuite._get_table_location() since there are
considerable number of tests relies on it (changing database when
called).

Removed unused test fixtures and fixed several flake8 issues in modified
test files.

Testing:
- Moved nested-types-subplan-single-node.test. This allows the test
  framework to point to the right tpch_nested* database.
- Pass exhaustive test except IMPALA-13752 and IMPALA-13761. They will
  be fixed in separate patch.

Change-Id: I75bec7403cc302728a630efe3f95e852a84594e2
Reviewed-on: http://gerrit.cloudera.org:8080/22487
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-19 23:50:34 +00:00
Riza Suminto
e83a8e312a IMPALA-13747: Use fresh HS2 client for unique_database fixture
Just like IMPALA-13634 bug, unique_database fixture may taint
ImpalaTestSuite.client by setting 'sync_ddl=True' option and never
cleaning it up for test method usage.

This patch fixes unique_database fixture by always creating fresh HS2
client during CREATE DATABASE and DROP DATABASE.

Testing:
- Caught an assert error at test_set_fallback_db_for_functions
  Expected: "default.fn() unknown for database default"
  Actual: "functional.fn() unknown for database functional"
  In this case, shared ImpalaTestSuite.client may have changed database
  via ImpalaTestSuite.run_test_case() or
  ImpalaTestSuite.execute_wrapper() in other test method.
  Removing unused 'vector' argument somehow fixed the issue.
- For both query_test/test_udfs.py and metadata/test_ddl.py:
  - Fixed flake8 issues and remove unused pytest fixture.
  - Run and pass the tests in exhaustive exploration.

Change-Id: Ib503e829552d436035c57b489ffda0d0299f8405
Reviewed-on: http://gerrit.cloudera.org:8080/22471
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-14 01:15:10 +00:00
Riza Suminto
c08aff420d IMPALA-13672: Migrate query_test/test_kudu.py to use hs2 protocol
This patch migrate query_test/test_kudu.py to use hs2 client protocol.
Here are the steps taken:

- Override default_test_protocol() to return 'hs2'.
  See documentation in ImpalaTestSuite about what this method does.
- Remove usage of deprecated cursor and unique_cursor fixture.
- Replace all direct ImpalaTestSuite.client usage with helper
  function call such as execute_query() or execute_query_using_vector().
- Remove all "SET" query invocation and replace it with passing
  exec_option dictionary to helper method.
- Replace veryfing kudu modified / inserted rows from reading query
  output to reading runtime profile counters.
- Add HS2_TYPES section at test cases where only TYPES exist.
- Remove all drop_impala_table_after_context() calls and replace it with
  proper use of unique_database fixture.

KuduTestSuite is fixed with hs2 protocol dimension. Meanwhile,
CustomKuduTest is fixed to use beeswax protocol dimension until proper
migration can be done.

Added following convenience methods:
- ImpalaTestSuite.default_test_protocol() to allow individual test
  class to override its default test procol.
- ImpylaHS2ResultSet.tuples() to access the raw HS2 result set that is
  a list of tuples.

This patch also added several literal constants around test vector
dimension to help with traceability.

Fixed a bug where "SHOW PARTITIONS" via hs2 over kudu table will shows
NULL number of #Replicas because TResultRowBuilder does not have
overload for int type value. Adjust numFiles variable inside
HdfsTable.getTableStats() from int to long to match Type.BIGINT of
column '#Files'.

Fixed py.test classes that does not inherit BaseTestSuite. Fixed flake8
issues in test_statestore.py.

Testing:
- Run and pass all tests extended from KuduTestSuite in exhaustive mode.

Change-Id: I5f38baf5a0bbde1a1ad0bb4666c300f4f3cabd33
Reviewed-on: http://gerrit.cloudera.org:8080/22358
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-07 11:57:59 +00:00
Riza Suminto
3005092332 IMPALA-13668: Add default_test_protocol parameter to py.test
ImpalaTestSuite.client is always initialized as beeswax client. And many
tests use it directly rather than going through helper method such as
execute_query().

This patch add add default_test_protocol parameter to conftest.py. It
control whether to initialize ImpalaTestSuite.client equals to
'beeswax_client', 'hs2_client', or 'hs2_http_client'. This parameter is
still default to 'beeswax'.

This patch also adds helper method 'default_client_protocol_dimension',
'beeswax_client_protocol_dimension' and 'hs2_client_protocol_dimension'
for convenience and traceability.

Reduced occurrence where test method manually override
ImpalaTestSuite.client. They are replaced by combination of
ImpalaTestSuite.create_impala_clients and
ImpalaTestSuite.close_impala_clients.

Testing:
- Pass core tests.

Change-Id: I9165ea220b2c83ca36d6e68ef3b88b128310af23
Reviewed-on: http://gerrit.cloudera.org:8080/22336
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-01-24 12:19:02 +00:00
Joe McDonnell
8d5adfd0ba IMPALA-13123: Add option to run tests with Python 3
This introduces the IMPALA_USE_PYTHON3_TESTS environment variable
to select whether to run tests using the toolchain Python 3.
This is an experimental option, so it defaults to false,
continuing to run tests with Python 2.

This fixes a first batch of Python 2 vs 3 issues:
 - Deciding whether to open a file in bytes mode or text mode
 - Adapting to APIs that operate on bytes in Python 3 (e.g. codecs)
 - Eliminating 'basestring' and 'unicode' locations in tests/ by using
   the recommendations from future
   ( https://python-future.org/compatible_idioms.html#basestring and
     https://python-future.org/compatible_idioms.html#unicode )
 - Uses impala-python3 for bin/start-impala-cluster.py

All fixes leave the Python 2 path working normally.

Testing:
 - Ran an exhaustive run with Python 2 to verify nothing broke
 - Verified that the new environment variable works and that
   it uses Python 3 from the toolchain when specified

Change-Id: I177d9b8eae9b99ba536ca5c598b07208c3887f8c
Reviewed-on: http://gerrit.cloudera.org:8080/21474
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2024-12-17 07:28:51 +00:00
Joe McDonnell
7369ebb8ba IMPALA-13415: Add a special testing mode to track Calcite progress
This introduces a Calcite report mode for pytests. The mode
changes the behavior for run_test_case() so that it continues
past failures and exports information about the test results
to JSON. The JSON files can then be processed into an HTML
summary by the bin/calcite_report_generator.py. The HTML has
multiple layers of reporting:
1. Reporting on individual tests shows the test section along
   with the reported result
2. There are multiple aggregations of these individual results
   to have clear summaries and organization for browing:
   a. Level 1 is a report at the test function level
      (e.g. query_test/test_foo.py::TestFoo::test_foo)
      with links to the individaul results.
   b. Level 2 is at the test file level (e.g. query_test/test_foo.py)
      with links down to the test function level.
   c. Level 3 is a top level view with a summary across all the tests
      with links down to the test file level.

The errors are classified into different categories (e.g.
parse failures, analysis failures, result differences, etc).
In general, parse failures and unsupported features are lower
priority issues while result differences and runtime failures
are higher priority.

The report is designed to compare two different points in time
to see differences. For example, someone can run tests for a
baseline and then do a comparison run with a new commit.

Testing:
 - Ran on the code change for IMPALA-13468 and browsed the results
 - Ran tests normally and verified that they continue to work

Change-Id: I453c219c22b6cbc253574e0467d2c0d7b1fac092
Reviewed-on: http://gerrit.cloudera.org:8080/21866
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2024-11-07 23:25:13 +00:00
Joe McDonnell
0c7c6a335e IMPALA-11977: Fix Python 3 broken imports and object model differences
Python 3 changed some object model methods:
 - __nonzero__ was removed in favor of __bool__
 - func_dict / func_name were removed in favor of __dict__ / __name__
 - The next() function was deprecated in favor of __next__
   (Code locations should use next(iter) rather than iter.next())
 - metaclasses are specified a different way
 - Locations that specify __eq__ should also specify __hash__

Python 3 also moved some packages around (urllib2, Queue, httplib,
etc), and this adapts the code to use the new locations (usually
handled on Python 2 via future). This also fixes the code to
avoid referencing exception variables outside the exception block
and variables outside of a comprehension. Several of these seem
like false positives, but it is better to avoid the warning.

This fixes these pylint warnings:
bad-python3-import
eq-without-hash
metaclass-assignment
next-method-called
nonzero-method
exception-escape
comprehension-escape

Testing:
 - Ran core tests
 - Ran release exhaustive tests

Change-Id: I988ae6c139142678b0d40f1f4170b892eabf25ee
Reviewed-on: http://gerrit.cloudera.org:8080/19592
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
eb66d00f9f IMPALA-11974: Fix lazy list operators for Python 3 compatibility
Python 3 changes list operators such as range, map, and filter
to be lazy. Some code that expects the list operators to happen
immediately will fail. e.g.

Python 2:
range(0,5) == [0,1,2,3,4]
True

Python 3:
range(0,5) == [0,1,2,3,4]
False

The fix is to wrap locations with list(). i.e.

Python 3:
list(range(0,5)) == [0,1,2,3,4]
True

Since the base operators are now lazy, Python 3 also removes the
old lazy versions (e.g. xrange, ifilter, izip, etc). This uses
future's builtins package to convert the code to the Python 3
behavior (i.e. xrange -> future's builtins.range).

Most of the changes were done via these futurize fixes:
 - libfuturize.fixes.fix_xrange_with_import
 - lib2to3.fixes.fix_map
 - lib2to3.fixes.fix_filter

This eliminates the pylint warnings:
 - xrange-builtin
 - range-builtin-not-iterating
 - map-builtin-not-iterating
 - zip-builtin-not-iterating
 - filter-builtin-not-iterating
 - reduce-builtin
 - deprecated-itertools-function

Testing:
 - Ran core job

Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f
Reviewed-on: http://gerrit.cloudera.org:8080/19589
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
82bd087fb1 IMPALA-11973: Add absolute_import, division to all eligible Python files
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
 1. Python 3 requires absolute imports within packages. This
    can be emulated via "from __future__ import absolute_import"
 2. Python 3 changed division to "true" division that doesn't
    round to an integer. This can be emulated via
    "from __future__ import division"

This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.

I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.

Testing:
 - Ran core tests

Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
2b550634d2 IMPALA-11952 (part 2): Fix print function syntax
Python 3 now treats print as a function and requires
the parenthesis in invocation.

print "Hello World!"
is now:
print("Hello World!")

This fixes all locations to use the function
invocation. This is more complicated when the output
is being redirected to a file or when avoiding the
usual newline.

print >> sys.stderr , "Hello World!"
is now:
print("Hello World!", file=sys.stderr)

To support this properly and guarantee equivalent behavior
between python 2 and python 3, all files that use print
now add this import:
from __future__ import print_function

This also fixes random flake8 issues that intersect with
the changes.

Testing:
 - check-python-syntax.sh shows no errors related to print

Change-Id: Ib634958369ad777a41e72d80c8053b74384ac351
Reviewed-on: http://gerrit.cloudera.org:8080/19552
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-02-28 17:11:50 +00:00
Steve Carlin
bb9fb663ce IMPALA-10778: Allow impala-shell to connect directly to HS2
Impala-shell already uses HS2 protocol to connect to Impalad.
This commit allows impala-shell to connect to any server (for
example, Hive) using the hs2 protocol. This will be done via
the "--strict_hs2_protocol" option.

When the "--strict_hs2_protocol" option is turned on, only features
supported by hs2 will work. For instance, "runtime-profile" is an
impalad specific feature and will be disabled.

The "--strict_hs2_protocol" will only work on servers that abide
by the strict definition of what is supported by HS2. So one will
be able to connect to Hive in this mode, but connections to Impala
will not work. Any feature supported by Hive (e.g. kerberos
authentication) should work as well.

Note: While authentication should work, the test framework is not
set up to create an HS2 server that does authentication at this point
so this feature should be used with caution.
Change-Id: I674a45640a4a7b3c9a577830dbc7b16a89865a9e
Reviewed-on: http://gerrit.cloudera.org:8080/17660
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-08-27 09:45:59 +00:00
Csaba Ringhofer
94f67a3432 IMPALA-7825: Upgrade Thrift version to 0.11.0
Before this patch Impala mainly used Thrift 0.9.3, but it was
possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0
Thrift lib was already included in the toolchain.

Most of the changes are related to replacing boost:: with std::
shared_ptr-s in cpp code (this is a continuation of patch by Sahil).

The Thrift upgrade also needs an Impyla release with Thrift 0.11.0, as
Impala's test framework relies on Impyla. A thrift_sasl release is also
needed, because it currently pins Thrift version to 0.9.3 for Python 2.

The current patch uses alpha releases from Impyla and thrift_sasl that
use thrift 0.11.0.

Notable side effects:
- old logic to compile thrift for impala-shell with 0.11.0 was removed
- impala_shell's utf8 handling had to be updated as the new 0.11.0
  compilation happens with no_utf8strings. This also made things a
  bit faster, e.g the following is ~0.22s instead of ~0.25
  shell/impala_shell.py \
    -B -q "select * from functional_parquet.alltypes;" > /dev/null
- THRIFT-3921 changed the stream operators to print an enum's name
  instead of its number, leading to slightly different messages
  in some cases.
- "templates" was added to the thift generator's parameters to avoid
  a compilation issue (related to IMPALA-10600). I didn't notice any
  change in compilation time. This option generated .tcc files with
  templetized readers/writers for Thrift types. Currently we don't
  use these, but they could potentially speed up (de)serialization.

Testing:
- ran Impyla's test suite with Python 2 and 3
- ran core tests

Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Reviewed-on: http://gerrit.cloudera.org:8080/17170
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-04-27 13:36:54 +00:00
Joe McDonnell
2357958e73 IMPALA-10304: Fix log level and format for pytests
Recent testing showed that the pytests are not
respecting the log level and format set in
conftest.py's configure_logging(). It is using
the default log level of WARNING and the
default formatter.

The issue is that logging.basicConfig() is only
effective the first time it is called. The code
in lib/python/impala_py_lib/helpers.py does a
call to logging.basicConfig() at the global
level, and conftest.py imports that file. This
renders the call in configure_logging()
ineffective.

To avoid this type of confusion, logging.basicConfig()
should only be called from the main() functions for
libraries. This removes the call in lib/python/impala_py_lib
(as it is not needed for a library without a main function).
It also fixes up various other locations to move the
logging.basicConfig() call to the main() function.

Testing:
 - Ran the end to end tests and custom cluster tests
 - Confirmed the logging format
 - Added an assert in configure_logging() to test that
   the INFO log level is applied to the root logger.

Change-Id: I5d91b7f910b3606c50bcba4579179a0bc8c20588
Reviewed-on: http://gerrit.cloudera.org:8080/16679
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-30 15:32:21 +00:00
Joe McDonnell
bfdc5bf6af IMPALA-9702: Cleanup unique_database directories
If there are external tables in a database, drop database cascade
won't remove the external table locations. If those locations
are inside the database, then the database directory does not
get removed. Some tests that use unique_database fail when
running for the second time (or with a data snapshot) due to the
preexisting files.

This adds code to remove the database directory for
unique_database. It also adds some debugging statements that
list the files at the beginning of bin/run-all-tests.sh and
again at the end.

Testing:
 - Ran a core job and verified that the unique database
   directories are being removed
 - Ran TestMixedPartitions::test_incompatible_avro_partition_in_non_avro_table()
   multiple times and it passes when it previously failed.

Change-Id: I0530c028e5e7c241dfc054f04c78e2a045c2d035
Reviewed-on: http://gerrit.cloudera.org:8080/16015
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-06-02 14:01:23 +00:00
David Knupp
bc9d7e063d IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.
This is the main patch for making the the impala-shell cross-compatible with
python 2 and python 3. The goal is wind up with a version of the shell that will
pass python e2e tests irrepsective of the version of python used to launch the
shell, under the assumption that the test framework itself will continue to run
with python 2.7.x for the time being.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  Example usage:
  USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- The $IMPALA_HOME/bin/impala-shell.sh now sets up the impala-shell python
  environment independenty from bin/set-pythonpath.sh. The default version
  of thrift is thrift-0.11.0 (See IMPALA-9489).

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell with no authentication using Python 3.7.5
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell with LDAP-based authentication using Python 2.7.12
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result, this was inconsistency was reflected in the way we handled
  strings in the impala-shell code, but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in a many, if not most shell tests failing
  under python 3.

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could, and try to
  implement what is has colloquially been called a "unicode sandwich" -- namely,
  "bytes on the outside, unicode on the inside, encode/decode at the edges."

  The primary spot in the shell where we call decode() now is when sanitising
  input...

  args = self.sanitise_input(args.decode('utf-8'))

  ...and also whenever a library like re required it. Similarly, str.encode()
  is primarily used where a library like readline or csv requires is.

- PYTHONIOENCODING needs to be set to utf-8 to override the default setting for
  python 2. Without this, piping or redirecting stdout results in unicode errors.

- from __future__ import unicode_literals was added throughout

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and manually ran the tests against that.

  No effort has been made at this point to come up with a way to integrate
  testing of the shell in a python3 environment into our automated test
  processes.

Change-Id: Idb004d352fe230a890a6b6356496ba76c2fab615
Reviewed-on: http://gerrit.cloudera.org:8080/15524
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-18 05:13:50 +00:00
Laszlo Gaal
c97191b6a5 IMPALA-9626: Use Python from the toolchain for Impala
Historically Impala used the Python2 version that was available on
the hosting platform, as long as that version was at least v2.6.
This caused constant headache as all Python syntax had to be kept
compatible with Python 2.6 (for Centos 6). It also caused a recent problem
on Centos 8: here the system Python version was compiled with the
system's GCC version (v8.3), which was much more recent than the Impala
standard compiler version (GCC 4.9.2). When the Impala virtualenv was
built, the system Python version supplied C compiler switches for models
containing native code that were unknown for the Impala version of GCC,
thus breaking virtualenv installation.

This patch changes the Impala virtualenv to always use the Python2
version from the toolchain, which is built with the toolchain compiler.

This ensures that
- Impala always has a known Python 2.7 version for all its scripts,
- virtualenv modules based on native code will always be installable, as
  the Python environment and the modules are built with the same compiler
  version.

Additional changes:
- Add an auto-use fixture to conftest.py to check that the tests are
  being run with Python 2.7.x
- Make bootstrap_toolchain.py independent from the Impala virtualenv:
  remove the dependency on the "sh" library

Tests:
- Passed core-mode tests on CentOS 7.4
- Passed core-mode tests in Docker-based mode for centos:7
  and ubuntu:16.04

Most content in this patch was developed but not published earlier
by Tim Armstrong.

Change-Id: Ic7b40cef89cfb3b467b61b2d54a94e708642882b
Reviewed-on: http://gerrit.cloudera.org:8080/15624
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-16 01:08:00 +00:00
David Knupp
f9cf70d035 IMPALA-9129: Add a test fixture that cleans up intentional core dumps
Some negative tests produce core dumps intentionally. We should have a
way of removing these as part of test cleanup.

For custom cluster tests, it's likely the cores may actually be generated
during the base class setup phase, which means it's too early for the
test fixture to really be useful. Such was the case with the test case
TestAuthorizationProvider::test_invalid_provider_flag. In this instance,
we had to add the same steps directly to the tests.

Testing done:
For test_invalid_provider_flag, I made sure I had pre-existing core files
in the IMPALA_HOME directory, then ran the test to confirm new cores were
removed.

-- 2019-11-06 19:53:27,303 INFO  MainThread: Removing core.impalad.61852 created by test_invalid_provider_flag
-- 2019-11-06 19:53:27,375 INFO  MainThread: Removing core.impalad.61856 created by test_invalid_provider_flag
-- 2019-11-06 19:53:27,450 INFO  MainThread: Removing core.impalad.61849 created by test_invalid_provider_flag

...and then made sure the pre-existing cores were still present.

Change-Id: I778f27e820a6983894c1294d35627ddb04f5a51a
Reviewed-on: http://gerrit.cloudera.org:8080/14640
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-11-08 02:22:59 +00:00
Bharath Vissapragada
72c9370856 IMPALA-8717: impala-shell support for HS2 HTTP endpoint
Adds impala-shell support to connect to HiveServer2 HTTP endpoint.
Relies on toolchain change at https://gerrit.cloudera.org/#/c/13725/.

Use --protocol='hs2-http' to enable this behavior.

Example usages:
---------------
impala-shell --protocol='hs2-http'  (No auth)
impala-shell --protocol='hs2-http' --ldap -u..... (PLAIN auth)
impala-shell --protocol-'hs2-http' --ssl --ca_cert... (TLS)
impala-shell --protocol='hs2-http' --ldap --ssl --ca_cert... (LDAP +
TLS)

Limitations:
-----------
- Does not support Kerberos (-k) due to lack ot SPNEGO support.

Testing:
--------
- Parameterized existing shell tests to support this combination.
- Added shell test coverage for LDAP auth.

Change-Id: I8323950857dfe1c1dfd5377fde79f87bc2ce9534
Reviewed-on: http://gerrit.cloudera.org:8080/13746
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
2019-07-29 05:43:48 +00:00
Tim Armstrong
9ecbe7d3dc IMPALA-8553,IMPALA-8552: fix checks for remote cluster
Apparently IMPALA_REMOTE_URL is not generally used for remote cluster
tests: only --testing_remote_cluster is reliably set. Fix the
is_remote_cluster() implementation to take into account
REMOTE_DATA_LOAD and --testing_remote_cluster in addition to
IMPALA_REMOTE_URL. Consistently use is_remote_cluster() in
other tests instead of checking the pytest flag directly.

There were a few lifecycle headaches with how
ImpalaTestClusterProperties is used:
* common.environ is imported from conftest, which means that
  the top-level code in the file runs *before* pytest
  command-line arguments have been registered and parsed.
* ImpalaTestClusterProperties is used by various code,
  like build_flavor_timeout(), which runs before pytest
  command-line arguments have been parsed.
* ImpalaTestClusterProperties is called from non-pytest
  scripts like start-impala-cluster.py, so the command-line
  arguments are not available.

I dealt with the above challenges by making a few changes
to do the detection later:
* Lazily initializing a singleton ImpalaTestClusterProperties.
  This was not strictly necessary but makes the whole problem
  less sensitive to import order and module dependencies.
* Adding cluster_properties fixture to make ImpalaTestClusterProperties
  available in tests without additional boilerplate.
* Removing the caching of the local/remote build calculation.
  ImpalaTestClusterProperties is instantiated outside of python
  tests, but is_remote_cluster() is only called from python tests,
  so if we check flags in is_remote_cluster() we'll get the
  right results reliably.

As a workaround to unblock remote tests, also assume catalog_v1 if
accessing the web UI fails.

Testing:
Ran core tests against a regular minicluster.

Ran tests against a remote cluster

Change-Id: Ifa6b2a1391f53121d3d7c00c5cf0a57590899ce4
Reviewed-on: http://gerrit.cloudera.org:8080/13386
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-20 20:27:31 +00:00
Todd Lipcon
800f635855 IMPALA-8667. Remove --pull_incremental_stats flag
This flag was added as a "chicken bit" -- so we could disable the new
feature if we had some problems with it. It's been out in the wild for a
number of months and we haven't seen any such problems, so at this point
let's stop maintaining the old code path.

Change-Id: I8878fcd8a2462963c7db3183a003bb9816dda8f9
Reviewed-on: http://gerrit.cloudera.org:8080/13671
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-19 01:07:00 +00:00
Tim Armstrong
2ca7f8e7c0 IMPALA-7995: part 1: fixes for e2e dockerised impala tests
This fixes all core e2e tests running on my local dockerised
minicluster build. I do not yet have a CI job or script running
but I wanted to get feedback on these changes sooner. The second
part of the change will include the CI script and any follow-on
fixes required for the exhaustive tests.

The following fixes were required:
* Detect docker_network from TEST_START_CLUSTER_ARGS
* get_webserver_port() does not depend on the caller passing in
  the default webserver port. It failed previously because it
  relied on start-impala-cluster.py setting -webserver_port
  for *all* processes.
* Add SkipIf markers for tests that don't make sense or are
  non-trivial to fix for containerised Impala.
* Support loading Impala-lzo plugin from host for tests that depend on
  it.
* Fix some tests that had 'localhost' hardcoded - instead it should
  be $INTERNAL_LISTEN_HOST, which defaults to localhost.
* Fix bug with sorting impala daemons by backend port, which is
  the same for all dockerised impalads.

Testing:
I ran tests locally as follows after having set up a docker network and
starting other services:

  ./buildall.sh -noclean -notests -ninja
  ninja -j $IMPALA_BUILD_THREADS docker_images
  export TEST_START_CLUSTER_ARGS="--docker_network=impala-cluster"
  export FE_TEST=false
  export BE_TEST=false
  export JDBC_TEST=false
  export CLUSTER_TEST=false
  ./bin/run-all-tests.sh

Change-Id: Iee86cbd2c4631a014af1e8cef8e1cd523a812755
Reviewed-on: http://gerrit.cloudera.org:8080/12639
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-13 02:42:32 +00:00
Thomas Tauber-Marshall
a4ad8f35f7 IMPALA-8390: clean up test vectors in test_cancellation.py
Due to changes to TestCancellation made in IMPALA-7205 that were not
reflected in TestCancellationSerial and TestCancellationFullSort,
test_cancel_insert has not been running at all and test_cancel_sort
has been running with unintended parameters.

This patch re-enables test_cancel_insert, while including a number of
constraints on its parameters to keep test execution time reasonable.
It also fixes an incorrect constraint on test_cancel_sort.

The patch also makes some related improvements:
- Removes an xfail on test_cancel_insert related to a bug that is
  fixed now.
- When ImpalaTestVector.get_value() is called with a value name that
  does not actually exist in the vector, the result is a StopIteration
  exception. Due to python's questionable habit of using exceptions
  for flow control, StopIteration is frequently treated not as an
  error but as the normal end of iteration, which can result in
  unexpected behavior, eg. when pytest_generate_tests raises a
  StopIteration pytest just silently ignores it and drops the test
  case. This patch modifies get_value() to instead raise a ValueError
  in this situation.
- When a test has no vectors generated for it, the name of the test is
  now included in the logged warning.

Testing:
- Ran full core and exhaustive runs and verified that the expected
  test cases are run for test_cancellation.py now

Change-Id: I9673fe82bda5314aff6a51d1961759ff286fbf6f
Reviewed-on: http://gerrit.cloudera.org:8080/12960
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-13 00:27:19 +00:00
Austin Nobis
515ded0035 IMPALA-7918: Remove support for authorization policy file
This patch removes support for the authorization_policy_file. When the
flag is passed, the backend will issue a warning message that the flag
is being ignored.

Tests relying on the authorization_policy_file flag have been updated to
rely on sentry server instead.

Testing:
- Ran all FE tests
- Ran all E2E tests

Change-Id: Ic2a52c2d5d35f58fbff8c088fb0bf30169625ebd
Reviewed-on: http://gerrit.cloudera.org:8080/12637
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-03-25 20:23:33 +00:00
Philip Zeyliger
6d5ca479f6 IMPALA-7666: Propagate name of test into CLIENT_IDENTIFIER.
To facilitate correlating test failures (where we sometimes know things
like fragment id) with the tests that generated those queries, we can
stuff the test name into CLIENT_IDENTIFIER.

The mechanics here are to create a global, tests.common.current_node to
store the current test, create a plugin in conftest to set this when
entering a test, and then configuring connections as they're created
deep in test code.

Change-Id: I2f685fd16982d73ad3fc0f4a7578c5ad83b9a84c
Reviewed-on: http://gerrit.cloudera.org:8080/12177
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-01-11 05:34:07 +00:00
Sahil Takiar
691f9d9ff9 IMPALA-6249: Expose several build flags via web UI
Exposes a list of build flags via the impalad web UI. The build flags
can be viewed on the root page under the "Version" section. They can
be accessed via other tests through the debug version of the root page
(e.g. adding &json to the URL). The build flags are listed in a JSON
array so that they can be parsed easily. This should help run Impala
tests against a remote Impala cluster.

The build flags are read in CMakeLists.txt and then stored in
preprocessor variables.

Three build flags are exposed as part of this commit:
- Is_NDEBUG = [true, false]
    - Whether NDEBUG was true or false at compile time
- CMake_Build_Type = [DEBUG, RELEASE, ADDRESS_SANITIZER, TIDY, UBSAN,
  UBSAN_FULL, TSAN, CODE_COVERAGE_RELEASE, CODE_COVERAGE_DEBUG]
    - The value of CMAKE_BUILD_TYPE at compile time
- Library_Link_Type = [DYNAMIC, STATIC]
    - Derived from the compile time value of BUILD_SHARED_LIBS

There are a few other minor changes that are apart of this commit:

* The patch modifies environ.py so that it supports fetching build metadata
for both local and remote clusters.

* The tests under the tests/webserver directory were not being run because
'webserver' was not whitelisted in tests/run-tests.py. This patch fixes
that and addresses several test failures in run-tests.py.

* It reverts part of IMPALA-6947 so that their is no dependency from
start-impala-cluster.py to environ.py. The timeout discussed IMPALA-6947
is now set at compile time.

Testing:

Added new tests to webserver/test_web_pages.py to ensure that the build
flags are being set. Some tests are only run when run against a local
cluster because we have no way of getting the build info from a remote
cluster, whereas local clusters contain a .cmake_build_type file.

Change-Id: I47e3ad4cbf844909bdaf22a6f9d7bd915dce3f19
Reviewed-on: http://gerrit.cloudera.org:8080/11410
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-11-05 22:47:31 +00:00
Fredy Wijaya
0cd9151801 IMPALA-7713: Add test coverage for catalogd restart when authorization is enabled
This patch adds a test coverage for catalogd restart when authorization
is enabled to ensure all privileges in the impalad's catalogs get reset
after the catalogd restart to avoid stale privileges in the impalad's
catalogs, which can pose a security issue.

Testing:
- Ran all E2E authorization tests
- Added a new test

Change-Id: Ib9a168697401cf0b83c7a193fa477888b48cb369
Reviewed-on: http://gerrit.cloudera.org:8080/11696
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-17 05:33:05 +00:00
Tim Armstrong
d05f73f415 IMPALA-7647: Add HS2/Impyla dimension to TestQueries
I used some ideas from Alex Leblang's abandoned patch:
https://gerrit.cloudera.org/#/c/137/ in order to run .test files through
HS2. The advantage of using Impyla is that much of the code will be
reusable for any Python client implementing the standard Python dbapi
and does not require us implementing yet another thrift client.

This gives us better coverage of non-trivial result sets from HS2,
including handling of NULLs, error logs and more interesting result
sets than the basic HS2 tests.

I added HS2 coverage to TestQueries, which has a reasonable variety of
queries and covers the data types in alltypes. I also added
TestDecimalQueries, TestStringQuery and TestCharFormats to get coverage
of DECIMAL, CHAR and VARCHAR that aren't in alltypes. Coverage of
results sets with NULLs was limited so I added a couple of queries.

Places where results differ from Beeswax:
* Impyla is a Python dbapi client so must convert timestamps into python datetime
  objects, which only have microsecond precision. Therefore result
  timestamps within nanosecond precision are truncated.
* The HS2 interface reports the NULL type as BOOLEAN as a workaround for
  IMPALA-914.
* The Beeswax interface reported VARCHAR as STRING, but HS2 reports
  VARCHAR.

I dealt with different results by adding additional result sections so
that the expected differences between the clients/protocols were
explicit.

Limitations:
* Not all of the same methods are implemented as for beeswax, so some
  tests that have more complicated interactions with the client will not
  work with HS2 yet.
* We don't have a way to get the affected row count for inserts.

I also simplified the ImpalaConnection API by removing some unnecessary
methods and moved some generic methods to the base class.

Testing:
* Confirmed that it detected IMPALA-7588 by re-applying the buggy patch.
* Ran exhaustive and CentOS6 tests.

Change-Id: I9908ccc4d3df50365be8043b883cacafca52661e
Reviewed-on: http://gerrit.cloudera.org:8080/11546
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-09 00:45:10 +00:00
Thomas Tauber-Marshall
7eb64a9be7 IMPALA-7576: Add a timeout for all E2E tests
We've been seeing a lot of hangs in tests lately. This can waste
test resources by keeping machines busy, cause loss of coverage when
subsequent tests don't run, and can be difficult to diagnose if its
not clear which test hung.

This patch introduces a timeout of 2 hours for normal builds and 4
hours for slow builds for all tests run under pytest by using the
pytest-timeout plugin.

The timeouts were chosen to be generous to avoid false positives.
In recent runs I examined, the longest running test is
test_decimal_fuzz, which took 63 minutes in a DEBUG build and
162 minutes in an ASAN build.

Testing:
- Ran locally with a reduced timeout and confirmed the test is
  timed out when expected.

Change-Id: I301dd27a9767bfaef2756282014ef457a31956bd
Reviewed-on: http://gerrit.cloudera.org:8080/11447
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-17 21:49:01 +00:00
Vuk Ercegovac
72ee4a4275 IMPALA-7425: Change incremental stats to pull from catalogd.
Currently, incremental stats can consume a substantial
amount of metadata memory (per table, partition, column).
This metadata is transmitted from catalogd to all coordinators.
As a result, memory is used for all loaded tables that use
incremental stats all the time at all coordinators. A consequence
is that coordinators and catalogd die from OOM more often
when incremental stats are used and more network bandwidth is used.

This change removes incremental stats from impalads. These stats
are only needed when computing incremental statistics and merging
new results with the existing results. They are not used by queries.
As a result, the change requires that coordinators fetch
incremental stats directly from catalogd when computing incremental stats.
In addition, catalogd no longer sends incremental stats to coordinators
via the statestore.

The option is enabled by setting a new flag, --pull_incremental_statistics,
on the catalogd and all impalad coordinators.

Testing:
  - manual testing
  - added end-to-end tests with --pull_incremental_statistics enabled
    for the compute-stats-incremental.test
  - added fe CatalogTest for new catalogd service method
  - passes exhaustive tests when --pull_incremental_statistics is enabled
    and disabled

Change-Id: I9d564808ca5157afe4e091909ca6cdac76e60d6e
Reviewed-on: http://gerrit.cloudera.org:8080/11193
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-05 20:49:54 +00:00
Todd Lipcon
30bb0b3d89 tests: ensure consistent logging format across tests
Many of the test modules included calls to 'logging.basicConfig' at
global scope in their implementation. This meant that by just importing
one of these files, other tests would inherit their logging format. This
is typically a bad idea in Python -- modules should not have side
effects like this on import.

The format was additionally inconsistent. In some cases we had a "--"
prepended to the format, and in others we didn't. The "--" is very
useful since it lets developers copy-paste query-test output back into
the shell to reproduce an issue.

This patch fixes the above by centralizing the logging configuration in
a pytest hook that runs prior to all pytests. A few other non-pytest
related tools now configure logging in their "main" code which is only
triggered when the module is executed directly.

I tested that, with this change, logs still show up properly in the .xml
output files from 'run-tests.py' as well as when running tests manually
from impala-py.test

Change-Id: I55ef0214b43f87da2d71804913ba4caa964f789f
Reviewed-on: http://gerrit.cloudera.org:8080/11225
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-18 04:21:00 +00:00
Todd Lipcon
4aec50484a IMPALA-7308. Support Avro tables in LocalCatalog
This adds support for loading Avro-formatted tables in LocalCatalog. In
the case that the table properties indicate a table is Avro-formatted,
the semantics are identical to the existing catalog implementation:

- if an explicit avro schema is specified, it overrides the schema
  provided by the HMS
- if no explicit avro schema is specified, one is inferred, and then the
  inferred schema takes the place of the one provided by the HMS (thus
  promoting columns like TINYINT to INT)
- on COMPUTE STATS, if any discrepancy is discovered between the HMS
  schema and the inferred schema, an error is emitted.

The semantics for LocalCatalog are slightly different in the case of
tables which have not been configured as Avro format on the table level:

The existing implementation has the behavior that, when a table is
loaded, all partitions are inspected, and, if any partition is
discovered with Avro format, the above rules are applied. This has some
very unexpected results, described in an earlier email to
dev@impala.apache.org [1]. To summarize that email thread, the existing
behavior was decided to be unintuitive and inconsistent with Hive.
Additionally, this behavior requires loading all partitions up-front,
which gets in the goal of lazy/granular metadata loading in
LocalCatalog.

Thus, the LocalCatalog implementation differs as follows:

- the "schema override" behavior ONLY occurs if the Avro file format has
  been selected at a table level.

- if an Avro partition is added to a non-Avro table, and that partition
  has a schema that isn't compatible with the table's schema, an error
  will occur on read.

The thread additionally discusses adding an error message on "alter" to
prevent users from adding an Avro partition to a table with an
incompatible schema. To keep the scope of this patch minimal, that is
not yet implemented here. I filed IMPALA-7309 to change the behavior of
the existing catalog implementation to match.

A new test verifies the behavior, set to 'xfail' when running on the
existing catalog implementation.

[1] https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@%3Cdev.impala.apache.org%3E

Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Reviewed-on: http://gerrit.cloudera.org:8080/10970
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>
2018-08-07 17:38:04 +00:00
Michael Ho
8d7f638654 IMPALA-7212: Removes --use_krpc flag and remove old DataStream services
This change removes the flag --use_krpc which allows users
to fall back to using Thrift based implementation of DataStream
services. This flag was originally added during development of
IMPALA-2567. It has served its purpose.

As we port more ImpalaInternalServices to use KRPC, it's becoming
increasingly burdensome to maintain parallel implementation of the
RPC handlers. Therefore, going forward, KRPC is always enabled.
This change removes the Thrift based implemenation of DataStreamServices
and also simplifies some of the tests which were skipped when KRPC
is disabled.

Testing done: core debug build.

Change-Id: Icfed200751508478a3d728a917448f2dabfc67c3
Reviewed-on: http://gerrit.cloudera.org:8080/10835
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-24 02:36:50 +00:00
Joe McDonnell
6887fc2190 IMPALA-7238: Use custom timeout for create unique database
test_kudu.TestCreateExternalTables() saw a timeout when
creating the unique database for its tests.

__unique_conn() opens a connection, creates a unique database,
then returns another connection in that database. It takes
a custom timeout argument, but the timeout is only for the
returned connection. The first connection to create the
unique database uses the default timeout of 45 seconds.

This patch changes the first connection to use the custom
timeout. For Kudu tests, this is 5 minutes rather than 45
seconds.

Change-Id: I4f2beb5bc027a4bb44e854bf1dd8919807a92ea0
Reviewed-on: http://gerrit.cloudera.org:8080/10862
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-17 16:45:34 +00:00
Vuk Ercegovac
4653637b9e IMPALA-6933: Avoids db name collisions for Kudu tests
Kudu tests generate temporary db names in a way so that its
unlikely, yet possible to collide. A recent test failure
indicates such a collision came up. The fix changes the
way that the name is generated so that it includes the
classes name for which the db name is generated. This db name
will make it easier to see which test created it and the name
will not collide with other names generated by other tests.

Testing:
- ran the updated test locally

Change-Id: I7c2f8a35fec90ae0dabe80237d83954668b47f6e
Reviewed-on: http://gerrit.cloudera.org:8080/10513
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-29 20:03:54 +00:00
Philip Zeyliger
2e6a63e31e IMPALA-6070: Further improvements to test-with-docker.
This commit tackles a few additions and improvements to
test-with-docker. In general, I'm adding workloads (e.g., exhaustive,
rat-check), tuning memory setting and parallelism, and trying to speed
things up.

Bug fixes:

* Embarassingly, I was still skipping thrift-server-test in the backend
  tests. This was a mistake in handling feedback from my last review.

* I made the timeline a little bit taller to clip less.

Adding workloads:

* I added the RAT licensing check.

* I added exhaustive runs. This led me to model the suites a little
  bit more in Python, with a class representing a suite with a
  bunch of data about the suite. It's not perfect and still
  coupled with the entrypoint.sh shell script, but it feels
  workable. As part of adding exhaustive tests, I had
  to re-work the timeout handling, since now different
  suites meaningfully have different timeouts.

Speed ups:

* To speed up test runs, I added a mechanism to split py.test suites into
  multiple shards with a py.test argument. This involved a little bit of work in
  conftest.py, and exposing $RUN_CUSTOM_CLUSTER_TESTS_ARGS in run-all-tests.sh.

  Furthermore, I moved a bit more logic about managing the
  list of suites into Python.

* Doing the full build with "-notests" and only building
  the backend tests in the relevant target that needs them. This speeds
  up "docker commit" significantly by removing about 20GB from the
  container.  I had to indicates that expr-codegen-test depends on
  expr-codegen-test-ir, which was missing.

* I sped up copying the Kudu data: previously I did
  both a move and a copy; now I'm doing a move followed by a move. One
  of the moves is cross-filesystem so is slow, but this does half the
  amount of copying.

Memory usage:

* I tweaked the memlimit_gb settings to have a higher default. I've been
  fighting empirically to have the tests run well on c4.8xlarge and
  m4.10xlarge.

The more memory a minicluster and test suite run uses, the fewer parallel
suites we can run. By observing the peak processes at the tail of a run (with a
new "memory_usage" function that uses a ps/sort/awk trick) and by observing
peak container total_rss, I found that we had several JVMs that
didn't have Xmx settings set. I added Xms/Xmx settings in a few
places:

 * The non-first Impalad does very little JVM work, so having
   an Xmx keeps it small, even in the parallel tests.
 * Datanodes do work, but they essentially were never garbage
   collecting, because JVM defaults let them use up to 1/4th
   the machine memory. (I observed this based on RSS at the
   end of the run; nothing fancier.) Adding Xms/Xmx settings
   helped.
 * Similarly, I piped the settings through to HBase.

A few daemons still run without resource limitations, but they don't
seem to be a problem.

Change-Id: I43fe124f00340afa21ad1eeb6432d6d50151ca7c
Reviewed-on: http://gerrit.cloudera.org:8080/10123
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-04-26 20:47:29 +00:00
Michael Ho
3b72a6c0da IMPALA-2567: Enable KRPC by default
This change enables the switch to use KRPC by default.
This change also fixes a bug in KrpcDataStreamMgr to
check if maintenance thread was started before calling
Join() on it. This shows up in BE tests as the maintenance
thread isn't started in them.

Testing done: exhaustive build.

Change-Id: Iae736c1c1351758969b4d84e34fc5b2d048660a0
Reviewed-on: http://gerrit.cloudera.org:8080/9461
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
2018-03-05 08:57:40 +00:00
Lars Volker
a8fc9f0fc7 IMPALA-6508: add KRPC test flag
This change adds a flag "--use_krpc" to start-impala-cluster.py. The
flag is currently passed as an argument to the impalad daemon. In the
future it will also enable KRPC for the catalogd and statestored
daemons.

This change also adds a flag "--test_krpc" to pytest. When running tests
using "impala-py.test --test_krpc", the test cluster will be started
by passing "--use_krpc" to start-impala-cluster.py (see above).

This change also adds a SkipIf to skip tests based on whether the
cluster was started with KRPC support or not.

- SkipIf.not_krpc can be used to mark a test that depends on KRPC.
- SkipIf.not_thrift can be used to mark a test that depends on Thrift
  RPC.

This change adds a meta test to make sure that the new SkipIf decorators
work correctly. The test should be removed as soon as real tests have
been added with the new decorators.

Change-Id: Ie01a5de2afac4a0f43d5fceff283f6108ad6a3ab
Reviewed-on: http://gerrit.cloudera.org:8080/9291
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins
2018-02-16 09:26:01 +00:00
David Knupp
894bb77855 IMPALA-4839: Remove implicit 'localhost' for KUDU_MASTER_HOSTS
The Kudu query tests were failing on a remote cluster because the Kudu
master was always set to '127.0.0.1', with no way to override it.

This patch corrects the issue with a number of changes:

- Add a pytest command line option to specify an arbitrary Kudu master

- Consolidate the place where the default Kudu master is derived. It
  had been stored both in the env and in tests/common/__init__.py,
  with different files looking to different places. For now, just look
  to the env, and remove the value from __init__.py.

- The kudu_client test fixture in conftest.py was using the connect()
  method from impala.dbapi (part of the Impyla library), without
  specifying the host param. In the absence of that, the default value
  is 'localhost', so add the host param to the connect() call.

- Define the various defaults for pytest config as constants at the top
  of conftest.py.

Change-Id: I9df71480a165f4ce21ae3edab6ce7227fbf76f77
Reviewed-on: http://gerrit.cloudera.org:8080/5877
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
2017-02-14 21:51:39 +00:00
David Knupp
f590bc0da6 IMPALA-4750: Rename test infra classes so they don't mimic test classes.
This patch addresses warning messages from pytest re: the imported
TestMatrix, TestVector, and TestDimension classes, which were being
collected as potential test classes. The fix was to simply prepend
the class names with Impala-

git grep -l 'TestDimension' | xargs \
    sed -i 's/TestDimension/ImpalaTestDimension/g'

git grep -l 'TestMatrix' | xargs \
    sed -i 's/TestMatrix/ImpalaTestMatrix/g'

git grep -l 'TestVector' | xargs \
    sed -i 's/TestVector/ImpalaTestVector/g'

The tests all passed in an exhaustive run on the upstream jenkins
server:

http://jenkins.impala.io:8080/view/Utility/job/pre-review-test/8/

Change-Id: I06b7bc6fd99fbb637a47ba376bf9830705c1fce1
Reviewed-on: http://gerrit.cloudera.org:8080/5794
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
2017-01-26 23:40:22 +00:00
David Knupp
6c5f8e3f5e IMPALA-4639: Add pytest option and xfail markers for tests that only run locally.
As we're beginning to run Impala end-to-end tests on remote clusters, we're
finding some tests that do not pass for infrastructure-related reasons (as
opposed to product issues.) It would be useful to be able to xfail any tests
that we know to be problematic within a given module, yet still run the
others. This way, we can get passing test runs as we're ironing out those
infrastructure issues.

Change-Id: Id4d6e46dc1e64ad20c727ccb19af7a9f3daf917f
Reviewed-on: http://gerrit.cloudera.org:8080/5446
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-12-15 02:45:50 +00:00
Thomas Tauber-Marshall
d15f86cb6f IMPALA-4454: test_kudu.TestShowCreateTable flaky
The cause of the flakiness is Kudu CREATE TABLE operations
that are sometimes taking a long time, leading to timeouts
in the hiveserver2 connection. This patch adds the ability
for tests using the 'conn' pytest fixture to specify a
timeout to connect(), and sets a timeout of 5 minutes for
this test.

Change-Id: I2727c27ff66140ac4043bcad332cd4e1d72b255f
Reviewed-on: http://gerrit.cloudera.org:8080/5040
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-11-11 20:04:01 +00:00
Michael Brown
ac516670b6 IMPALA-4352: test infra: store Impala/Kudu primary keys in object model
Test infrastructure, including the random query generator and the data
migrator, needs to know the primary keys of Impala/Kudu tables. This
test infrastructure keeps Python object models of the tables and
columns. This patch adds the ability to read from source Impala/Kudu
tables via SHOW CREATE TABLE and store primary keys as proper
attributes. The patch also adds tests that ensure the test
infrastructure is always able to read and store the primary keys. This
helps find breakages sooner rather than later. For example, if a
regression to "SHOW CREATE TABLE" or the test infrastructure makes us no
longer able to parse primary keys, GVO or other CI will find the
breakage faster than running the query generator.

I also fixed some flake8 issues in files I touched. There were several
files that had a lot of white space warnings, and I wanted to keep the
patch from getting too large.

Change-Id: Ib654b6cd0e8c2a172ffb7330497be4d4a751e6e5
Reviewed-on: http://gerrit.cloudera.org:8080/4873
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2016-11-05 19:27:17 +00:00
Dimitris Tsirogiannis
041fa6d946 IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
With this commit we simplify the syntax and handling of CREATE TABLE
statements for both managed and external Kudu tables.

Syntax example:
CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b))
DISTRIBUTE BY HASH (a) INTO 3 BUCKETS,
RANGE (b) SPLIT ROWS (('abc', 'def'))
STORED AS KUDU

Changes:
1) Remove the requirement to specify table properties such as key
   columns in tblproperties.
2) Read table schema (column definitions, primary keys, and distribution
   schemes) from Kudu instead of the HMS.
3) For external tables, the Kudu table is now required to exist at the
   time of creation in Impala.
4) Disallow table properties that could conflict with an existing
   table. Ex: key_columns cannot be specified.
5) Add KUDU as a file format.
6) Add a startup flag to impalad to specify the default Kudu master
   addresses. The flag is used as the default value for the table
   property kudu_master_addresses but it can still be overriden
   using TBLPROPERTIES.
7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE
   wasn't implemented for Kudu tables and silently ignored. The Kudu
   tables wouldn't be removed in Kudu.
8) Remove DDL delegates. There was only one functional delegate (for
   Kudu) the existence of the other delegate and the use of delegates in
   general has led to confusion. The Kudu delegate only exists to provide
   functionality missing from Hive.
9) Add PRIMARY KEY at the column and table level. This syntax is fairly
   standard. When used at the column level, only one column can be
   marked as a key. When used at the table level, multiple columns can
   be used as a key. Only Kudu tables are allowed to use PRIMARY KEY.
   The old "kudu.key_columns" table property is no longer accepted
   though it is still used internally. "PRIMARY" is now a keyword.
   The ident style declaration is used for "KEY" because it is also used
   for nested map types.
10) For managed tables, infer a Kudu table name if none was given.
   The table property "kudu.table_name" is optional for managed tables
   and is required for external tables. If for a managed table a Kudu
   table name is not provided, a table name will be generated based
   on the HMS database and table name.
11) Use Kudu master as the source of truth for table metadata instead
   of HMS when a table is loaded or refreshed. Table/column metadata
   are cached in the catalog and are stored in HMS in order to be
   able to use table and column statistics.

Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Reviewed-on: http://gerrit.cloudera.org:8080/4414
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-10-21 10:52:25 +00:00
Alex Behm
ab9e54bc42 IMPALA-3491: Use unique database fixture in test_ddl.py.
Adds new parametrization to the unique database fixture:
- num_dbs: allows creating multiple unique databases at once;
  the 2nd, 3rd, etc. datbase name is generated by appending
  "2", "3", etc., to the first database name
- sync_ddl: allows creating the dabatases(s) with sync_ddl
  which is needed by most tests in test_ddl.py

Testing: I ran debug/core and debug/exhaustive on HDFS and
core/debug on S3. Also ran the test locally in a loop on
exhaustive.

Change-Id: Idf667dd5e960768879c019e2037cf48ad4e4241b
Reviewed-on: http://gerrit.cloudera.org:8080/4155
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-09-02 02:47:02 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
   on the website.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files_txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Michael Brown
5112e65be2 Revert "Revert "Add Kudu test helpers""
This reverts commit f8dd5413b65d30646c3745dfc738ed812d50a51f and
effectively re-adds commit 9248dcb70478b8f93f022893776a0960f45fdc28. The
difference between this patch and its original is that I fixed the
changes introduced in infra/python/bootstrap_virtualenv.py to be
python2.4-compatible:

- removed the use of str.format(), preferring a str.join() pattern
- removed the call of the exit() builtin to prefer sys.exit()

The only testing I did for this patch was to ensure
CDH Impala-packaging-on-demand works.

Change-Id: I02ed97473868eacf45b25abe89b41e6fa2fce325
Reviewed-on: http://gerrit.cloudera.org:8080/3160
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
2016-05-24 16:40:59 -07:00