Fixes a potential null pointer dereference when log level >= 2.
Adds 'build' as a valid EE test helper directory as VSCode creates
this directory.
Tested locally by running test_scanners from the query_test EE test
suite using a release build of Impala and log level 2. Minidumps were
not generated during this test run but were generated during the same
test run without this fix applied.
Generated-by: Github Copilot (Claude Sonnet 3.7)
Change-Id: I91660aa84407c17ffb7cd3c721d4f3f0a844d61d
Reviewed-on: http://gerrit.cloudera.org:8080/23365
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for JS code analysis and linting to webUI
scripts using ESLint.
Support to enforce code style and quality is particularly beneficial,
as the codebase for client-side scripts is consistently growing.
This has been implemented to work alongside other code style enforcement
rules present within 'critique-gerrit-review.py', which runs on the
existing jenkins job 'gerrit-auto-critic', to produce gerrit comments.
In the case of webUI scripts, ESLint's code analysis and linting checks
are performed to produce these comments.
As a shared NodeJS installation can be used for JS tests as well as
linting, a separate common script "bin/nodejs/setup_nodejs.sh"
has been added to assist with the NodeJS installation.
To ensure quicker run times for the jenkins job, the NodeJS tarball is
cached within the "${HOME}/.cache" directory after the initial installation.
ESLint's packages and dependencies are cached locally through
NPM's own package management.
NodeJS and ESLint dependencies are retrieved and executed, only if
there are any changes within ".js" files within the patchset,
and run with minimal overhead.
After analysis, comments are generated for all the violations according
to the specified rules.
A custom formatter has been added to extract, format and filter the
violations in JSON form.
These generated code style violations are formatted into the required
JSON form according to gerrit's REST API, similar to comments generated
by flake8. These are then posted to gerrit as comments
on the respective patchset from jenkins over SSH.
The following code style and quality rules have been added using ESLint.
- Disallow unused variables
- Enforce strict equality (=== and !==)
- Require curly braces for all control statements (if, while, etc.)
- Enforce semicolons at the end of statements
- Enforce double quotes for strings
- Set maximum line length to 90
- Disallow `var`, use `let` or `const`
- Prefer `const` where possible
- Disallow multiple empty lines
- Enforce spacing around infix operators (eg. +, =)
- Disallow the use of undeclared variables
- Require parentheses around arrow function arguments
- Require a space before blocks
- Enforce consistent spacing inside braces
- Disallow shadowing variables declared in the outer scope
- Disallow constant conditions in if statements, loops, etc
- Disallow unnecessary parentheses in expressions
- Disallow duplicate arguments in function definitions
- Disallow duplicate keys in object literals
- Disallow unreachable code after return, throw, continue, etc
- Disallow reassigning function parameters
- Require functions to always consistently return or not return at all
- Enforce consistent use of dot notation wherever possible
- Enforce spacing around the colon in object literal properties
- Disallow optional chaining, where undefined values are not allowed
The required linting packages have been added as dependencies in the
"www/scripts" directory.
All the test scripts and related dependencies have been moved to -
$IMPALA_HOME/tests/webui/js_tests.
All the custom ESLint formatter scripts and related dependencies
have been moved to -
$IMPALA_HOME/tests/webui/linting.
A combination of NodeJS's 'prefix' argument and the NODE_PATH environment
variable is used to separate the dependencies from the webUI scripts.
The required base paths have been modified to support running the tests
from a remote directory (i.e. tests/webui).
The JS scripts need to be updated according to these linting rules,
as per IMPALA-13986.
Change-Id: Ieb3d0a9221738e2ac6fefd60087eaeee4366e33f
Reviewed-on: http://gerrit.cloudera.org:8080/21970
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
bin/run-all-tests.sh provides a convenient way to repeat running the
same test multiple times by setting NUM_TEST_ITERATIONS env var. This is
especially useful to prove that a test is not flaky. However, it will
still redundantly repeat run-workload.py and verifiers without any way
to skip them.
This patch adds env var SKIP_VERIFIERS to allow skipping verifiers. "Run
test run-workload" is rewritten into its own test_run_workload.py.
Testing:
- Run and pass test_run_workload.py.
- Manually run the script with SKIP_VERIFIERS set to true and confirm
that verifiers are skipped.
Change-Id: Ib483dcd48980655e4aa0c77f1cdc1f2a3c40a1de
Reviewed-on: http://gerrit.cloudera.org:8080/22365
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This introduces the IMPALA_USE_PYTHON3_TESTS environment variable
to select whether to run tests using the toolchain Python 3.
This is an experimental option, so it defaults to false,
continuing to run tests with Python 2.
This fixes a first batch of Python 2 vs 3 issues:
- Deciding whether to open a file in bytes mode or text mode
- Adapting to APIs that operate on bytes in Python 3 (e.g. codecs)
- Eliminating 'basestring' and 'unicode' locations in tests/ by using
the recommendations from future
( https://python-future.org/compatible_idioms.html#basestring and
https://python-future.org/compatible_idioms.html#unicode )
- Uses impala-python3 for bin/start-impala-cluster.py
All fixes leave the Python 2 path working normally.
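As a rough illustration of the first two items (not code from this patch),
the sketch below reads the same file once in bytes mode and once in text
mode. The file name and its contents are made up for the example.

```python
from __future__ import absolute_import, division, print_function
import codecs
import io
import os
import tempfile


def read_both_ways(path):
    """Return (raw_bytes, decoded_text) for the file at 'path'."""
    # Bytes mode returns 'bytes' on Python 3 (and 'str' on Python 2).
    with open(path, "rb") as f:
        raw = f.read()
    # io.open() with an explicit encoding returns text on both versions.
    with io.open(path, "r", encoding="utf-8") as f:
        text = f.read()
    return raw, text


# Hypothetical sample data, written in bytes mode.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt")
with open(sample_path, "wb") as f:
    # codecs.encode() operates on bytes in Python 3: text in, bytes out.
    f.write(codecs.encode(u"caf\u00e9\n", "utf-8"))

raw, text = read_both_ways(sample_path)
```

The bytes result has to be decoded explicitly before it can be compared
with the text result.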
Testing:
- Ran an exhaustive run with Python 2 to verify nothing broke
- Verified that the new environment variable works and that
it uses Python 3 from the toolchain when specified
Change-Id: I177d9b8eae9b99ba536ca5c598b07208c3887f8c
Reviewed-on: http://gerrit.cloudera.org:8080/21474
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
IMPALA-12442 removed duplicate labels for stress and execute_serially,
as they resulted in running the tests twice in different suites. Most
tests that had both labels expect to be run sequentially, which by
definition cannot be run in our stress test suite (which runs many
operations at once to stress the cluster).
Updates all tests previously marked with both 'stress' and
'execute_serially' to run serially. The only test continuing to use
'stress' mode is test_ddl_stress.py which was designed for it using a
separate 'test_index' parameter.
Change-Id: I1f7d2017ae1bab0f2f8cb0b100c2c6cc8b4f3dcd
Reviewed-on: http://gerrit.cloudera.org:8080/21905
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Avoids labeling stress tests with execute_serially so they're only run
once during run-all-tests. Previously stress tests would be run twice,
once for 'execute_serially' and again for 'stress'.
Documents the markers in pytest.ini.
Change-Id: I49bfd745881da992815292d16e1a311ab1884abf
Reviewed-on: http://gerrit.cloudera.org:8080/20395
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Python 3 made the main dictionary methods lazy (items(),
keys(), values()). This means that code that uses those
methods may need to wrap the call in list() to get a
list immediately. Python 3 also removed the old iter*
lazy variants.
This changes all locations to use Python 3 dictionary
methods and wraps calls with list() appropriately.
This also changes all iteritems(), itervalues(), iterkeys()
locations to items(), values(), keys(), etc. Python 2
will not use the lazy implementation of these, so there
is a theoretical performance impact. Our python code is
mostly for tests and the performance impact is minimal.
Python 2 will be deprecated when Python 3 is functional.
This addresses these pylint warnings:
dict-iter-method
dict-keys-not-iterating
dict-values-not-iterating
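A minimal sketch of the kind of change described above, using a made-up
dictionary rather than code from the patch. It shows one case where the
list() wrapper matters (mutating the dictionary while looping) and one
where the lazy view can be used directly.

```python
from __future__ import absolute_import, division, print_function


def drop_empty(counts):
    """Remove keys whose value is 0, mutating 'counts' in place."""
    # list() takes a snapshot of the keys. Without it, Python 3 raises
    # "RuntimeError: dictionary changed size during iteration".
    for key in list(counts.keys()):
        if counts[key] == 0:
            del counts[key]
    return counts


counts = drop_empty({"a": 1, "b": 0, "c": 2})
# items() replaces the Python 2 only iteritems(); sorted() accepts the view.
pairs = sorted(counts.items())
```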
Testing:
- Ran core tests
Change-Id: Ie873ece54a633a8a95ed4600b1df4be7542348da
Reviewed-on: http://gerrit.cloudera.org:8080/19590
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
1. Python 3 requires absolute imports within packages. This
can be emulated via "from __future__ import absolute_import"
2. Python 3 changed division to "true" division that doesn't
round to an integer. This can be emulated via
"from __future__ import division"
This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
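The difference between the two division operators can be sketched as
follows. The helper names are hypothetical and only serve the example.

```python
from __future__ import absolute_import, division, print_function


def midpoint_index(items):
    # An index must be an integer, so use floor division.
    return len(items) // 2


def average(values):
    # With "division" imported, '/' is true division on Python 2 as well.
    return sum(values) / len(values)


mid = midpoint_index([10, 20, 30, 40, 50])
avg = average([1, 2])
```

Without the __future__ import, Python 2 would evaluate the second
expression as integer division and return 1.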
Testing:
- Ran core tests
Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Python 3 now treats print as a function and requires
the parentheses in invocation.
print "Hello World!"
is now:
print("Hello World!")
This fixes all locations to use the function
invocation. This is more complicated when the output
is being redirected to a file or when avoiding the
usual newline.
print >> sys.stderr , "Hello World!"
is now:
print("Hello World!", file=sys.stderr)
To support this properly and guarantee equivalent behavior
between python 2 and python 3, all files that use print
now add this import:
from __future__ import print_function
This also fixes random flake8 issues that intersect with
the changes.
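The "avoiding the usual newline" case mentioned above is not shown in the
examples. A small sketch, written for Python 3 and using a made-up helper:

```python
from __future__ import print_function
import io
import sys


def report(message, stream):
    # end="" suppresses the trailing newline, replacing the Python 2
    # trailing-comma form: print message,
    print(message, end="", file=stream)


buf = io.StringIO()
report(u"Hello ", buf)
report(u"World!", buf)
# Redirecting to stderr uses the same keyword argument.
report(u"\n", sys.stderr)
```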
Testing:
- check-python-syntax.sh shows no errors related to print
Change-Id: Ib634958369ad777a41e72d80c8053b74384ac351
Reviewed-on: http://gerrit.cloudera.org:8080/19552
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
We launch a background process that checks whether tests have timed out in
run-all-tests.sh. When NUM_TEST_ITERATIONS is set to larger than 1,
run-all-tests.sh will repeat the tests. However, the timeout process is
killed at the end of each iteration, which fails the script when we want
to repeat tests. This patch moves the killing logic outside the loop.
This patch also adds a new variable, CLUSTER_TEST_FILES, to specify
a particular custom-cluster test to run.
To speed up the test iteration, this patch avoids always restarting the
Impala cluster. E.g. when we just need to run a particular EE test, we
only need to start the Impala cluster once.
Tested with NUM_TEST_ITERATIONS=10 and verified with following
scenarios.
1) custom-cluster test only
export BE_TEST, FE_TEST, JDBC_TEST, EE_TEST to false
export CLUSTER_TEST=true and CLUSTER_TEST_FILES to following values:
custom_cluster/test_local_catalog.py
custom_cluster/test_local_catalog.py::TestLocalCatalogRetries
custom_cluster/test_local_catalog.py::TestLocalCatalogRetries::test_replan_limit
"custom_cluster/test_local_catalog.py -k replan_limit"
2) e2e test only
export BE_TEST, FE_TEST, JDBC_TEST, CLUSTER_TEST to false
export EE_TEST=true and
EE_TEST_FILES=query_test/test_scanners.py::TestParquet::test_multiple_blocks_mt_dop
Change-Id: I2bdd8a9c68ffb0dd1c3ea72c3649b00abcc05a49
Reviewed-on: http://gerrit.cloudera.org:8080/18328
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This changes all existing Java code to be submodules under
a single root pom. The root pom is impala-parent/pom.xml
with minor changes to add submodules.
This avoids most of the weird CMake/maven interactions,
because there is now a single maven invocation for all
the Java code.
This moves all the Java projects other than fe into
a top level java directory. fe is left where it is
to avoid disruption (but still is compiled via the
java directory's root pom). Various pieces of code
that reference the old locations are updated.
Based on research, there are two options for dealing
with the shaded dependencies. The first is to have an
entirely separate Maven project with a separate Maven
invocation. In this case, the consumers of the shaded
jars will see the reduced set of transitive dependencies.
The second is to have the shaded dependencies as modules
with a single Maven invocation. The consumer would see
all of the original transitive dependencies and need to
exclude them all. See MSHADE-206/MNG-5899. This chooses
the second.
This only moves code around and does not focus on version
numbers or making "mvn versions:set" work.
Testing:
- Ran a core job
- Verified existing maven commands from fe/ directory still work
- Compared the *-classpath.txt files from fe and executor-deps
and verified they are the same except for paths
Change-Id: I08773f4f9d7cb269b0491080078d6e6f490d8d7a
Reviewed-on: http://gerrit.cloudera.org:8080/16500
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
ASAN maintains stacks for each allocation and free of memory. Impala
sometimes allocates/frees memory from codegen'd code, so this means
that the number of distinct stacks is unbounded. ASAN is storing
these stacks in a hash table with a fixed number of buckets (one million).
As the stacks accumulate, allocations and frees get slower and slower,
because the lookup in this hashtable gets slower. This causes test
execution time to degrade over time. Since backend tests and custom cluster
tests don't have long running daemons, only the end to end tests are
affected.
This adds support for breaking end-to-end test execution into shards,
restarting Impala between each shard. This uses the preexisting shard_tests
pytest functionality introduced for the docker-based tests in IMPALA-6070.
The number of shards is configurable via the EE_TEST_SHARDS environment
variable. By default, EE_TEST_SHARDS=1 and no sharding is used.
Without sharding, an ASAN core job takes about 16-17 hours. With 6 shards,
it takes about 9 hours. It is recommended to always use sharding with ASAN.
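To make the idea of sharding concrete, here is a hypothetical sketch of
splitting a list of tests into shards. It is not the partitioning used by
the existing shard_tests functionality, and the test names are invented.

```python
def select_shard(test_ids, shard_index, num_shards):
    """Return the tests for 1-based shard 'shard_index' out of 'num_shards'."""
    if not 1 <= shard_index <= num_shards:
        raise ValueError("shard_index must be in [1, num_shards]")
    # Sort first so every shard computes the same partition.
    ordered = sorted(test_ids)
    # Take every num_shards-th test, starting at this shard's offset.
    return ordered[shard_index - 1::num_shards]


tests = ["t_a", "t_b", "t_c", "t_d", "t_e"]
shards = [select_shard(tests, i, 2) for i in (1, 2)]
```

Every test lands in exactly one shard, so restarting the cluster between
shards does not change which tests are run.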
Testing:
- Ran core job
- Ran ASAN with EE_TEST_SHARDS=6
Change-Id: I0bdbd79940df2bc7b951efdf0f044e6b40a3fda9
Reviewed-on: http://gerrit.cloudera.org:8080/16155
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
If someone passes --skip-stress multiple times to tests/run-tests.py,
it currently only removes one of the occurrences from the arguments
and allows the other one to pass through to pytest. This causes pytest
to immediately error out. This behavior is seen on the docker-based
tests, because test-with-docker.py specifies --skip-stress and
bin/run-all-tests.sh adds another --skip-stress for core runs.
This changes tests/run-tests.py to handle multiple occurrences of
--skip-stress, --skip-parallel, and --skip-serial.
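One way to handle repeated flags, sketched under the assumption that the
flags are plain strings in the argument list. The function name is
hypothetical and this is not the code in tests/run-tests.py.

```python
SKIP_FLAGS = ("--skip-stress", "--skip-parallel", "--skip-serial")


def split_skip_flags(argv):
    """Return (skip_flags_seen, remaining_args)."""
    seen = set()
    remaining = []
    for arg in argv:
        if arg in SKIP_FLAGS:
            # Record the flag once, no matter how often it was passed.
            seen.add(arg)
        else:
            remaining.append(arg)
    return seen, remaining


seen, remaining = split_skip_flags(
    ["--skip-stress", "-x", "--skip-stress", "query_test/test_scanners.py"])
```

A single list.remove() call, by contrast, drops only the first match and
would leave the second --skip-stress in the list.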
Testing:
- Tested manually with duplicate skip flags.
Change-Id: I60dc9a898f69804e2a53c05b5dfab2f948a22097
Reviewed-on: http://gerrit.cloudera.org:8080/14629
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds impala-shell support to connect to HiveServer2 HTTP endpoint.
Relies on toolchain change at https://gerrit.cloudera.org/#/c/13725/.
Use --protocol='hs2-http' to enable this behavior.
Example usages:
---------------
impala-shell --protocol='hs2-http' (No auth)
impala-shell --protocol='hs2-http' --ldap -u..... (PLAIN auth)
impala-shell --protocol='hs2-http' --ssl --ca_cert... (TLS)
impala-shell --protocol='hs2-http' --ldap --ssl --ca_cert... (LDAP +
TLS)
Limitations:
-----------
- Does not support Kerberos (-k) due to lack of SPNEGO support.
Testing:
--------
- Parameterized existing shell tests to support this combination.
- Added shell test coverage for LDAP auth.
Change-Id: I8323950857dfe1c1dfd5377fde79f87bc2ce9534
Reviewed-on: http://gerrit.cloudera.org:8080/13746
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
This fixes all core e2e tests running on my local dockerised
minicluster build. I do not yet have a CI job or script running
but I wanted to get feedback on these changes sooner. The second
part of the change will include the CI script and any follow-on
fixes required for the exhaustive tests.
The following fixes were required:
* Detect docker_network from TEST_START_CLUSTER_ARGS
* get_webserver_port() does not depend on the caller passing in
the default webserver port. It failed previously because it
relied on start-impala-cluster.py setting -webserver_port
for *all* processes.
* Add SkipIf markers for tests that don't make sense or are
non-trivial to fix for containerised Impala.
* Support loading Impala-lzo plugin from host for tests that depend on
it.
* Fix some tests that had 'localhost' hardcoded - instead it should
be $INTERNAL_LISTEN_HOST, which defaults to localhost.
* Fix bug with sorting impala daemons by backend port, which is
the same for all dockerised impalads.
Testing:
I ran tests locally as follows after having set up a docker network and
starting other services:
./buildall.sh -noclean -notests -ninja
ninja -j $IMPALA_BUILD_THREADS docker_images
export TEST_START_CLUSTER_ARGS="--docker_network=impala-cluster"
export FE_TEST=false
export BE_TEST=false
export JDBC_TEST=false
export CLUSTER_TEST=false
./bin/run-all-tests.sh
Change-Id: Iee86cbd2c4631a014af1e8cef8e1cd523a812755
Reviewed-on: http://gerrit.cloudera.org:8080/12639
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Exposes a list of build flags via the impalad web UI. The build flags
can be viewed on the root page under the "Version" section. They can
be accessed via other tests through the debug version of the root page
(e.g. adding &json to the URL). The build flags are listed in a JSON
array so that they can be parsed easily. This should help run Impala
tests against a remote Impala cluster.
The build flags are read in CMakeLists.txt and then stored in
preprocessor variables.
Three build flags are exposed as part of this commit:
- Is_NDEBUG = [true, false]
- Whether NDEBUG was true or false at compile time
- CMake_Build_Type = [DEBUG, RELEASE, ADDRESS_SANITIZER, TIDY, UBSAN,
UBSAN_FULL, TSAN, CODE_COVERAGE_RELEASE, CODE_COVERAGE_DEBUG]
- The value of CMAKE_BUILD_TYPE at compile time
- Library_Link_Type = [DYNAMIC, STATIC]
- Derived from the compile time value of BUILD_SHARED_LIBS
There are a few other minor changes that are a part of this commit:
* The patch modifies environ.py so that it supports fetching build metadata
for both local and remote clusters.
* The tests under the tests/webserver directory were not being run because
'webserver' was not whitelisted in tests/run-tests.py. This patch fixes
that and addresses several test failures in run-tests.py.
* It reverts part of IMPALA-6947 so that there is no dependency from
start-impala-cluster.py to environ.py. The timeout discussed in IMPALA-6947
is now set at compile time.
Testing:
Added new tests to webserver/test_web_pages.py to ensure that the build
flags are being set. Some tests are only run when run against a local
cluster because we have no way of getting the build info from a remote
cluster, whereas local clusters contain a .cmake_build_type file.
Change-Id: I47e3ad4cbf844909bdaf22a6f9d7bd915dce3f19
Reviewed-on: http://gerrit.cloudera.org:8080/11410
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch reverses the revert of IMPALA-7660.
The problem with IMPALA-7660 was that urllib.urlopen added the
'context' parameter in 2.7.9, so it isn't present on rhel7, which uses
2.7.5.
The fix is to switch to using the 'requests' library, which supports
ssl connections on all the platforms Impala is supported on.
This patch also adds more info to the error message printed by
start-impala-cluster.py when the debug webserver cannot be reached yet
to help with debugging these issues in the future.
Testing:
- Ran full builds on rhel7, rhel6, and ubuntu16.
Change-Id: I679469ed7f27944f75004ec4b16d513e6ea6b544
Reviewed-on: http://gerrit.cloudera.org:8080/11625
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Following up to IMPALA-6857, it's useful for monitoring tools to see if
the pause monitor is getting triggered, and to see other GC metrics.
The Java side here, and the Thrift side, were easy enough.
However, the Impala metric implementation here caused us to call into
the frontend to read through the JMX memory beans 72 times, because each
call to GetValue() was getting all the data for the pool. This structure
made it hard to add additional, non-pool, metrics, and it felt wasteful.
To combat this, I added a cache of 10 seconds for getting the metrics
from the Frontend. The counters will typically re-use the same data.
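The caching idea can be sketched in a few lines. The actual change is in
C++; the Python below is only an illustration, and the class, the fetch
function, and the "gc_count" field are all hypothetical.

```python
import time


class TimedCache(object):
    """Caches the result of 'fetch' for 'ttl_seconds'."""

    def __init__(self, fetch, ttl_seconds=10.0, clock=time.time):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache miss or stale entry: do the expensive call once.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value


calls = []


def fetch_all_metrics():
    # Stand-in for the expensive call into the frontend.
    calls.append(1)
    return {"gc_count": len(calls)}


fake_now = [0.0]  # injected clock so the example does not need to sleep
cache = TimedCache(fetch_all_metrics, ttl_seconds=10.0,
                   clock=lambda: fake_now[0])
first = cache.get()["gc_count"]
second = cache.get()["gc_count"]  # served from the cache
fake_now[0] = 10.0
third = cache.get()["gc_count"]   # entry expired, fetched again
```

Reads that fall within the same 10 second window share one fetch, which is
why many counters can be served without repeated calls into the frontend.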
There are five metrics here, and to avoid yet another enum class, I used
C++ lambdas to capture which field of the Thrift object I care about. If
folks like the approach, I think it can simplify away the enums for the
pool metrics as well.
I measured the cost of calling into the metrics code by
looping the metrics-gathering 100 times and looking at CPU
time for the process using this script:
START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }')
for i in $(seq 100); do
curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null
done
END_CPU=$( cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }')
echo $START_CPU $END_CPU $(($END_CPU - $START_CPU))
On a release build on my development machine, gathering metrics 100
times took 0.16 cpu seconds without this change and 0.07 cpu seconds
with this change. The measurement accuracy here is 0.01 (I spot-checked
this with using the cpuacct cgroup infrastructure which gives you nanos,
but it was more painful to script), but this convinces me that this is a
net improvement.
Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Reviewed-on: http://gerrit.cloudera.org:8080/11468
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Many of the test modules included calls to 'logging.basicConfig' at
global scope in their implementation. This meant that by just importing
one of these files, other tests would inherit their logging format. This
is typically a bad idea in Python -- modules should not have side
effects like this on import.
The format was additionally inconsistent. In some cases we had a "--"
prepended to the format, and in others we didn't. The "--" is very
useful since it lets developers copy-paste query-test output back into
the shell to reproduce an issue.
This patch fixes the above by centralizing the logging configuration in
a pytest hook that runs prior to all pytests. A few other non-pytest
related tools now configure logging in their "main" code which is only
triggered when the module is executed directly.
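A sketch of the two patterns described above, assuming the standard
pytest_configure hook in a conftest.py. The format string is an assumption
apart from the leading "--", and the hook used by the patch may differ.

```python
import logging

# Assumed format; the leading "--" mirrors the prefix described above.
LOG_FORMAT = "-- %(asctime)s %(levelname)-8s %(threadName)s: %(message)s"


def pytest_configure(config):
    """pytest hook: runs once, before test collection starts."""
    logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)


def main():
    # Standalone tools configure logging only when run directly, so
    # importing this module has no side effect on the root logger.
    logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
    logging.getLogger(__name__).info("select 1")


if __name__ == "__main__":
    main()
```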
I tested that, with this change, logs still show up properly in the .xml
output files from 'run-tests.py' as well as when running tests manually
from impala-py.test
Change-Id: I55ef0214b43f87da2d71804913ba4caa964f789f
Reviewed-on: http://gerrit.cloudera.org:8080/11225
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit tackles a few additions and improvements to
test-with-docker. In general, I'm adding workloads (e.g., exhaustive,
rat-check), tuning memory setting and parallelism, and trying to speed
things up.
Bug fixes:
* Embarrassingly, I was still skipping thrift-server-test in the backend
tests. This was a mistake in handling feedback from my last review.
* I made the timeline a little bit taller to clip less.
Adding workloads:
* I added the RAT licensing check.
* I added exhaustive runs. This led me to model the suites a little
bit more in Python, with a class representing a suite with a
bunch of data about the suite. It's not perfect and still
coupled with the entrypoint.sh shell script, but it feels
workable. As part of adding exhaustive tests, I had
to re-work the timeout handling, since now different
suites meaningfully have different timeouts.
Speed ups:
* To speed up test runs, I added a mechanism to split py.test suites into
multiple shards with a py.test argument. This involved a little bit of work in
conftest.py, and exposing $RUN_CUSTOM_CLUSTER_TESTS_ARGS in run-all-tests.sh.
Furthermore, I moved a bit more logic about managing the
list of suites into Python.
* Doing the full build with "-notests" and only building
the backend tests in the relevant target that needs them. This speeds
up "docker commit" significantly by removing about 20GB from the
container. I had to indicate that expr-codegen-test depends on
expr-codegen-test-ir, which was missing.
* I sped up copying the Kudu data: previously I did
both a move and a copy; now I'm doing a move followed by a move. One
of the moves is cross-filesystem so is slow, but this does half the
amount of copying.
Memory usage:
* I tweaked the memlimit_gb settings to have a higher default. I've been
fighting empirically to have the tests run well on c4.8xlarge and
m4.10xlarge.
The more memory a minicluster and test suite run uses, the fewer parallel
suites we can run. By observing the peak processes at the tail of a run (with a
new "memory_usage" function that uses a ps/sort/awk trick) and by observing
peak container total_rss, I found that we had several JVMs that
didn't have Xmx settings set. I added Xms/Xmx settings in a few
places:
* The non-first Impalad does very little JVM work, so having
an Xmx keeps it small, even in the parallel tests.
* Datanodes do work, but they essentially were never garbage
collecting, because JVM defaults let them use up to 1/4th
the machine memory. (I observed this based on RSS at the
end of the run; nothing fancier.) Adding Xms/Xmx settings
helped.
* Similarly, I piped the settings through to HBase.
A few daemons still run without resource limitations, but they don't
seem to be a problem.
Change-Id: I43fe124f00340afa21ad1eeb6432d6d50151ca7c
Reviewed-on: http://gerrit.cloudera.org:8080/10123
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Allows running the tests that make up the "core" suite in about 2 hours.
By comparison, https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/buildTimeTrend
tends to run in about 3.5 hours.
This commit:
* Adds "echo" statements in a few places, to facilitate timing.
* Adds --skip-parallel/--skip-serial flags to run-tests.py,
and exposes them in run-all-tests.sh.
* Marks TestRuntimeFilters as a serial test. This test runs
queries that need > 1GB of memory, and, combined with
other tests running in parallel, can kill the parallel test
suite.
* Adds "test-with-docker.py", which runs a full build, data load,
and executes tests inside of Docker containers, generating
a timeline at the end. In short, one container is used
to do the build and data load, and then this container is
re-used to run various tests in parallel. All logs are
left on the host system.
Besides the obvious win of getting test results more quickly, this
commit serves as an example of how to get various bits of Impala
development working inside of Docker containers. For example, Kudu
relies on atomic rename of directories, which isn't available in most
Docker filesystems, and entrypoint.sh works around it.
In addition, the timeline generated by the build suggests where further
optimizations can be made. Most obviously, dataload eats up a precious
~30-50 minutes, on a largely idle machine.
This work is significantly CPU and memory hungry. It was developed on a
32-core, 120GB RAM Google Compute Engine machine. I've worked out
parallelism configurations such that it runs nicely on 60GB of RAM
(c4.8xlarge) and over 100GB (eg., m4.10xlarge, which has 160GB). There is
some simple logic to guess at some knobs, and there are knobs. By and
large, EC2 and GCE price machines linearly, so, if CPU usage can be kept
up, it's not wasteful to run on bigger machines.
Change-Id: I82052ef31979564968effef13a3c6af0d5c62767
Reviewed-on: http://gerrit.cloudera.org:8080/9085
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
A set-literal snuck into run-tests.py in a recent
change. We wish to avoid these to be able to run on
py2.6.
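For reference, the compatible spelling looks like this. The directory
names are only placeholders for the example.

```python
# Set literals such as {"a", "b"} are a SyntaxError on Python 2.6.
# Building the set from a list works on 2.6, 2.7 and 3.x alike.
valid_dirs = set(["query_test", "metadata", "webserver"])
is_valid = "webserver" in valid_dirs
```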
Change-Id: I81928d1880a493b91abb13b3a8149568c9789f66
Reviewed-on: http://gerrit.cloudera.org:8080/9843
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Philip Zeyliger <philip@cloudera.com>
exit_code for EE tests when no tests are collected. After this
change return_code will be either 0 if no tests are expected
to be collected (dry-run) or 1 if tests are expected to be
collected but are not collected due to some error.
Testing:
- Ran end-to-end shell, hs2 tests for IMPALA-5886 with debug
statements to verify the exit_codes
- Ran end-to-end shell tests with collect-only for IMPALA-4812
Change-Id: If82f974cc2d1e917464d4053563eaf4afc559150
Reviewed-on: http://gerrit.cloudera.org:8080/9494
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins
IMPALA-6715:
This commit
IMPALA-6551: Change Kudu TPCDS and TPCH columns to DECIMAL
added additional decimal_v2 queries to the stress test that amount to
running the same query twice. This makes the binary search run
incredibly slow.
- Fix the query selection. Add additional queries that weren't matching
before, like the tpcds-q[0-9]+a.test series.
- Add a test that will at least ensure if
testdata/workloads/tpc*/queries is modified, the stress test will
still find the same number of queries for the given workload. There's
no obvious place to put this test: it's not testing the product at
all, so:
- Add a new directory tests/infra for such tests and add it to
tests/run-tests.py.
- Move the test from IMPALA-6441 into tests/infra.
Testing:
- Core private build passed. I manually looked to make sure the moved
and new tests ran.
- Short stress test run. I checked the runtime info and saw the new
TPCDS queries in the JSON.
- While testing on hardware clusters down stream, I noticed...
IMPALA-6736:
TPC-DS Q67A is 10x more expensive to run without spilling than any
other query. I fixed the --filter-query-mem-ratio option to work. This
will still run Q67A during the binary search phase, but if a cluster
is too small, the query will be skipped.
Change-Id: I3e26b64d38aa8d63a176daf95c4ac5dee89508da
Reviewed-on: http://gerrit.cloudera.org:8080/9758
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins
The current version of pytest in the Impala python environment is
quite old (2.7.2) and there have been bug fixes in later versions
that we could benefit from.
Also, since the passing of params to pytest.main() as a string will
be deprecated in upcoming versions of pytest, edit run-tests.py to
instead pass params as a list. (This also means we don't need to
worry about esoteric bash limitations re: single quotes in strings.)
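A minimal sketch of the list form, assuming a hypothetical helper name and
placeholder values. Each list element reaches pytest verbatim, so a single
quote inside a value needs no shell escaping.

```python
def build_pytest_args(test_dirs, keyword_expr=None, extra_args=()):
    """Build the argument list for pytest.main() (hypothetical helper)."""
    args = list(extra_args)
    if keyword_expr:
        # "-k" and its value are separate list elements; no quoting needed.
        args.extend(["-k", keyword_expr])
    args.extend(test_dirs)
    return args


# Placeholder values; the quote in the expression survives untouched.
args = build_pytest_args(["query_test"], keyword_expr="it's_a_placeholder",
                         extra_args=["-x"])
# pytest.main(args) would then be called with this list instead of a string.
```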
While working on this file, the filtering of commandline args when
running the verifier tests was made a little more robust.
Tested by doing a standard (non-exhaustive) test run on centos 6.4
and ubuntu 14.04, plus an exhaustive test run on RHEL7.
Change-Id: I40d129e0e63ca5bee126bac6ac923abb3c7e0a67
Reviewed-on: http://gerrit.cloudera.org:8080/5640
Tested-by: Impala Public Jenkins
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
From the Python docs:
"Changed in version 2.7: The positional argument specifiers can be
omitted, so '{} {}' is equivalent to '{0} {1}'."
http://gerrit.cloudera.org:8080/5401 used the newer form,
"{}".format(). This change uses the older backwards-compatible
form.
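For illustration, the two forms side by side (the values are placeholders):

```python
name, count = "tpch", 22

# Implicit positions: accepted only by Python 2.7 and later.
implicit = "{} has {} queries".format(name, count)

# Explicit positions: also accepted by Python 2.6.
explicit = "{0} has {1} queries".format(name, count)

assert implicit == explicit
```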
Change-Id: If78b9b4061ca191932ac5b0b14e0ee8951a9d4e8
Reviewed-on: http://gerrit.cloudera.org:8080/5641
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
These stress tests were sometimes causing the end-to-end tests to hang
indefinitely, including in the pre-merge testing (sometimes called
"GVO" or "GVM").
This patch also prints to stdout some connection metrics that may
prove useful for debugging stress test hangs in the future. The
metrics are printed before and after stress tests are run when
run-tests.py is used.
Change-Id: Ibd30abf8215415e0f2830b725e43b005daa2bb2d
Reviewed-on: http://gerrit.cloudera.org:8080/5401
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
In the shell, double-quoted strings are not very close to "raw"
strings; double quotes end the string, but parameter expansion is also
performed for strings like "${FOO}". To pass strings from Python to the
shell, I have replaced double quotes with single quotes and escaped
the single quote characters in the strings.
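A sketch of the quoting technique, with a hypothetical function name; it is
not necessarily the exact code in this patch. The standard library offers
pipes.quote (Python 2) and shlex.quote (Python 3.3+) for the same purpose.

```python
def shell_single_quote(value):
    """Wrap value in single quotes so a POSIX shell treats it literally."""
    # Nothing is expanded inside single quotes, but a single quote cannot
    # appear there either: close the string, emit an escaped quote, reopen.
    return "'" + value.replace("'", "'\\''") + "'"


quoted = shell_single_quote("${FOO} isn't expanded")
# quoted is: '${FOO} isn'\''t expanded'
```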
While I am here, add better logging in TestExecutor.run_tests to make
errors like this easier to diagnose.
Change-Id: I006eb559ec5f5b5b0379997fab945116dfc7e8d7
Reviewed-on: http://gerrit.cloudera.org:8080/5242
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
run-tests.py is a wrapper around impala-py.test. It abstracts away
the need to invoke separate runs for serial tests, parallel tests,
and metric verification tests.
Because it's possible for a user to specify certain test suites,
or even specific tests, on the command line when calling
run-tests.py, it had been necessary to override the command line
args when it came time to run the metric verification tests --
otherwise those other tests/suites would be rerun. Before this
patch, we had simply been stripping away all command line args.
However, that blanket approach causes problems when running tests
against a remote cluster, because we need to retain those command
line args that pertain to the remote cluster.
This patch selectively prunes unwanted command line args for the
last metric verification test stage, keeping the ones that we
need, and also adds extensive documentation for explaining why we
have to go through this fairly odd and elaborate step.
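The pruning idea can be sketched as below. The function name, the set of
dropped options, and the "--remote-option" flag are hypothetical; the real
lists live in run-tests.py. The sketch also assumes that options to keep
are written as --name=value, so a bare positional argument is always a test
selection.

```python
# Hypothetical: selection options that take a separate value.
OPTIONS_WITH_VALUE_TO_DROP = {"-k", "-m"}


def prune_args_for_verifiers(args):
    """Drop test selections but keep the remaining options."""
    kept = []
    skip_next = False
    for arg in args:
        if skip_next:
            skip_next = False  # this was the value of a dropped option
            continue
        if arg in OPTIONS_WITH_VALUE_TO_DROP:
            skip_next = True
            continue
        if not arg.startswith("-"):
            continue  # positional arg: a test directory, file, or test id
        kept.append(arg)
    return kept


pruned = prune_args_for_verifiers(
    ["query_test/test_scanners.py", "-k", "parquet",
     "--remote-option=host:1234", "-x"])
```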
This patch was tested by running a sample test suite locally,
and against a remote cluster. Previously, the metric verification
stage had been failing for remote cluster tests (since they were
defaulting to localhost for services that were only available
remotely.) With the patch, the remote verification tests were
passing.
Also, while I'm here, add a small change that exits immediately
if the user calls for --help. Before this, we actually still ran
the tests.
Change-Id: I069172f44c1307d55f85779cdb01fecc0ba1799e
Reviewed-on: http://gerrit.cloudera.org:8080/5135
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
on the website.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
I had missed this in my original logs consolidation patch.
This change is needed for Jenkins to pick up the EE test results
for reporting purposes.
Change-Id: I58e6a4a6392223de87ea2ce50a36dd35cafa5b86
Reviewed-on: http://gerrit.cloudera.org:8080/2667
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The problem: By default, all file descriptors opened by a process,
including sockets, are inherited by any forked child processes. This
includes the connection socket created at the beginning of each test
in ImpalaTestSuite.setup_class(). In
TestHiveMetaStoreFailure.test_hms_service_dies(), the Hive Metastore
is stopped and restarted, meaning the metastore is now a child process
of the test process. This causes the client connection not to be
closed when the parent process (the test) exits, meaning that one of a
finite number of connections (64) to Impala is left permanently in
use.
This would be barely noticeable except run-tests.py runs the mini
stress test with 4 * <num CPUs> concurrent clients by default. On our
build machines, this is 64 clients, which is also the default max
number of connections for an impalad. When a test process tries to
make the 65th connection (since the leaked connection is still there),
it blocks until a connection is freed up. Due to a quirk of the xdist
py.test plugin that I don't fully understand, the test framework will
not clean up test classes (and close the connections) until a number
of tests complete, causing the test process to deadlock.
The solution: use the close_fds argument to make sure the TCP socket
is closed in the spawned child process. This is also done in
CustomClusterTestSuite._start_impala_cluster() when it starts the new
cluster.
This patch also switches test_hms_failure.py to use check_call()
instead of call(), and explicitly caps the number of stress clients at
64.
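A minimal sketch of both changes, with hypothetical helper names. Note that
close_fds defaults to False on Python 2, which is why it has to be passed
explicitly there; Python 3.2+ defaults to True on POSIX.

```python
import multiprocessing
import subprocess
import sys

MAX_STRESS_CLIENTS = 64  # the cap stated above


def default_stress_clients():
    """4 * <num CPUs>, but never more than the cap."""
    return min(4 * multiprocessing.cpu_count(), MAX_STRESS_CLIENTS)


def run_child(cmd):
    """Run cmd without letting it inherit the parent's open sockets."""
    # close_fds=True closes every descriptor except stdin/stdout/stderr in
    # the child. check_call raises CalledProcessError on a non-zero exit,
    # whereas call() would only return the status.
    return subprocess.check_call(cmd, close_fds=True)


# Placeholder child process: a Python interpreter that exits immediately.
status = run_child([sys.executable, "-c", "pass"])
```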
Change-Id: I03feae922883a0624df1422ffb6ba5f1d83fb869
Reviewed-on: http://gerrit.cloudera.org:8080/1853
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
Python tests and infra scripts will now use "python" from the virtualenv
via $IMPALA_HOME/bin/impala-python. Some scripts could be simplified now
that python 2.6 and a dependable set of third-party libraries are
available but that is not done as part of this commit.
Change-Id: If1cf96898d6350e78ea107b9026b12ba63a4162f
Reviewed-on: http://gerrit.cloudera.org:8080/603
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
This patch adds protocol testing for the statestore by adding a simple
Python client which can easily be manipulated to exercise the case under
test. All the tests may be run in parallel, and the statestore does not
need to be restarted between runs. The longest test takes around 40s to
complete.
Change-Id: I72ff9488f9b2ce040c65c328244964c207590d47
Reviewed-on: http://gerrit.cloudera.org:8080/263
Tested-by: Internal Jenkins
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Re-enables data error tests which were not being included in
run-tests.py. Broken tests were updated, with one exception which
is tracked by IMPALA-1862. Depends on a related change to
Impala-lzo.
Change-Id: I4c42498bdebf9155a8722695a3305b63ecc6e5f3
Reviewed-on: http://gerrit.cloudera.org:8080/194
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This patch does a few things:
1) Move the metadata tests into their own folder under tests/. I think it's useful to
loosely categorize them so it's easier to run a subset of the tests that are most
useful for the changes you are making.
2) Reduce the test vectors for query_tests. We should have identical coverage in
the daily exhaustive runs but the normal runs should be much better. In particular,
deemphasizing scanner tests since that code is more stable now.
3) Misc test cleanup/consolidate python test files/etc.
Change-Id: I03c2f34877aed192c2a50665bd5e15fa85e12f1e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3831
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
Adds a new client API for retrieving all user defined functions (aggregate and scalar)
in a database. This is a requirement from CM Backup Disaster and Recovery.
Change-Id: I4e33d714795fe808370262f36218ea112f67ec30
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1271
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
The problem is that we were running the "verifiers" with the command
line parameters, which means the custom .py file was passed to them as well
causing the duplicate test runs.
Change-Id: I36f87e9b71ad49a05246af8006d4096c04541c27
Reviewed-on: http://gerrit.ent.cloudera.com:8080/981
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
This patch fixes a slightly pathological state that occurs when the
statestore is under heavy load. The result of the bug is that
subscribers cannot successfully re-register because the statestore never
marks them as failed.
The exact sequence of events is as follows:
1. Subscriber registers with state-store.
2. Statestore does not send heartbeats in timely fashion to
subscriber. Subscriber times-out.
3. Subscriber is restarted quickly. Statestore does not detect
restart.
4. Subscriber's RegisterSubscriber() call fails, because statestore
detects duplicate registration.
5. Subscriber restarts again. Since state-store is slow to send
heartbeats, the state-store has not detected the restart and the
subscriber receives a heartbeat message from the statestore and
does not reject it.
6. Statestore continues to believe subscriber is alive, since the
heartbeats are not being rejected.
To fix this, we add a registration ID to each successfully registered
subscriber that is known to both subscriber and statestore. If the
subscriber should restart and re-register, it receives a new
registration ID. Whenever a heartbeat arrives, it compares its
registration ID to that sent by the statestore with the heartbeat, and
rejects the heartbeat if they do not match.
We also allow re-registration of existing subscribers (getting rid of
the dreaded "Duplicate subscription" message). A new registration
overwrites an old one.
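The registration ID scheme can be illustrated with a small Python model.
The class and method names are hypothetical and the transport is omitted,
so this is only a sketch of the protocol, not the implementation.

```python
import uuid


class Statestore(object):
    def __init__(self):
        self.registrations = {}  # subscriber id -> current registration id

    def register(self, subscriber_id):
        # A new registration overwrites the old one instead of being
        # rejected as a duplicate.
        registration_id = uuid.uuid4().hex
        self.registrations[subscriber_id] = registration_id
        return registration_id


class Subscriber(object):
    def __init__(self, subscriber_id, statestore):
        self.subscriber_id = subscriber_id
        self.registration_id = statestore.register(subscriber_id)

    def accept_heartbeat(self, registration_id):
        # Reject heartbeats that carry an ID from an earlier registration.
        return registration_id == self.registration_id


statestore = Statestore()
first = Subscriber("subscriber-1", statestore)
stale_id = first.registration_id

# The subscriber restarts and re-registers under the same subscriber id.
restarted = Subscriber("subscriber-1", statestore)
current_id = statestore.registrations["subscriber-1"]
```

A heartbeat still carrying stale_id is rejected by the restarted subscriber,
while one carrying current_id is accepted.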
Change-Id: Ie32df3a586ccb375375ebfbcbec1aaeb930b6bfe
Reviewed-on: http://gerrit.ent.cloudera.com:8080/778
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
These HiveServer2 client tests (in tests/hs2) are intended to check the
HS2 API implementation in the following ways:
* API tests: Can all the supported API endpoints be called successfully?
* Query lifecycle tests: does calling the API in an unusual sequence
result in reasonable behaviour?
* Malformed query tests: does sending an incorrectly constructed request
correctly result in an error?
This patch adds a few simple tests as a starting point for a larger test suite.
Change-Id: I4b926d1639c640317ea3478bdeb0aa4b5a9286ee
Reviewed-on: http://gerrit.ent.cloudera.com:8080/320
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
This change adds support for auxiliary workloads, tests, and datasets. This is useful
to augment the regular test runs with some additional tests that do not belong in the
main Impala repo.