This is required for the ASF migration, since we don't want to include
all of the tarballs in the repo and we want to allow developers to build
using dependencies obtained from the standard upstream sources.
Also remove a workaround for an old issue with building an impyla
development version package.
Change-Id: Ie9216596db0f37d706ea7f77c129cecd5b070429
Reviewed-on: http://gerrit.cloudera.org:8080/3217
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Impala compiled with the address sanitizer, or compiled with code
coverage, runs through code paths much slower. This can cause end-to-end
tests that pass on a non-ASAN or non-code coverage build to fail. Some
examples include IMPALA-2721, IMPALA-2973, and IMPALA-3501. These
classes of failures nearly always involve some time-sensitive condition
that is not met under such "slow builds".
The workaround in the past has been to simply increase the timeout.
The problem with this approach is that it relaxes conditions for tests
on builds that see the field--i.e., release builds--for the sake of
builds that never will--i.e., ASAN and code coverage.
This patch fixes that problem by allowing test authors to set timeout
values based on a *specific* build type. The author may choose timeouts
with a default value, and different timeouts for either or both
so-called "slow builds": ASAN and code coverage.
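A minimal sketch of the timeout selection described above. The name
specific_build_type_timeout comes from this patch, but the signature,
the parameter names, and the build-type strings below are assumptions
made for illustration and may not match the real helper.

```python
def specific_build_type_timeout(build_type, default_timeout,
                                asan_timeout=None,
                                code_coverage_timeout=None):
    """Pick a timeout for the given build type.

    Hypothetical signature: build_type is assumed to be one of "debug",
    "release", "asan", or "code_coverage". A slow-build timeout applies
    only when the author supplied one; otherwise the default is used.
    """
    overrides = {
        "asan": asan_timeout,
        "code_coverage": code_coverage_timeout,
    }
    override = overrides.get(build_type)
    if override is None:
        return default_timeout
    return override


if __name__ == "__main__":
    # Release builds keep the strict default; only coverage is relaxed.
    print(specific_build_type_timeout("release", 10, code_coverage_timeout=60))
    print(specific_build_type_timeout("code_coverage", 10, code_coverage_timeout=60))
```

With this shape, raising a timeout for code coverage leaves the
condition unchanged for release and debug builds.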
We detect the so-called "specific build type" by inspecting the binary
expected to be at the path under test. This removes the need to make
alterations to Impala itself. The inspection done is to read the DWARF
information in the binary, specifically the first compile unit's
DW_AT_producer and DW_AT_name DIE attributes. We employ a heuristic
based on these attributes' values to guess the build type. If we can't
determine the build type, we will assume it's a debug build. More
information on this is in IMPALA-3501.
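The inspection can be sketched roughly as follows. The pyelftools calls
are the library's regular API, but the classification rules are
placeholders invented for illustration (the actual heuristic is
described in IMPALA-3501), and the function names and build-type
strings are hypothetical. Only the fallback to a debug build comes from
the description above.

```python
def read_first_cu_attributes(binary_path):
    """Return (DW_AT_name, DW_AT_producer) of the first compile unit.

    Returns None if the binary carries no DWARF information.
    """
    # Imported lazily so guess_build_type() works without pyelftools.
    from elftools.elf.elffile import ELFFile

    with open(binary_path, "rb") as binary:
        elf = ELFFile(binary)
        if not elf.has_dwarf_info():
            return None
        for cu in elf.get_dwarf_info().iter_CUs():
            attrs = cu.get_top_DIE().attributes
            values = []
            for key in ("DW_AT_name", "DW_AT_producer"):
                value = attrs[key].value if key in attrs else b""
                if isinstance(value, bytes):
                    value = value.decode("utf-8", "replace")
                values.append(value)
            return tuple(values)  # only the first compile unit is needed
    return None


def guess_build_type(attributes):
    """Guess the build type from (name, producer); default to "debug"."""
    if attributes is None:
        return "debug"
    name, producer = attributes
    # Placeholder rules, NOT the real heuristic: assume the compiler
    # flags are recorded in DW_AT_producer and that the build directory
    # shows up in DW_AT_name.
    if "-fsanitize=address" in producer:
        return "asan"
    if "--coverage" in producer:
        return "code_coverage"
    if "release" in name:
        return "release"
    return "debug"
```

Keeping the classifier separate from the ELF reader means the guessing
logic can be exercised without a real binary.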
A quick summary of the changes follows:
1. Move some of the logic in tests.common.skip to tests.common.environ
and rework some skip marks to be more precise.
2. Add Pyelftools for convenient deserialization of DWARF
3. Our Pyelftools usage requires collections.OrderedDict, which isn't in
python2.6; also add Monkeypatch to handle this.
4. Add ImpalaBuild and specific_build_type_timeout, the core of the new
functionality
5. Fix the statestore tests that only fail under code coverage (the
basis for IMPALA-3501)
Testing:
The tests that previously failed reliably under code coverage now
pass. I also ran perfunctory tests of debug, release, and ASAN builds to
ensure our detection of build type is working. This patch will *not*
turn the code coverage builds green; there are other tests that fail,
and fixing all of them here is out of the scope of this patch.
Change-Id: I2b675c04c54e36d404fd9e5a6cf085fb8d6d0e47
Reviewed-on: http://gerrit.cloudera.org:8080/3156
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
This reverts commit f8dd5413b65d30646c3745dfc738ed812d50a51f and
effectively re-adds commit 9248dcb70478b8f93f022893776a0960f45fdc28. The
difference between this patch and its original is that I fixed the
changes introduced in infra/python/bootstrap_virtualenv.py to be
python2.4-compatible:
- removed the use of str.format(), preferring a str.join() pattern
- removed the call of the exit() builtin to prefer sys.exit()
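A small illustration of the two compatibility patterns listed above.
str.format() only exists from Python 2.6 on, and exit() is a
convenience injected by the site module rather than part of the core
language. The function names and the message text are hypothetical and
are not taken from bootstrap_virtualenv.py.

```python
import sys


def usage_message(program, argument):
    # "".join() instead of "Usage: {0} <{1}>".format(...), which needs 2.6+.
    return "".join(["Usage: ", program, " <", argument, ">"])


def main(argv):
    if len(argv) < 2:
        sys.stderr.write(usage_message(argv[0], "venv_dir") + "\n")
        # sys.exit() instead of the exit() builtin.
        sys.exit(1)
    return argv[1]
```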
The only testing I did for this patch was to ensure
CDH Impala-packaging-on-demand works.
Change-Id: I02ed97473868eacf45b25abe89b41e6fa2fce325
Reviewed-on: http://gerrit.cloudera.org:8080/3160
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
Changes:
1) Add the python Kudu module to the virtualenv. Building the virtualenv
is much slower now because Cython and numpy are required. To help with
the rebuild time, --no-cache was removed. That option was added to help
when using the dev version of impyla: the version number would stay the
same while the module contents changed, so the cache served the old
module contents.
2) Add some py.test fixtures to help create Kudu and Impala connections.
Change-Id: I8e5e22b38d5bd09a36238e66a69aa42d1a941de7
Reviewed-on: http://gerrit.cloudera.org:8080/2855
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Previously Impala disallowed LOAD DATA and INSERT on S3. This patch
functionally enables LOAD DATA and INSERT on S3 without making major
changes for the sake of improving performance over S3. This patch also
enables both INSERT and LOAD DATA between file systems.
S3 does not support the rename operation, so the staged files in S3
are copied instead of renamed, which contributes to the slow
performance on S3.
The FinalizeSuccessfulInsert() function now does not make any
underlying assumptions of the filesystem it is on and works across
all supported filesystems. This is done by adding a full URI field to
the base directory for a partition in the TInsertPartitionStatus.
Also, the HdfsOp class now does not assume a single filesystem and
gets connections to the filesystems based on the URI of the file it
is operating on.
Added a python S3 client called 'boto3' to access S3 from the python
tests. A new class called S3Client is introduced, which wraps the boto3
functions. It has the same function signatures as PyWebHdfsClient
because it derives from an abstract base class, BaseFileSystem, so the
two can be used interchangeably through a 'generic_client'.
test_load.py is refactored to use this generic
client. The ImpalaTestSuite setup creates a client according to the
TARGET_FILESYSTEM environment variable and assigns it to the
'generic_client'.
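The arrangement can be sketched as below. BaseFileSystem and
TARGET_FILESYSTEM are named in this patch; everything else is an
assumption for illustration: the method names, the in-memory stand-in
classes (used here so the sketch runs without boto3 or a WebHDFS
server), the make_generic_client helper, and the "hdfs" default.

```python
import os
from abc import ABC, abstractmethod


class BaseFileSystem(ABC):
    """Common interface so tests do not care which filesystem is behind it."""

    @abstractmethod
    def create_file(self, path, data):
        pass

    @abstractmethod
    def read_file(self, path):
        pass

    @abstractmethod
    def delete_file(self, path):
        pass


class _InMemoryClient(BaseFileSystem):
    """Hypothetical stand-in that stores file contents in a dict."""

    def __init__(self):
        self._files = {}

    def create_file(self, path, data):
        self._files[path] = data

    def read_file(self, path):
        return self._files[path]

    def delete_file(self, path):
        # Report whether anything was actually removed.
        return self._files.pop(path, None) is not None


class FakeHdfsClient(_InMemoryClient):
    """Stands in for the WebHDFS-backed client."""


class FakeS3Client(_InMemoryClient):
    """Stands in for the boto3-backed S3Client."""


def make_generic_client(target_filesystem=None):
    """Choose a client from TARGET_FILESYSTEM (default assumed to be hdfs)."""
    if target_filesystem is None:
        target_filesystem = os.environ.get("TARGET_FILESYSTEM", "hdfs")
    if target_filesystem == "s3":
        return FakeS3Client()
    return FakeHdfsClient()
```

A test written against generic_client then only calls the shared
methods and runs unchanged against either filesystem.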
P.S.: Currently, test_load.py runs 4x slower on S3 than on
HDFS. Performance needs to be improved in future patches. INSERT
performance is slower than on HDFS too. This is mainly because of an
extra copy that happens between staging and the final location of a
file. However, larger INSERTs come closer to HDFS performance than
smaller INSERTs.
ACLs are not taken care of for S3 in this patch. It is something
that still needs to be discussed before implementing.
Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d
Reviewed-on: http://gerrit.cloudera.org:8080/2574
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
Some recent commits broke the query generator leopard framework; for
example, QueryResultComparator now requires a different number of
arguments.
Additional changes:
- Added better support for running the query generator in nested types
mode
- Added tracking of the number of queries that returned data
- Made it easier to control behavior from a central place by adding
flags to controller.py
Change-Id: I8f47c52097ccd53df4233b88eea887ce5fab1955
Reviewed-on: http://gerrit.cloudera.org:8080/1968
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
I do not plan on enabling this any time soon, but I do
want to privately run the tests in a random order to
flush out flakiness.
Change-Id: Ib14bde2f35c33566e5cdba8a28a789e089f43be6
Reviewed-on: http://gerrit.cloudera.org:8080/1931
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The major changes are:
1) Collect backtrace and fatal log on crash.
2) Poll memory usage. The data is only displayed at this time.
3) Support kerberos.
4) Add random queries.
5) Generate random and TPC-H nested data on a remote cluster. The
random data generator was converted to use MR for scaling.
6) Add a cluster abstraction to run data loading for #5 on a
remote or local cluster. This also moves and consolidates some
Cloudera Manager utilities that were in the stress test.
7) Clean up the wrappers around impyla. That stuff was getting
messy.
Change-Id: I4e4b72dbee1c867626a0b22291dd6462819e35d7
Reviewed-on: http://gerrit.cloudera.org:8080/1298
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Building readline on Linux requires that a dev package of ncurses is
installed, but it's typically not installed by default. It turns out that
pip's requirements file format has a way to mark modules as OS
dependent; we just need to use it.
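pip expresses this with an environment marker after a semicolon, for
example "readline; sys_platform == 'darwin'". The sketch below mimics
how such a line is filtered per platform. It is a toy parser that
handles only that single marker form, not pip's implementation, and the
requirement lines shown are illustrative rather than the actual
contents of the requirements file.

```python
import sys

# Illustrative requirements text; readline is limited to OS X.
REQUIREMENTS = """\
cm-api
readline; sys_platform == 'darwin'
"""


def applicable(requirements, platform=sys.platform):
    """Return the package names whose marker, if any, matches platform.

    Toy parser: understands only `sys_platform == '<value>'`.
    """
    selected = []
    for line in requirements.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, marker = line.partition(";")
        marker = marker.strip()
        if marker:
            key, _, value = marker.partition("==")
            if key.strip() != "sys_platform":
                raise ValueError("unsupported marker: " + marker)
            if value.strip().strip("'\"") != platform:
                continue  # marker excludes this platform
        selected.append(name.strip())
    return selected
```

On Linux the readline line is skipped, so no ncurses dev package is
needed there.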
Change-Id: Iacb5289a8406cfb975dd98867450228f4df275eb
Reviewed-on: http://gerrit.cloudera.org:8080/1640
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
The cm_api plugin has a readline dependency that is not correctly
resolved on OS X. For that reason we want to make this dependency part
of the Impala code, even though it might not be used on Linux.
Change-Id: I8c386e7b07f8f7e39d03260e96d23aaf4b00180a
Reviewed-on: http://gerrit.cloudera.org:8080/1613
Tested-by: Internal Jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Readability: Martin Grund <mgrund@cloudera.com>
- Parsing of describe statements with nested types
- Random query generation that involves nested types
- Query flattening (converts a query for a dataset with nested types
to an equivalent query for a flattened dataset)
Change-Id: If013d104fb90864dcf0934ef92157b95e917e7e8
Reviewed-on: http://gerrit.cloudera.org:8080/1375
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
Impyla now has better support for Hive with commit c83a11. This is
useful for testing.
Change-Id: I2814cbff6f1fcfce51a0271b68daababca33dc65
Reviewed-on: http://gerrit.cloudera.org:8080/744
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Previously thrift_sasl was brought into the virtualenv by building
the shell. That meant the shell had to be built before the
virtualenv could be used. By including thrift_sasl directly, the
virtualenv can be used even if impala/shell is not built.
Change-Id: Id1a099036b1ac8add5a314af981789ebf69ce465
Reviewed-on: http://gerrit.cloudera.org:8080/685
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This adds a bootstrap script and an "impala-python" command to
$IMPALA_HOME/bin that automatically runs the bootstrap and redirects to
the virtualenv python. Existing python scripts will later be updated to
use this new "impala-python" command.
The bootstrap script will build a virtualenv to ensure a minimum python
version (2.6) and a well known set of dependencies. The bootstrap script
can be run with python 2.4 but 2.6 must already be installed on the
system. The resulting virtualenv will use 2.6 at a minimum.
Only dependencies explicitly listed in requirements.txt will be
installed and available (no system packages will ever be used). No
packages will ever be downloaded when setting up the virtualenv. In the
future new dependencies can be added by editing the requirements.txt
file. Installation through requirements.txt is a standard pip feature.
When requirements.txt is updated, the next run of "impala-python" will
rebuild the virtualenv.
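One way to detect that requirements.txt changed is to store a digest of
the file next to the virtualenv and compare it on every run. This is a
sketch under that assumption, not necessarily how the bootstrap script
does it; the function names and the stamp file are hypothetical.

```python
import hashlib
import os


def _digest(path):
    """Return the SHA-1 hex digest of the file contents."""
    with open(path, "rb") as handle:
        return hashlib.sha1(handle.read()).hexdigest()


def needs_rebuild(requirements_path, stamp_path):
    """True if no build was recorded or the requirements changed since."""
    if not os.path.exists(stamp_path):
        return True
    with open(stamp_path) as handle:
        return handle.read().strip() != _digest(requirements_path)


def record_build(requirements_path, stamp_path):
    """Remember which requirements the virtualenv was built from."""
    with open(stamp_path, "w") as handle:
        handle.write(_digest(requirements_path))
```

A wrapper such as "impala-python" could call needs_rebuild() first,
rebuild when it returns True, and then call record_build().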
Change-Id: I150595d7e09a45d5f2e3c30a845bc8d6a761eeed
Reviewed-on: http://gerrit.cloudera.org:8080/424
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins