115 Commits

Author SHA1 Message Date
Thomas Tauber-Marshall
7eb64a9be7 IMPALA-7576: Add a timeout for all E2E tests
We've been seeing a lot of hangs in tests lately. This can waste
test resources by keeping machines busy, cause loss of coverage when
subsequent tests don't run, and can be difficult to diagnose if it's
not clear which test hung.

This patch introduces a timeout of 2 hours for normal builds and 4
hours for slow builds for all tests run under pytest by using the
pytest-timeout plugin.
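
As a rough illustration of the two-tier timeout, the sketch below builds the
pytest-timeout command-line flag from a build type. The helper name and the
set of build types treated as slow are assumptions made for illustration;
only the 2-hour and 4-hour values and the use of pytest-timeout come from the
commit message.

```python
# Assumption: which build types count as "slow" is illustrative only.
SLOW_BUILD_TYPES = frozenset(["ASAN"])


def pytest_timeout_args(build_type):
    """Return the pytest-timeout flag for a build type (hypothetical helper)."""
    hours = 4 if build_type in SLOW_BUILD_TYPES else 2
    # pytest-timeout's --timeout option takes a limit in seconds.
    return ["--timeout=%d" % (hours * 3600)]
```

The returned list could be appended to the arguments handed to pytest.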

The timeouts were chosen to be generous to avoid false positives.
In recent runs I examined, the longest running test is
test_decimal_fuzz, which took 63 minutes in a DEBUG build and
162 minutes in an ASAN build.

Testing:
- Ran locally with a reduced timeout and confirmed the test is
  timed out when expected.

Change-Id: I301dd27a9767bfaef2756282014ef457a31956bd
Reviewed-on: http://gerrit.cloudera.org:8080/11447
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-17 21:49:01 +00:00
Thomas Tauber-Marshall
85f3bb0178 IMPALA-7499: build against CDH Kudu
This patch transitions from pulling in Kudu (libkudu_client.so and the
minicluster tarballs) from the toolchain to instead pull Kudu in with
the other CDH components.

For OSes where the CDH binaries are not provided but the toolchain
binaries are (only Ubuntu 14), we set USE_CDH_KUDU to false to
continue to download the toolchain binaries. We also continue
to use the toolchain binaries to build the client stub for OSes
where KUDU_IS_SUPPORTED is false.

This patch also fixes an issue in bootstrap_toolchain.py where we were
using the wrong g++ to compile the Kudu stub.

Testing:
- Verified building and running Impala works as expected for supported
  combinations of KUDU_IS_SUPPORTED/USE_CDH_KUDU

Change-Id: If6e1048438b6d09a1b38c58371d6212bb6dcc06c
Reviewed-on: http://gerrit.cloudera.org:8080/11363
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-11 01:01:06 +00:00
Michael Brown
c6ce735d1b update Flask to latest (1.0.2)
Similar to Fabric and Paramiko, make Flask part of extended test
requirements and upgrade it to its latest.

Testing:
- Impala builds, can do a full data load.
- Change works in my local environment.

Change-Id: Ibfc01d562e4d7fe48443d15074bd4c7d0176d2a0
Reviewed-on: http://gerrit.cloudera.org:8080/11335
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-27 21:01:51 +00:00
Michael Brown
063d2c9d55 IMPALA-7460 part 2: upgrade Paramiko and Fabric in extended test env
Upgrade Paramiko to the latest, 2.4.1. Paramiko drastically changed its
dependencies in Paramiko 2, dropping defunct Pycrypto and using Cryptography
instead.

https://github.com/paramiko/paramiko/issues/637
https://github.com/paramiko/paramiko/pull/394

This change implicitly removes the dependency on Pycrypto.

Also upgrade Fabric to the latest 1.x version, 1.14.0.

Testing:
- This works in my development environment.
- This works in my downstream stress and query gen environments.
- This works when doing a full data load.
- Impala still builds on a variety of OSs.

Change-Id: I0636d8113be449953420e1d5773f63d7c91943e3
Reviewed-on: http://gerrit.cloudera.org:8080/11308
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-24 01:12:16 +00:00
David Knupp
6e5ec22b12 IMPALA-7399: Emit a junit xml report when trapping errors
This patch will cause a junitxml file to be emitted in the case of
errors in build scripts. Instead of simply echoing a message to the
console, we set up a trap function that also writes out to a
junit xml report that can be consumed by jenkins.impala.io.

Main things to pay attention to:

- New file that gets sourced by all bash scripts when trapping
  within bash scripts:

  https://gerrit.cloudera.org/c/11257/1/bin/report_build_error.sh

- Installation of the python lib into impala-python venv for use
  from within python files:

  https://gerrit.cloudera.org/c/11257/1/bin/impala-python-common.sh

- Change to the generate_junitxml.py file itself, for ease of
  https://gerrit.cloudera.org/c/11257/1/lib/python/impala_py_lib/jenkins/generate_junitxml.py

Most of the other changes are to source the new report_build_error.sh
script to set up the trap function.

Change-Id: Idd62045bb43357abc2b89a78afff499149d3c3fc
Reviewed-on: http://gerrit.cloudera.org:8080/11257
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-23 18:33:58 +00:00
Michael Brown
971cf179f6 IMPALA-7460 part 1: require user to install Paramiko and Fabric
- Remove Fabric and Paramiko as requirements. They aren't needed by
  anything in buildall.sh.
- Add a means to install into the impala-python virtual environment by hand.
  impala-pip is fine for this.
- Add another requirements file for extended testing. The dependency
  situation is messy and untangling that out of impala-python and into
  lib/python should be out of the scope of IMPALA-7460.
- Update core tests, which cover real regressions that have happened in
  the past, to run against locations that don't require a Paramiko
  import. This moves some logic out of concurrent_select.py into a
  thinner module.
- Insulate ssh_util from globally-scoped import so that it only imports
  when needed.
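
The last point, deferring an import until it is needed, might look roughly
like the sketch below. The helper name and error text are hypothetical and
this is not the actual ssh_util code; it only shows the general pattern of
keeping an optional dependency out of module scope.

```python
import importlib


def load_optional_module(module_name="paramiko"):
    """Import an optional dependency on first use (hypothetical helper).

    Nothing is imported when the enclosing module itself is loaded, so code
    paths that never call this function do not need the package installed.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        raise RuntimeError(
            "%s is required here; install the extended test requirements"
            % module_name)
```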

Testing:
- This works in my development environment.
- This works in my downstream stress and query gen environments.
- This works when doing a full data load.
- Impala still builds on a variety of OSs.

Todo:
- A subsequent review will update the versions.

Change-Id: Ibf9010a0387b52c95b7bda5d1d4606eba1008b65
Reviewed-on: http://gerrit.cloudera.org:8080/11264
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-23 00:20:15 +00:00
Joe McDonnell
7f3f63424f IMPALA-7199: Add scripts to create code coverage reports
gcovr is a python library that uses gcov to generate
code coverage reports. This adds gcovr to the python
dependencies and adds bin/impala-gcovr to provide
easy access to gcovr's command line. gcovr 3.4
supports python 2.6+.

This also adds bin/coverage_helper.sh to provide a
simplified interface to generate reports and zero
coverage counters.

Code coverage data is written out when a program
exits, so it is important to avoid hard kills
to shut down the impalads when generating coverage.
This modifies testdata/bin/kill-all.sh to call
start-impala-cluster.py --kill when shutting down
the minicluster to try to avoid doing a hard kill.
It will still do a hard kill if impala is still
running after the softer kill.
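
The soft-then-hard shutdown order can be sketched in Python as follows. This
is an illustration of the idea only, not what kill-all.sh or
start-impala-cluster.py actually do; the function name and the grace period
are assumptions.

```python
import subprocess


def stop_process(proc, grace_seconds=10):
    """Ask a process to exit and force-kill it only if it does not.

    Returns "soft" if the process exited after the polite request and
    "hard" if it had to be killed.
    """
    proc.terminate()  # SIGTERM on POSIX: the process may catch it and exit cleanly
    try:
        proc.wait(timeout=grace_seconds)
        return "soft"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: no chance to flush coverage data
        proc.wait()
        return "hard"
```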

Change-Id: I5b2e0b794c64f9343ec976de7a3f235e54d2badd
Reviewed-on: http://gerrit.cloudera.org:8080/10791
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-17 16:45:44 +00:00
Tianyi Wang
0b6be850ca IMPALA-5690: Part 2: Upgrade thrift to 0.9.3-p4
Dependency changes:
- BE and python use thrift 0.9.3-p4 from native-toolchain.
- FE uses thrift 0.9.3 from apache maven repo.
- Fb303 and http components dependencies are no longer needed in FE and
  are removed.
- The minimum openssl version requirement is increased to 1.0.1.

Configuration change:
- Thrift codegen option movable_type is enabled. New code no longer
  needs to use std::swap to avoid copying.

Cherry-picks: not for 2.x

Change-Id: I639227721502eaa10398d9490ff6ac63aa71b3a6
Reviewed-on: http://gerrit.cloudera.org:8080/9300
Reviewed-by: Tianyi Wang <twang@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-01 22:18:54 +00:00
Philip Zeyliger
202807e2ff Speed up Python dependencies.
This parallelizes downloading some Python libraries, giving a speedup of
$IMPALA_HOME/infra/python/deps/download_requirements.  I've seen this
take from 7-15 seconds before and from 2-5 seconds after.
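
A minimal sketch of parallelizing downloads with a thread pool is shown
below. The function names and the pool size are assumptions, and the download
callable is injected, so this is not the actual download_requirements code.

```python
from multiprocessing.pool import ThreadPool


def download_all(requirements, download_one, num_threads=4):
    """Run download_one over each requirement concurrently.

    Results come back in the same order as the input list.
    """
    pool = ThreadPool(num_threads)
    try:
        return pool.map(download_one, requirements)
    finally:
        pool.close()
        pool.join()
```

Threads suit this case because the work is dominated by network waits rather
than CPU time.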

I also checked that we always have at least Python 2.6 when
building Impala, so I was able to remove the try/except
handling in bootstrap_toolchain.

Change-Id: I7cbf622adb7d037f1a53c519402dcd8ae3c0fe30
Reviewed-on: http://gerrit.cloudera.org:8080/10234
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-01 22:12:39 +00:00
Michael Brown
2fb73f94b4 add impala-flake8
Add flake8 and dependencies to impala-python. The versions are
compatible with Python 2.6.6. Add the impala-flake8 entry point, similar
to impala-python.

Add setup.cfg which defines flake8 special rules and exemptions. They
are added to support 2-space indents and a max line length of 90.

Contributors writing Python can use impala-flake8 to look for formatting
mistakes. The two most common uses would be:

impala-flake8 myfile.py
or
git diff HEAD^ myfile.py | impala-flake8 --diff

In the second usage, flake8 will only examine lines changed. This allows
a contributor to fix their own code and not be penalized by flake8
violations that may already be present (though they are encouraged to
fix them if they can!)

Change-Id: Ib4ce9eca6f8b55eaec1c96e7db1ff630ac016be0
Reviewed-on: http://gerrit.cloudera.org:8080/10182
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-04-25 20:52:42 +00:00
Joe McDonnell
834b3b93a1 IMPALA-6790: Upgrade sqlparse to 0.1.19
Some remote cluster tests have failed to load data
due to sqlparse failing to split SQL statements
appropriately. The SQL file itself is identical
to our usual dataload, so it must be a unique
environment. The current version of sqlparse is
0.1.15.

This upgrades sqlparse to 0.1.19. When running on
the same environment with the newer version,
the problem does not occur. Note that this
is not the version used for the Impala shell.
Impala shell has sqlparse checked-in under
shell/ext-py.

Change-Id: Ic5289f86b78f1d77d91a8fa47d63b7a7eaa3af38
Reviewed-on: http://gerrit.cloudera.org:8080/10044
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-04-18 00:38:27 +00:00
Philip Zeyliger
eaf66172df IMPALA-6863: Make pip_download.py honor redirects.
As part of our continuing woes with PyPi infrastructure, we've now seen
redirects. Following redirects seems like the right thing to do, so I've
changed the downloader code to follow them.

I checked that this is available in Python 2.6.

The build failure signature looks like:

   Downloading AllPairs-2.0.1.tar.gz from cb85d029b3/AllPairs-2.0.1.tar.gz
   ('http error', 302, 'Found', <httplib.HTTPMessage instance at 0x7fbf7819b050>)
   Download failed after several attempts.
   Warning: Unable to download Python requirements.
   Warning: bootstrap_virtualenv or other Python-based tooling may fail.

Change-Id: Ic7551cec43a2d378df7e3cc7d521ace338b56ba2
Reviewed-on: http://gerrit.cloudera.org:8080/10083
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Philip Zeyliger <philip@cloudera.com>
Reviewed-by: Lars Volker <lv@cloudera.com>
2018-04-17 19:49:55 +00:00
Lars Volker
2194dfd0ff IMPALA-6731: Move execnet Python dependency to stage 2
It seems that execnet also cannot be installed together with
setuptools-scm if only a local mirror and index are available
(similar to https://github.com/pywebhdfs/pywebhdfs/issues/52).

Testing: Observed that execnet failed to install during
bootstrap_toolchain.py on a CentOS 6.4 EC2 instance at 5:02pm (within the
brownout period). With this change, bootstrap_toolchain.py succeeded.

Change-Id: Ic949edcc03f0e068bdd84b6ede487e64dcf2439b
Reviewed-on: http://gerrit.cloudera.org:8080/9850
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins
2018-03-29 04:38:31 +00:00
Tim Armstrong
bc3e4fb376 IMPALA-6752: fix pip --no-binary usage
The key facts here are:
* --no-cache-dir is crucial because it prevents us pulling in a
  cached package compiled with the wrong compiler.
* --no-binary takes an argument specifying the set of packages it
  should apply to.

The latent bug was that we didn't provide an argument to --no-binary,
so it instead took --no-index as the argument, which was a no-op
because there are no packages of that name. IMPALA-6731 moved the
arguments, and --no-cache-dir became the argument to --no-binary
instead.
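
The failure mode can be reproduced with a toy parser. The sketch below uses
Python's optparse as a stand-in and is not pip's real option handling; the
function name is hypothetical. The ":all:" value in the last call is pip's
documented way of applying --no-binary to every package, but that the patch
used exactly this value is an assumption.

```python
from optparse import OptionParser


def parse_pip_like(argv):
    """Parse a pip-like command line with a toy parser (not pip itself)."""
    parser = OptionParser()
    # --no-binary takes one value; the two flags below take none.
    parser.add_option("--no-binary", action="append", dest="no_binary")
    parser.add_option("--no-index", action="store_true",
                      dest="no_index", default=False)
    parser.add_option("--no-cache-dir", action="store_true",
                      dest="no_cache_dir", default=False)
    options, _ = parser.parse_args(argv)
    return options


# Latent bug: --no-index is swallowed as the value of --no-binary.
latent = parse_pip_like(["--no-binary", "--no-index", "--no-cache-dir"])
# After the arguments moved: --no-cache-dir is swallowed instead.
moved = parse_pip_like(["--no-index", "--no-binary", "--no-cache-dir"])
# With an explicit value, both flags are seen as flags again.
fixed = parse_pip_like(["--no-binary", ":all:", "--no-index", "--no-cache-dir"])
```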

Testing:
I could reliably reproduce the failure in my environment by deleting
infra/python/env then running a test with impala-py.test. This patch
is sufficient to solve it.

Change-Id: I118738347ca537b2dddfa6142c3eb5608c49c2e0
Reviewed-on: http://gerrit.cloudera.org:8080/9829
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-03-28 04:14:40 +00:00
Lars Volker
37565e3812 IMPALA-6731: Use private index in bootstrap_virtualenv
This change switches to using a private pypi index url when using a
private pypi mirror. This allows the tests to run without relying on the
public Python pypi mirrors.

Some packages can not detect their dependencies correctly when they get
installed together with the dependencies in the same call to pip. This
change adds a second stage of package installation to separate these
packages from their dependencies.

It also adds a few missing packages and updates some packages to newer
versions.

Testing: Ran this on a box where I blocked DNS resolution to Python's
upstream pypi.

Change-Id: I85f75f1f1a305f3043e0910ab88a880eeb30f00b
Reviewed-on: http://gerrit.cloudera.org:8080/9798
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
2018-03-26 22:45:03 +00:00
Lars Volker
4fbe4cb208 IMPALA-6697: Downgrade setuptools to be compatible with Python 2.6
Change-Id: I0d4727b7a5911269b82287ed9ce759f1e211f386
Reviewed-on: http://gerrit.cloudera.org:8080/9713
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
2018-03-18 23:31:17 +00:00
Lars Volker
b1ef7de0e7 IMPALA-6695: Fix PyPi regex, update setuptools version
pytest-runner, which is required by kudu-python, requires a more recent
version of setuptools. Adding an explicit dependency required an update
to the regular expression to parse PyPi URLs.

Change-Id: Ia67189f81a31a9a5a0ed80cd4d6661762ef427b2
Reviewed-on: http://gerrit.cloudera.org:8080/9711
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
2018-03-18 16:39:32 +00:00
Tianyi Wang
af6769d95a IMPALA-6690: Fix pip_download.py on python 2.6
IMPALA-6682 used set literal syntax in pip_download.py, which was
introduced in python 2.7. This patch changes it to the set constructor.

It's tested on python 2.6.9.
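
For illustration, the two spellings differ as follows. The variable name and
the contents of the set are made up for this example and are not taken from
pip_download.py.

```python
# Set literal: valid on Python 2.7 and later, a SyntaxError on Python 2.6.
#   supported = {"md5", "sha256"}

# Set constructor: valid on Python 2.6 and later.
supported = set(["md5", "sha256"])
```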

Change-Id: I82b4116ee056f605c8aadf39a8b92b78313cb8bf
Reviewed-on: http://gerrit.cloudera.org:8080/9694
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
2018-03-17 01:32:22 +00:00
Tianyi Wang
8dde41e802 IMPALA-6682: Remove MD5 assumption from pypi download script
pip_download.py assumes the python repository to use md5 as the hash
algorithm, which is not required by PEP-503 and not always true in
reality. This patch removes this assumption and enables support of all
hash algorithms in python hashlib.
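
A minimal sketch of algorithm-agnostic verification is shown below, assuming
the download URL carries a PEP 503 style "<hashname>=<hexdigest>" fragment.
The function name is hypothetical and this is not the pip_download.py code.
Note that hashlib.algorithms_available needs a newer Python than the 2.6 that
the build supported at the time.

```python
import hashlib


def verify_download(data, url_fragment):
    """Check data against a '<hashname>=<hexdigest>' URL fragment."""
    algo, sep, expected = url_fragment.partition("=")
    if not sep or algo not in hashlib.algorithms_available:
        raise ValueError("unsupported hash fragment: %r" % url_fragment)
    # hashlib.new() accepts the algorithm by name, so no algorithm is
    # hard-coded here.
    return hashlib.new(algo, data).hexdigest() == expected.lower()
```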

Testing: buildall.sh works with 2 repos. One uses md5 and another uses
sha-256.

Change-Id: Ie78f851490cbab10daa654aece36dab6e6c4329b
Reviewed-on: http://gerrit.cloudera.org:8080/9683
Reviewed-by: Tianyi Wang <twang@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins
2018-03-16 03:39:33 +00:00
Tim Armstrong
dc1282fbc9 IMPALA-6241: timeout in admission control test under ASAN
The fix for IMPALA-6241 is to increase the timeout for all slow builds.

While testing that fix, I discovered that the ASAN build detection logic
was failing silently, resulting in it assuming that it was testing a
DEBUG build. The error was:

  Unexpected DW_AT_name in first CU:
  /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-ubuntu-16-04/toolchain/source/llvm/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_preinit.cc;
  choosing DEBUG

The fix for that issue is to remove the build type detection heuristic
and instead just write a file with the build type as part of the build process.

Testing:
Before this change I was able to reproduce locally every 5-10 test
iterations. After this change I haven't seen it reproduce.

Change-Id: Ia4ed949cac99b9925f72e19e4adaa2ead370b536
Reviewed-on: http://gerrit.cloudera.org:8080/8652
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-11-29 03:28:22 +00:00
Sailesh Mukil
4e5497995b IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies
Builds on CentOS 6.4 fail due to dependencies not met for the new
'cryptography' python package.

The ADLS commit states that the new packages are only required for ADLS
and that ADLS on a dev environment is only supported from CentOS 6.7.

This patch moves the compiled requirements for ADLS from
compiled-requirements.txt to adls-requirements.txt and passes a
compiler to the Pip environment while installing the ADLS
requirements.

Testing: Tested it on a machine with TARGET_FILESYSTEM='adls'
and also tested it on a CentOS 6.4 machine with the default
configuration.

Change-Id: I7d456a861a85edfcad55236aa8b0dbac2ff6fc78
Reviewed-on: http://gerrit.cloudera.org:8080/6998
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-26 07:52:40 +00:00
Sailesh Mukil
50bd015f2d IMPALA-5333: Add support for Impala to work with ADLS
This patch leverages the AdlFileSystem in Hadoop to allow
Impala to talk to the Azure Data Lake Store. This patch has
functional changes as well as adds test infrastructure for
testing Impala over ADLS.

We do not support ACLs on ADLS since the Hadoop ADLS
connector does not integrate ADLS ACLs with Hadoop users/groups.

For testing, we use the azure-data-lake-store-python client
from Microsoft. This client seems to have some consistency
issues. For example, a drop table through Impala will delete
the files in ADLS, however, listing that directory through
the python client immediately after the drop, will still show
the files. This behavior is unexpected since ADLS claims to be
strongly consistent. Some tests have been skipped due to this
limitation with the tag SkipIfADLS.slow_client. Tracked by
IMPALA-5335.

The azure-data-lake-store-python client also only works on CentOS 6.6
and over, so the python dependencies for Azure will not be downloaded
when the TARGET_FILESYSTEM is not "adls". While running ADLS tests,
the expectation will be that it runs on a machine that is at least
running CentOS 6.6.
Note: This is only a test limitation, not a functional one. Clusters
with older OSes like CentOS 6.4 will still work with ADLS.

Added another dependency to bootstrap_build.sh for the ADLS Python
client.

Testing: Ran core tests with and without TARGET_FILESYSTEM as
'adls' to make sure that all tests pass and that nothing breaks.

Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Reviewed-on: http://gerrit.cloudera.org:8080/6910
Tested-by: Impala Public Jenkins
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
2017-05-25 19:35:24 +00:00
Lars Volker
841fe7f621 IMPALA-5189: Pin version of setuptools-scm
A new upstream release of setuptools-scm (1.15.3) broke setting up the
python environment. A subsequently released version fixed the breakage.
Nonetheless pinning external dependencies seems like a good idea, so
this change pins the version of setuptools-scm to the new version
(1.15.4) to protect us from similar issues in the future.

I tested this by running the following command in a new virtualenv and
checking in the output that it installed the correct version of
setuptools-scm (1.15.4).

pip install --no-binary --no-index --no-cache-dir --find-links
infra/python/deps/ -r infra/python/deps/requirements.txt

Change-Id: I398972d2cdf3acc9d5d8c598fc5b964b7241f1d2
Reviewed-on: http://gerrit.cloudera.org:8080/6599
Reviewed-by: Lars Volker <lv@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-19 22:03:51 +00:00
Taras Bobrovytsky
4a79c9e7e3 IMPALA-5181: Extract PYPI metadata from a webpage
There were some build failures due to a failure to download a JSON file
containing package metadata from PYPI. We need to switch to downloading
this from a PYPI mirror. In order to be able to download the metadata
from a PYPI mirror, we need to be able to extract the data from a web page,
because PYPI mirrors do not always have a JSON interface.

We implement a regex based html parser in this patch. Also, we increase
the number of download attempts and randomly vary the amount of time
between each attempt.
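
The two ideas in this paragraph, regex-based link extraction and retries with
a randomized pause, can be sketched as below. The link pattern, function
names, attempt count, and delay range are all assumptions for illustration
and are not the values used in the patch.

```python
import random
import re
import time

# Assumed link shape on a mirror's index page:
#   <a href="URL#md5=DIGEST">FILE NAME</a>
LINK_RE = re.compile(r'<a\s+href="([^"#]+)#md5=([0-9a-f]+)"[^>]*>([^<]+)</a>')


def find_links(html):
    """Return (file name, url, md5 digest) tuples found in an index page."""
    return [(name.strip(), url, digest)
            for url, digest, name in LINK_RE.findall(html)]


def retry(action, attempts=10, base_delay=1.0, sleep=time.sleep):
    """Call action until it succeeds, pausing a random interval between tries."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            # Random jitter keeps parallel builds from retrying in lockstep.
            sleep(base_delay * random.uniform(0.5, 1.5))
```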

Testing:
- Tested locally against PYPI and a PYPI mirror.
- Ran a private build that passed (which used a PYPI mirror).

Change-Id: If3845a0d5f568d4352e3cc4883596736974fd7de
Reviewed-on: http://gerrit.cloudera.org:8080/6579
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-08 00:19:08 +00:00
Michael Brown
cc8a119839 IMPALA-5044: test infra: remove backports.tempfile
backports.tempfile is not compatible with Python 2.6, so if Python 2.6
is the Python used for end-to-end tests, this test unconditionally
fails.  Moreover, Py.test provides a builtin tmpdir fixture with
equivalent functionality. Remove the requirement and port tests using
backports.tempfile.TemporaryDirectory to use tmpdir.

Change-Id: I887b62eb1b3425fc8fd62562e28f0c17cb261f6d
Reviewed-on: http://gerrit.cloudera.org:8080/6316
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-09 01:57:37 +00:00
Lars Volker
768fc0ea27 IMPALA-4734: Set parquet::RowGroup::sorting_columns
This changes the HdfsParquetTableWriter to populate the
parquet::RowGroup::sorting_columns list with all columns mentioned in a
'sortby()' hint within INSERT statements. The columns are added to the
list in the order in which they appear inside the hint.

The change also adds backports.tempfile to the python requirements to
provide 'tempfile.TemporaryDirectory' on python 2.7.

The change also changes the default ordering for columns mentioned in
'sortby()' hints from descending to ascending.

To test this change, we write a table with a 'sortby()' hint and verify
that the sorting_columns get populated correctly.

Change-Id: Ib42aab585e9e627796e9510e783652d49d74b56c
Reviewed-on: http://gerrit.cloudera.org:8080/6219
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-07 09:07:05 +00:00
Tim Armstrong
c8e15e484c IMPALA-4593,IMPALA-4635: fix some python build issues
Build C/C++ packages with toolchain GCC to avoid ABI compatibility
issues. This requires a multi-step bootstrapping process:
1. install basic non-C/C++ packages into the virtualenv
2. use Python 2.7 from the virtualenv to bootstrap the toolchain
3. use toolchain gcc to build C/C++ packages
4. build the kudu-python package with toolchain gcc and Cython

To avoid potentially pulling in cached versions of packages
built with a different compiler, this patch also disables pip's
caching. This should not have a significant effect on performance
since we've enabled ccache and cache downloaded packages in
infra/python/deps.

Improve bootstrapping time significantly by using ccache and by
parallelising the numpy build - the most expensive part of the
install process. On a system with a warmed-up ccache,
bootstrapping after deleting infra/python/env takes 1m16s. Previously
it could take over 5m.

Testing:
Tested manually on Ubuntu 16.04 to confirm that it fixes the ABI
problem mentioned in IMPALA-4593. Initially "import kudu" failed
in my dev environment. After deleting infra/python/env and
re-bootstrapping, "import kudu" succeeded.

Also ran the standard test suite on CentOS 6 and built Impala on
a range of platforms (CentOS 5,6,7; SLES 11,12; Debian 6,7;
Ubuntu 12.04,14.04,16.04) to make sure nothing broke.

Change-Id: I9e807510eddeb354069e0478363f649a1c1b75cf
Reviewed-on: http://gerrit.cloudera.org:8080/6218
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-07 02:56:18 +00:00
Matthew Jacobs
ed711330fc IMPALA-4934: Disable Kudu OpenSSL initialization
Bumps the Kudu version to include the change to the client
that allows Impala to disable SSL initialization.

In authentication.cc, after Impala initializes OpenSSL,
Impala then disables Kudu's OpenSSL init.

Fixed a python test case that started failing after bumping
the Kudu client version.

Change-Id: I3f13f3af512c6d771979638da593685524c73086
Reviewed-on: http://gerrit.cloudera.org:8080/6056
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
2017-02-22 05:06:20 +00:00
David Knupp
e5c098b076 IMPALA-4735: Upgrade pytest in python env to version 2.9.2.
The current version of pytest in the Impala python environment is
quite old (2.7.2) and there have been bug fixes in later versions
that we could benefit from.

Also, since the passing of params to pytest.main() as a string will
be deprecated in upcoming versions of pytest, edit run-tests.py to
instead pass params as a list. (This also means we don't need to
worry about esoteric bash limitations re: single quotes in strings.)
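
A sketch of assembling the arguments as a list is shown below. The helper
name and its parameters are hypothetical and this is not the run-tests.py
code; -k is pytest's standard keyword-expression option.

```python
def build_pytest_args(test_dirs, keyword_expr=None, extra_args=()):
    """Assemble pytest arguments as a list (hypothetical helper)."""
    args = list(extra_args)
    if keyword_expr:
        # One list element per argument: spaces and quotes inside the
        # expression need no shell-style escaping.
        args += ["-k", keyword_expr]
    args += list(test_dirs)
    return args

# The result would then be passed on as pytest.main(build_pytest_args(...)).
```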

While working on this file, the filtering of commandline args when
running the verifier tests was made a little more robust.

Tested by doing a standard (non-exhaustive) test run on centos 6.4
and ubuntu 14.04, plus an exhaustive test run on RHEL7.

Change-Id: I40d129e0e63ca5bee126bac6ac923abb3c7e0a67
Reviewed-on: http://gerrit.cloudera.org:8080/5640
Tested-by: Impala Public Jenkins
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
2017-02-02 21:27:39 +00:00
Taras Bobrovytsky
2159beee89 IMPALA-4467: Add support for DML statements in stress test
- Add support for insert, upsert, update and delete statements.
- Add support for compute stats with mt_dop query options.
- Update impyla version in order to be able to have access to query
  error text for DML queries.
- Made flake8 fixes. flake8 on this file is clean.

For every Kudu table in the databases, we make a copy and add a
'_original' suffix to the table name. The DML queries will only make
modifications to the non-original table; the original table will never
be modified. The original tables could be used to bring the non-original
table to the initial state. Two flags were added for doing this:
--reset-databases-before-binary-search and
--reset-databases-after-binary-search.

The DML queries are generated based on the mod values passed in with the
following flag: --dml-mod-values 11 13 17. For each mod value 4 DML
queries are generated. The DML operations will touch table rows where
primary_key % mod_value = 0. So, the larger the mod value, the fewer rows
would be affected. The DML queries are generated in such a way that the
data for the insert, upsert, and update queries is taken from the table
with the _original suffix. The stress test generates DML queries for
only kudu databases. For example, --tpch-kudu-db=tpch_100_kudu
--tpch-db=tpch_100 --generate-dml-queries would only generate queries
for the tpch_100_kudu database.
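
To make the row-selection predicate concrete, the sketch below counts the
keys matched by primary_key % mod_value = 0 for each example mod value, which
is roughly one row in every mod_value. The key range is made up for
illustration and does not reflect any real table; the function name is
hypothetical.

```python
def rows_touched(primary_keys, mod_value):
    """Return the keys matching the predicate primary_key % mod_value = 0."""
    return [pk for pk in primary_keys if pk % mod_value == 0]


keys = range(1, 1001)  # illustrative key range, not real table data
counts = dict((m, len(rows_touched(keys, m))) for m in (11, 13, 17))
# counts == {11: 90, 13: 76, 17: 58}
```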

Here's an example of a full call with the new options that runs the
stress test on the local mini cluster:
./concurrent_select.py \
    --tpch-kudu-db=tpch_kudu \
    --generate-dml-queries \
    --dml-mod-values 11 13 17 \
    --generate-compute-stats-queries \
    --select-probability=0.5 \
    --mem-limit-padding-pct=25 \
    --mem-limit-padding-abs=50 \
    --reset-databases-before-binary-search \
    --reset-databases-after-binary-search

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Reviewed-on: http://gerrit.cloudera.org:8080/5093
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins
2016-12-20 01:33:01 +00:00
Matthew Jacobs
f3fe2cfe10 Bump Kudu python version to 1.1
Change-Id: I5834b3aa4eeae363eae938f61e473c52a0fe5596
Reviewed-on: http://gerrit.cloudera.org:8080/5307
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-12-01 23:11:49 +00:00
Tim Armstrong
51b1310681 IMPALA-3872: allow providing PyPi mirror for python packages
We still rely on the python.org json API, which doesn't seem to be
mirrored (instead there's a html-based index format implemented by
the mirrors).

The mirror can be provided by setting the PYPI_MIRROR environment
variable. The default is "https://pypi.python.org".
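
Reading the setting with its default could look like the following sketch.
The function and constant names are hypothetical; the variable name and the
default URL come from the commit message.

```python
import os

DEFAULT_PYPI_MIRROR = "https://pypi.python.org"


def get_pypi_mirror(environ=None):
    """Return the mirror base URL, falling back to the default."""
    environ = os.environ if environ is None else environ
    return environ.get("PYPI_MIRROR", DEFAULT_PYPI_MIRROR)
```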

Change-Id: Ibc11f010332c0225121c86c9930e35c7ac01409c
Reviewed-on: http://gerrit.cloudera.org:8080/4770
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-11-08 05:34:50 +00:00
Matthew Jacobs
9b507b6ed6 IMPALA-4379: Fix and test Kudu table type checking
Creating Kudu tables shouldn't allow types not supported by
Kudu (e.g. VARCHAR/CHAR, DECIMAL, TIMESTAMP, collection types).
The behavior is inconsistent: for some types it throws in
the catalog, for VARCHAR/CHAR these become strings. This changes
behavior so that all fail during analysis. Analysis tests
were added.

Similarly, external tables cannot contain Kudu types that
Impala doesn't support (e.g. UNIXTIME_MICROS, BINARY). Tests
were added to validate this behavior. Note that this
required upgrading the python Kudu client.

This also fixes a small corner case with ALTER TABLE:
ALTER TABLE shouldn't allow Kudu tables to change the
storage descriptor tblproperty, otherwise the table metadata
gets in an inconsistent state.

Tests were added for all of the above.

Change-Id: I475273cbbf4110db8d0f78ddf9a56abfc6221e3e
Reviewed-on: http://gerrit.cloudera.org:8080/4857
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2016-10-31 16:03:54 +00:00
Dimitris Tsirogiannis
041fa6d946 IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
With this commit we simplify the syntax and handling of CREATE TABLE
statements for both managed and external Kudu tables.

Syntax example:
CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b))
DISTRIBUTE BY HASH (a) INTO 3 BUCKETS,
RANGE (b) SPLIT ROWS (('abc', 'def'))
STORED AS KUDU

Changes:
1) Remove the requirement to specify table properties such as key
   columns in tblproperties.
2) Read table schema (column definitions, primary keys, and distribution
   schemes) from Kudu instead of the HMS.
3) For external tables, the Kudu table is now required to exist at the
   time of creation in Impala.
4) Disallow table properties that could conflict with an existing
   table. Ex: key_columns cannot be specified.
5) Add KUDU as a file format.
6) Add a startup flag to impalad to specify the default Kudu master
   addresses. The flag is used as the default value for the table
   property kudu_master_addresses but it can still be overridden
   using TBLPROPERTIES.
7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE
   wasn't implemented for Kudu tables and silently ignored. The Kudu
   tables wouldn't be removed in Kudu.
8) Remove DDL delegates. There was only one functional delegate (for
   Kudu); the existence of the other delegate and the use of delegates in
   general have led to confusion. The Kudu delegate only exists to provide
   functionality missing from Hive.
9) Add PRIMARY KEY at the column and table level. This syntax is fairly
   standard. When used at the column level, only one column can be
   marked as a key. When used at the table level, multiple columns can
   be used as a key. Only Kudu tables are allowed to use PRIMARY KEY.
   The old "kudu.key_columns" table property is no longer accepted
   though it is still used internally. "PRIMARY" is now a keyword.
   The ident style declaration is used for "KEY" because it is also used
   for nested map types.
10) For managed tables, infer a Kudu table name if none was given.
   The table property "kudu.table_name" is optional for managed tables
   and is required for external tables. If for a managed table a Kudu
   table name is not provided, a table name will be generated based
   on the HMS database and table name.
11) Use Kudu master as the source of truth for table metadata instead
   of HMS when a table is loaded or refreshed. Table/column metadata
   are cached in the catalog and are stored in HMS in order to be
   able to use table and column statistics.

Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Reviewed-on: http://gerrit.cloudera.org:8080/4414
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-10-21 10:52:25 +00:00
David Knupp
a42d18dcc3 IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
We used to include a step in run-hbase.sh for calling a python
script that queried Zookeeper to see if the HBase master was up.
The original script was problematic, so we stopped using it during
our mini-cluster HBase start up procedure.

HBase start up issues continue to plague us, however. This patch
reintroduces a Zookeeper check, with the following updates:

- replace the original script with check-hbase-nodes.py
- query the correct node /hbase/master, not just /hbase/rs
- use the python Zookeeper library kazoo, rather than calling
  out to the shell and parsing the return string
- since we are moving toward testing on a remote cluster, also
  add the capability to pass in the address for the host that
  provides the Zookeeper and HBase services
- add an additional check that the HDFS service is running,
  because of an edge case where the HBase master can briefly
  start without a cluster running.
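
The Zookeeper check described above can be sketched roughly as follows. This is not the contents of check-hbase-nodes.py: the function names, the default host string, and the timeout are assumptions made for illustration. Only the znode paths /hbase/master and /hbase/rs and the use of kazoo come from this message.

```python
# Hypothetical sketch of a ZooKeeper-based HBase health check using kazoo.
# Function names, the default host and the timeout are illustrative assumptions.

HBASE_ZNODES = ("/hbase/master", "/hbase/rs")


def missing_znodes(zk, paths=HBASE_ZNODES):
    """Return the paths that do not exist, using any client with exists()."""
    return [path for path in paths if zk.exists(path) is None]


def check_hbase(hosts="localhost:2181", timeout=30):
    """Connect to ZooKeeper at `hosts` and report missing HBase znodes."""
    # Imported lazily so the helper above works without kazoo installed.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts=hosts)
    zk.start(timeout=timeout)
    try:
        return missing_znodes(zk)
    finally:
        zk.stop()
```

An empty list from check_hbase() would mean both znodes are present. Passing `hosts` explicitly mirrors the remote-cluster case mentioned above.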

In addition to the expected tests, this script was also tested
under the conditions of IMPALA-4088, whereby the HBase RegionServer
is running, but the master fails because another listening process
has already taken its TCP port (60010) during startup.

Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Reviewed-on: http://gerrit.cloudera.org:8080/4348
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-09-15 00:02:22 +00:00
Jim Apple
bd2947329e IMPALA-4110: Clean up issues found by Apache RAT.
Change-Id: I5bfe77f9a871018e7a67553ed270e2df53006962
Reviewed-on: http://gerrit.cloudera.org:8080/4361
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-09-14 22:09:24 +00:00
Zoltan Ivanfi
a60ba6d274 IMPALA-4006: dangerous rm -rf statements in scripts
Quoted variable substitutions in rm -rf commands and in many other
places. This prevents disasters if those variables contain whitespace.
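
To illustrate why the quoting matters, the snippet below uses Python's shlex module, which follows POSIX-style word splitting, to show how an unquoted path containing a space turns into two arguments. The path is made up for illustration, and nothing is actually deleted.

```python
import shlex

path = "/tmp/build dir"  # hypothetical value with embedded whitespace

# Unquoted substitution: the shell would see two separate paths to delete.
unquoted = shlex.split("rm -rf " + path)

# Quoted substitution: the path stays a single argument.
quoted = shlex.split("rm -rf " + shlex.quote(path))

print(unquoted)  # ['rm', '-rf', '/tmp/build', 'dir']
print(quoted)    # ['rm', '-rf', '/tmp/build dir']
```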

Redirected output of the cd commands to /dev/null. This prevents
polluting the target variable with the directory name when the CDPATH
environment variable is set.

Change-Id: I7503794180dee99eeb979e67f34e3b2edade70fe
Reviewed-on: http://gerrit.cloudera.org:8080/4078
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2016-09-01 21:26:52 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace the existing ASF license text with the one given on the
   website, or add it where it is missing.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Tim Armstrong
904265ccb5 Update .gitignore files for ninja, coredumps and pypi packages
Change-Id: Ie7d34fbd27150ba6c437207611f71bb95a0e4cba
Reviewed-on: http://gerrit.cloudera.org:8080/3814
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-07-29 21:42:07 +00:00
Lars Volker
b94f88a697 IMPALA-3886: Improve log of pip_download.py
pip_download.py prints the following line for each dependency that is
already up-to-date:

File with matching md5sum already exists, skipping download.

This change adds the filename to the message so it is more useful.
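
A minimal sketch of the kind of check and message involved is shown below. It is not the code in pip_download.py: the helper names and the exact wording of the new message are assumptions.

```python
import hashlib
import os


def md5_matches(path, expected_md5):
    """Return True if `path` exists and its MD5 digest equals `expected_md5`."""
    if not os.path.isfile(path):
        return False
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large tarballs are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5


def skip_message(path):
    """Build a skip message that names the file (wording is illustrative)."""
    name = os.path.basename(path)
    return "File with matching md5sum already exists, skipping %s" % name
```

A caller would print skip_message(path) when md5_matches(path, expected) is true and download the file otherwise.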

Change-Id: Ie3d81743814be37ee8ddbe04c264ed2bf37410f9
Reviewed-on: http://gerrit.cloudera.org:8080/3687
Reviewed-by: Lars Volker <lv@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2016-07-21 08:30:51 -07:00
Taras Bobrovytsky
baf8fe202c IMPALA-3778: Fix ASF packaging build
The tarballs in IMPALA_HOME/infra/python/deps and the thirdparty
directory have been removed in the ASF repository. All Python
dependencies and CDH components must now be downloaded as part of every
build. This caused the ASF packaging build to fail. Before this patch,
we used the system pip to download the Python dependencies, which caused
flakiness and inconsistency on different operating systems. This patch
fixes the problem by using our own script (which requires Python 2.6+ to
be installed on the system), to download all the files in
requirements.txt.

Also replaced all whl and zip Python packages with tar.gz to make it
consistent with the ASF build.

Change-Id: Ibe5a743096cda2059bd330805d324983f6730e19
Reviewed-on: http://gerrit.cloudera.org:8080/3647
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
2016-07-14 19:04:45 +00:00
Tim Armstrong
a070217750 IMPALA-3774: fix download_requirements for older Python versions
Pip always runs the setup.py file in downloaded tarballs to get
metadata. Impyla's setup.py does not work in some older python
installations since find_packages() in setuptools does not support
the 'include' argument.

As a workaround, use our pip_download.py script to download Impyla
instead of pip.

Testing:
Confirmed that Jenkins build successfully downloaded the pip packages
and was able to bootstrap the virtualenv.

Change-Id: Id8801493c0f4caab2273383333ffbe2729b8339b
Reviewed-on: http://gerrit.cloudera.org:8080/3574
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2016-07-06 14:40:54 -07:00
Michael Ho
a07fc367ee Revert "IMPALA-1619: Support 64-bit allocations."
This reverts commit 1ffb2bd5a2a2faaa759ebdbaf49bf00aa8f86b5e.

Unbreak the packaging builds for now.

Change-Id: Id079acb83d35b51ba4dfe1c8042e1c5ec891d807
Reviewed-on: http://gerrit.cloudera.org:8080/3543
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Michael Ho <kwho@cloudera.com>
2016-07-05 13:37:26 -07:00
Michael Ho
5f3dfdf6c7 IMPALA-1619: Support 64-bit allocations.
This change extends MemPool, FreePool and StringBuffer to support
64-bit allocations, fixes a bug in decompressor and extends various
places in the code to support 64-bit allocation sizes. With this
change, the text scanner can now decompress compressed files larger
than 1GB.

Note that the UDF interfaces FunctionContext::Allocate() and
FunctionContext::Reallocate() still use 32-bit for the input
argument to avoid breaking compatibility.

In addition, the byte size of a tuple is still assumed to be
within 32-bit. If it needs to be upgraded to 64-bit, it will be
done in a separate change.

Change-Id: I7ed28083d809a86d801a9c063a0aa32c50d32b20
Reviewed-on: http://gerrit.cloudera.org:8080/2781
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-07-05 13:37:25 -07:00
Jim Apple
a5ae2bfd88 IMPALA-3762: Download Python requirements before they are needed.
This is needed for ASF builds. It sounds expensive, but takes less
than 10 seconds if the packages are already present.

Change-Id: I84103c2fb8f9a93336bf28b644ca045f15651dd6
Reviewed-on: http://gerrit.cloudera.org:8080/3452
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Jim Apple <jbapple@cloudera.com>
2016-06-22 14:38:57 -07:00
Jim Apple
140220323d IMPALA-3767: bootstrap_virtualenv fails to find cython distribution
This patch avoids trying to download a Cython binary 0.23.4, which pip
has trouble finding.

Change-Id: Ic6733ccb71bcf99196075faa2fb6cf2a1d6276ce
Reviewed-on: http://gerrit.cloudera.org:8080/3427
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
2016-06-21 19:38:19 -07:00
Tim Armstrong
fc3ff1c52f IMPALA-3763: download_requirements fixes
* Download to infra/python/deps instead of the current directory.
* Download the correct virtualenv version, to match the version on
  cdh5-trunk
* Don't re-download packages repeatedly, instead check the md5sum.

Testing:
Tested manually on the ASF tree, then made sure that
bootstrap_virtualenv completed successfully to make sure we had all
of the requirements downloaded successfully.

Change-Id: I5a3c42236dddfd8a456c82605dc1fdc199a2bc48
Reviewed-on: http://gerrit.cloudera.org:8080/3416
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
2016-06-21 00:37:54 -07:00
Tim Armstrong
ec3a1c7866 download_requirements should download kudu-python and virtualenv
This is required for the ASF migration, since we don't want to include
all of the tarballs in the repo and we want to allow developers to build
using dependencies obtained from the standard upstream sources.

Also remove a workaround for an old issue with building an impyla
development version package.

Change-Id: Ie9216596db0f37d706ea7f77c129cecd5b070429
Reviewed-on: http://gerrit.cloudera.org:8080/3217
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-06-13 17:32:27 -07:00
Michael Brown
22669e23be IMPALA-3501: ee tests: detect build type and support different timeouts based on the same
Impala compiled with the address sanitizer, or compiled with code
coverage, runs through code paths much slower. This can cause end-to-end
tests that pass on a non-ASAN or non-code coverage build to fail. Some
examples include IMPALA-2721, IMPALA-2973, and IMPALA-3501. These
classes of failures tend always to involve some time-sensitive condition
that fails to succeed under such "slow builds".

The workaround in the past has been to simply increase the timeout.
The problem with this approach is that it relaxes conditions for tests
on builds that see the field (i.e., release builds) for the sake of
builds that never will (i.e., ASAN and code coverage).

This patch fixes that problem by allowing test authors to set timeout
values based on a *specific* build type. The author may choose timeouts
with a default value, and different timeouts for either or both
so-called "slow builds": ASAN and code coverage.

We detect the so-called "specific build type" by inspecting the binary
expected to be at the path under test. This removes the need to make
alterations to Impala itself. The inspection done is to read the DWARF
information in the binary, specifically the first compile unit's
DW_AT_producer and DW_AT_name DIE attributes. We employ a heuristic
based on these attributes' values to guess the build type. If we can't
determine the build type, we will assume it's a debug build. More
information on this is in IMPALA-3501.
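
The idea can be sketched as follows, assuming the compiler recorded its command-line switches in DW_AT_producer. The flags checked, the function names, and the timeout signature are hypothetical and are not the heuristic or the specific_build_type_timeout API from this patch. Only the fallback to a debug build is taken from the description above.

```python
# Hypothetical heuristic: the flags checked and the names used here are
# assumptions for illustration, not the logic in the patch.


def guess_build_type(producer):
    """Guess a build type from a compile unit's DW_AT_producer string."""
    tokens = producer.split()
    if "-fsanitize=address" in tokens:
        return "asan"
    if "--coverage" in tokens or "-fprofile-arcs" in tokens:
        return "code_coverage"
    if any(flag in tokens for flag in ("-O1", "-O2", "-O3")):
        return "release"
    # Undetermined builds are treated as debug, as described above.
    return "debug"


def pick_timeout(build_type, default, asan=None, code_coverage=None):
    """Return the timeout for `build_type`, falling back to `default`."""
    overrides = {"asan": asan, "code_coverage": code_coverage}
    value = overrides.get(build_type)
    return default if value is None else value
```

With this shape, a test author keeps a strict default for release and debug builds and loosens it only for the slow builds, for example pick_timeout(build_type, 10, asan=60).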

A quick summary of the changes follows:

1. Move some of the logic in tests.common.skip to tests.common.environ
   and rework some skip marks to be more precise.

2. Add Pyelftools for convenient deserialization of DWARF

3. Our Pyelftools usage requires collections.OrderedDict, which isn't in
   python2.6; also add Monkeypatch to handle this.

4. Add ImpalaBuild and specific_build_type_timeout, the core of the new
   functionality

5. Fix the statestore tests that only fail under code coverage (the
   basis for IMPALA-3501)

Testing:

The tests that were previously failing reliably under code coverage now
pass. I also ran perfunctory tests of debug, release, and ASAN builds to
ensure our detection of build type is working. This patch will *not*
turn the code coverage builds green; there are other tests that fail,
and fixing all of them here is out of the scope of this patch.

Change-Id: I2b675c04c54e36d404fd9e5a6cf085fb8d6d0e47
Reviewed-on: http://gerrit.cloudera.org:8080/3156
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
2016-05-25 19:41:45 -07:00
Michael Brown
5112e65be2 Revert "Revert "Add Kudu test helpers""
This reverts commit f8dd5413b65d30646c3745dfc738ed812d50a51f and
effectively re-adds commit 9248dcb70478b8f93f022893776a0960f45fdc28. The
difference between this patch and its original is that I fixed the
changes introduced in infra/python/bootstrap_virtualenv.py to be
python2.4-compatible:

- removed the use of str.format(), preferring a str.join() pattern
- removed the call of the exit() builtin to prefer sys.exit()
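
The two substitutions listed above look roughly like this. The function names and message text are invented for illustration and do not come from bootstrap_virtualenv.py.

```python
import sys


def failure_message(package, reason):
    """Build an error message without str.format(), which Python 2.4 lacks."""
    return "".join(["Failed to install ", package, ": ", reason])


def fail(package, reason):
    """Report a failure and exit with a non-zero status."""
    sys.stderr.write(failure_message(package, reason) + "\n")
    # sys.exit() is used because the exit() builtin is only provided by the
    # site module in later Python versions.
    sys.exit(1)
```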

The only testing I did for this patch was to ensure
CDH Impala-packaging-on-demand works.

Change-Id: I02ed97473868eacf45b25abe89b41e6fa2fce325
Reviewed-on: http://gerrit.cloudera.org:8080/3160
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
2016-05-24 16:40:59 -07:00