127 Commits

Author SHA1 Message Date
Joe McDonnell
3ce0004c12 IMPALA-14512: Remove dependency on sh python package
This modifies bin/single_node_perf_run.py to stop using the sh
python package. It replaces sh with calls to subprocess. It
stops installing sh for both the Python 2 and 3 virtualenvs.
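
A minimal sketch of this kind of replacement, assuming a call that sh would
have exposed as a function is rewritten with subprocess. The helper name
run_command is hypothetical and is not taken from bin/single_node_perf_run.py.

```python
import subprocess
import sys


def run_command(args):
  """Run a command and return its stdout as text (hypothetical helper)."""
  # check_output raises subprocess.CalledProcessError on a non-zero exit
  # status, which is comparable to sh raising an exception on failure.
  return subprocess.check_output(args, universal_newlines=True)


# With the sh package this might have been written as sh.python("-c", ...).
output = run_command([sys.executable, "-c", "print('hello')"])
```

Passing the command as a list avoids shell quoting issues, at the cost of
being slightly more verbose than the sh-style call.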

Testing:
 - Ran perf-AB-test job with it and examined the logs

Change-Id: Ic5f9316a5d83c5c0dc37d4a94c55b6a655765fe3
Reviewed-on: http://gerrit.cloudera.org:8080/23600
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-11-20 03:29:48 +00:00
Joe McDonnell
1913ab46ed IMPALA-14501: Migrate most scripts from impala-python to impala-python3
To remove the dependency on Python 2, existing scripts need to use
python3 rather than python. These commands find those
locations (for impala-python and regular python):
git grep impala-python | grep -v impala-python3 | grep -v impala-python-common | grep -v init-impala-python
git grep bin/python | grep -v python3

This removes or switches most of these locations by various means:
1. If a python file has a #!/bin/env impala-python (or python) but
   doesn't have a main function, it removes the hash-bang and makes
   sure that the file is not executable.
2. Most scripts can simply switch from impala-python to impala-python3
   (or python to python3) with minimal changes.
3. The cm-api pypi package (which doesn't support Python 3) has been
   replaced by the cm-client pypi package and interfaces have changed.
   Rather than migrating the code (which hasn't been used in years), this
   deletes the old code and stops installing cm-api into the virtualenv.
   The code can be restored and revamped if there is any interest in
   interacting with CM clusters.
4. This switches tests/comparison over to impala-python3, but this code has
   bit-rotted. Some pieces can be run manually, but it can't be fully
   verified with Python 3. It shouldn't hold back the migration on its own.
5. This also replaces locations of impala-python in comments / documentation /
   READMEs.
6. kazoo (used for interacting with HBase) needed to be upgraded to a
   version that supports Python 3. The newest version of kazoo requires
   upgrades of other component versions, so this uses kazoo 2.8.0 to avoid
   needing other upgrades.

The two remaining uses of impala-python are:
 - bin/cmake_aux/create_virtualenv.sh
 - bin/impala-env-versioned-python
These will be removed separately when we drop Python 2 support
completely. In particular, these are useful for testing impala-shell
with Python 2 until we stop supporting Python 2 for impala-shell.

The docker-based tests still use /usr/bin/python, but this can
be switched over independently (and doesn't impact impala-python)

Testing:
 - Ran core job
 - Ran build + dataload on Centos 7, Redhat 8
 - Manual testing of individual scripts (except some bitrotted areas like the
   random query generator)

Change-Id: If209b761290bc7e7c716c312ea757da3e3bca6dc
Reviewed-on: http://gerrit.cloudera.org:8080/23468
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2025-10-22 16:30:17 +00:00
Riza Suminto
2975f10701 IMPALA-14308: Workaround failure in impala_python3 build
Construction of the impala-virtualenv fails since PyPI released version
7.0.0 of pbr. This blocks all precommit runs, since the Impala
virtualenv is required for all end-to-end tests.

The failure happens during pywebhdfs==0.3.2 installation. It is expected
to pull the pinned version pbr==3.1.1, but the latest pbr==7.0.0 was
pulled instead. pbr==7.0.0 then broke with this error message:

  ModuleNotFoundError: No module named 'packaging.requirements'

This patch adds a workaround in bootstrap_virtualenv.py to install
packaging==24.1 early for python3. Installing it early unblocks
`make -j impala_python3`. The packaging==24.1 package is already
listed in infra/python/deps/gcovr-requirements.txt, which is installed in
a later step and in the python3 virtualenv only.

Testing:
Pass shell/ tests in Ubuntu 22.04 and Rocky 9.2.

Change-Id: I0167fb5e1e0637cdde64d0d3beaf6b154afc06b1
Reviewed-on: http://gerrit.cloudera.org:8080/23292
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Jason Fehr <jfehr@cloudera.com>
2025-08-13 20:57:08 +00:00
Laszlo Gaal
fae42323da IMPALA-14144: Make pip_download.py more tolerant with PEP 503 simple pages
Recent package updates on PyPI have introduced package description
pages that have extra newlines in addition to the newline character
separating the complete URLs for the different package versions.
These extra newlines usually show up before the closing angle bracket
character ('>') of the opening half of the anchor tag.

This broke pip_download.py, because it uses a regex to crack out
various data items (file name, download path, hash algorithm and hash
value) from the download page. The regex attempts to match the whole anchor
element up to and including the closing '</a>' tag, which fails because
the '.' in a regex matches any character, except a newline. This failure
causes all lines in the package descriptor page to be rejected as not
matching the search pattern, so the package with a page in this format
can never be recognized.

This patch works around this formatting issue by adding the flag
re.DOTALL to the regex search call, making the regex '.' character match
the newline as well, so that the regex can match the complete anchor
element across a line break as well.
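
The effect of re.DOTALL can be sketched as follows. The page snippet and the
pattern are simplified assumptions made for illustration; they are not the
actual regex or input used by pip_download.py.

```python
import re

# Hypothetical anchor element with a newline before the closing '>' of the
# opening tag, similar to the formatting described above.
page = ('<a href="../../packages/pkg-1.0.tar.gz#sha256=abc123"\n'
        '>pkg-1.0.tar.gz</a>')

# Simplified pattern: file name, hash algorithm, hash value.
pattern = r'<a href=".*?/([^/#"]+)#(\w+)=(\w+)".*?>.*?</a>'

# Without re.DOTALL, '.' cannot cross the newline, so nothing matches.
without_dotall = re.search(pattern, page)

# With re.DOTALL, '.' also matches the newline and the element is found.
with_dotall = re.search(pattern, page, re.DOTALL)
```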

Change-Id: Ia56f87c54e0d9cad97b7e0ffbcce8f4c0f715c44
Reviewed-on: http://gerrit.cloudera.org:8080/23026
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2025-06-13 21:50:41 +00:00
Joe McDonnell
c5a0ec8bdf IMPALA-11980 (part 1): Put all thrift-generated python code into the impala_thrift_gen package
This puts all of the thrift-generated python code into the
impala_thrift_gen package. This is similar to what Impyla
does for its thrift-generated python code, except that it
uses the impala_thrift_gen package rather than impala._thrift_gen.
This is a preparatory patch for fixing the absolute import
issues.

This patches all of the thrift files to add the python namespace.
This has code to apply the patching to the thirdparty thrift
files (hive_metastore.thrift, fb303.thrift) to do the same.

Putting all the generated python into a package makes it easier
to understand where the imports are getting code. When the
subsequent change rearranges the shell code, the thrift generated
code can stay in a separate directory.

This uses isort to sort the imports for the affected Python files
with the provided .isort.cfg file. This also adds an impala-isort
shell script to make it easy to run.

Testing:
 - Ran a core job

Change-Id: Ie2927f22c7257aa38a78084efe5bd76d566493c0
Reviewed-on: http://gerrit.cloudera.org:8080/20169
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-04-15 17:03:02 +00:00
Csaba Ringhofer
9437f9fd16 IMPALA-12656: Bump sasl to 0.4a1 to allow Python3.11+ in impala-shell
Before this change impala-shell could not be installed on Python 3.11
due to a compilation failure in python-sasl. Checked installation
on Python 3.11/3.12/3.13.

Also bumps impyla version to 0.21a2.

Change-Id: I4efdd105e489e1d0a996d156fb7efbb6fad8da7d
Reviewed-on: http://gerrit.cloudera.org:8080/22593
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-03-12 19:33:09 +00:00
Csaba Ringhofer
6fc36b3e8e Bump Impyla version to 0.21a1
Change-Id: I297a2a34f3e688555ce8572b6c7fffbd34423f2d
Reviewed-on: http://gerrit.cloudera.org:8080/21286
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>
2025-02-03 17:42:50 +00:00
Riza Suminto
134de01a59 IMPALA-13642: Fix unused test vector in test_scanners.py
Several test vectors were ignored in test_scanners.py. This caused
repetition of the same test without actually varying the test
exec_option or debug_action.

This patch fixes it by:
- Using execute_query() instead of client.execute()
- Passing vector.get_value('exec_option') when executing the test query.

Repurpose ImpalaTestMatrix.embed_independent_exec_options to deepcopy the
'exec_option' dimension during vector generation. Therefore, each test
execution will have its own unique copy of 'exec_option'.
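
Why the deepcopy matters can be sketched with plain dicts standing in for test
vectors. This is an illustration under that assumption and does not use the
ImpalaTestMatrix API.

```python
import copy

# Vectors that share one dict: a change made by one test leaks into the other.
shared_option = {"batch_size": 0, "num_nodes": 0}
shared_vectors = [{"exec_option": shared_option} for _ in range(2)]
shared_vectors[0]["exec_option"]["batch_size"] = 16
leaked_value = shared_vectors[1]["exec_option"]["batch_size"]

# Vectors that each get a deep copy: changes stay local to one vector.
base_option = {"batch_size": 0, "num_nodes": 0}
copied_vectors = [{"exec_option": copy.deepcopy(base_option)}
                  for _ in range(2)]
copied_vectors[0]["exec_option"]["batch_size"] = 16
isolated_value = copied_vectors[1]["exec_option"]["batch_size"]
```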

This patch also adds flake8-unused-arguments plugin into
critique-gerrit-review.py and py3-requirements.txt so we can catch this
issue during code review. impala-flake8 is also updated to use
impala-python3-common.sh. Adds flake8==3.9.2 in py3-requirements.txt,
which is the highest version that has compatible dependencies with
pylint==2.10.2.

Drop unused 'dryrun' parameter in get_catalog_compatibility_comments
method of critique-gerrit-review.py.

Testing:
- Run impala-flake8 against test_scanners.py and confirm there is no
  more unused variable.
- Run and pass test_scanners.py in core exploration.

Change-Id: I3b78736327c71323d10bcd432e162400b7ed1d9d
Reviewed-on: http://gerrit.cloudera.org:8080/22301
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-01-09 06:17:51 +00:00
stiga-huang
777ae104bb IMPALA-13305: Better thrift compatibility checks based on pyparsing
There are some false positive warnings reported by
critique-gerrit-review.py when adding a new thrift struct that has
required fields. This patch leverages pyparsing to analyze the
thrift file changes. So we can identify whether the new required field
is added in an existing struct.

thrift_parser.py adds a simple thrift grammar parser to parse a thrift
file into an AST. It basically consists of pyparsing.ParseResults and
some customized classes to inject the line number, i.e.
thrift_parser.ThriftField and thrift_parser.ThriftEnumItem.

Import thrift_parser to parse the current version of a thrift file and
the old version of it before the commit. critique-gerrit-review.py
then compares the structs and enums to report these warnings:
 - A required field is deleted in an existing struct.
 - A new required field is added in an existing struct.
 - An existing field is renamed.
 - The qualifier (required/optional) of a field is changed.
 - The type of a field is changed.
 - An enum item is removed.
 - Enum items are reordered.
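
The struct-level checks in the list above can be sketched as a comparison of
two field tables. This sketch does not use pyparsing or thrift_parser.py; it
assumes each struct version has already been reduced to a dict mapping a field
id to (qualifier, type, name), and the function name and message wording are
hypothetical.

```python
def compare_struct(struct_name, old_fields, new_fields):
  """Return warnings for changes between two versions of one struct.

  Both arguments map field id -> (qualifier, type, name). Matching fields by
  id is an assumption made for this sketch.
  """
  warnings = []
  for field_id, (old_qual, old_type, old_name) in sorted(old_fields.items()):
    if field_id not in new_fields:
      if old_qual == "required":
        warnings.append("Deleting required field '%s' in %s"
                        % (old_name, struct_name))
      continue
    new_qual, new_type, new_name = new_fields[field_id]
    if new_name != old_name:
      warnings.append("Renaming field '%s' to '%s' in %s"
                      % (old_name, new_name, struct_name))
    if new_qual != old_qual:
      warnings.append("Changing field '%s' from %s to %s in %s"
                      % (new_name, old_qual, new_qual, struct_name))
    if new_type != old_type:
      warnings.append("Changing type of field '%s' from %s to %s in %s"
                      % (new_name, old_type, new_type, struct_name))
  for field_id, (new_qual, _, new_name) in sorted(new_fields.items()):
    if field_id not in old_fields and new_qual == "required":
      warnings.append("Adding a required field '%s' in %s"
                      % (new_name, struct_name))
  return warnings


old = {1: ("required", "i64", "sequence")}
new = {1: ("required", "i64", "catalogd_version"),
       2: ("required", "string", "type")}
example_warnings = compare_struct("TExample", old, new)
```

Because the comparison only looks at existing structs, a brand new struct with
required fields produces no warning, which is the false positive the patch
aims to avoid.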

Only thrift files used in both catalogd and impalad are checked. This is
the same as the current version. We can further improve this by
analyzing all RPCs used between impalad and catalogd to get all thrift
struct/enums used in them.

Warning examples for commit e48af8c04:
  "common/thrift/StatestoreService.thrift": [
   {
    "message": "Renaming field 'sequence' to 'catalogd_version' in TUpdateCatalogdRequest might break the compatibility between impalad and catalogd/statestore during upgrade",
    "line": 345,
    "side": "REVISION"
   }
  ]

Warning examples for commit 595212b4e:
  "common/thrift/CatalogObjects.thrift": [
   {
    "message": "Adding a required field 'type' in TIcebergPartitionField might break the compatibility between impalad and catalogd/statestore during upgrade",
    "line": 612,
    "side": "REVISION"
   }
  ]

Warning examples for commit c57921225:
  "common/thrift/CatalogObjects.thrift": [
   {
    "message": "Renaming field 'partition_id' to 'spec_id' in TIcebergPartitionSpec might break the compatibility between impalad and catalogd/statestore during upgrade",
    "line": 606,
    "side": "REVISION"
   }
  ],
  "common/thrift/CatalogService.thrift": [
   {
    "message": "Changing field 'iceberg_data_files_fb' from required to optional in TIcebergOperationParam might break the compatibility between impalad and catalogd/statestore during upgrade",
    "line": 215,
    "side": "REVISION"
   },
   {
    "message": "Adding a required field 'operation' in TIcebergOperationParam might break the compatibility between impalad and catalogd/statestore during upgrade",
    "line": 209,
    "side": "REVISION"
   }
  ],
  "common/thrift/Query.thrift": [
   {
    "message": "Renaming field 'spec_id' to 'iceberg_params' in TFinalizeParams might break the compatibility between impalad and catalogd/statestore during upgrade",
    "line": 876,
    "side": "REVISION"
   }
  ]

Warning example for commit 2b2cf8d96:
  "common/thrift/CatalogService.thrift": [
   {
    "message": "Enum item FUNCTION_NOT_FOUND=3 changed to TABLE_NOT_LOADED=3 in CatalogLookupStatus. This might break the compatibility between impalad and catalogd/statestore during upgrade",
    "line": 381,
    "side": "REVISION"
   }
  ]

Warning example for commit c01efd096:
  "common/thrift/JniCatalog.thrift": [
   {
    "message": "Removing the enum item TAlterTableType.SET_OWNER=15 might break the compatibility between impalad and catalogd/statestore during upgrade",
    "line": 107,
    "side": "PARENT"
   }
  ]

Warning example for commit 374783c55:
  "common/thrift/Query.thrift": [
   {
    "message": "Changing type of field 'enabled_runtime_filter_types' from PlanNodes.TEnabledRuntimeFilterTypes to set<PlanNodes.TRuntimeFilterType> in TQueryOptions might break the compatibility between impalad and catalogd/statestore during upgrade",
    "line": 449,
    "side": "REVISION"
   }
  ]

Tests
 - Add tests in tests/infra/test_thrift_parser.py
 - Verified the script with all (1260) commits of common/thrift.

Change-Id: Ia1dc4112404d0e7c5df94ee9f59a4fe2084b360d
Reviewed-on: http://gerrit.cloudera.org:8080/22264
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-01-07 02:00:17 +00:00
Andrew Sherman
21ef3e6ffe IMPALA-13638: Translate apostrophe to underscore in Prometheus metric names.
Impala has some metrics that reflect the state of the JVM. Some of these
metrics have names that are partly composed of the names of the
MemoryPoolMXBean objects in the Java virtual machine. In Jdk 8 these
are names like "Code Cache" and "PS Eden Space". In Jdk 11 these names
include apostrophe characters, for example "CodeHeap 'profiled
nmethods'". The derived metric names work OK for Impala in both the
webui and in json output. However the apostrophe character is illegal
in Prometheus metric names per
https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels
and these metrics cannot be consumed by Prometheus. Fix this by adding
the apostrophe to the list of characters that are mapped to underscores
when we translate the metric names for Prometheus metrics.
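
The name translation can be sketched in Python. Impala's own implementation
lives in the server and maps a fixed list of characters; the sketch below
instead replaces every character outside the set Prometheus allows, and the
metric name used is a hypothetical example.

```python
import re


def to_prometheus_name(name):
  """Map characters that are illegal in Prometheus metric names to '_'."""
  # Prometheus metric names must match [a-zA-Z_:][a-zA-Z0-9_:]*.
  sanitized = re.sub(r"[^a-zA-Z0-9_:]", "_", name)
  # A metric name may not start with a digit.
  if re.match(r"[0-9]", sanitized):
    sanitized = "_" + sanitized
  return sanitized


example = to_prometheus_name("jvm.CodeHeap 'profiled nmethods'.usage")
```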

TESTING:

Extended the test_prometheus_metrics test to parse all generated
Prometheus metrics. Ran the test with Jdk 11 where it failed without
the server fix

Change-Id: I557b123c075dff0b14ac527de08bc6177bd2a3f6
IMPALA-13596: first cut at tidied code
Reviewed-on: http://gerrit.cloudera.org:8080/22295
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-01-06 23:38:49 +00:00
Joe McDonnell
5b4afb4f8f IMPALA-13368: Fixup Redhat detection for Python >= 3.8
Python 3.8 removed the platform.linux_distribution() function which is
currently used to detect Redhat. This switches to using the 'distro'
package, which implements the same functionality across different
Python versions. Since Redhat 6 is no longer supported, this removes
the detection of Redhat 6 and associated skip logic.

Testing:
 - Ran a core job

Change-Id: I0dfaf798c0239f6068f29adbd2eafafdbbfd66c3
Reviewed-on: http://gerrit.cloudera.org:8080/22073
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-12-17 07:28:51 +00:00
Riza Suminto
d3ae4a416e IMPALA-13585: Make pip_download.py interruptible
infra/python/deps/pip_download.py uses multiprocessing.pool.ThreadPool
where each thread calls wget. It also wraps the download_package
function with a retry wrapper. When a network issue happens,
pressing Ctrl+C does not immediately terminate pip_download.py and all
its children. Thus, the script appears to hang.

This patch makes pip_download.py capture SIGINT and pass it as a
cancellation event to all threads. It is changed to run with python3.
All flake8 issues are also fixed.
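
The general pattern of turning SIGINT into a cancellation event for pool
threads can be sketched as follows. The worker body, the retry count, and the
function names are assumptions for illustration and do not reproduce the code
in pip_download.py.

```python
import signal
import threading
from multiprocessing.pool import ThreadPool

cancel_event = threading.Event()


def handle_sigint(signum, frame):
  # Record the interrupt so that worker threads stop at their next check.
  cancel_event.set()


def install_sigint_handler():
  # Must be called from the main thread.
  signal.signal(signal.SIGINT, handle_sigint)


def download_package(name):
  """Hypothetical worker that checks for cancellation between attempts."""
  for _ in range(3):
    if cancel_event.is_set():
      return (name, "cancelled")
    # A real worker would run the download here and retry on failure.
    cancel_event.wait(0.01)
  return (name, "done")


def download_all(names):
  pool = ThreadPool(2)
  try:
    return pool.map(download_package, names)
  finally:
    pool.close()
    pool.join()
```

A script would call install_sigint_handler() once at startup and then
download_all(). Waiting on the event instead of sleeping lets a worker wake up
as soon as the interrupt arrives.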

Testing:

- Manually run `buildall.sh -cmake_only` and interrupt it in the middle
  of pip_download.py execution. Verify that the script terminates
  immediately.

Change-Id: I6f293dd8f3fcf3cffa17a4a44627a41d67b7dc91
Reviewed-on: http://gerrit.cloudera.org:8080/22128
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-11-28 10:13:28 +00:00
Joe McDonnell
11396d3146 IMPALA-13384: Only install gcovr deps for coverage builds
IMPALA-13279 upgraded gcovr to 7.2 and moved it from python 2 to
python 3.8. gcovr has several dependencies that require native
compilation, and this increased the cost of initializing the
Python 3 virtualenv substantially:

Without gcovr: 1m43.279s
With gcovr and deps: 6m35.107s

This moves gcovr to its own requirements file and only installs
gcovr if this is a coverage build (detected from the
.cmake_build_type file).

Testing:
 - Verified that a coverage build does install gcovr and
   produce a report

Change-Id: I1d0fd6d21273053aaf2acee39fcb83d9093d49a2
Reviewed-on: http://gerrit.cloudera.org:8080/21849
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-09-27 00:25:41 +00:00
Joe McDonnell
ed94f31a25 IMPALA-13279: Upgrade gcovr to 7.2
In some environments, the code coverage report is empty even
though the tests ran successfully and gcno/gcda files are
written properly.

This upgrades to gcovr 7.2, which does not show the same
problem. gcovr 7.2 requires Python 3.8, so this switches to use
Python 3.8 from the toolchain and installs gcovr in the Python 3
virtualenv.

gcovr 7.2 outputs logging to stderr, so this also modifies
bin/coverage_helper.sh to redirect stderr to stdout.

Testing:
 - Verified that this can generate a report locally and on
   the affected environment

Change-Id: I5b1aaa92c65f54149a3e7230cbe56d5286f1051a
Reviewed-on: http://gerrit.cloudera.org:8080/21647
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-08-06 22:28:16 +00:00
Joe McDonnell
5071f54a4c IMPALA-12825: Install thrift into the impala-python virtualenv
impala-python currently gets its Thrift from the toolchain
by adding the appropriate Thrift toolchain directories to
the PYTHONPATH. This is a problem when switching to Python 3,
because the toolchain Thrift was built with Python 2 and
this can produce complicated bugs. In general, it is also
not a good idea to get Python dependencies from the toolchain.

This switches to installing Thrift into the impala-python
virtualenv, which lets the different Python versions have
their own copy of compiled files.

Testing:
 - Ran a core job

Change-Id: Ib36e8a1ce8d446b69b08e81ea458f95c158e28f5
Reviewed-on: http://gerrit.cloudera.org:8080/21046
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-03-01 08:06:56 +00:00
zhangyifan27
45682c132f IMPALA-12229: Support soft-delete Kudu table
Adds 'kudu_table_reserve_seconds' query option to set reserved time
for deleted Impala managed Kudu tables. The default value is 0.
This option can prevent users from deleting important Kudu tables
by mistake.

Testing:
- Added e2e tests.

Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
Reviewed-on: http://gerrit.cloudera.org:8080/20773
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-12-14 00:12:55 +00:00
Gergely Farkas
04bdb4d32c IMPALA-12552: Fix Kerberos authentication issue that occurs
in python 3 environment when kerberos_host_fqdn option is used

In Python 2, the sasl layer does not accept unicode strings,
so we have to explicitly encode the kerberos_host_fqdn string
to ascii. However, this is not the case in python 3, where
we have to omit the encode, because if we don't do this,
impala-shell wants to use the following service principal
during Kerberos auth:
my_service_name/b'my.kerberos.host.fqdn'@MY.REALM
instead of the correct one, which is:
my_service_name/my.kerberos.host.fqdn@MY.REALM
(This is because the output of the encode function
is a byte array in python 3.)
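
The difference can be reproduced with plain string formatting. This is a
sketch only; how impala-shell actually assembles the principal is not shown
here, and build_principal is a hypothetical helper.

```python
import sys


def build_principal(service_name, host_fqdn, realm):
  """Format a service principal, encoding the host only on Python 2."""
  if sys.version_info.major < 3:
    host_fqdn = host_fqdn.encode("ascii")
  return "%s/%s@%s" % (service_name, host_fqdn, realm)


correct = build_principal("my_service_name", "my.kerberos.host.fqdn",
                          "MY.REALM")

# On Python 3, formatting the encoded value embeds the bytes repr.
encoded_host = "my.kerberos.host.fqdn".encode("ascii")
broken = "%s/%s@%s" % ("my_service_name", encoded_host, "MY.REALM")
```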

Tested with new unit tests and with a snapshot build
manually in CDP PVC DS.

Change-Id: I8b157d76824ad67faf531a529256a8afe2ab9d49
Reviewed-on: http://gerrit.cloudera.org:8080/20691
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
2023-11-17 20:08:42 +00:00
Joe McDonnell
7b502f7c96 IMPALA-12240: Put gcc on the PATH when building the impala-python venv
On some systems, we have seen the build for the impala-python
virtualenv refer to system gcc directly, even though we have
specified Impala toolchain's gcc via CC. When the system gcc
is newer than Impala's gcc, it fails to execute because it needs
symbols that are not present in Impala's libstdc++:

gcc: /home/joe/impala/toolchain/toolchain-packages-gcc10.4.0/gcc-10.4.0/lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by gcc)

This adds the toolchain gcc to the PATH when building the impala-python
virtualenv. This means that any direct reference to gcc will use our
compiler rather than system gcc. We continue to have CC pointed to
our compiler.
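
A minimal sketch of building such an environment for a child process, assuming
the toolchain gcc lives in a bin directory that is passed in. The function
name and the example path are hypothetical and are not taken from the
virtualenv bootstrap script.

```python
import os


def env_with_toolchain_gcc(gcc_bin_dir, base_env=None):
  """Return a copy of the environment with toolchain gcc first on PATH."""
  env = dict(os.environ if base_env is None else base_env)
  # Prepend so that a bare 'gcc' resolves to the toolchain compiler.
  env["PATH"] = gcc_bin_dir + os.pathsep + env.get("PATH", "")
  # Keep CC pointing at the same compiler.
  env["CC"] = os.path.join(gcc_bin_dir, "gcc")
  return env


example_env = env_with_toolchain_gcc("/opt/toolchain/gcc/bin",
                                     {"PATH": "/usr/bin"})
```

The returned dict can be passed as the env argument of a subprocess call so
that the calling process's own environment is left untouched.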

Testing:
 - Ran a build on Redhat 9 where the issue presented

Change-Id: Ia5ddd6a88b41a3f8ba04d13538b3de2d9499cbf5
Reviewed-on: http://gerrit.cloudera.org:8080/20114
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-24 02:28:55 +00:00
Michael Smith
0a42185d17 IMPALA-9627: Update utility scripts for Python 3 (part 2)
We're starting to see environments where the system Python ('python') is
Python 3. Updates utility and build scripts to work with Python 3, and
updates check-pylint-py3k.sh to check scripts that use system python.

Fixes other issues found during a full build and test run with Python
3.8 as the default for 'python'.

Fixes an impala-shell tip that was supposed to have been two tips (and
had no space after period when they were printed).

Removes out-of-date deploy.py and various Python 2.6 workarounds.

Testing:
- Full build with /usr/bin/python pointed to python3
- run-all-tests passed with python pointed to python3
- ran push_to_asf.py

Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb
Reviewed-on: http://gerrit.cloudera.org:8080/19697
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-04-26 18:52:23 +00:00
Joe McDonnell
82bd087fb1 IMPALA-11973: Add absolute_import, division to all eligible Python files
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
 1. Python 3 requires absolute imports within packages. This
    can be emulated via "from __future__ import absolute_import"
 2. Python 3 changed division to "true" division that doesn't
    round to an integer. This can be emulated via
    "from __future__ import division"

This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.

I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
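
The division change can be illustrated with a short example. It shows Python 3
semantics, which the __future__ import brings to Python 2; the variable names
are made up for illustration.

```python
# In a Python 2 file the following line would come first, so that '/' and
# imports behave as they do in Python 3:
#   from __future__ import absolute_import, division, print_function

records = list(range(10))

# True division yields a float, even when both operands are integers.
ratio = len(records) / 4

# Integer division is spelled '//' when the result is used as an index.
midpoint = len(records) // 2
middle_record = records[midpoint]
```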

Testing:
 - Ran core tests

Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
566df80891 IMPALA-11959: Add Python 3 virtualenv
This adds a Python 3 equivalent to the impala-python
virtualenv based on the toolchain Python 3.7.16.
This modifies bootstrap_virtualenv.py to support
the two different modes. This adds py2-requirements.txt
and py3-requirements.txt to allow some differences
between the Python 2 and Python 3 virtualenvs.

Here are some specific package changes:
 - allpairs is replaced with allpairspy, as allpairs did
   not support Python 3.
 - requests is upgraded slightly, because otherwise it has issues
   with idna==2.8.
 - pylint is limited to Python 3, because we are adding it
   and don't need it on both
 - flake8 is limited to Python 2, because it will take
   some work to switch to a version that works on Python 3
 - cm_api is limited to Python 2, because it doesn't support
   Python 3
 - pytest-random does not support Python 3 and it is unused,
   so it is removed
 - Bump the version of setuptools-scm to support Python 3

This adds impala-pylint, which can be used to do further
Python 3 checks via --py3k. This also adds a bin/check-pylint-py3k.sh
script to enforce specific py3k checks. The banned py3k warnings
are specified in the bin/banned_py3k_warnings.txt. This is currently
empty, but this can ratchet up the py3k strictness over time
to avoid regressions.

This pulls in a new toolchain with the fix for IMPALA-11956
to get Python 3.7.16.

Testing:
 - Hand tested that the allpairs libraries produce the
   same results
 - The python3 virtualenv has no influence on regular
   tests yet

Change-Id: Ica4853f440c9a46a79bd5fb8e0a66730b0b4efc0
Reviewed-on: http://gerrit.cloudera.org:8080/19567
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Michael Smith
0c72c98f91 IMPALA-9627: Update utility scripts for Python 3
Updates utility scripts that don't use impala-python to work with Python
3 so we can build on systems that don't include Python 2 (such as SLES
15 SP4).

Primarily adds 'universal_newlines=True' to subprocess calls so they
return text rather than binary data in Python 3 with a change that's
compatible with Python 2.
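
The effect of universal_newlines=True can be sketched as follows. The command
is a stand-in chosen so that the example is self-contained; it is not one of
the commands used by the utility scripts.

```python
import subprocess
import sys

cmd = [sys.executable, "-c", "print('v1.2.3')"]

# Default: Python 3 returns bytes, which breaks code expecting str.
raw = subprocess.check_output(cmd)

# With universal_newlines=True the output is decoded to text on Python 3,
# and the argument is also accepted by Python 2.
text = subprocess.check_output(cmd, universal_newlines=True)
```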

Testing:
- built in SLES 15 SP4 container with Python 3

Change-Id: I7f4ce71fa1183aaeeca55d0666aeb113640c5cf2
Reviewed-on: http://gerrit.cloudera.org:8080/19559
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-03-01 04:53:49 +00:00
Joe McDonnell
ff62a4df39 IMPALA-11951: Add tools for checking/fixing python 3 syntax
This adds the bin/check-python-syntax.sh script, which
runs "python -m compileall" for all python files in
Impala with both python2 and python3. This detects
syntax errors in the python files. This will be
incorporated into precommit once it is clean.

This also adds future to the impala-python virtualenv.
This provides the futurize script (exposed via
impala-futurize), which can be used to automatically
fix some py2/py3 issues. Future also provides the
builtins library, which can provide python 3
functionality on python 2.

Testing:
 - Ran impala-futurize locally
 - Ran the script repeatedly while fixing syntax errors

Change-Id: Iae2c51bc6ddc9b6a04469ee1b8284227fed3bd45
Reviewed-on: http://gerrit.cloudera.org:8080/19550
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-02-28 17:11:50 +00:00
Joe McDonnell
a9cfc7b33f IMPALA-11624: Bump Impyla dependency to 0.18.0
IMPALA_THRIFT_PY_VERSION is also bumped to 0.16.0p3.
As 0.16.0p3 Thrift does not contain Python related
patches and Impyla 0.18.0 depends on Thrift 0.16.0,
now we are consistently using Thrift 0.16.0 in all
Python code. This also bumps the Thrift in the
shell's ext-py directory to 0.16.0 (based on the
Thrift 0.16.0 pypi tarball with the egg directory
removed).

Testing:
 - Ran a GVO job

Change-Id: I7265558b0e07959c606cba73cd251c3edfcb3ed5
Reviewed-on: http://gerrit.cloudera.org:8080/18456
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-02-27 20:39:26 +00:00
stiga-huang
45ea094fa2 IMPALA-11716: Bump up gcovr version to 4.2
IMPALA-9999 upgrades GCC to version 10.4, which generates a new gcov
format that the current gcovr version (3.4) can't parse. This patch
upgrades gcovr to the latest Python2-compatible version (4.2). Also adds
Jinja2, MarkupSafe and lxml as the required dependent packages. The
development packages of libxml2 and libxslt are also added in
bootstrap_system.sh and bootstrap_build.sh.

This patch also fixes a failure due to the gcov executable not found in
PATH.

Tests:
 - Verified builds on Ubuntu 16.04 and CentOS 7.9
 - Verified coverage_helper.sh works after this patch

Change-Id: I9458fa0dc97d69f88a4e8a3313dc9440215dfd52
Reviewed-on: http://gerrit.cloudera.org:8080/19226
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-11-11 00:09:05 +00:00
Joe McDonnell
cff286e751 IMPALA-9999: Switch to GCC 10.4
This upgrades GCC and libstdc++ to version 10.4. This
required patching or upgrading several dependencies
so they could compile with GCC 10. The toolchain
companion change has details on what items needed
to be upgraded and why.

The toolchain companion change switches GCC to build
with toolchain binutils rather than host binutils. This
means that the python virtualenv initialization needs
to include binutils on the path.

This disables two warnings introduced in the new GCC
versions (Wclass-memaccess and Winit-list-lifetime).
These two warnings occur in our code and also in
dependencies like LLVM and rapidjson. These are not
critical warnings, so they can be addressed
independently and reenabled later.

Binary sizes increase, particularly when including
debug symbols:
                         | GCC 7.5     | GCC 10.4
impalad RELEASE stripped |  83204768   |  88702824
impalad RELEASE          | 707278904   | 971711456
impalad DEBUG stripped   | 106677672   |  97391944
impalad DEBUG            | 725864760   | 867647512

Testing:
 - Multiple test jobs (core, release exhaustive, ASAN)
 - Performance testing for TPC-H and TPC-DS shows
   a modest improvement (2-4%).
 - Code compiles without warnings on debug and release

Change-Id: Ibe6857b822925226d39fd4d6413457ef6bbaabec
Reviewed-on: http://gerrit.cloudera.org:8080/18134
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2022-09-20 15:50:18 +00:00
Michael Smith
8b1002aa6a IMPALA-11398: Update flake8 for indent-size=2
Updates flake8 to the latest Python 2-compatible version so we can use
indent-size=2. Our code uses 2-space indents and we have previously
worked around or disabled flake8 checks that rely on 4-space indenting.

Change-Id: Ia701f6e3d86be451ae86d041b799c8a10aee2d93
Reviewed-on: http://gerrit.cloudera.org:8080/18669
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-06-30 09:29:41 +00:00
wzhou-code
397d1d15a2 IMPALA-10745: Support Kerberos over HTTP for impala-shell
This patch ports the implementation of GSSAPI authentication over http
transport from Impyla (https://github.com/cloudera/impyla/pull/415) to
impala-shell.

The implementation adds a new dependency on 'kerberos' python module,
which is a pip-installed module distributed under Apache License Version
2.
When using impala-shell with Kerberos over http, it is assumed that the
host has a preexisting kinit-cached Kerberos ticket that impala-shell
can pass to the server automatically without the user having to reenter
the password.

Testing:
 - Passed exhaustive tests.
 - Tested manually on a real cluster with a full Kerberos setup.

Change-Id: Ia59ba4004490735162adbd468a00a962165c5abd
Reviewed-on: http://gerrit.cloudera.org:8080/18493
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-05-10 03:22:41 +00:00
Michael Smith
e6ed98c22b IMPALA-11201: update gitignore files
Updates gitignore for files generated during bootstrap_development.
Fixes deleting tracked files in be/src/thirdparty. Includes ignore rules
for past versions of shell dependencies and updates ignores for current
versions.

Change-Id: I03deba5e7fb151ef8e34039becdcc3fb47684084
Reviewed-on: http://gerrit.cloudera.org:8080/18499
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-05-10 03:06:59 +00:00
yx91490
f566e7dee7 IMPALA-10994: Normalize the pip package name part of download URL.
According to PEP-0503, a pip repo server may not support unnormalized
URL access. Some package names within
'infra/python/deps/*requirements.txt' are unnormalized, e.g. 'Cython',
and pip_download.py concatenates $PYPI_MIRROR and the package name to
build the download URL directly, so the URL may be unnormalized.

Fix this by normalizing the package name in the download URL using the
method recommended in PEP-0503.
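
A minimal sketch of the PEP-0503 normalization described above. The
regular expression is the one given in PEP-0503; the helper names and the
'/simple/' URL layout are assumptions for illustration, not the actual
code in pip_download.py.

```python
import re


def normalize_package_name(name):
  """Normalize a project name as recommended by PEP 503."""
  # Runs of '-', '_' and '.' collapse to a single '-', then lowercase.
  return re.sub(r"[-_.]+", "-", name).lower()


def package_index_url(mirror, name):
  # 'mirror' stands in for $PYPI_MIRROR. The '/simple/<name>/' layout is
  # an assumption here, not taken from pip_download.py.
  return "{0}/simple/{1}/".format(mirror.rstrip("/"),
                                  normalize_package_name(name))
```

With this, 'Cython' and 'thrift_sasl' map to 'cython' and 'thrift-sasl'
before the URL is built.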

Change-Id: I479df0ad7acf3c650b8f5317372261d5e2840864
Reviewed-on: http://gerrit.cloudera.org:8080/17987
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-11-26 08:17:10 +00:00
Attila Jeges
c8aa5796d9 IMPALA-10879: Add parquet stats to iceberg manifest
This patch adds parquet stats to iceberg manifest as per-datafile
metrics.

The following metrics are supported:
- column_sizes :
  Map from column id to the total size on disk of all regions that
  store the column. Does not include bytes necessary to read other
  columns, like footers.

- null_value_counts :
  Map from column id to number of null values in the column.

- lower_bounds :
  Map from column id to lower bound in the column serialized as
  binary. Each value must be less than or equal to all non-null,
  non-NaN values in the column for the file.

- upper_bounds :
  Map from column id to upper bound in the column serialized as
  binary. Each value must be greater than or equal to all non-null,
  non-NaN values in the column for the file.

The corresponding parquet stats are collected by 'ColumnStats'
(in 'min_value_', 'max_value_', 'null_count_' members) and
'HdfsParquetTableWriter::BaseColumnWriter' (in
'total_compressed_byte_size_' member).

Testing:
- New e2e test was added to verify that the metrics are written to the
  Iceberg manifest upon inserting data.
- New e2e test was added to verify that lower_bounds/upper_bounds
  metrics are used to prune data files on querying iceberg tables.
- Existing e2e tests were updated to work with the new behavior.
- BE test for single-value serialization.

Relevant Iceberg documentation:
- Manifest:
  https://iceberg.apache.org/spec/#manifests
- Values in lower_bounds and upper_bounds maps should be Single-value
  serialized to binary:
  https://iceberg.apache.org/spec/#appendix-d-single-value-serialization
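
To make the single-value serialization referenced above concrete, here is
a rough sketch for a few primitive types, following the Iceberg spec's
appendix (fixed-width little-endian numbers, UTF-8 strings without a
length prefix). The function name and the type-name strings are
hypothetical; this is not the BE implementation from this patch.

```python
import struct


def serialize_single_value(type_name, value):
  """Serialize one bound value to bytes for a handful of primitive types."""
  if type_name == "boolean":
    return b"\x01" if value else b"\x00"
  if type_name == "int":
    return struct.pack("<i", value)   # 4-byte little-endian
  if type_name == "long":
    return struct.pack("<q", value)   # 8-byte little-endian
  if type_name == "float":
    return struct.pack("<f", value)   # 4-byte little-endian IEEE 754
  if type_name == "double":
    return struct.pack("<d", value)   # 8-byte little-endian IEEE 754
  if type_name == "string":
    return value.encode("utf-8")      # UTF-8 bytes, no length prefix
  raise ValueError("unsupported type in this sketch: " + type_name)


# A lower_bounds map for a file where column id 1 is an int with minimum 7
# and column id 2 is a string with minimum "apple" (made-up values).
lower_bounds = {
  1: serialize_single_value("int", 7),
  2: serialize_single_value("string", "apple"),
}
```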

Change-Id: Ic31f2260bc6f6a7f307ac955ff05eb154917675b
Reviewed-on: http://gerrit.cloudera.org:8080/17806
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Attila Jeges <attilaj@cloudera.com>
2021-09-02 21:34:41 +00:00
wzhou-code
237ed5e873 IMPALA-10874: Upgrade impyla to the latest version
This patch upgrades impyla to the latest version 0.18a1, which supports
cookie retention for LDAP authentications. Also adds unit-test cases
for impyla's HTTP test with LDAP authentication.

Testing:
 - Passed core tests.

Change-Id: I990e5cdde4e98d6ab3581fe48f53a5d0590ce492
Reviewed-on: http://gerrit.cloudera.org:8080/17795
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-08-25 05:52:35 +00:00
Csaba Ringhofer
94f67a3432 IMPALA-7825: Upgrade Thrift version to 0.11.0
Before this patch Impala mainly used Thrift 0.9.3, but it was
possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0
Thrift lib was already included in the toolchain.

Most of the changes are related to replacing boost:: with std::
shared_ptr-s in cpp code (this is a continuation of patch by Sahil).

The Thrift upgrade also needs an Impyla release with Thrift 0.11.0, as
Impala's test framework relies on Impyla. A thrift_sasl release is also
needed, because it currently pins Thrift version to 0.9.3 for Python 2.

The current patch uses alpha releases from Impyla and thrift_sasl that
use thrift 0.11.0.

Notable side effects:
- old logic to compile thrift for impala-shell with 0.11.0 was removed
- impala_shell's utf8 handling had to be updated as the new 0.11.0
  compilation happens with no_utf8strings. This also made things a
  bit faster, e.g. the following takes ~0.22s instead of ~0.25s:
  shell/impala_shell.py \
    -B -q "select * from functional_parquet.alltypes;" > /dev/null
- THRIFT-3921 changed the stream operators to print an enum's name
  instead of its number, leading to slightly different messages
  in some cases.
- "templates" was added to the Thrift generator's parameters to avoid
  a compilation issue (related to IMPALA-10600). I didn't notice any
  change in compilation time. This option generates .tcc files with
  templatized readers/writers for Thrift types. Currently we don't
  use these, but they could potentially speed up (de)serialization.

Testing:
- ran Impyla's test suite with Python 2 and 3
- ran core tests

Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Reviewed-on: http://gerrit.cloudera.org:8080/17170
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-04-27 13:36:54 +00:00
Jim Apple
f18e0d72a7 Upgrade urllib3 to 1.24.2
Change-Id: Ib18c76e66db2920e7e05a63b5bcd79854b819cd9
Reviewed-on: http://gerrit.cloudera.org:8080/17270
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
2021-04-06 09:26:45 +00:00
Joe McDonnell
ede22a63a5 IMPALA-10608 followup: Detect the virtualenv tarball version
When rebasing from an older commit, the version change
in virtualenv can cause there to be multiple virtualenv
tarballs of different versions in the infra/python/deps
directory. bootstrap_virtualenv.py currently doesn't
handle this gracefully, because it is looking for all
virtualenv*.tar.gz files and fails when it finds more
than one.

This changes bootstrap_virtualenv.py to get the virtualenv
version from the requirements.txt file and only look
for the tarball with that version. If it fails to get
the version, it falls back to the old method.
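
The lookup described above could be sketched roughly as follows. The
function name, the requirements line format ('virtualenv == <version>'),
and the error handling are assumptions; the real logic lives in
bootstrap_virtualenv.py and may differ.

```python
import glob
import os
import re


def find_virtualenv_tarball(deps_dir, requirements_path):
  """Return the path of the single virtualenv tarball to use."""
  version = None
  try:
    with open(requirements_path) as f:
      for line in f:
        # Assumed pin format: 'virtualenv == 16.7.10'
        m = re.match(r"^\s*virtualenv\s*==\s*([^\s#;]+)", line)
        if m:
          version = m.group(1)
          break
  except IOError:
    pass  # Could not read the file; fall back to the old method below.

  if version is not None:
    pattern = "virtualenv-{0}.tar.gz".format(version)
  else:
    # Old method: match any version, which fails if several are present.
    pattern = "virtualenv*.tar.gz"
  matches = glob.glob(os.path.join(deps_dir, pattern))
  if len(matches) != 1:
    raise RuntimeError("Expected exactly one tarball matching {0}, found {1}"
                       .format(pattern, len(matches)))
  return matches[0]
```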

Testing:
 - Copied virtualenv-16.7.10.tar.gz to virtualenv-16.7.9.tar.gz
   and verified that bootstrap_virtualenv.py works

Change-Id: Iebfa9ba5e223d5187414e02e24f34562418fae40
Reviewed-on: http://gerrit.cloudera.org:8080/17249
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2021-03-31 22:33:43 +00:00
Joe McDonnell
e7fc18c4ea IMPALA-10608: Update kudu-python version and remove some unused packages
This updates kudu-python to version 1.14.0 (from 1.2.0).
As part of this, it disables ccache for bootstrap_virtualenv.py.
ccache wasn't working anyway, because pip install uses random
temporary directories. It also needs to copy a few files to
the build directory for the Kudu install. The advantage to
upgrading is that the new version no longer has a numpy dependency.

Additionally, this modifies a few minor packages:
 - virtualenv moves to the latest version prior to the rewrite
   that accompanied version 20 (i.e. 16.7.10).
 - setuptools moves to the last version that supports python 2.7 (44.1.1)
 - remove boto3, ipython, and ordereddict

These changes speed up installing the virtualenv.
Before:
real    3m11.956s
user    2m49.620s
sys     0m14.266s
After:
real    1m38.798s
user    1m33.591s
sys     0m8.112s

Testing:
 - Hand tests, GVO run

Change-Id: Ib47770df9e46de448fe2bffef7abe2c3aa942fb9
Reviewed-on: http://gerrit.cloudera.org:8080/17231
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-31 03:17:24 +00:00
Joe McDonnell
9670c1455d IMPALA-10606 (part 2): Clean up ordering of requirements.txt
This is a followup to the original IMPALA-10606 that reorders
the requirements.txt alphabetically. No versions changed
as part of this.

Change-Id: I2f13ec8f8af80c4bac5da30d08a2ea4c56806d27
Reviewed-on: http://gerrit.cloudera.org:8080/17229
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2021-03-31 03:17:24 +00:00
Joe McDonnell
1142c7b58e IMPALA-10606: Simplify impala-python virtualenv bootstrapping
Bootstrapping the impala-python virtualenv requires multiple
rounds of pip installs with different sets of requirements.
This consolidates the requirements.txt, stage2-requirements.txt,
and compiled-requirements.txt into a single requirements.txt.
This will make it easier to upgrade python packages.

This also splits out setuptools into its own
setuptools-requirements.txt. Setuptools is used during the
pip install for several of the dependencies. Recent versions
of setuptools do not support Python 2, but some of the install
tools (like easy_install) don't know how to pick a version
of setuptools that works with Python 2. Splitting it out to its
own requirements file lets us pin the version.

To make review easier, this does not change any of the versions
of the dependencies. It also leaves the stage2-requirements.txt
and compiled-requirements.txt split out in separate sections
of requirements.txt. These will later be turned into a single
alphabetical list.

Testing:
 - Tested impala-python locally
 - Ran GVO

Change-Id: I8e920e5a257f1e0613065685078624a50d59bf2e
Reviewed-on: http://gerrit.cloudera.org:8080/17226
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2021-03-31 03:17:24 +00:00
Jim Apple
103774a8e5 Update Python requests package to 2.20.0
See https://2.python-requests.org/en/master/community/updates/#id8.
This is currently only used in the tests, but it's best to fix
this now.

While here, remove now-false note about required support for Python
2.6.

Change-Id: I092a641a12f38cdb45b0062c31ffb51c0c664800
Reviewed-on: http://gerrit.cloudera.org:8080/17215
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2021-03-30 01:27:04 +00:00
Jim Apple
e5d5dbc30a Update Paramiko to 2.4.2.
See https://www.paramiko.org/changelog.html#2.4.2. This shouldn't
directly apply to Impala deployments, but it is best to fix this in
test now.

Change-Id: If9cc9ea4a0763c8b5303ca4e8482761ee2f53efa
Reviewed-on: http://gerrit.cloudera.org:8080/17214
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-22 19:34:00 +00:00
Joe McDonnell
60f8f87b09 IMPALA-10274: Initialize impala-python as part of the CMake build
Initializing the impala-python virtualenv takes a couple minutes,
so it is useful to do that in parallel to the rest of the build.
This moves the impala-python initialization to its own step
in the CMake build. It stops using impala-python for commands
invoked from buildall.sh or the CMake build to avoid premature
or concurrent initializations of impala-python. Then, it adds
a dedicated step to initialize impala-python.

Testing:
 - Ran a core job and a couple builds
 - Rebuilt and verified that impala-python is not reinitialized
   if it is already initialized

Change-Id: Ieff51263c55bd234028fed7101c94b4a928590f0
Reviewed-on: http://gerrit.cloudera.org:8080/16607
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-02-04 17:03:57 +00:00
Tim Armstrong
b8a2b75466 IMPALA-10225: bump impyla version to 0.17a1
Update a couple of tests with the new improved error messages.

Change-Id: I70a0e883275f3c29e2b01fd5bab7725857c8a1ed
Reviewed-on: http://gerrit.cloudera.org:8080/16562
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-10 02:08:22 +00:00
guojingfeng
7baa31ea04 IMPALA-10093: Replace urllib with wget to download python deps
When building Impala on a company-internal network, pip_download.py
failed to download dependency eggs from the https endpoint even though
the system proxy (http_proxy, https_proxy) was set correctly. This is
an issue with Python 2's urllib. This replaces urllib with wget, which
works well with system proxy settings like https_proxy.
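
As an illustration only, a download helper that shells out to wget might
look like the sketch below. The function names and the choice of flags
are assumptions, not the exact code in pip_download.py. wget reads
http_proxy and https_proxy from the environment, so no proxy handling is
needed in Python.

```python
import subprocess


def build_wget_command(url, dest_path):
  # -nv keeps the output short; -O writes to the given destination file.
  return ["wget", "-nv", url, "-O", dest_path]


def download(url, dest_path):
  # Raises subprocess.CalledProcessError if wget exits with a non-zero
  # status, so the caller can retry or abort.
  subprocess.check_call(build_wget_command(url, dest_path))
```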

Change-Id: I146d93312701fd682420cb65cf4738bc030f3cfb
Reviewed-on: http://gerrit.cloudera.org:8080/16344
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-11 16:43:35 +00:00
Tim Armstrong
6ec6aaae8e IMPALA-3695: Remove KUDU_IS_SUPPORTED
Testing:
Ran exhaustive tests.

Change-Id: I059d7a42798c38b570f25283663c284f2fcee517
Reviewed-on: http://gerrit.cloudera.org:8080/16085
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-06-18 01:11:18 +00:00
Joe McDonnell
13fbe510c0 IMPALA-9838: Switch to GCC 7.5.0
This upgrades GCC and libstdc++ to version 7.5.0. There
have been ABI changes since 4.9.2, so this means that
the native-toolchain produced with the new compiler is
not interoperable with one produced by the old compiler.
To allow that transition, IMPALA_TOOLCHAIN_PACKAGES_HOME
is now a subdirectory of IMPALA_TOOLCHAIN
(toolchain-packages-gcc${IMPALA_GCC_VERSION}) to distinguish
it from the old packages.

Some Python packages in the impala-python virtualenv are
compiled using the toolchain GCC and now use the new ABI.
This leads to two changes:
1. When constructing the LD_LIBRARY_PATH for impala-python,
we include the GCC libstdc++ libraries. Otherwise, certain
Python packages that use C++ fail on older OSes like Centos 7.
This fixes IMPALA-9804.
2. Since developers work on various branches, this changes
the virtualenv's directory location to a directory with
the GCC version in the name. This allows the virtualenv
built with GCC 7 to coexist with the current virtualenv
built with GCC 4.9.2. The location for the old virtualenv is
${IMPALA_HOME}/infra/python/env. The new location is
${IMPALA_HOME}/infra/python/env-gcc${IMPALA_GCC_VERSION}. This
required updating several impala-python scripts.
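
A small sketch of how a script could compute the virtualenv location
described in item 2. The helper name is hypothetical; the two directory
layouts are the ones stated above.

```python
import os


def virtualenv_dir(impala_home, gcc_version=None):
  """Return the impala-python virtualenv directory.

  With no GCC version this gives the old location
  (${IMPALA_HOME}/infra/python/env); otherwise the new location that
  embeds ${IMPALA_GCC_VERSION}.
  """
  base = os.path.join(impala_home, "infra", "python")
  if gcc_version is None:
    return os.path.join(base, "env")
  return os.path.join(base, "env-gcc" + gcc_version)
```

Because the two directories differ, a checkout on an older branch and a
checkout on a GCC 7 branch can each keep their own virtualenv.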

There are various odds-and-ends related to the transition:
1. Due to the small string optimization, the size of std::string
changed, which means that various data structures also changed
in size. This required updating some static asserts.
2. There is a bug in clang-tidy that reports a use-after-free
for some code using std::shared_ptr. Clang is not modeling
the shared_ptr correctly, so it is a false-positive. As a workaround,
this disables the clang-analyzer-cplusplus.NewDelete diagnostic.
3. Various small compilation fixes (includes, etc).

Performance testing:
 - Ran single-node performance tests on TPC-H for the following
   configurations:
    - TPC-H Parquet scale 30 with normal configurations
    - TPC-H Parquet scale 30 with codegen disabled
    - TPC-H Kudu scale 10
   None found any significant regressions. Full results are
   posted on the JIRA.
 - Ran single-node performance tests on targeted-perf scale 10.
   No significant regressions.
 - The size of binaries (impalad, etc) is slightly smaller with the new GCC:
   GCC 4.9.2 release impalad binary: 545664
   GCC 7.5.0 release impalad binary: 539900
 - Compilation in DEBUG mode is roughly 15-25% faster

Functional testing:
 - Ran core jobs, exhaustive release jobs, UBSAN

Change-Id: Ia0beb2b618ba669c9699f8dbc0c52d1203d004e4
Reviewed-on: http://gerrit.cloudera.org:8080/16045
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-06-15 23:42:12 +00:00
Joe McDonnell
56ee90c598 IMPALA-9760: Add IMPALA_TOOLCHAIN_PACKAGES_HOME to prepare for GCC7
The locations for native-toolchain packages in IMPALA_TOOLCHAIN
currently do not include the compiler version. This means that
the toolchain can't distinguish between native-toolchain packages
built with gcc 4.9.2 versus gcc 7.5.0. The collisions can cause
issues when switching back and forth between branches.

This introduces the IMPALA_TOOLCHAIN_PACKAGES_HOME environment
variable, which is a location inside IMPALA_TOOLCHAIN that would
hold native-toolchain packages. Currently, it is set to the same
as IMPALA_TOOLCHAIN, so there is no difference in behavior.
This lays the groundwork to add the compiler version to this
path when switching to GCC7.

Testing:
 - The only impediment to building with
   IMPALA_TOOLCHAIN_PACKAGES_HOME=$IMPALA_TOOLCHAIN/test is
   Impala-lzo. With a custom Impala-lzo, compilation succeeds.
   Either Impala-lzo will be fixed or it will be removed.
 - Core tests

Change-Id: I1ff641e503b2161baf415355452f86b6c8bfb15b
Reviewed-on: http://gerrit.cloudera.org:8080/15991
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-30 16:25:37 +00:00
Laszlo Gaal
b921d982b5 IMPALA-9668: Obey SKIP_TOOLCHAIN_BOOTSTRAP during virtualenv bootstrap
IMPALA-9626 broke the use case where the toolchain binaries are not
downloaded from the native-toolchain S3 bucket, because
SKIP_TOOLCHAIN_BOOTSTRAP is set to true.

Fix this use case by checking SKIP_TOOLCHAIN_BOOTSTRAP in
bin/bootstrap_environment.py:
- if true: just check if the specified version of the Python binary is
  present at the expected toolchain location. If it is there, use it,
  otherwise throw an exception and abort the bootstrap process.
- in any other case: proceed to download the Python binary as in
  bootstrap_toolchain.py.
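
The decision described above can be sketched as follows. The function
name, its parameters, and the use of RuntimeError are assumptions made
for illustration; the actual check is part of the bootstrap scripts.

```python
import os


def resolve_toolchain_python(python_path, download_fn, environ=None):
  """Return the toolchain Python binary path, downloading it if allowed."""
  environ = os.environ if environ is None else environ
  if environ.get("SKIP_TOOLCHAIN_BOOTSTRAP", "false").lower() == "true":
    # Custom toolchain: never download, only verify the binary exists.
    if not os.path.isfile(python_path):
      raise RuntimeError("Python binary not found at " + python_path)
    return python_path
  # Any other case: fetch the binary as the toolchain bootstrap would.
  download_fn()
  return python_path
```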

Test:
- simulate the custom toolchain setup by downloading the toolchain
  binaries from the S3 bucket, copying them to a separate directory,
  symlinking them into Impala/toolchain, then executing buildall.sh
  with SKIP_TOOLCHAIN_BOOTSTRAP set to "true".

Change-Id: Ic51b3c327b3cebc08edff90de931d07e35e0c319
Reviewed-on: http://gerrit.cloudera.org:8080/15759
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-22 21:56:01 +00:00
David Knupp
c26e3db4bd IMPALA-9362: Upgrade sqlparse 0.1.19 -> 0.3.1
Upgrades the impala-shell's bundled version of sqlparse to 0.3.1.
There were some API changes in 0.2.0+ that required a re-write of
the StripLeadingCommentFilter in impala_shell.py. A slight perf
optimization was also added to avoid using the filter altogether
if no leading comment is readily discernible.
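
One way such a fast path might look, as a sketch only: a cheap string
check decides whether the statement needs the comment-stripping filter
at all. The function name and the exact comment markers checked are
assumptions, not the code in impala_shell.py.

```python
def may_have_leading_comment(sql):
  """Cheap pre-check: does the statement appear to start with a comment?"""
  stripped = sql.lstrip()
  return stripped.startswith("--") or stripped.startswith("/*")
```

Statements for which this returns False can skip the sqlparse filter
entirely, which avoids the parsing cost for the common case.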

As 0.1.19 was the last version of sqlparse to support python 2.6,
this patch also breaks Impala's compatibility with python 2.6.

No new tests were added, but all existing tests passed without
modification.

Change-Id: I77a1fd5ae311634a18ee04b8c389d8a3f3a6e001
Reviewed-on: http://gerrit.cloudera.org:8080/15642
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-17 05:04:23 +00:00
Laszlo Gaal
c97191b6a5 IMPALA-9626: Use Python from the toolchain for Impala
Historically Impala used the Python2 version that was available on
the hosting platform, as long as that version was at least v2.6.
This caused a constant headache, as all Python syntax had to be kept
compatible with Python 2.6 (for Centos 6). It also caused a recent problem
on Centos 8: here the system Python version was compiled with the
system's GCC version (v8.3), which was much more recent than the Impala
standard compiler version (GCC 4.9.2). When the Impala virtualenv was
built, the system Python version supplied C compiler switches for modules
containing native code that were unknown to the Impala version of GCC,
thus breaking virtualenv installation.

This patch changes the Impala virtualenv to always use the Python2
version from the toolchain, which is built with the toolchain compiler.

This ensures that
- Impala always has a known Python 2.7 version for all its scripts,
- virtualenv modules based on native code will always be installable, as
  the Python environment and the modules are built with the same compiler
  version.

Additional changes:
- Add an auto-use fixture to conftest.py to check that the tests are
  being run with Python 2.7.x
- Make bootstrap_toolchain.py independent from the Impala virtualenv:
  remove the dependency on the "sh" library

Tests:
- Passed core-mode tests on CentOS 7.4
- Passed core-mode tests in Docker-based mode for centos:7
  and ubuntu:16.04

Most content in this patch was developed but not published earlier
by Tim Armstrong.

Change-Id: Ic7b40cef89cfb3b467b61b2d54a94e708642882b
Reviewed-on: http://gerrit.cloudera.org:8080/15624
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-16 01:08:00 +00:00
David Knupp
5c541512f0 IMPALA-9582: Upgrade thrift_sasl to 0.4.2 for impala-shell
Change-Id: Iff739ebeaf5b022a7418883b638b5c5d17885f3b
Reviewed-on: http://gerrit.cloudera.org:8080/15610
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-01 04:22:38 +00:00