To remove the dependency on Python 2, existing scripts need to use
python3 rather than python. These commands find those
locations (for impala-python and regular python):
git grep impala-python | grep -v impala-python3 | grep -v impala-python-common | grep -v init-impala-python
git grep bin/python | grep -v python3
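For illustration, the first filter can be sketched in Python. This is a minimal sketch rather than tooling from this change: the function name and the idea of passing in pre-collected `git grep` output lines are assumptions made for the example.

```python
# Names that contain "impala-python" but are not Python 2 entry points.
EXCLUDED = ("impala-python3", "impala-python-common", "init-impala-python")


def remaining_impala_python_uses(lines):
    """Keep lines that mention impala-python but none of the excluded names.

    Mirrors `grep impala-python | grep -v ...`: a line is dropped as soon as
    it contains any excluded substring, even if it also contains a bare
    "impala-python".
    """
    return [
        line
        for line in lines
        if "impala-python" in line and not any(name in line for name in EXCLUDED)
    ]
```

Feeding it the lines printed by `git grep impala-python` should leave only the locations that still need attention.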
This removes or switches most of these locations by various means:
1. If a python file has a #!/bin/env impala-python (or python) but
doesn't have a main function, it removes the hash-bang and makes
sure that the file is not executable.
2. Most scripts can simply switch from impala-python to impala-python3
(or python to python3) with minimal changes.
3. The cm-api PyPI package (which doesn't support Python 3) has been
replaced by the cm-client PyPI package, and its interfaces have changed.
Rather than migrating the code (which hasn't been used in years), this
deletes the old code and stops installing cm-api into the virtualenv.
The code can be restored and revamped if there is any interest in
interacting with CM clusters.
4. This switches tests/comparison over to impala-python3, but this code has
bit-rotted. Some pieces can be run manually, but it can't be fully
verified with Python 3. It shouldn't hold back the migration on its own.
5. This also replaces references to impala-python in comments / documentation /
READMEs.
6. kazoo (used for interacting with HBase) needed to be upgraded to a
version that supports Python 3. The newest version of kazoo requires
upgrades of other component versions, so this uses kazoo 2.8.0 to avoid
needing other upgrades.
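Item 1 above can be sketched roughly as follows. This is a hedged illustration, not the script used for this change: the function name is hypothetical, and treating a `__main__` guard as the sign of "has a main function" is an assumption.

```python
import os
import stat


def strip_shebang_if_library(path):
    """Remove a Python hash-bang and the exec bits from a file with no main guard.

    Returns True if the file was modified, False otherwise.
    """
    with open(path) as f:
        lines = f.readlines()

    has_shebang = bool(lines) and lines[0].startswith("#!") and "python" in lines[0]
    # Assumption: a "__main__" guard marks a file that is meant to be executed.
    has_main = any("__main__" in line for line in lines)
    if not has_shebang or has_main:
        return False

    # Drop the hash-bang line and keep the rest of the file unchanged.
    with open(path, "w") as f:
        f.writelines(lines[1:])

    # Clear the executable bits for user, group, and other.
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
    return True
```

Files that do have a main guard are left alone here; those fall under item 2 and only need their interpreter switched.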
The two remaining uses of impala-python are:
- bin/cmake_aux/create_virtualenv.sh
- bin/impala-env-versioned-python
These will be removed separately when we drop Python 2 support
completely. In particular, these are useful for testing impala-shell
with Python 2 until we stop supporting Python 2 for impala-shell.
The docker-based tests still use /usr/bin/python, but this can
be switched over independently (and doesn't impact impala-python).
Testing:
- Ran core job
- Ran build + dataload on CentOS 7, Red Hat 8
- Manual testing of individual scripts (except some bit-rotted areas like the
random query generator)
Change-Id: If209b761290bc7e7c716c312ea757da3e3bca6dc
Reviewed-on: http://gerrit.cloudera.org:8080/23468
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
If an external table contains data files in subdirectories, and
recursive listing is enabled, Impala considers the files in the
subdirectories as part of the table. However, currently INSERT OVERWRITE
and TRUNCATE do not always delete these files, leading to data
corruption.
This change takes care of TRUNCATE.
Currently TRUNCATE can be run in two different ways:
- if the table is being replicated, the HMS API is used
- otherwise catalogd deletes the files itself.
Two differences between these methods are:
- calling HMS leads to an ALTER_TABLE event
- calling HMS leads to recursive delete while catalogd only
deletes files directly in the partition/table directory.
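The second difference can be illustrated on a local directory tree. This is only an analogy written for this description, assuming a plain local filesystem; it is not the catalogd or HMS code, and both function names are hypothetical.

```python
import os


def delete_top_level_files(directory):
    """Delete only the files directly inside `directory`; return the count."""
    removed = 0
    for entry in list(os.scandir(directory)):
        if entry.is_file():
            os.remove(entry.path)
            removed += 1
    return removed


def delete_files_recursively(directory):
    """Delete every file under `directory`, including subdirectories."""
    removed = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            os.remove(os.path.join(root, name))
            removed += 1
    return removed
```

With a layout such as `t/a.txt` and `t/sub/b.txt`, the first function leaves `t/sub/b.txt` behind. With recursive listing enabled, a leftover file like that would still be treated as table data after the TRUNCATE.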
This commit introduces the '--truncate_external_tables_with_hms' startup
flag, with default value 'true'. If this flag is set to true, Impala
always uses the HMS API for TRUNCATE operations.
Note that HMS always deletes stats on TRUNCATE, so setting the
DELETE_STATS_IN_TRUNCATE query option to false is not supported if
'--truncate_external_tables_with_hms' is set to true: an exception is
thrown.
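The resulting decision can be summarized with a small sketch. The function, its parameter names, and the returned strings are hypothetical and only model the rules stated above; the sketch ignores table type, and it does not cover what happens for a replicated table when the flag is false and DELETE_STATS_IN_TRUNCATE is false, since that case is not described here.

```python
def choose_truncate_path(is_replicated, truncate_with_hms=True,
                         delete_stats_in_truncate=True):
    """Return "hms" or "catalogd" for a TRUNCATE, following the rules above."""
    if truncate_with_hms:
        # HMS always deletes stats, so keeping them cannot be honored.
        if not delete_stats_in_truncate:
            raise ValueError(
                "DELETE_STATS_IN_TRUNCATE=false is not supported when "
                "--truncate_external_tables_with_hms is true")
        return "hms"
    # Flag off: the earlier behavior, where only replicated tables go through HMS.
    return "hms" if is_replicated else "catalogd"
```

For example, `choose_truncate_path(is_replicated=False)` returns `"hms"` under the default flag value, whereas passing `truncate_with_hms=False` returns `"catalogd"`.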
Testing:
- extended the tests in test_recursive_listing.py::TestRecursiveListing
to include TRUNCATE
- Moved tests with DELETE_STATS_IN_TRUNCATE=0 from truncate-table.test
to truncate-table-no-delete-stats.test, which is run in a new custom
cluster test (custom_cluster/test_no_delete_stats_in_truncate.py).
Change-Id: Ic0fcc6cf1eca8a0bcf2f93dbb61240da05e35519
Reviewed-on: http://gerrit.cloudera.org:8080/23166
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>