impala/shell/packaging/setup.py
David Knupp bc9d7e063d IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.
This is the main patch for making the impala-shell cross-compatible with
python 2 and python 3. The goal is to wind up with a version of the shell that will
pass python e2e tests irrespective of the version of python used to launch the
shell, under the assumption that the test framework itself will continue to run
with python 2.7.x for the time being.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  Example usage:
  USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- The $IMPALA_HOME/bin/impala-shell.sh now sets up the impala-shell python
  environment independently from bin/set-pythonpath.sh. The default version
  of thrift is thrift-0.11.0 (See IMPALA-9489).

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell with no authentication using Python 3.7.5
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell with LDAP-based authentication using Python 2.7.12
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result, this inconsistency was reflected in the way we handled
  strings in the impala-shell code, but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in many, if not most, shell tests failing
  under python 3.

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could, and try to
  implement what has colloquially been called a "unicode sandwich" -- namely,
  "bytes on the outside, unicode on the inside, encode/decode at the edges."

  The primary spot in the shell where we call decode() now is when sanitising
  input...

  args = self.sanitise_input(args.decode('utf-8'))

  ...and also whenever a library like re required it. Similarly, str.encode()
  is primarily used where a library like readline or csv requires it.

- PYTHONIOENCODING needs to be set to utf-8 to override the default setting for
  python 2. Without this, piping or redirecting stdout results in unicode errors.

- from __future__ import unicode_literals was added throughout
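
A minimal sketch of the "unicode sandwich" described above, for illustration
only. The helper names to_text(), to_bytes() and find_keywords() are
hypothetical and are not taken from the impala-shell code; the sketch only
assumes the standard re module and runs under both python 2.7 and python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import re


def to_text(raw, encoding='utf-8'):
  """Decode at the input edge: bytes become text, text passes through."""
  if isinstance(raw, bytes):
    return raw.decode(encoding)
  return raw


def to_bytes(text, encoding='utf-8'):
  """Encode at the output edge: text becomes bytes, bytes pass through."""
  if isinstance(text, bytes):
    return text
  return text.encode(encoding)


def find_keywords(raw_query, keyword):
  """Hypothetical helper: normalise both arguments to text before using re.

  Mixing a text pattern with a bytes subject raises TypeError in python 3,
  so everything is decoded first and only text is handled on the inside.
  """
  return re.findall(to_text(keyword), to_text(raw_query))
```

With helpers like these, find_keywords(b'foobar', 'foo') works in both
interpreters instead of raising the TypeError shown earlier, because decoding
happens once, at the boundary.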

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and manually ran the tests against that.

  No effort has been made at this point to come up with a way to integrate
  testing of the shell in a python3 environment into our automated test
  processes.

Change-Id: Idb004d352fe230a890a6b6356496ba76c2fab615
Reviewed-on: http://gerrit.cloudera.org:8080/15524
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-18 05:13:50 +00:00


#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Set up the Impala shell python package."""

import datetime
import os
import re
import sys
import time

from impala_shell import impala_build_version
from setuptools import find_packages, setup
from textwrap import dedent

CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))


def parse_requirements(requirements_file='requirements.txt'):
  """
  Parse requirements from the requirements file, stripping comments.

  Args:
    requirements_file: path to a requirements file

  Returns:
    a list of python packages
  """
  lines = []
  with open(requirements_file) as reqs:
    for _ in reqs:
      line = _.split('#')[0]
      if line.strip():
        lines.append(line)
  return lines


def get_version():
  """Generate package version string when calling 'setup.py'.

  When setup.py is being used to CREATE a distribution, e.g., via setup.py sdist
  or setup.py bdist, then use the output from impala_build_version.get_version(),
  and append modifiers as specified by the RELEASE_TYPE and OFFICIAL environment
  variables. By default, the package created will be a dev release, designated
  by timestamp. For example, if get_version() returns the string 3.0.0-SNAPSHOT,
  the package version may be something like 3.0.0.dev20180322154653.

  It's also possible to set an environment variable for BUILD_VERSION to override the
  default build value returned from impala_build_version.get_version().

  E.g., to specify an official 3.4 beta 2 release (3.4b2), one would call:

    BUILD_VERSION=3.4 RELEASE_TYPE=b2 OFFICIAL=true python setup.py sdist

  The generated version string will be written to a version.txt file to be
  referenced when the distribution is installed.

  When setup.py is invoked during installation, e.g., via pip install or
  setup.py install, read the package version from the version.txt file, which
  is presumed to contain a single line containing a valid PEP-440 version string.
  The file should have been generated when the distribution being installed was
  created. (Although a version.txt file can also be created manually.)

  See https://www.python.org/dev/peps/pep-0440/ for more info on python
  version strings.

  Returns:
    A package version string compliant with PEP-440
  """
  version_file = os.path.join(CURRENT_DIR, 'version.txt')

  if not os.path.isfile(version_file):
    # If setup.py is being executed to create a distribution, e.g., via setup.py
    # sdist or setup.py bdist, then derive the version and WRITE the version.txt
    # file that will later be used for installations.
    if os.getenv('BUILD_VERSION') is not None:
      package_version = os.getenv('BUILD_VERSION')
    else:
      # Raw strings keep the regex escapes from being read as string escapes.
      version_match = re.search(r'\d+\.\d+\.\d+', impala_build_version.get_version())
      if version_match is None:
        sys.exit('Unable to acquire Impala version.')
      package_version = version_match.group(0)

    # packages can be marked as alpha, beta, or rc RELEASE_TYPE
    release_type = os.getenv('RELEASE_TYPE')
    if release_type:
      # Anchor the pattern so that trailing characters (e.g. 'b2x') are rejected.
      if not re.match(r'(a|b|rc)\d+$', release_type):
        msg = """\
        RELEASE_TYPE \'{0}\' does not conform to any PEP-440 release format:

          aN (for alpha releases)
          bN (for beta releases)
          rcN (for release candidates)

        where N is the number of the release"""
        sys.exit(dedent(msg).format(release_type))
      package_version += release_type

    # packages that are not marked OFFICIAL have ".dev" + a timestamp appended
    if os.getenv('OFFICIAL') != 'true':
      epoch_t = time.time()
      ts_fmt = '%Y%m%d%H%M%S'
      timestamp = datetime.datetime.fromtimestamp(epoch_t).strftime(ts_fmt)
      package_version = '{0}.dev{1}'.format(package_version, timestamp)

    # Write to the same path that was checked above, not the current directory.
    with open(version_file, 'w') as version_fh:
      version_fh.write(package_version)
  else:
    # If setup.py is being invoked during installation, e.g., via pip install
    # or setup.py install, we expect a version.txt file from which to READ the
    # version string.
    with open(version_file) as version_fh:
      # Strip the trailing newline so the version string stays PEP-440 valid.
      package_version = version_fh.readline().strip()

  return package_version


setup(
  name='impala_shell',
  python_requires='>2.6',
  version=get_version(),
  description='Impala Shell',
  long_description_content_type='text/markdown',
  long_description=open('README.md').read(),
  author="Impala Dev",
  author_email='dev@impala.apache.org',
  url='https://impala.apache.org/',
  license='Apache Software License',
  packages=find_packages(),
  include_package_data=True,
  install_requires=parse_requirements(),
  entry_points={
    'console_scripts': [
      'impala-shell = impala_shell.impala_shell:impala_shell_main'
    ]
  },
  classifiers=[
    'Development Status :: 5 - Production/Stable',
    'Environment :: Console',
    'Intended Audience :: Developers',
    'Intended Audience :: End Users/Desktop',
    'Intended Audience :: Science/Research',
    'License :: OSI Approved :: Apache Software License',
    'Operating System :: MacOS :: MacOS X',
    'Operating System :: POSIX :: Linux',
    'Programming Language :: Python :: 2 :: Only',
    'Programming Language :: Python :: 2.6',
    'Programming Language :: Python :: 2.7',
    'Topic :: Database :: Front-Ends'
  ]
)
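
The version rules described in the get_version() docstring can be tried out in isolation. The following is a sketch only, not part of setup.py: build_version_string() is a hypothetical helper that takes its inputs as arguments instead of reading BUILD_VERSION, RELEASE_TYPE and OFFICIAL from the environment, and it raises ValueError rather than calling sys.exit(). It also assumes that a RELEASE_TYPE with trailing characters should be rejected, so the pattern is anchored with `$`.

```python
import datetime
import re


def build_version_string(raw_version, release_type=None, official=False, now=None):
  """Hypothetical pure-function variant of the version logic in get_version()."""
  # Keep only the leading X.Y.Z part, e.g. '3.0.0' from '3.0.0-SNAPSHOT'.
  match = re.search(r'\d+\.\d+\.\d+', raw_version)
  if match is None:
    raise ValueError('Unable to acquire Impala version.')
  version = match.group(0)

  # Optional pre-release marker: aN, bN or rcN.
  if release_type:
    if not re.match(r'(a|b|rc)\d+$', release_type):
      raise ValueError('Invalid RELEASE_TYPE: {0!r}'.format(release_type))
    version += release_type

  # Anything that is not official becomes a timestamped dev release.
  if not official:
    now = now or datetime.datetime.now()
    version = '{0}.dev{1}'.format(version, now.strftime('%Y%m%d%H%M%S'))
  return version


stamp = datetime.datetime(2018, 3, 22, 15, 46, 53)
print(build_version_string('3.0.0-SNAPSHOT', now=stamp))
print(build_version_string('3.4.0-RELEASE', release_type='b2', official=True))
```

With the fixed timestamp above, the first call prints 3.0.0.dev20180322154653 (the example given in the docstring) and the second prints 3.4.0b2. Passing the clock in as an argument is a choice made here purely to keep the sketch deterministic.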