IMPALA-10994: Normalize the pip package name part of download URL.

According to PEP 503, clients must request the normalized project URL
and must not rely on a repository server redirecting unnormalized URLs.
Some package names in 'infra/python/deps/*requirements.txt' are
unnormalized, e.g. 'Cython', and pip_download.py concatenates
$PYPI_MIRROR and the package name directly to build the download URL,
so the resulting URL may be unnormalized.

Fix this by normalizing the package name in the download URL using the
method recommended in PEP 503.
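
As a minimal sketch of the normalization rule from PEP 503 (runs of '-',
'_' and '.' collapse to a single '-', then the name is lowercased), the
helper names normalize_name and simple_index_url and the mirror
'https://pypi.example.org' below are hypothetical and are not part of
pip_download.py:

```python
import re


def normalize_name(pkg_name):
  """Normalize a project name as recommended by PEP 503."""
  # Collapse any run of '-', '_' or '.' into a single '-' and lowercase.
  return re.sub(r"[-_.]+", "-", pkg_name).lower()


def simple_index_url(mirror, pkg_name):
  """Build the simple-index URL for a package (hypothetical helper)."""
  # Strip a trailing slash so the mirror may be given with or without one.
  return '{0}/simple/{1}/'.format(mirror.rstrip('/'), normalize_name(pkg_name))


if __name__ == '__main__':
  # 'https://pypi.example.org' is a placeholder mirror for illustration only.
  for name in ('Cython', 'zope.interface', 'Foo__Bar'):
    print(simple_index_url('https://pypi.example.org', name))
```

With this rule, 'Cython' maps to 'cython' and 'zope.interface' maps to
'zope-interface', so differently spelled requirements resolve to the same
index page.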

Change-Id: I479df0ad7acf3c650b8f5317372261d5e2840864
Reviewed-on: http://gerrit.cloudera.org:8080/17987
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

Author: yx91490
Date: 2021-11-01 12:33:50 +08:00
Committed by: Impala Public Jenkins
Parent: da53428abc
Commit: f566e7dee7

@@ -82,7 +82,8 @@ def get_package_info(pkg_name, pkg_version):
   # to sort them and return the first value in alphabetical order. This ensures that the
   # same result is always returned even if the ordering changed on the server.
   candidates = []
-  url = '{0}/simple/{1}/'.format(PYPI_MIRROR, pkg_name)
+  normalized_name = re.sub(r"[-_.]+", "-", pkg_name).lower()
+  url = '{0}/simple/{1}/'.format(PYPI_MIRROR, normalized_name)
   print('Getting package info from {0}'.format(url))
   # The web page should be in PEP 503 format (https://www.python.org/dev/peps/pep-0503/).
   # We parse the page with regex instead of an html parser because that requires