IMPALA-10994: Normalize the pip package name part of the download URL.

According to PEP 503, a pip repository server is not required to support access through unnormalized URLs, and some package names in 'infra/python/deps/*requirements.txt' are unnormalized, e.g. 'Cython'. pip_download.py concatenates $PYPI_MIRROR and the package name directly to build the download URL, so the resulting URL may be unnormalized. Fix this by normalizing the package name in the download URL using the method recommended in PEP 503.

Change-Id: I479df0ad7acf3c650b8f5317372261d5e2840864
Reviewed-on: http://gerrit.cloudera.org:8080/17987
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
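
The normalization described in the commit message can be illustrated with a minimal standalone sketch. It uses the same regular expression as the patch; the helper names `normalize_name` and `simple_index_url` and the mirror address `EXAMPLE_MIRROR` are assumptions made for illustration and do not appear in pip_download.py.

```python
import re

# Assumption: a placeholder standing in for the PYPI_MIRROR setting.
EXAMPLE_MIRROR = "https://pypi.example.org"


def normalize_name(pkg_name):
    """Collapse runs of '-', '_' and '.' into a single '-' and lowercase the
    result, which is the normalization recommended in PEP 503."""
    return re.sub(r"[-_.]+", "-", pkg_name).lower()


def simple_index_url(mirror, pkg_name):
    """Build the simple-index URL for a package (hypothetical helper)."""
    return "{0}/simple/{1}/".format(mirror, normalize_name(pkg_name))


if __name__ == "__main__":
    # Names written as they might appear in a requirements file.
    for name in ("Cython", "Flask_SQLAlchemy", "zope.interface"):
        print(simple_index_url(EXAMPLE_MIRROR, name))
```

With this rule, 'Cython' is requested as 'cython', so the URL no longer depends on the server redirecting unnormalized names.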
2025-12-19 18:12:08 -05:00 · 2021-11-01 12:33:50 +08:00
parent da53428abc
commit f566e7dee7
1 changed file with 2 additions and 1 deletion
--- a/infra/python/deps/pip_download.py
+++ b/infra/python/deps/pip_download.py
@@ -82,7 +82,8 @@ def get_package_info(pkg_name, pkg_version):
  # to sort them and return the first value in alphabetical order. This ensures that the
  # same result is always returned even if the ordering changed on the server.
  candidates = []
-  url = '{0}/simple/{1}/'.format(PYPI_MIRROR, pkg_name)
+  normalized_name = re.sub(r"[-_.]+", "-", pkg_name).lower()
+  url = '{0}/simple/{1}/'.format(PYPI_MIRROR, normalized_name)
  print('Getting package info from {0}'.format(url))
  # The web page should be in PEP 503 format (https://www.python.org/dev/peps/pep-0503/).
  # We parse the page with regex instead of an html parser because that requires