impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 18:12:08 -05:00

Author	SHA1	Message	Date
Joe McDonnell	0c7c6a335e	IMPALA-11977: Fix Python 3 broken imports and object model differences Python 3 changed some object model methods: - __nonzero__ was removed in favor of __bool__ - func_dict / func_name were removed in favor of __dict__ / __name__ - The next() function was deprecated in favor of __next__ (Code locations should use next(iter) rather than iter.next()) - metaclasses are specified a different way - Locations that specify __eq__ should also specify __hash__ Python 3 also moved some packages around (urllib2, Queue, httplib, etc), and this adapts the code to use the new locations (usually handled on Python 2 via future). This also fixes the code to avoid referencing exception variables outside the exception block and variables outside of a comprehension. Several of these seem like false positives, but it is better to avoid the warning. This fixes these pylint warnings: bad-python3-import eq-without-hash metaclass-assignment next-method-called nonzero-method exception-escape comprehension-escape Testing: - Ran core tests - Ran release exhaustive tests Change-Id: I988ae6c139142678b0d40f1f4170b892eabf25ee Reviewed-on: http://gerrit.cloudera.org:8080/19592 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-03-09 17:17:57 +00:00
Joe McDonnell	eb66d00f9f	IMPALA-11974: Fix lazy list operators for Python 3 compatibility Python 3 changes list operators such as range, map, and filter to be lazy. Some code that expects the list operators to happen immediately will fail. e.g. Python 2: range(0,5) == [0,1,2,3,4] True Python 3: range(0,5) == [0,1,2,3,4] False The fix is to wrap locations with list(). i.e. Python 3: list(range(0,5)) == [0,1,2,3,4] True Since the base operators are now lazy, Python 3 also removes the old lazy versions (e.g. xrange, ifilter, izip, etc). This uses future's builtins package to convert the code to the Python 3 behavior (i.e. xrange -> future's builtins.range). Most of the changes were done via these futurize fixes: - libfuturize.fixes.fix_xrange_with_import - lib2to3.fixes.fix_map - lib2to3.fixes.fix_filter This eliminates the pylint warnings: - xrange-builtin - range-builtin-not-iterating - map-builtin-not-iterating - zip-builtin-not-iterating - filter-builtin-not-iterating - reduce-builtin - deprecated-itertools-function Testing: - Ran core job Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f Reviewed-on: http://gerrit.cloudera.org:8080/19589 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
Joe McDonnell	82bd087fb1	IMPALA-11973: Add absolute_import, division to all eligible Python files This takes steps to make Python 2 behave like Python 3 as a way to flush out issues with running on Python 3. Specifically, it handles two main differences: 1. Python 3 requires absolute imports within packages. This can be emulated via "from __future__ import absolute_import" 2. Python 3 changed division to "true" division that doesn't round to an integer. This can be emulated via "from __future__ import division" This changes all Python files to add imports for absolute_import and division. For completeness, this also includes print_function in the import. I scrutinized each old-division location and converted some locations to use the integer division '//' operator if it needed an integer result (e.g. for indices, counts of records, etc). Some code was also using relative imports and needed to be adjusted to handle absolute_import. This fixes all Pylint warnings about no-absolute-import and old-division, and these warnings are now banned. Testing: - Ran core tests Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b Reviewed-on: http://gerrit.cloudera.org:8080/19588 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
Joe McDonnell	2b550634d2	IMPALA-11952 (part 2): Fix print function syntax Python 3 now treats print as a function and requires the parenthesis in invocation. print "Hello World!" is now: print("Hello World!") This fixes all locations to use the function invocation. This is more complicated when the output is being redirected to a file or when avoiding the usual newline. print >> sys.stderr , "Hello World!" is now: print("Hello World!", file=sys.stderr) To support this properly and guarantee equivalent behavior between python 2 and python 3, all files that use print now add this import: from __future__ import print_function This also fixes random flake8 issues that intersect with the changes. Testing: - check-python-syntax.sh shows no errors related to print Change-Id: Ib634958369ad777a41e72d80c8053b74384ac351 Reviewed-on: http://gerrit.cloudera.org:8080/19552 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2023-02-28 17:11:50 +00:00
Sahil Takiar	ac87278b16	IMPALA-8950: Add -d, -f options to hdfs copyFromLocal, put, cp Add the -d option and -f option to the following commands: `hdfs dfs -copyFromLocal <localsrc> URI` `hdfs dfs -put [ - \| <localsrc1> .. ]. <dst>` `hdfs dfs -cp URI [URI ...] <dest>` The -d option "Skip[s] creation of temporary file with the suffix ._COPYING_." which improves performance of these commands on S3 since S3 does not support metadata only renames. The -f option "Overwrites the destination if it already exists" combined with HADOOP-13884 this improves issues seen with S3 consistency issues by avoiding a HEAD request to check if the destination file exists or not. Added the method 'copy_from_local' to the BaseFilesystem class. Re-factored most usages of the aforementioned HDFS commands to use the filesystem_client. Some usages were not appropriate / worth refactoring, so occasionally this patch just adds the '-d' and '-f' options explicitly. All calls to '-put' were replaced with 'copyFromLocal' because they both copy files from the local fs to a HDFS compatible target fs. Since WebHDFS does not have good support for copying files, this patch removes the copy functionality from the PyWebHdfsClientWithChmod. Re-factored the hdfs_client so that it uses a DelegatingHdfsClient that delegates to either the HadoopFsCommandLineClient or PyWebHdfsClientWithChmod. Testing: * Ran core tests on HDFS and S3 Change-Id: I0d45db1c00554e6fb6bcc0b552596d86d4e30144 Reviewed-on: http://gerrit.cloudera.org:8080/14311 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-10-05 00:04:08 +00:00
Tim Armstrong	4fb8e8e324	IMPALA-8816: reduce custom cluster test runtime in core This includes some optimisations and a bulk move of tests to exhaustive. Move a bunch of custom cluster tests to exhaustive. I selected these partially based on runtime (i.e. I looked most carefully at the tests that ran for over a minute) and the likelihood of them catching a precommit bug. Regression tests for specific edge cases and tests for parts of the code that are very stable were prime candidates. Remove an unnecessary cluster restart in test_breakpad. Merge test_scheduler_error into test_failpoints to avoid an unnecessary cluster restart. Speed up cluster starts by ensuring that the default statestore args are applied even when _start_impala_cluster() is called directly. This shaves a couple of seconds off each restart. We made the default args use a faster update frequency - see IMPALA-7185 - but they did not take effect in all tests. Change-Id: Ib2e3e7ebc9695baec4d69183387259958df10f62 Reviewed-on: http://gerrit.cloudera.org:8080/13967 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-08-06 21:34:26 +00:00
Tim Armstrong	2ca7f8e7c0	IMPALA-7995: part 1: fixes for e2e dockerised impala tests This fixes all core e2e tests running on my local dockerised minicluster build. I do not yet have a CI job or script running but I wanted to get feedback on these changes sooner. The second part of the change will include the CI script and any follow-on fixes required for the exhaustive tests. The following fixes were required: * Detect docker_network from TEST_START_CLUSTER_ARGS * get_webserver_port() does not depend on the caller passing in the default webserver port. It failed previously because it relied on start-impala-cluster.py setting -webserver_port for all processes. * Add SkipIf markers for tests that don't make sense or are non-trivial to fix for containerised Impala. * Support loading Impala-lzo plugin from host for tests that depend on it. * Fix some tests that had 'localhost' hardcoded - instead it should be $INTERNAL_LISTEN_HOST, which defaults to localhost. * Fix bug with sorting impala daemons by backend port, which is the same for all dockerised impalads. Testing: I ran tests locally as follows after having set up a docker network and starting other services: ./buildall.sh -noclean -notests -ninja ninja -j $IMPALA_BUILD_THREADS docker_images export TEST_START_CLUSTER_ARGS="--docker_network=impala-cluster" export FE_TEST=false export BE_TEST=false export JDBC_TEST=false export CLUSTER_TEST=false ./bin/run-all-tests.sh Change-Id: Iee86cbd2c4631a014af1e8cef8e1cd523a812755 Reviewed-on: http://gerrit.cloudera.org:8080/12639 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-04-13 02:42:32 +00:00
Joe McDonnell	6399a65a00	IMPALA-7639: Move concurrent UDF tests to a custom cluster test Two test_udfs.py tests (test_native_functions_race and test_concurrent_jar_drop_use) spawn dozens of connections to test Impala behavior under concurrency. These connections use up frontend service threads and can cause shell tests to timeout when trying to connect. This moves both tests to a new TestUdfConcurrency custom cluster test. The new custom cluster test uses a larger fe_service_threads value to allow full concurrency. The tests run serially and cannot impact other tests. This also reduces the test dimensions for test_native_functions_race so that it runs one configuration rather than eight. Change-Id: I3f255823167a4dd807a07276f630ef02435900a3 Reviewed-on: http://gerrit.cloudera.org:8080/11701 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-10-17 21:31:24 +00:00

8 Commits