impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 18:12:08 -05:00

Author	SHA1	Message	Date
Joe McDonnell	c5a0ec8bdf	IMPALA-11980 (part 1): Put all thrift-generated python code into the impala_thrift_gen package This puts all of the thrift-generated python code into the impala_thrift_gen package. This is similar to what Impyla does for its thrift-generated python code, except that it uses the impala_thrift_gen package rather than impala._thrift_gen. This is a preparatory patch for fixing the absolute import issues. This patches all of the thrift files to add the python namespace. This has code to apply the patching to the thirdparty thrift files (hive_metastore.thrift, fb303.thrift) to do the same. Putting all the generated python into a package makes it easier to understand where the imports are getting code. When the subsequent change rearranges the shell code, the thrift generated code can stay in a separate directory. This uses isort to sort the imports for the affected Python files with the provided .isort.cfg file. This also adds an impala-isort shell script to make it easy to run. Testing: - Ran a core job Change-Id: Ie2927f22c7257aa38a78084efe5bd76d566493c0 Reviewed-on: http://gerrit.cloudera.org:8080/20169 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-04-15 17:03:02 +00:00
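The commit says the thrift files, including the third-party hive_metastore.thrift and fb303.thrift, are patched to add a Python namespace, but it does not show the patching code. The sketch below shows one way to do it. It assumes the only change needed is to make each file declare `namespace py impala_thrift_gen.<module>`. The helper `add_py_namespace` and the `<package>.<module>` naming scheme are hypothetical.

```python
import re

# Hypothetical naming scheme: the commit only states that all generated
# Python code lives under the impala_thrift_gen package.
PY_NAMESPACE_RE = re.compile(r"^namespace[ \t]+py[ \t]+\S+[ \t]*$", re.MULTILINE)
ANY_NAMESPACE_RE = re.compile(r"^namespace[ \t]", re.MULTILINE)


def add_py_namespace(thrift_source, module_name, package="impala_thrift_gen"):
    """Return thrift_source declaring 'namespace py <package>.<module_name>'."""
    new_line = "namespace py %s.%s" % (package, module_name)
    if PY_NAMESPACE_RE.search(thrift_source):
        # Rewrite an existing Python namespace declaration in place.
        return PY_NAMESPACE_RE.sub(new_line, thrift_source, count=1)
    # Otherwise insert before the first namespace declaration, or at the top.
    # Thrift headers (include/namespace) may appear in any order.
    match = ANY_NAMESPACE_RE.search(thrift_source)
    pos = match.start() if match else 0
    return thrift_source[:pos] + new_line + "\n" + thrift_source[pos:]
```

After this patching, the generated modules are imported from the `impala_thrift_gen` package instead of from top-level module names. That makes it clear where imported code comes from.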
Csaba Ringhofer	f98b697c7b	IMPALA-13929: Make 'functional-query' the default workload in tests This change adds get_workload() to ImpalaTestSuite and removes it from all test suites that already returned 'functional-query'. get_workload() is also removed from CustomClusterTestSuite, which used to return 'tpch'. All other changes besides impala_test_suite.py and custom_cluster_test_suite.py are just mass removals of get_workload() functions. The behavior only changes in custom cluster tests that didn't override get_workload(). Because these tests now return 'functional-query' instead of 'tpch', exploration_strategy() will no longer return 'core' in 'exhaustive' test runs. See IMPALA-3947 for why the workload affected exploration_strategy(). An example of an affected test is TestCatalogHMSFailures, which was skipped in both core and exhaustive runs before this change. get_workload() functions that return a workload other than 'functional-query' are not changed; it is possible that some of these also don't handle exploration_strategy() as expected, but checking these tests individually is out of scope for this patch. Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115 Reviewed-on: http://gerrit.cloudera.org:8080/22726 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Daniel Becker <daniel.becker@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-04-08 07:12:55 +00:00
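This change moves the default workload into the base class and lets subclasses inherit it. The sketch below shows that pattern. The classes are heavily simplified, hypothetical stand-ins, not the real ImpalaTestSuite, and `HypotheticalTpchSuite` is invented for illustration.

```python
class ImpalaTestSuite(object):
    """Hypothetical, heavily simplified stand-in for the real test base class."""

    @classmethod
    def get_workload(cls):
        # Shared default: suites no longer need to repeat this override.
        return 'functional-query'


class CustomClusterTestSuite(ImpalaTestSuite):
    # The old 'tpch' override is gone, so the base-class default is inherited.
    pass


class HypotheticalTpchSuite(ImpalaTestSuite):
    # Suites that need a different workload still override get_workload().
    @classmethod
    def get_workload(cls):
        return 'tpch'
```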
wzhou-code	a82830896b	IMPALA-12150: Use protocol version to isolate cluster components Some Thrift request/response structs in CatalogService were changed to add new variables in the middle, which caused a cross-version incompatibility issue for CatalogService. Impala cluster membership is managed by the statestore. During upgrade scenarios where Impala daemons are upgraded one at a time, the upgraded daemons have incompatible message formats. Even though protocol version numbers were already defined for the Statestore and Catalog services, they were not used. The Statestore and Catalog server did not check the protocol version in the requests, which allowed incompatible Impala daemons to join one cluster. This causes unexpected query failures during rolling upgrades. We need a way to detect this and enforce that some rules are followed: - Statestore refuses the registration requests from incompatible subscribers. - Catalog server refuses the requests from incompatible clients. - Scheduler assigns tasks to a group of compatible executors. This patch isolates Impala daemons into separate clusters based on the protocol version of the Statestore service to prevent incompatible Impala daemons from communicating with each other. It covers the Thrift RPC communications between catalogd and coordinators, and communication between statestore and its subscribers (executor, coordinators, catalogd and admissiond). This mechanism should also work for future upgrades. The following changes were made: - Bump StatestoreServiceVersion and CatalogServiceVersion to V2 for all requests of Statestore and Catalog services. - Update the request and response structs in CatalogService to ensure each Thrift request struct has a protocol version and each Thrift response struct has a returned status. - Update the request and response structs in StatestoreService to ensure each Thrift request struct has a protocol version and each Thrift response struct has a returned status.
- Add a subscriber type so that the statestore can distinguish different types of subscribers. - Statestore checks the protocol version of registration requests from subscribers and refuses requests with an incompatible version. - Catalog server checks the protocol version for Catalog service APIs and returns an error for requests with an incompatible version. - Catalog daemon sends its address and the protocol version of the Catalog service when it registers with the statestore; the statestore forwards the address and the protocol version of the Catalog service to all subscribers during registration. - Add UpdateCatalogd API for the StatestoreSubscriber service so that coordinators can receive the address and the protocol version of the Catalog service from the statestore if they register with the statestore before the catalog daemon does. - Add GetProtocolVersion API for the Statestore service so that subscribers can check the protocol version of the statestore before calling the RegisterSubscriber API. - Add startup flag tolerate_statestore_startup_delay. It is off by default. When it's enabled, the subscriber tolerates a delay in the statestore becoming available: the subscriber's process does not exit if it cannot register with the specified statestore on startup. Instead, it enters Recovery mode, where it loops, sleeps and retries until it successfully registers with the statestore. This flag should be enabled during rolling upgrades. CatalogServiceVersion is defined in CatalogService.thrift. In the future, if we make backward-incompatible changes to the request or response structures for CatalogService APIs, we need to bump the protocol version of the Catalog service. StatestoreServiceVersion is defined in StatestoreService.thrift. Similarly, if we make backward-incompatible changes to the request or response structures for StatestoreService APIs, we need to bump the protocol version of the Statestore service.
Message formats for KRPC communications between coordinators and executors, and between admissiond and coordinators, are defined in proto files under common/protobuf. If we make backward-incompatible changes to these structures, we need to bump the protocol version of the Statestore service. Testing: - Added end-to-end unit tests. - Passed the core tests. - Ran a manual test to verify that old versions of executors cannot register with a new version of statestore, and new versions of executors cannot register with an old version of statestore. Change-Id: If61506dab38c4d1c50419c1b3f7bc4f9ee3676bc Reviewed-on: http://gerrit.cloudera.org:8080/19959 Reviewed-by: Andrew Sherman <asherman@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-06-21 04:24:16 +00:00
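The registration rules above work like this: every request carries a protocol version, every response carries a status, mismatched subscribers are refused, and with tolerate_statestore_startup_delay the subscriber retries instead of exiting. The toy model below illustrates those rules. It is a sketch under assumptions and does not reflect Impala's actual C++ or Thrift code. The class names, the dict-based responses, and the `max_attempts` parameter are all hypothetical. The commit describes looping until registration succeeds, and `max_attempts` exists only to keep the sketch testable.

```python
import time
from enum import Enum


class StatestoreServiceVersion(Enum):
    # Hypothetical Python mirror of the enum in StatestoreService.thrift.
    V1 = 0
    V2 = 1


class SubscriberType(Enum):
    # Hypothetical subscriber kinds, matching the roles named in the commit.
    COORDINATOR = 0
    EXECUTOR = 1
    CATALOGD = 2
    ADMISSIOND = 3


class Statestore(object):
    """Toy statestore that refuses subscribers speaking another protocol version."""

    def __init__(self, version=StatestoreServiceVersion.V2):
        self.version = version
        self.subscribers = {}

    def get_protocol_version(self):
        # Lets a subscriber check compatibility before trying to register.
        return self.version

    def register_subscriber(self, subscriber_id, subscriber_type, protocol_version):
        # Every request carries a protocol version; every response carries a status.
        if protocol_version != self.version:
            return {"status": "ERROR",
                    "msg": "incompatible protocol version %s, expected %s"
                           % (protocol_version.name, self.version.name)}
        self.subscribers[subscriber_id] = subscriber_type
        return {"status": "OK"}


def register_with_retry(try_register, tolerate_startup_delay,
                        retry_interval_s=1.0, max_attempts=None):
    """Retry registration while the statestore is unreachable.

    try_register is a callable that raises ConnectionError while the
    statestore is down. max_attempts is a hypothetical knob for testing.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return try_register()
        except ConnectionError:
            if not tolerate_startup_delay:
                raise  # Flag off: fail at startup (the daemon would exit).
            if max_attempts is not None and attempt >= max_attempts:
                raise
            time.sleep(retry_interval_s)  # "Recovery mode": sleep, then retry.
```

Comparing versions for strict equality mirrors the commit's goal of keeping each cluster made up only of daemons that speak the same protocol version.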
Joe McDonnell	82bd087fb1	IMPALA-11973: Add absolute_import, division to all eligible Python files This takes steps to make Python 2 behave like Python 3 as a way to flush out issues with running on Python 3. Specifically, it handles two main differences: 1. Python 3 requires absolute imports within packages. This can be emulated via "from __future__ import absolute_import" 2. Python 3 changed division to "true" division that doesn't round to an integer. This can be emulated via "from __future__ import division" This changes all Python files to add imports for absolute_import and division. For completeness, this also includes print_function in the import. I scrutinized each old-division location and converted some locations to use the integer division '//' operator if it needed an integer result (e.g. for indices, counts of records, etc). Some code was also using relative imports and needed to be adjusted to handle absolute_import. This fixes all Pylint warnings about no-absolute-import and old-division, and these warnings are now banned. Testing: - Ran core tests Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b Reviewed-on: http://gerrit.cloudera.org:8080/19588 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
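The example below shows the division split the commit describes. With `from __future__ import division`, `/` is true division on both Python 2 and Python 3, so integer results such as counts and indices need `//`. Without the import, Python 2 would evaluate `7 / 2` as `3`. The function `split_evenly` is a hypothetical example, not code from the patch. The `__future__` import has to be the first statement in the module.

```python
from __future__ import absolute_import, division, print_function


def split_evenly(num_records, num_batches):
    """Return (records per batch, leftover records, average per batch).

    Counts must stay integers, so they use floor division ('//') and modulo;
    the average is a genuine ratio, so it uses true division ('/').
    """
    per_batch = num_records // num_batches
    leftover = num_records % num_batches
    average = num_records / num_batches
    return per_batch, leftover, average


print(split_evenly(7, 2))  # (3, 1, 3.5)
```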
Tim Armstrong	2ca7f8e7c0	IMPALA-7995: part 1: fixes for e2e dockerised impala tests This fixes all core e2e tests running on my local dockerised minicluster build. I do not yet have a CI job or script running but I wanted to get feedback on these changes sooner. The second part of the change will include the CI script and any follow-on fixes required for the exhaustive tests. The following fixes were required: * Detect docker_network from TEST_START_CLUSTER_ARGS * get_webserver_port() does not depend on the caller passing in the default webserver port. It failed previously because it relied on start-impala-cluster.py setting -webserver_port for all processes. * Add SkipIf markers for tests that don't make sense or are non-trivial to fix for containerised Impala. * Support loading Impala-lzo plugin from host for tests that depend on it. * Fix some tests that had 'localhost' hardcoded - instead it should be $INTERNAL_LISTEN_HOST, which defaults to localhost. * Fix bug with sorting impala daemons by backend port, which is the same for all dockerised impalads. Testing: I ran tests locally as follows after having set up a docker network and starting other services: ./buildall.sh -noclean -notests -ninja; ninja -j $IMPALA_BUILD_THREADS docker_images; export TEST_START_CLUSTER_ARGS="--docker_network=impala-cluster"; export FE_TEST=false; export BE_TEST=false; export JDBC_TEST=false; export CLUSTER_TEST=false; ./bin/run-all-tests.sh Change-Id: Iee86cbd2c4631a014af1e8cef8e1cd523a812755 Reviewed-on: http://gerrit.cloudera.org:8080/12639 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-04-13 02:42:32 +00:00
David Knupp	f590bc0da6	IMPALA-4750: Rename test infra classes so they don't mimic test classes. This patch addresses warning messages from pytest about the imported TestMatrix, TestVector, and TestDimension classes, which were being collected as potential test classes. The fix was simply to prefix the class names with "Impala", using: git grep -l 'TestDimension' | xargs sed -i 's/TestDimension/ImpalaTestDimension/g'; git grep -l 'TestMatrix' | xargs sed -i 's/TestMatrix/ImpalaTestMatrix/g'; git grep -l 'TestVector' | xargs sed -i 's/TestVector/ImpalaTestVector/g' The tests all passed in an exhaustive run on the upstream jenkins server: http://jenkins.impala.io:8080/view/Utility/job/pre-review-test/8/ Change-Id: I06b7bc6fd99fbb637a47ba376bf9830705c1fce1 Reviewed-on: http://gerrit.cloudera.org:8080/5794 Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2017-01-26 23:40:22 +00:00
Dan Hecht	ffa7829b70	IMPALA-3918: Remove Cloudera copyrights and add ASF license header For files that have a Cloudera copyright (and no other copyright notice), make changes to follow the ASF source file header policy here: http://www.apache.org/legal/src-headers.html#headers Specifically: 1) Remove the Cloudera copyright. 2) Modify NOTICE.txt according to http://www.apache.org/legal/src-headers.html#notice to follow that format and add a line for Cloudera. 3) Replace the existing ASF license text with the one given on the website, or add it where it is missing. Much of this change was automatically generated via: git grep -li 'Copyright.Cloudera' > modified_files.txt; cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.Cloudera#i;'; cat modified_files.txt | xargs fix_apache_license.py [1] Some manual fixups were performed following those steps, especially when license text was completely missing from the file. [1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor modification to ORIG_LICENSE to match Impala's license text. Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86 Reviewed-on: http://gerrit.cloudera.org:8080/3779 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-08-09 08:19:41 +00:00
Taras Bobrovytsky	609b80410e	Clean up Python test import statements Many of our test scripts have import statements that look like "from xxx import *". It is a good practice to explicitly name what needs to be imported. This commit implements this practice. Also, unused import statements are removed. Change-Id: I6a33bb66552ae657d1725f765842f648faeb26a8 Reviewed-on: http://gerrit.cloudera.org:8080/3444 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Internal Jenkins	2016-07-15 23:26:18 +00:00
David Knupp	fc444c102e	IMPALA-3491: Use unique_database fixture in test_catalog_service_client.py. Even though this is just a single test, this change introduces the unique_database test fixture, which was initially created to help with concurrent tests. It's still worth doing here because we want to update all tests to use best practices. That said, there was still a performance gain to be had here. It turns out the initial code called the cleanup_db() method from the base ImpalaTestSuite class, which in turn sets the 'sync_ddl' query option to true. Not doing this at the beginning of this test results in a roughly 40x speedup. Change-Id: I5d6994f31d52e18e2e04aab0e34202e2c623e367 Reviewed-on: http://gerrit.cloudera.org:8080/3366 Reviewed-by: David Knupp <dknupp@cloudera.com> Tested-by: Internal Jenkins	2016-06-13 16:32:22 -07:00
Alex Behm	049ede9f62	OPSAPS-32457: Fix CatalogService Thrift changes to be backwards compatible. While adding support for permanent UDFs we made incompatible changes to the CatalogService Thrift definitions. Some services like BDR rely on a stable catalog API. This patch fixes the incompatibility. Change-Id: Iec04d07c48d7159d2837667d7039046de126a3ad Reviewed-on: http://gerrit.cloudera.org:8080/2455 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-03-06 15:23:39 -08:00
Casey Ching	074e5b4349	Remove hashbang from non-script python files Many python files had a hashbang and the executable bit set even though they were not intended to be run as standalone scripts. That makes determining which python files are actually scripts very difficult. A future patch will update the hashbang in real python scripts so they use $IMPALA_HOME/bin/impala-python. Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba Reviewed-on: http://gerrit.cloudera.org:8080/599 Reviewed-by: Casey Ching <casey@cloudera.com> Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2015-08-04 05:26:07 +00:00
Dan Hecht	b46e8001ef	S3: more test triage and location fixups Fix up more locations to allow the tests to run on a secondary filesystem. In particular, database locations need to be on the target filesystem, or else any tables created without locations will be in HDFS and not actually give coverage on S3. Change-Id: Ifcc4a47ecaa235b23d305784b844788732d5fa05 Reviewed-on: http://gerrit.cloudera.org:8080/143 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2015-03-05 09:12:46 +00:00
Martin Grund	b582cdc22b	IMPALA-1598: Adding Error Codes to Log Messages This patch introduces the concept of error codes for errors that are recorded in Impala and are going to be presented to the client. These error codes are used to aggregate and group incoming error / warning messages to reduce clutter in the shell and increase the usefulness of the messages. By separating the message string from the implementation, it becomes possible to edit the string independently of the code, which paves the way for internationalization. Error messages are defined as a combination of an enum value and a string. Both are defined in the Error.thrift file, which is automatically generated using the script in common/thrift/generate_error_codes.py. The goal of the script is to have a central, understandable repository of error messages. Adding new messages to this file will require rebuilding the thrift part. The proxy class ErrorMessage is responsible for representing an error and capturing the parameters that are used to format the error message string. When error messages are recorded, they are aggregated according to the following algorithm: - If an error message is of type GENERAL, do not aggregate this message and simply add it to the total number of messages. - If an error message is of a specific type, record the first error message as a sample and, for all other occurrences, increment the count. - The coordinator will merge all error messages except the ones of type GENERAL and display a count. For example, in the case of a parquet file spanning multiple blocks, the output will look like: Parquet files should not be split into multiple hdfs-blocks. file=hdfs://localhost:20500/fid.parq (1 of 321 similar) All messages are always logged to VLOG. In the coordinator, error messages are merged across all backends to retain readability in the case of large clusters.
The current version of this patch adds these new error codes to some of the most important error messages as a reference implementation. Change-Id: I1f1811631836d2dd6048035ad33f7194fb71d6b8 Reviewed-on: http://gerrit.cloudera.org:8080/39 Reviewed-by: Martin Grund <mgrund@cloudera.com> Tested-by: Internal Jenkins	2015-03-01 03:37:32 +00:00
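The sketch below models the aggregation rules above. GENERAL messages are kept individually. For any other code, the first message is kept as a sample and later occurrences only bump a count. A coordinator-side merge adds up those counts. This is an illustration of the described algorithm, not Impala's ErrorMessage implementation. The `ErrorLog` class and the `PARQUET_MULTIPLE_BLOCKS` code name are hypothetical.

```python
from collections import OrderedDict

GENERAL = "GENERAL"


class ErrorLog(object):
    """Hypothetical sketch of the per-code error aggregation rules."""

    def __init__(self):
        self.general = []               # GENERAL messages are never aggregated
        self.by_code = OrderedDict()    # code -> [first sample, occurrence count]

    def add(self, code, message):
        if code == GENERAL:
            self.general.append(message)
        elif code in self.by_code:
            self.by_code[code][1] += 1  # later occurrences only bump the count
        else:
            self.by_code[code] = [message, 1]  # first occurrence kept as sample

    def merge(self, other):
        """Coordinator-side merge of one backend's log into this one."""
        self.general.extend(other.general)
        for code, (sample, count) in other.by_code.items():
            if code in self.by_code:
                self.by_code[code][1] += count
            else:
                self.by_code[code] = [sample, count]

    def render(self):
        lines = list(self.general)
        for sample, count in self.by_code.values():
            lines.append("%s (1 of %d similar)" % (sample, count))
        return lines
```

Keeping only one sample per code bounds the output size by the number of distinct codes rather than the number of backends. That is the readability benefit the commit describes for large clusters.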
ishaan	11cd7d1d46	Blacklist tests that don't work on s3 This patch introduces a new pytest marker that skips tests that currently don't work when s3 is used as the underlying file system. The set of blacklisted tests is a superset of tests that cannot be run with s3. Follow-up patches will remove some of the test files from the blacklist. Change-Id: I39a58223d3435f0bd6496ffd00a2d483b751693d Reviewed-on: http://gerrit.cloudera.org:8080/82 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Internal Jenkins	2015-02-24 01:43:28 +00:00
Alex Behm	9cabee4a71	Wait for the Metastore to come up before starting HiveServer2. Change-Id: Ic8e29efe63f6745e1ff44248657cbd7882bb16d9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1626 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1670 Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-02-25 21:05:33 -08:00
Lenni Kuff	51f003a785	IMP-1156: Add CatalogServer API for listing all UDFs and UDAs in a database Adds a new client API for retrieving all user defined functions (aggregate and scalar) in a database. This is a requirement from CM Backup Disaster and Recovery. Change-Id: I4e33d714795fe808370262f36218ea112f67ec30 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1271 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-01-14 00:01:25 -08:00

16 Commits