impala

mirror of https://github.com/apache/impala.git synced 2025-12-22 03:18:15 -05:00

Author	SHA1	Message	Date
Csaba Ringhofer	843de44788	IMPALA-13125: Fix pairwise test vector generation Replaced allpairspy with a homemade pair finder that seems to find a somewhat less optimal (larger) covering vector set but works reliably with filters. For details see tests/common/test_vector.py Also fixes a few test issues uncovered. Some fixes are copied from https://gerrit.cloudera.org/#/c/23319/ Added the possibility of shuffling vectors to get a different test set (env var IMPALA_TEST_VECTOR_SEED). By default the algorithm is deterministic so the test set won't change between runs (similarly to allpairspy). Added a new constraint to test only a single compression per file format in some tests to reduce the number of new vectors. EE + custom_cluster test count in exhaustive runs: before patch: ~11000 after patch: ~16000 without compression constraint: ~17000 Change-Id: I419c24659a08d8d6592fadbbd5b764ff73cbba3e Reviewed-on: http://gerrit.cloudera.org:8080/23342 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-08-28 15:27:02 +00:00
Riza Suminto	9cb9bae84e	IMPALA-13758: Use context manager in ImpalaTestSuite.change_database ImpalaTestSuite.change_database is responsible to point impala client to database under test. However, it left client pointing to that database after the test without reverting them back to default database. This patch does the reversal by changing ImpalaTestSuite.change_database to use context manager. This patch change the behavior of execute_query_using_client() and execute_query_async_using_client(). They used to change database according to the given vector parameter, but not anymore after this patch. In practice, this behavior change does not affect many tests because most queries going through these functions already use fully qualified table name. Going forward, querying through function other than run_test_case() should try to use fully qualified table name as much as possible. Retain behavior of ImpalaTestSuite._get_table_location() since there are considerable number of tests relies on it (changing database when called). Removed unused test fixtures and fixed several flake8 issues in modified test files. Testing: - Moved nested-types-subplan-single-node.test. This allows the test framework to point to the right tpch_nested* database. - Pass exhaustive test except IMPALA-13752 and IMPALA-13761. They will be fixed in separate patch. Change-Id: I75bec7403cc302728a630efe3f95e852a84594e2 Reviewed-on: http://gerrit.cloudera.org:8080/22487 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-02-19 23:50:34 +00:00
Riza Suminto	c08aff420d	IMPALA-13672: Migrate query_test/test_kudu.py to use hs2 protocol This patch migrates query_test/test_kudu.py to use hs2 client protocol. Here are the steps taken: - Override default_test_protocol() to return 'hs2'. See documentation in ImpalaTestSuite about what this method does. - Remove usage of deprecated cursor and unique_cursor fixture. - Replace all direct ImpalaTestSuite.client usage with helper function call such as execute_query() or execute_query_using_vector(). - Remove all "SET" query invocation and replace it with passing exec_option dictionary to helper method. - Replace verifying kudu modified / inserted rows from reading query output to reading runtime profile counters. - Add HS2_TYPES section at test cases where only TYPES exist. - Remove all drop_impala_table_after_context() calls and replace it with proper use of unique_database fixture. KuduTestSuite is fixed with hs2 protocol dimension. Meanwhile, CustomKuduTest is fixed to use beeswax protocol dimension until proper migration can be done. Added following convenience methods: - ImpalaTestSuite.default_test_protocol() to allow individual test class to override its default test protocol. - ImpylaHS2ResultSet.tuples() to access the raw HS2 result set that is a list of tuples. This patch also added several literal constants around test vector dimension to help with traceability. Fixed a bug where "SHOW PARTITIONS" via hs2 over kudu table will show NULL number of #Replicas because TResultRowBuilder does not have overload for int type value. Adjust numFiles variable inside HdfsTable.getTableStats() from int to long to match Type.BIGINT of column '#Files'. Fixed py.test classes that does not inherit BaseTestSuite. Fixed flake8 issues in test_statestore.py. Testing: - Run and pass all tests extended from KuduTestSuite in exhaustive mode. Change-Id: I5f38baf5a0bbde1a1ad0bb4666c300f4f3cabd33 Reviewed-on: http://gerrit.cloudera.org:8080/22358 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-02-07 11:57:59 +00:00
Riza Suminto	6fbde72969	IMPALA-13694: Add ImpalaTestSuite.__reset_impala_clients method This patch adds __reset_impala_clients() method in ImpalaConnection. __reset_impala_clients() then simply clear configuration. It is called on each setup_method() to ensure that each EE test uses clean test client. All subclasses of ImpalaTestSuite that declare setup() method are refactored to declare setup_method() instead, to match newer py.test convention. Also implement teardown_method() to complement setup_method(). See "Method and function level setup/teardown" at https://docs.pytest.org/en/stable/how-to/xunit_setup.html. CustomClusterTestSuite fully overrides setup_method() and teardown_method() because its subclasses can be destructive. The custom cluster test method often restart the whole Impala cluster, rendering default impala clients initialized at setup_class() unusable. Each subclass of CustomClusterTestSuite is responsible to ensure that impala client they are using is in a good state. This patch improve BeeswaxConnection and ImpylaHS2Connection to only consider non-REMOVED options as its default options. They lookup for valid (not REMOVED) query options with their own appropriate way, memorized the option names as lowercase string and the values as string. List values are wrapped with double quote. Log in ImpalaConnection.set_configuration_option() is differentiated from how SET query looks. Note that ImpalaTestSuite.run_test_case() modify and restore query option written at .test file by issuing SET query, not by calling ImpalaConnection.set_configuration_option(). It remains unchanged. Consistently lower case query option everywhere in Impala test code infrastructure. Fixed several tests that have been unknowingly overriding 'exec_option' vector dimension due to case sensitive mismatch. Also fixed some flake8 issues. Added convenience method execute_query_using_vector() and create_impala_client_from_vector() in ImpalaTestSuite. Testing: - Pass core tests. Change-Id: Ieb47fec9f384cb58b19fdbd10ff7aa0850ad6277 Reviewed-on: http://gerrit.cloudera.org:8080/22404 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Reviewed-by: Jason Fehr <jfehr@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-02-06 04:03:33 +00:00
Riza Suminto	134de01a59	IMPALA-13642: Fix unused test vector in test_scanners.py Several test vectors were ignored in test_scanners.py. This cause repetition of the same test without actually varying the test exec_option nor debug_action. This patch fix it by: - Use execute_query() instead of client.execute() - Passing vector.get_value('exec_option') when executing test query. Repurpose ImpalaTestMatrix.embed_independent_exec_options to deepcopy 'exec_option' dimension during vector generation. Therefore, each test execution will have unique copy of 'exec_option' for them self. This patch also adds flake8-unused-arguments plugin into critique-gerrit-review.py and py3-requirements.txt so we can catch this issue during code review. impala-flake8 is also updated to use impala-python3-common.sh. Adds flake8==3.9.2 in py3-requirements.txt, which is the highest version that has compatible dependencies with pylint==2.10.2. Drop unused 'dryrun' parameter in get_catalog_compatibility_comments method of critique-gerrit-review.py. Testing: - Run impala-flake8 against test_scanners.py and confirm there is no more unused variable. - Run and pass test_scanners.py in core exploration. Change-Id: I3b78736327c71323d10bcd432e162400b7ed1d9d Reviewed-on: http://gerrit.cloudera.org:8080/22301 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-01-09 06:17:51 +00:00
Riza Suminto	b9b4a6d122	IMPALA-13330: Fix orc_schema_resolution in test_nested_types.py test_nested_types.py declare 'orc_schema_resolution' dimension, but does not actually exercise it. None of the test actively inserting 'orc_schema_resolution' dimension value into vector.get_value('exec_dimension'). This patch fix that issue by declaring 'orc_schema_resolution' option using helper function add_exec_option_dimension() to automatically insert it into 'exec_option' dimension. Test classes also reorganized to reduce test skipping and deepcopy-ing. Following are notable changes: - Use 'unique_database' in test_struct_in_select_list to avoid collision during view creation. - Drop unused 'unique_database' fixture in TestNestedCollectionsInSelectList. - test_map_null_keys does not have 'mt_dop' dimension anymore since it only test how NULL map key are displayed. - Created common base class TestParquetArrayEncodingsBase for TestParquetArrayEncodings and TestParquetArrayEncodingsAmbiguous. The latter does not run with 'parquet_array_resolution' anymore since that query option is set directly within parquet-ambiguous-list-modern.test and parquet-ambiguous-list-legacy.test files. - Make ImpalaTestMatrix.add_dimensions() call ImpalaTestMatrix.clear_dimension() if given dimension.name is 'exec_option' and independent_exec_option_names is not empty. The reduction of test count are follows: Before patch: 168 core tests, 571 exhaustive tests After patch: 161 core tests, 529 exhaustive tests Testing: - Ran and pass test_nested_types.py in exhaustive exploration. - Verified that no WARNING log printed by ImpalaTestSuite.validate_exec_option_dimension() Change-Id: Ib958cd34a56c949190b4f22e5da5dad2c0de25ff Reviewed-on: http://gerrit.cloudera.org:8080/21726 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-08-31 06:11:10 +00:00
Riza Suminto	be6f896d10	IMPALA-13319: Avoid duplicate exec option declaration in py.test Before this patch, add_mandatory_exec_option() replace existing query option values in 'exec_option' dimension and may cause unintended test vector duplication. For example, the following declaration will create two duplicate test vector, both with "disable_codegen=False": cls.ImpalaTestMatrix.add_dimension(create_exec_option_dimension( disable_codegen_options=[False, True])) add_mandatory_exec_option(cls, "disable_codegen", False) add_exec_option_dimension() will create new test dimension for a 'key', but does not insert it into 'exec_option' dimension until vector generation later. It also does not validate if 'key' already exist in 'exec_option' dimension. This can confuse test writer when they need to write constraint, because they might look for the value at vector.get_value('exec_option')['key'] instead of vector.get_value('key'), and vice versa. This patch add assertion to check that no duplicate query option name is declared through any helper function. It also assert that all query option names are declared in lowercase. Testing: - Manually verify test vector generation in test files containing the helper functions by running: impala-py.test --exploration=exhaustive --collect-only <test_file> - Adjust query option declaration that breaks after this change. Change-Id: I8143e47f19090e20707cfb0a05c779f4d289f33c Reviewed-on: http://gerrit.cloudera.org:8080/21707 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-08-23 02:05:36 +00:00
Riza Suminto	172925bcb7	IMPALA-3825: Delegate runtime filter aggregation to some executors IMPALA-4400 improve the runtime filter by aggregating runtime filters locally before sending filter update to the coordinator and sharing a single RuntimeFilterBank for all fragment instances in a query. However, local filter aggregation is still insufficient if the number of nodes in an impala cluster is large. For example, in a cluster of around 700 impalad backends, aggregation of 1 MB bloom filter updates in the coordinator can exceed more than 1 second. This patch aims to reduce coordinator load and speed up runtime filter aggregation by doing intermediate aggregation in a few designated impala backends before doing final aggregation and publishing in the coordinator. Query option MAX_NUM_FILTERS_AGGREGATED_PER_HOST is added to control this feature. Given N as the number of backend executors excluding the coordinator, the selected number of intermediate aggregators M = ceil(N / MAX_NUM_FILTERS_AGGREGATED_PER_HOST). Setting MAX_NUM_FILTERS_AGGREGATED_PER_HOST <= 1 will disable the intermediate aggregator feature. In the backend scheduler, M impalad will be selected randomly as the intermediate aggregator for that runtime filter. Information of this M selected impalad then passed from the scheduler to coordinator as a RuntimeFilterAggregatorInfoPB. The coordinator then converts the RuntimeFilterAggregatorInfoPB into a filter routing information TRuntimeFilterAggDesc that is piggy-backed in TRuntimeFilterSource. A new RPC endpoint named UpdateFilterFromRemote is added in data_stream_service.proto to handle filter updates from fellow impalad executor to the designated aggregator impalad. This RPC will merge filter updates into 'pending_remote_filter'. The intermediate aggregator will then combine 'pending_remote_filter' with 'pending_merge_filter' (from local aggregation) into 'result_filter' which is then sent to the coordinator. RuntimeFilterBank of the intermediate aggregator will wait for all remote filter updates for at least RUNTIME_FILTER_WAIT_TIME_MS. If RuntimeFilterBank is closing and RUNTIME_FILTER_WAIT_TIME_MS has passed, any incomplete filter will be marked as ALWAYS_TRUE and sent to the coordinator. This patch currently targets the bloom filter produced by partitioned join build only. Another kind of runtime filter is still efficient to aggregate in coordinator only, while the bloom filter from broadcast join only requires 1 valid filter update for publishing. test_runtime_filters.py is modified to clarify the exec_options dimension, test matrix constraints, and reduce pytest.skip() calls on each test. runtime_filters.test is also changed to use counter aggregation and assert on ExecSummary table so that they stay valid irrespective of the number of fragment instances. We benchmark the aggregation speed of 1 MB runtime filter aggregation on 20 executor nodes cluster with MT_DOP=36 that is instrumented to disable local aggregation, simulating 720 runtime filter updates. The speed is approximated as the duration between the earliest time a filter update is made and the time that the coordinator publishes the complete filter. The result is following: +---------------------+------------------------+ \| num aggregator node \| Aggregation speed (ms) \| +---------------------+------------------------+ \| 0 \| 1296 \| \| 1 \| 1229 \| \| 2 \| 608 \| \| 4 \| 329 \| \| 8 \| 205 \| +---------------------+------------------------+ Testing: - Exercise MAX_NUM_FILTERS_AGGREGATED_PER_HOST in test_runtime_filters.py and query-options-test.cc - Add TestRuntimeFiltersLateRemoteUpdate. - Add custom_cluster/test_runtime_filter_aggregation.py. - Pass exhaustive tests. Change-Id: I11d38ed0f223d6e5b32a19ebe725af7738ee4ab0 Reviewed-on: http://gerrit.cloudera.org:8080/20612 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-12-20 12:29:55 +00:00
Riza Suminto	feefcc6395	IMPALA-12518: Combine all exec_option dimension in test_vector.py Before this patch, when writing pytest that exercise custom query option values, we need to declare it by making new test dimension, followed by deepcopying the original vector, and inserting the selected dimension value into 'exec_option' dictionary in generated vector. This patch simplify this steps by accounting dimensions that is intended to be part of 'exec_option' and automatically combining them during vector generation in test_vector.py. Such dimension should be registered via the new ImpalaTestMatrix.add_exec_option_dimension() function. function add_exec_option_dimension() in test_dimensions.py is renamed to add_mandatory_exec_option() to make it consistent with the same functionality in ImpalaTestMatrix and avoid confusion with the new ImpalaTestMatrix.add_exec_option_dimension() function. Function name add_exec_option_dimension() in test_dimensions.py is then repurposed as a shorthand for ImpalaTestMatrix.add_exec_option_dimension(). The remaining changes for other pytest files will be done gradually. Testing: - Fix bug in TestIcebergV2Table and confirm that both True and False value for 'disable_optimized_iceberg_v2_read' options are exercised. - Run and pass all modified tests in this patch. Change-Id: I3adba260990fccf4d2f2e7c8c4e4fadc6fd43fe1 Reviewed-on: http://gerrit.cloudera.org:8080/20625 Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com>	2023-10-27 03:22:53 +00:00
Michael Smith	8b2598cd70	IMPALA-12485: Remove Python 2 has_key Switch calls to dict#has_key (Python 2-only) for 'key in dict' syntax. Change-Id: I08e9f6667011d70ceddbf919a61d1be7d6e07ee4 Reviewed-on: http://gerrit.cloudera.org:8080/20541 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-10-10 00:44:23 +00:00
Joe McDonnell	82bd087fb1	IMPALA-11973: Add absolute_import, division to all eligible Python files This takes steps to make Python 2 behave like Python 3 as a way to flush out issues with running on Python 3. Specifically, it handles two main differences: 1. Python 3 requires absolute imports within packages. This can be emulated via "from __future__ import absolute_import" 2. Python 3 changed division to "true" division that doesn't round to an integer. This can be emulated via "from __future__ import division" This changes all Python files to add imports for absolute_import and division. For completeness, this also includes print_function in the import. I scrutinized each old-division location and converted some locations to use the integer division '//' operator if it needed an integer result (e.g. for indices, counts of records, etc). Some code was also using relative imports and needed to be adjusted to handle absolute_import. This fixes all Pylint warnings about no-absolute-import and old-division, and these warnings are now banned. Testing: - Ran core tests Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b Reviewed-on: http://gerrit.cloudera.org:8080/19588 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
Joe McDonnell	566df80891	IMPALA-11959: Add Python 3 virtualenv This adds a Python 3 equivalent to the impala-python virtualenv based on the toolchain Python 3.7.16. This modifies bootstrap_virtualenv.py to support the two different modes. This adds py2-requirements.txt and py3-requirements.txt to allow some differences between the Python 2 and Python 3 virtualenvs. Here are some specific package changes: - allpairs is replaced with allpairspy, as allpairs did not support Python 3. - requests is upgraded slightly, because otherwise it has issues with idna==2.8. - pylint is limited to Python 3, because we are adding it and don't need it on both - flake8 is limited to Python 2, because it will take some work to switch to a version that works on Python 3 - cm_api is limited to Python 2, because it doesn't support Python 3 - pytest-random does not support Python 3 and it is unused, so it is removed - Bump the version of setuptool-scm to support Python 3 This adds impala-pylint, which can be used to do further Python 3 checks via --py3k. This also adds a bin/check-pylint-py3k.sh script to enforce specific py3k checks. The banned py3k warnings are specified in the bin/banned_py3k_warnings.txt. This is currently empty, but this can ratchet up the py3k strictness over time to avoid regressions. This pulls in a new toolchain with the fix for IMPALA-11956 to get Python 3.7.16. Testing: - Hand tested that the allpairs libraries produce the same results - The python3 virtualenv has no influence on regular tests yet Change-Id: Ica4853f440c9a46a79bd5fb8e0a66730b0b4efc0 Reviewed-on: http://gerrit.cloudera.org:8080/19567 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
Joe McDonnell	c1794023bc	IMPALA-11952 (part 3): Fix raise syntax Python 3 does not support this old raise syntax: raise Exception, "message" Instead, it should be: raise Exception("message") This fixes all locations with the old raise syntax. Testing: - check-python-syntax.sh shows no errors from raise syntax Change-Id: I2722dcc2727fb65c7aedede12d73ca5b088326d7 Reviewed-on: http://gerrit.cloudera.org:8080/19553 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2023-02-28 17:11:50 +00:00
Steve Carlin	bb9fb663ce	IMPALA-10778: Allow impala-shell to connect directly to HS2 Impala-shell already uses HS2 protocol to connect to Impalad. This commit allows impala-shell to connect to any server (for example, Hive) using the hs2 protocol. This will be done via the "--strict_hs2_protocol" option. When the "--strict_hs2_protocol" option is turned on, only features supported by hs2 will work. For instance, "runtime-profile" is an impalad specific feature and will be disabled. The "--strict_hs2_protocol" will only work on servers that abide by the strict definition of what is supported by HS2. So one will be able to connect to Hive in this mode, but connections to Impala will not work. Any feature supported by Hive (e.g. kerberos authentication) should work as well. Note: While authentication should work, the test framework is not set up to create an HS2 server that does authentication at this point so this feature should be used with caution. Change-Id: I674a45640a4a7b3c9a577830dbc7b16a89865a9e Reviewed-on: http://gerrit.cloudera.org:8080/17660 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-08-27 09:45:59 +00:00
Thomas Tauber-Marshall	a4ad8f35f7	IMPALA-8390: clean up test vectors in test_cancellation.py Due to changes to TestCancellation made in IMPALA-7205 that were not reflected in TestCancellationSerial and TestCancellationFullSort, test_cancel_insert has not been running at all and test_cancel_sort has been running with unintended parameters. This patch re-enables test_cancel_insert, while including a number of constraints on its parameters to keep test execution time reasonable. It also fixes an incorrect constraint on test_cancel_sort. The patch also makes some related improvements: - Removes an xfail on test_cancel_insert related to a bug that is fixed now. - When ImpalaTestVector.get_value() is called with a value name that does not actually exist in the vector, the result is a StopIteration exception. Due to python's questionable habit of using exceptions for flow control, StopIteration is frequently treated not as an error but as the normal end of iteration, which can result in unexpected behavior, eg. when pytest_generate_tests raises a StopIteration pytest just silently ignores it and drops the test case. This patch modifies get_value() to instead raise a ValueError in this situation. - When a test has no vectors generated for it, the name of the test is now included in the logged warning. Testing: - Ran full core and exhaustive runs and verified that the expected test cases are run for test_cancellation.py now Change-Id: I9673fe82bda5314aff6a51d1961759ff286fbf6f Reviewed-on: http://gerrit.cloudera.org:8080/12960 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-04-13 00:27:19 +00:00
stiga-huang	818cd8fa27	IMPALA-5717: Support for reading ORC data files This patch integrates the orc library into Impala and implements HdfsOrcScanner as a middle layer between them. The HdfsOrcScanner supplies input needed from the orc-reader, tracks memory consumption of the reader and transfers the reader's output (orc::ColumnVectorBatch) into impala::RowBatch. The ORC version we used is release-1.4.3. A startup option --enable_orc_scanner is added for this feature. It's set to true by default. Setting it to false will fail queries on ORC tables. Currently, we only support reading primitive types. Writing into ORC table has not been supported neither. Tests - Most of the end-to-end tests can run on ORC format. - Add tpcds, tpch tests for ORC. - Add some ORC specific tests. - Haven't enabled test_scanner_fuzz for ORC yet, since the ORC library is not robust for corrupt files (ORC-315). Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4 Reviewed-on: http://gerrit.cloudera.org:8080/9134 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-04-11 05:13:02 +00:00
Taras Bobrovytsky	35a3e186d6	IMPALA-5478: Run TPCDS queries with decimal_v2 enabled We add new TPCDS .test files that are expected to be run with decimal_v2 enabled. The new expected results were generated using Impala and I inspected them manually. Change-Id: Ib867c51a521ec4a087bc127d99aee4b95ba97733 Reviewed-on: http://gerrit.cloudera.org:8080/8985 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-18 03:28:51 +00:00
David Knupp	f590bc0da6	IMPALA-4750: Rename test infra classes so they don't mimic test classes. This patch addresses warning messages from pytest re: the imported TestMatrix, TestVector, and TestDimension classes, which were being collected as potential test classes. The fix was to simply prepend the class names with Impala- git grep -l 'TestDimension' \| xargs \ sed -i 's/TestDimension/ImpalaTestDimension/g' git grep -l 'TestMatrix' \| xargs \ sed -i 's/TestMatrix/ImpalaTestMatrix/g' git grep -l 'TestVector' \| xargs \ sed -i 's/TestVector/ImpalaTestVector/g' The tests all passed in an exhaustive run on the upstream jenkins server: http://jenkins.impala.io:8080/view/Utility/job/pre-review-test/8/ Change-Id: I06b7bc6fd99fbb637a47ba376bf9830705c1fce1 Reviewed-on: http://gerrit.cloudera.org:8080/5794 Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2017-01-26 23:40:22 +00:00
Dan Hecht	ffa7829b70	IMPALA-3918: Remove Cloudera copyrights and add ASF license header For files that have a Cloudera copyright (and no other copyright notice), make changes to follow the ASF source file header policy here: http://www.apache.org/legal/src-headers.html#headers Specifically: 1) Remove the Cloudera copyright. 2) Modify NOTICE.txt according to http://www.apache.org/legal/src-headers.html#notice to follow that format and add a line for Cloudera. 3) Replace or add the existing ASF license text with the one given on the website. Much of this change was automatically generated via: git grep -li 'Copyright.Cloudera' > modified_files.txt cat modified_files.txt \| xargs perl -n -i -e 'print unless m#Copyright.Cloudera#i;' cat modified_files_txt \| xargs fix_apache_license.py [1] Some manual fixups were performed following those steps, especially when license text was completely missing from the file. [1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor modification to ORIG_LICENSE to match Impala's license text. Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86 Reviewed-on: http://gerrit.cloudera.org:8080/3779 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-08-09 08:19:41 +00:00
Taras Bobrovytsky	609b80410e	Clean up Python test import statements Many of our test scripts have import statements that look like "from xxx import *". It is a good practice to explicitly name what needs to be imported. This commit implements this practice. Also, unused import statements are removed. Change-Id: I6a33bb66552ae657d1725f765842f648faeb26a8 Reviewed-on: http://gerrit.cloudera.org:8080/3444 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Internal Jenkins	2016-07-15 23:26:18 +00:00
Casey Ching	074e5b4349	Remove hashbang from non-script python files Many python files had a hashbang and the executable bit set though they were not intended to be run a standalone script. That makes determining which python files are actually scripts very difficult. A future patch will update the hashbang in real python scripts so they use $IMPALA_HOME/bin/impala-python. Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba Reviewed-on: http://gerrit.cloudera.org:8080/599 Reviewed-by: Casey Ching <casey@cloudera.com> Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2015-08-04 05:26:07 +00:00
Alex Behm	37ca6b81ae	IMPALA-1567: Ignore 'hidden' files with special suffixes. Currently, we only consider files hidden if they have the special prefixes "." or "_". However, some tools use special suffixes to indicate a file is being operated on, and should be considered invisible. This patch adds the following hidden suffixes: '.tmp' - Flume's default for temp files '.copying' - hdfs put may produce these Change-Id: I151eafd0286fa91e062407e12dd71cfddd442430 Reviewed-on: http://gerrit.cloudera.org:8080/80 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-02-24 10:55:22 +00:00
Lenni Kuff	ebd750acc6	Minor cleanup of test_spilling custom cluster test suite Change-Id: If853893db082eae79a6ec22180e9ad5572c58f05 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4455 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-09-21 19:43:50 -07:00
Ippokratis Pandis	fe0646f76b	IMPALA-1022: Handle cases where in Parquet the expected number of rows in metadata is wrong There are cases of Parquet files where the metadata indicate wrong number of rows for these files. The parquet-scanner until now was not reporting any problem in this case. Instead it was reading as long as there were values for the read columns. But with IMPALA-1016 we are now reading at most as many rows as the rows per metadata. With this patch, the parquet-scanner, right before it finishes scanning, checks whether it read the expected number of rows (taken from metadata). In cases where the actual number of rows read is less than or greater than the expected number, it either aborts or logs an error. Change-Id: Ie6a66a38e8912730bf04762e6526ec1cadb2bcdc Reviewed-on: http://gerrit.ent.cloudera.com:8080/2755 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2944	2014-06-10 17:27:54 -07:00
Lenni Kuff	8d1674f638	Run only subset of tests with small batch_sizes + a few small fixes	2014-01-08 10:48:58 -08:00
Lenni Kuff	45c1cbe1fd	Use Python 2.6 style dictionary comprehension for building test dimensions	2014-01-08 10:47:05 -08:00
Lenni Kuff	ef9a5c2d0e	Add test suite for DEFAULT_ORDER_BY_LIMIT query option	2014-01-08 10:47:05 -08:00
Nong Li	b575b08357	Fix planner to reject compressed text formats.	2014-01-08 10:47:01 -08:00
Lenni Kuff	ef48f65e76	Add test framework for running Impala query tests via Python This is the first set of changes required to start getting our functional test infrastructure moved from JUnit to Python. After investigating a number of options, I decided to go with a python test executor named py.test (http://pytest.org/). It is very flexible, open source (MIT licensed), and will enable us to do some cool things like parallel test execution. As part of this change, we now use our "test vectors" for query test execution. This will be very nice because it means if you load the "core" dataset you know you will be able to run the "core" query tests (specified by --exploration_strategy when running the tests). You will see that now each combination of table format + query exec options is treated like an individual test case. This will make it much easier to debug exactly where something failed. These new tests can be run using the script at tests/run-tests.sh	2014-01-08 10:46:50 -08:00
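
Commit a4ad8f35f7 (IMPALA-8390) changes ImpalaTestVector.get_value() to raise ValueError instead of StopIteration for an unknown value name. The following is a minimal sketch of why that matters, not the code from tests/common/test_vector.py: SketchVector, get_value_old and the sample names are hypothetical, and the old lookup is assumed to have been built on next() over a generator.

```python
class SketchVector(object):
    """Hypothetical stand-in for ImpalaTestVector (not the real class)."""

    def __init__(self, pairs):
        # pairs: list of (name, value) tuples, e.g. [("batch_size", 1)]
        self.pairs = list(pairs)

    def get_value_old(self, name):
        # Assumed old behaviour: next() on an exhausted generator raises
        # StopIteration when no entry has the requested name.
        return next(value for key, value in self.pairs if key == name)

    def get_value(self, name):
        # Raise an ordinary error so that callers cannot mistake a failed
        # lookup for the normal end of an iteration.
        for key, value in self.pairs:
            if key == name:
                return value
        raise ValueError("Test vector does not contain value '%s'" % name)


vector = SketchVector([("a", 1), ("b", 2)])

# In Python 3, list() treats the StopIteration escaping from map() as normal
# exhaustion: the failed lookup of "missing" is swallowed and "b" is never
# visited.
truncated = list(map(vector.get_value_old, ["a", "missing", "b"]))

# With ValueError the same mistake surfaces as an error.
try:
    list(map(vector.get_value, ["a", "missing", "b"]))
    error = None
except ValueError as e:
    error = str(e)
```

Here truncated ends up shorter than the input list without any error being reported, which mirrors the silently dropped test cases the commit message describes.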

29 Commits
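
Three of the commits above (8b2598cd70, 82bd087fb1, c1794023bc) describe Python 2 to Python 3 cleanups: dict.has_key, integer division, and the old raise syntax. A minimal sketch of the resulting style follows. The option names and the require_option helper are made up for illustration, and the snippet assumes Python 3, where the "from __future__ import division" line mentioned in IMPALA-11973 is no longer needed.

```python
options = {"batch_size": 0}

# dict.has_key() exists only in Python 2; the 'in' operator works on both.
has_batch_size = "batch_size" in options

# '/' is true division and yields a float; '//' keeps an integer result,
# which is what indices and record counts need.
half = 7 / 2
index = 7 // 2


def require_option(name):
    # Python 3 rejects 'raise Exception, "message"' as a syntax error;
    # the call form below is valid on both Python 2 and Python 3.
    if name not in options:
        raise Exception("missing option: %s" % name)
    return options[name]
```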