impala

mirror of https://github.com/apache/impala.git synced 2026-01-07 09:02:19 -05:00

Author	SHA1	Message	Date
David Knupp	c076f098d4	IMPALA-3491: Use unique_database fixture in test_shell_commandline.py. Before this change, a single test database was created for the entire suite, and each test was marked to run serially. With the addition of a test fixture in tests/conftest.py to create a unique database for each individual method, it's possible now to run the tests in parallel. (The tables required by individual tests are created via local test fixtures.) As such, any methods which had been responsible for setting up the test database were removed. Pytest markers for running tests serially were also removed, except in cases where interactions from running concurrently would affect other tests. Additional minor changes were made to improve PEP-8 compliance. The non-serial tests were run in a loop ten times to confirm that there weren't any unexpected failures. Review: https://gerrit.cloudera.org/#/c/3301/ Change-Id: Icdcb04a99c0907fc1ba56baa2497fafb33b0e34e Reviewed-on: http://gerrit.cloudera.org:8080/3301 Reviewed-by: David Knupp <dknupp@cloudera.com> Tested-by: Internal Jenkins	2016-06-13 18:32:32 -07:00
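The per-test database idea in the commit above can be sketched as follows. This is a minimal illustration, not the fixture from tests/conftest.py: `FakeClient`, `unique_db_name` and the context manager are hypothetical names, and the client only records the statements it is given. In a pytest suite the same create/yield/drop shape would typically live inside a fixture.

```python
import contextlib
import uuid


class FakeClient:
    """Hypothetical stand-in for a SQL client; it only records statements."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


def unique_db_name(test_name):
    # A random suffix keeps names distinct even when the same test
    # runs concurrently in several workers.
    return "{0}_{1}".format(test_name, uuid.uuid4().hex[:8])


@contextlib.contextmanager
def unique_database(client, test_name):
    """Create a database for one test and drop it afterwards."""
    db = unique_db_name(test_name)
    client.execute("CREATE DATABASE `{0}`".format(db))
    try:
        yield db
    finally:
        # Runs even if the test body raises, so no database is leaked.
        client.execute("DROP DATABASE `{0}` CASCADE".format(db))


client = FakeClient()
with unique_database(client, "test_shell") as db_a:
    with unique_database(client, "test_shell") as db_b:
        distinct = db_a != db_b
```

Because each test gets its own name, two tests never create or drop the same database, which is what removes the need for serial markers.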
David Knupp	fc444c102e	IMPALA-3491: Use unique_database fixture in test_catalog_service_client.py. Even though this is just a single test, this change introduces the unique_database test fixture that was initially created to help with concurrent tests. It's still worth doing this here because we want to update all tests to use best practices. That said, there was still a performance gain to be had here. It turns out the initial code called the cleanup_db() method from the base ImpalaTestSuite class, which in turn sets the 'sync_ddl' query option to true. Not doing this at the beginning of this test results in a roughly 40x speedup. Change-Id: I5d6994f31d52e18e2e04aab0e34202e2c623e367 Reviewed-on: http://gerrit.cloudera.org:8080/3366 Reviewed-by: David Knupp <dknupp@cloudera.com> Tested-by: Internal Jenkins	2016-06-13 16:32:22 -07:00
Lars Volker	c69cd15a0a	IMPALA-3656: Hitting DCHECK/CHECK does not write minidumps When hitting a DCHECK/CHECK the daemons do not write minidumps. This is caused by glog's own stack unwinding mechanism, which catches SIGABRT and removes all other handlers before aborting. This change bumps the glog version to include a patch, which backports a change from glog, which only resets the SIGABRT handler, if it is the one installed by glog itself. `cda16b3443` Change-Id: I08e6b83af1b4ff1b8c916fe6c9052b88b760e188 Reviewed-on: http://gerrit.cloudera.org:8080/3286 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Lars Volker <lv@cloudera.com>	2016-06-11 05:31:32 -07:00
Alex Behm	19ff47091c	IMPALA-3530: Clean up test_ddl.py. Part 1. This is the first in a series of patches to clean up test_ddl.py Summary of changes: - Break up test_create() and corresponding .test files into: * test_create_database() * test_create_table() * test_create_table_like_table() * test_create_table_like_file() * test_create_table_as_select() - Merge test_nested() into the tests above - Move a test into test_hms_integration.py - Add a new test_ddl_base.py as base class for DDL tests. The plan is to split up test_ddl.py into several smaller .py files in subsequent patches. Testing: I tested test_ddl.py and test_hms_integration.py on exhaustive locally as well as in private builds on all filesystems. Change-Id: I5f4c044d39e165c2535961b8d0a765c8dbbd051c Reviewed-on: http://gerrit.cloudera.org:8080/3044 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2016-06-10 10:31:15 -07:00
Lars Volker	ca62ce65e9	IMPALA-3684, IMPALA-3693: Disable core files for breakpad tests The breakpad tests were writing core files when triggering minidump writes. This was actually not needed and interfered with test execution and artifact collection. Most notably processes would take a long time to terminate while writing core files (IMPALA-3684). The core files would also be wrongly collected by Jenkins (IMPALA-3693). This change adds code to stop test clusters reliably, making test_breakpad independent from calling setup-impala-cluster.py via os.system. It also disables core dumps for the duration of the test and re-enables them afterwards. Change-Id: If592339632aa662b59be09d911229566d5772321 Reviewed-on: http://gerrit.cloudera.org:8080/3339 Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Silvius Rus <srus@cloudera.com> Tested-by: Lars Volker <lv@cloudera.com>	2016-06-09 17:31:00 -07:00
Tim Armstrong	c1093ed861	IMPALA-3669: test_scratch_disk fails on S3 Make the test deterministic by using max_block_mgr_memory instead of mem_limit, so that the non-deterministic scanner memory usage does not influence the spilling behaviour of the queries. Testing: Ran the test locally to confirm that it succeeded. Also manually computed the memory requirement. The data size to be sorted is ~220MB, so with a 64MB block manager limit per node, at least one node must spill. Change-Id: I9525a029ac020bb5b8bea210a741c9f9c5ec3c75 Reviewed-on: http://gerrit.cloudera.org:8080/3318 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Silvius Rus <srus@cloudera.com> Tested-by: Tim Armstrong <tarmstrong@cloudera.com>	2016-06-09 17:31:00 -07:00
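The memory argument in the commit above is a pigeonhole check and can be written out as a small sketch. The data size (~220MB) and the 64MB per-node limit come from the commit message; the node count of 3 used in the example is an assumption for illustration and is not stated there.

```python
def some_node_must_spill(data_mb, per_node_limit_mb, num_nodes):
    """Return True if the data cannot fit in the combined per-node memory.

    If the total in-memory capacity is smaller than the data, then no matter
    how the data is distributed, at least one node receives more than its
    limit and has to spill.
    """
    return data_mb > per_node_limit_mb * num_nodes


# Assumed 3 nodes: 3 * 64MB = 192MB of capacity for ~220MB of data.
must_spill = some_node_must_spill(220, 64, 3)
```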
Alex Behm	e57fd2d831	IMPALA-3491: Use unique_database fixture in test_local_fs.py Testing: Ran hdfs/core and localfs/core private builds. Change-Id: I0720458882ac3b1138deccf9af0ee57bf2eed7dc Reviewed-on: http://gerrit.cloudera.org:8080/3334 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2016-06-08 16:30:32 -07:00
Alex Behm	6c992f97f2	IMPALA-3491: Use unique_database fixture in test_last_ddl_time_update.py. Testing: Ran the test locally in a loop 10 times, and did an exhaustive private test run on HDFS. Change-Id: I97e96217301078d48584c51218345dc96f6853a6 Reviewed-on: http://gerrit.cloudera.org:8080/3104 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2016-06-07 22:30:08 -07:00
Alex Behm	025fd3bd7f	IMPALA-3646: Handle corrupt RLE literal or repeat counts of 0. Adds handling and testing for a specific Parquet data corruption scenario with plain dictionary encoded values. The problematic scenario is when the repeat or literal count of the RLE-encoded dictionary indexes is decoded as 0 - an invalid value. There are several other cases of data corruption that are not yet handled gracefully. This patch only handles one specific case. Change-Id: Ibf406c82cdded37966f09c81e4cc1446d2b60d63 Reviewed-on: http://gerrit.cloudera.org:8080/3299 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2016-06-07 17:29:59 -07:00
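To make the corruption case above concrete, here is a rough sketch of reading one run header in Parquet's RLE/bit-packed hybrid encoding and rejecting a count of 0. It is written in Python for illustration only and is not Impala's decoder; the function names are hypothetical. It assumes the header layout from the Parquet format description: a varint whose lowest bit selects a literal (bit-packed) run or a repeated run, with literal runs counted in groups of 8 values.

```python
def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 varint at buf[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def read_run_header(buf, pos):
    """Return (kind, num_values, next_pos) for one run, rejecting empty runs."""
    header, pos = read_uleb128(buf, pos)
    if header & 1:
        kind = "literal"
        count = (header >> 1) * 8  # literal runs come in groups of 8 values
    else:
        kind = "repeat"
        count = header >> 1
    if count == 0:
        # A run of zero values can never be produced by a valid writer.
        raise ValueError("corrupt RLE data: %s count of 0" % kind)
    return kind, count, pos
```

Failing early on a zero count keeps a decoder from looping without making progress on corrupt input.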
Alex Behm	95064359cc	IMPALA-3491: Use unique_database fixture in test_delimited_text.py. Testing: Ran the test locally 10 times in a loop on exhaustive. Change-Id: Idedd5f03984e41a4b3ebf271e50863e980c66cb6 Reviewed-on: http://gerrit.cloudera.org:8080/3096 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2016-06-07 09:34:30 -07:00
Alex Behm	a9f7cf51f4	IMPALA-3491: Use unique_database fixture in test_metadata_query_statements.py. Testing: Ran the test locally on exhaustive in a loop 10 times. Ran a private exhaustive build on hdfs. Change-Id: Ia0af1dc6534234508bd0fed03531f7fe8ff556aa Reviewed-on: http://gerrit.cloudera.org:8080/3103 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2016-06-07 09:34:30 -07:00
Tim Armstrong	d23e5505c8	IMPALA-3670: fix sorter buffer mgmt bugs Also make test_scratch_disk.py more deterministic, by using max_block_mgr_memory, which doesn't include scanner memory. The fixed test_scratch_disk.py exercises the other sorter bug that occurs when scratch cannot be written. Testing: Added a test that does a sort with various memory limits and consumes the whole output of the sorter (we have many tests of sorts with limits but limited coverage of sorts without limits). Ran an exhaustive test run before posting for review. This added test reproduced one of the sorter bugs, where var-len blocks were not always attached to the output batch. The other bug was reproduced by the test change in IMPALA-3669: test_scratch_disk fix. Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6 Reviewed-on: http://gerrit.cloudera.org:8080/3315 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Tim Armstrong <tarmstrong@cloudera.com>	2016-06-06 22:34:19 -07:00
Matthew Jacobs	a1b035a251	IMPALA-3600: Add missing admission control tests * -require_username (not strictly admission control related but it came up in the context of RM). * Coverage of failure cases: The handling of the full queue case wasn't being verified. This changes existing stress test to expect a specific message when the queue is full. * Requesting MAXINT memory, which previously led to an overflow in the pool-level mem tracker accounting. This does not yet address: * Changing pool cfg while running * Verify profile string for queued reason This is just a minimal incremental change to get additional coverage. Right now, many of the tests rely on some pre-defined configuration files which is cumbersome. In the future, we plan on refreshing the configuration story at which point we should also build more general test infrastructure for easily testing different configurations. Change-Id: I6682b15a5caac5824384c4b48a7b40afa2548954 Reviewed-on: http://gerrit.cloudera.org:8080/3272 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2016-06-06 18:34:13 -07:00
Michael Brown	067af1957c	IMPALA-3614: work around pytest bugs causing custom cluster test skips All versions of pytest contain various bugs regarding test marking (including skips) when tests are both: 1. class-level marked 2. inherited More info is available in IMPALA-3614 and IMPALA-2943, but the gist is that it's possible for some tests to be skipped when they shouldn't be. This is happening pretty badly with the custom cluster tests, because CustomClusterTestSuite has a class level skipif mark. The easiest workaround for now is to remove the pytest skipif mark in CustomClusterTestSuite and skip using explicit pytest.skip() in the setup_class() method. Some CustomClusterTestSuite children implemented their own setup_* methods, and I made some adjustments to them both to clean them up and implement proper parent method calling via super(). Testing: I ran the following combinations of all the custom cluster tests: DEBUG / HDFS / core RELEASE / HDFS / exhaustive DEBUG / LOCAL / core DEBUG / S3 / core Before, we'd get situations in which most of the tests were skipped. Consider the RELEASE/HDFS/exhaustive situation: custom_cluster/test_admission_controller.py ..... 
custom_cluster/test_alloc_fail.py ss custom_cluster/test_breakpad.py sssss custom_cluster/test_delegation.py sss custom_cluster/test_exchange_delays.py ss custom_cluster/test_hdfs_fd_caching.py s custom_cluster/test_hive_parquet_timestamp_conversion.py ss custom_cluster/test_insert_behaviour.py ss custom_cluster/test_legacy_joins_aggs.py s custom_cluster/test_parquet_max_page_header.py s custom_cluster/test_permanent_udfs.py sss custom_cluster/test_query_expiration.py sss custom_cluster/test_redaction.py ssss custom_cluster/test_s3a_access.py s custom_cluster/test_scratch_disk.py ssss custom_cluster/test_session_expiration.py s custom_cluster/test_spilling.py ssss authorization/test_authorization.py ss authorization/test_grant_revoke.py s Now, more tests run appropriately: custom_cluster/test_admission_controller.py ..... custom_cluster/test_alloc_fail.py ss custom_cluster/test_breakpad.py sssss custom_cluster/test_delegation.py ... custom_cluster/test_exchange_delays.py ss custom_cluster/test_hdfs_fd_caching.py . custom_cluster/test_hive_parquet_timestamp_conversion.py .. custom_cluster/test_insert_behaviour.py .. custom_cluster/test_kudu_not_available.py . custom_cluster/test_legacy_joins_aggs.py . custom_cluster/test_parquet_max_page_header.py . custom_cluster/test_permanent_udfs.py ... custom_cluster/test_query_expiration.py ... custom_cluster/test_redaction.py .... custom_cluster/test_s3a_access.py s custom_cluster/test_scratch_disk.py .... custom_cluster/test_session_expiration.py . custom_cluster/test_spilling.py .... authorization/test_authorization.py .. authorization/test_grant_revoke.py . Change-Id: Ie301b69718f8690322cc3b4130fb1c715344779c Reviewed-on: http://gerrit.cloudera.org:8080/3265 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Michael Brown <mikeb@cloudera.com>	2016-06-06 17:34:07 -07:00
Tim Armstrong	37ec25396f	IMPALA-3344: Simplify sorter and document/enforce invariants. Clarify relationships between classes, clean up the previous mess where every class was friends with the other so there's an actual distinction between public and private members. TupleIterator is now no longer tied to TupleSorter, just Run. Document and enforce invariants in many cases. Factor out some functions from large functions. Simplify and document iterator logic. Make management of buffers when iterating over output stream more explicitly correct: either use MarkNeedToReturn() or attach block to the batch as appropriate. The SortedRunMerger didn't handle resource transfer correctly, except if all the memory came from the batch's MemPool. This patch fixes the cases when resources are attached to the batches, but not the 'need_to_return' case. Document that SortedRunMerger requires 'deep_copy_input' to be true if batches can have the 'need_to_return' flag set. Also use the atomic block exchange operation when moving between blocks in unpinned runs to prevent pin failures at that point. I explicitly have avoided changing the hairy block management logic when allocating buffers for merging, that will need addressing in a follow-up patch. Add a SpilledRuns counter so that it's more explicit that spilling occurred. Testing: Added some tests for corner cases with empty and NULL strings. Fixed a test that previously failed with OOM but now succeeds. Performance: Benchmarking against old code initially revealed some regressions from changes in inlining. Force inlining the TupleComparator::operator() and iterator Next()/Prev() functions helped and performance seems similar or slightly better on the targeted orderby benchmarks. Change-Id: I9c619e81fd1b8ac50e257172c8bce101a112b52a Reviewed-on: http://gerrit.cloudera.org:8080/2826 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Tim Armstrong <tarmstrong@cloudera.com>	2016-06-02 21:33:08 -07:00
Thomas Tauber-Marshall	5231301084	IMPALA-1633: GetOperationStatus should set errorMessage and sqlState Currently, we never populate the errorMessage or sqlState fields of TGetOperationStatusResp when the GetOperationStatus HiveServer2 rpc is called. This patch checks if the query has an error status and if so sets errorMessage and sqlState. GetOperationStatus also now takes the QueryExecState lock since QueryExecState::query_state_ and QueryExecState::query_status_ are supposed to be protected by it. Additionally, this patch performs some cleanup and adds some documentation around our behavior for updating QueryExecState::query_state_/query_status_. This also addresses IMPALA-3298: TGetOperationStatusResp missing error message when data is expired Change-Id: Icb792f88286779fcf2ce409828de818bc4e80bed Reviewed-on: http://gerrit.cloudera.org:8080/3094 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Tested-by: Internal Jenkins	2016-06-01 19:32:39 -07:00
Alex Behm	32c40f9c5d	Remove redundant test in test_avro_schema_resolution.py Change-Id: I7123cd5e19d79122af3b4fef2c092442b7a098f1 Reviewed-on: http://gerrit.cloudera.org:8080/3095 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-05-31 23:32:11 -07:00
Tim Armstrong	8f79e9eb3c	Stress test should count failure to repartition as a memory limit exceeded error The error code for a repartitioning failure recently changed (because it is not strictly a mem limit error). This makes the corresponding change in the stress test. Change-Id: Ie67fabb8d4c0ffc65ac06f35e4a0a5c7a73baddd Reviewed-on: http://gerrit.cloudera.org:8080/3207 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-05-31 23:32:11 -07:00
Bharath Vissapragada	5ede8eb8a7	IMPALA-2336: Ignore trailing comments in non-interactive mode This patch trims trailing comments while parsing queries in non-interactive mode. Users usually have comments at the end of the script, which should be ignored. Without this patch, the script fails with an exception since it expects valid SQL. The behavior in interactive mode, however, remains the same. Change-Id: I723763ef7eedd03cf22058fadf06e9673a0d94d2 Reviewed-on: http://gerrit.cloudera.org:8080/3169 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2016-05-31 23:32:11 -07:00
Huaisi Xu	816735a032	IMPALA-3092: Set default value to NULL in AvroSchemaConverter This change ensures that Avro tables created without column definitions remain queryable if columns are added via ALTER TABLE. The bug was that when synthesizing an Avro schema from the column definitions we used to not add default values. Change-Id: Ib86e9ba1f4329b285ae14ee299365f7291a7410e Reviewed-on: http://gerrit.cloudera.org:8080/3219 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-05-31 23:32:11 -07:00
Alex Behm	77da3834ff	IMPALA-3369: Add ALTER TABLE SET COLUMN STATS statement. Adds a new command to manually set the table-level column stats. Syntax: ALTER TABLE [<db_name>.]<tbl_name> SET COLUMN STATS <col_name> ('statsKey'='val','statsKey2'='val2') Valid values for 'statsKey': numDVs, numNulls, avgSize, maxSize The 'val' portion needs to be a number appropriate for the given stats key (e.g., a long for numDVs, a float for avgSize). The special value of '-1' is allowed to reset stats to 'unknown'. The keys as well as the values are specified as string literals to be consistent with the existing DDL for setting TBLPROPERTIES/SERDEPROPERTIES, in particular, setting the 'numRows' table/partition property. Testing: Ran the tests locally on exhaustive. Did private runs on core/hdfs and core/S3. Change-Id: I45cd8aa7241ea962788ba9ca7d0bbfd864c4304f Reviewed-on: http://gerrit.cloudera.org:8080/3189 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-05-31 23:32:11 -07:00
Lars Volker	d16e83214a	IMPALA-3581: Change location of minidump folders to log_dir Currently the default minidump location is /tmp/impala-minidumps, which can be wiped on reboot on various distributions. This change moves the default location to FLAGS_log_dir/minidumps/$daemon. The additional trailing $daemon folder is kept to prevent name collisions in case of local test clusters and strangely configured installations. For local test clusters the minidumps will be written to $IMPALA_HOME/logs/cluster/minidumps/{catalogd,impalad,statestored}. Change-Id: Idecf5a314bfb8b0870e8aa4819c4fb39a107702f Reviewed-on: http://gerrit.cloudera.org:8080/3171 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2016-05-31 23:32:11 -07:00
Sailesh Mukil	6f1fe4ebe7	IMPALA-3577, IMPALA-3486: Partitions on multiple filesystems breaks with S3_SKIP_INSERT_STAGING The HdfsTableSink usually creates a HDFS connection to the filesystem that the base table resides in. However, if we create a partition in a FS different than that of the base table and set S3_SKIP_INSERT_STAGING to "true", the table sink will try to write to a different filesystem with the wrong filesystem connector. This patch allows the table sink itself to work with different filesystems by getting rid of a single FS connector and getting a connector per partition. This also reenables the multiple_filesystems test and modifies it to use the unique_database fixture so that parallel runs on the same bucket do not clash and end up in failures. This patch also introduces a SECONDARY_FILESYSTEM environment variable which will be set by the test to allow S3, Isilon and the localFS to be used as the secondary filesystems. All jobs with HDFS as the default filesystem need to set the appropriate environment for S3 and Isilon, i.e. the following: - export AWS_SECRET_ACCESS_KEY - export AWS_ACCESS_KEY_ID - export SECONDARY_FILESYSTEM (to whatever filesystem needs to be tested) TODO: SECONDARY_FILESYSTEM and FILESYSTEM_PREFIX and NAMENODE have a lot of similarities. Need to clean them up in a following patch. Change-Id: Ib13b610eb9efb68c83894786cea862d7eae43aa7 Reviewed-on: http://gerrit.cloudera.org:8080/3146 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2016-05-31 23:32:11 -07:00
Michael Ho	3a4a77521e	IMPALA-3608: Updates Impala E2E test framework to allow multiple exception messages Some of our tests which are expected to fail due to low query memory limits can fail non-deterministically with different error messages. In addition, some tests may throw different error messages when running with the legacy join nodes. This change updates the test infrastructure to allow multiple exception messages to be specified by adding "ANY_OF" to the "CATCH" subsection. Change-Id: Ie6d81fd3ae601f565b575edfeefff7c5a6c07974 Reviewed-on: http://gerrit.cloudera.org:8080/3205 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Internal Jenkins	2016-05-31 23:32:10 -07:00
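The matching rule behind "ANY_OF" can be sketched in a few lines. This is an illustration of the idea only, not the test framework's code: `error_matches` is a hypothetical helper, substring matching is an assumption, and the messages in the example are made up for the demonstration.

```python
def error_matches(actual_error, expected_messages):
    """Return True if at least one expected message occurs in the error text."""
    return any(message in actual_error for message in expected_messages)


# Hypothetical expectations for a query that may fail in more than one way.
expected = ["Memory limit exceeded", "Repartitioning did not reduce the size"]
ok = error_matches("Query failed: Memory limit exceeded (node 3)", expected)
```

A test that accepts any one of several messages stays strict about failing, while tolerating non-determinism in which error is reported first.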
Michael Brown	22669e23be	IMPALA-3501: ee tests: detect build type and support different timeouts based on the same Impala compiled with the address sanitizer, or compiled with code coverage, runs through code paths much slower. This can cause end-to-end tests that pass on a non-ASAN or non-code coverage build to fail. Some examples include IMPALA-2721, IMPALA-2973, and IMPALA-3501. These classes of failures tend always to involve some time-sensitive condition that fails to succeed under such "slow builds". The works-around in the past have been to simply increase the timeout. The problem with this approach is that it relaxes conditions for tests on builds that see the field--i.e., release builds--for builds that never will--i.e., ASAN and code coverage. This patch fixes that problem by allowing test authors to set timeout values based on a specific build type. The author may choose timeouts with a default value, and different timeouts for either or both so-called "slow builds": ASAN and code coverage. We detect the so-called "specific build type" by inspecting the binary expected to be at the path under test. This removes the need to make alterations to Impala itself. The inspection done is to read the DWARF information in the binary, specifically the first compile unit's DW_AT_producer and DW_AT_name DIE attributes. We employ a heuristic based on these attributes' values to guess the build type. If we can't determine the build type, we will assume it's a debug build. More information on this is in IMPALA-3501. A quick summary of the changes follows: 1. Move some of the logic in tests.common.skip to tests.common.environ and rework some skip marks to be more precise. 2. Add Pyelftools for convenient deserialization of DWARF 3. Our Pyelftools usage requires collections.OrderedDict, which isn't in python2.6; also add Monkeypatch to handle this. 4. Add ImpalaBuild and specific_build_type_timeout, the core of the new functionality 5. 
Fix the statestore tests that only fail under code coverage (the basis for IMPALA-3501) Testing: The tests that were previously, reliably failing under code coverage now pass. I also ran perfunctory tests of debug, release, and ASAN builds to ensure our detection of build type is working. This patch will not turn the code coverage builds green; there are other tests that fail, and fixing all of them here is out of the scope of this patch. Change-Id: I2b675c04c54e36d404fd9e5a6cf085fb8d6d0e47 Reviewed-on: http://gerrit.cloudera.org:8080/3156 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Internal Jenkins	2016-05-25 19:41:45 -07:00
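The per-build-type timeout selection described above can be sketched like this. It is a simplified stand-in, not the `specific_build_type_timeout` helper added by the patch: the function name, parameter names and build-type strings here are hypothetical, and the build type is passed in rather than detected from DWARF data.

```python
def pick_timeout(build_type, default, asan=None, code_coverage=None):
    """Choose a timeout for the given build type, falling back to the default.

    Only the "slow" build types can override the default, so release and
    debug builds keep the strict value.
    """
    overrides = {"asan": asan, "code_coverage": code_coverage}
    override = overrides.get(build_type)
    return default if override is None else override


# A test author tightens nothing for release builds but relaxes ASAN runs.
release_timeout = pick_timeout("release", 10, asan=60)
asan_timeout = pick_timeout("asan", 10, asan=60)
```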
Michael Brown	5112e65be2	Revert "Revert "Add Kudu test helpers"" This reverts commit f8dd5413b65d30646c3745dfc738ed812d50a51f and effectively re-adds commit 9248dcb70478b8f93f022893776a0960f45fdc28. The difference between this patch and its original is that I fixed the changes introduced in infra/python/bootstrap_virtualenv.py to be python2.4-compatible: - removed the use of str.format(), preferring a str.join() pattern - removed the call of the exit() builtin to prefer sys.exit() The only testing I did for this patch was to ensure CDH Impala-packaging-on-demand works. Change-Id: I02ed97473868eacf45b25abe89b41e6fa2fce325 Reviewed-on: http://gerrit.cloudera.org:8080/3160 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Internal Jenkins	2016-05-24 16:40:59 -07:00
Matthew Jacobs	f413e236a8	IMPALA-3579: Strict handling of numeric overflow in text parsing Adds a query option 'strict_mode' which treats integer and floating pt overflows as parse errors. In the past, overflows were ignored and the max value was returned. When this query option is set, overflowing values are treated as if they were completely invalid data, i.e. NULL is returned. When abort_on_error is enabled, this means the query is aborted. Notes: * DECIMAL overflow/underflow is already treated as an error. * The handling in text-converter treats underflows the same as overflows, so they would result in the same behavior. However, floating point parsing never returns an underflow today. * We may also want to handle numeric values that are truncated when parsing to integer types, e.g. 10.5 -> 10. Change-Id: I7409c31ec0cb6fe0b2d9842b9f58fe1670914836 Reviewed-on: http://gerrit.cloudera.org:8080/3150 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-05-23 08:40:20 -07:00
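The difference between the old behavior and 'strict_mode' can be illustrated with a small sketch for a 32-bit integer column. This is not Impala's text converter: `parse_int32` is a hypothetical function, and clamping negative overflow to the minimum value is an assumption (the commit message only mentions that the max value was returned).

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def parse_int32(text, strict_mode=False):
    """Parse text as a 32-bit integer; None stands in for SQL NULL."""
    try:
        value = int(text)
    except ValueError:
        return None  # invalid data is NULL in both modes
    if INT32_MIN <= value <= INT32_MAX:
        return value
    if strict_mode:
        return None  # overflow is treated like invalid data
    # Non-strict: overflow is ignored and the nearest bound is returned.
    return INT32_MAX if value > INT32_MAX else INT32_MIN
```

With abort_on_error enabled, the NULL produced in strict mode would be reported as a parse error and abort the query, as the commit message describes.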
Bharath Vissapragada	49610e2cfa	IMPALA-3314/IMPALA-3513: Fix querying tables/partitions altered to Avro format Bug: Impalads crash if we query an Avro table with stale metadata Cause: This happens because avroSchema_ is not set in HdfsTable, which is not propagated to the avro scanner and it doesn't have appropriate checks to make sure the schema is non-null. The patch fixes the following. 1. Avro scanner should gracefully handle the case where the avro schema is not set. Appropriate null checks and a meaningful error message have been added. 2. This is a special case with multi-fileformat partitioned tables. avroSchema_ should be set in HdfsTable even if any subset of the partitions are backed by avro. Without this patch, we only set it if the base table file format is Avro. Change-Id: I09262d3a7b85a2263c721f3beafd0cab2a1bdf4b Reviewed-on: http://gerrit.cloudera.org:8080/3136 Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com> Tested-by: Internal Jenkins	2016-05-23 08:40:20 -07:00
Alex Behm	72e4c41400	IMPALA-3491: Use unique_database fixture in test_data_errors.py. Testing: Ran the test locally 10 times in a loop on exhaustive. Change-Id: I8337daf499b90819a253b883fedaa55bd6b6630e Reviewed-on: http://gerrit.cloudera.org:8080/3087 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-05-23 08:40:19 -07:00
Alex Behm	ea45de84f4	IMPALA-3491: Merge test_hbase_metadata.py into compute_stats.py. Use unique db fixture. - Moves the test into compute_stats.py - Changes some test classes in compute_stats.py to inherit from ImpalaTestSuite and not from TestComputeStats because that will cause all tests in TestComputeStats to be run in the subclasses again (redundantly). - Clean up and add more coverage to testing incremental stats on HBase which was probably broken in this commit 6b32ff06. - Fixes a side effect that the original test had for testing incremental stats on HBase. It computes stats on a functional table which was not supposed to have stats. Testing: Ran compute_stats.py on exhaustive locally in a loop 10 times. Did a private hdfs/core run. Change-Id: Iee8b84e30948c3c98166e08cae2666574777730c Reviewed-on: http://gerrit.cloudera.org:8080/3074 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-05-23 08:40:19 -07:00
Thomas Tauber-Marshall	51869eac56	IMPALA-3542: do_as_user empty check missing Fixes a typo in ImpalaServer::AuthorizeProxyUser where we check that the 'user' parameter isn't empty twice instead of also checking the 'do_as_user' parameter. Change-Id: I8e3962f6f397804e37d4f2c667e97b55bd3ca2bf Reviewed-on: http://gerrit.cloudera.org:8080/3120 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-05-23 08:40:19 -07:00
Shiraz Ali	08eff2bc09	Revert "Add Kudu test helpers" This reverts commit 9248dcb70478b8f93f022893776a0960f45fdc28.	2016-05-20 08:46:00 -07:00
casey	36b524f68c	Add Kudu test helpers Changes: 1) Add the python Kudu module to the virtualenv. Building the virtualenv is much slower now because Cython and numpy are required. To help with the rebuild time --no-cache was removed. That option was added to help when using the dev version of impyla, the version number would be the same but the module contents were different and the cache used the old module contents. 2) Add some py.test fixtures to help create Kudu and Impala connections. Change-Id: I8e5e22b38d5bd09a36238e66a69aa42d1a941de7 Reviewed-on: http://gerrit.cloudera.org:8080/2855 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2016-05-19 19:45:48 -07:00
Matthew Jacobs	f067929f3a	IMPALA-3535: Ignore invalid per-pool default query options In 2.5 we added the ability to set per-pool default query options. A string of key-value pairs can be specified with a pool configuration. However, if any options fail to parse, then all the options are ignored. We want that behavior (and returning an error) when parsing the process-wide default query options on startup and when parsing the options sent from a client (e.g. in beeswax server) because an error can be returned immediately for the triggering action at that time (i.e. starting the impalad or submitting a query with the options set). This behavior is bad for the pool default query options because (a) the configuration is set by the administrator and there's nothing we can do until a query is submitted and (b) one invalid option shouldn't mean that other valid options aren't set. Change-Id: If04733b775963091b0314c65286df126fd812358 Reviewed-on: http://gerrit.cloudera.org:8080/3056 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-05-17 10:09:05 -07:00
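The "skip the bad option, keep the good ones" behavior for pool defaults can be sketched as follows. This is an illustration rather than Impala's parser: `parse_pool_options` is a hypothetical function, and the comma-separated `key=value` format and the set of known option names passed in are assumptions made for the example.

```python
def parse_pool_options(option_string, known_options):
    """Parse 'key=value,key=value'; keep valid pairs, collect invalid ones."""
    valid = {}
    errors = []
    for pair in option_string.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        key = key.strip().lower()
        if not sep or key not in known_options:
            # One bad entry is reported but does not discard the rest.
            errors.append(pair)
            continue
        valid[key] = value.strip()
    return valid, errors


options, ignored = parse_pool_options(
    "mem_limit=1g,bogus=1,batch_size=512", {"mem_limit", "batch_size"})
```

For process-wide defaults and client-supplied options, the stricter all-or-nothing handling stays appropriate because an error can be returned to whoever triggered the parse.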
Sailesh Mukil	4c9c74dd33	IMPALA-3532: S3: test_truncate_cleans_hdfs_files fails because we skip INSERT staging On the introduction of IMPALA-3452, we defaulted to skipping the INSERT staging for S3. test_truncate_cleans_hdfs_files assumes that it will always see the _impala_insert_staging folder but we will not see that on S3 runs. This patch deletes the staging folder if it exists and continues the test without taking into account the staging folder. Change-Id: I3580f03690e29fe99f441b26bc9baa4c0964d79c Reviewed-on: http://gerrit.cloudera.org:8080/3049 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2016-05-14 01:30:01 -07:00
Casey Ching	e61b5bc119	IMPALA-3511: Fix race setting up TestKuduOperations A couple of tests could both attempt to create/destroy the same database if they were running in parallel. Several other related tests were marked as requiring serial execution, these needed to be marked for serial execution as well. Change-Id: If0573a755cd371363c2e43c001d5c1ba499793c6 Reviewed-on: http://gerrit.cloudera.org:8080/3063 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2016-05-14 01:30:01 -07:00
Skye Wanderman-Milne	9174dee395	IMPALA-1578: fix text scanner to handle "\r\n" delimiters split across blocks This patch modifies HdfsTextScanner to specifically check for split "\r\n" delimiters when the scan range ends with '\r'. If there does turn out to be a split delimiter, the next tuple is considered the responsibility of the next scan range's scanner, as if the delimiter appeared fully in the second scan range. This should not affect the overall performance characteristics of the text scanner since it already must do a remote read past the end of the scan range to read the last tuple. Change-Id: Id42b441674bb21517ad2788b99942a4b5dc55420 Reviewed-on: http://gerrit.cloudera.org:8080/2803 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 23:06:36 -07:00
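The ownership rule for a "\r\n" split across scan ranges can be illustrated with a toy model. This is not HdfsTextScanner: it scans the whole buffer instead of reading one range, and `tuples_for_range` is a hypothetical name. It assumes a tuple belongs to the range that contains the last byte of the delimiter preceding it, so a "\r\n" whose '\n' falls in the second range hands the next tuple to the second range's scanner.

```python
import re

# "\r\n" is listed first so it is matched as a single delimiter.
DELIMITER = re.compile(rb"\r\n|\r|\n")


def tuples_for_range(data, start, end):
    """Return the tuples owned by the scan range [start, end)."""
    rows = []
    tuple_start = 0
    owner_pos = 0  # byte position that decides who owns the current tuple
    for match in DELIMITER.finditer(data):
        if start <= owner_pos < end:
            rows.append(data[tuple_start:match.start()])
        tuple_start = match.end()
        owner_pos = match.end() - 1  # last byte of the delimiter
    if tuple_start < len(data) and start <= owner_pos < end:
        rows.append(data[tuple_start:])
    return rows


data = b"a\r\nb\r\nc"
# Split right after the first '\r', so "\r\n" straddles the boundary.
first = tuples_for_range(data, 0, 2)
second = tuples_for_range(data, 2, len(data))
```

With this rule every tuple is produced by exactly one range, wherever the boundary falls.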
Sailesh Mukil	7e0cbaf1a0	IMPALA-3459: Add test for DROP TABLE PURGE for S3 It was previously thought that PURGE had no effect on S3. However, the Hive Metastore actually created a .Trash directory and copied the files there when a DROP TABLE was conducted from Impala. This patch just enables the existing PURGE tests for S3. There were a few reasons this wasn't working before. The paths given to the S3 client (boto3) should not have a leading "/". This has been fixed as it doesn't make a difference for HDFS if that exists or not. Also, PURGE is a pure delete whereas a regular DROP is a copy. A copy is consistent whereas a delete is only eventually consistent, so when we PURGE a table or partition, the files will still be visible for some time after the query has completed. The tests have been modified to accommodate this case as well. Change-Id: I52d2451e090b00ae2fd9a879c28defa6c940047c Reviewed-on: http://gerrit.cloudera.org:8080/3036 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 23:06:36 -07:00
Lars Volker	df8bf3a965	IMPALA-3490: Add flag to reduce minidump size IMPALA-2686 added the breakpad library to all impala daemons, thus enabling them to write minidump files. This change introduces a flag 'minidump_size_limit_hint_kb', which causes breakpad to reduce the amount of thread stack memory it includes in a minidump, aiming to reduce the minidump size during crashes with a lot of threads. Once a minidump is expected to exceed the configured value, breakpad will include the full stack memory for the first 20 threads, and afterwards capture only 2KB of stack memory for each additional thread. Change-Id: I2f3aa0df51be9f0bf0755fb288702911cdb88052 Reviewed-on: http://gerrit.cloudera.org:8080/2990 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 14:18:04 -07:00
Alex Behm	1c19c232f3	IMPALA-3491: Use unique_database fixture in test_views_compatibility. Testing: Ran the test locally. It was already possible to run the test in parallel before. Change-Id: I68a1349276c90a42c238bed40a1c7c221199a67a Reviewed-on: http://gerrit.cloudera.org:8080/3009 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 14:18:02 -07:00
Alex Behm	c0ee93bbbe	IMPALA-3491: Use unique_database fixture in test_recover_partitions.py. Testing: I ran the test 10 times in a loop locally and ran a private core/hdfs run. Change-Id: I5be5fa5d20bc6ed5b7830e0ce90201431d6aa008 Reviewed-on: http://gerrit.cloudera.org:8080/3003 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 14:18:00 -07:00
Alex Behm	616eb2fcce	IMPALA-3491: Use unique_database fixture in test_partition_metadata.py Also changes the test to use beeline instead of Hive -e for the portions executed in Hive because beeline is significantly faster. Testing: Tested the changes locally by running them in a loop 10 times. Also did a private core/hdfs run. Change-Id: I70d87941fbfc30f41e1c6fcfee8d8c7f16b88831 Reviewed-on: http://gerrit.cloudera.org:8080/2962 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 14:18:00 -07:00
Alex Behm	12097a0707	IMPALA-3491: Use unique_database fixture in test_hidden_files.py. Testing: Tested the changes locally by running them in a loop 10 times. Also did a private core/hdfs run. Change-Id: I37e1528c02e598f3fb2d673b6559d55a34bf79b4 Reviewed-on: http://gerrit.cloudera.org:8080/3002 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 14:17:59 -07:00
Alex Behm	96e18f9e62	IMPALA-3491: Use unique_database fixture in test_stale_metadata.py Testing: I ran the test 10 times in a loop locally and ran a private core/hdfs run. Change-Id: Ibd058853e6b48671838e5b51611b6c34a7a8d39d Reviewed-on: http://gerrit.cloudera.org:8080/2982 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 14:17:59 -07:00
Sailesh Mukil	0f1dd55c79	IMPALA-3488 (follow up): test_ddl.py failure on LocalFS run There was another test which used the hdfs_client and which was not skipped for localFS. It should never have run on localFS, but it did not fail earlier for the same reasons as mentioned in the previous patch and in the JIRA. Marking as SkipIfLocal. Change-Id: I3436e80ccd380ecc5f5d28053b3563db2319f9e9 Reviewed-on: http://gerrit.cloudera.org:8080/2991 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 14:17:58 -07:00
Alex Behm	bff194ce17	IMPALA-3491: Use unique_database fixture in test_col_stats.py. The patch also addresses a TODO asking for test_col_stats.py to be merged into test_compute_stats.py Testing: I ran the test by itself in a loop 10 times, and the whole test_compute_stats.py locally. Also did a private core/hdfs run. Change-Id: I88aa77464a95993c018e19a52eeb496d7c3eef08 Reviewed-on: http://gerrit.cloudera.org:8080/2963 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 14:17:58 -07:00
Sailesh Mukil	41e31439d3	IMPALA-3488: test_ddl.py failure on LocalFS run Our test_ddl.py always had a bug where in the _cleanup() function, we used the hdfs_client on local FS runs. It always ended up passing because we caught generic exceptions in hdfs_client.delete_file_dir() while checking if a file existed which always caused the test to pass. With the introduction of an hdfs_client.exists() function in IMPALA-1878 which catches only the right FileNotFound exception, this bug was exposed causing our local FS runs to fail. This patch returns from the _cleanup() function if it's a local FS run because the directories of the tables it cleans up are not used in these runs. Change-Id: Ie0c9eec31a90e8f66102d18d900c613bd1306968 Reviewed-on: http://gerrit.cloudera.org:8080/2980 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 14:17:57 -07:00
Dan Hecht	a0d4249652	IMPALA-3337: fix "Cancelled" warnings when LIMIT clause is specified The cancelled status is propagated in scanner threads to cause them to shut down once the limit has been satisfied, but depending on the code path and when abort_on_error=false, this internal status would sometimes incorrectly end up in the error log. Fix this by factoring out the abort_on_error handling code so that it's handled more consistently across scanners. Parquet, RC, and Avro all suffered from this bug. Testing: exhaustive Change-Id: I4a91a22608e346ca21a23ea66c855eae54bbced6 Reviewed-on: http://gerrit.cloudera.org:8080/2964 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 14:17:57 -07:00
Tim Armstrong	34c95c9590	IMPALA-2345,2991: test coverage for spilling and sorts Add missing coverage for sorting by CHAR and VARCHAR. Add more coverage for spilling sorts. Fix spilling tests: ensure that they actually reliably spill (many of them had memory limits high enough that they could run entirely in memory). I ran this in a loop for a while to flush out flaky tests. The tests should be fairly predictable given that they're not run concurrently with other tests and we allocate enough block manager memory so that each operator can obtain its reservation. Change-Id: Ia2d2627a2c327dcdf269ea3216385b1af9dfa305 Reviewed-on: http://gerrit.cloudera.org:8080/2877 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 14:17:55 -07:00
Henry Robinson	a805e100b2	IMPALA-3397: Source query files from shell. This patch allows you to write SOURCE <file> or SRC <file>, and have the shell read the file and execute all the queries in it. Change-Id: Ib05df3e755cd12e9e9562de6b353857940eace03 Reviewed-on: http://gerrit.cloudera.org:8080/2663 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 14:17:54 -07:00

927 Commits