Before this change, a single test database was created for the entire suite,
and each test was marked to run serially. With the addition of a test fixture
in tests/conftest.py that creates a unique database for each test method,
it is now possible to run the tests in parallel. (The tables required by
individual tests are created via local test fixtures.)
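A minimal sketch of the per-test database idea. The real fixture in tests/conftest.py is not shown here and may differ; the `client` object with an `execute(sql)` method and both function names are hypothetical.

```python
import contextlib
import uuid


def unique_db_name(test_name):
    """Build a database name that is unique per test invocation."""
    # The random suffix keeps parallel runs of the same test from colliding.
    return "{0}_{1}".format(test_name, uuid.uuid4().hex[:8])


@contextlib.contextmanager
def unique_database(client, test_name):
    """Create a fresh database for one test and drop it afterwards.

    `client` is a hypothetical object exposing execute(sql).
    """
    db_name = unique_db_name(test_name)
    client.execute("CREATE DATABASE {0}".format(db_name))
    try:
        yield db_name
    finally:
        # CASCADE also removes any tables the test created.
        client.execute("DROP DATABASE {0} CASCADE".format(db_name))
```

Because every test gets its own database, no test can see or drop the objects of another, which is what makes parallel execution safe.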
As such, any methods which had been responsible for setting up the test
database were removed. Pytest markers for running tests serially were also
removed, except in cases where interactions from running concurrently would
affect other tests.
Additional minor changes were made to improve PEP-8 compliance.
The non-serial tests were run in a loop ten times to confirm that there weren't
any unexpected failures.
Review: https://gerrit.cloudera.org/#/c/3301/
Change-Id: Icdcb04a99c0907fc1ba56baa2497fafb33b0e34e
Reviewed-on: http://gerrit.cloudera.org:8080/3301
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Internal Jenkins
Even though this is just a single test, this change introduces the
unique_database test fixture that was initially created to help with
concurrent tests. It's still worth doing this here because we want to
update all tests to use best practices.
That said, there was still a performance gain to be had here. It
turns out the initial code called the cleanup_db() method from the
base ImpalaTestSuite class, which in turn sets the 'sync_ddl' query
option to true. Not doing this at the beginning of this test results
in a roughly 40x speedup.
Change-Id: I5d6994f31d52e18e2e04aab0e34202e2c623e367
Reviewed-on: http://gerrit.cloudera.org:8080/3366
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Internal Jenkins
When hitting a DCHECK/CHECK the daemons do not write minidumps. This is
caused by glog's own stack unwinding mechanism, which catches SIGABRT
and removes all other handlers before aborting.
This change bumps the glog version to include a patch that backports a
change from glog. With that change, glog resets the SIGABRT handler only
if it is the one installed by glog itself.
cda16b3443
Change-Id: I08e6b83af1b4ff1b8c916fe6c9052b88b760e188
Reviewed-on: http://gerrit.cloudera.org:8080/3286
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
This is the first in a series of patches to clean up test_ddl.py
Summary of changes:
- Break up test_create() and corresponding .test files into:
* test_create_database()
* test_create_table()
* test_create_table_like_table()
* test_create_table_like_file()
* test_create_table_as_select()
- Merge test_nested() into the tests above
- Move a test into test_hms_integration.py
- Add a new test_ddl_base.py as base class for DDL tests.
The plan is to split up test_ddl.py into several smaller
.py files in subsequent patches.
Testing: I tested test_ddl.py and test_hms_integration.py on
exhaustive locally as well as in private builds on all filesystems.
Change-Id: I5f4c044d39e165c2535961b8d0a765c8dbbd051c
Reviewed-on: http://gerrit.cloudera.org:8080/3044
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
The breakpad tests were writing core files when triggering minidump
writes. This was actually not needed and interfered with test execution
and artifact collection. Most notably processes would take a long time
to terminate while writing core files (IMPALA-3684). The core files
would also be wrongly collected by Jenkins (IMPALA-3693).
This change adds code to stop test clusters reliably, making
test_breakpad independent from calling setup-impala-cluster.py via
os.system. It also disables core dumps for the duration of the test and
re-enables them afterwards.
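A minimal sketch of disabling core dumps for a limited scope, using the standard `resource` module (Unix only). The helper name is hypothetical and this is not necessarily how test_breakpad does it.

```python
import contextlib
import resource


@contextlib.contextmanager
def core_dumps_disabled():
    """Temporarily set the soft core-file size limit to 0 (Unix only)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (0, hard))
    try:
        yield
    finally:
        # Restore the original soft limit; the hard limit is never changed.
        resource.setrlimit(resource.RLIMIT_CORE, (soft, hard))
```

Child processes started inside the `with` block inherit the zero limit, so daemons launched by the test would not write core files.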
Change-Id: If592339632aa662b59be09d911229566d5772321
Reviewed-on: http://gerrit.cloudera.org:8080/3339
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Silvius Rus <srus@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
Make the test deterministic by using max_block_mgr_memory instead of
mem_limit, so that the non-deterministic scanner memory usage does not
influence the spilling behaviour of the queries.
Testing:
Ran the test locally to confirm that it succeeded. Also manually
computed the memory requirement. The data size to be sorted is ~220MB,
so with a 64MB block manager limit per node, at least one node must
spill.
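The memory argument above can be written as a small pigeonhole check. The node count is not stated in this message; the value 3 below is an assumption for illustration, and the function name is hypothetical.

```python
def some_node_must_spill(data_mb, per_node_limit_mb, num_nodes):
    """Pigeonhole check: if the data exceeds the combined in-memory
    capacity, at least one node cannot hold its share and must spill."""
    return data_mb > per_node_limit_mb * num_nodes


# ~220MB of sort data and the 64MB limit come from the text above;
# the node count of 3 is an assumption made for illustration.
print(some_node_must_spill(220, 64, 3))
```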
Change-Id: I9525a029ac020bb5b8bea210a741c9f9c5ec3c75
Reviewed-on: http://gerrit.cloudera.org:8080/3318
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Silvius Rus <srus@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
Adds handling and testing for a specific Parquet data corruption
scenario with plain dictionary encoded values.
The problematic scenario is when the repeat or literal count of
the RLE-encoded dictionary indexes is decoded as 0 - an invalid value.
There are several other cases of data corruption that are not yet
handled gracefully. This patch only handles one specific case.
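A minimal Python sketch of the check, assuming the standard Parquet RLE/bit-packed hybrid layout (a varint header whose low bit selects the run type). This is an illustration only, not the scanner code touched by this patch; the function name is hypothetical.

```python
def read_run_header(buf, pos=0):
    """Decode one run header of Parquet's RLE/bit-packed hybrid encoding.

    Returns (kind, count, next_pos). Raises ValueError on a zero count,
    which is an invalid value and therefore signals corruption.
    """
    header = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated run header")
        byte = buf[pos]
        pos += 1
        header |= (byte & 0x7F) << shift  # ULEB128 varint
        if byte & 0x80 == 0:
            break
        shift += 7
    if header & 1:
        kind, count = "literal", (header >> 1) * 8  # groups of 8 values
    else:
        kind, count = "repeat", header >> 1
    if count == 0:
        raise ValueError("corrupt dictionary index run: zero %s count" % kind)
    return kind, count, pos
```

Failing at the header keeps the decoder from looping on a run that can never make progress.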
Change-Id: Ibf406c82cdded37966f09c81e4cc1446d2b60d63
Reviewed-on: http://gerrit.cloudera.org:8080/3299
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
Also make test_scratch_disk.py more deterministic, by using
max_block_mgr_memory, which doesn't include scanner memory.
The fixed test_scratch_disk.py exercises the other sorter bug
that occurs when scratch cannot be written.
Testing:
Added a test that does a sort with various memory limits and consumes
the whole output of the sorter (we have many tests of sorts with limits
but limited coverage of sorts without limits). Ran an exhaustive test
run before posting for review.
This added test reproduced one of the sorter bugs, where var-len blocks
were not always attached to the output batch. The other bug was
reproduced by the test change in IMPALA-3669: test_scratch_disk fix.
Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
Reviewed-on: http://gerrit.cloudera.org:8080/3315
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
* -require_username (not strictly admission control related
but it came up in the context of RM).
* Coverage of failure cases: The handling of the full queue
case wasn't being verified. This changes the existing stress
test to expect a specific message when the queue is full.
* Requesting MAXINT memory, which previously led to an
overflow in the pool-level mem tracker accounting.
This does not yet address:
* Changing pool cfg while running
* Verify profile string for queued reason
This is just a minimal incremental change to get additional
coverage. Right now, many of the tests rely on some
pre-defined configuration files, which is cumbersome. In the
future, we plan on refreshing the configuration story at
which point we should also build more general test
infrastructure for easily testing different configurations.
Change-Id: I6682b15a5caac5824384c4b48a7b40afa2548954
Reviewed-on: http://gerrit.cloudera.org:8080/3272
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
All versions of pytest contain various bugs regarding test marking
(including skips) when tests are both:
1. class-level marked
2. inherited
More info is available in IMPALA-3614 and IMPALA-2943, but the gist is
that it's possible for some tests to be skipped when they shouldn't be.
This is happening pretty badly with the custom cluster tests, because
CustomClusterTestSuite has a class level skipif mark.
The easiest workaround for now is to remove the pytest skipif mark in
CustomClusterTestSuite and skip using explicit pytest.skip() in the
setup_class() method. Some CustomClusterTestSuite children implemented
their own setup_* methods, and I made some adjustments to them both to
clean them up and implement proper parent method calling via super().
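A minimal sketch of the pattern. It uses the standard library's unittest.SkipTest in setUpClass() as a stand-in for pytest.skip() in setup_class(), so that it runs without pytest; the class names and the skip condition are hypothetical.

```python
import unittest

SKIP_CUSTOM_CLUSTER = True  # hypothetical condition, e.g. an unsupported environment


class CustomClusterBase(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Skip imperatively at setup time instead of relying on a
        # class-level mark that subclasses would inherit.
        if SKIP_CUSTOM_CLUSTER:
            raise unittest.SkipTest("custom cluster tests are disabled")
        cls.cluster_started = True


class ChildSuite(CustomClusterBase):
    @classmethod
    def setUpClass(cls):
        # Call the parent first so its skip logic and setup both run.
        super(ChildSuite, cls).setUpClass()
        cls.extra_state = "ready"

    def test_something(self):
        self.assertTrue(self.cluster_started)
```

The skip decision is evaluated once per class at setup time, so it does not depend on how marks propagate through inheritance.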
Testing:
I ran the following combinations of all the custom cluster tests:
DEBUG / HDFS / core
RELEASE / HDFS / exhaustive
DEBUG / LOCAL / core
DEBUG / S3 / core
Before, we'd get situations in which most of the tests were skipped.
Consider the RELEASE/HDFS/exhaustive situation:
custom_cluster/test_admission_controller.py .....
custom_cluster/test_alloc_fail.py ss
custom_cluster/test_breakpad.py sssss
custom_cluster/test_delegation.py sss
custom_cluster/test_exchange_delays.py ss
custom_cluster/test_hdfs_fd_caching.py s
custom_cluster/test_hive_parquet_timestamp_conversion.py ss
custom_cluster/test_insert_behaviour.py ss
custom_cluster/test_legacy_joins_aggs.py s
custom_cluster/test_parquet_max_page_header.py s
custom_cluster/test_permanent_udfs.py sss
custom_cluster/test_query_expiration.py sss
custom_cluster/test_redaction.py ssss
custom_cluster/test_s3a_access.py s
custom_cluster/test_scratch_disk.py ssss
custom_cluster/test_session_expiration.py s
custom_cluster/test_spilling.py ssss
authorization/test_authorization.py ss
authorization/test_grant_revoke.py s
Now, more tests run appropriately:
custom_cluster/test_admission_controller.py .....
custom_cluster/test_alloc_fail.py ss
custom_cluster/test_breakpad.py sssss
custom_cluster/test_delegation.py ...
custom_cluster/test_exchange_delays.py ss
custom_cluster/test_hdfs_fd_caching.py .
custom_cluster/test_hive_parquet_timestamp_conversion.py ..
custom_cluster/test_insert_behaviour.py ..
custom_cluster/test_kudu_not_available.py .
custom_cluster/test_legacy_joins_aggs.py .
custom_cluster/test_parquet_max_page_header.py .
custom_cluster/test_permanent_udfs.py ...
custom_cluster/test_query_expiration.py ...
custom_cluster/test_redaction.py ....
custom_cluster/test_s3a_access.py s
custom_cluster/test_scratch_disk.py ....
custom_cluster/test_session_expiration.py .
custom_cluster/test_spilling.py ....
authorization/test_authorization.py ..
authorization/test_grant_revoke.py .
Change-Id: Ie301b69718f8690322cc3b4130fb1c715344779c
Reviewed-on: http://gerrit.cloudera.org:8080/3265
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Michael Brown <mikeb@cloudera.com>
Clarify relationships between classes, clean up the previous mess
where every class was friends with the other so there's an actual
distinction between public and private members. TupleIterator
is now no longer tied to TupleSorter, just Run.
Document and enforce invariants in many cases.
Factor out some functions from large functions.
Simplify and document iterator logic.
Make management of buffers when iterating over output stream more
explicitly correct: either use MarkNeedToReturn() or attach block
to the batch as appropriate. The SortedRunMerger didn't handle
resource transfer correctly, except if all the memory came from
the batch's MemPool. This patch fixes the cases when resources
are attached to the batches, but not the 'need_to_return' case.
Document that SortedRunMerger requires 'deep_copy_input' to be true
if batches can have the 'need_to_return' flag set.
Also use the atomic block exchange operation when moving between
blocks in unpinned runs to prevent pin failures at that point.
I have explicitly avoided changing the hairy block management logic
when allocating buffers for merging; that will need addressing in
a follow-up patch.
Add a SpilledRuns counter so that it's more explicit that spilling
occurred.
Testing:
Added some tests for corner cases with empty and NULL strings.
Fixed a test that previously failed with OOM but now succeeds.
Performance:
Benchmarking against old code initially revealed some regressions from
changes in inlining. Force inlining the TupleComparator::operator() and
iterator Next()/Prev() functions helped and performance seems similar or
slightly better on the targeted orderby benchmarks.
Change-Id: I9c619e81fd1b8ac50e257172c8bce101a112b52a
Reviewed-on: http://gerrit.cloudera.org:8080/2826
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
Currently, we never populate the errorMessage or sqlState
fields of TGetOperationStatusResp when the GetOperationStatus
HiveServer2 rpc is called. This patch checks if the query has
an error status and if so sets errorMessage and sqlState.
GetOperationStatus also now takes the QueryExecState lock since
QueryExecState::query_state_ and QueryExecState::query_status_
are supposed to be protected by it.
Additionally, this patch performs some cleanup and adds some
documentation around our behavior for updating
QueryExecState::query_state_/query_status_.
This also addresses IMPALA-3298: TGetOperationStatusResp missing
error message when data is expired
Change-Id: Icb792f88286779fcf2ce409828de818bc4e80bed
Reviewed-on: http://gerrit.cloudera.org:8080/3094
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Internal Jenkins
The error code for a repartitioning failure recently changed (because it
is not strictly a mem limit error). This makes the corresponding change
in the stress test.
Change-Id: Ie67fabb8d4c0ffc65ac06f35e4a0a5c7a73baddd
Reviewed-on: http://gerrit.cloudera.org:8080/3207
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This patch trims trailing comments while parsing queries in
non-interactive mode. Users often have comments at the end
of a script, and these should be ignored. Without this patch,
the script fails with an exception because it expects valid
SQL. The behavior in interactive mode is unchanged.
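A minimal sketch of trimming trailing comments from a script. It is an illustration rather than the shell's actual parsing code: the function name is hypothetical, and it assumes that "--" on the final line never appears inside a string literal.

```python
import re

# One or more trailing "--" line comments or "/* */" block comments,
# separated by optional whitespace, anchored at the end of the script.
_TRAILING_COMMENTS = re.compile(
    r"(?:\s*(?:--[^\n]*|/\*(?:(?!\*/).)*\*/))+\s*\Z", re.DOTALL)


def strip_trailing_comments(script):
    """Remove comments that follow the last statement of a script."""
    return _TRAILING_COMMENTS.sub("", script)
```

Comments that appear before or between statements are left alone; only the run of comments at the very end is removed.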
Change-Id: I723763ef7eedd03cf22058fadf06e9673a0d94d2
Reviewed-on: http://gerrit.cloudera.org:8080/3169
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This change ensures that Avro tables created without column definitions
remain queryable if columns are added via ALTER TABLE. The bug was that
when synthesizing an Avro schema from the column definitions we used to
not add default values.
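A minimal sketch of synthesizing an Avro schema with default values. The type mapping, the use of nullable unions, and the function name are assumptions for illustration; the actual frontend code is not shown here.

```python
import json

# Hypothetical, partial mapping from SQL column types to Avro types.
_SQL_TO_AVRO = {"INT": "int", "BIGINT": "long", "STRING": "string",
                "DOUBLE": "double", "BOOLEAN": "boolean"}


def synthesize_avro_schema(table_name, columns):
    """Build an Avro record schema from (name, sql_type) pairs."""
    fields = []
    for name, sql_type in columns:
        fields.append({
            "name": name,
            # "null" comes first so that the null default is valid.
            "type": ["null", _SQL_TO_AVRO[sql_type.upper()]],
            # Readers use the default for records written before the
            # column existed.
            "default": None,
        })
    return json.dumps({"type": "record", "name": table_name, "fields": fields})
```

With a default present, files written before an ALTER TABLE ADD COLUMNS can still be resolved against the newer schema.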
Change-Id: Ib86e9ba1f4329b285ae14ee299365f7291a7410e
Reviewed-on: http://gerrit.cloudera.org:8080/3219
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Adds a new command to manually set the table-level column stats.
Syntax:
ALTER TABLE [<db_name>.]<tbl_name> SET COLUMN STATS <col_name>
('statsKey'='val','statsKey2'='val2')
Valid values for 'statsKey': numDVs, numNulls, avgSize, maxSize
The 'val' portion needs to be a number appropriate for the given stats
key (e.g., a long for numDVs, a float for avgSize).
The special value of '-1' is allowed to reset stats to 'unknown'.
The keys as well as the values are specified as string literals to be
consistent with the existing DDL for setting TBLPROPERTIES/SERDEPROPERTIES,
in particular, setting the 'numRows' table/partition property.
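A small helper that renders a statement in the syntax described above. The helper itself is hypothetical, and it assumes stats keys must match the listed spellings exactly.

```python
VALID_STATS_KEYS = ("numDVs", "numNulls", "avgSize", "maxSize")


def set_column_stats_sql(table, column, stats):
    """Render an ALTER TABLE ... SET COLUMN STATS statement.

    `stats` maps a stats key to a number; -1 resets the stat to unknown.
    """
    for key in stats:
        if key not in VALID_STATS_KEYS:
            raise ValueError("invalid stats key: %s" % key)
    # Keys and values are both rendered as string literals.
    pairs = ",".join("'%s'='%s'" % (k, v) for k, v in stats.items())
    return "ALTER TABLE %s SET COLUMN STATS %s (%s)" % (table, column, pairs)
```

For example, `set_column_stats_sql("db.t", "c", {"numDVs": 10, "avgSize": 2.5})` renders a statement that sets both stats on column c.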
Testing: Ran the tests locally on exhaustive. Did private runs
on core/hdfs and core/S3.
Change-Id: I45cd8aa7241ea962788ba9ca7d0bbfd864c4304f
Reviewed-on: http://gerrit.cloudera.org:8080/3189
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Currently the default minidump location is /tmp/impala-minidumps, which can be wiped on
reboot on various distributions. This change moves the default location to
FLAGS_log_dir/minidumps/$daemon. The additional trailing $daemon folder is kept to prevent
name collisions in case of local test clusters and strangely configured installations.
For local test clusters the minidumps will be written to
$IMPALA_HOME/logs/cluster/minidumps/{catalogd,impalad,statestored}.
Change-Id: Idecf5a314bfb8b0870e8aa4819c4fb39a107702f
Reviewed-on: http://gerrit.cloudera.org:8080/3171
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
The HdfsTableSink usually creates an HDFS connection to the filesystem
that the base table resides in. However, if we create a partition in
a FS different than that of the base table and set
S3_SKIP_INSERT_STAGING to "true", the table sink will try to write to
a different filesystem with the wrong filesystem connector.
This patch allows the table sink itself to work with different
filesystems by getting rid of a single FS connector and getting a
connector per partition.
This also reenables the multiple_filesystems test and modifies it to
use the unique_database fixture so that parallel runs on the same
bucket do not clash and end up in failures.
This patch also introduces a SECONDARY_FILESYSTEM environment variable
which will be set by the test to allow S3, Isilon and the localFS to
be used as the secondary filesystems.
All jobs with HDFS as the default filesystem need to set the
appropriate environment for S3 and Isilon, i.e. the following:
- export AWS_SECRET_ACCESS_KEY
- export AWS_ACCESS_KEY_ID
- export SECONDARY_FILESYSTEM (to whatever filesystem needs to be
tested)
TODO: SECONDARY_FILESYSTEM and FILESYSTEM_PREFIX and NAMENODE have a
lot of similarities. Need to clean them up in a following patch.
Change-Id: Ib13b610eb9efb68c83894786cea862d7eae43aa7
Reviewed-on: http://gerrit.cloudera.org:8080/3146
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
Some of our tests which are expected to fail due to low
query memory limits can fail non-deterministically with
different error messages. In addition, some tests may
throw different error messages when running with the legacy
join nodes. This change updates the test infrastructure to
allow multiple exception messages to be specified by
adding "ANY_OF" to the "CATCH" subsection.
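The matching rule can be sketched as follows. This is a hypothetical helper, not the test infrastructure's actual code, and it assumes that an expected message matches when it occurs as a substring of the error.

```python
def error_matches(actual_error, expected_messages):
    """Return True if any one of the expected messages occurs in the error."""
    return any(expected in actual_error for expected in expected_messages)
```

A test then passes as long as the query fails with at least one of the listed messages, whichever code path produced it.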
Change-Id: Ie6d81fd3ae601f565b575edfeefff7c5a6c07974
Reviewed-on: http://gerrit.cloudera.org:8080/3205
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
Impala compiled with the address sanitizer, or compiled with code
coverage, runs through code paths much slower. This can cause end-to-end
tests that pass on a non-ASAN or non-code coverage build to fail. Some
examples include IMPALA-2721, IMPALA-2973, and IMPALA-3501. These
classes of failures tend always to involve some time-sensitive condition
that fails to succeed under such "slow builds".
The workaround in the past has been to simply increase the timeout.
The problem with this approach is that it relaxes conditions for tests
on builds that see the field--i.e., release builds--for the sake of
builds that never will--i.e., ASAN and code coverage.
This patch fixes that problem by allowing test authors to set timeout
values based on a *specific* build type. The author may choose timeouts
with a default value, and different timeouts for either or both
so-called "slow builds": ASAN and code coverage.
We detect the so-called "specific build type" by inspecting the binary
expected to be at the path under test. This removes the need to make
alterations to Impala itself. The inspection done is to read the DWARF
information in the binary, specifically the first compile unit's
DW_AT_producer and DW_AT_name DIE attributes. We employ a heuristic
based on these attributes' values to guess the build type. If we can't
determine the build type, we will assume it's a debug build. More
information on this is in IMPALA-3501.
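A minimal sketch of choosing a timeout by build type. The function name, its parameters, and the build type strings are hypothetical and do not reflect the signature of specific_build_type_timeout.

```python
def timeout_for_build(build_type, default, asan=None, code_coverage=None):
    """Pick a timeout for the detected build type.

    build_type is one of "debug", "release", "asan", "code_coverage",
    or None when detection failed.
    """
    if build_type is None:
        build_type = "debug"  # undetectable builds are assumed to be debug
    if build_type == "asan" and asan is not None:
        return asan
    if build_type == "code_coverage" and code_coverage is not None:
        return code_coverage
    return default
```

Release and debug builds keep the strict default, so only the slow builds get the relaxed values.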
A quick summary of the changes follows:
1. Move some of the logic in tests.common.skip to tests.common.environ
and rework some skip marks to be more precise.
2. Add Pyelftools for convenient deserialization of DWARF
3. Our Pyelftools usage requires collections.OrderedDict, which isn't in
python2.6; also add Monkeypatch to handle this.
4. Add ImpalaBuild and specific_build_type_timeout, the core of the new
functionality
5. Fix the statestore tests that only fail under code coverage (the
basis for IMPALA-3501)
Testing:
The tests that were previously failing reliably under code coverage now
pass. I also ran perfunctory tests of debug, release, and ASAN builds to
ensure our detection of build type is working. This patch will *not*
turn the code coverage builds green; there are other tests that fail,
and fixing all of them here is out of the scope of this patch.
Change-Id: I2b675c04c54e36d404fd9e5a6cf085fb8d6d0e47
Reviewed-on: http://gerrit.cloudera.org:8080/3156
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
This reverts commit f8dd5413b65d30646c3745dfc738ed812d50a51f and
effectively re-adds commit 9248dcb70478b8f93f022893776a0960f45fdc28. The
difference between this patch and its original is that I fixed the
changes introduced in infra/python/bootstrap_virtualenv.py to be
python2.4-compatible:
- removed the use of str.format(), preferring a str.join() pattern
- removed the call of the exit() builtin to prefer sys.exit()
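The two substitutions look roughly like this. The function is a hypothetical illustration and is not taken from infra/python/bootstrap_virtualenv.py.

```python
import sys


def fail(script_name, reason):
    """Print an error and terminate, using only Python 2.4-era constructs."""
    # str.join() instead of str.format(), which arrived in Python 2.6.
    message = "".join([script_name, ": ", reason, "\n"])
    sys.stderr.write(message)
    # sys.exit() instead of the exit() builtin, which the site module
    # provides mainly for interactive use.
    sys.exit(1)
```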
The only testing I did for this patch was to ensure
CDH Impala-packaging-on-demand works.
Change-Id: I02ed97473868eacf45b25abe89b41e6fa2fce325
Reviewed-on: http://gerrit.cloudera.org:8080/3160
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
Adds a query option 'strict_mode' which treats integer and
floating point overflows as parse errors. In the past,
overflows were ignored and the max value was returned. When
this query option is set, overflowing values are treated as if
they were completely invalid data, i.e. NULL is returned.
When abort_on_error is enabled, this means the query is
aborted.
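A minimal Python sketch of the two behaviors for integers. It is not the text-converter code; the function name is hypothetical, and clamping negative overflows to the minimum value is an assumption, since the text above only mentions the max value.

```python
def parse_int(text, bits, strict_mode):
    """Parse a signed integer of the given width.

    Returns None for values treated as invalid. Without strict_mode an
    overflowing value is clamped to the nearest bound of the type.
    """
    try:
        value = int(text)
    except ValueError:
        return None  # not a number at all
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if lo <= value <= hi:
        return value
    if strict_mode:
        return None  # overflow is handled like unparsable input
    return hi if value > hi else lo
```

With abort_on_error enabled, the `None` result in strict mode would correspond to the query being aborted.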
Notes:
* DECIMAL overflow/underflow is already treated as an error.
* The handling in text-converter treats underflows the same
as overflows, so they would result in the same behavior.
However, floating point parsing never returns an underflow
today.
* We may also want to handle numeric values that are truncated
when parsing to integer types, e.g. 10.5 -> 10.
Change-Id: I7409c31ec0cb6fe0b2d9842b9f58fe1670914836
Reviewed-on: http://gerrit.cloudera.org:8080/3150
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
Bug: Impalads crash if we query an Avro table with stale metadata
Cause: This happens because avroSchema_ is not set in HdfsTable,
so it is not propagated to the Avro scanner, which lacks the
checks needed to make sure the schema is non-null.
The patch fixes the following.
1. Avro scanner should gracefully handle the case where the avro schema
is not set. Appropriate null checks and a meaningful error message have
been added.
2. This is a special case with multi-fileformat partitioned tables.
avroSchema_ should be set in HdfsTable even if any subset of the
partitions are backed by avro. Without this patch, we only set it
if the base table file format is Avro.
Change-Id: I09262d3a7b85a2263c721f3beafd0cab2a1bdf4b
Reviewed-on: http://gerrit.cloudera.org:8080/3136
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Internal Jenkins
Testing: Ran the test locally 10 times in a loop on exhaustive.
Change-Id: I8337daf499b90819a253b883fedaa55bd6b6630e
Reviewed-on: http://gerrit.cloudera.org:8080/3087
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
- Moves the test into compute_stats.py
- Changes some test classes in compute_stats.py to inherit from
ImpalaTestSuite and not from TestComputeStats because that
will cause all tests in TestComputeStats to be run in the
subclasses again (redundantly).
- Clean up and add more coverage to testing incremental stats on
HBase, which was probably broken in commit 6b32ff06.
- Fixes a side effect that the original test had for testing
incremental stats on HBase. It computes stats on a functional
table which was not supposed to have stats.
Testing: Ran compute_stats.py on exhaustive locally in a loop 10 times.
Did a private hdfs/core run.
Change-Id: Iee8b84e30948c3c98166e08cae2666574777730c
Reviewed-on: http://gerrit.cloudera.org:8080/3074
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Fixes a typo in ImpalaServer::AuthorizeProxyUser where we
check that the 'user' parameter isn't empty twice instead
of also checking the 'do_as_user' parameter.
Change-Id: I8e3962f6f397804e37d4f2c667e97b55bd3ca2bf
Reviewed-on: http://gerrit.cloudera.org:8080/3120
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Changes:
1) Add the python Kudu module to the virtualenv. Building the virtualenv
is much slower now because Cython and numpy are required. To help with
the rebuild time, --no-cache was removed. That option was added to help
when using the dev version of impyla: the version number would stay the
same while the module contents changed, and the cache kept serving the
old module contents.
2) Add some py.test fixtures to help create Kudu and Impala connections.
Change-Id: I8e5e22b38d5bd09a36238e66a69aa42d1a941de7
Reviewed-on: http://gerrit.cloudera.org:8080/2855
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
In 2.5 we added the ability to set per-pool default query
options. A string of key-value pairs can be specified with a
pool configuration. However, if any options fail to parse,
then all the options are ignored. We want that behavior (and
returning an error) when parsing the process-wide default
query options on startup and when parsing the options sent
from a client (e.g. in beeswax server) because an error can
be returned immediately for the triggering action at that
time (i.e. starting the impalad or submitting a query with
the options set). This behavior is bad for the pool default
query options because (a) the configuration is set by the
administrator and there's nothing we can do until a query is
submitted and (b) one invalid option shouldn't mean that
other valid options aren't set.
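The two parsing modes can be sketched as follows. The comma-separated "key=value" format, the set of known options, and the function name are assumptions for illustration; the real parser is not shown here.

```python
def parse_query_options(options_str, known_options, strict):
    """Parse "key=value,key=value" into a dict.

    With strict=True any bad pair raises ValueError; with strict=False
    bad pairs are collected as errors and the valid ones are kept.
    """
    parsed, errors = {}, []
    for pair in filter(None, (p.strip() for p in options_str.split(","))):
        key, sep, value = pair.partition("=")
        key = key.strip().lower()
        if not sep or key not in known_options:
            if strict:
                raise ValueError("invalid query option: %s" % pair)
            errors.append(pair)
            continue
        parsed[key] = value.strip()
    return parsed, errors
```

Startup and client-supplied options would use the strict mode, while pool defaults would use the lenient mode so that one bad entry does not discard the rest.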
Change-Id: If04733b775963091b0314c65286df126fd812358
Reviewed-on: http://gerrit.cloudera.org:8080/3056
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
With the introduction of IMPALA-3452, we defaulted to skipping the
INSERT staging for S3. test_truncate_cleans_hdfs_files assumes that
it will always see the _impala_insert_staging folder but we will not
see that on S3 runs.
This patch deletes the staging folder if it exists and continues the
test without taking into account the staging folder.
Change-Id: I3580f03690e29fe99f441b26bc9baa4c0964d79c
Reviewed-on: http://gerrit.cloudera.org:8080/3049
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
A couple of tests could both attempt to create/destroy the same
database if they were running in parallel. Several other related
tests were already marked as requiring serial execution; these needed
to be marked for serial execution as well.
Change-Id: If0573a755cd371363c2e43c001d5c1ba499793c6
Reviewed-on: http://gerrit.cloudera.org:8080/3063
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This patch modifies HdfsTextScanner to specifically check for split
"\r\n" delimiters when the scan range ends with '\r'. If there does
turn out to be a split delimiter, the next tuple is considered the
responsibility of the next scan range's scanner, as if the delimiter
appeared fully in the second scan range. This should not affect the
overall performance characteristics of the text scanner since it
already must do a remote read past the end of the scan range to read
the last tuple.
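The boundary check itself can be sketched in a few lines. This is a simplified Python illustration of the condition, not the HdfsTextScanner code; the function name and the whole-file `data` argument are hypothetical.

```python
def ends_with_split_crlf(data, range_end):
    """Return True if a CR LF delimiter straddles the end of a scan range.

    `data` is the whole file as bytes and `range_end` is the exclusive end
    offset of the scan range.
    """
    if range_end <= 0 or range_end >= len(data):
        return False  # no byte before, or nothing after, the boundary
    # Peek one byte past the range, as the scanner already reads beyond it.
    return (data[range_end - 1:range_end] == b"\r"
            and data[range_end:range_end + 1] == b"\n")
```

When this returns True, the tuple that follows the delimiter is left to the scanner of the next range.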
Change-Id: Id42b441674bb21517ad2788b99942a4b5dc55420
Reviewed-on: http://gerrit.cloudera.org:8080/2803
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
It was previously thought that PURGE had no effect on S3. However,
the Hive Metastore actually created a .Trash directory and copied the
files there when a DROP TABLE was conducted from Impala.
This patch just enables the existing PURGE tests for S3. There were a
few reasons this wasn't working before. The paths given to the S3
client (boto3) should not have a leading "/". The leading "/" has been
removed, since HDFS behaves the same with or without it.
Also, PURGE is a pure delete whereas a regular DROP is a copy. A copy
is consistent whereas a delete is only eventually consistent, so when
we PURGE a table or partition, the files will still be visible for
some time after the query has completed. The tests have been modified
to accommodate this case as well.
Change-Id: I52d2451e090b00ae2fd9a879c28defa6c940047c
Reviewed-on: http://gerrit.cloudera.org:8080/3036
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
IMPALA-2686 added the breakpad library to all impala daemons, thus enabling them
to write minidump files. This change introduces a flag
'minidump_size_limit_hint_kb', which causes breakpad to reduce the amount of
thread stack memory it includes in a minidump, aiming to reduce the minidump
size during crashes with a lot of threads. Once a minidump is expected to
exceed the configured value, breakpad will include the full stack memory for the
first 20 threads, and afterwards capture only 2KB of stack memory for each
additional thread.
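The effect on the captured stack memory can be estimated with a short calculation. The function is a hypothetical illustration of the rule described above, not breakpad code; it assumes stacks smaller than 2KB are captured whole.

```python
FULL_STACK_THREADS = 20          # threads whose stacks are kept in full
LIMITED_STACK_BYTES = 2 * 1024   # per additional thread


def captured_stack_bytes(stack_sizes, limit_exceeded):
    """Total stack memory captured for the given per-thread stack sizes."""
    if not limit_exceeded:
        return sum(stack_sizes)
    full = sum(stack_sizes[:FULL_STACK_THREADS])
    limited = sum(min(size, LIMITED_STACK_BYTES)
                  for size in stack_sizes[FULL_STACK_THREADS:])
    return full + limited
```

For 100 threads with 64KB of stack each, the captured stack memory drops from 6400KB to 1440KB once the limit applies.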
Change-Id: I2f3aa0df51be9f0bf0755fb288702911cdb88052
Reviewed-on: http://gerrit.cloudera.org:8080/2990
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
Testing: Ran the test locally. It was already possible to run the
test in parallel before.
Change-Id: I68a1349276c90a42c238bed40a1c7c221199a67a
Reviewed-on: http://gerrit.cloudera.org:8080/3009
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Testing: I ran the test 10 times in a loop locally and ran
a private core/hdfs run.
Change-Id: I5be5fa5d20bc6ed5b7830e0ce90201431d6aa008
Reviewed-on: http://gerrit.cloudera.org:8080/3003
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Also changes the test to use beeline instead of Hive -e
for the portions executed in Hive because beeline is
significantly faster.
Testing: Tested the changes locally by running them in a loop
10 times. Also did a private core/hdfs run.
Change-Id: I70d87941fbfc30f41e1c6fcfee8d8c7f16b88831
Reviewed-on: http://gerrit.cloudera.org:8080/2962
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
Testing: Tested the changes locally by running them in a loop
10 times. Also did a private core/hdfs run.
Change-Id: I37e1528c02e598f3fb2d673b6559d55a34bf79b4
Reviewed-on: http://gerrit.cloudera.org:8080/3002
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
Testing: I ran the test 10 times in a loop locally and ran
a private core/hdfs run.
Change-Id: Ibd058853e6b48671838e5b51611b6c34a7a8d39d
Reviewed-on: http://gerrit.cloudera.org:8080/2982
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
There was another test which used the hdfs_client and which was not
skipped for localFS. It should never have run on localFS, but it
did not fail earlier for the same reasons as mentioned in the previous
patch and in the JIRA. Marking as SkipIfLocal.
Change-Id: I3436e80ccd380ecc5f5d28053b3563db2319f9e9
Reviewed-on: http://gerrit.cloudera.org:8080/2991
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
The patch also addresses a TODO asking for
test_col_stats.py to be merged into test_compute_stats.py
Testing: I ran the test by itself in a loop 10 times,
and the whole test_compute_stats.py locally. Also did
a private core/hdfs run.
Change-Id: I88aa77464a95993c018e19a52eeb496d7c3eef08
Reviewed-on: http://gerrit.cloudera.org:8080/2963
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
test_ddl.py has always had a bug: the _cleanup() function used the
hdfs_client on local FS runs. The test nevertheless passed because
hdfs_client.delete_file_dir() caught generic exceptions while checking
whether a file existed, which masked the failure.
With the introduction of an hdfs_client.exists() function in
IMPALA-1878 which catches only the right FileNotFound exception, this
bug was exposed causing our local FS runs to fail.
This patch returns from the _cleanup() function if it's a local FS run
because the directories of the tables it cleans up are not used in
these runs.
Change-Id: Ie0c9eec31a90e8f66102d18d900c613bd1306968
Reviewed-on: http://gerrit.cloudera.org:8080/2980
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The cancelled status is propagated in scanner threads to cause them to
shut down once the limit has been satisfied, but depending on the code
path and when abort_on_error=false, this internal status would sometimes
incorrectly end up in the error log. Fix this by factoring out the
abort_on_error handling code so that it's handled more consistently
across scanners. Parquet, RC, and Avro all suffered from this bug.
Testing: exhaustive
Change-Id: I4a91a22608e346ca21a23ea66c855eae54bbced6
Reviewed-on: http://gerrit.cloudera.org:8080/2964
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Add missing coverage for sorting by CHAR and VARCHAR.
Add more coverage for spilling sorts.
Fix spilling tests: ensure that they actually reliably spill (many of
them had memory limits high enough that they could run entirely in
memory).
I ran this in a loop for a while to flush out flaky tests. The tests
should be fairly predictable given that they're not run concurrently
with other tests and we allocate enough block manager memory so that
each operator can obtain its reservation.
Change-Id: Ia2d2627a2c327dcdf269ea3216385b1af9dfa305
Reviewed-on: http://gerrit.cloudera.org:8080/2877
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This patch allows you to write SOURCE <file> or SRC <file>, and have the
shell read the file and execute all the queries in it.
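A minimal sketch of reading a script file into individual queries. It is an illustration only, not the shell's implementation; the function name is hypothetical and the split on ';' is deliberately naive.

```python
def load_queries(path):
    """Read a script file and return its non-empty statements."""
    with open(path) as script:
        text = script.read()
    # Naive split: assumes ';' never appears inside literals or comments.
    return [q.strip() for q in text.split(";") if q.strip()]
```

Each returned query would then be executed in order, as if it had been typed at the prompt.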
Change-Id: Ib05df3e755cd12e9e9562de6b353857940eace03
Reviewed-on: http://gerrit.cloudera.org:8080/2663
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins