Commit Graph

927 Commits

Author SHA1 Message Date
David Knupp
c076f098d4 IMPALA-3491: Use unique_database fixture in test_shell_commandline.py.
Before this change, a single test database was created for the entire suite,
and each test was marked to run serially. With the addition of a test fixture
in tests/conftest.py to create a unique database for each individual method,
it's now possible to run the tests in parallel. (The tables required by
individual tests are created via local test fixtures.)
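
A minimal sketch of the idea, not the actual fixture in tests/conftest.py:
each test gets a database name that no other test shares, so tests no longer
collide when run in parallel. The helper names and the naming scheme are
hypothetical, and `execute` stands in for whatever callable runs SQL.

```python
import contextlib
import uuid


def unique_database_name(test_name):
    """Build a database name that no concurrently running test will share."""
    # Hypothetical naming scheme: test name plus a short random suffix.
    return "%s_%s" % (test_name, uuid.uuid4().hex[:8])


@contextlib.contextmanager
def unique_database(test_name, execute):
    """Create a per-test database, yield its name, and drop it afterwards."""
    db_name = unique_database_name(test_name)
    execute("CREATE DATABASE `%s`" % db_name)
    try:
        yield db_name
    finally:
        # Clean up even if the test body raised.
        execute("DROP DATABASE `%s` CASCADE" % db_name)
```

Because the name is generated per call, two tests derived from the same
method name still get distinct databases.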

As such, any methods which had been responsible for setting up the test
database were removed. Pytest markers for running tests serially were also
removed, except in cases where running concurrently would
affect other tests.

Additional minor changes were made to improve PEP-8 compliance.

The non-serial tests were run in a loop ten times to confirm that there weren't
any unexpected failures.

Review: https://gerrit.cloudera.org/#/c/3301/

Change-Id: Icdcb04a99c0907fc1ba56baa2497fafb33b0e34e
Reviewed-on: http://gerrit.cloudera.org:8080/3301
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Internal Jenkins
2016-06-13 18:32:32 -07:00
David Knupp
fc444c102e IMPALA-3491: Use unique_database fixture in test_catalog_service_client.py.
Even though this is just a single test, this change introduces the
unique_database test fixture that was initially created to help with
concurrent tests. It's still worth doing here because we want to
update all tests to use best practices.

That said, there was still a performance gain to be had here. It
turns out the initial code called the cleanup_db() method from the
base ImpalaTestSuite class, which in turn sets the 'sync_ddl' query
option to true. Not doing this at the beginning of this test results
in a roughly 40x speedup.

Change-Id: I5d6994f31d52e18e2e04aab0e34202e2c623e367
Reviewed-on: http://gerrit.cloudera.org:8080/3366
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Internal Jenkins
2016-06-13 16:32:22 -07:00
Lars Volker
c69cd15a0a IMPALA-3656: Hitting DCHECK/CHECK does not write minidumps
When hitting a DCHECK/CHECK the daemons do not write minidumps. This is
caused by glog's own stack unwinding mechanism, which catches SIGABRT
and removes all other handlers before aborting.

This change bumps the glog version to include a patch that backports a
change from glog: the SIGABRT handler is now reset only if it is the
one installed by glog itself.

cda16b3443

Change-Id: I08e6b83af1b4ff1b8c916fe6c9052b88b760e188
Reviewed-on: http://gerrit.cloudera.org:8080/3286
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
2016-06-11 05:31:32 -07:00
Alex Behm
19ff47091c IMPALA-3530: Clean up test_ddl.py. Part 1.
This is the first in a series of patches to clean up test_ddl.py

Summary of changes:
  - Break up test_create() and corresponding .test files into:
    * test_create_database()
    * test_create_table()
    * test_create_table_like_table()
    * test_create_table_like_file()
    * test_create_table_as_select()
  - Merge test_nested() into the tests above
  - Move a test into test_hms_integration.py
  - Add a new test_ddl_base.py as base class for DDL tests.
    The plan is to split up test_ddl.py into several smaller
    .py files in subsequent patches.

Testing: I tested test_ddl.py and test_hms_integration.py on
exhaustive locally as well as in private builds on all filesystems.

Change-Id: I5f4c044d39e165c2535961b8d0a765c8dbbd051c
Reviewed-on: http://gerrit.cloudera.org:8080/3044
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2016-06-10 10:31:15 -07:00
Lars Volker
ca62ce65e9 IMPALA-3684, IMPALA-3693: Disable core files for breakpad tests
The breakpad tests were writing core files when triggering minidump
writes. This was actually not needed and interfered with test execution
and artifact collection. Most notably processes would take a long time
to terminate while writing core files (IMPALA-3684). The core files
would also be wrongly collected by Jenkins (IMPALA-3693).

This change adds code to stop test clusters reliably, making
test_breakpad independent from calling setup-impala-cluster.py via
os.system. It also disables core dumps for the duration of the test and
re-enables them afterwards.
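
One way to disable core dumps for a limited scope can be sketched as follows.
This is an assumption about the mechanism, not necessarily how the patch does
it: it lowers the soft RLIMIT_CORE limit to 0 and restores the previous value
afterwards. The name `core_dumps_disabled` is hypothetical.

```python
import contextlib
import resource


@contextlib.contextmanager
def core_dumps_disabled():
    """Set the soft core-file size limit to 0, then restore the old limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (0, hard))
    try:
        yield
    finally:
        # Restore the original soft limit; the hard limit was never changed.
        resource.setrlimit(resource.RLIMIT_CORE, (soft, hard))
```

Child processes started inside the `with` block inherit the lowered limit, so
a cluster launched there would not write core files.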

Change-Id: If592339632aa662b59be09d911229566d5772321
Reviewed-on: http://gerrit.cloudera.org:8080/3339
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Silvius Rus <srus@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
2016-06-09 17:31:00 -07:00
Tim Armstrong
c1093ed861 IMPALA-3669: test_scratch_disk fails on S3
Make the test deterministic by using max_block_mgr_memory instead of
mem_limit, so that the non-deterministic scanner memory usage does not
influence the spilling behaviour of the queries.

Testing:
Ran the test locally to confirm that it succeeded. Also manually
computed the memory requirement. The data size to be sorted is ~220MB,
so with a 64MB block manager limit per node, at least one node must
spill.
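
The manual computation can be reproduced as a short sketch. The node count of
3 is an assumption (the cluster size is not stated above); the other figures
come from this message.

```python
import math

DATA_TO_SORT_MB = 220      # approximate figure from the commit message
BLOCK_MGR_LIMIT_MB = 64    # per-node max_block_mgr_memory
NUM_NODES = 3              # assumption: cluster size is not stated above

# Total block manager memory available across the cluster.
total_in_memory_mb = BLOCK_MGR_LIMIT_MB * NUM_NODES

# By the pigeonhole principle, some node must sort at least this much data.
largest_share_mb = math.ceil(DATA_TO_SORT_MB / NUM_NODES)

must_spill = largest_share_mb > BLOCK_MGR_LIMIT_MB
```

Under this assumption the cluster can hold 192MB in memory, less than the
~220MB to be sorted, so `must_spill` is true no matter how the data is
distributed.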

Change-Id: I9525a029ac020bb5b8bea210a741c9f9c5ec3c75
Reviewed-on: http://gerrit.cloudera.org:8080/3318
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Silvius Rus <srus@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2016-06-09 17:31:00 -07:00
Alex Behm
e57fd2d831 IMPALA-3491: Use unique_database fixture in test_local_fs.py
Testing: Ran hdfs/core and localfs/core private builds.

Change-Id: I0720458882ac3b1138deccf9af0ee57bf2eed7dc
Reviewed-on: http://gerrit.cloudera.org:8080/3334
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2016-06-08 16:30:32 -07:00
Alex Behm
6c992f97f2 IMPALA-3491: Use unique_database fixture in test_last_ddl_time_update.py.
Testing: Ran the test locally in a loop 10 times, and did an
exhaustive private test run on HDFS.

Change-Id: I97e96217301078d48584c51218345dc96f6853a6
Reviewed-on: http://gerrit.cloudera.org:8080/3104
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2016-06-07 22:30:08 -07:00
Alex Behm
025fd3bd7f IMPALA-3646: Handle corrupt RLE literal or repeat counts of 0.
Adds handling and testing for a specific Parquet data corruption
scenario with plain dictionary encoded values.

The problematic scenario is when the repeat or literal count of
the RLE-encoded dictionary indexes is decoded as 0 - an invalid value.

There are several other cases of data corruption that are not yet
handled gracefully. This patch only handles one specific case.
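
The check can be illustrated with a sketch of run-header decoding for
Parquet's RLE/bit-packed hybrid encoding. This is not the scanner code from
the patch; the function names are hypothetical. The header is a varint whose
lowest bit selects a literal (bit-packed) or repeated run, and the remaining
bits hold the count.

```python
def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 varint from buf at pos; return (value, new_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


def read_run_header(buf, pos):
    """Return (is_literal, count, new_pos) for the next run, rejecting count 0."""
    header, pos = read_uleb128(buf, pos)
    is_literal = bool(header & 1)
    # For literal runs the count is the number of 8-value groups.
    count = header >> 1
    if count == 0:
        raise ValueError("corrupt RLE data: %s count of 0"
                         % ("literal" if is_literal else "repeat"))
    return is_literal, count, pos
```

Raising an error on a zero count lets the caller report corruption instead of
decoding a run that can never make progress.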

Change-Id: Ibf406c82cdded37966f09c81e4cc1446d2b60d63
Reviewed-on: http://gerrit.cloudera.org:8080/3299
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2016-06-07 17:29:59 -07:00
Alex Behm
95064359cc IMPALA-3491: Use unique_database fixture in test_delimited_text.py.
Testing: Ran the test locally 10 times in a loop on exhaustive.

Change-Id: Idedd5f03984e41a4b3ebf271e50863e980c66cb6
Reviewed-on: http://gerrit.cloudera.org:8080/3096
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2016-06-07 09:34:30 -07:00
Alex Behm
a9f7cf51f4 IMPALA-3491: Use unique_database fixture in test_metadata_query_statements.py.
Testing: Ran the test locally on exhaustive in a loop 10 times.
Ran a private exhaustive build on hdfs.

Change-Id: Ia0af1dc6534234508bd0fed03531f7fe8ff556aa
Reviewed-on: http://gerrit.cloudera.org:8080/3103
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2016-06-07 09:34:30 -07:00
Tim Armstrong
d23e5505c8 IMPALA-3670: fix sorter buffer mgmt bugs
Also make test_scratch_disk.py more deterministic, by using
max_block_mgr_memory, which doesn't include scanner memory.
The fixed test_scratch_disk.py exercises the other sorter bug,
which occurs when scratch cannot be written.

Testing:
Added a test that does a sort with various memory limits and consumes
the whole output of the sorter (we have many tests of sorts with limits
but limited coverage of sorts without limits).  Ran an exhaustive test
run before posting for review.

This added test reproduced one of the sorter bugs, where var-len blocks
were not always attached to the output batch. The other bug was
reproduced by the test change in IMPALA-3669: test_scratch_disk fix.

Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
Reviewed-on: http://gerrit.cloudera.org:8080/3315
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2016-06-06 22:34:19 -07:00
Matthew Jacobs
a1b035a251 IMPALA-3600: Add missing admission control tests
* -require_username (not strictly admission control related
  but it came up in the context of RM).
* Coverage of failure cases: The handling of the full queue
  case wasn't being verified. This changes the existing stress
  test to expect a specific message when the queue is full.
* Requesting MAXINT memory, which previously led to an
  overflow in the pool-level mem tracker accounting.

This does not yet address:
* Changing pool cfg while running
* Verify profile string for queued reason

This is just a minimal incremental change to get additional
coverage. Right now, many of the tests rely on some
pre-defined configuration files which is cumbersome. In the
future, we plan on refreshing the configuration story at
which point we should also build more general test
infrastructure for easily testing different configurations.

Change-Id: I6682b15a5caac5824384c4b48a7b40afa2548954
Reviewed-on: http://gerrit.cloudera.org:8080/3272
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
2016-06-06 18:34:13 -07:00
Michael Brown
067af1957c IMPALA-3614: work around pytest bugs causing custom cluster test skips
All versions of pytest contain various bugs regarding test marking
(including skips) when tests are both:

1. class-level marked
2. inherited

More info is available in IMPALA-3614 and IMPALA-2943, but the gist is
that it's possible for some tests to be skipped when they shouldn't be.
This is happening pretty badly with the custom cluster tests, because
CustomClusterTestSuite has a class level skipif mark.

The easiest workaround for now is to remove the pytest skipif mark in
CustomClusterTestSuite and skip using explicit pytest.skip() in the
setup_class() method. Some CustomClusterTestSuite children implemented
their own setup_* methods, and I made some adjustments to them both to
clean them up and implement proper parent method calling via super().
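
The shape of the workaround can be sketched with the standard library. pytest
is deliberately not used here so the sketch runs on its own:
`unittest.SkipTest` raised from `setUpClass()` plays the role that
`pytest.skip()` in `setup_class()` plays in the patch. The class names and the
`SHOULD_SKIP` flag are hypothetical.

```python
import unittest


class CustomClusterBase(unittest.TestCase):
    """Hypothetical base class: skips from class setup, not a class-level mark."""
    SHOULD_SKIP = True  # stand-in for the real skip condition

    @classmethod
    def setUpClass(cls):
        if cls.SHOULD_SKIP:
            raise unittest.SkipTest("custom cluster tests are skipped here")
        cls.cluster_started = True


class ChildSuite(CustomClusterBase):
    ran = False

    @classmethod
    def setUpClass(cls):
        # Call the parent first so its skip check always takes effect.
        super().setUpClass()
        cls.extra_setup_done = True

    def test_something(self):
        type(self).ran = True


def run_child_suite():
    """Run ChildSuite and return the collected TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ChildSuite)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Because the skip decision is made in an explicit method call rather than in
an inherited class-level mark, it is evaluated for exactly the class being
set up.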

Testing:

I ran the following combinations of all the custom cluster tests:

DEBUG   / HDFS  / core
RELEASE / HDFS  / exhaustive
DEBUG   / LOCAL / core
DEBUG   / S3    / core

Before, we'd get situations in which most of the tests were skipped.
Consider the RELEASE/HDFS/exhaustive situation:

  custom_cluster/test_admission_controller.py .....
  custom_cluster/test_alloc_fail.py ss
  custom_cluster/test_breakpad.py sssss
  custom_cluster/test_delegation.py sss
  custom_cluster/test_exchange_delays.py ss
  custom_cluster/test_hdfs_fd_caching.py s
  custom_cluster/test_hive_parquet_timestamp_conversion.py ss
  custom_cluster/test_insert_behaviour.py ss
  custom_cluster/test_legacy_joins_aggs.py s
  custom_cluster/test_parquet_max_page_header.py s
  custom_cluster/test_permanent_udfs.py sss
  custom_cluster/test_query_expiration.py sss
  custom_cluster/test_redaction.py ssss
  custom_cluster/test_s3a_access.py s
  custom_cluster/test_scratch_disk.py ssss
  custom_cluster/test_session_expiration.py s
  custom_cluster/test_spilling.py ssss
  authorization/test_authorization.py ss
  authorization/test_grant_revoke.py s

Now, more tests run appropriately:

  custom_cluster/test_admission_controller.py .....
  custom_cluster/test_alloc_fail.py ss
  custom_cluster/test_breakpad.py sssss
  custom_cluster/test_delegation.py ...
  custom_cluster/test_exchange_delays.py ss
  custom_cluster/test_hdfs_fd_caching.py .
  custom_cluster/test_hive_parquet_timestamp_conversion.py ..
  custom_cluster/test_insert_behaviour.py ..
  custom_cluster/test_kudu_not_available.py .
  custom_cluster/test_legacy_joins_aggs.py .
  custom_cluster/test_parquet_max_page_header.py .
  custom_cluster/test_permanent_udfs.py ...
  custom_cluster/test_query_expiration.py ...
  custom_cluster/test_redaction.py ....
  custom_cluster/test_s3a_access.py s
  custom_cluster/test_scratch_disk.py ....
  custom_cluster/test_session_expiration.py .
  custom_cluster/test_spilling.py ....
  authorization/test_authorization.py ..
  authorization/test_grant_revoke.py .

Change-Id: Ie301b69718f8690322cc3b4130fb1c715344779c
Reviewed-on: http://gerrit.cloudera.org:8080/3265
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Michael Brown <mikeb@cloudera.com>
2016-06-06 17:34:07 -07:00
Tim Armstrong
37ec25396f IMPALA-3344: Simplify sorter and document/enforce invariants.
Clarify relationships between classes, clean up the previous mess
where every class was friends with the other so there's an actual
distinction between public and private members. TupleIterator
is now no longer tied to TupleSorter, just Run.

Document and enforce invariants in many cases.

Factor out some functions from large functions.

Simplify and document iterator logic.

Make management of buffers when iterating over output stream more
explicitly correct: either use MarkNeedToReturn() or attach block
to the batch as appropriate. The SortedRunMerger didn't handle
resource transfer correctly, except if all the memory came from
the batch's MemPool. This patch fixes the cases when resources
are attached to the batches, but not the 'need_to_return' case.
Document that SortedRunMerger requires 'deep_copy_input' to be true
if batches can have the 'need_to_return' flag set.

Also use the atomic block exchange operation when moving between
blocks in unpinned runs to prevent pin failures at that point.
I have explicitly avoided changing the hairy block management logic
when allocating buffers for merging; that will need addressing in
a follow-up patch.

Add a SpilledRuns counter so that it's more explicit that spilling
occurred.

Testing:
Added some tests for corner cases with empty and NULL strings.
Fixed a test that previously failed with OOM but now succeeds.

Performance:
Benchmarking against old code initially revealed some regressions from
changes in inlining. Force inlining the TupleComparator::operator() and
iterator Next()/Prev() functions helped and performance seems similar or
slightly better on the targeted orderby benchmarks.

Change-Id: I9c619e81fd1b8ac50e257172c8bce101a112b52a
Reviewed-on: http://gerrit.cloudera.org:8080/2826
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2016-06-02 21:33:08 -07:00
Thomas Tauber-Marshall
5231301084 IMPALA-1633: GetOperationStatus should set errorMessage and sqlState
Currently, we never populate the errorMessage or sqlState
fields of TGetOperationStatusResp when the GetOperationStatus
HiveServer2 rpc is called. This patch checks if the query has
an error status and if so sets errorMessage and sqlState.

GetOperationStatus also now takes the QueryExecState lock since
QueryExecState::query_state_ and QueryExecState::query_status_
are supposed to be protected by it.

Additionally, this patch performs some cleanup and adds some
documentation around our behavior for updating
QueryExecState::query_state_/query_status_.

This also addresses IMPALA-3298: TGetOperationStatusResp missing
error message when data is expired

Change-Id: Icb792f88286779fcf2ce409828de818bc4e80bed
Reviewed-on: http://gerrit.cloudera.org:8080/3094
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Internal Jenkins
2016-06-01 19:32:39 -07:00
Alex Behm
32c40f9c5d Remove redundant test in test_avro_schema_resolution.py
Change-Id: I7123cd5e19d79122af3b4fef2c092442b7a098f1
Reviewed-on: http://gerrit.cloudera.org:8080/3095
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-05-31 23:32:11 -07:00
Tim Armstrong
8f79e9eb3c Stress test should count failure to repartition as a memory limit exceeded error
The error code for a repartitioning failure recently changed (because it
is not strictly a mem limit error). This makes the corresponding change
in the stress test.

Change-Id: Ie67fabb8d4c0ffc65ac06f35e4a0a5c7a73baddd
Reviewed-on: http://gerrit.cloudera.org:8080/3207
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-31 23:32:11 -07:00
Bharath Vissapragada
5ede8eb8a7 IMPALA-2336: Ignore trailing comments in non-interactive mode
This patch trims trailing comments while parsing queries in
non-interactive mode. Users often have comments at the end
of a script, which should be ignored. Without this patch,
the script fails with an exception since a valid SQL
statement is expected. The behavior in interactive mode
is unchanged.

Change-Id: I723763ef7eedd03cf22058fadf06e9673a0d94d2
Reviewed-on: http://gerrit.cloudera.org:8080/3169
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2016-05-31 23:32:11 -07:00
Huaisi Xu
816735a032 IMPALA-3092: Set default value to NULL in AvroSchemaConverter
This change ensures that Avro tables created without column definitions
remain queryable if columns are added via ALTER TABLE. The bug was that
when synthesizing an Avro schema from the column definitions we used to
not add default values.

Change-Id: Ib86e9ba1f4329b285ae14ee299365f7291a7410e
Reviewed-on: http://gerrit.cloudera.org:8080/3219
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-05-31 23:32:11 -07:00
Alex Behm
77da3834ff IMPALA-3369: Add ALTER TABLE SET COLUMN STATS statement.
Adds a new command to manually set the table-level column stats.

Syntax:
ALTER TABLE [<db_name>.]<tbl_name> SET COLUMN STATS <col_name>
('statsKey'='val','statsKey2'='val2')

Valid values for 'statsKey': numDVs, numNulls, avgSize, maxSize

The 'val' portion needs to be a number appropriate for the given stats
key (e.g., a long for numDVs, a float for avgSize).

The special value of '-1' is allowed to reset stats to 'unknown'.

The keys as well as the values are specified as string literals to be
consistent with the existing DDL for setting TBLPROPERTIES/SERDEPROPERTIES,
in particular, setting the 'numRows' table/partition property.
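
As an illustration of the syntax, the following sketch renders such a
statement from a dict. The helper name is hypothetical and the validation is
limited to the key names listed above; it does not check that each value has
the appropriate numeric type.

```python
VALID_STATS_KEYS = ("numDVs", "numNulls", "avgSize", "maxSize")


def set_column_stats_stmt(table, column, stats):
    """Render an ALTER TABLE ... SET COLUMN STATS statement from a dict."""
    parts = []
    for key, value in stats.items():
        if key not in VALID_STATS_KEYS:
            raise ValueError("invalid stats key: %s" % key)
        # Keys and values are both written as string literals.
        parts.append("'%s'='%s'" % (key, value))
    return "ALTER TABLE %s SET COLUMN STATS %s (%s)" % (
        table, column, ",".join(parts))
```

For example, passing `{"numDVs": 100, "numNulls": -1}` for column `id` of a
made-up table `my_db.my_tbl` sets the distinct-value count and resets the
null count to 'unknown'.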

Testing: Ran the tests locally on exhaustive. Did private runs
on core/hdfs and core/S3.

Change-Id: I45cd8aa7241ea962788ba9ca7d0bbfd864c4304f
Reviewed-on: http://gerrit.cloudera.org:8080/3189
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-05-31 23:32:11 -07:00
Lars Volker
d16e83214a IMPALA-3581: Change location of minidump folders to log_dir
Currently the default minidump location is /tmp/impala-minidumps, which can be wiped on
reboot on various distributions. This change moves the default location to
FLAGS_log_dir/minidumps/$daemon. The additional trailing $daemon folder is kept to prevent
name collisions in case of local test clusters and strangely configured installations.

For local test clusters the minidumps will be written to
$IMPALA_HOME/logs/cluster/minidumps/{catalogd,impalad,statestored}.

Change-Id: Idecf5a314bfb8b0870e8aa4819c4fb39a107702f
Reviewed-on: http://gerrit.cloudera.org:8080/3171
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2016-05-31 23:32:11 -07:00
Sailesh Mukil
6f1fe4ebe7 IMPALA-3577, IMPALA-3486: Partitions on multiple filesystems breaks with S3_SKIP_INSERT_STAGING
The HdfsTableSink usually creates an HDFS connection to the filesystem
that the base table resides in. However, if we create a partition in
a FS different than that of the base table and set
S3_SKIP_INSERT_STAGING to "true", the table sink will try to write to
a different filesystem with the wrong filesystem connector.

This patch allows the table sink itself to work with different
filesystems by getting rid of a single FS connector and getting a
connector per partition.

This also reenables the multiple_filesystems test and modifies it to
use the unique_database fixture so that parallel runs on the same
bucket do not clash and end up in failures.

This patch also introduces a SECONDARY_FILESYSTEM environment variable
which will be set by the test to allow S3, Isilon and the localFS to
be used as the secondary filesystems.

All jobs with HDFS as the default filesystem need to set the
appropriate environment for S3 and Isilon, i.e. the following:
 - export AWS_SECRET_ACCESS_KEY
 - export AWS_ACCESS_KEY_ID
 - export SECONDARY_FILESYSTEM (to whatever filesystem needs to be
   tested)

TODO: SECONDARY_FILESYSTEM and FILESYSTEM_PREFIX and NAMENODE have a
lot of similarities. Need to clean them up in a following patch.

Change-Id: Ib13b610eb9efb68c83894786cea862d7eae43aa7
Reviewed-on: http://gerrit.cloudera.org:8080/3146
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2016-05-31 23:32:11 -07:00
Michael Ho
3a4a77521e IMPALA-3608: Updates Impala E2E test framework to allow multiple exception messages
Some of our tests which are expected to fail due to low
query memory limits can fail non-deterministically with
different error messages. In addition, some tests may
throw different error messages when running with the legacy
join nodes. This change updates the test infrastructure to
allow multiple exception messages to be specified by
adding "ANY_OF" to the "CATCH" subsection.
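
The matching rule can be sketched as below. This is not the framework's
actual parser: the function names are hypothetical, and two details are
assumptions, namely that "ANY_OF" appears on the first line of the subsection
and that without it every listed line must appear in the error.

```python
def expected_messages(catch_lines):
    """Return (messages, match_any) parsed from a CATCH subsection's lines."""
    lines = [line.strip() for line in catch_lines if line.strip()]
    if lines and lines[0] == "ANY_OF":
        return lines[1:], True
    return lines, False


def error_matches(catch_lines, actual_error):
    """True if the actual error text satisfies the CATCH subsection."""
    messages, match_any = expected_messages(catch_lines)
    hits = [msg in actual_error for msg in messages]
    # An empty subsection never matches.
    return bool(hits) and (any(hits) if match_any else all(hits))
```

With "ANY_OF", a test that can fail with one of several messages passes as
soon as one of the listed alternatives is found.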

Change-Id: Ie6d81fd3ae601f565b575edfeefff7c5a6c07974
Reviewed-on: http://gerrit.cloudera.org:8080/3205
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
2016-05-31 23:32:10 -07:00
Michael Brown
22669e23be IMPALA-3501: ee tests: detect build type and support different timeouts based on the same
Impala compiled with the address sanitizer, or compiled with code
coverage, runs through code paths much slower. This can cause end-to-end
tests that pass on a non-ASAN or non-code coverage build to fail. Some
examples include IMPALA-2721, IMPALA-2973, and IMPALA-3501. These
classes of failures tend always to involve some time-sensitive condition
that fails to succeed under such "slow builds".

The workaround in the past has been to simply increase the timeout.
The problem with this approach is that it relaxes conditions for tests
on builds that see the field--i.e., release builds--for builds that
never will--i.e., ASAN and code coverage.

This patch fixes that problem by allowing test authors to set timeout
values based on a *specific* build type. The author may choose timeouts
with a default value, and different timeouts for either or both
so-called "slow builds": ASAN and code coverage.

We detect the so-called "specific build type" by inspecting the binary
expected to be at the path under test. This removes the need to make
alterations to Impala itself. The inspection done is to read the DWARF
information in the binary, specifically the first compile unit's
DW_AT_producer and DW_AT_name DIE attributes. We employ a heuristic
based on these attributes' values to guess the build type. If we can't
determine the build type, we will assume it's a debug build. More
information on this is in IMPALA-3501.
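
The timeout selection can be sketched as follows. The function below is a
hypothetical stand-in for the role of specific_build_type_timeout; its
signature and the build type strings are assumptions, not the real API.

```python
def pick_timeout_for_build(build_type, default_timeout,
                           asan_timeout=None, code_coverage_timeout=None):
    """Pick a timeout for the detected build type, else use the default."""
    overrides = {
        "asan": asan_timeout,
        "code_coverage": code_coverage_timeout,
    }
    # Unknown or ordinary build types have no override and get the default.
    timeout = overrides.get(build_type)
    return default_timeout if timeout is None else timeout
```

A test author could then relax the timeout only for a slow build, for
example `pick_timeout_for_build(build, 60, asan_timeout=300)`, while release
builds keep the strict default.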

A quick summary of the changes follows:

1. Move some of the logic in tests.common.skip to tests.common.environ
   and rework some skip marks to be more precise.

2. Add Pyelftools for convenient deserialization of DWARF

3. Our Pyelftools usage requires collections.OrderedDict, which isn't in
   python2.6; also add Monkeypatch to handle this.

4. Add ImpalaBuild and specific_build_type_timeout, the core of the new
   functionality

5. Fix the statestore tests that only fail under code coverage (the
   basis for IMPALA-3501)

Testing:

The tests that were previously, reliably failing under code coverage now
pass. I also ran perfunctory tests of debug, release, and ASAN builds to
ensure our detection of build type is working. This patch will *not*
turn the code coverage builds green; there are other tests that fail,
and fixing all of them here is out of the scope of this patch.

Change-Id: I2b675c04c54e36d404fd9e5a6cf085fb8d6d0e47
Reviewed-on: http://gerrit.cloudera.org:8080/3156
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
2016-05-25 19:41:45 -07:00
Michael Brown
5112e65be2 Revert "Revert "Add Kudu test helpers""
This reverts commit f8dd5413b65d30646c3745dfc738ed812d50a51f and
effectively re-adds commit 9248dcb70478b8f93f022893776a0960f45fdc28. The
difference between this patch and its original is that I fixed the
changes introduced in infra/python/bootstrap_virtualenv.py to be
python2.4-compatible:

- removed the use of str.format(), preferring a str.join() pattern
- removed the call of the exit() builtin to prefer sys.exit()
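
The two substitutions look roughly like this. The sketch is illustrative
only and is not taken from bootstrap_virtualenv.py; the function names and
the message text are made up.

```python
import sys


def describe_failure(command, returncode):
    """Build a message with str.join() instead of str.format()."""
    return " ".join(["Command", command, "failed with exit code",
                     str(returncode)])


def main(returncode):
    """Report a failed command and exit with its return code."""
    if returncode != 0:
        sys.stderr.write(describe_failure("pip install", returncode) + "\n")
        # sys.exit() instead of the exit() builtin.
        sys.exit(returncode)
```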

The only testing I did for this patch was to ensure
CDH Impala-packaging-on-demand works.

Change-Id: I02ed97473868eacf45b25abe89b41e6fa2fce325
Reviewed-on: http://gerrit.cloudera.org:8080/3160
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
2016-05-24 16:40:59 -07:00
Matthew Jacobs
f413e236a8 IMPALA-3579: Strict handling of numeric overflow in text parsing
Adds a query option 'strict_mode' which treats integer and
floating point overflows as parse errors. In the past,
overflows were ignored and the max value was returned. When
this query option is set, overflowing values are treated as if
they were completely invalid data, i.e. NULL is returned.
When abort_on_error is enabled, this means the query is
aborted.

Notes:
* DECIMAL overflow/underflow is already treated as an error.
* The handling in text-converter treats underflows the same
  as overflows, so they would result in the same behavior.
  However, floating point parsing never returns an underflow
  today.
* We may also want to handle numeric values that are truncated
  when parsing to integer types, e.g. 10.5 -> 10.
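
The difference between the two modes can be sketched for a 32-bit integer
column. This is a simplified model, not the text-converter code: `None`
stands in for SQL NULL, and clamping negative overflow to the minimum value
is an assumption.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def parse_int32(text, strict_mode):
    """Parse a text field as a 32-bit integer; None stands in for NULL."""
    try:
        value = int(text)
    except ValueError:
        return None  # invalid data is NULL regardless of strict_mode
    if INT32_MIN <= value <= INT32_MAX:
        return value
    if strict_mode:
        return None  # overflow is treated like completely invalid data
    # Non-strict behaviour: clamp to the nearest representable value.
    return INT32_MAX if value > INT32_MAX else INT32_MIN
```

With abort_on_error enabled, the NULL returned in strict mode would count as
a parse error and abort the query.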

Change-Id: I7409c31ec0cb6fe0b2d9842b9f58fe1670914836
Reviewed-on: http://gerrit.cloudera.org:8080/3150
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2016-05-23 08:40:20 -07:00
Bharath Vissapragada
49610e2cfa IMPALA-3314/IMPALA-3513: Fix querying tables/partitions altered to Avro format
Bug: Impalads crash if we query an Avro table with stale metadata

Cause: This happens because avroSchema_ is not set in HdfsTable,
so it is not propagated to the Avro scanner, which lacks
appropriate checks to make sure the schema is non-null.

The patch fixes the following.

1. Avro scanner should gracefully handle the case where the avro schema
   is not set. Appropriate null checks and a meaningful error message have
   been added.

2. This is a special case with multi-fileformat partitioned tables.
   avroSchema_ should be set in HdfsTable even if any subset of the
   partitions are backed by avro. Without this patch, we only set it
   if the base table file format is Avro.

Change-Id: I09262d3a7b85a2263c721f3beafd0cab2a1bdf4b
Reviewed-on: http://gerrit.cloudera.org:8080/3136
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Internal Jenkins
2016-05-23 08:40:20 -07:00
Alex Behm
72e4c41400 IMPALA-3491: Use unique_database fixture in test_data_errors.py.
Testing: Ran the test locally 10 times in a loop on exhaustive.

Change-Id: I8337daf499b90819a253b883fedaa55bd6b6630e
Reviewed-on: http://gerrit.cloudera.org:8080/3087
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-05-23 08:40:19 -07:00
Alex Behm
ea45de84f4 IMPALA-3491: Merge test_hbase_metadata.py into compute_stats.py. Use unique db fixture.
- Moves the test into compute_stats.py
- Changes some test classes in compute_stats.py to inherit from
  ImpalaTestSuite and not from TestComputeStats because that
  will cause all tests in TestComputeStats to be run in the
  subclasses again (redundantly).
- Clean up and add more coverage to testing incremental stats on
  HBase, which was probably broken by commit 6b32ff06.
- Fixes a side effect that the original test had for testing
  incremental stats on HBase. It computes stats on a functional
  table which was not supposed to have stats.

Testing: Ran compute_stats.py on exhaustive locally in a loop 10 times.
Did a private hdfs/core run.

Change-Id: Iee8b84e30948c3c98166e08cae2666574777730c
Reviewed-on: http://gerrit.cloudera.org:8080/3074
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-05-23 08:40:19 -07:00
Thomas Tauber-Marshall
51869eac56 IMPALA-3542: do_as_user empty check missing
Fixes a typo in ImpalaServer::AuthorizeProxyUser where we
check that the 'user' parameter isn't empty twice instead
of also checking the 'do_as_user' parameter.

Change-Id: I8e3962f6f397804e37d4f2c667e97b55bd3ca2bf
Reviewed-on: http://gerrit.cloudera.org:8080/3120
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-23 08:40:19 -07:00
Shiraz Ali
08eff2bc09 Revert "Add Kudu test helpers"
This reverts commit 9248dcb70478b8f93f022893776a0960f45fdc28.
2016-05-20 08:46:00 -07:00
casey
36b524f68c Add Kudu test helpers
Changes:

1) Add the python Kudu module to the virtualenv. Building the virtualenv
is much slower now because Cython and numpy are required. To help with
the rebuild time --no-cache was removed. That option was added to help
when using the dev version of impyla, the version number would be the
same but the module contents were different and the cache used the old
module contents.

2) Add some py.test fixtures to help create Kudu and Impala connections.

Change-Id: I8e5e22b38d5bd09a36238e66a69aa42d1a941de7
Reviewed-on: http://gerrit.cloudera.org:8080/2855
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2016-05-19 19:45:48 -07:00
Matthew Jacobs
f067929f3a IMPALA-3535: Ignore invalid per-pool default query options
In 2.5 we added the ability to set per-pool default query
options. A string of key-value pairs can be specified with a
pool configuration. However, if any options fail to parse,
then all the options are ignored. We want that behavior (and
returning an error) when parsing the process-wide default
query options on startup and when parsing the options sent
from a client (e.g. in beeswax server) because an error can
be returned immediately for the triggering action at that
time (i.e. starting the impalad or submitting a query with
the options set). This behavior is bad for the pool default
query options because (a) the configuration is set by the
administrator and there's nothing we can do until a query is
submitted and (b) one invalid option shouldn't mean that
other valid options aren't set.
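
The lenient behavior for pool defaults can be sketched like this. The parser
below is hypothetical: the comma-separated `key=value` format, the two-option
table, and the integer-only values are simplifying assumptions.

```python
KNOWN_OPTIONS = {"mem_limit": int, "num_nodes": int}  # hypothetical subset


def parse_pool_default_options(options_str):
    """Parse 'k1=v1,k2=v2'; return (valid options, list of error strings)."""
    parsed, errors = {}, []
    for pair in filter(None, (p.strip() for p in options_str.split(","))):
        key, sep, raw = pair.partition("=")
        key = key.strip().lower()
        convert = KNOWN_OPTIONS.get(key)
        if not sep or convert is None:
            errors.append("ignoring invalid option: %s" % pair)
            continue
        try:
            parsed[key] = convert(raw.strip())
        except ValueError:
            # Keep going: one bad value must not discard the others.
            errors.append("ignoring invalid value: %s" % pair)
    return parsed, errors
```

A strict caller, such as startup-time parsing, could instead fail whenever
the returned error list is non-empty.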

Change-Id: If04733b775963091b0314c65286df126fd812358
Reviewed-on: http://gerrit.cloudera.org:8080/3056
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-17 10:09:05 -07:00
Sailesh Mukil
4c9c74dd33 IMPALA-3532: S3: test_truncate_cleans_hdfs_files fails because we skip INSERT staging
On the introduction of IMPALA-3452, we defaulted to skipping the
INSERT staging for S3. test_truncate_cleans_hdfs_files assumes that
it will always see the _impala_insert_staging folder but we will not
see that on S3 runs.

This patch deletes the staging folder if it exists and continues the
test without taking into account the staging folder.

Change-Id: I3580f03690e29fe99f441b26bc9baa4c0964d79c
Reviewed-on: http://gerrit.cloudera.org:8080/3049
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2016-05-14 01:30:01 -07:00
Casey Ching
e61b5bc119 IMPALA-3511: Fix race setting up TestKuduOperations
A couple of tests could both attempt to create/destroy the same
database if they were running in parallel. Several other related
tests were marked as requiring serial execution; these needed to be
marked for serial execution as well.

Change-Id: If0573a755cd371363c2e43c001d5c1ba499793c6
Reviewed-on: http://gerrit.cloudera.org:8080/3063
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2016-05-14 01:30:01 -07:00
Skye Wanderman-Milne
9174dee395 IMPALA-1578: fix text scanner to handle "\r\n" delimiters split across blocks
This patch modifies HdfsTextScanner to specifically check for split
"\r\n" delimiters when the scan range ends with '\r'. If there does
turn out to be a split delimiter, the next tuple is considered the
responsibility of the next scan range's scanner, as if the delimiter
appeared fully in the second scan range. This should not affect the
overall performance characteristics of the text scanner since it
already must do a remote read past the end of the scan range to read
the last tuple.
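The ownership rule described above can be modeled roughly as below. This is a simplified single-pass sketch in Python, not the structure of HdfsTextScanner: it assumes one split point between two scan ranges and assumes that a row belongs to the range containing the last byte of the delimiter that precedes it. The function name is hypothetical.

```python
import re


def assign_rows(data, split):
    """Assign each row of data to one of two scan ranges.

    Range one covers bytes [0, split), range two covers [split, len(data)).
    A row is owned by the range that holds the last byte of the delimiter
    before it, so a "\r\n" split across the boundary hands the next row
    to the second range.
    """
    first, second = [], []
    owner, pos = first, 0
    for match in re.finditer(rb"\r\n|\r|\n", data):
        owner.append(data[pos:match.start()])
        pos = match.end()
        owner = first if match.end() - 1 < split else second
    if pos < len(data):
        owner.append(data[pos:])
    return first, second


# '\r' is the last byte of range one, '\n' the first byte of range two.
split_case = assign_rows(b"aa\r\nbb\r\ncc\r\n", split=3)
# The whole delimiter sits inside range one.
whole_case = assign_rows(b"aa\r\nbb\r\ncc\r\n", split=4)
```

In both cases every row is produced exactly once; only the owner of the row after the boundary changes.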

Change-Id: Id42b441674bb21517ad2788b99942a4b5dc55420
Reviewed-on: http://gerrit.cloudera.org:8080/2803
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 23:06:36 -07:00
Sailesh Mukil
7e0cbaf1a0 IMPALA-3459: Add test for DROP TABLE PURGE for S3
It was previously thought that PURGE had no effect on S3. However,
the Hive Metastore actually created a .Trash directory and copied the
files there when a DROP TABLE was conducted from Impala.

This patch just enables the existing PURGE tests for S3. There were a
few reasons this wasn't working before. The paths given to the S3
client (boto3) should not have a leading "/". This has been fixed; the
leading "/" makes no difference for HDFS.

Also, PURGE is a pure delete whereas a regular DROP is a copy. A copy
is consistent whereas a delete is only eventually consistent, so when
we PURGE a table or partition, the files will still be visible for
some time after the query has completed. The tests have been modified
to accommodate this case as well.
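One common way for a test to tolerate an eventually consistent delete is to poll until the files are gone or a timeout expires. The sketch below shows that pattern in Python; it is an assumption about the general approach, not the code in this patch, and the helper name and parameters are hypothetical.

```python
import time


def wait_until(predicate, timeout_s=30.0, interval_s=0.5,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate until it returns True or timeout_s elapses.

    Returns True if the predicate succeeded, False on timeout.
    sleep and clock are injectable so the helper can be tested quickly.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Simulated listing that still shows the purged file for two checks.
listings = [["part-0"], ["part-0"], []]
files_gone = wait_until(lambda: not listings.pop(0),
                        sleep=lambda seconds: None)
```

A test would then assert on the return value instead of checking the listing once right after the query completes.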

Change-Id: I52d2451e090b00ae2fd9a879c28defa6c940047c
Reviewed-on: http://gerrit.cloudera.org:8080/3036
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 23:06:36 -07:00
Lars Volker
df8bf3a965 IMPALA-3490: Add flag to reduce minidump size
IMPALA-2686 added the breakpad library to all impala daemons, thus enabling them
to write minidump files. This change introduces a flag
'minidump_size_limit_hint_kb', which causes breakpad to reduce the amount of
thread stack memory it includes in a minidump, aiming to reduce the minidump
size during crashes with a lot of threads. Once a minidump is expected to
exceed the configured value, breakpad will include the full stack memory for the
first 20 threads, and afterwards capture only 2KB of stack memory for each
additional thread.
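The effect of the flag can be estimated with simple arithmetic. The Python sketch below models only the thread stack portion of a minidump under the rule stated above (full stacks for the first 20 threads, 2KB for each additional one); the function name is hypothetical and breakpad's own size estimate is not reproduced here.

```python
def estimated_stack_kb(stack_kb_per_thread, limit_kb,
                       full_threads=20, reduced_kb=2):
    """Estimate stack memory (KB) captured in a minidump.

    If the full capture fits within limit_kb, everything is kept.
    Otherwise the first full_threads stacks are kept whole and every
    further thread contributes at most reduced_kb.
    """
    total = sum(stack_kb_per_thread)
    if total <= limit_kb:
        return total
    head = stack_kb_per_thread[:full_threads]
    tail = stack_kb_per_thread[full_threads:]
    return sum(head) + sum(min(kb, reduced_kb) for kb in tail)


# 100 threads with 64KB of stack each, limit hint of 1024KB:
# 20 * 64 + 80 * 2 = 1440KB instead of 6400KB.
reduced = estimated_stack_kb([64] * 100, limit_kb=1024)
```

In this model the result can still exceed the configured value, which fits the flag being a hint rather than a hard cap.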

Change-Id: I2f3aa0df51be9f0bf0755fb288702911cdb88052
Reviewed-on: http://gerrit.cloudera.org:8080/2990
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:04 -07:00
Alex Behm
1c19c232f3 IMPALA-3491: Use unique_database fixture in test_views_compatibility.
Testing: Ran the test locally. It was already possible to run the
test in parallel before.

Change-Id: I68a1349276c90a42c238bed40a1c7c221199a67a
Reviewed-on: http://gerrit.cloudera.org:8080/3009
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:02 -07:00
Alex Behm
c0ee93bbbe IMPALA-3491: Use unique_database fixture in test_recover_partitions.py.
Testing: I ran the test 10 times in a loop locally and ran
a private core/hdfs run.

Change-Id: I5be5fa5d20bc6ed5b7830e0ce90201431d6aa008
Reviewed-on: http://gerrit.cloudera.org:8080/3003
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:00 -07:00
Alex Behm
616eb2fcce IMPALA-3491: Use unique_database fixture in test_partition_metadata.py
Also changes the test to use beeline instead of Hive -e
for the portions executed in Hive because beeline is
significantly faster.

Testing: Tested the changes locally by running them in a loop
10 times. Also did a private core/hdfs run.

Change-Id: I70d87941fbfc30f41e1c6fcfee8d8c7f16b88831
Reviewed-on: http://gerrit.cloudera.org:8080/2962
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:00 -07:00
Alex Behm
12097a0707 IMPALA-3491: Use unique_database fixture in test_hidden_files.py.
Testing: Tested the changes locally by running them in a loop
10 times. Also did a private core/hdfs run.

Change-Id: I37e1528c02e598f3fb2d673b6559d55a34bf79b4
Reviewed-on: http://gerrit.cloudera.org:8080/3002
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:59 -07:00
Alex Behm
96e18f9e62 IMPALA-3491: Use unique_database fixture in test_stale_metadata.py
Testing: I ran the test 10 times in a loop locally and ran
a private core/hdfs run.

Change-Id: Ibd058853e6b48671838e5b51611b6c34a7a8d39d
Reviewed-on: http://gerrit.cloudera.org:8080/2982
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:59 -07:00
Sailesh Mukil
0f1dd55c79 IMPALA-3488 (follow up): test_ddl.py failure on LocalFS run
There was another test which used the hdfs_client and which was not
skipped for localFS. It should never have run on localFS, but it
did not fail earlier for the reasons mentioned in the previous
patch and in the JIRA. Marking as SkipIfLocal.

Change-Id: I3436e80ccd380ecc5f5d28053b3563db2319f9e9
Reviewed-on: http://gerrit.cloudera.org:8080/2991
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:58 -07:00
Alex Behm
bff194ce17 IMPALA-3491: Use unique_database fixture in test_col_stats.py.
The patch also addresses a TODO asking for
test_col_stats.py to be merged into test_compute_stats.py

Testing: I ran the test by itself in a loop 10 times,
and the whole test_compute_stats.py locally. Also did
a private core/hdfs run.

Change-Id: I88aa77464a95993c018e19a52eeb496d7c3eef08
Reviewed-on: http://gerrit.cloudera.org:8080/2963
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:58 -07:00
Sailesh Mukil
41e31439d3 IMPALA-3488: test_ddl.py failure on LocalFS run
test_ddl.py has always had a bug: the _cleanup() function used the
hdfs_client on local FS runs. The test still passed because
hdfs_client.delete_file_dir() caught generic exceptions while checking
whether a file existed, which masked the failure.

With the introduction of an hdfs_client.exists() function in
IMPALA-1878, which catches only the FileNotFound exception, this
bug was exposed, causing our local FS runs to fail.

This patch returns from the _cleanup() function if it's a local FS run
because the directories of the tables it cleans up are not used in
these runs.
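How a broad exception handler can hide this kind of bug is sketched below in Python. The classes and functions are stand-ins invented for the example; they are not the real hdfs_client API.

```python
class FileNotFound(Exception):
    """Stand-in for a client's 'path does not exist' error."""


class UnreachableClient:
    """Hypothetical client used where no HDFS service is available."""

    def status(self, path):
        raise ConnectionError("cannot reach filesystem service")


def exists_broad(client, path):
    # Any failure, including a misconfigured client, looks like
    # "file does not exist", so the caller never notices the problem.
    try:
        client.status(path)
    except Exception:
        return False
    return True


def exists_narrow(client, path):
    # Only the expected error is translated; everything else propagates.
    try:
        client.status(path)
    except FileNotFound:
        return False
    return True


masked = exists_broad(UnreachableClient(), "/test-warehouse/t1")
```

Calling `exists_narrow` with the same client raises `ConnectionError`, which is the kind of failure that surfaced once only the specific exception was caught.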

Change-Id: Ie0c9eec31a90e8f66102d18d900c613bd1306968
Reviewed-on: http://gerrit.cloudera.org:8080/2980
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:57 -07:00
Dan Hecht
a0d4249652 IMPALA-3337: fix "Cancelled" warnings when LIMIT clause is specified
The cancelled status is propagated in scanner threads to cause them to
shut down once the limit has been satisfied, but depending on the code
path and when abort_on_error=false, this internal status would sometimes
incorrectly end up in the error log. Fix this by factoring out the
abort_on_error handling code so that it's handled more consistently
across scanners. Parquet, RC, and Avro all suffered from this bug.

Testing: exhaustive

Change-Id: I4a91a22608e346ca21a23ea66c855eae54bbced6
Reviewed-on: http://gerrit.cloudera.org:8080/2964
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:57 -07:00
Tim Armstrong
34c95c9590 IMPALA-2345,2991: test coverage for spilling and sorts
Add missing coverage for sorting by CHAR and VARCHAR.

Add more coverage for spilling sorts.

Fix spilling tests: ensure that they actually reliably spill (many of
them had memory limits high enough that they could run entirely in
memory).

I ran this in a loop for a while to flush out flaky tests. The tests
should be fairly predictable given that they're not run concurrently
with other tests and we allocate enough block manager memory so that
each operator can obtain its reservation.

Change-Id: Ia2d2627a2c327dcdf269ea3216385b1af9dfa305
Reviewed-on: http://gerrit.cloudera.org:8080/2877
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:55 -07:00
Henry Robinson
a805e100b2 IMPALA-3397: Source query files from shell.
This patch allows you to write SOURCE <file> or SRC <file>, and have the
shell read the file and execute all the queries in it.
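The idea can be sketched as follows. This Python example is a rough illustration under stated assumptions, not the impala-shell implementation: the function names are hypothetical, and the naive split on ";" would mishandle semicolons inside string literals or comments.

```python
def read_queries(path):
    """Return the non-empty statements in a file, split on ';'."""
    with open(path) as query_file:
        text = query_file.read()
    return [stmt.strip() for stmt in text.split(";") if stmt.strip()]


def source(path, execute):
    """Run every statement in the file through the execute callable."""
    for statement in read_queries(path):
        execute(statement)
```

Here `execute` stands for whatever the shell uses to run a single query, for example a function that sends the statement to the server and prints the result.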

Change-Id: Ib05df3e755cd12e9e9562de6b353857940eace03
Reviewed-on: http://gerrit.cloudera.org:8080/2663
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:54 -07:00