Commit Graph

38 Commits

Author SHA1 Message Date
Michael Ho
a41918d443 Fix E2E test infrastructure to handle missing exceptions correctly
This change fixes a bug in the E2E infrastructure that handles
the case when an expected exception wasn't thrown. The code was
expecting test_section['CATCH'] to be a string, but in
reality it's a list of strings. It also clarifies the error
message about the missing exception. This change also enforces
that the CATCH subsection in tests cannot be empty.
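
A minimal sketch of the check described above, assuming the parsed test section is a dict whose 'CATCH' entry is a list of strings. The helper name fail_on_missing_exception is hypothetical and is not taken from the E2E framework; it would be called only after a query completed without raising:

```python
def fail_on_missing_exception(test_section):
    """Raise if the test expected an exception but the query succeeded.

    Hypothetical helper: assumes test_section['CATCH'] is a list of strings.
    """
    if 'CATCH' not in test_section:
        return  # No exception was expected, so success is fine.
    expected = test_section['CATCH']
    # CATCH is a list of strings, so validate and format it as a list.
    if not isinstance(expected, list):
        raise TypeError("CATCH must be a list of strings")
    if not expected:
        raise ValueError("CATCH subsection must not be empty")
    raise AssertionError(
        "Query succeeded, but an exception containing one of the following "
        "was expected: %s" % "; ".join(expected))
```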

Change-Id: I7d83c5db59e8a239e4e70694a1e625af6f21419c
Reviewed-on: http://gerrit.cloudera.org:8080/5260
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
2016-12-01 23:43:03 +00:00
Thomas Tauber-Marshall
3833707dbd IMPALA-4466: Improve Kudu CRUD test coverage
The results in the test files were verified by hand.

This patch also introduces a new test section 'DML_RESULTS', which
takes the name of a table as a comment and the contents of the
table as its body and then verifies that the body matches the
actual contents of the table. This makes it easy to check that a
DML operation has the desired effect on the contents of a table,
rather than always having to add another test case that runs a
select on the table. For now, this section cannot be used in a
test along with the RESULTS or ERRORS sections.
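
A rough sketch of how a DML_RESULTS check could work, under stated assumptions: the section comment supplies the table name, the body supplies the expected rows, and fetch_rows is a hypothetical callable that returns the table's rows as strings. Treating row order as insignificant is also an assumption of this sketch:

```python
def verify_dml_results(table_name, expected_rows, fetch_rows):
    """Compare the body of a DML_RESULTS section with a table's contents.

    fetch_rows is a hypothetical callable: table name -> list of row strings.
    """
    actual_rows = fetch_rows(table_name)
    # Assumption: row order in the table is not significant.
    if sorted(expected_rows) != sorted(actual_rows):
        raise AssertionError(
            "Contents of %s differ.\nExpected: %s\nActual: %s"
            % (table_name, sorted(expected_rows), sorted(actual_rows)))
```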

TODO: Refactor the DML test case handling (IMPALA-4471)

Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6
Reviewed-on: http://gerrit.cloudera.org:8080/4953
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2016-11-17 02:54:30 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace the existing ASF license text with the one given
   on the website, or add it if it is missing.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Tim Armstrong
bc8c55afcd IMPALA-3729: batch_size=1 coverage for avro scanner
Also fix a stale comment in the avro scanner header.

The main work here is to fix the handling of empty result sets in the
test result verifier. This is a problem because we wanted to verify
that the results in the test file were a superset of the rows
returned, and this was thrown off by superfluous '' rows in the expected
and actual result sets.

The basic problem is that the way test file sections
were parsed conflated an empty result section with a non-empty result
section that had a single empty string. I.e.:

---- RESULTS
====

vs
---- RESULTS

====

both got resolved to [''].
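
The distinction can be illustrated with a small sketch. It assumes the test file has already been split into a list of lines; section_rows is a hypothetical name and not the parser's actual function:

```python
def section_rows(test_lines):
    """Return the rows of the first RESULTS section in a list of file lines."""
    start = test_lines.index('---- RESULTS')
    end = test_lines.index('====', start)
    # Slicing the line list keeps an empty section ([]) distinct from a
    # section holding one empty-string row (['']). Joining the lines and
    # calling split('\n') would give [''] in both cases.
    return test_lines[start + 1:end]
```

With this approach an empty section yields [] and a section containing one blank line yields [''].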

Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Reviewed-on: http://gerrit.cloudera.org:8080/3413
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2016-07-19 23:30:02 -07:00
Taras Bobrovytsky
609b80410e Clean up Python test import statements
Many of our test scripts have import statements that look like
"from xxx import *". It is a good practice to explicitly name what
needs to be imported. This commit implements this practice. Also,
unused import statements are removed.

Change-Id: I6a33bb66552ae657d1725f765842f648faeb26a8
Reviewed-on: http://gerrit.cloudera.org:8080/3444
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
2016-07-15 23:26:18 +00:00
Michael Ho
3a4a77521e IMPALA-3608: Updates Impala E2E test framework to allow multiple exception messages
Some of our tests which are expected to fail due to low
query memory limits can fail non-deterministically with
different error messages. In addition, some tests may
throw different error messages when running with the legacy
join nodes. This change updates the test infrastructure to
allow multiple exception messages to be specified by
adding "ANY_OF" to the "CATCH" subsection.
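
A minimal sketch of the idea, assuming that ANY_OF is given as the subsection comment, that each non-blank line of the body is then one acceptable message, and that matching is by substring. The function names are hypothetical:

```python
def parse_catch(body, comment=None):
    """Turn a CATCH subsection body into a list of expected messages."""
    if comment == 'ANY_OF':
        # Each non-blank line is one acceptable message.
        return [line.strip() for line in body.splitlines() if line.strip()]
    # Without ANY_OF, the whole body is a single expected message.
    return [body.strip()]


def error_is_expected(expected_messages, actual_error):
    """Return True if any expected message occurs in the actual error text."""
    return any(message in actual_error for message in expected_messages)
```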

Change-Id: Ie6d81fd3ae601f565b575edfeefff7c5a6c07974
Reviewed-on: http://gerrit.cloudera.org:8080/3205
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
2016-05-31 23:32:10 -07:00
Skye Wanderman-Milne
9b51b2b6e6 IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
This patch introduces a new query option,
PARQUET_FALLBACK_SCHEMA_RESOLUTION which allows Parquet files' schemas
to be resolved by either name or position.  It's "fallback" because
eventually field IDs will be the primary schema resolution scheme, and
we don't want to create an option that we will have to change the name
of later. The default is still by position. I chose to add a query
option because it will make testing easier and also make it easier to
diagnose resolution problems quickly in the field. If users want to
switch the default behavior to be by name (like Hive), they can use
the --default_query_options flag.

This patch also introduces a new test section, SHELL, which can be
used to execute shell commands in a .test file. This is useful for
copying files into test tables.

Change-Id: Id0c715ea23792b2a6872610839a40532aabbb5a6
Reviewed-on: http://gerrit.cloudera.org:8080/2384
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
2016-04-02 04:04:25 +00:00
Henry Robinson
b3937295fb Runtime filters tests
This patch adds functional tests for runtime filters. It relies on
setting RUNTIME_FILTER_WAIT_TIME_MS high enough to ensure that filters
are received.

To make the test files more readable, this patch also adds a new COMMENT
section to the test syntax, and allows blank spaces between queries so
that the separation of different test cases can be made more obvious.

Currently missing is a test for disabling probe-side filters based on
selectivity, as we lack suitable tables to trigger the disable condition.

Change-Id: I94d617c6d23ffa394a6eb7ead56f1cfb701e0d90
Reviewed-on: http://gerrit.cloudera.org:8080/2603
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
2016-03-23 04:07:14 +00:00
David Alves
82222abaf5 Merge branch 'feature/kudu' into cdh5-trunk
This merges the 'feature/kudu' branch with cdh5-trunk as of commit:
055500cc753f87f6d1c70627321fcc825044e183

This patch is not a pure merge patch in the sense that it goes beyond conflict
resolution to also address reviews of the 'feature/kudu' branch as a whole.

The review items and their resolution can be inspected at:
http://gerrit.cloudera.org:8080/#/c/1403/

Change-Id: I6dd4270cd17a4f5c02811c343726db3504275a92
2016-03-11 11:37:58 -08:00
Lars Volker
6b566a2d35 IMPALA-3004: Fix QueryTest tests
Test files in testdata/workloads/functional-query/queries/QueryTest
are parsed by test_file_parser.py, which used to ignore everything
before the first ==== line as a file header. This change fixes all
affected files.

This change also modifies the test file parser to forbid headers
starting with what looks like a subsection title ('----'), which
should prevent the reintroduction of similar errors in the future.
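
A sketch of such a header check, assuming the header is the text before the first ==== line. check_header is a hypothetical name, not the function in test_file_parser.py:

```python
def check_header(header_text):
    """Reject a file header that looks like the start of a test section."""
    # A header beginning with '----' is almost certainly a subsection title
    # of a test case that is missing its leading '====' separator.
    if header_text.lstrip().startswith('----'):
        raise ValueError(
            "Header must not start with '----'; the first test case may be "
            "missing its leading '====' line")
```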

Change-Id: Iaa1bc5ffd02782e24289c7843dcb35401c334519
Reviewed-on: http://gerrit.cloudera.org:8080/2220
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
2016-02-19 00:03:15 -08:00
Tim Armstrong
2c2670e389 IMPALA-1305: streaming pre-aggregations
Aggregations are implemented as a distributed pre-aggregation, an
exchange, then a final aggregation that produces the results of the
aggregation. In many cases the pre-aggregation significantly reduces the
amount of data to be exchanged. However, in other cases, the
pre-aggregation does not greatly reduce the amount of data exchanged or
can use a lot of memory and starve other operators that would benefit
more from the additional memory.

In these cases we would be better off "passing through" some input tuples
by transforming them into intermediate tuples without aggregating them.

This patch adds a streaming pre-aggregation mode to
PartitionedAggregationNode that tries to aggregate input rows with a
hash table, but can switch to passing through the input tuples (after
transforming them into the appropriate tuple format). It does this if
it hits a memory limit or if the aggregation is not sufficiently
reducing the node's output (specifically, if the number of aggregated
rows in the hash table is more than half the number of unaggregated rows
consumed by the pre-aggregation). Pre-aggregations never need to spill
because they can pass through rows when under memory pressure.
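
The pass-through decision described above can be sketched as follows. This is an illustration in Python rather than the actual PartitionedAggregationNode code, and the function and parameter names are hypothetical:

```python
def should_pass_through(rows_in_hash_table, rows_consumed, hit_memory_limit):
    """Decide whether a streaming pre-aggregation should pass rows through."""
    if hit_memory_limit:
        # Under memory pressure, pass rows through instead of spilling.
        return True
    # Pass through when aggregation is not reducing the data enough: more
    # than half of the consumed rows ended up as distinct hash table rows.
    return 2 * rows_in_hash_table > rows_consumed
```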

This initial implementation is quite conservative: it retains the
partitioning of the previous implementation because switching to a
single partition proved to regress performance of some queries while
improving others. It also always keeps hash tables around and updates
them with matching input rows so that reduction statistics are updated
and early decisions to pass through data can be reversed.  Future work
could explore different approaches within the new framework to get
larger performance gains. Currently we see significant performance
benefits for queries with a very low reduction factor, e.g. group by on
a nearly unique column.

Includes codegen support for the passthrough streaming.

Adds a query option, disable_streaming_preaggregations, in case a user
wants to revert to the old behaviour.

Adds TPC-H tests to exercise the new passthrough code path and updates
planner tests to include the new [STREAMING] detail added by the planner.

Change-Id: Ia40525340cba89a8c4e70164ae11447e96494664
Reviewed-on: http://gerrit.cloudera.org:8080/1698
Tested-by: Internal Jenkins
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
2016-02-11 19:03:51 +00:00
Lars Volker
b3ae4921a6 Fix comment handling in test file parser
The test file parser is supposed to handle multiple-item comments for a
section. The implementation had an issue where only single-item comments
were handled correctly.

Change-Id: I47f7201044f6f92d10a62bb9d2eada1bd4c47a23
Reviewed-on: http://gerrit.cloudera.org:8080/1819
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
2016-01-22 21:01:18 +00:00
Michael Ho
f0c2742641 IMPALA-2004: Implement "SHOW CREATE" for udfs and udas.
This patch extends the SHOW statement to also support
user-defined functions and user-defined aggregate functions.
The syntax of the new SHOW statements is as follows:

SHOW CREATE [AGGREGATE] FUNCTION [<db_name>.]<func_name>;

<db_name> and <func_name> are the names of the database
and udf/uda respectively.

Sample outputs of the new SHOW statements are as follows:

Query: show create function fn
+------------------------------------------------------------------+
| result                                                           |
+------------------------------------------------------------------+
| CREATE FUNCTION default.fn()                                     |
|  RETURNS INT                                                     |
|  LOCATION 'hdfs://localhost:20500/test-warehouse/libTestUdfs.so' |
|  SYMBOL='_Z2FnPN10impala_udf15FunctionContextE'                  |
|                                                                  |
+------------------------------------------------------------------+

Query: show create aggregate function agg_fn
+------------------------------------------------------------------------------------------+
| result                                                                                   |
+------------------------------------------------------------------------------------------+
| CREATE AGGREGATE FUNCTION default.agg_fn(INT)                                            |
|  RETURNS BIGINT                                                                          |
|  LOCATION 'hdfs://localhost:20500/test-warehouse/libudasample.so'                        |
|  UPDATE_FN='_Z11CountUpdatePN10impala_udf15FunctionContextERKNS_6IntValEPNS_9BigIntValE' |
|  INIT_FN='_Z9CountInitPN10impala_udf15FunctionContextEPNS_9BigIntValE'                   |
|  MERGE_FN='_Z10CountMergePN10impala_udf15FunctionContextERKNS_9BigIntValEPS2_'           |
|  FINALIZE_FN='_Z13CountFinalizePN10impala_udf15FunctionContextERKNS_9BigIntValE'         |
|                                                                                          |
+------------------------------------------------------------------------------------------+

Please note that all the overloaded functions which match
the given function name and category will be printed.

This patch also extends the python test infrastructure to
support expected results which include newline characters.
A new subsection comment called 'MULTI_LINE' has been added
for the 'RESULT' section. With this comment, a test can
include its multi-line output inside [ ] and the content
inside [ ] will be treated as a single line, including the
newline character.
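
A sketch of how bracketed multi-line results might be folded into single rows. It assumes that '[' appears at the start of a line and ']' at the end of a line; join_multi_line is a hypothetical name, not the framework's function:

```python
def join_multi_line(lines):
    """Fold lines enclosed in [ ] into single rows with embedded newlines."""
    rows = []
    pending = None  # Lines of the bracketed row currently being collected.
    for line in lines:
        if pending is None:
            if line.startswith('[') and line.endswith(']'):
                rows.append(line[1:-1])
            elif line.startswith('['):
                pending = [line[1:]]
            else:
                rows.append(line)
        elif line.endswith(']'):
            pending.append(line[:-1])
            rows.append('\n'.join(pending))
            pending = None
        else:
            pending.append(line)
    if pending is not None:
        raise ValueError("Unterminated '[' in MULTI_LINE results")
    return rows
```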

Change-Id: Idbe433eeaf5e24ed55c31d905fea2a6160c46011
Reviewed-on: http://gerrit.cloudera.org:8080/1271
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-10-23 05:11:07 +00:00
Casey Ching
cb4998c28e Python: Remove log configuration from test_file_parser.py
Previously the test_file_parser would setup the logging
configuration as part of importing the module. The
test_file_parser is not executable and not a logging utility so it
should not have any effect on logging. If some other file relies on this
it should be fixed separately.

Change-Id: Ib7293d152d0c0cd3c8f31533c95e50b2678e927b
Reviewed-on: http://gerrit.cloudera.org:8080/473
Tested-by: Internal Jenkins
Reviewed-by: Casey Ching <casey@cloudera.com>
2015-07-16 23:30:45 +00:00
David Alves
af466e4ae8 Fix 'only' constraint handling for schema_constraints.csv
The goal of the 'only' constraint is that only the explicitly mentioned tables are
to be created in the specified format. However, there is currently a bug
in the parsing of the file that makes it so that only the _last_ table
is actually created. That is, if there is a sequence of statements for a
certain format with the 'only' constraint, only the last one is used.

This patch fixes this by making sure we actually create multiple tables
if we find them.
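
The fix amounts to accumulating tables per format instead of overwriting them. The sketch below uses a simplified, hypothetical three-column layout (table name, constraint, table format); the real layout of schema_constraints.csv may differ:

```python
from collections import defaultdict


def parse_only_constraints(lines):
    """Collect every table named in an 'only' constraint, keyed by format."""
    only = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # Skip blank lines and comments.
        table_name, constraint, table_format = [
            field.strip() for field in line.split(',')]
        if constraint == 'only':
            # Append rather than assign; assigning keeps only the last table.
            only[table_format].append(table_name)
    return dict(only)
```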

Change-Id: I28e91aeefc03dcca7de6ad4aa50456dcb90ed95c
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/6973
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: jenkins
2015-06-18 17:08:11 -07:00
Martin Grund
6b945cf257 Starting Kudu as part of the run-all.sh command / data loading
This starts a kudu mini-cluster with a master and three tablet servers
on a single host. This requires a checkout of the kudu-bin
project accessible. By default the location of the checkout is expected
to be $IMPALA_HOME/../kudu-bin.

In addition, this patch enables loading data to kudu via the
load-data.py command. Currently only the "liketbl" is created for Kudu,
but not loaded with data. This has to be done manually from the kudu-bin
repo for now.

Change-Id: Ia7981b023f119759e5e13e78322a6c89f82bd085
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/6499
Tested-by: jenkins
Reviewed-by: David Alves <david.alves@cloudera.com>
2015-06-01 15:53:34 -07:00
Alex Behm
f696861c5c Throw error on unrecognized test sections.
Our .test file parser used to not abort tests when there
is a malformed test/section. This patch changes that behavior
to report an error and treat the test as failed.

Quite a few tests were not well-formed, and were not executed
as a result. This patch fixes those tests.

Arguably, the test file parser should be more flexible in which places
to accept comments, but this patch does not address that problem.

Change-Id: If53358eb0cb958b68e51940b071e64c1d6c3ec6f
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5468
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-12-02 18:08:09 -08:00
Lenni Kuff
293ead3b2a [CDH5] Authorize SHOW ROLES statements and support SHOW CURRENT ROLES
This patch adds the necessary changes required to authorize SHOW ROLES statements.
This is not as easy as it could be because the Sentry Service doesn't currently
expose the metadata for who is/isn't authorized to execute these statements. To authorize
the statements, we need to first make an RPC to the Sentry Service (via the
Catalog Server) and then only proceed with the SHOW statement if the check succeeds.
We should consider revisiting this approach in the future when more metadata is available
from Sentry.

Additionally, this patch adds support for SHOW CURRENT ROLES which shows all roles
that are currently granted to the current user.

Change-Id: Ia01c20d58ab081f49a85566075836d8c6e25dbd4
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4367
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-09-19 05:41:33 -07:00
Skye Wanderman-Milne
1cc628d32d IMPALA-950: Skip computing stats for decimal columns.
This patch also adds a mechanism to return analysis warnings to
client, which is used to log skipped decimal columns.

Change-Id: I30c246044a68ec8861cd5bed072bd54e65a079e6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2822
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit fc77422acef7e6f93fdeb5448309414b905f0725)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2984
2014-06-11 19:16:34 -07:00
Victor Bittorf
09aff77a6c IMPALA-943: removed database udf_test from front-end tests
Added CATCH section to test files.

Change-Id: I28ba3a6e5ae4c53df5b86505573793d7b150863b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2782
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 5b616715958f3ebfdc45b8dc0e4baa82bd55f1d2)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2912
2014-06-09 19:06:15 -07:00
Lenni Kuff
cc1c0c61fd IMP-1291: Support "extended" ASCII characters as delimiters in text files
This fixes how we validate delimiters to be in line with Hive. A delimiter must
fit in a single byte and can be specified in the following formats, as far as I can
tell (there isn't documentation):
- A single ASCII or unicode character (ex. '|')
- An escape character in octal format (ex. \001. Stored in the metastore as a
  unicode character: \u0001).
- A signed decimal integer in the range [-128:127]. Used to support delimiters
  for ASCII character values between 128-255 (-2 maps to ASCII 254).

Previously, we were not handling the "signed integer" case so there was no way
to specify a delimiter in the "extended" ASCII range of 128-255.
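
The three formats can be sketched as a small Python function. This is an illustration only, not Impala's implementation; the name delimiter_to_byte and the order in which the formats are tried (a one-character string is treated as a literal character) are assumptions:

```python
def delimiter_to_byte(spec):
    """Map a delimiter specification to a single byte value (0-255)."""
    if len(spec) == 1:
        # A single literal character, e.g. '|'.
        value = ord(spec)
    elif spec.startswith('\\') and spec[1:].isdigit():
        # An octal escape written with a literal backslash, e.g. \001.
        value = int(spec[1:], 8)
    else:
        # A signed decimal integer in [-128, 127]; negative values map to
        # the "extended" range 128-255 (for example, -2 maps to 254).
        value = int(spec)
        if not -128 <= value <= 127:
            raise ValueError("Delimiter out of range: %s" % spec)
        value %= 256
    if not 0 <= value <= 255:
        raise ValueError("Delimiter must fit in a single byte: %r" % spec)
    return value
```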

To support result validation, the test infrastructure had to be updated to support
reading/writing different character encodings.

Change-Id: Ie3c4d444dc9c6e60192093ed0c0f6f151eab16bc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1848
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1888
2014-03-13 13:00:15 -07:00
Alex Behm
1497002013 Added SHOW TABLE/COLUMN STATS command.
Fixed the following stats-related bugs:
- Per-partition row count was not distributed properly via CatalogService
- HBase column stats were not loaded and distributed properly

Enhancements to test framework:
- Allow regex specification of expected row or column values
- Fixed expected results of some tests because the test framework
  did not catch that they were incorrect
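
A sketch of regex-based matching of expected values. The 'regex:' prefix used here to mark a pattern is an assumption of this example, as is the requirement that the pattern match the whole value:

```python
import re

REGEX_PREFIX = 'regex:'  # Assumed marker for a regex expected value.


def value_matches(expected, actual):
    """Compare an expected value, which may be a regex, with an actual value."""
    if expected.startswith(REGEX_PREFIX):
        pattern = expected[len(REGEX_PREFIX):].strip()
        # The pattern must match the entire actual value.
        return re.fullmatch(pattern, actual) is not None
    return expected == actual
```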

Change-Id: I1fa8e710bbcf0ddb62b961fdd26ecd9ce7b75d51
Reviewed-on: http://gerrit.ent.cloudera.com:8080/813
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:51 -08:00
ishaan
565d15579c Add the ability to use a workload as the unit of execution in the Impala benchmark runner.
At the moment, a query is the default unit of execution and parallelism in the Impala
performance suite. With this change, we now have the ability to treat a workload as the
unit of execution. A workload is defined as a unique combination of the dataset, scale
factor, a subset (or all) of the queries in the dataset, and a table format (file format,
compression codec and compression scheme).

It introduces two new command line options in bin/run-workload.py:
  * --execution_scope
    The default scope is 'query', and it maintains previous semantics. The
    new scope is 'workload', which toggles the unit of execution to a workload.
  * --shuffle_query_exec_order
    Shuffles the order in which queries are executed (only applicable when the
    execution_scope is workload), defaults to False.

Change-Id: I790d75f0896210cda8eb999015b0be04246e4c45
Reviewed-on: http://gerrit.ent.cloudera.com:8080/503
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:53:07 -08:00
Lenni Kuff
d66d3bfce3 IMPALA-161: Add Impala support for CREATE TABLE AS SELECT
This adds support for CREATE TABLE AS SELECT to Impala. It supports all functionality a
regular CREATE TABLE statement includes, except it does not allow for specifying
partition columns. Hive also has this limitation and it wouldn't be too hard to support
in the future.

Change-Id: I4ca3c3b8f1576441b8bb5ed9dc521d7dfa96ab74
Reviewed-on: http://gerrit.ent.cloudera.com:8080/157
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:17 -08:00
Alex Behm
8ad15fabcf IMPALA-372: Added CREATE/DROP/ALTER VIEW. 2014-01-08 10:51:35 -08:00
Henry Robinson
397b82f197 Respect qualification of table names in RESET / RELOAD test sections 2014-01-08 10:50:55 -08:00
Lenni Kuff
831ee529be Fixed data loading bugs, moved most tables out of load-dependent-tables 2014-01-08 10:48:56 -08:00
ishaan
5ed84d7f65 IMP-739 Results for show queries should check for subset, not equality. 2014-01-08 10:48:46 -08:00
Lenni Kuff
328ceed4e7 Add support for generating lzo compressed text files and running tests against lzo 2014-01-08 10:48:38 -08:00
Lenni Kuff
51908060b3 Ignore header "comments" in schema template files 2014-01-08 10:48:24 -08:00
Lenni Kuff
1d394cf77c IMP-775: Fix updating test results to preserve comments, add test file parser unittests 2014-01-08 10:48:23 -08:00
ishaan
5138a720bb IMP-768: Enable the python test framework to check for insert results. 2014-01-08 10:48:22 -08:00
ishaan
09d6d931f4 Change the way data is loaded 2014-01-08 10:48:09 -08:00
Lenni Kuff
b7c348edfa Fix build break due to using Python 2.7 API 2014-01-08 10:46:54 -08:00
Lenni Kuff
837f35eab3 Updated results for more query tests to reflect proper ordering + improved result updating 2014-01-08 10:46:53 -08:00
Lenni Kuff
1b248d067b Add TPC-DS dataset and workload 2014-01-08 10:46:52 -08:00
Lenni Kuff
ef48f65e76 Add test framework for running Impala query tests via Python
This is the first set of changes required to start getting our functional test
infrastructure moved from JUnit to Python. After investigating a number of
options, I decided to go with a python test executor named py.test
(http://pytest.org/). It is very flexible, open source (MIT licensed), and will
enable us to do some cool things like parallel test execution.

As part of this change, we now use our "test vectors" for query test execution.
This will be very nice because it means if you load the "core" dataset you know you
will be able to run the "core" query tests (specified by --exploration_strategy
when running the tests).

You will see that now each combination of table format + query exec options is
treated like an individual test case. This will make it much easier to debug
exactly where something failed.

These new tests can be run using the script at tests/run-tests.sh
2014-01-08 10:46:50 -08:00
Lenni Kuff
1e25c98fb4 Test data loading framework improvements
This change includes a number of improvements for the test data loading framework:
* Named sections for schema template definitions
* Removal of unneeded sections from schema template definitions (ex. ANALYZE TABLE)
* More granular data loading via table name filters
* Improved robustness in detecting failed data loads
* Table level constraints for specific file formats
* Re-written compute stats script
2014-01-08 10:46:49 -08:00