Commit Graph

53 Commits

Author SHA1 Message Date
Taras Bobrovytsky
e94de02469 Added execution summary, modified benchmark to handle JSON
- Added execution summary to the beeswax client and QueryResult
- Modified report-benchmark-results to handle JSON and perform
  execution summary comparison between runs
- Added comments to the new workload runner

Change-Id: I9c3c5f2fdc5d8d1e70022c4077334bc44e3a2d1d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3598
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
(cherry picked from commit fd0b1406be2511c202e02fa63af94fbbe5e18eee)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3618
2014-07-25 21:06:00 -07:00
ishaan
3bed0be1df Refactor the performance framework and change its execution strategy.
This patch introduces new abstractions and changes the way queries are run via the
workload runner. A new class 'Workload' is introduced, which represents the notion of a
workload in the performance framework (i.e, A set of query names mapped to query
strings).

The new workflow is:
 - run-workload acts as a driver. It accepts user parmaters for which queries to
   run and their execution strategy. It generates workload objects and passes them to the
   workload-runner.
 - The workload runner takes a workload, its execution parameters and generates a set of
   test vectors over which the workload is run iteratively.
 - A workload is executed by initialiazing a QueryExecutor for each query being run in a
   test vector. The workload executor is then responsible for execution and gathering
   results.
 - The execution details of every query being executed are are stored and returned to the
   driver (run-workload).

Change-Id: Ia16360140d65e6733e534e823bc5d5614622ab5f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3616
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
2014-07-25 18:17:11 -07:00
Henry Robinson
ff32821c6b [CDH5] Test to confirm that ACLs are inherited correctly on INSERT
Change-Id: I781a6b7203c2e12b484162954abae51a6443bead
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3076
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-07-09 19:04:55 -07:00
Skye Wanderman-Milne
1cc628d32d IMPALA-950: Skip computing stats for decimal columns.
This patch also adds a mechanism to return analysis warnings to
client, which is used to log skipped decimal columns.

Change-Id: I30c246044a68ec8861cd5bed072bd54e65a079e6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2822
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit fc77422acef7e6f93fdeb5448309414b905f0725)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2984
2014-06-11 19:16:34 -07:00
Victor Bittorf
09aff77a6c IMPALA-943: removed database udf_test from front-end tests
Added CATCH section to test files.

Change-Id: I28ba3a6e5ae4c53df5b86505573793d7b150863b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2782
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 5b616715958f3ebfdc45b8dc0e4baa82bd55f1d2)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2912
2014-06-09 19:06:15 -07:00
Henry Robinson
d264ab90fe Add support for client SSL to Python Beeswax client
Change-Id: I0d9352471067bfe19e25221e0ecbbb08f945b962
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2810
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 545bd30d5cf3cae9a3581d7bc942a909a1a98806)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2850
Tested-by: Henry Robinson <henry@cloudera.com>
2014-06-05 10:48:23 -07:00
Henry Robinson
93a3d65492 Support for LDAP tests
* Allow Beeswax connections to optionally use LDAP
* Run custom cluster tests from the aux repo, if it exists

Change-Id: I054af64e030ad0cd722ae8dd75afda9c58ea2913
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2547
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2640
2014-05-21 05:52:55 -07:00
casey
2351266d0e Replace single process mini-dfs with multiple processes
This should allow individual service components, such as a single nodemanager,
to be shutdown for failure testing. The mini-cluster bundled with hadoop is a
single process that does not expose the ability to control individual roles.
Now each role can be controlled and configured independently of the others.

Change-Id: Ic1d42e024226c6867e79916464d184fce886d783
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1432
Tested-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2297
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-04-23 18:24:05 -07:00
Henry Robinson
99c37aac37 IMPALA-827: Add an option for directories created by INSERT to inherit
their parent's permissions

This patch adds --insert_inherit_permissions. If true, all
new partition directories created by INSERT will inherit their
permissions from their parent. When false, the directories are created
with the default permissions.

Change-Id: Ib2b4c251e51ea5048387169678e8dde34ecfe5f6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1917
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-04-04 10:25:20 -07:00
Lenni Kuff
cc1c0c61fd IMP-1291: Support "extended" ASCII characters as delimiters in text files
This fixes how we validate delimiters to be in line with Hive. A delimiter must
fit in a single byte and can be specified in the following formats, as far as I can
tell (there isn't documentation):
- A single ASCII or unicode character (ex. '|')
- An escape character in octal format (ex. \001. Stored in the metastore as a
  unicode character: \u0001).
- A signed decimal integer in the range [-128:127]. Used to support delimiters
  for ASCII character values between 128-255 (-2 maps to ASCII 254).

Previously, we were not handling the "signed integer" case so there was no way
to specify a delimiter in the "extended" ASCII range of 128-255.

To support result validation, the test infrastructure had to be updated to support
reading/writing different character encodings.

Change-Id: Ie3c4d444dc9c6e60192093ed0c0f6f151eab16bc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1848
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1888
2014-03-13 13:00:15 -07:00
Alex Behm
9cabee4a71 Wait for the Metastore to come up before starting HiveServer2.
Change-Id: Ic8e29efe63f6745e1ff44248657cbd7882bb16d9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1626
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1670
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-02-25 21:05:33 -08:00
Nong Li
b310935424 Minor workload runner logging improvements.
Change-Id: I75d27593599e654f7fab1cd104dd9fe9fa88cfdb
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1145
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins

Conflicts:
	tests/common/workload_runner.py
2014-01-08 10:54:38 -08:00
ishaan
6b9f01374b Selectively mute fabric's logging while running remote commands.
This change encloses fabric's task method with its 'hide' context manager. The current
state of the running commands are muted (i.e, hosts connected to, which command is running
etc.). Error messages are NOT muted, and will still be displayed (connection error,
command failure).

Change-Id: Ibfbbb995ab6fe057faec9af8be90449654b21f8c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1155
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:37 -08:00
ishaan
c9aa31ac02 Suppress log spew while running plugins in the workload runner.
The plugin runner uses fabric as the underlying mechanism for running remote commands on
cluster hosts. fabric in turn uses paramiko, which generates a lot of log spew. This
change seta parmiko's logging level to ERROR, eliminating excess logging. Additionally,
it also constrains fabric's logging.

Change-Id: I6229d64f95f9c1512cc01842c4a661e96e421086
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1064
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:23 -08:00
Lenni Kuff
0bae3978c9 Update compute-stats.py to execute using Impala
Updates our compute stats script to execute using Impala. This allows us
to easily compute stats on all tables in a database or all tables in the
metastore.
The updated stats caused one of the TPCH plans to change so this also
updates the TPCH planner test results.

Change-Id: I17e5dcd1036a35e40eb4eb2c8e4a20702db9049c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1024
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:18 -08:00
Lenni Kuff
9d71dd3d0c IMP-1158: Dropping a database in Impala does not cleanup the db's HDFS directory
We need to pass a flag to the metastore for the cleanup to happen. Previously we
were passing 'false' when we need to pass 'true' to get the same behavior as Hive
when dropping databases. Added a test case to validate the cleanup when dropping
databases and tables.

Change-Id: I500a3d3ac52c1b2031fae842403a670cfe43fa98
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1035
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:18 -08:00
Henry Robinson
f241782966 IMPALA-620: Fix re-registration starvation bug in statestore
This patch fixes a slightly pathological state that occurs when the
statestore is under heavy load. The result of the bug is that
subscribers cannot successfully re-register because the statestore never
marks them as failed.

The exact sequence of events is as follows:

1. Subscriber registers with state-store.
2. Statestore does not send heartbeats in timely fashion to
   subscriber. Subscriber times-out.
3. Subscriber is restarted quickly. Statestore does not detect
   restart.
4. Subscriber's RegisterSubscriber() call fails, because statestore
   detects duplicate registration.
5. Subscriber restarts again. Since state-store is slow to send
   heartbeats, the state-store has not detected the restart and the
   subscriber receives a heartbeat message from the statestore and
   does not reject it.
6. Statestore continues to believe subscriber is alive, since the
   heartbeats are not being rejected.

To fix this, we add a registration ID to each successfully registered
subscriber that is known to both subscriber and statestore. If the
subscriber should restart and re-register, it receives a new
registration ID. Whenever a heartbeat arrives, it compares its
registration ID to that sent by the statestore with the heartbeat, and
rejects the heartbeat if they do not match.

We also allow re-registration of existing subscribers (getting rid of
the dreaded "Duplicate subscription" message). A new registration
overwrites an old one.

Change-Id: Ie32df3a586ccb375375ebfbcbec1aaeb930b6bfe
Reviewed-on: http://gerrit.ent.cloudera.com:8080/778
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-01-08 10:53:53 -08:00
Alex Behm
1497002013 Added SHOW TABLE/COLUMN STATS command.
Fixed the following stats-related bugs:
- Per-partition row count was not distributed properly via CatalogService
- HBase column stats were not loaded and distributed properly

Enhancements to test framework:
- Allow regex specification of expected row or column values
- Fixed expected results of some tests because the test framework
  did not catch that they were incorrect

Change-Id: I1fa8e710bbcf0ddb62b961fdd26ecd9ce7b75d51
Reviewed-on: http://gerrit.ent.cloudera.com:8080/813
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:51 -08:00
Lenni Kuff
72e211ca4a Use Hive Metastore Service instead of HiveServer 1 in test infrastructure
Change-Id: I4e2ba02b2101bae95d196ab13f9453e1b3a9d7be
Reviewed-on: http://gerrit.ent.cloudera.com:8080/689
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:26 -08:00
ishaan
565d15579c Add the ability to use a workload as the unit of execution in the Impala benchmark runner.
At the moment, a query is the default unit of execution and parallelism in the Impala
performance suite. With this change, we now have the ability to treat a workload as the
unit of execution. A workload is defined as a unique combination of the dataset, scale
factor, a subset (or all) of the queries in the dataset, and a table format (file format,
compression codec and compression scheme).

It introduces two new command line options in bin/run-workload.py:
  * --execution_scope
    The default scope is 'query', and it maintains previous semantics. The
    new scope is 'workload', which toggles the unit of execution to a workload.
  * --shuffle_query_exec_order.
    Shuffles the order in which queries are executed (only applicable when the
    execution_scope if workload), defaults to False.

Change-Id: I790d75f0896210cda8eb999015b0be04246e4c45
Reviewed-on: http://gerrit.ent.cloudera.com:8080/503
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:53:07 -08:00
Henry Robinson
073abb803d Use only Python 2.6-compatible interfaces from ElementTree
Change-Id: I746c2bf36472c3e08a77bd46ae407e29f5c0596a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/494
Tested-by: jenkins
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:52:51 -08:00
Henry Robinson
a46276325c IMPALA-415: Don't delete hidden files in the root directory for INSERT
OVERWRITE

INSERT OVERWRITE into an unpartitioned table is supposed to remove all
data files from the root. This should not include hidden files or
directories. This patch excludes hidden files from deletion, and adds a
test case.

Partition directories are still removed in their entirety: the cost of
statting a large number of files and directories rather than issuing a
single "rm -rf" outweighs the benefits of preserving hidden files for
now.

Hive does not preserve hidden files in either configuration.

Change-Id: Ia73e55e011c26c88f14745075210cf359764e3c1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/418
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:50 -08:00
ishaan
1780084753 Fail a performance job when a plugin fails.
Change-Id: I81fd9ec742a9efbdb16466c8ee7387fabe3e16e4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/342
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:52:31 -08:00
Lenni Kuff
d66d3bfce3 IMPALA-161: Add Impala support for CREATE TABLE AS SELECT
This adds support for CREATE TABLE AS SELECT to Impala. It supports all functionality a
regular CREATE TABLE statement includes, except it does not allow for for specifying
partition columns. Hive also has this limitation and it wouldn't be too hard to support
in the future.

Change-Id: I4ca3c3b8f1576441b8bb5ed9dc521d7dfa96ab74
Reviewed-on: http://gerrit.ent.cloudera.com:8080/157
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:17 -08:00
Alex Leblang
a0ad16af41 Testing framework changes for VTune plugin
This patch contains changes to the general test and plugin framework that were needed to make the VTune plugin run. These changes create a conext dictionary that is passed to the plugin.

Change-Id: I12ee2076fb0d777813c56bbb338e6d20426afaff
Reviewed-on: http://gerrit.ent.cloudera.com:8080/111
Reviewed-by: Alex Leblang <alex.leblang@cloudera.com>
Tested-by: Alex Leblang <alex.leblang@cloudera.com>
2014-01-08 10:52:12 -08:00
Alex Leblang
a4e441cc15 VTune Plugin
This is a plugin that runs Intel's VTune from within the Impala test plugin runner framework.

Change-Id: I673a2f6c4785bfcf565e8b002c14ada7a7c8f48a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/195
Reviewed-by: Alex Leblang <alex.leblang@cloudera.com>
Tested-by: Alex Leblang <alex.leblang@cloudera.com>
2014-01-08 10:52:12 -08:00
ishaan
a6cb5f70a4 Introduce the notion of scope to the plugin framework.
Change-Id: I2cf39c38e7e0a359950d9e05e2daed433fc0c38f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/144
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:52:05 -08:00
ishaan
6f9569ea6f Add a plugin framework for run-workload 2014-01-08 10:51:39 -08:00
Alex Behm
8ad15fabcf IMPALA-372: Added CREATE/DROP/ALTER VIEW. 2014-01-08 10:51:35 -08:00
ishaan
e7c6d57f9c IMP-773: Add better logging/error detection to start-impala-cluster.py 2014-01-08 10:51:25 -08:00
Henry Robinson
397b82f197 Respect qualification of table names in RESET / RELOAD test sections 2014-01-08 10:50:55 -08:00
Lenni Kuff
9a4feb7391 Add impala local failure test framework and tests 2014-01-08 10:50:18 -08:00
Lenni Kuff
0d45a3a54b Add --continue_on_error and --hive_cmd option to compute stats script 2014-01-08 10:49:50 -08:00
Lenni Kuff
36e9fe1c1a Run compute table stats statements using Hive CLI
This works around a problem with computing table stats via the Hive Meta Store client
API. When executing these stements via the MetaStoreClient, all tables were getting a
num_rows=0 value returned from the ANALYZE TABLE query.
2014-01-08 10:49:19 -08:00
Lenni Kuff
831ee529be Fixed data loading bugs, moved most tables out of load-dependent-tables 2014-01-08 10:48:56 -08:00
Lenni Kuff
e0a7b7cb55 Compute column stats on tables used by Planner tests 2014-01-08 10:48:48 -08:00
ishaan
5ed84d7f65 IMP-739 Results for show queries should check for subset, not equality. 2014-01-08 10:48:46 -08:00
Lenni Kuff
328ceed4e7 Add support for generating lzo compressed text files and running tests against lzo 2014-01-08 10:48:38 -08:00
Lenni Kuff
dd9798c9f3 IMP-785: calculation_util.calculate_mean does not calculate mean (instead median) 2014-01-08 10:48:35 -08:00
Henry Robinson
e15a39143a Fix definition of calculate_mean 2014-01-08 10:48:34 -08:00
Lenni Kuff
4cf7d2634e Update benchmark runner to use mean of all results if num_clients > 1 2014-01-08 10:48:30 -08:00
Lenni Kuff
51908060b3 Ignore header "comments" in schema template files 2014-01-08 10:48:24 -08:00
Lenni Kuff
6e1f8d178a Update utility script to compute column and table stats for given table(s) 2014-01-08 10:48:23 -08:00
Lenni Kuff
1d394cf77c IMP-775: Fix updating test results to preserve comments, add test file parser unittests 2014-01-08 10:48:23 -08:00
ishaan
5138a720bb IMP-768: Enable the python test framework to check for insert results. 2014-01-08 10:48:22 -08:00
Lenni Kuff
3ee82e7543 Add support for running Impala query tests against secure cluster
Adds support for running all the Impala query tests against a secure cluster. This run
mode can be selected by adding a --use_kerberos flag to run-tests.py and pointing to the
correct (secure) Hive Metastore Service.
2014-01-08 10:48:21 -08:00
ishaan
09d6d931f4 Change the way data is loaded 2014-01-08 10:48:09 -08:00
Lenni Kuff
99bb22dcac Add db name filter to compute stats, run compute stats on functional/text tables 2014-01-08 10:48:08 -08:00
Lenni Kuff
b7c348edfa Fix build break due to using Python 2.7 API 2014-01-08 10:46:54 -08:00
Lenni Kuff
837f35eab3 Updated results for more query tests to reflect proper ordering + improved result updating 2014-01-08 10:46:53 -08:00