Commit Graph

414 Commits

Author SHA1 Message Date
Taras Bobrovytsky
e94de02469 Added execution summary, modified benchmark to handle JSON
- Added execution summary to the beeswax client and QueryResult
- Modified report-benchmark-results to handle JSON and perform
  execution summary comparison between runs
- Added comments to the new workload runner

Change-Id: I9c3c5f2fdc5d8d1e70022c4077334bc44e3a2d1d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3598
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
(cherry picked from commit fd0b1406be2511c202e02fa63af94fbbe5e18eee)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3618
2014-07-25 21:06:00 -07:00
ishaan
3bed0be1df Refactor the performance framework and change its execution strategy.
This patch introduces new abstractions and changes the way queries are run via the
workload runner. A new class 'Workload' is introduced, which represents the notion of a
workload in the performance framework (i.e., a set of query names mapped to query
strings).

The new workflow is:
 - run-workload acts as a driver. It accepts user parameters for which queries to
   run and their execution strategy. It generates workload objects and passes them to the
   workload-runner.
 - The workload runner takes a workload, its execution parameters and generates a set of
   test vectors over which the workload is run iteratively.
 - A workload is executed by initializing a QueryExecutor for each query being run in a
   test vector. The workload executor is then responsible for execution and gathering
   results.
 - The execution details of every query being executed are stored and returned to the
   driver (run-workload).
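
The workflow above can be sketched roughly as follows. This is a minimal illustration, not the framework's code: apart from the Workload name, every function, field, and parameter here (generate_test_vectors, run_workload, execute) is hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Workload:
    """A named set of query names mapped to query strings."""
    name: str
    queries: dict


def generate_test_vectors(file_formats, iterations):
    # Hypothetical helper: one test vector per (file format, iteration) pair.
    return [{"file_format": fmt, "iteration": i}
            for fmt, i in product(file_formats, range(iterations))]


def run_workload(workload, test_vectors, execute):
    # 'execute' stands in for whatever actually runs a query; it is an assumption.
    results = []
    for vector in test_vectors:
        for query_name, query_string in workload.queries.items():
            results.append({
                "query": query_name,
                "vector": vector,
                "result": execute(query_string, vector),
            })
    # The execution details of every query are returned to the driver.
    return results
```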

Change-Id: Ia16360140d65e6733e534e823bc5d5614622ab5f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3616
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
2014-07-25 18:17:11 -07:00
Dan Hecht
1fee56cb26 IMPALA-1080: Implement "SET <query_option>" as SQL statement.
Also add support for "SET", which returns a table of query options and
their respective values.

The front-end parses the option into a (key, value) pair and then the
existing backend logic is used to set the option, or return the result
sets.
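
As a rough illustration of the (key, value) split, the sketch below parses a SET statement with a regular expression. It is only an approximation under stated assumptions: the real front-end uses its own SQL parser, and the function name parse_set and the accepted syntax are hypothetical.

```python
import re

# Matches "SET" alone, or "SET <option>=<value>", with an optional trailing ';'.
_SET_RE = re.compile(r"^\s*SET(?:\s+(\w+)\s*=\s*(.+?))?\s*;?\s*$", re.IGNORECASE)


def parse_set(stmt):
    """Return (key, value) for 'SET k=v', or None for a bare 'SET'."""
    match = _SET_RE.match(stmt)
    if match is None:
        raise ValueError("not a SET statement: %r" % stmt)
    key, value = match.group(1), match.group(2)
    if key is None:
        # Bare SET: the caller would return the table of query options.
        return None
    return key, value
```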

Change-Id: I40dbd98537e2a73bdd5b27d8b2575a2fe6f8295b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3582
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
Tested-by: jenkins
(cherry picked from commit aa0f6a2fc1d3fe21f22cc7bc56887e1fdb02250b)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3614
2014-07-25 10:25:09 -07:00
Nong Li
cfa58a4567 Run test_rows_availability serially.
Change-Id: Id87a209a614f889209456f8c0d9aedd8ad0e513f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3565
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3584
2014-07-22 14:35:46 -07:00
Nong Li
7dc57aaa9e Change buffered block mgr to support multiple clients.
This patch does a few things:
1. Moves the buffer block mgr from the sorter to the runtime state. This is now
   one that is shared across the query fragment. The partitioned hash join and agg
   will use this as well.
2. Adds a Client interface to the block mgr. Each exec node is a different client
   and can reserve a minimum number of buffers. This avoids starvation.
3. Updates the BufferedBlockMgr interface for getting pinned blocks to collapse
   two existing APIs.
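
The idea of per-client minimum reservations can be sketched as below. This is a simplified Python model, not the C++ BufferedBlockMgr interface; the class and method names are hypothetical, and it assumes all clients register before any buffer is pinned.

```python
class BlockMgr:
    """Toy buffer pool in which each client reserves a minimum number of buffers."""

    def __init__(self, total_buffers):
        self.total = total_buffers
        self.reserved = {}  # client name -> minimum number of buffers
        self.pinned = {}    # client name -> buffers currently pinned

    def register_client(self, name, min_buffers):
        if sum(self.reserved.values()) + min_buffers > self.total:
            raise ValueError("reservations exceed the pool size")
        self.reserved[name] = min_buffers
        self.pinned[name] = 0

    def try_pin(self, name):
        free = self.total - sum(self.pinned.values())
        # Buffers that must stay available for other clients' unmet reservations.
        owed = sum(max(0, self.reserved[c] - self.pinned[c])
                   for c in self.reserved if c != name)
        within_reservation = self.pinned[name] < self.reserved[name]
        if free > 0 and (within_reservation or free > owed):
            self.pinned[name] += 1
            return True
        return False

    def unpin(self, name):
        if self.pinned[name] > 0:
            self.pinned[name] -= 1
```

A client may take more than its minimum only while doing so still leaves enough free buffers to honor every other client's reservation, which is what prevents starvation in this model.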

Change-Id: Ibb31fbe480f3726048457f26e24a9e33f7201d86
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3504
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3574
2014-07-22 12:45:37 -07:00
Nong Li
a25400c94e Increase timeout in test_rows_availability to make sure query state is what we expect.
Change-Id: Id4feebcc7b7cecb07555009219e6420e48a0c82b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3534
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3579
2014-07-22 12:12:13 -07:00
Nong Li
202d656ddc Stop setting query state to EXCEPTION for non-exception cases.
We were setting the state to exception on Cancel() all the time.
We use the cancellation path as the normal cleanup path so this
gets called even when the query went fine (e.g. UnregisterQuery
calls Cancel()). We had already plumbed through a 'cause' argument
to differentiate.

Change-Id: Icf1091c165dec36d3dad7ce308367bbbc9edee4f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3524
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3575
2014-07-22 04:08:28 -07:00
ishaan
c6f49bb8e3 Fix the query generator to work with python 2.6.x
Change-Id: Ib7ca870f946d365cb7e026cf753c8f25795dcb06
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3138
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-07-21 20:05:50 -07:00
Abdullah Yousufi
6c1e272ef7 IMPALA-1059: Make backticking -d option argument idempotent
There was an issue with the previous fix to IMPALA-1059
if the user tried to reconnect within the shell after
having passed in a database via the -d option. The
passed database would be doubly backticked. This makes
the backticking of the argument idempotent.
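
A minimal sketch of idempotent backticking, assuming a helper named backtick (the name and exact checks are hypothetical, not taken from the shell's source):

```python
def backtick(name):
    """Wrap name in backticks unless it is already wrapped."""
    if len(name) >= 2 and name.startswith("`") and name.endswith("`"):
        return name
    return "`%s`" % name
```

Applying the function a second time, as happens on reconnect, leaves the value unchanged.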

Change-Id: I6eaed997c2be73d8659a2a12046ce393b97ec82c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3467
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3502
2014-07-15 18:10:40 -07:00
Nong Li
188a0ea833 Rework structure of hash table.
This patch does two things in preparation for external joins. The
hash table used to contain a directory structure (buckets and nodes)
both of which were contiguous. The nodes contained the tuple ptrs
within them.

This patch changes it so the nodes are not stored contiguously but
allocated in pages (this structure is dense and does not require
random lookups by index). The bucket structure is still contiguous
since we rely on the doubling property and random lookup by index.

The second change is that the nodes no longer store the tuple ptrs
within them. This makes it easier to build the hash table on top of
existing data.

Here's a quick benchmark doing a self join on tpch lineitem. Both
build and probe times decreased a bit.

Before:
HASH_JOIN_NODE (id=2):(Total: 1s139ms, non-child: 985.939ms, % non-child: 86.50%)
         - BuildBuckets: 2.10M (2097152)
         - BuildRows: 6.00M (6001215)
         - BuildTime: 527.991ms
         - LeftChildRows: 6.00M (6001215)
         - LeftChildTime: 451.964ms
         - LoadFactor: 0.50
         - RowsReturned: 30.01M (30012985)
         - RowsReturnedRate: 26.33 M/sec
After:
HASH_JOIN_NODE (id=2):(Total: 1s019ms, non-child: 835.350ms, % non-child: 81.97%)
         - BuildBuckets: 2.10M (2097152)
         - BuildRows: 6.00M (6001215)
         - BuildTime: 423.175ms
         - LeftChildRows: 6.00M (6001215)
         - LeftChildTime: 406.67ms
         - LoadFactor: 0.50
         - RowsReturned: 30.01M (30012985)
         - RowsReturnedRate: 29.45 M/sec

Change-Id: I79e209a24c24fb4f2f99574bcf187746fddadc06
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3245
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-07-15 16:57:09 -07:00
Abdullah Yousufi
864ed53511 IMPALA-1059: Backtick argument passed to USE by shell -d option
If not backticked, arguments such as parquet are interpreted as
keywords, when it is possible a database by that name exists.

This could have been avoided via single quotes around backticks: -d '`parquet`'
Otherwise, -d `parquet` throws a commandline error.

In interactive mode, backticks alone (ex. use `parquet`) will pass the
name as an identifier rather than a keyword.

Change-Id: I24b43eeeb6b4bfda5388165856788a20b64bc2ba
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3307
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3500
2014-07-15 15:43:49 -07:00
Taras Bobrovytsky
568e851774 Added option to specify the scale factor for pytest
This allows execution of tests on a cluster with multiple scale factors.

For example:
py.test <test file> --impalad <cluster ip>:21000 --scale_factor 300gb

Change-Id: I5230a6ef354def44b984eab2ac8a01989b9a471c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3051
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3215
2014-07-15 14:44:37 -07:00
Taras Bobrovytsky
8d6f8ff01c run-workload should exit with a non-zero error code if a query fails and abort_on_error is true
The exception raised by a child thread did not reach the main thread, so the
script exited with 0 instead of 1.
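
One common way to surface a child thread's exception in the main thread is to record it on the thread object and check it after join(). The sketch below assumes that approach; the names QueryThread and run_all are hypothetical and this is not necessarily how run-workload implements it.

```python
import threading


class QueryThread(threading.Thread):
    """Runs fn and remembers any exception instead of losing it."""

    def __init__(self, fn):
        super().__init__()
        self.fn = fn
        self.error = None

    def run(self):
        try:
            self.fn()
        except Exception as exc:  # keep the failure for the main thread
            self.error = exc


def run_all(query_fns, abort_on_error=True):
    """Return the exit code the script should use."""
    threads = [QueryThread(fn) for fn in query_fns]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    failed = [t.error for t in threads if t.error is not None]
    return 1 if (failed and abort_on_error) else 0
```

The driver would then end with something like sys.exit(run_all(query_fns)).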

Change-Id: I09be9dc824386bf25a64af0323cbf78f6d006b91
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3081
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3214
2014-07-15 14:43:10 -07:00
Abdullah Yousufi
f4d1afe0ce IMPALA-921: Change EXPLAIN_LEVEL value from 0 to 1 in impala-shell for SET command
Change-Id: I2bfcefb5c8143d4cb4d74157c5309cd9445bac02
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3383
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3499
2014-07-15 12:32:43 -07:00
Henry Robinson
9d0173c647 [CDH5] Disable ACL tests
The tests pass every time locally (in a 60 minute run), but fail
intermittently on our build machines.

Change-Id: I62d5ea0df8c42728a538b29bd16006be3179bfd3
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3489
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-07-14 15:38:11 -07:00
Henry Robinson
ff32821c6b [CDH5] Test to confirm that ACLs are inherited correctly on INSERT
Change-Id: I781a6b7203c2e12b484162954abae51a6443bead
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3076
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-07-09 19:04:55 -07:00
ishaan
f262fcea64 Support utf-8 input and output in the shell
Also add --strict_unicode option which controls whether invalid unicode
code points should be ignored on input.

Change-Id: Ice59d6dd3df4557ab3b1fc91d7ddc0e1bf03f1c7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3218
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-07-02 23:18:27 -07:00
Matthew Jacobs
65c1a6f21e Remove SOURCE keyword by parsing as an identifier and checking the value
Reverts "IMPALA-1033: Remove SOURCE keyword; very common identifier"

Change-Id: I3fcf6d02786e00287b564cff0a823d0c19504e7a
2014-06-30 16:47:47 -07:00
Alex Behm
7777fbff53 Clean up expr substitution and cloning.
Before: The pre- and postconditions of expr substitution and cloning,
in particular, their effect on the isAnalyzed_ flag were unclear and
sometimes inconsistent e.g., some literal exprs set isAnalyzed_ to
true in their c'tor. As a result, several places required ad-hoc
solutions like Expr.unsetIsAnalyzed() and Expr.reanalyze().

This patch cleans up expr substitution and cloning, summarized as follows:

Expr analysis:
All exprs start out with isAnalyzed_ = false. The flag is set to true
iff analyze() has been called on the expr.

Expr.clone():
Creates a deep copy of an expr including all its analysis state.

Expr.equals():
Comparison of expr trees ignores implicit casts. This simplifies expr
substitution because un/analyzed exprs can be easily compared/substituted.

ExprSubstitutionMap:
When adding a mapping, the rhs expr must be analyzed to allow
substitution across query blocks. There is no requirement on the lhs expr.

Expr substitution:
Substitution returns an analyzed clone of the original expr with exprs
substituted. While performing the substitution, implicit casts and analysis
state are removed such that the returned result has minimal implicit casts
and types.
There are two versions of the substitute functions: one that throws exceptions and
one that does not, because the caller may have different expectations on
whether a substitution must succeed or not.

Numeric literals:
This patch combines IntLiteral and DecimalLiteral into a NumericLiteral.
Its main benefit is that analyze() always produces the same type, even if
the literal was implicitly cast and/or isAnalyzed_ was unset because of
expr substitution. This was not the case before because an implicit cast
could permanently turn an IntLiteral into a DecimalLiteral.

There is no more need for unsetIsAnalyzed() or reanalyze().

Change-Id: I646110e3714cff8ae8d5a378c25a107dd43334b6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3228
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3318
2014-06-30 10:18:26 -07:00
Lenni Kuff
ad933ec765 Switch terminology of 'impersonated user' to 'delegated user'
This is to help ensure naming is consistent across the platform and
also avoid confusion with HS2 "impersonation" which is something very
different.

Change-Id: I48c1b76dff75b92b11ddc7aab0eb9a3a5d20e489
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3315
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 931f6a66c0d8dff25b746d127dc1f36e96b12f98)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3326
2014-06-28 20:46:06 -07:00
Dimitris Tsirogiannis
5a6f53db16 Add partition pruning tests
The following changes are included in this commit:
1. Modified the alltypesagg table to include an additional partition key
that has nulls.
2. Added a number of tests in hdfs.test that exercise the partition
pruning logic (see IMPALA-887).
3. Modified all the tests that are affected by the change in alltypesagg.

Change-Id: I1a769375aaa71273341522eb94490ba5e4c6f00d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2874
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3236
2014-06-24 02:14:27 -07:00
Nong Li
a7beb12540 [CDH5] Fix column stats for decimal.
Change-Id: I72b31f6431bf6259e759fd290200fd1a755f82c6
2014-06-20 23:03:06 -07:00
Alex Behm
881f3a8c33 Re-order union operands descending by their estimated per-host memory.
Re-order union operands descending by their estimated per-host memory,
s.t. parent nodes can gauge the peak memory consumption of a MergeNode after
opening it during execution (a MergeNode opens its first operand in Open()).
Scan nodes are always ordered last because they can dynamically scale down their
memory usage, whereas many other nodes cannot (e.g., joins, aggregations).
One goal is to decrease the likelihood of a SortNode parent claiming too much
memory in its Open(), possibly causing the mem limit to be hit when subsequent
union operands are executed.
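
The ordering rule can be illustrated with a single sort key. This is a sketch, not the planner's Java code: the Operand tuple and its field names are hypothetical, and ordering scan nodes among themselves by descending memory is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for a union operand's plan root.
Operand = namedtuple("Operand", ["name", "is_scan", "est_mem_bytes"])


def order_union_operands(operands):
    # Non-scans first (False sorts before True), each group by descending memory.
    return sorted(operands, key=lambda op: (op.is_scan, -op.est_mem_bytes))
```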

Change-Id: Ia51caaffd55305ea3dbd2146cd55acc7da67f382
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3146
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3213
Tested-by: jenkins
2014-06-20 18:46:10 -07:00
Taras Bobrovytsky
7faaa65996 Added order by query tests
- Added static order by tests to test_queries.py and QueryTest/sort.test
- test_order_by.py also contains tests with static queries that are run with
  multiple memory limits.
- Added stress, scratch disk and failpoints tests
- Incorporated Srinath's change that copied all order by with limit tests into
  the top-n.test file

Extra time required:

Serial:
scratch disk: 42 seconds
test queries sort : 77 seconds
test sort: 56 seconds
sort stress: 142 seconds
TOTAL: 5 min 17 seconds

Parallel(8 threads):
scratch disk: 40 seconds
test queries sort: 42 seconds
test sort: 49 seconds
sort stress: 93 seconds
TOTAL: 3 min 44 sec

Change-Id: Ic5716bcfabb5bb3053c6b9cebc9bfbbb9dc64a7c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2820
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3205
2014-06-20 13:35:10 -07:00
Dimitris Tsirogiannis
7dbd3a5860 IMPALA-1040: Reading a decimal partitioned column with invalid values
This commit fixes IMPALA-1040: when an invalid value was inserted into a
decimal partitioned column through Hive, Impala returned an uninformative
error message and, in some cases, the associated table disappeared from
Impala's catalog. With the fix, Impala always throws a more informative
error message that indicates the insertion of an invalid partition key
value.

Change-Id: I2855ea69944e269fb7e02b3825f44e64352151e7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3062
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3200
2014-06-20 12:46:52 -07:00
Ippokratis Pandis
6026f1ebe1 IMPALA-1055: Compute stats query statements don't quote DB and table names
The compute stats statement was not quoting the DB and table names. If those names
collided with keywords, then the compute stats would not execute due to a syntax
error.

Change-Id: Ie08421246bb54a63a44eaf19d0d835da780b7033
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3170
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3198
2014-06-20 09:32:52 -07:00
Nong Li
52f2b2cb52 Fix overflow in decimal divide. Added warning if overflow happened.
Change-Id: I2e9167dbec83b3d1c2cf0e52fae4e09d6b5a38ce
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3141
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3191
2014-06-20 02:24:41 -07:00
ishaan
d6042f7780 Disable metric verification for mem-pool.total-bytes.
This is to unblock the builds until IMPALA-1057 is resolved.

Change-Id: I3d2c861737526c33cf48b444c81c429b9abbe829
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3185
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-06-19 18:18:01 -07:00
Alex Behm
aacd8bcf72 Change UnionNode to open its first child in UnionNode::Open().
This patch ensures that rows are available for clients to fetch
after we advance the query to FINISHED if the coordinator
fragment is rooted at a UnionNode.

Change-Id: I9b4ad3f70b46c7e7720bdd5ca9ad85479c2cb7fd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3139
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3168
2014-06-19 16:44:43 -07:00
ishaan
dc3dc3dc1e Enable tpch queries to run on text to unblock the full data load build.
Some planner tests depend on data being populated in the tpch tmp tables (in text format).
This change re-enables the tpch query tests to run on text so that they pass.

Change-Id: I4ed09f55e05cb01978cb6f0808c6395552c0f129
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3176
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-06-19 16:19:13 -07:00
Alex Behm
ef6705d7e0 Rename MergeNode to UnionNode.
Change-Id: I9e3675a103757db1345b04bd1d102d2719efddd0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3128
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3154
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-06-19 12:44:21 -07:00
Skye Wanderman-Milne
c3c9365c17 Change shell to print WARNINGS instead of ERRORS
Change-Id: I8b41a2f4307e31eda970ca891adb4f12fea926bb
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3088
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
(cherry picked from commit 0a655f759d5096def89d2c72be5aa9a0cb2c10b1)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3149
2014-06-19 10:42:58 -07:00
Lenni Kuff
0ac0527643 Reduce test execution time by limiting long running tests to exhaustive exec strategy
I looked at the latest run from master and took the test suites that had long
execution times. This cleans those test suites up to either completely disable them
on 'core' or add constraints to limit the number of test vectors. It shouldn't impact
nightly coverage since we still run the same tests exhaustively.

Change-Id: I10c78c35155b00de0c36d9fc0923b2b1fc6b44de
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3119
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3125
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-06-18 16:18:17 -07:00
anusha
6b3689e8c7 IMPALA-973: Fix for invalidate metadata behaviour
Change-Id: Ie0c4c458b0919978b03ebaba28bf37950dd34643
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3009
Tested-by: jenkins
Reviewed-by: Anusha Dasarakothapalli <anusha.dasarakothapalli@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3091
2014-06-17 12:18:50 -07:00
Dimitris Tsirogiannis
67eb5eb3a8 IMPALA-1028: Cardinality estimate is wrong for partitioned tables if we
filter out all partitions

This commit fixes IMPALA-1028 in which the cardinality estimate is not
correct when all the partitions of a partitioned table are filtered out.
To fix this issue we make sure that the estimated result cardinality of
the scan node is zero when all the partitions are filtered out.

Change-Id: I225949eb2e8f905a5d0f678d7f199fb95ba4aab0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3063
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3083
2014-06-16 20:36:13 -07:00
Matthew Jacobs
b3c98cf3c8 Fix occasional admission control test failures
The admission control tests could occasionally fail when cancelled
queries return OK (IMPALA-1047). Until fixed, we can just treat
such queries as if they were cancelled.

Change-Id: Id9fc8e9f585e466059d4ffefb4d9ed407206ad1d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3019
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 2901a8a960076f2aec74cb5a1f5000953359a68f)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3025
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
2014-06-16 15:50:33 -07:00
Matthew Jacobs
dbe1b534ed IMPALA-1050: NPE error when pool placement policy cannot map user to pool
Change-Id: I53ed823ee55bee96269f4119af7da2dab25d4a7c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3028
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 569bd5d4a8e30a907a33551c58a3ab80849b8dc9)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3061
2014-06-15 13:38:20 -07:00
Srinath Shankar
0df773eed6 Check RuntimeState for cancellation in sorter.
Currently, cancellation checking when a SortNode is executing only
happens when a batch is being added to the sorter (SortNode::SortInput()) or
when a batch is being retrieved from the sorter (SortNode::GetNext()).

This fix passes in a RuntimeState into the Sorter instance itself, which
checks for cancellation at the following points:
i) During an in-memory sort (In Partition() and SortHelper()). In Partition(),
 the cancellation check may be delayed if the input is completely sorted.
ii) During an intermediate merge before each batch of rows from a merge is
 copied into a run.

Change-Id: I5c28c7244ee2e40627cf14542b99f872e3a8c343
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3007
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3059
2014-06-14 17:48:40 -07:00
Skye Wanderman-Milne
bbb908db1e Add HS2 GetLog() test
Change-Id: I24cc4a1873942cb4d67dcf75ed57ce7becec6f11
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3016
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 33f332f44c31fea747fadc56c7816c1da3b25b6c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3040
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-06-13 18:39:07 -07:00
Henry Robinson
d162571211 Fix 'summary' when exch map is not set
Change-Id: I66d9987f45f6cee045a300f86de357a2761929d7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3000
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 6f82cb296d0b3f0546d4e8a26485b79f20ff8996)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3020
Tested-by: Henry Robinson <henry@cloudera.com>
2014-06-12 22:18:04 -07:00
anusha
ffc334a735 IMPALA-834: Fix for Create Table like Views
Change-Id: Ied1f706c48a1106e1d6fc2aa73e57746f52ea333
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2939
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3014
Reviewed-by: Anusha Dasarakothapalli <anusha.dasarakothapalli@cloudera.com>
2014-06-12 22:13:30 -07:00
Henry Robinson
9a7c6d286f Add 'summary' to shell
Users can now type 'summary' in the Impala shell after a query executes
to get a breakdown of the work done by each part of the query plan.

Change-Id: Ia6a43429ffc7778f3c2c8fcbf45d83828263c2ab
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2963
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
(cherry picked from commit 9b98d42acb14d43a64832767528ee572eac4979b)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2995
2014-06-12 02:59:58 -07:00
Skye Wanderman-Milne
1cc628d32d IMPALA-950: Skip computing stats for decimal columns.
This patch also adds a mechanism to return analysis warnings to
client, which is used to log skipped decimal columns.

Change-Id: I30c246044a68ec8861cd5bed072bd54e65a079e6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2822
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit fc77422acef7e6f93fdeb5448309414b905f0725)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2984
2014-06-11 19:16:34 -07:00
ayousufi
66e90d75ee IMPALA-286: Display set query options in default section in impala-shell
Options displayed with 'set' command. Default values distinguished
from set values by square brackets.
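
A small sketch of the display rule, assuming options are held in two dictionaries. The function name and the exact line layout are hypothetical; only the square brackets around default values come from the description above.

```python
def format_query_options(defaults, overrides):
    """Return one display line per option; defaults are shown in [brackets]."""
    lines = []
    for name in sorted(defaults):
        if name in overrides:
            lines.append("%s: %s" % (name, overrides[name]))
        else:
            lines.append("%s: [%s]" % (name, defaults[name]))
    return lines
```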

Change-Id: Iacf0574555aab78aa0ba2008ceb8776d372a57a5
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2913
Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com>
Tested-by: jenkins
2014-06-11 11:51:19 -07:00
Skye Wanderman-Milne
6ac9a8104b IMPALA-1009: UDF/UDA leaks should not fail queries
With this change, leaky UDFs built with the SDK will still fail when
using the test harness, but leaky UDFs running in Impala will only
trigger a warning. This change also updates the test infrastructure to
always check for non-fatal errors/warnings.

Change-Id: I5615349b9d691e4eddea3e03e152ef12e73835e7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2844
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 60ce5190d96add6104aba642d2354d87a26000fa)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2938
2014-06-10 21:46:47 -07:00
Nong Li
5e49150a22 Speed up views compat test.
- Use a smaller table so hive runs faster
- Don't invalidate the catalog, just the view created in hive
- This lets us run it in parallel

Change-Id: I8085d8967dc96cbbb20e2d719072b29fe591cd98
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2958
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-10 20:53:23 -07:00
Nong Li
ad534429df [CDH5] Disable flaky hdfs caching test.
Change-Id: I19900ae029876d8f74169eda0f08f5be3509fbaf
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2946
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-10 18:24:42 -07:00
Ippokratis Pandis
fe0646f76b IMPALA-1022: Handle cases where in Parquet the expected number of rows in metadata is wrong
Some Parquet files carry metadata that states the wrong number of rows. Until now the
parquet-scanner did not report this; it kept reading as long as there were values for
the columns being read. With IMPALA-1016 we now read at most as many rows as the
metadata specifies.
With this patch, the parquet-scanner checks, right before it finishes scanning, whether
it read the expected number of rows (taken from the metadata). If the actual number of
rows read differs from the expected number, it either aborts
or logs an error.
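
The end-of-scan check might look roughly like the sketch below. The message does not say what decides between aborting and logging, so the abort_on_error flag here is an assumption, as are the function name and the error text.

```python
def check_row_count(rows_read, rows_in_metadata, abort_on_error):
    """Return None if the counts match, else abort or return an error to log."""
    if rows_read == rows_in_metadata:
        return None
    message = ("metadata states %d rows but %d rows were read"
               % (rows_in_metadata, rows_read))
    if abort_on_error:
        # Assumed behavior: fail the scan when the caller asked for strictness.
        raise RuntimeError(message)
    return message
```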

Change-Id: Ie6a66a38e8912730bf04762e6526ec1cadb2bcdc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2755
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2944
2014-06-10 17:27:54 -07:00
Lenni Kuff
892eccc8d0 CDH-19184: Impala should show impersonated user (if there is one) rather than connected user
Currently, we always display the 'User' as the connected user in the debug webpage and
runtime profiles. This is confusing when impersonation + authorization is enabled because
there is not an easy way to find the impersonated user other than looking at the audit
log records. This change does the following:
* Updates the "User" field in the runtime profile to show the "effective user".
  The effective user is the connected user if there is no impersonated user,
  otherwise it is the impersonated user. This should help CM display the correct user
  as well.
* Adds two new fields in the runtime profile, "Connected User" and "Impersonated User",
  to make it easier to tell which user is which.
* Updates the /queries debug webpage to show the effective user rather than the
  connected user.
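
The "effective user" rule stated above reduces to a one-line choice. The sketch below is illustrative only; the function name and the use of None or an empty string for "no impersonated user" are assumptions.

```python
def effective_user(connected_user, impersonated_user=None):
    """Impersonated user if there is one, otherwise the connected user."""
    return impersonated_user if impersonated_user else connected_user
```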

Change-Id: I639de6738242d2c378e785271a72257301a53ade
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2863
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit d4ad768780dfdfe0874f2b3e9c59074f1c3685d7)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2935
2014-06-10 11:08:25 -07:00
Lenni Kuff
b3ebfddadd Allow tests to access query result column values by col alias or col position
For example, you can now do something like:
result_set = execute("select * from tbl")
result_row = result_set[0]
result_row['col_alias'] or result_row[4]

to access column values. If the column alias/position does not exist an exception is
thrown.
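
A row object supporting both kinds of access could be sketched as follows. The class name ResultRow and its constructor are hypothetical; the test framework's actual types are not shown here.

```python
class ResultRow:
    """Row whose values can be read by column alias or by position."""

    def __init__(self, aliases, values):
        self._values = list(values)
        self._index = {alias: i for i, alias in enumerate(aliases)}

    def __getitem__(self, key):
        if isinstance(key, int):
            if not 0 <= key < len(self._values):
                raise IndexError("no column at position %d" % key)
            return self._values[key]
        if key not in self._index:
            raise KeyError("no column with alias %r" % key)
        return self._values[self._index[key]]
```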

Change-Id: Ie4b65619ed17fd90bf39e0966a7fc7e1180dbc5c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2719
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2922
2014-06-09 23:24:26 -07:00