Commit Graph

2565 Commits

Author SHA1 Message Date
Ippokratis Pandis
e1ae5fe95a IMPALA-1068: COMPUTE STATS should place -1 in #NULLs
With IMPALA-1033 we disabled counting the number of NULLs in each column,
which gave a 2x speed-up in the computation. However, the value 0 was
erroneously stored as the number of NULLs instead of the correct value -1,
which indicates 'unknown'.

Change-Id: Ib882eb2a87e7e2469f606081cb2881461b441a45
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3377
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3378
2014-07-07 15:13:25 -07:00
Skye Wanderman-Milne
b572fe0af5 Remove unnecessary decimal casts for some builtins.
For arithmetic ops, this is an optimization. The Add(decimal,decimal) already handles
the cast as part of the operation.

For binary predicates, the cast is harmful and can lead to overflows. The decimal Compare()
function has custom logic to avoid overflow.

Change-Id: I9f5ad74ea89e9dfa5a3a40c1e07f7e9178bf1d52
(cherry picked from commit 6bffaa885542443ca559888d921853ecd194cbcb)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3414
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-07-03 21:32:51 -07:00
Skye Wanderman-Milne
dbae673715 Open and close exprs on partition key exprs in HdfsPartitionDescriptor
Change-Id: I954cd54113b4fb0d65423850a3a4145791b36107
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3136
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit bf7af4dc7d5013b5d0f0f0797aba3c37f17c1fb6)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3395
2014-07-03 12:04:25 -07:00
ishaan
f262fcea64 Support UTF-8 input and output in the shell
Also add a --strict_unicode option, which controls whether invalid Unicode
code points should be ignored on input.

Change-Id: Ice59d6dd3df4557ab3b1fc91d7ddc0e1bf03f1c7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3218
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-07-02 23:18:27 -07:00
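A minimal Python sketch of what a --strict_unicode switch could mean when decoding raw shell input as UTF-8. It assumes that strict mode rejects invalid byte sequences with an error while non-strict mode silently drops them; the function name decode_shell_input is hypothetical and not taken from the shell's source.

```python
def decode_shell_input(raw: bytes, strict_unicode: bool) -> str:
    """Decode raw input bytes as UTF-8 (hypothetical helper).

    strict_unicode=True:  invalid sequences raise UnicodeDecodeError.
    strict_unicode=False: invalid sequences are silently dropped.
    """
    errors = "strict" if strict_unicode else "ignore"
    return raw.decode("utf-8", errors=errors)
```

For example, b"ab\xffc" decodes to "abc" in non-strict mode, while strict mode raises UnicodeDecodeError on the stray 0xff byte.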
Lenni Kuff
1d3267ef8b Add NOTICE.txt file to Impala repo
Change-Id: Ic1a1304d7425e4bc56daebf4418045889410d6a8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3227
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
(cherry picked from commit 8f6c6659883f5baaa2a576ae3163b20d7f11a7a1)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3387
2014-07-02 15:23:24 -07:00
Nong Li
274f97efc5 IMPALA-1066: Fix bad free in Min()/Max() of strings.
Change-Id: If66844a88accdc369458ab92f033eef50775d69e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3373
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-07-01 20:45:08 -07:00
Nong Li
f05e2a92af IMPALA-1066: Build with -fno-strict-aliasing.
Change-Id: I2d9684b0d1f352cba27dff92273d93d60d8435c2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3336
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3375
Reviewed-by: Nong Li <nong@cloudera.com>
2014-07-01 20:44:36 -07:00
Dimitris Tsirogiannis
cf782fe500 IMPALA-1065: Running explain on attached (TPC-DS) query throws IllegalStateException

This commit resolves IMPALA-1065, in which running EXPLAIN on TPC-DS
Q48 resulted in an IllegalStateException due to an overflow in the
cardinality estimate of a cross join operator. The fix checks whether
an overflow has occurred and, if so, resets the cardinality estimate to
a valid value.

Change-Id: I0e88fde07e7a5d86819af317e98bab7ac08d5a8a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3346
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3366
2014-07-01 19:23:29 -07:00
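A minimal sketch of overflow-safe cardinality multiplication for a cross join, written in Python but emulating Java's 64-bit long arithmetic. The commit only says the estimate is reset "to a valid value"; saturating at Long.MAX_VALUE and using -1 for an unknown input are assumptions made for this sketch, and the function names are hypothetical.

```python
LONG_MAX = 2**63 - 1  # same value as Java's Long.MAX_VALUE


def to_int64(x: int) -> int:
    """Wrap a Python int to a signed 64-bit value, as Java long arithmetic would."""
    return ((x + 2**63) % 2**64) - 2**63


def cross_join_cardinality(lhs: int, rhs: int) -> int:
    """Estimate cross-join output rows, guarding against 64-bit overflow."""
    if lhs < 0 or rhs < 0:
        return -1  # assumption: -1 marks an unknown estimate
    product = to_int64(lhs * rhs)  # what a Java long would actually hold
    # Classic overflow check: dividing back must recover the other operand.
    if lhs != 0 and product // lhs != rhs:
        return LONG_MAX  # assumption: saturate instead of keeping a wrapped value
    return product
```

Without the check, 2**40 rows joined with 2**40 rows would wrap to 0 (or, for other inputs, to a negative number), which is exactly the kind of invalid estimate that trips a precondition check later in planning.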
Henry Robinson
dd4c1c32dc Add optional RM reservation limit to memtrackers
If RM and per-query memory limits were enabled at the same time, the
per-query limit would be ignored if RM wanted to expand the memory
allocation. This change adds an optional reservation limit to a
memtracker. The original limit goes back to being a hard limit,
i.e., any attempt to consume more than that amount results in
failure. The RM reservation limit is the RM-allocated memory limit. If
that is exceeded, it triggers the ExpandRmReservation() method, which tries
to retrieve more memory as long as the hard limit is observed.

The net effect is that per-query memory limits have the intended
hard-limit effect, while the RM limits coexist with them and can expand
as more memory is required.

At the same time, we change the precedence of various ways of suggesting
an initial reservation size so that the user can change the reservation
size via a query option (MEM_RESERVATION_SIZE).

Change-Id: I41bfa4eb1336810a8a5946f6be3472111a052144
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3134
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-07-01 18:08:47 -07:00
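A sketch of the two-limit scheme described in this commit, in Python. The class and method names are hypothetical; expand_rm_reservation is a callable standing in for ExpandRmReservation() and is assumed to return how many extra bytes RM granted. The hard limit is checked first and is never exceeded; the RM reservation is only grown on demand.

```python
from typing import Callable, Optional


class MemTracker:
    """Toy tracker with a hard limit and an optional, expandable RM reservation."""

    def __init__(self, hard_limit: int, rm_reservation: Optional[int] = None,
                 expand_rm_reservation: Optional[Callable[[int], int]] = None):
        self.hard_limit = hard_limit
        self.rm_reservation = rm_reservation
        # Hypothetical stand-in for ExpandRmReservation(): takes the number of
        # extra bytes wanted and returns how many bytes RM actually granted.
        self.expand_rm_reservation = expand_rm_reservation
        self.consumption = 0

    def try_consume(self, nbytes: int) -> bool:
        new_total = self.consumption + nbytes
        # The hard (per-query) limit is never exceeded, whatever RM would allow.
        if new_total > self.hard_limit:
            return False
        # Crossing the RM reservation triggers an expansion request.
        if self.rm_reservation is not None and new_total > self.rm_reservation:
            if self.expand_rm_reservation is None:
                return False
            granted = self.expand_rm_reservation(new_total - self.rm_reservation)
            self.rm_reservation += max(granted, 0)
            if new_total > self.rm_reservation:
                return False  # RM could not grant enough memory
        self.consumption = new_total
        return True
```

With a hard limit of 100 bytes and an initial reservation of 40, two 30-byte requests succeed (the second one grows the reservation to 60), while a further 50-byte request fails because it would exceed the hard limit, regardless of what RM might grant.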
Alex Behm
003da0ec59 IMPALA-1061: Fix resolution of implicit table aliases in views and star expressions.
This patch cleans up registration and resolution of implicit table aliases as follows:
During analysis, we register all legal aliases of table/view references and remember
implicit table aliases that are ambiguous. When resolving table or column references
we consider all legal table aliases.

A table/view may have either one explicit alias or two implicit aliases.
The implicit aliases are the fully-qualified and the unqualified table name.
Within a single query, explicit and implicit aliases can be mixed as long as there
are no clashes between explicit and fully-qualified implicit aliases, and there are
no ambiguous references to implicit unqualified aliases.

Change-Id: I5734539aa821d130882491ec628dae8128d22e2f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3258
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3359
2014-07-01 17:50:21 -07:00
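A simplified Python model of the alias rules in this commit. Assumptions: explicit aliases and fully-qualified implicit aliases share one namespace in which a duplicate is an immediate error; unqualified implicit aliases may repeat, and only a reference to a repeated one fails; aliases are matched case-insensitively. AliasRegistry and AliasError are hypothetical names, not Impala's Analyzer API.

```python
class AliasError(Exception):
    pass


class AliasRegistry:
    def __init__(self):
        # Explicit aliases and fully-qualified implicit aliases: must be unique.
        self._strict = {}
        # Unqualified implicit aliases: may repeat; a repeat is only an error
        # if the alias is actually referenced.
        self._unqualified = {}

    def register(self, ref_id, db, table, explicit_alias=None):
        if explicit_alias is not None:
            # An explicit alias replaces both implicit aliases.
            self._add_strict(explicit_alias.lower(), ref_id)
        else:
            self._add_strict(f"{db}.{table}".lower(), ref_id)
            self._unqualified.setdefault(table.lower(), []).append(ref_id)

    def _add_strict(self, alias, ref_id):
        if alias in self._strict:
            raise AliasError(f"duplicate table alias: {alias}")
        self._strict[alias] = ref_id

    def resolve(self, name):
        name = name.lower()
        matches = []
        if name in self._strict:
            matches.append(self._strict[name])
        matches.extend(self._unqualified.get(name, []))
        if not matches:
            raise AliasError(f"could not resolve table alias: {name}")
        if len(matches) > 1:
            raise AliasError(f"ambiguous table alias: {name}")
        return matches[0]
```

For "FROM db1.t, db2.t, db1.u a", references to db1.t, db2.t and a resolve cleanly, while a bare t is rejected as ambiguous and u is unknown because the explicit alias a replaces it.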
Skye Wanderman-Milne
f0fb28158b FE changes to avoid shipping null-type expressions to the BE.
Once the expr refactoring goes in, the BE will not be able to evaluate
any TYPE_NULL exprs. This patch ensures that the FE casts all null
literals and slot refs before they reach the BE.

There are a bunch of places where we know the appropriate type and
just weren't using it before. This patch also introduces a few notable
hacks:

* Serializing null SlotRefs and NullLiterals as boolean NullLiterals
  in case they weren't cast earlier.
* Converting null SlotRefs to NullLiterals in uncheckedCastTo() since
  we don't need to read from the slot at all.

This works, but we should consider adding a final pass that cleans up
the plan tree and takes care of this.

Change-Id: Ic2ee181139059553d7f2d0e17e9dacaee241df17
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3294
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit a8a67ebcad12956a8260b4ea4189afb7ffab4b68)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3361
2014-07-01 15:48:08 -07:00
Skye Wanderman-Milne
a5c85898e6 Fix StringFunctions::SubString()
Without this patch, the returned StringValue's ptr would point before the
input pointer if the 'pos' argument was < -input.len.

Change-Id: I7bd506f5d1119741a94817c34a017215b67cc26e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3351
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
(cherry picked from commit bad40d2beceffaacc409e34041a00d3ffbabf201)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3360
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-07-01 15:24:39 -07:00
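A sketch of a bounds-checked 1-based substring in Python, assuming Hive-style semantics in which a negative pos counts back from the end of the string and an out-of-range pos (including 0) yields an empty result; the commit does not state what the fixed function returns. Unlike a raw C++ pointer, a negative Python slice index would silently wrap around instead of reading out of bounds, so the explicit guard matters here too.

```python
def substring(s: bytes, pos: int, length: int) -> bytes:
    """1-based substring; a negative pos counts back from the end of s."""
    n = len(s)
    if pos < 0:
        pos = n + pos + 1  # e.g. pos=-1 selects the last byte
    # Without this guard, pos < -n would yield a start offset before the
    # beginning of s (in C++, a pointer before the input buffer; in Python,
    # a negative slice index that silently wraps around).
    if pos <= 0 or pos > n or length <= 0:
        return b""
    start = pos - 1
    return s[start:start + length]
```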
Victor Bittorf
3c388cd1dc CDH-19918: fixed Moscow timezone conversion.
Conversion from UTC to Moscow time was incorrect; this has been fixed.

Change-Id: Ib2a1720424bffff4f09713bfb06b5046fb38c031
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3311
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 9ae067013daf5e2e3a1dca3b31758e87f95432d1)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3357
2014-07-01 13:49:53 -07:00
Victor Bittorf
140b1c8b95 Fixed UDF memory leak warning for STDDEV
Change-Id: I8df3d28e9dc0f06819f6512c175b5dec4210a329
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3312
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 7f44fa68e2d06aa0166263a89a4eaecc21baaa25)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3358
2014-07-01 13:45:20 -07:00
Nong Li
9abca8321b Fix result precision in decimal round/truncate/etc and overflow.
Change-Id: I23840734fd5b7ab7404d94f6df05410b153354de
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3338
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-07-01 08:05:39 -07:00
Nong Li
3fe082d3c9 Add CASE decimal builtin.
Change-Id: I007e7f319acd6a5bce739a08797d1d87ffc64472
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3275
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-07-01 08:05:28 -07:00
Nong Li
d0fe59fe95 Remove unnecessary include from udf dev library.
Change-Id: I8bdc9474d817bf63a0908a0c8e4e7f754b4e0b33
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3331
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-07-01 08:05:09 -07:00
Matthew Jacobs
65c1a6f21e Remove SOURCE keyword by parsing as an identifier and checking the value
Reverts "IMPALA-1033: Remove SOURCE keyword; very common identifier"

Change-Id: I3fcf6d02786e00287b564cff0a823d0c19504e7a
2014-06-30 16:47:47 -07:00
Dimitris Tsirogiannis
630d90392e CDH-20089: Query planning failed in HdfsScanNode.evalBinaryPredicate
This commit fixes CDH-20089, in which an error was thrown when a query had
a binary predicate on a partition key that has no values.

Change-Id: I3b5cefb4d7193045fc6fc5e94766589c2299b5b1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3327
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3335
2014-06-30 15:05:31 -07:00
Alex Behm
7777fbff53 Clean up expr substitution and cloning.
Before: The pre- and postconditions of expr substitution and cloning,
in particular, their effect on the isAnalyzed_ flag were unclear and
sometimes inconsistent; e.g., some literal exprs set isAnalyzed_ to
true in their c'tor. As a result, several places required ad-hoc
solutions like Expr.unsetIsAnalyzed() and Expr.reanalyze().

This patch cleans up expr substitution and cloning, summarized as follows:

Expr analysis:
All exprs start out with isAnalyzed_ = false. The flag is set to true
iff analyze() has been called on the expr.

Expr.clone():
Creates a deep copy of an expr including all its analysis state.

Expr.equals():
Comparison of expr trees ignores implicit casts. This simplifies expr
substitution because un/analyzed exprs can be easily compared/substituted.

ExprSubstitutionMap:
When adding a mapping, the rhs expr must be analyzed to allow
substitution across query blocks. There is no requirement on the lhs expr.

Expr substitution:
Substitution returns an analyzed clone of the original expr with exprs
substituted. While performing the substitution, implicit casts and analysis
state are removed such that the returned result has minimal implicit casts
and types.
There are two versions of the substitute function: one that throws exceptions
and one that does not, because callers may have different expectations about
whether a substitution must succeed.

Numeric literals:
This patch combines IntLiteral and DecimalLiteral into a NumericLiteral.
Its main benefit is that analyze() always produces the same type, even if
the literal was implicitly cast and/or isAnalyzed was unset because of
expr substitution. This was not the case before because an implicit cast
could permanently turn an IntLiteral into a DecimalLiteral.

There is no more need for unsetIsAnalyzed() or reanalyze().

Change-Id: I646110e3714cff8ae8d5a378c25a107dd43334b6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3228
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3318
2014-06-30 10:18:26 -07:00
Alex Behm
96722da3fe Fix misplaced comment in testfile.
Change-Id: I55dc7d0e8e74a4f8c9a99e9601b2578ef6b0390d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3303
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3317
2014-06-30 10:17:26 -07:00
Lenni Kuff
ad933ec765 Switch terminology of 'impersonated user' to 'delegated user'
This is to help ensure naming is consistent across the platform and
also avoid confusion with HS2 "impersonation" which is something very
different.

Change-Id: I48c1b76dff75b92b11ddc7aab0eb9a3a5d20e489
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3315
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 931f6a66c0d8dff25b746d127dc1f36e96b12f98)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3326
2014-06-28 20:46:06 -07:00
Dimitris Tsirogiannis
2aedf5fab4 Add missing ALTER TABLE statement in alltypesaggmultifiles table.
The DDL statements for adding the partitions of alltypesaggmultifiles
did not include an ALTER TABLE stmt for one of the partitions, thereby
causing the planner tests to fail when test data were loaded from a
snapshot.

Change-Id: Id4b078cd334d816d6eb8eb15e5856189701a4bca
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3305
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3310
2014-06-27 18:00:09 -07:00
Nong Li
163750f170 Fix decimal multiply result precision off by 1.
Change-Id: I860e0d13ee9bae7d3e180103a22fe7606a320b13
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3249
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-27 11:22:05 -07:00
Nong Li
3e31f81731 Fix index out of bounds with rtrim().
Change-Id: I8c420a45aacdb0ce8f6a83fa8cdf5e91b8ef1f77
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3268
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-27 11:22:00 -07:00
Nong Li
67e80b16e3 Add int96 to multiint benchmark.
One idea was to build a poor man's int96 by just casting to __int128_t.
Unfortunately, it seems too slow: ~15x slower for add, ~10x for multiply and
3x for divide compared to __int128_t.

Change-Id: I06eb3fa3ac1edc2c174873a73a252a0165911b1c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2433
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-27 11:21:54 -07:00
Nong Li
553395928e Change logging level of thrift plan in plan fragment executor.
VLOG(3) includes each row, which is useful much less often than the serialized
plan.

Change-Id: I933188f046dafb51da9d06583697792113a9165a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3289
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
2014-06-27 11:21:47 -07:00
Skye Wanderman-Milne
5305b17121 IMPALA-1053: only log unsupported type warning once
Change-Id: Ibb34e4632f87ac192bb58d4d6616b41e7dac53d2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3140
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 51a1db1ceaa0a928f364f333c4351abefd90b2f8)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3297
2014-06-26 23:58:40 -07:00
Skye Wanderman-Milne
3a6d6b71cb Fix NULL handling in ArithmeticExpr
Before: if both operands to an arithmetic expression were null
literals, we would set the operand types and return type to INT. This
isn't correct for operators that don't support ints, e.g. divide
(there's a separate integer division function), since the function
signature wouldn't match the arithmetic expr's types. I think we
didn't run into problems because the BE uses void*s everywhere, but I
hit this when I switched the arithmetic functions to the UDF
interface.

In addition, some of the builtins were registered with the wrong
return type.

After: set the operand types to a type appropriate for the operator
before we set the return type, meaning the return type gets assigned
correctly using the existing logic.

Change-Id: I39fa147c178d895bdffaf1be676ddaa3af1d42c8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3255
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 2634932790d1f4a42ce64f73ec3722a8a7be04af)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3298
2014-06-26 23:52:02 -07:00
Skye Wanderman-Milne
6d17b93814 Open and close exprs in tests
Change-Id: Ie4abc8e1e56fc77d68d9656260b8f4adcc2a36e9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3135
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit f7eafefa1051ac9f3e5649f45655b80223af5f29)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3296
2014-06-26 23:48:29 -07:00
Skye Wanderman-Milne
3a6600c964 Fix UDF test
UDF invocations in udf.test should not specify a database. This is how
we switch between testing IR UDFs in the ir_function_test database and
native UDFs in the native_function_test database.

Change-Id: I09ede18f2b91440ef7a2a76b0daf41a007af2671
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3130
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 4d6160c0b88285aea754f6353cdd02b5e4b15633)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3295
2014-06-26 22:17:56 -07:00
Paden Tomasello
d6a20c2f08 Rowbatch.cc uses LZ4 codec instead of Snappy codec
Comparison of exchange node data for LZ4 and Snappy,
running query: select * from tpch.lineitem order by l_orderkey

Snappy:
EXCHANGE_NODE (id=2):(Total: 36s021ms...)
BytesReceived: 26.75 MB (28047762)
DeserializeRowBatchTimer: 246.561ms

LZ4:
EXCHANGE_NODE (id=2):(Total: 34s699ms...)
BytesReceived: 11.20 MB (11741118)
DeserializeRowBatchTimer: 131.379ms

Change-Id: Iae8d212ba0fd508542f3ef9ddaf7507426e13253
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3120
Reviewed-by: Paden Tomasello <paden.tomasello@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3252
2014-06-26 12:06:39 -07:00
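The relative improvements can be derived directly from the EXCHANGE_NODE counters quoted in this commit. This small Python calculation only reuses those numbers (a single run of a single query) and says nothing about other workloads.

```python
# Figures copied from the EXCHANGE_NODE profiles in the commit message.
snappy = {"bytes_received": 28047762, "deserialize_ms": 246.561, "total_ms": 36021}
lz4 = {"bytes_received": 11741118, "deserialize_ms": 131.379, "total_ms": 34699}

# How many times more bytes Snappy sent over the exchange than LZ4.
size_ratio = snappy["bytes_received"] / lz4["bytes_received"]
# How many times longer row-batch deserialization took with Snappy.
deserialize_ratio = snappy["deserialize_ms"] / lz4["deserialize_ms"]
# Fraction of the exchange node's total time saved by switching to LZ4.
total_time_saving = 1 - lz4["total_ms"] / snappy["total_ms"]

print(f"bytes: {size_ratio:.2f}x, deserialize: {deserialize_ratio:.2f}x, "
      f"total time saved: {total_time_saving:.1%}")
```

This works out to roughly 2.4x fewer bytes received, about 1.9x faster deserialization, and about 3.7% less total time in the exchange node for this query.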
Dimitris Tsirogiannis
6a795915d6 Fix loading data from snapshot for alltypesagg table.
The alltypesagg table was not loaded correctly from a snapshot file due
to a missing ALTER TABLE statement, thereby causing some tests to fail.

Change-Id: I74066a99529f24fc268bb5779d3fb64fbd4f66b9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3248
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3270
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
2014-06-25 21:52:11 -07:00
Dimitris Tsirogiannis
5a6f53db16 Add partition pruning tests
The following changes are included in this commit:
1. Modified the alltypesagg table to include an additional partition key
that has nulls.
2. Added a number of tests in hdfs.test that exercise the partition
pruning logic (see IMPALA-887).
3. Modified all the tests that are affected by the change in alltypesagg.

Change-Id: I1a769375aaa71273341522eb94490ba5e4c6f00d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2874
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3236
2014-06-24 02:14:27 -07:00
Lenni Kuff
13f487ae31 CDH-19900: Change to make Hive/Impala privilege models consistent
This makes two changes to the privilege model:
* All CREATE statements now require ALL privileges on the parent object
* The user should always be able to perform "use default".

Additionally, it enables all of the authorization tests, fixes a bug with the
new privilege format from Sentry, and corrects an issue where a role wasn't
always updated during an 'invalidate metadata' operation.

Change-Id: I92bab4ee0455574a2785bb5483b6d05611c3dfdc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3225
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-23 19:08:37 -07:00
Alex Behm
bf85225911 IMPALA-881: Tests for joins with union inputs.
Change-Id: I4be6821ac3938345ca95c542d868c87512ff66da
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3229
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-06-23 15:38:06 -07:00
Skye Wanderman-Milne
bf8e1b81a0 Make sure QueryExecState::Wait() completes before fetching rows.
We run Wait() asynchronously for API compatibility, but many
QueryExecState functions cannot actually be run concurrently with
Wait() (e.g., Wait() opens output_exprs_, which are then evaluated in
FetchRows()).

Change-Id: I708aa23fdb238ee7aede1113790f48da2859cab9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2993
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 47f20b643e80f0f8640be9264d7ee3fc5d14dad0)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3226
2014-06-23 11:40:08 -07:00
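A minimal Python sketch of the ordering guarantee, using threading.Event as the completion signal. QueryExecState here is a toy class whose method names only mirror the ones mentioned in the commit; the real implementation may use a different synchronization primitive.

```python
import threading


class QueryExecState:
    """Toy model: Wait() runs on a background thread, and fetch_rows()
    must not evaluate output exprs until Wait() has finished."""

    def __init__(self, rows):
        self._pending_rows = list(rows)
        self._output_exprs_open = False
        self._wait_done = threading.Event()
        self._thread = None

    def _wait(self):
        try:
            # Stand-in for the real Wait(), e.g. opening output_exprs_.
            self._output_exprs_open = True
        finally:
            # Signal completion even if Wait() fails, so fetchers never hang.
            self._wait_done.set()

    def exec_async(self):
        self._thread = threading.Thread(target=self._wait)
        self._thread.start()

    def fetch_rows(self, max_rows):
        # Block until Wait() has completed before touching output exprs.
        self._wait_done.wait()
        if not self._output_exprs_open:
            raise RuntimeError("Wait() failed; output exprs are not open")
        batch = self._pending_rows[:max_rows]
        del self._pending_rows[:max_rows]
        return batch
```

Callers keep the asynchronous API (exec_async() returns immediately), but the first fetch_rows() call simply blocks until the background Wait() is done.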
Henry Robinson
bac4f6c9c8 Properly account for all finished-with expansions
Change-Id: I86819add942d13fcef3a9dab6977fcabe6cfdb4f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3220
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
2014-06-21 00:40:26 -07:00
Henry Robinson
2a374e5893 Prepare resource broker for cancellation changes
This patch anticipates the changes to Llama that allow a
client-specified resource ID to be returned with every reservation or
expansion request. Doing this allows us to remove the tricky
coordination logic between WaitForNotification() and AMNotification()
when we don't know which side will access the rendezvous data structures
first. Now we can guarantee that the consumer side will be set up before
the notification is received.

Change-Id: I908b1dae8d074a84b0465e3a444d6651f126efd7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3093
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
2014-06-21 00:08:19 -07:00
Henry Robinson
7992e872c1 [CDH5] Upgrade Llama
Change-Id: Ie91ba1bc55e02f7eb70c90ce1ed8ce1242fa553d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3161
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
2014-06-21 00:03:16 -07:00
Nong Li
a7beb12540 [CDH5] Fix column stats for decimal.
Change-Id: I72b31f6431bf6259e759fd290200fd1a755f82c6
2014-06-20 23:03:06 -07:00
Nong Li
b72ef379b6 [CDH5] Update hive thirdparty.
Change-Id: Ia13b2b2723ba0aae3e349f47d635e6d925f623eb
2014-06-20 23:03:05 -07:00
Srinath Shankar
7b81a0330c Change units and naming of some counters used in sort
Also differentiates between memory limit and memory used.
Change-Id: Ic5534345830b1c3b5109697a93868eb5d40befda
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3219
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
2014-06-20 20:05:10 -07:00
Alex Behm
881f3a8c33 Re-order union operands descending by their estimated per-host memory.
Re-order union operands descending by their estimated per-host memory,
so that parent nodes can gauge the peak memory consumption of a MergeNode after
opening it during execution (a MergeNode opens its first operand in Open()).
Scan nodes are always ordered last because they can dynamically scale down their
memory usage, whereas many other nodes cannot (e.g., joins, aggregations).
One goal is to decrease the likelihood of a SortNode parent claiming too much
memory in its Open(), possibly causing the mem limit to be hit when subsequent
union operands are executed.

Change-Id: Ia51caaffd55305ea3dbd2146cd55acc7da67f382
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3146
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3213
Tested-by: jenkins
2014-06-20 18:46:10 -07:00
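The ordering rule can be expressed as a single sort key. PlanNode and per_host_mem_estimate are hypothetical stand-ins for the planner's classes. Non-scan operands come first in descending order of estimated per-host memory, and scans come last; ordering the scans among themselves by descending estimate is an assumption, since the commit only says scans are placed last.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlanNode:
    name: str
    per_host_mem_estimate: int  # estimated bytes per host for this operand
    is_scan: bool = False


def order_union_operands(operands: List[PlanNode]) -> List[PlanNode]:
    # Key: non-scans (False) sort before scans (True); within each group,
    # negating the estimate gives descending order of per-host memory.
    return sorted(operands, key=lambda n: (n.is_scan, -n.per_host_mem_estimate))
```

Because the largest non-scan operand runs first, the memory a MergeNode parent observes after Open() is close to the peak it will need for the remaining operands.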
Victor Bittorf
2d7f2e19b2 IMPALA-938: Infer schema from Parquet file
Syntax is "CREATE TABLE name LIKE fileformat '/path/to/file'".
Supports all options that CREATE TABLE does. Currently only PARQUET is supported.
Run testdata/bin/create-load-data.sh after pulling this patch.

Change-Id: Ibb9fbb89dbde6acceb850b914c48d12f22b33f55
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2720
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3158
2014-06-20 17:38:01 -07:00
Taras Bobrovytsky
7faaa65996 Added order by query tests
- Added static order by tests to test_queries.py and QueryTest/sort.test
- test_order_by.py also contains tests with static queries that are run with
  multiple memory limits.
- Added stress, scratch disk and failpoints tests
- Incorporated Srinath's change that copied all order by with limit tests into
  the top-n.test file

Extra time required:

Serial:
scratch disk: 42 seconds
test queries sort: 77 seconds
test sort: 56 seconds
sort stress: 142 seconds
TOTAL: 5 min 17 seconds

Parallel (8 threads):
scratch disk: 40 seconds
test queries sort: 42 seconds
test sort: 49 seconds
sort stress: 93 seconds
TOTAL: 3 min 44 seconds

Change-Id: Ic5716bcfabb5bb3053c6b9cebc9bfbbb9dc64a7c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2820
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3205
2014-06-20 13:35:10 -07:00
ishaan
0d0614765d Only use nproc to determine functional test concurrency when it's available in the OS.
Some operating systems don't ship nproc, which causes impala-config.sh to fail. This
change alleviates the problem by checking whether nproc exists and falling back to a
reasonable default if it does not.

Change-Id: Ic6e4d0fbce57eedc82163cfa17f71bdccbc38b51
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3208
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-06-20 12:52:08 -07:00
Dimitris Tsirogiannis
7dbd3a5860 IMPALA-1040: Reading a decimal partitioned column with invalid values
This commit fixes IMPALA-1040, in which inserting an invalid value into a
decimal partition column through Hive resulted in an uninformative error
message and, in some cases, caused the associated table to disappear from
Impala's catalog. With the fix, Impala always throws a more informative
error message indicating that an invalid partition key value was inserted.

Change-Id: I2855ea69944e269fb7e02b3825f44e64352151e7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3062
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3200
2014-06-20 12:46:52 -07:00
Henry Robinson
df9c13dcbe Fix memtracker instantiation when using FETCH_FIRST
Change-Id: I47b614b3559880f428951b015291bee4f5af6c49
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3038
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
2014-06-20 12:29:20 -07:00
Srinath Shankar
c4219929f9 Change memory allocation in buffered block manager and sorter
The sorter and block manager currently allocate all of their memory
up-front. This patch changes that so that memory is allocated as a run
is built. Only the minimum number of blocks required are allocated
up-front.
Added a non-blocking TryExpand() call to the buffered block manager to
allocate a new buffer and assign it to a block. The only place where this
is invoked is when the sorter tries to extend a run.
While there are other ways of doing this, this seemed like a minimally
invasive change to make at this point.
In the merge phase, the sorter does not try to allocate more buffers,
but instead works with the buffers allocated up to that point. This is
something that is pretty easy to change.
Other changes include:
a) There is no longer a max_available_buffers() in the block manager.
   Replaced by a combination of available_allocated_buffers() and
   TryExpand().
b) In WriteUnpinnedBlocks(), unallocated memory is taken into account
   to determine if blocks should be written out.
c) The sorter uses a block to copy out sorted var-len data when
   unpinning the blocks in a run. This block is now allocated up-front.
Conflicts:

	tests/query_test/test_sort.py

Change-Id: Ifbb2ffd679a882afe8895f4785ec6d7c49c30b98
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3148
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3199
2014-06-20 09:57:13 -07:00
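A toy Python model of lazy buffer allocation with a non-blocking try-expand call, as described in this commit. BufferedBlockMgr, try_expand(), available_allocated_buffers() and grow_run() here are illustrative stand-ins whose names only echo the commit; the real block manager also tracks pinning, spilling and block assignment, which this sketch omits.

```python
class BufferedBlockMgr:
    """Allocate only the minimum number of buffers up front; grow lazily."""

    def __init__(self, mem_limit: int, block_size: int, min_buffers: int):
        self.block_size = block_size
        self.max_buffers = mem_limit // block_size
        if min_buffers > self.max_buffers:
            raise ValueError("memory limit too small for the minimum number of buffers")
        self.free = [bytearray(block_size) for _ in range(min_buffers)]
        self.num_allocated = min_buffers

    def available_allocated_buffers(self) -> int:
        return len(self.free)

    def try_expand(self):
        """Non-blocking: return a newly allocated buffer, or None at the limit."""
        if self.num_allocated >= self.max_buffers:
            return None
        self.num_allocated += 1
        return bytearray(self.block_size)


def grow_run(mgr: BufferedBlockMgr, blocks_needed: int) -> list:
    """Sketch of a sorter building one run: reuse free buffers, then try to expand."""
    run = []
    while len(run) < blocks_needed:
        if mgr.available_allocated_buffers() > 0:
            run.append(mgr.free.pop())
        else:
            buf = mgr.try_expand()
            if buf is None:
                break  # limit reached; the run would be closed and unpinned here
            run.append(buf)
    return run
```

Because try_expand() never blocks, the sorter can simply end the current run when it returns None instead of waiting for memory to be freed.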