With IMPALA-1033 we disabled counting the number of NULLs in each column,
which gave a 2x speed-up in the computation. However, the value 0 was
erroneously stored as the number of NULLs instead of -1, which indicates
'unknown'.
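
A minimal sketch of the convention described above, assuming a hypothetical
ColumnStats record and compute_stats() helper (illustrative names, not
Impala's classes): when NULL counting is skipped, the count is reported as -1
rather than 0.

```python
from dataclasses import dataclass

UNKNOWN = -1  # sentinel meaning "not computed", as described in the message


@dataclass
class ColumnStats:
    """Hypothetical per-column stats record, for illustration only."""
    num_distinct: int
    num_nulls: int = UNKNOWN


def compute_stats(values, count_nulls=False):
    """Compute stats; NULL counting is optional because it is expensive."""
    non_null = [v for v in values if v is not None]
    # Report UNKNOWN, never 0, when the NULL count was not computed.
    num_nulls = len(values) - len(non_null) if count_nulls else UNKNOWN
    return ColumnStats(num_distinct=len(set(non_null)), num_nulls=num_nulls)
```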
Change-Id: Ib882eb2a87e7e2469f606081cb2881461b441a45
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3377
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3378
For arithmetic ops, this is an optimization. The Add(decimal,decimal) already handles
the cast as part of the operation.
For binary predicates, the cast is bad and can lead to overflows. The decimal Compare()
function has custom logic to not overflow.
Change-Id: I9f5ad74ea89e9dfa5a3a40c1e07f7e9178bf1d52
(cherry picked from commit 6bffaa885542443ca559888d921853ecd194cbcb)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3414
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
IllegalStateException
This commit resolves IMPALA-1065 where the explain statement of TPC-DS
Q48 resulted in an IllegalStateException due to an overflow in the
cardinality estimation of a cross join operator. The fix is to check if
an overflow has occurred and reset the cardinality estimation to a valid
value.
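
A rough sketch of the overflow check, under two assumptions that the message
does not state: the estimate is clamped to Java's Long.MAX_VALUE, and a
negative input means "unknown". The function name is hypothetical.

```python
LONG_MAX = 2**63 - 1  # Java's Long.MAX_VALUE


def cross_join_cardinality(left, right):
    """Estimate |left x right|, saturating instead of wrapping on overflow."""
    if left < 0 or right < 0:
        return -1  # assumption: unknown input gives an unknown estimate
    # Python ints do not overflow, so the product can be compared directly.
    return min(left * right, LONG_MAX)
```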
Change-Id: I0e88fde07e7a5d86819af317e98bab7ac08d5a8a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3346
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3366
If RM and per-query memory limits were enabled at the same time, the
per-query limit would be ignored if RM wanted to expand the memory
allocation. This change adds an optional reservation limit to a
memtracker. The original limit goes back to being a hard limit -
i.e. any attempt to consume more than that amount results in
failure. The RM reservation limit is the RM-allocated memory limit. If
that is exceeded it triggers the ExpandRmReservation() method, which tries
to retrieve more memory as long as the hard limit is observed.
The net effect is that per-query memory limits have the intended,
hard-limit effect, while the RM limits coexist nicely and can expand
with more memory as required.
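
The interaction of the two limits can be sketched as follows. This is an
illustration only: the MemTracker class below and its expand_fn callback
(standing in for ExpandRmReservation()) are simplified assumptions, not the
backend's actual interface.

```python
class MemTracker:
    """Hypothetical tracker with a hard limit and an RM reservation limit."""

    def __init__(self, hard_limit, rm_reservation_limit=None, expand_fn=None):
        self.hard_limit = hard_limit
        self.rm_reservation_limit = rm_reservation_limit
        self.expand_fn = expand_fn  # returns the number of bytes granted
        self.consumed = 0

    def try_consume(self, nbytes):
        new_total = self.consumed + nbytes
        # The hard limit always wins, whatever the RM would grant.
        if self.hard_limit is not None and new_total > self.hard_limit:
            return False
        # Exceeding the reservation triggers an expansion request.
        limit = self.rm_reservation_limit
        if limit is not None and new_total > limit:
            needed = new_total - limit
            granted = self.expand_fn(needed) if self.expand_fn else 0
            if granted < needed:
                return False
            self.rm_reservation_limit = limit + granted
        self.consumed = new_total
        return True
```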
At the same time, we change the precedence of various ways of suggesting
an initial reservation size so that the user can change the reservation
size via a query option (MEM_RESERVATION_SIZE).
Change-Id: I41bfa4eb1336810a8a5946f6be3472111a052144
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3134
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
This patch cleans up registration and resolution of implicit table aliases as follows:
During analysis, we register all legal aliases of table/view references and remember
implicit table aliases that are ambiguous. When resolving table or column references
we consider all legal table aliases.
A table/view may have either one explicit alias or two implicit aliases.
The implicit aliases are the fully-qualified and the unqualified table name.
Within a single query, explicit and implicit aliases can be mixed as long as there
are no clashes between explicit and fully-qualified implicit aliases, and there are
no ambiguous references to implicit unqualified aliases.
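
A simplified sketch of these rules, using a hypothetical AliasRegistry class.
It assumes that any clash involving an implicit unqualified alias is only
remembered as ambiguous and becomes an error when that alias is referenced,
while all other clashes fail at registration.

```python
class AliasRegistry:
    """Hypothetical alias registry for a single query block."""

    def __init__(self):
        self._tables = {}          # alias -> fully-qualified table name
        self._unqualified = set()  # aliases that are implicit unqualified names
        self._ambiguous = set()    # unqualified aliases seen more than once

    def register(self, db, table, explicit_alias=None):
        full = f"{db}.{table}"
        if explicit_alias is not None:
            candidates = [(explicit_alias, False)]     # one explicit alias
        else:
            candidates = [(full, False), (table, True)]  # two implicit aliases
        for name, is_unqualified in candidates:
            if name not in self._tables:
                self._tables[name] = full
                if is_unqualified:
                    self._unqualified.add(name)
            elif is_unqualified or name in self._unqualified:
                self._ambiguous.add(name)  # error only if referenced later
            else:
                raise ValueError(f"duplicate table alias: {name}")

    def resolve(self, alias):
        if alias in self._ambiguous:
            raise ValueError(f"ambiguous table alias: {alias}")
        return self._tables[alias]
```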
Change-Id: I5734539aa821d130882491ec628dae8128d22e2f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3258
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3359
Once the expr refactoring goes in, the BE will not be able to evaluate
any TYPE_NULL exprs. This patch ensures that the FE casts all null
literals and slot refs before they reach the BE.
There are a bunch of places where we know the appropriate type and
just weren't using it before. This patch also introduces a few notable
hacks:
* Serializing null SlotRefs and NullLiterals as boolean NullLiterals
in case they weren't cast earlier.
* Converting null SlotRefs to NullLiterals in uncheckedCastTo() since
we don't need to read from the slot at all.
This works, but we should consider adding a final pass that cleans up
the plan tree and takes care of this.
Change-Id: Ic2ee181139059553d7f2d0e17e9dacaee241df17
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3294
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit a8a67ebcad12956a8260b4ea4189afb7ffab4b68)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3361
Without this patch, the returned StringValue's ptr would point before the
input pointer if the 'pos' argument was less than -input.len.
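
The bounds check can be illustrated with a small sketch. It assumes a 1-based
'pos', negative positions counting back from the end, and an empty result for
out-of-range positions; the function name and these semantics are assumptions
for illustration, not taken from the patch.

```python
def substring(s, pos, length=None):
    """1-based substring; a negative pos counts back from the end of s."""
    if length is None:
        length = len(s)
    start = len(s) + pos + 1 if pos < 0 else pos
    # Without the lower-bound check, pos < -len(s) gives start <= 0,
    # i.e. an offset before the beginning of the input.
    if start < 1 or start > len(s) or length <= 0:
        return ""
    return s[start - 1:start - 1 + length]
```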
Change-Id: I7bd506f5d1119741a94817c34a017215b67cc26e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3351
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
(cherry picked from commit bad40d2beceffaacc409e34041a00d3ffbabf201)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3360
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Before: The pre- and postconditions of expr substitution and cloning,
in particular their effect on the isAnalyzed_ flag, were unclear and
sometimes inconsistent; e.g., some literal exprs set isAnalyzed_ to
true in their c'tor. As a result, several places required ad-hoc
solutions like Expr.unsetIsAnalyzed() and Expr.reanalyze().
This patch cleans up expr substitution and cloning, summarized as follows:
Expr analysis:
All exprs start out with isAnalyzed_ = false. The flag is set to true
iff analyze() has been called on the expr.
Expr.clone():
Creates a deep copy of an expr including all its analysis state.
Expr.equals():
Comparison of expr trees ignores implicit casts. This simplifies expr
substitution because un/analyzed exprs can be easily compared/substituted.
ExprSubstitutionMap:
When adding a mapping, the rhs expr must be analyzed to allow
substitution across query blocks. There is no requirement on the lhs expr.
Expr substitution:
Substitution returns an analyzed clone of the original expr with exprs
substituted. While performing the substitution, implicit casts and analysis
state are removed such that the returned result has minimal implicit casts
and types.
There are two versions of the substitute functions: one that throws
exceptions and one that does not, because callers may have different
expectations on whether a substitution must succeed.
Numeric literals:
This patch combines IntLiteral and DecimalLiteral into a NumericLiteral.
Its main benefit is that analyze() always produces the same type, even if
the literal was implicitly cast and/or isAnalyzed was unset because of
expr substitution. This was not the case before because an implicit cast
could permanently turn an IntLiteral into a DecimalLiteral.
There is no more need for unsetIsAnalyzed() or reanalyze().
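
The Expr.equals() rule above (comparison ignores implicit casts) can be
sketched with a toy expr tree. The Expr dataclass and the helpers below are
hypothetical stand-ins and do not mirror the frontend's Java classes.

```python
from dataclasses import dataclass, field


@dataclass
class Expr:
    """Toy expr node; `op` names the operator, literal, or slot."""
    op: str
    children: list = field(default_factory=list)
    is_implicit_cast: bool = False


def strip_implicit_casts(e):
    """Skip over implicit cast nodes to reach the underlying expr."""
    while e.is_implicit_cast:
        e = e.children[0]
    return e


def exprs_equal(a, b):
    """Structural equality that ignores implicit casts on both sides."""
    a, b = strip_implicit_casts(a), strip_implicit_casts(b)
    return (a.op == b.op
            and len(a.children) == len(b.children)
            and all(exprs_equal(x, y) for x, y in zip(a.children, b.children)))
```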
Change-Id: I646110e3714cff8ae8d5a378c25a107dd43334b6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3228
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3318
This is to help ensure naming is consistent across the platform and
also avoid confusion with HS2 "impersonation" which is something very
different.
Change-Id: I48c1b76dff75b92b11ddc7aab0eb9a3a5d20e489
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3315
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 931f6a66c0d8dff25b746d127dc1f36e96b12f98)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3326
The DDL statements for adding the partitions of alltypesaggmultifiles
did not include an ALTER TABLE stmt for one of the partitions, thereby
causing the planner tests to fail when test data were loaded from a
snapshot.
Change-Id: Id4b078cd334d816d6eb8eb15e5856189701a4bca
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3305
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3310
This was one idea to just cast to __int128_t as a poor man's int96.
Unfortunately, it seems too slow: ~15x for add, ~10x for multiply and
3x for divide compared to __int128_t.
Change-Id: I06eb3fa3ac1edc2c174873a73a252a0165911b1c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2433
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
VLOG(3) logs each row, which is useful much less often than the serialized
plan.
Change-Id: I933188f046dafb51da9d06583697792113a9165a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3289
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Before: if both operands to an arithmetic expression were null
literals, we would set the operand types and return type to INT. This
isn't correct for operators that don't support ints, e.g. divide
(there's a separate integer division function), since the function
signature wouldn't match the arithmetic expr's types. I think we
didn't run into problems because the BE uses void*s everywhere, but I
hit this when I switched the arithmetic functions to the UDF
interface.
In addition, some of the builtins were registered with the wrong
return type.
After: set the operand types to a type appropriate for the operator
before we set the return type, meaning the return type gets assigned
correctly using the existing logic.
Change-Id: I39fa147c178d895bdffaf1be676ddaa3af1d42c8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3255
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 2634932790d1f4a42ce64f73ec3722a8a7be04af)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3298
UDF invocations in udf.test should not specify a database. This is how
we switch between testing IR UDFs in the ir_function_test database and
native UDFs in the native_function_test database.
Change-Id: I09ede18f2b91440ef7a2a76b0daf41a007af2671
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3130
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 4d6160c0b88285aea754f6353cdd02b5e4b15633)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3295
The following changes are included in this commit:
1. Modified the alltypesagg table to include an additional partition key
that has nulls.
2. Added a number of tests in hdfs.test that exercise the partition
pruning logic (see IMPALA-887).
3. Modified all the tests that are affected by the change in alltypesagg.
Change-Id: I1a769375aaa71273341522eb94490ba5e4c6f00d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2874
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3236
This makes two changes to the privilege model:
* All CREATE statements now require ALL privileges on the parent object
* The user should always be able to perform "use default".
Additionally, it enables all of the authorization tests, fixes a bug with the
new privilege format from Sentry, and corrects an issue where a role wasn't
always being updated during an 'invalidate metadata' operation.
Change-Id: I92bab4ee0455574a2785bb5483b6d05611c3dfdc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3225
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
We run Wait() asynchronously for API compatibility, but many
QueryExecState functions cannot actually be run concurrently with
Wait() (e.g., Wait() opens output_exprs_, which are then evaluated in
FetchRows()).
Change-Id: I708aa23fdb238ee7aede1113790f48da2859cab9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2993
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 47f20b643e80f0f8640be9264d7ee3fc5d14dad0)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3226
This patch anticipates the changes to Llama that allow a
client-specified resource ID to be returned with every reservation or
expansion request. Doing this allows us to remove the tricky
coordination logic between WaitForNotification() and AMNotification()
when we don't know which side will access the rendezvous data structures
first. Now we can guarantee that the consumer-side will be set-up before
the notification is received.
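
A minimal sketch of the idea, assuming a hypothetical ReservationClient with
a send_request transport callback (neither exists in Impala or Llama under
these names): because the client picks the ID, it can register the waiting
side before the request is sent.

```python
import threading
import uuid


class ReservationClient:
    """Hypothetical client that rendezvouses on a client-chosen request ID."""

    def __init__(self, send_request):
        self._send_request = send_request  # hypothetical transport callback
        self._pending = {}
        self._lock = threading.Lock()

    def reserve(self, resources):
        request_id = str(uuid.uuid4())  # chosen by the client, not the server
        slot = {"event": threading.Event(), "result": None}
        with self._lock:
            # The consumer side is set up before the request goes out.
            self._pending[request_id] = slot
        self._send_request(request_id, resources)
        return request_id, slot

    def on_notification(self, request_id, result):
        """Called when a notification arrives; the slot is always present."""
        with self._lock:
            slot = self._pending.pop(request_id)
        slot["result"] = result
        slot["event"].set()
```

A caller would block on slot["event"].wait() after reserve() returns.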
Change-Id: I908b1dae8d074a84b0465e3a444d6651f126efd7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3093
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
Re-order union operands descending by their estimated per-host memory,
so that parent nodes can gauge the peak memory consumption of a MergeNode after
opening it during execution (a MergeNode opens its first operand in Open()).
Scan nodes are always ordered last because they can dynamically scale down their
memory usage, whereas many other nodes cannot (e.g., joins, aggregations).
One goal is to decrease the likelihood of a SortNode parent claiming too much
memory in its Open(), possibly causing the mem limit to be hit when subsequent
union operands are executed.
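
The ordering can be sketched as a single sort. The Operand record is
hypothetical, and ordering scan nodes among themselves by descending memory
is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    """Hypothetical union operand with its estimated per-host memory."""
    name: str
    per_host_mem: int
    is_scan: bool = False


def order_union_operands(operands):
    """Non-scan operands first, by descending memory; scan nodes last."""
    # False sorts before True, so scans land at the end of the list.
    return sorted(operands, key=lambda op: (op.is_scan, -op.per_host_mem))
```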
Change-Id: Ia51caaffd55305ea3dbd2146cd55acc7da67f382
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3146
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3213
Tested-by: jenkins
Syntax is "CREATE TABLE name LIKE fileformat '/path/to/file'".
Supports all options that CREATE TABLE does. Currently only PARQUET is supported.
Run testdata/bin/create-load-data.sh after pulling this patch.
Change-Id: Ibb9fbb89dbde6acceb850b914c48d12f22b33f55
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2720
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3158
- Added static order by tests to test_queries.py and QueryTest/sort.test
- test_order_by.py also contains tests with static queries that are run with
multiple memory limits.
- Added stress, scratch disk and failpoints tests
- Incorporated Srinath's change that copied all order by with limit tests into
the top-n.test file
Extra time required:
Serial:
  scratch disk: 42 seconds
  test queries sort: 77 seconds
  test sort: 56 seconds
  sort stress: 142 seconds
  TOTAL: 5 min 17 sec
Parallel (8 threads):
  scratch disk: 40 seconds
  test queries sort: 42 seconds
  test sort: 49 seconds
  sort stress: 93 seconds
  TOTAL: 3 min 44 sec
Change-Id: Ic5716bcfabb5bb3053c6b9cebc9bfbbb9dc64a7c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2820
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3205
Some operating systems don't ship with nproc, which causes impala-config.sh to fail.
This change alleviates the problem by checking whether nproc exists, and setting a
reasonable default if it does not.
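
The same check, sketched in Python rather than in the shell script itself.
The fallback value of 4 and the function name are assumptions; the message
does not state the default that impala-config.sh uses.

```python
import shutil
import subprocess

DEFAULT_CORES = 4  # hypothetical fallback; the actual default is not stated


def core_count(default=DEFAULT_CORES):
    """Return the value printed by nproc, or `default` if it is unavailable."""
    if shutil.which("nproc") is None:
        return default
    try:
        out = subprocess.run(["nproc"], capture_output=True, text=True, check=True)
        return int(out.stdout.strip())
    except (OSError, subprocess.CalledProcessError, ValueError):
        return default
```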
Change-Id: Ic6e4d0fbce57eedc82163cfa17f71bdccbc38b51
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3208
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
This commit fixes IMPALA-1040, in which inserting an invalid value into a
decimal partition column through Hive resulted in an uninformative error
message and, in some cases, caused the associated table to disappear
from Impala's catalog. With the fix, Impala always throws a more
informative error message that indicates the insertion of an invalid
partition key value.
Change-Id: I2855ea69944e269fb7e02b3825f44e64352151e7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3062
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3200
The sorter and block manager currently allocate all of their memory
up-front. This patch changes that so that memory is allocated as a run
is built. Only the minimum number of blocks required are allocated
up-front.
Added a non-blocking TryExpand() call to the buffered block manager to
allocate a new buffer and assign it to a block. The only place where this
is invoked is when the sorter tries to extend a run.
While there are other ways of doing this, this seemed like a minimally
invasive change to make at this point.
In the merge phase, the sorter does not try to allocate more buffers,
but instead works with the buffers allocated up to that point. This is
something that is pretty easy to change.
Other changes include:
a) There is no longer a max_available_buffers() in the block manager.
Replaced by a combination of available_allocated_buffers() and
TryExpand().
b) In WriteUnpinnedBlocks(), unallocated memory is taken into account
to determine if blocks should be written out.
c) The sorter uses a block to copy out sorted var-len data when
unpinning the blocks in a run. This block is now allocated up-front.
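
A simplified sketch of the lazy allocation scheme. The class, its fields and
extend_run() are hypothetical; only the TryExpand() and
available_allocated_buffers() names come from the message, rendered here in
Python style.

```python
class BufferedBlockManager:
    """Hypothetical block manager that allocates buffers as a run is built."""

    def __init__(self, mem_limit, buffer_size, min_buffers):
        self.mem_limit = mem_limit
        self.buffer_size = buffer_size
        self.allocated = min_buffers  # only the minimum is allocated up-front
        self.in_use = 0

    def available_allocated_buffers(self):
        return self.allocated - self.in_use

    def try_expand(self):
        """Non-blocking: allocate one more buffer and hand it to the caller."""
        if (self.allocated + 1) * self.buffer_size > self.mem_limit:
            return False  # never waits; the caller falls back to spilling
        self.allocated += 1
        self.in_use += 1
        return True


def extend_run(mgr):
    """Sorter side: reuse an allocated buffer, else try to expand."""
    if mgr.available_allocated_buffers() > 0:
        mgr.in_use += 1
        return True
    return mgr.try_expand()
```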
Conflicts:
tests/query_test/test_sort.py
Change-Id: Ifbb2ffd679a882afe8895f4785ec6d7c49c30b98
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3148
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3199