Also add support for "SET", which returns a table of query options and
their respective values.
The front-end parses the option into a (key, value) pair, and the
existing backend logic then either sets the option or returns the
result set.
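A minimal sketch of the (key, value) parsing described above. This is
illustrative only and not the actual front-end parser: the function name
parse_set_statement and the exact accepted syntax are assumptions.

```python
from typing import Optional, Tuple


def parse_set_statement(stmt: str) -> Optional[Tuple[str, str]]:
    """Return (key, value) for 'SET key=value', or None for a bare 'SET'.

    Hypothetical helper: a bare SET means "list all query options".
    """
    body = stmt.strip().rstrip(";").strip()
    is_set = body[:3].upper() == "SET"
    # Reject identifiers that merely start with "set", e.g. "settings".
    if not is_set or (len(body) > 3 and not body[3].isspace()):
        raise ValueError("not a SET statement: %r" % stmt)
    rest = body[3:].strip()
    if not rest:
        return None  # caller returns the table of options and values
    key, sep, value = rest.partition("=")
    if not sep or not key.strip():
        raise ValueError("expected SET key=value: %r" % stmt)
    return key.strip(), value.strip()
```

With this sketch, a bare "SET" yields None (list everything), while
"set mem_limit=1g;" yields the pair ("mem_limit", "1g").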
Change-Id: I40dbd98537e2a73bdd5b27d8b2575a2fe6f8295b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3582
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
Tested-by: jenkins
(cherry picked from commit aa0f6a2fc1d3fe21f22cc7bc56887e1fdb02250b)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3614
Adds an aggregate function to compute equi-depth histograms. The UDA
creates a sample of the column values using weighted reservoir sampling
and computes the histogram from the sorted sample.
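The approach can be sketched as follows. This is not the UDA's code: it
assumes the Efraimidis-Spirakis "A-Res" variant of weighted reservoir
sampling (the commit does not say which variant is used), and the
function names are hypothetical.

```python
import heapq
import random


def weighted_reservoir_sample(items, k, rng):
    """Sample up to k values from (value, weight) pairs, weight > 0.

    A-Res: each item gets key u ** (1 / weight) with u uniform in [0, 1);
    the k items with the largest keys form the sample.
    """
    heap = []  # min-heap of (key, arrival index, value)
    for i, (value, weight) in enumerate(items):
        key = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, i, value))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, i, value))
    return [value for _, _, value in heap]


def equi_depth_bounds(sample, num_buckets):
    """Return the upper bound of each bucket of the sorted sample, so
    that every bucket covers roughly the same number of sampled values."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        return []
    return [ordered[max(0, (b * n) // num_buckets - 1)]
            for b in range(1, num_buckets + 1)]
```

For example, a sample holding the values 1 through 8 split into four
buckets gives the upper bounds 2, 4, 6 and 8.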
TODO:
* Extract highly frequent values into separate buckets (i.e. 'compressed
histogram').
* Expose separate finalize fn to produce samples and histogram data for stats
Change-Id: I314ce5fb8c73b935c4d61ea5bbd6816c59b3b41e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3552
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit c5c475712f88244e15160befaf4e99d6e165a148)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3608
We recently changed user-initiated cancellation to not set the query state
to EXCEPTION. In FetchInternal() we relied on the previous behavior for
detecting cancellations/errors after BlockOnWait().
This patch fixes the cancellation/error check to use the query status
instead of the query state.
Change-Id: I48b4834e77b6e692fb6722637fb9fd5d8c8d9d97
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3597
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3600
This also means clients of the block mgr need to delete all blocks in close.
This is less important for sorting since it's typically at the end but will
be useful very soon.
Change-Id: Ia4ee188ad845540039ede5fe410a6048abe2bf5a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3540
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3588
This patch does a few things:
1. Moves the buffer block mgr from the sorter to the runtime state. It is now
shared across the query fragment. The partitioned hash join and agg
will use this as well.
2. Adds a Client interface to the block mgr. Each exec node is a different client
and can reserve a minimum number of buffers. This avoids starvation.
3. Updates the BufferedBlockMgr interface for getting pinned blocks, collapsing
two existing APIs into one.
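The reservation idea in point 2 can be sketched as below. This is a toy
model, not the BufferedBlockMgr API: the class BufferPool and its methods
are hypothetical, and buffers are just counted rather than allocated.

```python
class BufferPool:
    """Toy shared pool where each client reserves a minimum buffer count."""

    def __init__(self, total_buffers):
        self.total = total_buffers
        self.reserved = {}  # client name -> guaranteed minimum
        self.pinned = {}    # client name -> buffers currently held

    def register_client(self, name, min_buffers):
        if sum(self.reserved.values()) + min_buffers > self.total:
            raise ValueError("reservations exceed pool size")
        self.reserved[name] = min_buffers
        self.pinned[name] = 0

    def try_pin(self, name):
        # Buffers held back so other clients can still reach their minimum.
        held_back = sum(max(0, self.reserved[c] - self.pinned[c])
                        for c in self.reserved if c != name)
        in_use = sum(self.pinned.values())
        if in_use + held_back + 1 > self.total:
            return False
        self.pinned[name] += 1
        return True

    def unpin(self, name):
        if self.pinned[name] == 0:
            raise ValueError("client holds no buffers")
        self.pinned[name] -= 1
```

In a pool of 4 buffers where one client reserves 2 and another reserves
1, the first client can pin at most 3 buffers; the fourth stays available
for the second client, so it cannot be starved.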
Change-Id: Ibb31fbe480f3726048457f26e24a9e33f7201d86
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3504
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3574
We were setting the state to exception on Cancel() all the time.
We use the cancellation path as the normal cleanup path so this
gets called even when the query went fine (e.g. UnregisterQuery
calls Cancel()). We had already plumbed through a 'cause' argument
to differentiate.
Change-Id: Icf1091c165dec36d3dad7ce308367bbbc9edee4f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3524
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3575
FromUtc and ToUtc use third-party libraries that use inline asm, which
isn't currently supported with JIT. The UDFs are included in this
commit, but the function symbols were not changed in
impala_functions.py.
Change-Id: I0824a434d4a26a39abf29bc6e47d51b5ad7991d6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3390
Reviewed-by: Paden Tomasello <paden.tomasello@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 8e149ccd78010b7a22d6fff1b0de5614848b02ac)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3548
We used to maintain a separate hash table (in the form of a boost
unordered set) to keep track of the build rows that have been matched.
This patch replaces it with a matched bit kept in the hash table
itself. It is not possible to use boost::unordered_set for tables
that are large.
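A minimal sketch of the idea, assuming a simple chained hash table. The
class and method names are hypothetical and do not mirror the real
HashTable interface.

```python
class Node:
    """One build row plus a 'matched' flag stored next to it."""
    __slots__ = ("row", "matched")

    def __init__(self, row):
        self.row = row
        self.matched = False


class JoinHashTable:
    def __init__(self):
        self.buckets = {}  # join key -> list of Node

    def insert(self, key, row):
        self.buckets.setdefault(key, []).append(Node(row))

    def probe(self, key):
        """Return matching build rows and mark them as matched in place."""
        nodes = self.buckets.get(key, [])
        for node in nodes:
            node.matched = True
        return [node.row for node in nodes]

    def unmatched_rows(self):
        """Build rows never hit by a probe (needed for outer joins)."""
        return [node.row
                for nodes in self.buckets.values()
                for node in nodes if not node.matched]
```

No second container is needed: the flag lives in the node that the probe
already touches, so tracking matches adds no extra lookups.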
Change-Id: Ie36e609bf79e5e7e403417a3c02a0817d37acc60
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3478
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
This patch does two things in preparation for external joins. The
hash table used to contain a directory structure (buckets and nodes),
both of which were contiguous. The nodes contained the tuple ptrs
within them.
The first change is that the nodes are no longer stored contiguously
but are allocated in pages (this structure is dense and does not
require random lookups by index). The bucket structure is still
contiguous since we rely on the doubling property and random lookup by
index.
The second change is that the nodes no longer store the tuple ptrs
within them. This makes it easier to build the hash table on top of
existing data.
Here's a quick benchmark doing a self join on tpch lineitem. Both
build and probe times decreased a bit.
Before:
HASH_JOIN_NODE (id=2):(Total: 1s139ms, non-child: 985.939ms, % non-child: 86.50%)
- BuildBuckets: 2.10M (2097152)
- BuildRows: 6.00M (6001215)
- BuildTime: 527.991ms
- LeftChildRows: 6.00M (6001215)
- LeftChildTime: 451.964ms
- LoadFactor: 0.50
- RowsReturned: 30.01M (30012985)
- RowsReturnedRate: 26.33 M/sec
After:
HASH_JOIN_NODE (id=2):(Total: 1s019ms, non-child: 835.350ms, % non-child: 81.97%)
- BuildBuckets: 2.10M (2097152)
- BuildRows: 6.00M (6001215)
- BuildTime: 423.175ms
- LeftChildRows: 6.00M (6001215)
- LeftChildTime: 406.67ms
- LoadFactor: 0.50
- RowsReturned: 30.01M (30012985)
- RowsReturnedRate: 29.45 M/sec
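For reference, the relative improvements implied by the two profiles
above can be computed as follows. The values are copied from the
profiles; the dictionary keys are just illustrative labels.

```python
# Timings in milliseconds, taken from the Before/After profiles.
before = {"total": 1139.0, "build": 527.991, "left_child": 451.964}
after = {"total": 1019.0, "build": 423.175, "left_child": 406.67}


def pct_decrease(old, new):
    """Percentage by which 'new' is smaller than 'old'."""
    return 100.0 * (old - new) / old


changes = {name: round(pct_decrease(before[name], after[name]), 1)
           for name in before}
for name, pct in changes.items():
    print("%s: %.1f%% lower" % (name, pct))
```

This works out to roughly 10.5% for the total, 19.9% for BuildTime and
10.0% for LeftChildTime.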
Change-Id: I79e209a24c24fb4f2f99574bcf187746fddadc06
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3245
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
I'm not sure when we added this but it does not have any benefit. The join nodes
combine the Tuple*s from the LHS and RHS anyway and the extra Tuple* reserved in
the LHS row batch is never written to or read.
Change-Id: I40f88f417161ef72185e995b6c5b8f56f31fbfc4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3438
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
The root cause of the problem was that columns of a Table were not
added to the colsByName_ map with lower case keys on the Table.load() path
that is only exercised by the catalog server (the Impalads "load" tables
via Table.loadFromThrift() which did the right thing).
The above led to an empty column stats object being sent to the HMS
after an otherwise successful compute stats.
The problem was sporadic for the following reasons:
1. Only certain file formats like avro/snap/block have uppercase
column names in the HMS because the table was created by Hive
2. Some of our tests executed via run-tests.py, notably the
cancellation tests, aren't deterministic in which test vectors
are executed in a particular run. As a result, we only see the
cancellation test run compute stats on an avro/snap/block table
once in a while (this behavior is unaffected by this patch).
This patch includes other minor bugfixes and simplifications
related to compute stats.
Change-Id: I7cb5fe69404e35133eda314d9f7d072c78416ff1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3468
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3479
I hit this in the expr refactoring. This makes sure we never expose a
function that returns a DecimalVal directly (rather than through an
extra return parameter as specified by the ABI), which will crash if
called from precompiled native code.
Change-Id: Ifb249086c221b53553d3e7fb39af065f4cca2bac
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3425
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 429448935555b098e324bcb97ab43a7c90e0b918)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3473
For arithmetic ops, this is an optimization. The Add(decimal,decimal) already handles
the cast as part of the operation.
For binary predicates, the cast is bad and can lead to overflows. The decimal Compare()
function has custom logic to not overflow.
Change-Id: I9f5ad74ea89e9dfa5a3a40c1e07f7e9178bf1d52
(cherry picked from commit 6bffaa885542443ca559888d921853ecd194cbcb)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3414
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
If RM and per-query memory limits were enabled at the same time, the
per-query limit would be ignored if RM wanted to expand the memory
allocation. This change adds an optional reservation limit to a
memtracker. The original limit goes back to being a hard limit -
i.e. any attempt to consume more than that amount results in
failure. The RM reservation limit is the RM-allocated memory limit. If
that is exceeded it triggers the ExpandRmReservation() method, which tries
to retrieve more memory as long as the hard limit is observed.
The net effect is that per-query memory limits have the intended,
hard-limit effect, while the RM limits coexist nicely and can expand
with more memory as required.
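The interaction of the two limits can be sketched as follows. This is a
simplified model, not the MemTracker class: the constructor arguments and
the expand_fn callback (standing in for ExpandRmReservation()) are
assumptions made for illustration.

```python
class MemTracker:
    """Toy tracker with a hard limit and an expandable RM reservation."""

    def __init__(self, hard_limit, rm_reservation, expand_fn):
        self.hard_limit = hard_limit
        self.rm_reservation = rm_reservation
        self.expand_fn = expand_fn  # bytes requested -> bytes granted
        self.consumed = 0

    def try_consume(self, nbytes):
        new_total = self.consumed + nbytes
        # The hard limit is never negotiable.
        if new_total > self.hard_limit:
            return False
        # Exceeding the reservation triggers an expansion request.
        if new_total > self.rm_reservation:
            needed = new_total - self.rm_reservation
            granted = self.expand_fn(needed)
            if granted < needed:
                return False
            self.rm_reservation = min(self.hard_limit,
                                      self.rm_reservation + granted)
        self.consumed = new_total
        return True
```

With a hard limit of 100 and a reservation of 40, consuming 60 in total
succeeds only if the expansion is granted, while any attempt to go past
100 fails without asking for more memory at all.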
At the same time, we change the precedence of various ways of suggesting
an initial reservation size so that the user can change the reservation
size via a query option (MEM_RESERVATION_SIZE).
Change-Id: I41bfa4eb1336810a8a5946f6be3472111a052144
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3134
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
Once the expr refactoring goes in, the BE will not be able to evaluate
any TYPE_NULL exprs. This patch ensures that the FE casts all null
literals and slot refs before they reach the BE.
There are a bunch of places where we know the appropriate type and
just weren't using it before. This patch also introduces a few notable
hacks:
* Serializing null SlotRefs and NullLiterals as boolean NullLiterals
in case they weren't cast earlier.
* Converting null SlotRefs to NullLiterals in uncheckedCastTo() since
we don't need to read from the slot at all.
This works, but we should consider adding a final pass that cleans up
the plan tree and takes care of this.
Change-Id: Ic2ee181139059553d7f2d0e17e9dacaee241df17
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3294
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit a8a67ebcad12956a8260b4ea4189afb7ffab4b68)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3361
Without this patch, the returned StringValue's ptr would be before the
input pointer if the 'pos' argument was < -input.len.
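A sketch of the bounds check in question. It assumes 1-based positions,
negative positions counting from the end of the string, and an empty
result for out-of-range positions; the patch may handle the out-of-range
case differently, and the function name is hypothetical.

```python
def substring(s, pos, length=None):
    """Substring with 1-based 'pos'; negative 'pos' counts from the end."""
    if pos == 0:
        return ""
    if pos < 0:
        # Guard: without this, the start would land before the input.
        if -pos > len(s):
            return ""
        start = len(s) + pos
    else:
        start = pos - 1
    if start >= len(s):
        return ""
    if length is None:
        end = len(s)
    else:
        end = min(len(s), start + max(0, length))
    return s[start:end]
```

Under these assumptions substring("hello", -3) gives "llo", while
substring("hello", -6) gives the empty string instead of reading before
the start of the input.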
Change-Id: I7bd506f5d1119741a94817c34a017215b67cc26e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3351
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
(cherry picked from commit bad40d2beceffaacc409e34041a00d3ffbabf201)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3360
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
This is to help ensure naming is consistent across the platform and
also avoid confusion with HS2 "impersonation" which is something very
different.
Change-Id: I48c1b76dff75b92b11ddc7aab0eb9a3a5d20e489
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3315
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 931f6a66c0d8dff25b746d127dc1f36e96b12f98)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3326
This was one idea: just cast to __int128_t as a poor man's int96.
Unfortunately, it seems too slow: ~15x slower for add, ~10x for
multiply and 3x for divide compared to __int128_t.
Change-Id: I06eb3fa3ac1edc2c174873a73a252a0165911b1c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2433
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
VLOG(3) includes each row, which is useful much less often than the
serialized plan.
Change-Id: I933188f046dafb51da9d06583697792113a9165a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3289
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Before: if both operands to an arithmetic expression were null
literals, we would set the operand types and return type to INT. This
isn't correct for operators that don't support ints, e.g. divide
(there's a separate integer division function), since the function
signature wouldn't match the arithmetic expr's types. I think we
didn't run into problems because the BE uses void*s everywhere, but I
hit this when I switched the arithmetic functions to the UDF
interface.
In addition, some of the builtins were registered with the wrong
return type.
After: set the operand types to a type appropriate for the operator
before we set the return type, meaning the return type gets assigned
correctly using the existing logic.
Change-Id: I39fa147c178d895bdffaf1be676ddaa3af1d42c8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3255
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 2634932790d1f4a42ce64f73ec3722a8a7be04af)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3298
We run Wait() asynchronously for API compatibility, but many
QueryExecState functions cannot actually be run concurrently with
Wait() (e.g., Wait() opens output_exprs_, which are then evaluated in
FetchRows()).
Change-Id: I708aa23fdb238ee7aede1113790f48da2859cab9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2993
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 47f20b643e80f0f8640be9264d7ee3fc5d14dad0)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3226