Also add support for "SET", which returns a table of query options and
their respective values.
The front-end parses the option into a (key, value) pair, and the
existing backend logic is then used to set the option or to return the
result set.
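
A minimal sketch of the front-end step described above, assuming the
statement has the form "SET key=value" and that a bare "SET" asks for the
full option table. The function name and the lower-casing of the key are
illustrative assumptions, not the actual parser code.

```python
from typing import Optional, Tuple


def parse_set_statement(stmt: str) -> Optional[Tuple[str, str]]:
    """Return (key, value) for 'SET key=value', or None for a bare 'SET'.

    Hypothetical helper: the real front-end uses a generated parser.
    """
    body = stmt.strip().rstrip(";").strip()
    # The statement must start with the SET keyword followed by whitespace
    # (or by nothing at all, for the bare form).
    if body[:3].upper() != "SET" or (len(body) > 3 and not body[3].isspace()):
        raise ValueError("not a SET statement")
    rest = body[3:].strip()
    if not rest:
        return None  # bare SET: the caller lists all options and values
    key, sep, value = rest.partition("=")
    if not sep or not key.strip():
        raise ValueError("expected SET <key>=<value>")
    # Assumption: option names are treated case-insensitively.
    return key.strip().lower(), value.strip()
```

The returned pair would then be handed to the existing backend logic that
sets the option.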
Change-Id: I40dbd98537e2a73bdd5b27d8b2575a2fe6f8295b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3582
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
Tested-by: jenkins
(cherry picked from commit aa0f6a2fc1d3fe21f22cc7bc56887e1fdb02250b)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3614
Adds an aggregate function to compute equi-depth histograms. The UDA
creates a sample of the column values using weighted reservoir sampling
and computes the histogram from the sorted sample.
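
The two steps above can be sketched as follows. The commit does not say
which weighted reservoir sampling variant the UDA uses; this sketch swaps
in the Efraimidis-Spirakis "A-Res" scheme, and all function names are
hypothetical.

```python
import heapq
import random
from typing import Iterable, List, Sequence, Tuple


def weighted_reservoir_sample(
    items: Iterable[Tuple[float, float]], k: int, rng: random.Random
) -> List[float]:
    """Keep the k values with the largest keys u ** (1 / weight) (A-Res).

    Each item is a (value, weight) pair; weights must be positive.
    """
    heap: list = []  # min-heap of (key, arrival index, value)
    for i, (value, weight) in enumerate(items):
        key = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, i, value))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, i, value))
    return [value for _, _, value in heap]


def equi_depth_histogram(sample: Sequence[float], num_buckets: int) -> List[float]:
    """Return the upper bound of each bucket over the sorted sample.

    Every bucket covers roughly the same number of sample values.
    """
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        return []
    bounds = []
    for b in range(1, num_buckets + 1):
        idx = (b * n) // num_buckets - 1
        bounds.append(ordered[max(idx, 0)])
    return bounds
```

For example, `equi_depth_histogram(range(1, 101), 4)` yields the bucket
bounds 25, 50, 75 and 100.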
TODO:
* Extract highly frequent values into separate buckets (i.e. 'compressed
histogram').
* Expose separate finalize fn to produce samples and histogram data for stats
Change-Id: I314ce5fb8c73b935c4d61ea5bbd6816c59b3b41e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3552
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit c5c475712f88244e15160befaf4e99d6e165a148)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3608
With this change we introduce a proper type for subqueries using the
recently added complex type hierarchy. The type of a subquery can be a
ScalarType, a StructType or an ArrayType depending on how many columns
and rows are returned by the subquery's statement. The subquery type is
used to simplify the analysis of subquery predicates.
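
As a rough illustration of the idea, assuming (this mapping is not spelled
out above) that a single-row, single-column subquery is scalar, that
multiple columns form a struct, and that multiple rows form an array of
the row type. Types are modeled as plain strings and the function name is
hypothetical.

```python
from typing import Sequence


def subquery_type(column_types: Sequence[str], returns_single_row: bool) -> str:
    """Derive an illustrative type string for a subquery.

    Assumed mapping: one column -> scalar row type, several columns ->
    struct row type, more than one row -> array of the row type.
    """
    if not column_types:
        raise ValueError("a subquery must return at least one column")
    if len(column_types) == 1:
        row_type = column_types[0]
    else:
        row_type = "STRUCT<" + ", ".join(column_types) + ">"
    return row_type if returns_single_row else "ARRAY<" + row_type + ">"
```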
Change-Id: I82e76fcb511397ca58c611f26e77fb764cfa21ae
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3547
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3581
Currently, the scanner throws an IOException when it encounters an
empty literal, so no parse error is produced. Fix this by
adding a token type for empty literals. This new token doesn't appear
in the parser's grammar, so a proper parse error is generated
when the parser encounters an empty literal token.
Also add an FE regression test case.
Change-Id: Ib1ad0470ebc30b6fc827c9420745ecd83fc5e1ae
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3539
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 2160b527703caee853ccca239797b67090bda149)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3568
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
FromUtc and ToUtc use third-party libraries that rely on inline asm,
which isn't currently supported with JIT. The UDFs are included in this
commit, but their function symbols were not changed in
impala_functions.py.
Change-Id: I0824a434d4a26a39abf29bc6e47d51b5ad7991d6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3390
Reviewed-by: Paden Tomasello <paden.tomasello@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 8e149ccd78010b7a22d6fff1b0de5614848b02ac)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3548
Adds support for dropping all table and column stats from a table. Once incremental
stats are supported, this will provide the user a way to force a recompute of all
stats.
Change-Id: I27e03d5986b64eb91852bfc3417ffa971d432d6b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3533
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit f1f074f24bfdc77c4cef147fe9d26f27df80ab81)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3551
The following changes are included in this commit:
1. Modified the parser to parse nested queries.
2. Added functional parser tests for nested queries.
3. Modified the analyzer to perform semantic analysis of nested queries.
4. Added functional analysis tests for nested queries.
Change-Id: I0988cb22c9b52c79d57a7c59daa85ec4821643f9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3419
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3530
I'm not sure when we added this, but it does not have any benefit. The join nodes
combine the Tuple*s from the LHS and RHS anyway, and the extra Tuple* reserved in
the LHS row batch is never written to or read.
Change-Id: I40f88f417161ef72185e995b6c5b8f56f31fbfc4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3438
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
The root cause of the problem was that columns of a Table were not
added to the colsByName_ map with lower case keys on the Table.load() path
that is only exercised by the catalog server (the Impalads "load" tables
via Table.loadFromThrift() which did the right thing).
The above led to an empty column stats object being sent to the HMS
after an otherwise successful compute stats.
The problem was sporadic for the following reasons:
1. Only certain file formats like avro/snap/block have uppercase
column names in the HMS because the table was created by Hive
2. Some of our tests executed via run-tests.py, notably the
cancellation tests, aren't deterministic in which test vectors
are executed in a particular run. As a result, we only see the
cancellation test run compute stats on an avro/snap/block
once in a while (this behavior is unaffected by this patch).
This patch includes other minor bugfixes and simplifications
related to compute stats.
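
The root cause described above can be illustrated with a small sketch in
which every load path normalizes the key before touching the by-name map.
The class and method names are hypothetical stand-ins for the Java
catalog classes.

```python
from typing import Dict, Optional, Tuple


class Table:
    """Toy table that stores columns under lower-case keys."""

    def __init__(self) -> None:
        # Maps lower-cased column name -> (original name, type).
        self._cols_by_name: Dict[str, Tuple[str, str]] = {}

    def add_column(self, name: str, col_type: str) -> None:
        # Normalizing here keeps all load paths consistent, even when the
        # metastore reports upper-case names.
        self._cols_by_name[name.lower()] = (name, col_type)

    def get_column(self, name: str) -> Optional[Tuple[str, str]]:
        return self._cols_by_name.get(name.lower())
```

With this normalization, a lookup for "id" succeeds even if the column was
loaded as "ID", so the column stats are no longer silently dropped.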
Change-Id: I7cb5fe69404e35133eda314d9f7d072c78416ff1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3468
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3479
With IMPALA-1033 we disabled the counting of the number of NULLs in each column,
and that gave a 2x speed-up in the computation. But erroneously the value 0 was
being placed in the number of NULLs, instead of the correct -1 that indicates
'unknown'.
Change-Id: Ib882eb2a87e7e2469f606081cb2881461b441a45
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3377
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3378
For arithmetic ops, this is an optimization. The Add(decimal,decimal) already handles
the cast as part of the operation.
For binary predicates, the cast is bad and can lead to overflows. The decimal Compare()
function has custom logic to not overflow.
Change-Id: I9f5ad74ea89e9dfa5a3a40c1e07f7e9178bf1d52
(cherry picked from commit 6bffaa885542443ca559888d921853ecd194cbcb)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3414
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
IllegalStateException
This commit resolves IMPALA-1065 where the explain statement of TPC-DS
Q48 resulted in an IllegalStateException due to an overflow in the
cardinality estimation of a cross join operator. The fix is to check if
an overflow has occurred and reset the cardinality estimation to a valid
value.
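
A minimal sketch of such an overflow check. Python integers do not
overflow, so the signed 64-bit limit is modeled explicitly. Clamping to
the maximum value and using -1 for an unknown input are assumptions; the
commit only says the estimate is reset to a valid value.

```python
INT64_MAX = 2**63 - 1  # largest value of a signed 64-bit long


def cross_join_cardinality(lhs: int, rhs: int) -> int:
    """Estimate |lhs x rhs| without exceeding the signed 64-bit range."""
    if lhs < 0 or rhs < 0:
        return -1  # assumption: a negative input means 'unknown'
    product = lhs * rhs
    # Assumption: on overflow, fall back to the largest representable value.
    return product if product <= INT64_MAX else INT64_MAX
```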
Change-Id: I0e88fde07e7a5d86819af317e98bab7ac08d5a8a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3346
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3366
This patch cleans up registration and resolution of implicit table aliases as follows:
During analysis, we register all legal aliases of table/view references and remember
implicit table aliases that are ambiguous. When resolving table or column references
we consider all legal table aliases.
A table/view may have either one explicit alias or two implicit aliases.
The implicit aliases are the fully-qualified and the unqualified table name.
Within a single query, explicit and implicit aliases can be mixed as long as there
are no clashes between explicit and fully-qualified implicit aliases, and there are
no ambiguous references to implicit unqualified aliases.
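
The rules above can be sketched with a small registry. This is a
simplification with hypothetical names; in particular, letting an explicit
or fully-qualified alias take precedence over an unqualified implicit one
during lookup is an assumption of the sketch.

```python
from typing import Dict, List, Optional


class AliasRegistry:
    """Toy registry of explicit and implicit table aliases."""

    def __init__(self) -> None:
        # Explicit alias or fully-qualified name -> table.
        self._unique: Dict[str, str] = {}
        # Unqualified implicit alias -> all tables registered under it.
        self._unqualified: Dict[str, List[str]] = {}

    def register(self, db: str, table: str, explicit: Optional[str] = None) -> None:
        full = (db + "." + table).lower()
        unique = explicit.lower() if explicit else full
        if unique in self._unique:
            raise ValueError("duplicate table alias: " + unique)
        self._unique[unique] = full
        if explicit is None:
            # Remember the unqualified name; it may become ambiguous later.
            self._unqualified.setdefault(table.lower(), []).append(full)

    def resolve(self, alias: str) -> str:
        alias = alias.lower()
        if alias in self._unique:
            return self._unique[alias]
        candidates = self._unqualified.get(alias, [])
        if len(candidates) > 1:
            raise ValueError("ambiguous table alias: " + alias)
        if not candidates:
            raise ValueError("unknown table alias: " + alias)
        return candidates[0]
```

Registering db1.t and db2.t without explicit aliases is accepted; only a
later reference to the bare name "t" is rejected as ambiguous.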
Change-Id: I5734539aa821d130882491ec628dae8128d22e2f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3258
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3359
Once the expr refactoring goes in, the BE will not be able to evaluate
any TYPE_NULL exprs. This patch ensures that the FE casts all null
literals and slot refs before they reach the BE.
There are a bunch of places where we know the appropriate type and
just weren't using it before. This patch also introduces a few notable
hacks:
* Serializing null SlotRefs and NullLiterals as boolean NullLiterals
in case they weren't cast earlier.
* Converting null SlotRefs to NullLiterals in uncheckedCastTo() since
we don't need to read from the slot at all.
This works, but we should consider adding a final pass that cleans up
the plan tree and takes care of this.
Change-Id: Ic2ee181139059553d7f2d0e17e9dacaee241df17
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3294
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit a8a67ebcad12956a8260b4ea4189afb7ffab4b68)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3361
Before: The pre- and postconditions of expr substitution and cloning,
in particular their effect on the isAnalyzed_ flag, were unclear and
sometimes inconsistent, e.g., some literal exprs set isAnalyzed_ to
true in their c'tor. As a result, several places required ad-hoc
solutions like Expr.unsetIsAnalyzed() and Expr.reanalyze().
This patch cleans up expr substitution and cloning, summarized as follows:
Expr analysis:
All exprs start out with isAnalyzed_ = false. The flag is set to true
iff analyze() has been called on the expr.
Expr.clone():
Creates a deep copy of an expr including all its analysis state.
Expr.equals():
Comparison of expr trees ignores implicit casts. This simplifies expr
substitution because un/analyzed exprs can be easily compared/substituted.
ExprSubstitutionMap:
When adding a mapping, the rhs expr must be analyzed to allow
substitution across query blocks. There is no requirement on the lhs expr.
Expr substitution:
Substitution returns an analyzed clone of the original expr with exprs
substituted. While performing the substitution, implicit casts and analysis
state are removed such that the returned result has minimal implicit casts
and types.
There are two versions of the substitute functions: one that throws
exceptions and one that does not, because callers may have different
expectations on whether a substitution must succeed or not.
Numeric literals:
This patch combines IntLiteral and DecimalLiteral into a NumericLiteral.
Its main benefit is that analyze() always produces the same type, even if
the literal was implicitly cast and/or isAnalyzed was unset because of
expr substitution. This was not the case before because an implicit cast
could permanently turn an IntLiteral into a DecimalLiteral.
There is no more need for unsetIsAnalyzed() or reanalyze().
Change-Id: I646110e3714cff8ae8d5a378c25a107dd43334b6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3228
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3318
This is to help ensure naming is consistent across the platform and
also avoid confusion with HS2 "impersonation" which is something very
different.
Change-Id: I48c1b76dff75b92b11ddc7aab0eb9a3a5d20e489
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3315
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 931f6a66c0d8dff25b746d127dc1f36e96b12f98)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3326
Before: if both operands to an arithmetic expression were null
literals, we would set the operand types and return type to INT. This
isn't correct for operators that don't support ints, e.g. divide
(there's a separate integer division function), since the function
signature wouldn't match the arithmetic expr's types. I think we
didn't run into problems because the BE uses void*s everywhere, but I
hit this when I switched the arithmetic functions to the UDF
interface.
In addition, some of the builtins were registered with the wrong
return type.
After: set the operand types to a type appropriate for the operator
before we set the return type, meaning the return type gets assigned
correctly using the existing logic.
Change-Id: I39fa147c178d895bdffaf1be676ddaa3af1d42c8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3255
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 2634932790d1f4a42ce64f73ec3722a8a7be04af)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3298
This makes two changes to the privilege model:
* All CREATE statements now require ALL privileges on the parent object
* The user should always be able to perform "use default".
Additionally, it enables all of the authorization tests, fixes a bug with the
new privilege format from Sentry, and corrects an issue where a role wasn't
always being updated during an 'invalidate metadata' operation.
Change-Id: I92bab4ee0455574a2785bb5483b6d05611c3dfdc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3225
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Re-order union operands descending by their estimated per-host memory,
s.t. parent nodes can gauge the peak memory consumption of a MergeNode after
opening it during execution (a MergeNode opens its first operand in Open()).
Scan nodes are always ordered last because they can dynamically scale down their
memory usage, whereas many other nodes cannot (e.g., joins, aggregations).
One goal is to decrease the likelihood of a SortNode parent claiming too much
memory in its Open(), possibly causing the mem limit to be hit when subsequent
union operands are executed.
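
The ordering rule can be sketched as a single sort key. The Operand class
and its fields are hypothetical; ordering the scans among themselves by
descending memory is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Operand:
    name: str
    per_host_mem: int  # estimated per-host memory in bytes
    is_scan: bool = False


def order_union_operands(operands: List[Operand]) -> List[Operand]:
    """Non-scan operands first, each group by descending memory estimate."""
    # False sorts before True, so scans end up last.
    return sorted(operands, key=lambda op: (op.is_scan, -op.per_host_mem))
```

For example, a join estimated at 4 GB is placed before an aggregation at
1 GB, and both come before any scan regardless of the scan's estimate.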
Change-Id: Ia51caaffd55305ea3dbd2146cd55acc7da67f382
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3146
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3213
Tested-by: jenkins
Syntax is "CREATE TABLE name LIKE fileformat '/path/to/file'".
Supports all options that CREATE TABLE does. Currently only PARQUET is supported.
Run testdata/bin/create-load-data.sh after pulling this patch.
Change-Id: Ibb9fbb89dbde6acceb850b914c48d12f22b33f55
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2720
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3158
This commit fixes IMPALA-1040, in which inserting an invalid value into
a decimal partition column through Hive results in an uninformative
error message and, in some cases, causes the associated table to
disappear from Impala's catalog. With this fix, Impala always throws a
more informative error message that indicates the insertion of
an invalid partition key value.
Change-Id: I2855ea69944e269fb7e02b3825f44e64352151e7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3062
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3200
The compute stats statement was not quoting the DB and table names. If those names
clashed with keywords, then the compute stats would not execute due to a syntax
error.
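
A minimal sketch of the quoting, using backticks around each identifier.
The helper names and the example query are hypothetical and do not
reproduce the statements that compute stats actually issues; rejecting
names that contain a backtick is an assumption made to keep the sketch
simple.

```python
def quote_identifier(name: str) -> str:
    """Wrap an identifier in backticks so keywords can be used as names."""
    if not name or "`" in name:
        # Assumption: such names are rejected rather than escaped.
        raise ValueError("unsupported identifier: %r" % name)
    return "`" + name + "`"


def count_star_query(db: str, table: str) -> str:
    """Build an illustrative query with both name parts quoted."""
    return "SELECT COUNT(*) FROM {}.{}".format(
        quote_identifier(db), quote_identifier(table)
    )
```

A table named "select" in the "default" database then produces
SELECT COUNT(*) FROM `default`.`select`, which parses even though the
table name is a keyword.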
Change-Id: Ie08421246bb54a63a44eaf19d0d835da780b7033
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3170
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3198
Currently, the per-host mem cost for external sort is the same
as that of top-N, i.e. the size of the entire input.
This patch changes the estimated per-host mem cost to be the amount
of memory required for a 2-phase external sort.
Change-Id: I1bf976fa56a3dddf1a6697c75cf82d3a020cdad2
Previously, we tried to maintain as much of the scale as possible, but
this leads to very easy overflow cases since it requires dropping all
digits before the decimal point. This patch picks a midway point.
I did a little bit of research, and this is close to what SQL Server does
(the reference is linked in the function I changed).
Change-Id: I2100beead82559ef7b017c5f335acd532076c0d4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3150
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Previously, we would load a partition but would not perform any validation on whether
the partition metadata was actually well formed (could be converted into our
internal thrift representation). This updates our table loading logic to
add a validation check after creating a partition to ensure the metadata is
well formed.
The hang reported in IMPALA-1041 was because we were unable to convert the target table
toThrift() so it never got sent to the impalad from the catalog server due to invalid
partition metadata.
Change-Id: I4848427cbc923d6a0e515ba5154e900981dbf9ae
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2981
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3147
This patch changes the planning of a UnionStmt s.t. it always produces a single fragment
with a MergeNode connecting all child fragments as its root.
The data partition of the returned fragment and how the child fragments are merged
depends on the data partitions of the child fragments:
- All child fragments are unpartitioned or partitioned: The returned fragment
has an UNPARTITIONED or RANDOM data partition, respectively. The MergeNode absorbs
the plan trees of all child fragments.
- Mixed partitioned/unpartitioned child fragments: The returned fragment is
RANDOM partitioned. The plan trees of all partitioned child fragments are absorbed
into the MergeNode. All unpartitioned child fragments are connected to the
MergeNode via a RANDOM exchange, and remain unchanged otherwise.
Also adds support for random partitioned data exchanges.
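
The case analysis above can be sketched as a function over the child
fragments' partitioning. Fragments are reduced to a boolean ("is
partitioned") and the function name and return shape are hypothetical.

```python
from typing import List, Tuple


def plan_union(child_is_partitioned: List[bool]) -> Tuple[str, List[int], List[int]]:
    """Return (data partition, absorbed child indexes, exchanged child indexes).

    Absorbed children have their plan trees merged into the MergeNode;
    exchanged children are connected through a RANDOM exchange.
    """
    indexes = list(range(len(child_is_partitioned)))
    if not any(child_is_partitioned):
        # All children unpartitioned: absorb everything, stay unpartitioned.
        return "UNPARTITIONED", indexes, []
    if all(child_is_partitioned):
        # All children partitioned: absorb everything, RANDOM partition.
        return "RANDOM", indexes, []
    # Mixed: absorb the partitioned children, exchange the unpartitioned ones.
    absorbed = [i for i in indexes if child_is_partitioned[i]]
    exchanged = [i for i in indexes if not child_is_partitioned[i]]
    return "RANDOM", absorbed, exchanged
```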
Change-Id: I82b2d12c104d98c4e7133234653ee1b67658ef7a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2876
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3143
Updates 'invalidate metadata' to force an update of the Sentry Policy Service metadata,
if Sentry is enabled.
Also fixes an issue with broken connections to the Sentry Policy Service by creating
a new connection for each Sentry RPC. This is not desirable (we should be able to
cache the connections), but currently there is no way to detect broken connections
and "re-open" them (tracked by SENTRY-296).
I am working on adding a test for this, but since Impala does not yet support
GRANT/REVOKE it is a bit tricky.
Change-Id: I681037beeec9cbf378126d799f6e07c89a4dc29c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3054
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
filter out all partitions
This commit fixes IMPALA-1028 in which the cardinality estimate is not
correct when all the partitions of a partitioned table are filtered out.
To fix this issue we make sure that the estimated result cardinality of
the scan node is zero when all the partitions are filtered out.
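
A minimal sketch of the intended behavior, assuming the estimate is
otherwise the sum of the row counts of the partitions that survive
pruning, and that a negative row count marks missing stats. Both are
assumptions of the sketch, as is the function name.

```python
from typing import Sequence


def scan_cardinality(selected_partition_rows: Sequence[int]) -> int:
    """Estimate scan output rows from the partitions left after pruning."""
    if not selected_partition_rows:
        return 0  # every partition was filtered out
    if any(rows < 0 for rows in selected_partition_rows):
        return -1  # assumption: unknown stats make the estimate unknown
    return sum(selected_partition_rows)
```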
Change-Id: I225949eb2e8f905a5d0f678d7f199fb95ba4aab0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3063
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3083
Order-by without limit in the query statement corresponding to an INSERT
or CTAS must be ignored because
i) There is no guarantee on row ordering when the target table is scanned again
i.e. 'select * from table' may return rows in any order, regardless of how the
rows were inserted, and
ii) Ignoring (and not flagging an error) is consistent with the treatment of
order-by w/o limit in nested queries, union operands etc.
Currently, an order-by w/o limit in a QueryStmt is only evaluated if the analyzer is
the root analyzer (has no ancestors).
However, a new child analyzer is not created for the QueryStmt in an InsertStmt, so this
technique fails for inserts. The correct thing to do is to use a child analyzer for that
QueryStmt, but this has spill-over scoping effects for analysis of with clauses.
This patch adds a flag, similar to the isExplain flag to the analyzer to identify
insert statements.
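
The resulting decision can be sketched as a predicate. The parameter
names are hypothetical stand-ins for the analyzer state, and the sketch
only models the three conditions mentioned above.

```python
def should_evaluate_order_by(
    has_limit: bool, is_root_analyzer: bool, is_insert_stmt: bool
) -> bool:
    """Decide whether an order-by clause has any effect on the plan."""
    if has_limit:
        # Order-by with a limit selects specific rows, so it always matters.
        return True
    # Without a limit, ordering only matters for a top-level query that
    # returns rows to the client, not for the query of an INSERT or CTAS.
    return is_root_analyzer and not is_insert_stmt
```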
Change-Id: I9ded587cfea75eca0b7a43ee9b0df0a6c8ecb602
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3044
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3060