Commit Graph

839 Commits

Author SHA1 Message Date
Dan Hecht
1fee56cb26 IMPALA-1080: Implement "SET <query_option>" as SQL statement.
Also add support for "SET", which returns a table of query options and
their respective values.

The front-end parses the option into a (key, value) pair and then the
existing backend logic is used to set the option, or return the result
sets.
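
As a rough illustration of the (key, value) split described above, here is a minimal sketch. The regular expression, the upper-casing of option names, and the function names `parse_set` and `execute_set` are assumptions made for this example; they are not taken from the Impala front-end.

```python
import re

# Hypothetical grammar for illustration: "SET" alone, or "SET <key>=<value>".
_SET_RE = re.compile(r"^\s*SET(?:\s+(\w+)\s*=\s*(.*?))?\s*;?\s*$", re.IGNORECASE)


def parse_set(stmt):
    """Return (key, value) for "SET key=value", or None for a bare "SET"."""
    match = _SET_RE.match(stmt)
    if match is None:
        raise ValueError("not a SET statement: %r" % stmt)
    key, value = match.group(1), match.group(2)
    if key is None:
        return None
    # Assumption: option names are treated case-insensitively.
    return key.upper(), value


def execute_set(stmt, options):
    """Apply a SET statement to an options dict, or list all options."""
    parsed = parse_set(stmt)
    if parsed is None:
        # Bare "SET": return a table of (option, value) rows.
        return sorted(options.items())
    key, value = parsed
    options[key] = value
    return []
```

A bare `SET` returns the current options as rows, while `SET key=value` updates the dict and returns no rows.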

Change-Id: I40dbd98537e2a73bdd5b27d8b2575a2fe6f8295b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3582
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
Tested-by: jenkins
(cherry picked from commit aa0f6a2fc1d3fe21f22cc7bc56887e1fdb02250b)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3614
2014-07-25 10:25:09 -07:00
Matthew Jacobs
b83aa4984b Add compute histograms aggregate function
Adds an aggregate function to compute equi-depth histograms. The UDA
creates a sample of the column values using weighted reservoir sampling
and computes the histogram from the sorted sample.
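
The two steps named above (sample, then build the histogram from the sorted sample) can be sketched as follows. The commit does not say which weighted reservoir algorithm the UDA uses; this sketch assumes the Efraimidis-Spirakis "A-Res" scheme, and the function names and bucket-boundary rule are illustrative only.

```python
import heapq
import random


def weighted_reservoir_sample(values, weights, k, rng):
    """Keep the k items with the largest keys u**(1/w), with u uniform in [0, 1).

    Weights must be positive. This is the A-Res scheme, assumed here
    for illustration.
    """
    heap = []  # min-heap of (key, index, value); the index breaks ties
    for i, (value, weight) in enumerate(zip(values, weights)):
        key = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, i, value))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, i, value))
    return [value for _, _, value in heap]


def equi_depth_histogram(sample, num_buckets):
    """Return bucket upper bounds so each bucket holds about the same
    number of sampled values."""
    ordered = sorted(sample)
    n = len(ordered)
    bounds = []
    for b in range(1, num_buckets + 1):
        idx = max((b * n) // num_buckets - 1, 0)
        bounds.append(ordered[idx])
    return bounds
```

With uniform weights the sampler behaves like plain reservoir sampling; the histogram quality then depends on the sample size `k`.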

TODO:
* Extract highly frequent values into separate buckets (i.e. 'compressed
  histogram').
* Expose separate finalize fn to produce samples and histogram data for stats

Change-Id: I314ce5fb8c73b935c4d61ea5bbd6816c59b3b41e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3552
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit c5c475712f88244e15160befaf4e99d6e165a148)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3608
2014-07-25 00:21:10 -07:00
Alex Behm
19bab59854 Create/alter/describe tables with complex types.
This patch adds parsing of complex types and tests for using complex
types in various exprs and create/alter/describe stmts.

Change-Id: Ibc211a560c889f5ccfb616813700b923c89d8245
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3577
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3594
2014-07-23 17:26:14 -07:00
Dimitris Tsirogiannis
6850d7ff90 Incorporate complex types in the analysis of subqueries
With this change we introduce a proper type for subqueries using the
recently added complex type hierarchy. The type of a subquery can be a
ScalarType, a StructType or an ArrayType depending on how many columns
and rows are returned by the subquery's statement. The subquery type is
used to simplify the analysis of subquery predicates.

Change-Id: I82e76fcb511397ca58c611f26e77fb764cfa21ae
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3547
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3581
2014-07-22 13:24:42 -07:00
Alex Behm
e9864d5f78 Introduce type hierarchy and add complex types.
This patch replaces ColumnType with a hierarchy of types that models
the existing scalar types as well as the new complex types ARRAY, MAP,
and STRUCT.

Change-Id: Ia895f41153e99febb0c35412acac12689c3c2064
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3491
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3538
2014-07-21 20:00:46 -07:00
Dan Hecht
80cc3d88cb IMPALA-1078: report better error with empty literals.
Currently, the scanner is throwing an IOException when encountering an
empty literal, and so no parse error is formulated.  Fix this by
adding a token type for empty literals.  This new token doesn't appear
in the parser's grammar, and so a nice parse error will be generated
when the parser encounters an empty literal token.

Also add a regression fe test case.

Change-Id: Ib1ad0470ebc30b6fc827c9420745ecd83fc5e1ae
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3539
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 2160b527703caee853ccca239797b67090bda149)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3568
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
2014-07-21 19:10:07 -07:00
Paden Tomasello
879a40913c Implemented UDFs for timestamp functions.
FromUtc and ToUtc use third-party libraries that rely on inline asm, which
isn't currently supported with JIT. The UDFs are included in this
commit, but the function symbols were not changed in
impala_functions.py

Change-Id: I0824a434d4a26a39abf29bc6e47d51b5ad7991d6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3390
Reviewed-by: Paden Tomasello <paden.tomasello@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 8e149ccd78010b7a22d6fff1b0de5614848b02ac)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3548
2014-07-21 15:27:46 -07:00
Lenni Kuff
7157f54bbe Support DROP STATS <table name>
Adds support for dropping all table and column stats from a table. Once incremental
stats are supported, this will provide the user a way to force a recompute of all
stats.

Change-Id: I27e03d5986b64eb91852bfc3417ffa971d432d6b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3533
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit f1f074f24bfdc77c4cef147fe9d26f27df80ab81)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3551
2014-07-21 10:28:16 -07:00
Dimitris Tsirogiannis
b2bc920c6c Parsing and analyzing nested queries
The following changes are included in this commit:
1. Modified the parser to parse nested queries.
2. Added functional parser tests for nested queries.
3. Modified the analyzer to perform semantic analysis of nested queries.
4. Added functional analysis tests for nested queries.

Change-Id: I0988cb22c9b52c79d57a7c59daa85ec4821643f9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3419
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3530
2014-07-17 12:25:25 -07:00
Alex Behm
075c9bc0d9 Fix WITH-clause scoping of inserts by analyzing query stmts in a child analyzer.
Change-Id: I5dac28b6f1ddda6d2b369aef273f977fc2d9aca2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3497
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3503
2014-07-15 18:48:28 -07:00
Nong Li
1ce1c47184 Don't propagate parent tuple ids to child nodes.
I'm not sure when we added this but it does not have any benefit. The join nodes
combine the tuple*'s from the LHS and RHS anyway and the extra Tuple* reserved in
the LHS row batch is never written to or read.

Change-Id: I40f88f417161ef72185e995b6c5b8f56f31fbfc4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3438
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-07-15 16:57:09 -07:00
Alex Behm
ebc70d921d Fixes for sporadic build failure in compute stats cancellation.
The root cause of the problem was that columns of a Table were not
added to the colsByName_ map with lower case keys on the Table.load() path
that is only exercised by the catalog server (the Impalads "load" tables
via Table.loadFromThrift() which did the right thing).

The above led to an empty column stats object being sent to the HMS
after an otherwise successful compute stats.

The problem was sporadic for the following reasons:
1. Only certain file formats like avro/snap/block have uppercase
   column names in the HMS because the table was created by Hive
2. Some of our tests executed via run-tests.py, notably the
   cancellation tests, aren't deterministic in which test vectors
   are executed in a particular run. As a result, we only see the
   cancellation test run compute stats on an avro/snap/block
   once in a while (this behavior is unaffected by this patch).

This patch includes other minor bugfixes and simplifications
related to compute stats.

Change-Id: I7cb5fe69404e35133eda314d9f7d072c78416ff1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3468
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3479
2014-07-10 19:09:08 -07:00
Alex Behm
04ef96e873 Clean up resolution of table and view references.
Change-Id: I2bcc21d0dab1718b0c11a4e27b59e02d934aa79c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2511
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3457
2014-07-09 17:59:13 -07:00
Alex Behm
21c9eb68b1 Restore casts stripped from grouping exprs by substitution.
Change-Id: I2a317025f9a8549beed7cf79b463239e11a6a2d0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3352
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3432
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-07-08 10:45:43 -07:00
Ippokratis Pandis
e1ae5fe95a IMPALA-1068: COMPUTE STATS should place -1 in #NULLs
With IMPALA-1033 we disabled the counting of the number of NULLs in each column,
and that gave a 2x speed-up in the computation. But erroneously the value 0 was
being placed in the number of NULLs, instead of the correct -1 that indicates
'unknown'.

Change-Id: Ib882eb2a87e7e2469f606081cb2881461b441a45
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3377
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3378
2014-07-07 15:13:25 -07:00
Skye Wanderman-Milne
b572fe0af5 Remove unnecessary decimal casts for some builtins.
For arithmetic ops, this is an optimization. The Add(decimal,decimal) already handles
the cast as part of the operation.

For binary predicates, the cast is bad and can lead to overflows. The decimal Compare()
function has custom logic to not overflow.

Change-Id: I9f5ad74ea89e9dfa5a3a40c1e07f7e9178bf1d52
(cherry picked from commit 6bffaa885542443ca559888d921853ecd194cbcb)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3414
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-07-03 21:32:51 -07:00
Dimitris Tsirogiannis
cf782fe500 IMPALA-1065: Running explain on attached (TPC-DS) query throws
IllegalStateException

This commit resolves IMPALA-1065 where the explain statement of TPC-DS
Q48 resulted in an IllegalStateException due to an overflow in the
cardinality estimation of a cross join operator. The fix is to check if
an overflow has occurred and reset the cardinality estimation to a valid
value.
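
A minimal sketch of the overflow check described above. Python integers do not overflow, so the signed 64-bit limit is modelled explicitly. Clamping to the maximum value and using -1 for "unknown" are assumptions for this example; the commit only says the estimate is reset to "a valid value".

```python
LONG_MAX = 2**63 - 1  # largest value of a signed 64-bit integer


def cross_join_cardinality(lhs_rows, rhs_rows):
    """Estimate the output cardinality of a cross join.

    Assumption: a negative input means "unknown" and is propagated as -1.
    """
    if lhs_rows < 0 or rhs_rows < 0:
        return -1
    product = lhs_rows * rhs_rows
    # In 64-bit arithmetic this product could wrap around to a negative
    # number; detect that case and fall back to a valid estimate.
    if product > LONG_MAX:
        return LONG_MAX
    return product
```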

Change-Id: I0e88fde07e7a5d86819af317e98bab7ac08d5a8a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3346
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3366
2014-07-01 19:23:29 -07:00
Alex Behm
003da0ec59 IMPALA-1061: Fix resolution of implicit table aliases in views and star expressions.
This patch cleans up registration and resolution of implicit table aliases as follows:
During analysis, we register all legal aliases of table/view references and remember
implicit table aliases that are ambiguous. When resolving table or column references
we consider all legal table aliases.

A table/view may have either one explicit alias or two implicit aliases.
The implicit aliases are the fully-qualified and the unqualified table name.
Within a single query, explicit and implicit aliases can be mixed as long as there
are no clashes between explicit and fully-qualified implicit aliases, and there are
no ambiguous references to implicit unqualified aliases.
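
The alias rules above can be sketched roughly as below. The class and method names are hypothetical and the error handling is simplified; in particular, letting an explicit alias take precedence over an unqualified implicit alias of the same name is an assumption of this sketch, not something the commit states.

```python
class AliasRegistry:
    """Tracks explicit and implicit table aliases within one query block."""

    def __init__(self):
        self._unique = {}        # explicit and fully-qualified aliases
        self._unqualified = {}   # implicit unqualified aliases
        self._ambiguous = set()  # unqualified aliases seen more than once

    def _add_unique(self, alias, full_name):
        if alias in self._unique:
            raise ValueError("duplicate table alias: " + alias)
        self._unique[alias] = full_name

    def register(self, db, table, explicit_alias=None):
        full_name = (db + "." + table).lower()
        if explicit_alias is not None:
            # One explicit alias replaces both implicit aliases.
            self._add_unique(explicit_alias.lower(), full_name)
            return
        # Two implicit aliases: fully-qualified and unqualified.
        self._add_unique(full_name, full_name)
        short = table.lower()
        if short in self._unqualified:
            # Remember the clash; it is only an error if referenced.
            self._ambiguous.add(short)
        else:
            self._unqualified[short] = full_name

    def resolve(self, alias):
        alias = alias.lower()
        if alias in self._unique:
            return self._unique[alias]
        if alias in self._ambiguous:
            raise ValueError("ambiguous table alias: " + alias)
        if alias in self._unqualified:
            return self._unqualified[alias]
        raise KeyError(alias)
```

Registering `db1.t` and `db2.t` without explicit aliases succeeds; only a later reference to the bare name `t` fails, which mirrors the "no ambiguous references" rule.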

Change-Id: I5734539aa821d130882491ec628dae8128d22e2f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3258
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3359
2014-07-01 17:50:21 -07:00
Skye Wanderman-Milne
f0fb28158b FE changes to avoid shipping null-type expressions to the BE.
Once the expr refactoring goes in, the BE will not be able to evaluate
any TYPE_NULL exprs. This patch ensures that the FE casts all null
literals and slot refs before they reach the BE.

There are a bunch of places where we know the appropriate type and
just weren't using it before. This patch also introduces a few notable
hacks:

* Serializing null SlotRefs and NullLiterals as boolean NullLiterals
  in case they weren't cast earlier.
* Converting null SlotRefs to NullLiterals in uncheckedCastTo() since
  we don't need to read from the slot at all.

This works, but we should consider adding a final pass that cleans up
the plan tree and takes care of this.

Change-Id: Ic2ee181139059553d7f2d0e17e9dacaee241df17
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3294
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit a8a67ebcad12956a8260b4ea4189afb7ffab4b68)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3361
2014-07-01 15:48:08 -07:00
Nong Li
9abca8321b Fix result precision in decimal round/truncate/etc and overflow.
Change-Id: I23840734fd5b7ab7404d94f6df05410b153354de
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3338
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-07-01 08:05:39 -07:00
Matthew Jacobs
65c1a6f21e Remove SOURCE keyword by parsing as an identifier and checking the value
Reverts "IMPALA-1033: Remove SOURCE keyword; very common identifier"

Change-Id: I3fcf6d02786e00287b564cff0a823d0c19504e7a
2014-06-30 16:47:47 -07:00
Dimitris Tsirogiannis
630d90392e CDH-20089: Query planning failed in HdfsScanNode.evalBinaryPredicate
This commit fixes issue CDH-20089 where an error is thrown when we have
a binary predicate on a partition key that has no values.

Change-Id: I3b5cefb4d7193045fc6fc5e94766589c2299b5b1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3327
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3335
2014-06-30 15:05:31 -07:00
Alex Behm
7777fbff53 Clean up expr substitution and cloning.
Before: The pre- and postconditions of expr substitution and cloning,
in particular, their effect on the isAnalyzed_ flag were unclear and
sometimes inconsistent e.g., some literal exprs set isAnalyzed_ to
true in their c'tor. As a result, several places required ad-hoc
solutions like Expr.unsetIsAnalyzed() and Expr.reanalyze().

This patch cleans up expr substitution and cloning, summarized as follows:

Expr analysis:
All exprs start out with isAnalyzed_ = false. The flag is set to true
iff analyze() has been called on the expr.

Expr.clone():
Creates a deep copy of an expr including all its analysis state.

Expr.equals():
Comparison of expr trees ignores implicit casts. This simplifies expr
substitution because un/analyzed exprs can be easily compared/substituted.

ExprSubstitutionMap:
When adding a mapping, the rhs expr must be analyzed to allow
substitution across query blocks. There is no requirement on the lhs expr.

Expr substitution:
Substitution returns an analyzed clone of the original expr with exprs
substituted. While performing the substitution, implicit casts and analysis
state are removed such that the returned result has minimal implicit casts
and types.
There are two versions of the substitute functions: one that throws exceptions
and one that does not, because the caller may have different expectations on
whether a substitution must succeed or not.

Numeric literals:
This patch combines IntLiteral and DecimalLiteral into a NumericLiteral.
Its main benefit is that analyze() always produces the same type, even if
the literal was implicitly cast and/or isAnalyzed was unset because of
expr substitution. This was not the case before because an implicit cast
could permanently turn an IntLiteral into a DecimalLiteral.

There is no more need for unsetIsAnalyzed() or reanalyze().

Change-Id: I646110e3714cff8ae8d5a378c25a107dd43334b6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3228
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3318
2014-06-30 10:18:26 -07:00
Lenni Kuff
ad933ec765 Switch terminology of 'impersonated user' to 'delegated user'
This is to help ensure naming is consistent across the platform and
also avoid confusion with HS2 "impersonation" which is something very
different.

Change-Id: I48c1b76dff75b92b11ddc7aab0eb9a3a5d20e489
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3315
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 931f6a66c0d8dff25b746d127dc1f36e96b12f98)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3326
2014-06-28 20:46:06 -07:00
Nong Li
163750f170 Fix decimal multiply result precision off by 1.
Change-Id: I860e0d13ee9bae7d3e180103a22fe7606a320b13
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3249
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-27 11:22:05 -07:00
Skye Wanderman-Milne
5305b17121 IMPALA-1053: only log unsupported type warning once
Change-Id: Ibb34e4632f87ac192bb58d4d6616b41e7dac53d2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3140
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 51a1db1ceaa0a928f364f333c4351abefd90b2f8)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3297
2014-06-26 23:58:40 -07:00
Skye Wanderman-Milne
3a6d6b71cb Fix NULL handling in ArithmeticExpr
Before: if both operands to an arithmetic expression were null
literals, we would set the operand types and return type to INT. This
isn't correct for operators that don't support ints, e.g. divide
(there's a separate integer division function), since the function
signature wouldn't match the arithmetic expr's types. I think we
didn't run into problems because the BE uses void*s everywhere, but I
hit this when I switched the arithmetic functions to the UDF
interface.

In addition, some of the builtins were registered with the wrong
return type.

After: set the operand types to a type appropriate for the operator
before we set the return type, meaning the return type gets assigned
correctly using the existing logic.

Change-Id: I39fa147c178d895bdffaf1be676ddaa3af1d42c8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3255
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 2634932790d1f4a42ce64f73ec3722a8a7be04af)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3298
2014-06-26 23:52:02 -07:00
Lenni Kuff
13f487ae31 CDH-19900: Change to make Hive/Impala privilege models consistent
This makes two changes to the privilege model:
* All CREATE statements now require ALL privileges on the parent object
* The user should always be able to perform "use default".

Additionally, it enables all of the authorization tests, fixes a bug with the
new privilege format from Sentry, and corrects an issue where a role wasn't
always being updated during an 'invalidate metadata' operation.

Change-Id: I92bab4ee0455574a2785bb5483b6d05611c3dfdc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3225
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-23 19:08:37 -07:00
Nong Li
a7beb12540 [CDH5] Fix column stats for decimal.
Change-Id: I72b31f6431bf6259e759fd290200fd1a755f82c6
2014-06-20 23:03:06 -07:00
Alex Behm
881f3a8c33 Re-order union operands descending by their estimated per-host memory.
Re-order union operands descending by their estimated per-host memory,
s.t. parent nodes can gauge the peak memory consumption of a MergeNode after
opening it during execution (a MergeNode opens its first operand in Open()).
Scan nodes are always ordered last because they can dynamically scale down their
memory usage, whereas many other nodes cannot (e.g., joins, aggregations).
One goal is to decrease the likelihood of a SortNode parent claiming too much
memory in its Open(), possibly causing the mem limit to be hit when subsequent
union operands are executed.
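
The ordering rule above amounts to a two-part sort key. The sketch below uses a hypothetical `Operand` tuple in place of real plan nodes; ordering scan nodes among themselves by descending memory is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for a union operand's plan root.
Operand = namedtuple("Operand", ["name", "per_host_mem_bytes", "is_scan"])


def order_union_operands(operands):
    """Non-scan operands first, by descending estimated per-host memory;
    scan operands always last."""
    return sorted(operands, key=lambda op: (op.is_scan, -op.per_host_mem_bytes))
```

Because `False` sorts before `True`, every non-scan operand precedes every scan operand regardless of its memory estimate.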

Change-Id: Ia51caaffd55305ea3dbd2146cd55acc7da67f382
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3146
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3213
Tested-by: jenkins
2014-06-20 18:46:10 -07:00
Victor Bittorf
2d7f2e19b2 IMPALA-938: Infer schema from Parquet file
Syntax is "CREATE TABLE name LIKE fileformat '/path/to/file'".
Supports all options that CREATE TABLE does. Currently only PARQUET is supported.
Run testdata/bin/create-load-data.sh after pulling this patch.

Change-Id: Ibb9fbb89dbde6acceb850b914c48d12f22b33f55
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2720
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3158
2014-06-20 17:38:01 -07:00
Dimitris Tsirogiannis
7dbd3a5860 IMPALA-1040: Reading a decimal partitioned column with invalid values
This commit fixes IMPALA-1040. When an invalid value was inserted into a
decimal partition column through Hive, Impala produced an uninformative
error message and, in some cases, the associated table disappeared from
Impala's catalog. With this fix, Impala always throws a more informative
error message that indicates the insertion of an invalid partition key
value.

Change-Id: I2855ea69944e269fb7e02b3825f44e64352151e7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3062
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3200
2014-06-20 12:46:52 -07:00
Ippokratis Pandis
6026f1ebe1 IMPALA-1055: Compute stats query statements don't quote DB and table names
The compute stats statement was not quoting the DB and table names. If those names
clashed with keywords, the compute stats statement would fail with a syntax
error.
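
A minimal sketch of the quoting idea, assuming backtick-quoted identifiers. The query text and function names are illustrative and are not the statements that compute stats actually generates; names that themselves contain backticks are not handled here.

```python
def quote_identifier(name):
    """Wrap an identifier in backticks so keywords are accepted as names."""
    return "`" + name + "`"


def build_row_count_query(db, table):
    """Build an illustrative child query with quoted DB and table names."""
    return "SELECT COUNT(*) FROM {}.{}".format(
        quote_identifier(db), quote_identifier(table)
    )
```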

Change-Id: Ie08421246bb54a63a44eaf19d0d835da780b7033
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3170
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3198
2014-06-20 09:32:52 -07:00
Srinath Shankar
9a57334e34 IMPALA-1052: Display a warning when an order-by without limit is ignored
Displays a warning and the ignored order by clause

Change-Id: I3738d60e914de02bec552478279d8c4314426676
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3102
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3175
2014-06-19 18:42:32 -07:00
Srinath Shankar
6afbc412ff Lower estimated per-host mem cost for sort nodes
Currently, the per-host mem cost for external sort is the same
as that of top-N, i.e. the size of the entire input.
This patch changes the estimated per-host mem cost to be the amount
of memory required for a 2-phase external sort.

Change-Id: I1bf976fa56a3dddf1a6697c75cf82d3a020cdad2
2014-06-19 18:00:39 -07:00
Nong Li
11b4d85bf1 Change precision/scale truncate in decimal divide analysis.
Previously, we tried to maintain as much of the scale as possible but
this leads to very easy overflow cases since it requires dropping all
digits before the decimal point. This patch picks a midway point.

I did a little bit of research; this is close to what SQL Server does
(the reference is linked in the function I changed).

Change-Id: I2100beead82559ef7b017c5f335acd532076c0d4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3150
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-19 17:16:29 -07:00
Alex Behm
70d7ff07af CDH-19856: Disable Hive's stats autogathering.
Change-Id: I04e91f91d29b7863848a750e362c9d94469df7f2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3156
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3169
2014-06-19 16:48:34 -07:00
Alex Behm
ef6705d7e0 Rename MergeNode to UnionNode.
Change-Id: I9e3675a103757db1345b04bd1d102d2719efddd0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3128
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3154
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-06-19 12:44:21 -07:00
Nong Li
e7f7eab1b5 Missing reanalyze() in select stmt after substitution.
Change-Id: I71203ebb02cf64e5bf259d2f6c5faf951f87f0d2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3144
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-19 02:52:10 -07:00
Lenni Kuff
7dacd32983 IMPALA-1041: Detect partitions that contain invalid MD
Previously, we would load a partition but would not perform any validation on whether
the partition metadata was actually well formed (could be converted into our
internal thrift representation). This updates our table loading logic to
add an additional validation check after creating a partition to ensure the metadata is
well formed.
The hang reported in IMPALA-1041 was because we were unable to convert the target table
toThrift() so it never got sent to the impalad from the catalog server due to invalid
partition metadata.

Change-Id: I4848427cbc923d6a0e515ba5154e900981dbf9ae
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2981
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3147
2014-06-19 00:57:32 -07:00
Alex Behm
677062be3d Rework planning of unions s.t. a UnionStmt produces a single MergeNode.
This patch changes the planning of a UnionStmt s.t. it always produces a single fragment
with a MergeNode connecting all child fragments as its root.
The data partition of the returned fragment and how the child fragments are merged
depends on the data partitions of the child fragments:
- All child fragments are unpartitioned or partitioned: The returned fragment
  has an UNPARTITIONED or RANDOM data partition, respectively. The MergeNode absorbs
  the plan trees of all child fragments.
- Mixed partitioned/unpartitioned child fragments: The returned fragment is
  RANDOM partitioned. The plan trees of all partitioned child fragments are absorbed
  into the MergeNode. All unpartitioned child fragments are connected to the
  MergeNode via a RANDOM exchange, and remain unchanged otherwise.
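
The two cases above can be summarized in a small decision function. This is only a sketch: child fragments are reduced to a boolean "is partitioned" flag, and the string labels and function name are hypothetical.

```python
def plan_union(children_partitioned):
    """Decide the union fragment's data partition and how each child joins it.

    children_partitioned: list of bools, True if that child fragment is
    partitioned. Returns (data_partition, per_child_actions).
    """
    if not children_partitioned:
        raise ValueError("a union needs at least one child fragment")
    if not any(children_partitioned):
        # All children unpartitioned: absorb every plan tree.
        return "UNPARTITIONED", ["absorb"] * len(children_partitioned)
    # All partitioned, or mixed: the result is RANDOM partitioned.
    # Partitioned children are absorbed; unpartitioned children stay in
    # their own fragment and feed the MergeNode through a RANDOM exchange.
    actions = ["absorb" if p else "random_exchange" for p in children_partitioned]
    return "RANDOM", actions
```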

Also adds support for random partitioned data exchanges.

Change-Id: I82b2d12c104d98c4e7133234653ee1b67658ef7a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2876
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3143
2014-06-19 00:56:58 -07:00
Lenni Kuff
d8175f7109 [CDH5] Fix build break due to Sentry grantDatabasePrivilege() API change
Change-Id: I4fb46a972c9c514551b2215e5abd7ad32f69046e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3145
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-06-18 22:22:12 -07:00
Alex Behm
9dc883b140 IMPALA-1005: Print consistent plan fragment ids in explain plan and runtime profile.
Change-Id: I63b59a896dc9dc0c9ed1d5e889f7b5626ba61202
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3037
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3124
2014-06-18 15:44:43 -07:00
Lenni Kuff
220b3ac5a8 [CDH5] CDH-19662: Add sentry policy refresh in 'invalidate metadata'
Updates 'invalidate metadata' to force an update of the Sentry Policy Service metadata,
if Sentry is enabled.
Also fixes an issue with broken connection to the Sentry Policy Service by creating
a new connection for each Sentry RPC. This is not desirable (we should be able to
cache the connections), but currently there is no way to detect broken connections
and "re-open" them (tracked by SENTRY-296).

I am working on adding a test for this, but since Impala does not yet support
GRANT/REVOKE, it is a bit tricky.

Change-Id: I681037beeec9cbf378126d799f6e07c89a4dc29c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3054
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-06-18 12:27:50 -07:00
anusha
6b3689e8c7 IMPALA-973: Fix for invalidate metadata behaviour
Change-Id: Ie0c4c458b0919978b03ebaba28bf37950dd34643
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3009
Tested-by: jenkins
Reviewed-by: Anusha Dasarakothapalli <anusha.dasarakothapalli@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3091
2014-06-17 12:18:50 -07:00
Nong Li
a136277a67 Fix missing clean up in AnalyzeExprsTest.
Change-Id: Ic22981d0f2309f7cedf30d081d92542866255b97
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3086
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-16 23:56:23 -07:00
Dimitris Tsirogiannis
67eb5eb3a8 IMPALA-1028: Cardinality estimate is wrong for partitioned tables if we
filter out all partitions

This commit fixes IMPALA-1028 in which the cardinality estimate is not
correct when all the partitions of a partitioned table are filtered out.
To fix this issue we make sure that the estimated result cardinality of
the scan node is zero when all the partitions are filtered out.
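
A small sketch of the rule above. Using -1 for "unknown" row counts and summing per-partition row counts are assumptions of this example, not details taken from the commit.

```python
def scan_cardinality(selected_partition_rows):
    """Estimate scan output rows from the partitions that survive pruning.

    selected_partition_rows: row counts of the remaining partitions,
    where -1 is assumed to mean "unknown".
    """
    if not selected_partition_rows:
        # Every partition was filtered out, so the scan returns no rows.
        return 0
    if any(rows < 0 for rows in selected_partition_rows):
        return -1
    return sum(selected_partition_rows)
```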

Change-Id: I225949eb2e8f905a5d0f678d7f199fb95ba4aab0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3063
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3083
2014-06-16 20:36:13 -07:00
Matthew Jacobs
dbe1b534ed IMPALA-1050: NPE error when pool placement policy cannot map user to pool
Change-Id: I53ed823ee55bee96269f4119af7da2dab25d4a7c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3028
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 569bd5d4a8e30a907a33551c58a3ab80849b8dc9)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3061
2014-06-15 13:38:20 -07:00
Srinath Shankar
895bdeddd8 Ignore order-by without limit in INSERT and CTAS
Order-by without limit in the query statement corresponding to an INSERT
or CTAS must be ignored because
i) There is no guarantee on row ordering when the target table is scanned again
   i.e. 'select * from table' may return rows in any order, regardless of how the
   rows were inserted, and
ii) Ignoring (and not flagging an error) is consistent with the treatment of
   order-by w/o limit in nested queries, union operands etc.
Currently, an order-by w/o limit in a QueryStmt is only evaluated if the analyzer is
the root analyzer (has no ancestors).
However, a new child analyzer is not created for the QueryStmt in an InsertStmt, so this
technique fails for inserts. The correct thing to do is to use a child analyzer for that
QueryStmt, but this has spill-over scoping effects for analysis of with clauses.

This patch adds a flag, similar to the isExplain flag to the analyzer to identify
insert statements.

Change-Id: I9ded587cfea75eca0b7a43ee9b0df0a6c8ecb602
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3044
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3060
2014-06-14 18:36:43 -07:00
Nong Li
27a57dda28 Improve partition spec analysis error message.
Change-Id: I92068642292b3d0fe6c37e31a64293d02c379900
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2978
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-13 00:34:09 -07:00