Defer resizing the columns_ vector until we are sure we will initialize it.
Downstream code doesn't expect any NULLs.
Change-Id: I250cceee5181428fcd3cd1a8b021edb7187ae888
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4465
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
Tested-by: jenkins
The 'less than' predicate that AnalyticPlanner creates to check whether the
previous row is less than the current row is not exactly what we need
to determine when rows in RANGE windows (the default window in this case)
share the same result values. Rows get the same results when the order by
exprs evaluate equally or both null, so it's easiest (and more efficient)
to use a predicate that simply checks equality or both null. We already
create such predicates for checking for partition boundaries, so this is
a trivial change.
When we support arbitrary RANGE window offsets we will likely want to
add similar predicates that compare two tuples plus/minus the offset,
but those will be simpler because there can be only one order by expr
when specifying RANGE offsets with PRECEDING/FOLLOWING.
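The "equal or both NULL" check described above can be sketched as follows. This is an illustration in Python only, not the AnalyticPlanner code: the function name `is_peer_row` and the use of tuples with `None` standing in for SQL NULL are assumptions of the sketch.

```python
def is_peer_row(prev_vals, curr_vals):
    """Return True if two rows should share the same RANGE-window result.

    prev_vals / curr_vals hold the evaluated ORDER BY exprs of each row;
    None stands in for SQL NULL (an assumption of this sketch).
    """
    if len(prev_vals) != len(curr_vals):
        raise ValueError("rows must have the same number of ORDER BY exprs")
    for prev, curr in zip(prev_vals, curr_vals):
        if prev is None and curr is None:
            continue  # both NULL counts as a match
        if prev is None or curr is None or prev != curr:
            return False
    return True
```

Unlike a 'less than' comparison, this treats two NULL order-by values as peers, which is the same shape of check used for partition boundaries.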
Change-Id: I52ff6203686832852430e498eca6ad2cc2daee98
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4474
Tested-by: jenkins
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
1) Adds BE support for RANGE windows between UNBOUNDED PRECEDING and
UNBOUNDED FOLLOWING.
2) RANGE windows with offset boundaries fail analysis because they're
not supported by the BE yet.
Change-Id: I734575eb87c909d09d24c4df028023f3b50d3cb5
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4442
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
Fixed a bug in setting the length when reading/writing text files for CHAR(N).
Also added chars_tiny table for testing CHAR(N) and VARCHAR(N).
Change-Id: If5d5db30afa4b00cf03c68c6a845f182970329f4
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4415
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
Like left/right outer joins, anti joins have a uni-directional value transfer.
Predicates could be pushed into anti joined plan subtrees if the condition
was inverted, but this patch does not implement this optimization.
No special consideration is needed to prevent predicate assignment
into anti-joined branches because anti-joined tuples are invisible outside
of the On-clause, and therefore, all unassigned conjuncts referencing the
invisible tuple must come from the original join's On-clause. The assignment
of such predicates is already handled correctly.
Change-Id: Ic2b94f6eb57e000ea51e253035e713288b205298
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4425
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
This commit fixes the issue (IMPALA-1215) where NOT IN subqueries return
wrong results in the presence of NULL values. The null-matching equality
operator is introduced in the front-end and the NOT IN subqueries are
rewritten using the null-aware anti-join operator.
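For context, the NULL semantics that a NOT IN rewrite has to preserve can be sketched in a few lines of Python. This models SQL three-valued logic only; it is not the front-end rewrite or the null-aware anti-join itself, and the function names are hypothetical.

```python
def sql_not_in(value, candidates):
    """Evaluate `value NOT IN (candidates)` under SQL three-valued logic.

    Returns True, False, or None, where None models SQL NULL (unknown).
    """
    if not candidates:
        return True  # NOT IN over an empty set is TRUE, even for NULL
    if value is None:
        return None
    saw_null = False
    for c in candidates:
        if c is None:
            saw_null = True
        elif c == value:
            return False
    # No match found: the answer is unknown if any candidate was NULL.
    return None if saw_null else True


def filter_not_in(rows, subquery_values):
    """Keep only rows for which the predicate is TRUE (NULL filters out)."""
    return [r for r in rows if sql_not_in(r, subquery_values) is True]
```

A single NULL in the subquery result means no row can satisfy NOT IN, which is the case a plain anti-join on equality gets wrong.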
Change-Id: I5a323357025d77c2143db86e1057999ec8a371c0
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4391
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
1) Fix ROWS following start bound where window is never fully in partition
2) Fix sum() NULL handling over sliding windows and add/consolidate tests.
sum() should return NULL when all non-NULL values are removed. Because
sum only stores the current sum as the intermediate value, we can't know
if the sum is actually 0 or if there are no non-NULL values in the window.
(avg() doesn't have this problem because it explicitly keeps the count
of the number of elements in the average as part of the intermediate state.)
Instead of changing sum() to have more intermediate state (which would
affect aggregations), we can just keep track of the number of calls to
Update() and Remove() in the FunctionContextImpl and check in SumRemove()
whether or not there are any non-null elements being summed. Added
tests (verified with Oracle).
3) Fixed a bug where the state tracking the last result tuple could be
wrong, resulting in a crash.
4) IMPALA-1269: Windows between a start offset and CURRENT ROW could
produce wrong results between partitions.
5) IMPALA-1273: Incorrect results with very large window and small table
Tests are included for all issues.
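The sum() NULL handling from item 2 can be illustrated with a small Python sketch. The class name and the choice to count only non-NULL values are assumptions made for the example; the actual bookkeeping lives in FunctionContextImpl and may differ in detail.

```python
class SlidingSum:
    """Toy sum() with removal that returns None for an all-NULL window."""

    def __init__(self):
        self.total = 0
        self.num_updates = 0  # non-NULL values added so far
        self.num_removes = 0  # non-NULL values removed so far

    def update(self, value):
        if value is None:
            return
        self.total += value
        self.num_updates += 1

    def remove(self, value):
        if value is None:
            return
        self.total -= value
        self.num_removes += 1

    def get_value(self):
        # The running total alone cannot tell "sum is 0" apart from
        # "no non-NULL values left", so the counters decide.
        if self.num_updates == self.num_removes:
            return None
        return self.total
```

With values 5 and -5 in the window the result is 0; once both have been removed the result is NULL rather than 0.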
Change-Id: I0f396c24078a1494fb977e8775f1ca8c530932eb
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4397
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
The previous code did not properly handle the case where spilling happens while
building the hash table (i.e. partitioning the build rows fit in memory). This
starved the probe partition, so queries that should have been able to run failed
with a "not enough buffers" error.
Change-Id: I3a9a84e8800a72ed3ce6f5ab7ff03bc2d6eb7ad8
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4403
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
This patch addresses:
1. Char doesn't use codegen
2. Not in-lining large CHAR(N) for N > 128
3. Parquet reader/writer for CHAR(N) and VARCHAR(N)
Change-Id: I83a29a8bd312841a3e29bfe2243884074570f247
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4280
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
This patch also fixes IMPALA-1164: NDV() now returns a BIGINT (and not STRING).
Change-Id: Ia2a3272204938579d61091ee4f7f2d1cbf38ed55
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4338
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Adds support in the BE AnalyticEvalNode for ROWS windows with arbitrary
start bounds. If there is a start bound specified a sliding window must
be maintained. As input rows are processed they are added to the window.
As they expire from the window, they are 'removed' from the current
intermediate state of the evaluators (stored in curr_tuple_) by calling
AggFnEvaluator::Remove(). This is an initial implementation that keeps
the tuples in the window in memory. We can improve this later by using
the BufferedTupleStream with an Iterator interface supporting multiple
readers.
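The add/remove cycle described above can be sketched in Python for a window of the form ROWS BETWEEN n PRECEDING AND CURRENT ROW. This is a simplified illustration under stated assumptions (one partition, non-NULL integer input, sum as the only evaluator, hypothetical function name), not the AnalyticEvalNode logic.

```python
from collections import deque


def rows_preceding_sum(values, num_preceding):
    """sum() over ROWS BETWEEN num_preceding PRECEDING AND CURRENT ROW."""
    window = deque()  # rows currently in the window, kept in memory
    running = 0       # intermediate state of the evaluator
    results = []
    for v in values:
        window.append(v)
        running += v  # "Update": the row enters the window
        if len(window) > num_preceding + 1:
            expired = window.popleft()
            running -= expired  # "Remove": the row expires from the window
        results.append(running)
    return results
```

For example, `rows_preceding_sum([1, 2, 3, 4], 1)` sums each row with its predecessor. Keeping the window rows in a deque mirrors the in-memory approach of this initial implementation.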
This also fixes IMPALA-1253: LAST_VALUE returns incorrect results
Change-Id: Id5daf6c060ab4079bb8dacf2db8992985894a820
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4335
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
Queries like:
INSERT INTO table VALUES (CAST("..." AS CHAR(N)))
used the codegen path and failed; they now use the interpreted path.
Change-Id: Id80274580df268b3f828dec19a2e0b0578061ca8
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4362
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
This patch fixes two issues:
- Add API to buffered block mgr to allow an atomic Unpin and GetNewBlock. This has
the semantics of unpinning a block and giving the buffer to the new block. This
is necessary for the tuple stream to make sure another thread does not grab the
unpinned block in between.
- Buffer management reading an unpinned stream. Before moving onto a new block (and
unpinning the current), we need to make sure all the tuples returned from the
current block are returned up the operator tree.
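The first point, handing a buffer from an unpinned block straight to a new block, can be sketched with a toy block manager in Python. All names here (`ToyBlockMgr`, `unpin_and_get_new_block`, `free_buffers`) are hypothetical and the model is deliberately minimal; it only shows why doing both steps under one lock closes the race.

```python
import threading


class Block:
    def __init__(self):
        self.pinned = True


class ToyBlockMgr:
    def __init__(self, num_buffers):
        self._lock = threading.Lock()
        self.free_buffers = num_buffers

    def get_new_block(self):
        with self._lock:
            if self.free_buffers == 0:
                return None  # no buffer available
            self.free_buffers -= 1
            return Block()

    def unpin(self, block):
        with self._lock:
            block.pinned = False
            self.free_buffers += 1  # buffer is now up for grabs

    def unpin_and_get_new_block(self, block):
        # One critical section: the buffer never appears in the free pool,
        # so another thread cannot take it between the two steps.
        with self._lock:
            if not block.pinned:
                raise ValueError("block is not pinned")
            block.pinned = False
            return Block()
```

Calling `unpin()` followed by `get_new_block()` would release the lock in between, which is the window in which another thread could grab the buffer.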
Change-Id: I95ee58d1019dd971f6a7dc19ecafdfa54cdbf942
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4333
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
We weren't returning the right merge function for decimal in
GetAvgFunction(). Someday the functions will be registered in the FE
like for scalar functions.
Change-Id: I1153ef8570b0e78f0925b7d3d58ec3b0fbb2c589
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4336
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Adds support for displaying all or a subset of the privileges granted to a role. Users
have privileges to execute this statement if they are already granted the role or if
they are an admin user on the Sentry Policy Service. The output includes:
* The target scope of the privilege
* The privilege level
* The target names in the object hierarchy
* Whether the privilege was granted using WITH GRANT OPTION
* The create time of the privilege
Examples:
-- Show all grants in role1
SHOW GRANT ROLE role1
-- Show all grants in role1 on the database foo
SHOW GRANT ROLE role1 ON DATABASE foo
Output looks like:
+----------+------------+-------+-----+-----------+--------------+-------------------------------+
| scope    | database   | table | uri | privilege | grant_option | create_time                   |
+----------+------------+-------+-----+-----------+--------------+-------------------------------+
| DATABASE | functional |       |     | ALL       | false        | Fri, Sep 19 2014 16:13:40.999 |
+----------+------------+-------+-----+-----------+--------------+-------------------------------+
Change-Id: I8ef1b87a4c22c8fba4228012668033d7f9d06fcb
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4389
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
This change adds support for GRANT <privilege> TO <role> WITH GRANT OPTION which allows
delegating GRANT/REVOKE authority to non-admin users. Specifically, it allows users who
have been granted the specified role to execute GRANT/REVOKE statements on all child
objects. For example, you can now do something like:
GRANT ALL ON DATABASE foo TO role1 WITH GRANT OPTION
and everyone granted role1 will be able to execute GRANT/REVOKE statements on database
foo OR any of the tables in the database.
It also adds support for REVOKE GRANT OPTION FOR <privilege> FROM <role> which allows
removing a previous WITH GRANT OPTION without actually deleting the privilege.
Similar to GRANT/REVOKE statements, the actual authorization checks on whether a user
should/should not have privileges to execute these options are done at the Sentry Service
level.
Change-Id: I8757569a3bdb68414e315ef37d6845b1859eb758
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4377
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
This patch adds the necessary changes required to authorize SHOW ROLES statements.
This is not as easy as it could be because the Sentry Service doesn't currently
expose the metadata for who is/isn't authorized to execute these statements. To authorize
the statements, we need to first make an RPC to the Sentry Service (via the
Catalog Server) and then only proceed with the SHOW statement if the check succeeds.
We should consider revisiting this approach in the future when more metadata is available
from Sentry.
Additionally, this patch adds support for SHOW CURRENT ROLES which shows all roles
that are currently granted to the current user.
Change-Id: Ia01c20d58ab081f49a85566075836d8c6e25dbd4
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4367
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
This patch adds two new hint styles:
1. Traditional commented hint: /* +hint1,hint2,hint3 */
2. End-of-line commented hint: -- +hint1,hint2,hint3\n
We now preserve hints when creating views. We always use the
end-of-line commented hint style to allow Hive to read
hinted views created by Impala. Hive does not support
traditional /* */ comments, and attempts to parse /*+ */ as
hints, failing with a parse error on unrecognized hints.
This patch also changes Impala to only issue a warning
for unrecognized hints instead of throwing an error. This
allows Impala to run against hinted views created by Hive.
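The two commented hint styles and the warn-instead-of-fail behavior can be sketched in Python. This is not the Impala parser: the regular expression, the function name, and the `KNOWN_HINTS` set are assumptions for illustration, and the sketch does not skip comment markers that appear inside string literals.

```python
import re
import warnings

# Illustrative set only; the real list of recognized hints is not modeled.
KNOWN_HINTS = {"shuffle", "broadcast"}

# Matches /* +h1,h2 */ or -- +h1,h2 up to the end of the line.
_HINT_RE = re.compile(
    r"/\*\s*\+(?P<block>.*?)\*/|--\s*\+(?P<eol>[^\n]*)", re.DOTALL
)


def extract_hints(sql):
    """Return recognized hints; warn about (and skip) unrecognized ones."""
    hints = []
    for match in _HINT_RE.finditer(sql):
        body = match.group("block")
        if body is None:
            body = match.group("eol")
        for name in body.split(","):
            name = name.strip().lower()
            if not name:
                continue
            if name not in KNOWN_HINTS:
                warnings.warn(f"ignoring unrecognized hint: {name}")
                continue
            hints.append(name)
    return hints
```

Warning rather than raising is what lets a statement with an unfamiliar hint still run.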
Change-Id: I6e8352442e763c0029f72c17363caa087572dca0
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4235
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4361
Changes needed for PHJ to support RIGHT {SEMI, ANTI} JOINs. Codegen works as well.
Basic parser tests and minimal (end-to-end) query tests.
Need to add analyzer tests and add more query tests.
Note that in the case of right-{semi,anti} and perhaps also on {right,full}-outer we
should not be broadcasting the build side.
Change-Id: I6854ee9e4640f809f0350229bcc00811fa474f07
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4288
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4369
This change adds support for GRANT/REVOKE to Impala via the Sentry Service. This includes
support for creating and dropping roles, granting and revoking roles to/from groups,
granting/revoking privileges to/from roles, and commands to view role metadata.
The specific statements that are added in this patch are:
CREATE/DROP ROLE <roleName>
SHOW ROLES
SHOW ROLE GRANT GROUP <groupName>
GRANT/REVOKE ROLE <roleName> TO/FROM GROUP <groupName>
GRANT/REVOKE <privilegeSpec> TO/FROM <roleName>
It does not include some of the fancier bulk-op syntax like support for granting multiple
roles to multiple groups in one statement.
This patch does not add support for the WITH GRANT OPTION to delegate GRANT/REVOKE
privileges to other users.
TODO:
* Authorize these statements on the client side. The current Sentry Service design makes
it difficult to authorize any GRANT/REVOKE statement on the client (Impala) side.
Privilege checks are done within the Sentry Service itself. There are a few different
options available to let Impala "fail fast" and those changes will come in a follow
on patch.
Change-Id: Ic6bd19f5939d3290255222dcc1a42ce95bd345e2
Adds agg fns for FIRST_VALUE, LAST_VALUE, LAG, LEAD. Also adds
support for ROWS windows with the end bound as unbounded following
as long as the start bound is unbounded preceding.
Change-Id: I4856ae580164d17a1bbf7d45010b61f5afa5db50
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4249
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
IMPALA-1233: Crash running query with analytic in WITH clause
IMPALA-1232: Analytic eval node crashes if cancelling query before Open()
Change-Id: I9a263775b8ef670d0f819ed53d0af1eb96edf5c7
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4313
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
Support for CHAR is implemented as a StringVal in the backend.
TODO:
1. Parquet Reader/writer
2. Codegen slot ref
3. Codegen text reader
4. Don't inline large chars
5. update impala-hs2-server.cc with CHAR support
Change-Id: Ibba2c89cea971cb740001ea7975bf3e929150471
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4075
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
This commit fixes the issue where an error was thrown if a subquery was
used in either side of a between predicate. Between predicates with
subqueries are replaced by their corresponding compound predicates
during query rewrite.
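The replacement of a BETWEEN predicate by its compound form can be sketched as a string-level rewrite in Python. The function name is hypothetical and the sketch works on SQL text, whereas the real rewrite operates on analyzed expression trees.

```python
def rewrite_between(expr, lower, upper, negated=False):
    """Rewrite `expr [NOT] BETWEEN lower AND upper` as a compound predicate."""
    if negated:
        # NOT BETWEEN: outside the closed range
        return f"({expr} < {lower} OR {expr} > {upper})"
    # BETWEEN: inside the closed range, bounds included
    return f"({expr} >= {lower} AND {expr} <= {upper})"
```

For example, `rewrite_between("t.a", "(SELECT min(x) FROM s)", "10")` yields two binary predicates, each of which contains at most the one subquery it started with.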
Change-Id: I4315a6e91c9306c6817bf6aa6bc1d0b586a1a067
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4246
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
This also reverts back to using CRC hash since FNV is not codegen'd
yet. The perf is not as good as the original HJ in a microbenchmark; I
haven't run a cluster run yet.
Change-Id: Ie4dc983f31631fbc78720425a0e354dd1d3342a6
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4219
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
This commit fixes IMPALA-1195 in which an exception is thrown when a
scalar subquery is in an IS NULL predicate. With this commit we also add
support for scalar subqueries in functions and other exprs.
Change-Id: Id995e77e6561a6450c4347706e4901fb3e236cfe
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4185
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
The AntiJoin code path was not resetting the hash table iterator when it was finding
a match for the current row. As a result the hash table iterator was pointing to the
wrong row when EvalConjuncts was being called.
Change-Id: I37bc457ccf999755f7f76ee30b24c5a12cb10a19
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4215
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
This patch addresses the following issues:
1. Allow creating Avro tables without col defs in Impala. Compute stats works on them.
2. Handle table creation with inconsistent col defs and Avro schema as follows:
The table creation will succeed and ignore the col defs in favor of the Avro schema.
A warning is issued that the col defs and the Avro schema are inconsistent.
Compute stats works on such tables.
This patch does not address the issue of compute stats after Avro schema evolution.
Change-Id: Iea6b737d238d81491dc2097012ebc149a89d03ba
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4182
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4250
Tested-by: jenkins
The AnalyticEvalNode needs to re-use intermediate state tuples so it cannot
call Finalize() for agg fns that clean up intermediate state. Those fns
need to have a GetValue() method which just returns the result.
This adds a GetValue() method for avg() (all types) and min()/max() (only
needed for strings).
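The difference between GetValue() and Finalize() can be sketched for avg() in Python. The state layout (sum plus count) follows what is commonly needed for an average; the names and the way finalize "frees" the state are assumptions of this example, not the backend code.

```python
class AvgState:
    def __init__(self):
        self.total = 0.0
        self.count = 0


def avg_update(state, value):
    if value is None:
        return  # NULL inputs do not contribute
    state.total += value
    state.count += 1


def avg_get_value(state):
    """Return the current average without touching the state."""
    if state.count == 0:
        return None
    return state.total / state.count


def avg_finalize(state):
    """Return the final average and tear down the state."""
    result = avg_get_value(state)
    state.total = None  # stands in for freeing the intermediate state
    state.count = None
    return result
```

Because `avg_get_value` leaves the state intact, it can be called once per output row while the same intermediate tuple keeps accumulating.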
Change-Id: Iedd6b026a1a256d9577dbb4c37824ac9282319ca
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4199
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
(cherry picked from commit d3fe94e8dba1d7b3698db9849058dacf14657292)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4237
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
This fails for the same reason as the sequence writer. It passes locally but fails in
zlib on the jenkins boxes. I suspect something is wrong with our gzip codec or the
version of zlib installed on those machines (we've disabled this for parquet as well).
Change-Id: I706186fbb6207fa694b4e61c7114e17c1ffe3482
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4221
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4260
Reviewed-by: Nong Li <nong@cloudera.com>
Fixes IMPALA-1200: resources for tuples may be returned in output batches too
early (i.e. the tuple may still be needed for rows that will be returned
later). We cannot just return resources after some number of tuples have been
allocated as they may still be needed, so this adds a second mem pool for
previously allocated tuples that can be transferred to output row batches. We
keep track of the last row containing resources in that pool so we can be sure
to only transfer the resources once that last row has been returned to the
parent.
Also addresses IMPALA-1206 (DCHECK failure)
Change-Id: I34b823ffb8d54263ea76e071d10ccae1cef0db99
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4187
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
(cherry picked from commit bc51ebaafea0ba5e1b97f4b3237ecfe241a9e674)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4224
Tested-by: jenkins
This commit adds support for uncorrelated EXISTS subqueries in Impala.
Uncorrelated EXISTS subqueries are rewritten using a CROSS JOIN.
Uncorrelated NOT EXISTS subqueries are not supported.
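The equivalence behind such a rewrite can be checked on a toy dataset with Python's built-in sqlite3 module. The tables, the function name, and in particular the `LIMIT 1` on the joined subquery are assumptions of this sketch; the commit only states that a CROSS JOIN is used.

```python
import sqlite3


def run_exists_and_cross_join(threshold):
    """Run an uncorrelated EXISTS query and a CROSS JOIN form of it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE t (id INTEGER);
        CREATE TABLE s (val INTEGER);
        INSERT INTO t VALUES (1), (2);
        INSERT INTO s VALUES (10), (20), (30);
        """
    )
    exists_rows = conn.execute(
        "SELECT t.id FROM t "
        "WHERE EXISTS (SELECT 1 FROM s WHERE s.val > ?) ORDER BY t.id",
        (threshold,),
    ).fetchall()
    # LIMIT 1 keeps the join from duplicating rows of t (sketch assumption).
    cross_join_rows = conn.execute(
        "SELECT t.id FROM t "
        "CROSS JOIN (SELECT 1 FROM s WHERE s.val > ? LIMIT 1) AS v "
        "ORDER BY t.id",
        (threshold,),
    ).fetchall()
    conn.close()
    return exists_rows, cross_join_rows
```

When the subquery is empty the cross join produces no rows, matching EXISTS evaluating to false. The same trick does not carry over directly to NOT EXISTS, where an empty subquery must keep every outer row.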
Change-Id: I0003dcdc0fa5cc99931b9a9f4deddbcd42572490
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4140
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4186
Adds the rank() and dense_rank() analytic functions and makes internal
changes to the AggFnEvaluator that are necessary to support calling
Finalize() repeatedly (as the AnalyticEvalNode does) on UDAs that destroy
state in Finalize().
Rank requires both the current rank and the count at that rank in order to
determine the next rank, so the intermediate state is a StringVal containing
a struct with these two fields.
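The two-field intermediate state for rank() can be sketched in Python. The names are hypothetical, the input is assumed to be already sorted within one partition, and NULL ordering is ignored.

```python
from dataclasses import dataclass


@dataclass
class RankState:
    rank: int = 1   # rank of the current group of peers
    count: int = 0  # rows seen so far at that rank


def rank_rows(sorted_keys):
    """Compute rank() for keys that are already in ORDER BY order."""
    state = RankState()
    prev = None
    out = []
    for key in sorted_keys:
        if state.count > 0 and key != prev:
            # New peer group: the next rank skips past all earlier peers.
            state.rank += state.count
            state.count = 0
        state.count += 1
        out.append(state.rank)
        prev = key
    return out
```

The count is what produces the gaps after ties; dense_rank() would simply add 1 at each new peer group and needs no count.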
Aggregate functions (internally only, for now) can expose a GetValue() method
which takes an intermediate value and returns a final value without destroying
the intermediate state. Finalize() is then used to clean up intermediate
state, if necessary.
This also adds a second optional, internal-only function for UDAs to allow
removing values from intermediate state: Remove(). This will be required for
implementing sliding windows later but is added here because the change is
nearly identical to that for adding GetValue().
Some cleanup in the AnalyticEvalNode; most notably, we avoid allocating tuples
to DeepCopy prev_input_row_ between input batches. Instead, we keep the last
two child row batches because the prev child row batch owns the resources for
prev_input_row_.
Change-Id: I5a30eb517a38d369fe63f7af91904a4b9786fadc
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3962
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 137bb45d81ea57655aefbf5cde0cbeab0121b8f0)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4183
This patch adds support for:
- Planning of multiple analytic exprs from a select block
- Simple grouping of analytic exprs by partition/order/window
to reduce data exchanges and sorts
Change-Id: Ie2162558b2bc2e6218c30e694393e85cbf3251ff
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4120
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4168
Adds support in the AnalyticEvalNode for ROWS windows with the start
boundary UNBOUNDED PRECEDING, i.e. the end boundary can specify an
offset or CURRENT ROW.
To reduce complexity where we maintain windows and determine when output
results can be produced (ProcessInputBatch), the logic that depends on
the window is factored into several functions. The core functionality
remains the same: for every input row, produce output results if possible,
update the analytic functions, and add the row to the input_stream_ to be
returned later when enough results are available. The functions
TryFinalizePrevRow, TryFinalizeCurrentRow, and InitializeNewPartition
are now called and handle the various window types appropriately.
Change-Id: I36cf76bf11d9e8b48d2556169683abcb43c1db7a
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4073
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 421a032035fcb13e03f8e7d34b4908f1221fd9f5)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4163
Reviewed-by: Matthew Jacobs <mj@cloudera.com>