FIRST_VALUE over windows with a ROWS offset PRECEDING bound did not produce
correct results. This fix changes the rewrite for FIRST_VALUE and adds
additional handling for NULLs in the backend.
Change-Id: I03d54c05f63f46e9adb467008fa876ab33812c7b
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4648
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
Fixes test failures in exhaustive mode when aggregation tests
are run on table formats that do not support decimal.
Change-Id: Ic5dfb398575770cf318ffcc0ce3a20737bb2f5cd
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4636
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
If an error occurred loading a library in LibCache (e.g. by using
CREATE FUNCTION) an error is returned but a cache entry may still
exist which may result in strange errors later when the cache
entry is accessed by subsequent queries.
This changes LibCache::GetCacheEntry to ensure cache entries do
not exist if errors occur. Because GetCacheEntry needs to take
the global lock and then the cache entry lock, but needs to
unlock the global lock before performing slow HDFS operations,
we set the error status on the cache entry so that all locks
can be released when an error occurs. Other threads that attempt
to access the cache entry check the status and return if it is
not OK. The first thread (the thread that got the error) can
then remove the cache entry whenever it is able to again acquire
the global lock_.
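The locking order described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual LibCache code: the class names, the loader callback, and has_entry() are hypothetical and exist only for this example.

```python
import threading


class CacheEntry:
    def __init__(self):
        self.lock = threading.Lock()
        self.status = None  # None means OK, otherwise an error message
        self.data = None


class LibCacheSketch:
    """Hypothetical stand-in for the cache; loader plays the slow HDFS step."""

    def __init__(self, loader):
        self._lock = threading.Lock()  # global lock
        self._entries = {}
        self._loader = loader

    def get_cache_entry(self, path):
        # Take the global lock, then the entry lock, then drop the global lock.
        with self._lock:
            entry = self._entries.get(path)
            is_new = entry is None
            if is_new:
                entry = CacheEntry()
                self._entries[path] = entry
            entry.lock.acquire()
        try:
            if is_new:
                try:
                    # Slow operation runs without the global lock held.
                    entry.data = self._loader(path)
                except Exception as e:
                    # Record the error on the entry so other threads see it.
                    entry.status = str(e)
            status = entry.status
        finally:
            entry.lock.release()
        if status is None:
            return entry, None
        if is_new:
            # The thread that hit the error removes the entry once it can
            # take the global lock again.
            with self._lock:
                if self._entries.get(path) is entry:
                    del self._entries[path]
        return None, status

    def has_entry(self, path):
        with self._lock:
            return path in self._entries
```

A thread that finds an existing entry only checks its status and returns. Only the thread that created the entry attempts the removal.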
Change-Id: I00fd0e2a4611b06fa72ffe0aaaa7d077b7a0c36e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4642
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
Adds a bitstring at the head of each block in the TupleStream that indicates which
tuples of the appended rows in the block are NULL. When reading the stream through
GetNext() or GetTupleRow() calls, the NULL tuples are stitched back into their correct
positions.
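As a rough illustration of the idea (not the actual block layout), the following Python sketch stores one NULL indicator per tuple slot and only the non-NULL tuples in the payload. The function names and the list-based "block" are assumptions made for this example.

```python
def write_block(rows):
    """rows: list of rows, each a list of tuples where None marks a NULL tuple.

    Returns (null_bits, payload): one flag per tuple slot, plus the
    non-NULL tuples in order.
    """
    null_bits = []
    payload = []
    for row in rows:
        for tup in row:
            null_bits.append(tup is None)
            if tup is not None:
                payload.append(tup)
    return null_bits, payload


def read_block(null_bits, payload, tuples_per_row):
    """Stitch the NULL tuples back into their original positions."""
    remaining = iter(payload)
    flat = [None if is_null else next(remaining) for is_null in null_bits]
    return [flat[i:i + tuples_per_row]
            for i in range(0, len(flat), tuples_per_row)]
```

Reading a block written this way returns the original rows, including the NULL tuples.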
This fixes crashes in PHJ of bushy plans with NULLs on the build side(s) as well as
similar crashes in PAGG and the analytic node.
For example, it fixes IMPALA-1204, IMPALA-1223, and IMPALA-1249.
Also, adds regression tests for IMPALA-1175, IMPALA-1204, IMPALA-1223, IMPALA-1249
and IMPALA-1306.
Change-Id: I30ad0dbd4dfeabcda8fae444d1c6ec9291f38398
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4596
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
With this commit we enable correlated and uncorrelated EXISTS
subqueries with grouping and/or aggregation including analytic
functions. Furthermore, we enable correlated EXISTS subqueries
with a LIMIT clause.
Change-Id: I36c33f80b152b7f175bf803cbe920ce1983d7162
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4583
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Avro and Sequence writers are only available if the query option
ALLOW_UNSUPPORTED_FORMATS is set to true; otherwise an error is printed.
Change-Id: I597039f7c68f708fda10f848531eb557d6910f92
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4539
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
This adds DECODE functionality into the existing CaseExpr class. There
will be no separate backend implementation for DECODE; it will be sent to
the backend as a CASE expr so the existing codegen function can be used.
Because Oracle does cast checking during execution and Impala does cast
checking during analysis, some uses of DECODE that are valid in Oracle
are invalid in Impala.
Ex:
SELECT DECODE(foo, bar, int_col, baz, string_col_containing_only_ints)
FROM ...
would be run on Oracle. If string_col_containing_only_ints actually
contained non-INTs, an error would be thrown during execution and no
results would be returned. In Impala an error is thrown during analysis.
If a CAST was added to the STRING column, a cast failure would result in
NULL.
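To illustrate the DECODE-to-CASE mapping, here is a small Python sketch that turns DECODE arguments into CASE text. It is not the CaseExpr code: decode_to_case is a hypothetical helper, it works on strings only, and it compares with a plain "=" without addressing how NULL search values are matched.

```python
def decode_to_case(expr, *args):
    """Build CASE text for DECODE(expr, search1, result1, ..., [default])."""
    if len(args) < 2:
        raise ValueError("DECODE needs at least one search/result pair")
    # zip() drops the trailing default when the argument count is odd.
    pairs = list(zip(args[0::2], args[1::2]))
    parts = ["CASE"]
    for search, result in pairs:
        parts.append("WHEN {} = {} THEN {}".format(expr, search, result))
    if len(args) % 2 == 1:
        parts.append("ELSE {}".format(args[-1]))
    parts.append("END")
    return " ".join(parts)
```

For the example above, decode_to_case("foo", "bar", "int_col", "baz", "CAST(string_col AS INT)") yields a CASE whose branches all have INT results, which is the shape that passes analysis.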
Change-Id: Ia08cc2389abb6f843bba117e7091c659ad25ff41
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4334
Tested-by: jenkins
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Casey Ching <casey@cloudera.com>
This was always a TODO. We want memory to come from the block mgr and trigger spilling.
Change-Id: I07f1f79fbbb33068fb2df64510a80a9b008ef73d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4466
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
Adds fixes and tests for Hive CHAR & VARCHAR compatibility.
Also fixes a bug in tuple materialization for VARCHAR and non in-lined CHAR.
Change-Id: I400b089cb8ddba2e264ef9f2e37956b2ceaaf9fb
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4054
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
The AnalyticEvalNode had a DCHECK that expected the buffered tuple to
only be set when it was needed (i.e. when there are partition exprs or
order by exprs). However, the FE creates a buffered tuple for an entire
sort group when any AnalyticEvalNodes in the sort group need it and that
tuple is set for all nodes. This reverses the logic so that we DCHECK
the buffered tuple is set when it is needed.
Change-Id: If54b303bc439f235da06a542b46a35c61da9e1bd
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4489
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
This commit fixes a bug in the implementation of the null-aware anti
join that resulted in wrong results being returned from NOT IN correlated
subqueries in the presence of nulls.
Change-Id: I6f2eb326ec7e40d80ec8da94ba33946b9ac9b115
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4506
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Added several tests for analytic functions.
Tests for the following have not been added because they are not implemented yet:
- Lag, Lead functions
- Window clauses
Change-Id: I34546c967a6d29c97327f4cba405006a50867dcb
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4307
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
Fixes:
IMPALA-1256: Nested analytic: AnalysisException: select list expression not produced by aggregation output
IMPALA-1280: Crash running analytic with LEFT SEMI JOIN
Change-Id: I98b8f90de0079afad5b2d547abc27bcee57651f3
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4500
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Defer resizing the columns_ vector until we are sure we will initialize it.
Downstream code doesn't expect any NULLs.
Change-Id: I250cceee5181428fcd3cd1a8b021edb7187ae888
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4465
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
Tested-by: jenkins
The 'less than' predicate that AnalyticPlanner created to check whether the
previous row was less than the current row is not exactly what we want
for determining when rows in RANGE windows (the default window in this case)
share the same result values. Rows get the same results when the order by
exprs evaluate as equal or are both null, so it's easiest (and more efficient)
to use a predicate that simply checks for equality or both null. We already
create such predicates for checking for partition boundaries, so this is
a trivial change.
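The "equal or both null" check can be sketched in Python as below. This is only an illustration of the predicate's semantics, not the planner code: the function names are made up, None stands in for SQL NULL, and each row is represented by the tuple of its order by values.

```python
def equal_or_both_null(a, b):
    """True when both values are NULL (None) or both are non-NULL and equal."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def same_order_by_values(prev_row, curr_row):
    """Rows share a result when every order by expr is equal or both NULL."""
    return all(equal_or_both_null(a, b) for a, b in zip(prev_row, curr_row))


def peer_group_ids(rows):
    """Assign an increasing id to each run of rows that share a result."""
    ids = []
    group = -1
    prev = None
    for row in rows:
        if prev is None or not same_order_by_values(prev, row):
            group += 1
        ids.append(group)
        prev = row
    return ids
```

With sorted input, consecutive rows with the same id would receive the same analytic result under the default RANGE window.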
When we support arbitrary RANGE window offsets we will likely want to
add similar predicates that compare two tuples plus/minus the offset,
but those will be simpler because there can be only one order by expr
when specifying RANGE offsets with PRECEDING/FOLLOWING.
Change-Id: I52ff6203686832852430e498eca6ad2cc2daee98
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4474
Tested-by: jenkins
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
1) Adds BE support for RANGE windows BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING.
2) RANGE windows with offset boundaries fail analysis because they're
not supported by the BE yet.
Change-Id: I734575eb87c909d09d24c4df028023f3b50d3cb5
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4442
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
Fixed a bug in setting the length when reading/writing text files for CHAR(N).
Also added chars_tiny table for testing CHAR(N) and VARCHAR(N).
Change-Id: If5d5db30afa4b00cf03c68c6a845f182970329f4
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4415
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
Like left/right outer joins, anti joins have a uni-directional value transfer.
Predicates could be pushed into anti joined plan subtrees if the condition
was inverted, but this patch does not implement this optimization.
No special consideration must be made to prevent predicate assignment
into anti-joined branches because anti-joined tuples are invisible outside
of the On-clause, and therefore, all unassigned conjuncts referencing the
invisible tuple must come from the original join's On-clause. The assignment
of such predicates is already handled correctly.
Change-Id: Ic2b94f6eb57e000ea51e253035e713288b205298
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4425
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
This commit fixes the issue (IMPALA-1215) where NOT IN subqueries return
wrong results in the presence of NULL values. The null-matching equality
operator is introduced in the front-end and the NOT IN subqueries are
rewritten using the null-aware anti-join operator.
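For reference, the NOT IN semantics that the null-aware anti-join has to reproduce can be sketched in Python as below. This models standard SQL three-valued logic with None as NULL/unknown; the function names are illustrative and this is not the join implementation itself.

```python
def not_in(value, subquery_values):
    """Evaluate `value NOT IN (subquery)`; returns True, False, or None."""
    values = list(subquery_values)
    if not values:
        return True  # NOT IN over an empty set is TRUE, even for NULL
    if value is None:
        return None  # NULL compared to a non-empty set is unknown
    saw_null = False
    for v in values:
        if v is None:
            saw_null = True
        elif v == value:
            return False
    # No match found: unknown if a NULL could have matched, else TRUE.
    return None if saw_null else True


def null_aware_anti_join(left, right):
    """Keep only the left values for which NOT IN is definitely TRUE."""
    return [x for x in left if not_in(x, right) is True]
```

A single NULL on the right side makes every non-matching comparison unknown, so no rows are returned in that case.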
Change-Id: I5a323357025d77c2143db86e1057999ec8a371c0
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4391
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
1) Fix ROWS following start bound where window is never fully in partition
2) Fix sum() NULL handling over sliding windows and add/consolidate tests.
sum() should return NULL when all non-NULL values are removed. Because
sum only stores the current sum as the intermediate value, we can't know
if the sum is actually 0 or if there are no non-NULL values in the window.
(avg() doesn't have this problem because it explicitly keeps the count
of the number of elements in the average as part of the intermediate state.)
Instead of changing sum() to have more intermediate state (which would
affect aggregations), we can just keep track of the number of calls to
Update() and Remove() in the FunctionContextImpl and check in SumRemove()
whether or not there are any non-null elements being summed. Added
tests (verified with Oracle).
3) Fixed a bug where the state tracking the last result tuple could be
wrong, resulting in a crash.
4) IMPALA-1269: Windows between a start offset and CURRENT ROW could
produce wrong results between partitions.
5) IMPALA-1273: Incorrect results with very large window and small table
Tests are included for all issues.
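The sum() NULL handling from item 2 can be illustrated with the Python sketch below. It is a simplification: it keeps a count of the non-NULL values next to the running sum inside one hypothetical class, whereas the patch tracks Update()/Remove() calls in the FunctionContextImpl.

```python
class SlidingSum:
    """Running sum over a sliding window that can tell 0 apart from 'no values'."""

    def __init__(self):
        self.total = 0
        self.non_null_count = 0  # extra bookkeeping kept outside the sum itself

    def update(self, value):
        """A row enters the window."""
        if value is None:
            return
        self.total += value
        self.non_null_count += 1

    def remove(self, value):
        """A row leaves the window."""
        if value is None:
            return
        self.total -= value
        self.non_null_count -= 1

    def result(self):
        # NULL (None) when no non-NULL values remain, even though total is 0.
        return self.total if self.non_null_count > 0 else None
```

Without the counter, a window holding only NULLs and a window whose values sum to 0 would both report 0.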
Change-Id: I0f396c24078a1494fb977e8775f1ca8c530932eb
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4397
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
The previous code did not properly handle the case where spilling happens while
building the hash table (i.e. partitioning the build rows fit). This starved the
probe partition, causing queries that should be able to run to fail with a
'not enough buffers' error.
Change-Id: I3a9a84e8800a72ed3ce6f5ab7ff03bc2d6eb7ad8
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4403
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
This patch addresses:
1. Char doesn't use codegen
2. Not in-lining large CHAR(N) for N > 128
3. Parquet reader/writer for CHAR(N) and VARCHAR(N)
Change-Id: I83a29a8bd312841a3e29bfe2243884074570f247
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4280
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
This patch also fixes IMPALA-1164: NDV() now returns a BIGINT (and not STRING).
Change-Id: Ia2a3272204938579d61091ee4f7f2d1cbf38ed55
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4338
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Adds support in the BE AnalyticEvalNode for ROWS windows with arbitrary
start bounds. If there is a start bound specified a sliding window must
be maintained. As input rows are processed they are added to the window.
As they expire from the window, they are 'removed' from the current
intermediate state of the evaluators (stored in curr_tuple_) by calling
AggFnEvaluator::Remove(). This is an initial implementation that keeps
the tuples in the window in memory. We can improve this later by using
the BufferedTupleStream with an Iterator interface supporting multiple
readers.
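A minimal Python sketch of the sliding window maintenance, assuming a sum evaluator and a window of ROWS BETWEEN n PRECEDING AND CURRENT ROW over a single partition. The in-memory deque mirrors the "keeps the tuples in the window in memory" approach; the function name is hypothetical and NULL inputs are not handled here.

```python
from collections import deque


def sliding_sum(values, preceding):
    """Sum over ROWS BETWEEN `preceding` PRECEDING AND CURRENT ROW."""
    window = deque()  # rows currently inside the window
    total = 0         # intermediate state of the evaluator
    results = []
    for value in values:
        # The new input row is added to the window (the Update step).
        window.append(value)
        total += value
        # Rows that fall out of the window are removed (the Remove step).
        if len(window) > preceding + 1:
            total -= window.popleft()
        results.append(total)
    return results
```

Each row is added once and removed at most once, so the cost per row does not depend on the window size.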
This also fixes IMPALA-1253: LAST_VALUE returns incorrect results
Change-Id: Id5daf6c060ab4079bb8dacf2db8992985894a820
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4335
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
Queries like:
INSERT INTO table VALUES (CAST("..." AS CHAR(N)))
used the codegen path and failed; they now use the interpreted path.
Change-Id: Id80274580df268b3f828dec19a2e0b0578061ca8
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4362
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
This patch fixes two issues:
- Add API to buffered block mgr to allow an atomic Unpin and GetNewBlock. This has
the semantics of unpinning a block and giving the buffer to the new block. This
is necessary for the tuple stream to make sure another thread does not grab the
unpinned block in between.
- Fix buffer management when reading an unpinned stream. Before moving on to a new
block (and unpinning the current one), we need to make sure all the tuples returned
from the current block have been returned up the operator tree.
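The atomic Unpin and GetNewBlock from the first item can be sketched as follows. This is a toy Python model with hypothetical names: buffers are just a counter, and a single lock stands in for the block mgr's synchronization.

```python
import threading


class Block:
    def __init__(self):
        self.pinned = True


class BlockMgrSketch:
    def __init__(self, num_buffers):
        self._lock = threading.Lock()
        self.free_buffers = num_buffers

    def get_new_block(self):
        """Returns a pinned block, or None if no buffer is free."""
        with self._lock:
            if self.free_buffers == 0:
                return None
            self.free_buffers -= 1
            return Block()

    def unpin(self, block):
        with self._lock:
            block.pinned = False
            self.free_buffers += 1

    def unpin_and_get_new_block(self, block):
        """Unpin `block` and hand its buffer to a new block in one step."""
        with self._lock:
            block.pinned = False
            # The buffer never returns to the free pool, so no other thread
            # can grab it between the unpin and the new block.
            return Block()
```

Calling unpin() followed by get_new_block() would release the lock in between, which is the window the combined call closes.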
Change-Id: I95ee58d1019dd971f6a7dc19ecafdfa54cdbf942
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4333
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
We weren't returning the right merge function for decimal in
GetAvgFunction(). Someday the functions will be registered in the FE
like for scalar functions.
Change-Id: I1153ef8570b0e78f0925b7d3d58ec3b0fbb2c589
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4336
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Adds support for displaying all or a subset of the privileges granted to a role. Users
have privileges to execute this statement if they are already granted the role or if
they are an admin user on the Sentry Policy Service. The output includes:
* The target scope of the privilege
* The privilege level
* The target names in the object hierarchy
* Whether the privilege was granted using WITH GRANT OPTION
* The create time of the privilege
Examples:
-- Show all grants in role1
SHOW GRANT ROLE role1
-- Show all grants in role1 on the database foo
SHOW GRANT ROLE role1 ON DATABASE foo
Output looks like:
+----------+------------+-------+-----+-----------+--------------+-------------------------------+
| scope    | database   | table | uri | privilege | grant_option | create_time                   |
+----------+------------+-------+-----+-----------+--------------+-------------------------------+
| DATABASE | functional |       |     | ALL       | false        | Fri, Sep 19 2014 16:13:40.999 |
+----------+------------+-------+-----+-----------+--------------+-------------------------------+
Change-Id: I8ef1b87a4c22c8fba4228012668033d7f9d06fcb
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4389
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins