Commit Graph

713 Commits

Author SHA1 Message Date
Skye Wanderman-Milne
f2b01997df Allow UDA intermediates to use CHAR. Update stddev/var to use it.
Change-Id: I791c6389978f4994cba33f01273e94343a163916
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4368
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-09-25 19:37:02 -07:00
Skye Wanderman-Milne
7f87e7e5b5 IMPALA-1111: Fix alignment in ReservoirSample aggregate functions
Change-Id: Iac7aa96eb19079715a7e8152a5edfeafa0d50bc7
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4478
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-09-25 19:37:02 -07:00
Alex Behm
88ae4c9080 Fix HBase region splitting for tests.
It appears that HBase sometimes ignores an admin.splitRegion() RPC,
which made our region splitting fail. As a workaround, this patch adds
another retry loop such that the split/wait sequence is attempted
multiple times.
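
The workaround described above amounts to wrapping the split-then-wait sequence in an outer retry loop. A minimal Python sketch, assuming hypothetical `split_region` and `wait_for_split` callables (the names, attempt count, and pause are illustrative and are not the HBase admin API or the test code in this patch):

```python
import time


def split_with_retries(split_region, wait_for_split, max_attempts=5, pause_s=0.0):
    """Retry the split/wait sequence until the split is observed.

    split_region and wait_for_split are hypothetical callables standing in
    for the admin RPC and the polling step; wait_for_split returns True once
    the region has actually been split.
    """
    for attempt in range(1, max_attempts + 1):
        split_region()            # the RPC may be silently ignored
        if wait_for_split():      # so verify, and retry if nothing happened
            return attempt        # number of attempts that were needed
        time.sleep(pause_s)
    raise RuntimeError(f"region not split after {max_attempts} attempts")
```

Bounding the number of attempts keeps a permanently failing split from hanging the test setup.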

Change-Id: I9aa8ab87bba79ea11b79c50f15328b8be844924d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4557
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-09-25 18:44:28 -07:00
Dimitris Tsirogiannis
f21aed16fd Bug fixes in null-aware anti-join
This commit fixes issue IMPALA-1215 where NOT IN subqueries return wrong
results in the presence of null values.

Change-Id: I97e41c8df8ba864d0189595d670b3f0349fcad36
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4467
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-09-23 07:33:23 -07:00
Dan Hecht
47a11578d4 IMPALA-1272: fix crash when compression codec is invalid for parquet
Defer resizing the columns_ vector until we are sure we will initialize it.
Downstream code doesn't expect any NULLs.

Change-Id: I250cceee5181428fcd3cd1a8b021edb7187ae888
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4465
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
Tested-by: jenkins
2014-09-23 07:33:13 -07:00
Matthew Jacobs
28fc8ddf60 IMPALA-1292: Incorrect result in analytic SUM when ORDER BY column is null
The 'less than' predicate created by AnalyticPlanner used to check if the
previous row was less than the current row is not exactly what we want
to determine when rows in RANGE windows (the default window in this case)
share the same result values. Rows get the same results when the order by
exprs evaluate equally or both null, so it's easiest (and more efficient)
to use a predicate that simply checks equality or both null. We already
create such predicates for checking for partition boundaries, so this is
a trivial change.
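
The "equal or both null" check described above can be sketched in Python as follows. This is an illustration only, with `None` standing in for NULL; the function names are hypothetical and do not correspond to the AnalyticPlanner code.

```python
def same_peer_group(prev_keys, curr_keys):
    """True when every ORDER BY value is equal or both are NULL (None)."""
    return all(
        (p is None and c is None) or (p is not None and c is not None and p == c)
        for p, c in zip(prev_keys, curr_keys)
    )


def running_sum_default_range(rows):
    """rows: list of (order_key, value) pairs, already sorted by order_key.

    Emulates SUM(value) OVER (ORDER BY order_key) with the default RANGE
    window: rows whose order keys are peers share the same result.
    """
    results = [None] * len(rows)
    total, seen_value, start = 0, False, 0
    for i, (key, value) in enumerate(rows):
        if value is not None:
            total += value
            seen_value = True
        is_last = i + 1 == len(rows)
        if is_last or not same_peer_group((key,), (rows[i + 1][0],)):
            # Close the peer group: every row in it gets the same sum.
            for j in range(start, i + 1):
                results[j] = total if seen_value else None
            start = i + 1
    return results
```

With two leading rows whose order key is NULL, both rows fall into one peer group and receive the same running sum.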

When we support arbitrary RANGE window offsets we will likely want to
add similar predicates that compare two tuples plus/minus the offset,
but those will be simpler because there can be only one order by expr
when specifying RANGE offsets with PRECEDING/FOLLOWING.

Change-Id: I52ff6203686832852430e498eca6ad2cc2daee98
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4474
Tested-by: jenkins
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
2014-09-23 07:32:43 -07:00
Matthew Jacobs
08a5204594 Analytic Fns: BE support for range unbounded on both sides and range offsets fail analysis
1) Adds BE support for RANGE windows between UNBOUNDED PRECEDING and
   UNBOUNDED FOLLOWING.
2) RANGE windows with offset boundaries fail analysis because they're
   not supported by the BE yet.

Change-Id: I734575eb87c909d09d24c4df028023f3b50d3cb5
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4442
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
2014-09-23 07:32:21 -07:00
Marcel Kornacker
0b3124ab35 Analytic plan optimization: taking advantage of the hash partitioning of the preceding aggregation.
- determine the partition group that has maximal intersection of its partition exprs with the
  preceding grouping exprs
- if that intersection's expected ndv > #nodes, make that partition group the first one in the sequence
  to be computed and reduce the hash partition of the preceding aggregation to that intersection
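
The selection rule in the two bullets above can be sketched in Python. This is a simplified illustration: exprs are plain strings, and `expected_ndv` is a hypothetical callable that estimates the number of distinct values for a set of exprs. None of these names come from the planner itself.

```python
def pick_first_partition_group(partition_groups, grouping_exprs, expected_ndv, num_nodes):
    """Return (index, intersection) of the group to compute first, or None.

    partition_groups: list of lists of partition expr names.
    grouping_exprs: grouping exprs of the preceding aggregation.
    expected_ndv: hypothetical estimator, set of exprs -> number of distinct values.
    """
    grouping = set(grouping_exprs)
    best_idx, best_common = None, set()
    for idx, exprs in enumerate(partition_groups):
        common = set(exprs) & grouping
        if len(common) > len(best_common):
            best_idx, best_common = idx, common
    # Only reorder when the intersection has enough distinct values to keep
    # all nodes busy.
    if best_idx is None or expected_ndv(best_common) <= num_nodes:
        return None
    return best_idx, best_common
```

The returned intersection is what the hash partition of the preceding aggregation would be reduced to.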

Change-Id: I612b4a260a8975deb495e5d34c32f03db4a7cca7
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4451
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
2014-09-23 07:32:04 -07:00
Victor Bittorf
9939c9d009 Bugfix and tests for CHAR(N) and VARCHAR(N)
Fixed a bug when setting the length in reading/writing text files for CHAR(N).
Also added chars_tiny table for testing CHAR(N) and VARCHAR(N).

Change-Id: If5d5db30afa4b00cf03c68c6a845f182970329f4
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4415
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
2014-09-23 07:30:07 -07:00
Matthew Jacobs
8a75e759cb Move analytic fns test case for decimal to decimal.test
Change-Id: Ic6e02484f47f9a9c47924850c8cf12daf8574c8c
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4449
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
2014-09-23 07:26:32 -07:00
Matthew Jacobs
57addd34ac Analysis error for min()/max() w/ analytic windows without UNBOUNDED PRECEDING
min()/max() do not currently support windows without UNBOUNDED PRECEDING,
so this changes AnalyticExpr to detect this during analysis and throw an
AnalysisException.

Also removed some stale TODOs in the BE

Change-Id: I734b0a5d5399f9bb9d4db6ab1ddc079237b0ac03
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4431
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-09-23 07:26:21 -07:00
Matthew Jacobs
da5198e615 Add spilling test for an analytic fn
Change-Id: Ia93c71c9c2a01f7f04a81593d51f5ca565286b7d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4447
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-09-23 07:26:09 -07:00
Alex Behm
8345494fb1 IMPALA-1249: Anti joins have a uni-directional value transfer.
Like left/right outer joins, anti joins have a uni-directional value transfer.
Predicates could be pushed into anti joined plan subtrees if the condition
was inverted, but this patch does not implement this optimization.

No special consideration must be made to prevent predicate assignment
into anti-joined branches because anti-joined tuples are invisible outside
of the On-clause, and therefore, all unassigned conjuncts referencing the
invisible tuple must come from the original join's On-clause. The assignment
of such predicates is already handled correctly.

Change-Id: Ic2b94f6eb57e000ea51e253035e713288b205298
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4425
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-09-23 07:25:51 -07:00
Alex Behm
0791eb2ee5 IMPALA-1281: Restrict re-ordering of cross joins.
This patch restricts the leftmost table ref candidates of cross joins to the
very first join (like we already do for outer/semi joins). Join inversion
is still considered for cross joins.

While conceptually possible, it is tricky to reason about allowing the rhs of
arbitrary cross-join table refs as the leftmost candidate during join
re-ordering. We would have to carefully change the joinOps of all table refs
in between, and ensure to not make those changes in place to avoid "polluting"
the table refs for the next round of join re-ordering (considering a new
leftmost table ref). The safer fix is to restrict the considered orders.

Change-Id: I5fdc323e4a9c2dada06d9aec81769057f7076299
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4438
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-09-23 07:25:37 -07:00
Nong Li
8a661d0787 [CDH5] cherry pick conflicts.
Change-Id: Ic11237b7ead4a810b523d6b6095781efbc5bb66b
2014-09-20 19:41:42 -07:00
Dimitris Tsirogiannis
3b5f1d3ab5 Rewrite NOT IN subqueries with a null-aware anti-join.
This commit fixes the issue (IMPALA-1215) where NOT IN subqueries return
wrong results in the presence of NULL values. The null-matching equality
operator is introduced in the front-end and the NOT IN subqueries are
rewritten using the null-aware anti-join operator.
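
To illustrate why NULLs need special handling here, the following Python sketch evaluates NOT IN with SQL's three-valued logic, using `None` for NULL. It models the semantics only; the function names are hypothetical and this is not the front-end rewrite or the join operator itself.

```python
def not_in(value, subquery_values):
    """Three-valued `value NOT IN (subquery)`: True, False, or None (NULL)."""
    values = list(subquery_values)
    if not values:
        return True       # NOT IN over an empty result is TRUE
    if value is None:
        return None       # NULL compared with anything is NULL
    if any(v is not None and v == value for v in values):
        return False      # a definite match
    if any(v is None for v in values):
        return None       # no match, but a NULL might have matched
    return True


def null_aware_anti_join(left_values, right_values):
    """Keep only the left values for which NOT IN evaluates to TRUE."""
    right = list(right_values)
    return [v for v in left_values if not_in(v, right) is True]
```

A single NULL on the subquery side is enough to filter out every non-matching left row, which a plain anti-join would wrongly return.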

Change-Id: I5a323357025d77c2143db86e1057999ec8a371c0
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4391
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
2014-09-20 16:13:49 -07:00
Matthew Jacobs
8de30cbdb6 Simplify FIRST_VALUE analytic function implementation
Change-Id: I290adcaf50e9f5d5831eab4d67513d251e5fbe3e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4418
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
2014-09-20 16:13:14 -07:00
Marcel Kornacker
ec3ac883bf Analytic plan optimizations.
- order partition and sort groups by increasing output tuple size
- the unpartitioned partition group goes last
- if the unpartitioned partition group contains an order-by, that is executed like
  a regular distributed sort (with a merging exchange)
- coalesce sort groups that have a compatible partition-by clause and one is a prefix
  of the other
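
The prefix-based coalescing in the last bullet can be sketched in Python. This simplification looks only at the order-by lists and assumes the partition-by clauses have already been found compatible; the function names are hypothetical.

```python
def is_prefix(shorter, longer):
    """True when `shorter` is a leading sub-list of `longer`."""
    return len(shorter) <= len(longer) and list(longer[:len(shorter)]) == list(shorter)


def coalesce_sort_groups(order_by_lists):
    """Drop every order-by list that is a prefix of a longer one that is kept."""
    kept = []
    # Visit longer lists first so a shorter prefix can be absorbed by them.
    for exprs in sorted(order_by_lists, key=len, reverse=True):
        if not any(is_prefix(exprs, k) for k in kept):
            kept.append(list(exprs))
    return kept
```

A sort on (a, b) also satisfies a consumer that needs rows ordered by (a), so one sort can serve both groups.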

Still missing:
- ordering comparison should take equivalence classes into account

Change-Id: Ie604c74f7804a9028f3ab59ce0c291deb0edb272
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4399
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-09-20 16:12:50 -07:00
Matthew Jacobs
9ffb9069b6 Fix multiple bugs in analytic fns BE and improved/consolidated tests
1) Fix ROWS following start bound where window is never fully in partition
2) Fix sum() NULL handling over sliding windows and add/consolidate tests.
   sum() should return NULL when all non-NULL values are removed. Because
   sum only stores the current sum as the intermediate value, we can't know
   if the sum is actually 0 or if there are no non-NULL values in the window.
   (avg() doesn't have this problem because it explicitly keeps the count
   of the number of elements in the average as part of the intermediate state.)
   Instead of changing sum() to have more intermediate state (which would
   affect aggregations), we can just keep track of the number of calls to
   Update() and Remove() in the FunctionContextImpl and check in SumRemove()
   whether or not there are any non-null elements being summed. Added
   tests (verified with Oracle).
3) Fixed a bug where the state tracking the last result tuple could be
   wrong, resulting in a crash.
4) IMPALA-1269: Windows from a start offset to CURRENT ROW could
   produce wrong results between partitions.
5) IMPALA-1273: Incorrect results with very large window and small table

Tests are included for all issues.

Change-Id: I0f396c24078a1494fb977e8775f1ca8c530932eb
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4397
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
2014-09-20 16:12:44 -07:00
Nong Li
6b73eec02d PHJ: Fix block management when spilling.
The previous code did not handle the case well where spilling happens while
building the hash table (i.e. partitioning the build rows fit). This caused the
probe partition to be starved, causing queries that should be able to run to fail
with a "not enough buffers" error.

Change-Id: I3a9a84e8800a72ed3ce6f5ab7ff03bc2d6eb7ad8
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4403
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-09-20 16:12:21 -07:00
Victor Bittorf
6289121261 CHAR(N) Followup Patch
This patch addresses:
  1. Char doesn't use codegen
  2. Not in-lining large CHAR(N) for N > 128
  3. Parquet reader/writer for CHAR(N) and VARCHAR(N)

Change-Id: I83a29a8bd312841a3e29bfe2243884074570f247
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4280
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
2014-09-20 16:12:03 -07:00
Skye Wanderman-Milne
2a449651da Use CRC hash for 0th partition level.
Change-Id: Ie845e0edb684f13421eea41327b1571b368db21a
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4370
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-09-20 16:11:40 -07:00
Alex Behm
0fb380961c IMPALA-1187: Add appx_count_distinct query option to rewrite COUNT(DISTINCT) to NDV().
This patch also fixes IMPALA-1164: NDV() now returns a BIGINT (and not STRING).

Change-Id: Ia2a3272204938579d61091ee4f7f2d1cbf38ed55
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4338
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-09-20 16:11:34 -07:00
Alex Behm
ae7f59a65a Cost-based inversion of outer, semi and cross joins.
Change-Id: I7ce8847aadb5028ea5655ef2437ad31ab277e6de
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4323
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-09-20 16:11:25 -07:00
Marcel Kornacker
abeac801d9 IMPALA-1243: Incorrect plan in analytic using inline view
Prevent predicate propagation into inline views containing analytic exprs by
a) not explicitly pushing predicates into such inline views
b) blocking the propagation path by not registering auxiliary eq predicates for the inline view

Change-Id: Ie2961d36a532f6b4603a4edbe7dd9cf0a6882d75
2014-09-20 16:10:23 -07:00
Matthew Jacobs
0facf61296 Analytic Functions: BE support for ROWS windows with arbitrary start bounds
Adds support in the BE AnalyticEvalNode for ROWS windows with arbitrary
start bounds. If there is a start bound specified, a sliding window must
be maintained. As input rows are processed they are added to the window.
As they expire from the window, they are 'removed' from the current
intermediate state of the evaluators (stored in curr_tuple_) by calling
AggFnEvaluator::Remove(). This is an initial implementation that keeps
the tuples in the window in memory. We can improve this later by using
the BufferedTupleStream with an Iterator interface supporting multiple
readers.
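
The add-then-remove pattern described above can be sketched in Python for a single evaluator. The sketch computes an average over ROWS BETWEEN n PRECEDING AND CURRENT ROW and, like the initial implementation, keeps the window rows in memory. The function name is hypothetical and this is not the AnalyticEvalNode code.

```python
from collections import deque


def rows_window_avg(values, preceding):
    """AVG(v) OVER (ROWS BETWEEN `preceding` PRECEDING AND CURRENT ROW)."""
    window = deque()        # rows currently inside the window
    total, count = 0, 0     # intermediate state of the evaluator
    out = []
    for v in values:
        window.append(v)    # the incoming row enters the window
        if v is not None:
            total += v
            count += 1
        if len(window) > preceding + 1:
            expired = window.popleft()   # the oldest row slides out
            if expired is not None:      # "remove" it from the state
                total -= expired
                count -= 1
        out.append(total / count if count else None)
    return out
```

Each row is added once and removed once, so the cost per output row stays constant regardless of window size.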

This also fixes IMPALA-1253: LAST_VALUE returns incorrect results

Change-Id: Id5daf6c060ab4079bb8dacf2db8992985894a820
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4335
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
2014-09-20 16:08:12 -07:00
Victor Bittorf
a1892a17d5 IMPALA-1248: Fixed CHAR(N) in VALUES clause.
Queries like:
INSERT INTO table VALUES (CAST("..." AS CHAR(N)))
Used codegen path and failed; changed to use interpreted path.

Change-Id: Id80274580df268b3f828dec19a2e0b0578061ca8
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4362
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
2014-09-20 16:07:16 -07:00
Alex Behm
7355d9c221 IMPALA-1247: In a 2-phase agg the 1st phase should output its intermediate tuple.
Change-Id: I8f7ba0551099b6cf524baf6bd6f848d02896418d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4378
Tested-by: jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-09-20 16:07:06 -07:00
Dimitris Tsirogiannis
335c46a206 IMPALA-1228 Incorrect rewrite of scalar agg subquery
This commit fixes an issue where, for the case of a scalar agg subquery
with count, the zeroifnull fn is applied to the wrong column during
query rewrite.

This commit also fixes IMPALA-1237.

Change-Id: Ic00ae5799d6970171c007e3ed25348d7fc09d825
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4286
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
2014-09-20 16:06:47 -07:00
ishaan
c4b4e010ff Buffered Tuple Stream fixes.
This patch fixes two issues:
  - Add API to buffered block mgr to allow an atomic Unpin and GetNewBlock. This has
    the semantics of unpinning a block and giving the buffer to the new block. This
    is necessary for the tuple stream to make sure another thread does not grab the
    unpinned block in between.
  - Buffer management reading an unpinned stream. Before moving onto a new block (and
    unpinning the current), we need to make sure all the tuples returned from the
    current block are returned up the operator tree.

Change-Id: I95ee58d1019dd971f6a7dc19ecafdfa54cdbf942
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4333
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-09-20 16:05:11 -07:00
Skye Wanderman-Milne
f8905ea485 Fix AVG codegen
We weren't returning the right merge function for decimal in
GetAvgFunction(). Someday the functions will be registered in the FE
like for scalar functions.

Change-Id: I1153ef8570b0e78f0925b7d3d58ec3b0fbb2c589
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4336
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-09-20 16:02:47 -07:00
Nong Li
209927fbf2 Fix spilling with aggregated exprs with string slots.
This patch also does some clean up on agg-fn-evaluator to support this case.

Change-Id: If7e5c8663c7d371b2666acaab2966a2cd7bdccf9
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4256
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-09-20 16:02:02 -07:00
Lenni Kuff
9e43e4b5e8 [CDH5] Add support for SHOW GRANT ROLE <roleName> [ON <privilege spec>]
Adds support for displaying all or a subset of the privileges granted to a role. Users
have privileges to execute this statement if they are already granted the role or if
they are an admin user on the Sentry Policy Service. The output includes:
* The target scope of the privilege
* The privilege level
* The target names in the object hierarchy
* Whether the privilege was granted using WITH GRANT OPTION
* The create time of the privilege

Examples:
-- Show all grants in role1
SHOW GRANT ROLE role1

-- Shows all grants in role1 on the database foo
SHOW GRANT ROLE role1 on DATABASE foo

Output looks like:
+----------+------------+-------+-----+-----------+--------------+-------------------------------+
| scope    | database   | table | uri | privilege | grant_option | create_time                   |
+----------+------------+-------+-----+-----------+--------------+-------------------------------+
| DATABASE | functional |       |     | ALL       | false        | Fri, Sep 19 2014 16:13:40.999 |
+----------+------------+-------+-----+-----------+--------------+-------------------------------+

Change-Id: I8ef1b87a4c22c8fba4228012668033d7f9d06fcb
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4389
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-09-19 21:08:05 -07:00
Lenni Kuff
e4deef07bf [CDH5] Add support for WITH GRANT OPTION/REVOKE GRANT OPTION
This change adds support for GRANT <privilege> TO <role> WITH GRANT OPTION which allows
delegating GRANT/REVOKE authority to non-admin users. Specifically, it allows users who
have been granted the specified role to execute GRANT/REVOKE statements on all child
objects. For example, you can now do something like:
GRANT ALL ON DATABASE foo TO role1 WITH GRANT OPTION
and everyone granted role1 will be able to execute GRANT/REVOKE statements on database
foo OR any of the tables in the database.
It also adds support for REVOKE GRANT OPTION FOR <privilege> FROM <role> which allows
removing a previous WITH GRANT OPTION without actually deleting the privilege.
Similar to GRANT/REVOKE statements, the actual authorization checks on whether a user
should/should not have privileges to execute these options is done at the Sentry Service
level.

Change-Id: I8757569a3bdb68414e315ef37d6845b1859eb758
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4377
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-09-19 17:45:15 -07:00
Lenni Kuff
293ead3b2a [CDH5] Authorize SHOW ROLES statements and support SHOW CURRENT ROLES
This patch adds the necessary changes required to authorize SHOW ROLES statements.
This is not as easy as it could be because the Sentry Service doesn't currently
expose the metadata for who is/isn't authorized to execute these statements. To authorize
the statements, we need to first make an RPC to the Sentry Service (via the
Catalog Server) and then only proceed with the SHOW statement if the check succeeds.
We should consider revisiting this approach in the future when more metadata is available
from Sentry.

Additionally, this patch adds support for SHOW CURRENT ROLES which shows all roles
that are currently granted to the current user.

Change-Id: Ia01c20d58ab081f49a85566075836d8c6e25dbd4
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4367
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-09-19 05:41:33 -07:00
Henry Robinson
6bc411c890 Add support for HS2 protocol V6
This patch adds support for V6 of the HS2 protocol, which notably
includes columnar organisation of result sets. Clients that set their
protocol version to < V6 will receive result sets in the traditional row
orientation.

The performance of fetches over HS2 goes up significantly as a result,
since the V1 protocol had some pathologies in its deserialisation
performance.

 Beeswax
  Row materialisation: 455ms, client processing time: 523ms
 HS2 V6:
  Row materialisation: 444ms, client processing time: 1.8s
 HS2 V1:
  Row materialisation: 585ms, client processing time: 15.9s (!)
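
As a rough illustration of row versus columnar organisation, the Python sketch below pivots row-oriented results into per-column value lists with null flags. The dictionary layout is an assumption made for this example and is not the HS2 Thrift result-set structure.

```python
def rows_to_columns(rows, num_cols):
    """Pivot row-oriented results into per-column values plus null flags."""
    columns = [{"values": [], "nulls": []} for _ in range(num_cols)]
    for row in rows:
        for col, cell in zip(columns, row):
            col["values"].append(cell)
            col["nulls"].append(cell is None)
    return columns
```

A client can then process each column as one homogeneous list instead of unpacking every cell of every row.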

TODO: Add support for the CHAR datatype

The following patch is also included:

Fix wait-for-hiveserver2.py when Impala moves to HS2 V6

Due to HIVE-6050, older versions of Hive are not compatible with newer
clients (even those that try to use old protocol
versions). wait-for-hiveserver2.py uses HS2 to talk to the HiveServer2
service, but picks up the newer version from V6, and fails.

This patch temporarily re-adds cli_service.thrift (renaming the Thrift
service as LegacyTCLIService) only for wait-for-hiveserver2.py to
use. As soon as Impala's thirdparty Hive moves to HS2 V6, we can get rid
of this change.

Change-Id: I2cbe884345ae7e772620b80a29b6574bd6532940
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4402
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-09-18 20:17:18 -07:00
Mike Yoder
d1e83f8280 Support for simultaneous LDAP and Kerberos authentication.
Prior to this work, the impalad could either authenticate with
Kerberos, or authenticate with LDAP.  This fixes that so that both can
co-exist in the same daemon.  Prior code had both a
KerberosAuthProvider and an LdapAuthProvider; this is refactored into
a single SaslAuthProvider that potentially contains both LDAP and
Kerberos.

The terminology of "client facing" and "server facing" has been
replaced with "external" and "internal".  External is for clients like
the impala shell, odbc, jdbc, etc.  Internal is for daemon <-> daemon
communication.

The notion of the "auxprop" plugin is removed, as that was dead code.

The Thrift code is enhanced to pass the Realm information from the
SaslAuthProvider down to the underlying SASL library.

Change-Id: I0a0b968a107c0b25610ca37295c3fee345ecdd6d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4051
Reviewed-by: Michael Yoder <myoder@cloudera.com>
Tested-by: jenkins
2014-09-18 12:54:45 -07:00
Alex Behm
5877f12be6 IMPALA-995: Add plan hints embedded in comments and preserve them in views.
This patch adds two new hint styles:
1. Traditional commented hint: /* +hint1,hint2,hint3 */
2. End-of-line commented hint: -- +hint1,hint2,hint3\n

We now preserve hints when creating views. We always use the
end-of-line commented hint style to allow Hive to read
hinted views created by Impala. Hive does not support
traditional /* */ comments, and attempts to parse /*+ */ as
hints, failing with a parse error on unrecognized hints.

This patch also changes Impala to only issue a warning
for unrecognized hints instead of throwing an error. This
allows Impala to run against hinted views created by Hive.
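
The two commented hint styles and the warn-instead-of-fail behaviour can be sketched in Python. The regular expression and function names are hypothetical and much simpler than a real SQL lexer (for instance, they do not skip string literals).

```python
import re

# Style 1: /* +hint1,hint2 */    Style 2: -- +hint1,hint2 followed by a newline
_HINT_RE = re.compile(r"/\*\s*\+([^*]*?)\s*\*/|--\s*\+([^\n]*)\n")


def extract_hints(sql):
    """Return hint names from commented hints, in order of appearance."""
    found = []
    for m in _HINT_RE.finditer(sql):
        body = m.group(1) if m.group(1) is not None else m.group(2)
        found.extend(h.strip() for h in body.split(",") if h.strip())
    return found


def check_hints(hints, known):
    """Split hints into recognized ones and warnings; never raises."""
    recognized = [h for h in hints if h in known]
    warnings = [f"ignoring unrecognized hint: {h}" for h in hints if h not in known]
    return recognized, warnings
```

Returning warnings rather than raising mirrors the change that lets views with unfamiliar hints still be queried.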

Change-Id: I6e8352442e763c0029f72c17363caa087572dca0
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4235
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4361
2014-09-18 00:36:03 -07:00
Ippokratis Pandis
946aa3089b Adding support in PHJ for right-{semi,anti} joins.
Changes needed for PHJ to support RIGHT {SEMI, ANTI} JOINs. Codegen works as well.
Basic parser tests and minimal (end-to-end) query tests.
Need to add analyzer tests and add more query tests.
Note that in the case of right-{semi,anti} and perhaps also on {right,full}-outer we
should not be broadcasting the build side.

Change-Id: I6854ee9e4640f809f0350229bcc00811fa474f07
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4288
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4369
2014-09-16 19:42:24 -07:00
Marcel Kornacker
af629cbbb0 IMPALA-1231 Analytic query with HAVING: IllegalStateException
By including the actual FunctionCallExpr as a child of an AnalyticExpr, we would end
up substituting it if it also happened to show up as a regular/non-analytic aggregate
expr in the query. The solution is to include only the function call parameters as
children of the AnalyticExpr, not the function call itself.

Plus, fixed up partition-by/order-by less-than predicates in AnalyticPlanner.

Change-Id: Ib5d33baa2f257f7a8a21dd536332d11c55fbdbca
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4327
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4342
2014-09-15 19:24:23 -07:00
Lenni Kuff
ffe9e4b74e [CDH5] Add support for GRANT/REVOKE to Impala
This change adds support for GRANT/REVOKE to Impala via the Sentry Service. This includes
support for creating and dropping roles, granting and revoking roles to/from groups,
granting/revoking privileges to/from roles, and commands to view role metadata.

The specific statements that are added in this patch are:
CREATE/DROP ROLE <roleName>
SHOW ROLES
SHOW ROLE GRANT GROUP <groupName>
GRANT/REVOKE ROLE <roleName> TO/FROM GROUP <groupName>
GRANT/REVOKE <privilegeSpec> TO/FROM <roleName>

It does not include some of the fancier bulk-op syntax like support for granting multiple
roles to multiple groups in one statement.

This patch does not add support for the WITH GRANT OPTION to delegate GRANT/REVOKE
privileges to other users.

TODO:
* Authorize these statements on the client side. The current Sentry Service design makes
  it difficult to authorize any GRANT/REVOKE statement on the client (Impala) side.
  Privilege checks are done within the Sentry Service itself. There are a few different
  options available to let Impala "fail fast" and those changes will come in a follow
  on patch.

Change-Id: Ic6bd19f5939d3290255222dcc1a42ce95bd345e2
2014-09-13 21:21:10 -07:00
Matthew Jacobs
ea3b70d861 Add agg fns for remaining analytic ranking fns
Adds agg fns for FIRST_VALUE, LAST_VALUE, LAG, LEAD. Also adds
support for ROWS windows with the end bound as unbounded following
as long as the start bound is unbounded preceding.

Change-Id: I4856ae580164d17a1bbf7d45010b61f5afa5db50
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4249
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
2014-09-13 00:19:21 -07:00
Nong Li
a4e2f97845 Fix and add spilling test.
More tests coming.

Change-Id: I09e98adb6b011575572051eff1cd52e7be689fe8
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4311
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-09-13 00:19:21 -07:00
Alex Behm
de75278125 Add SHOW ANALYTIC FUNCTIONS and additional analysis checks.
Change-Id: Ic1aac60fb9b094349b9cfbec68608ac50fc5660c
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4298
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-09-13 00:19:21 -07:00
Matthew Jacobs
b143c0574d Fix a few bugs in the AnalyticEvalNode
IMPALA-1233: Crash running query with analytic in WITH clause
IMPALA-1232: Analytic eval node crashes if cancelling query before Open()

Change-Id: I9a263775b8ef670d0f819ed53d0af1eb96edf5c7
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4313
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
2014-09-13 00:19:20 -07:00
Victor Bittorf
8bebf2b196 CHAR: adding support for CHAR(N)
Support for CHAR is implemented as a StringVal in the backend.

TODO:
  1. Parquet Reader/writer
  2. Codegen slot ref
  3. Codegen text reader
  4. Don't inline large chars
  5. update impala-hs2-server.cc with CHAR support

Change-Id: Ibba2c89cea971cb740001ea7975bf3e929150471
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4075
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
2014-09-13 00:19:20 -07:00
Alex Behm
78efbf8903 Set canonical window/ordering/function when analyzing an AnalyticExpr.
Change-Id: I8f3cef32cbf67bd96abca02cd79468c2c30e6f48
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4218
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-09-13 00:19:20 -07:00
Alex Behm
503201794c Wrapping up planning of analytic functions.
This patch adds support for:
- analytic functions in inline views
- analytic functions referencing inline views
- analytic functions in unions
- analytic functions in subqueries/joins
- avoid generating plan for non-materialized analytic exprs
- predicate assignment and propagation onto and through
  analytic eval nodes

Change-Id: I195d32606af670f216b88e1145177fd1d66456eb
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4173
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-09-13 00:17:40 -07:00
Dimitris Tsirogiannis
e1e874a77f IMPALA-1212 Accept subquery as LHS or RHS of between operator
This commit fixes the issue where an error was thrown if a subquery was
used in either side of a between predicate. Between predicates with
subqueries are replaced by their corresponding compound predicates
during query rewrite.
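
The BETWEEN-to-compound-predicate rewrite can be sketched in Python on SQL text. This string-based helper is hypothetical; the actual rewrite operates on the analyzer's expression tree rather than on strings.

```python
def rewrite_between(expr, lower, upper, negated=False):
    """Return the compound predicate equivalent to `expr [NOT] BETWEEN lower AND upper`."""
    if negated:
        return f"({expr} < {lower} OR {expr} > {upper})"
    return f"({expr} >= {lower} AND {expr} <= {upper})"
```

Once the predicate is in compound form, a subquery on either side is an ordinary comparison operand.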

Change-Id: I4315a6e91c9306c6817bf6aa6bc1d0b586a1a067
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4246
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
2014-09-13 00:17:36 -07:00
Skye Wanderman-Milne
3b7449a59b Codegen PartitionedHashJoinNode
This also reverts back to using CRC hash since FNV is not codegen'd
yet. The perf is not as good as the original HJ in a microbenchmark; I
haven't run a cluster run yet.

Change-Id: Ie4dc983f31631fbc78720425a0e354dd1d3342a6
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4219
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-09-13 00:17:33 -07:00