impala

mirror of https://github.com/apache/impala.git synced 2026-01-08 03:02:48 -05:00

Author	SHA1	Message	Date
Skye Wanderman-Milne	f2b01997df	Allow UDA intermediates to use CHAR. Update stddev/var to use it. Change-Id: I791c6389978f4994cba33f01273e94343a163916 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4368 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-09-25 19:37:02 -07:00
Skye Wanderman-Milne	7f87e7e5b5	IMPALA-1111: Fix alignment in ReservoirSample aggregate functions Change-Id: Iac7aa96eb19079715a7e8152a5edfeafa0d50bc7 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4478 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-09-25 19:37:02 -07:00
Dimitris Tsirogiannis	f21aed16fd	Bug fixes in null-aware anti-join This commit fixes issue IMPALA-1215 where NOT IN subqueries return wrong results in the presence of null values. Change-Id: I97e41c8df8ba864d0189595d670b3f0349fcad36 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4467 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-09-23 07:33:23 -07:00
Dan Hecht	47a11578d4	IMPALA-1272: fix crash when compression codec is invalid for parquet Defer resizing the columns_ vector until we are sure we will initialize it. Downstream code doesn't expect any NULLs. Change-Id: I250cceee5181428fcd3cd1a8b021edb7187ae888 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4465 Reviewed-by: Daniel Hecht <dhecht@cloudera.com> Tested-by: jenkins	2014-09-23 07:33:13 -07:00
Matthew Jacobs	28fc8ddf60	IMPALA-1292: Incorrect result in analytic SUM when ORDER BY column is null The 'less than' predicate created by AnalyticPlanner used to check if the previous row was less than the current row is not exactly what we want to determine when rows in RANGE windows (the default window in this case) share the same result values. Rows get the same results when the order by exprs evaluate equally or both null, so it's easiest (and more efficient) to use a predicate that simply checks equality or both null. We already create such predicates for checking for partition boundaries, so this is a trivial change. When we support arbitrary RANGE window offsets we will likely want to add similar predicates that compare two tuples plus/minus the offset, but those will be simpler because there can be only one order by expr when specifying RANGE offsets with PRECEDING/FOLLOWING. Change-Id: I52ff6203686832852430e498eca6ad2cc2daee98 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4474 Tested-by: jenkins Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-09-23 07:32:43 -07:00
Matthew Jacobs	08a5204594	Analytic Fns: BE support for range unbounded on both sides and range offsets fail analysis 1) Adds BE support for RANGE windows between UNBOUNDED PRECEDING to UNBOUNDED FOLLOWING. 2) RANGE windows with offset boundaries fail analysis because they're not supported by the BE yet. Change-Id: I734575eb87c909d09d24c4df028023f3b50d3cb5 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4442 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-09-23 07:32:21 -07:00
Victor Bittorf	9939c9d009	Bugfix and tests for CHAR(N) and VARCHAR(N) Fixed a bug when setting the length in reading/writing text files for CHAR(N). Also added chars_tiny table for testing CHAR(N) and VARCHAR(N). Change-Id: If5d5db30afa4b00cf03c68c6a845f182970329f4 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4415 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins	2014-09-23 07:30:07 -07:00
Matthew Jacobs	8a75e759cb	Move analytic fns test case for decimal to decimal.test Change-Id: Ic6e02484f47f9a9c47924850c8cf12daf8574c8c Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4449 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: jenkins	2014-09-23 07:26:32 -07:00
Matthew Jacobs	da5198e615	Add spilling test for an analytic fn Change-Id: Ia93c71c9c2a01f7f04a81593d51f5ca565286b7d Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4447 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-09-23 07:26:09 -07:00
Alex Behm	8345494fb1	IMPALA-1249: Anti joins have a uni-directional value transfer. Like left/right outer joins, anti joins have a uni-directional value transfer. Predicates could be pushed into anti joined plan subtrees if the condition was inverted, but this patch does not implement this optimization. No special consideration must be made to prevent predicate assignment into anti-joined branches because anti-joined tuples are invisible outside of the On-clause, and therefore, all unassigned conjuncts referencing the invisible tuple must come from the original join's On-clause. The assignment of such predicates is already handled correctly. Change-Id: Ic2b94f6eb57e000ea51e253035e713288b205298 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4425 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-09-23 07:25:51 -07:00
Nong Li	8a661d0787	[CDH5] cherry pick conflicts. Change-Id: Ic11237b7ead4a810b523d6b6095781efbc5bb66b	2014-09-20 19:41:42 -07:00
Dimitris Tsirogiannis	3b5f1d3ab5	Rewrite NOT IN subqueries with a null-aware anti-join. This commit fixes the issue (IMPALA-1215) where NOT IN subqueries return wrong results in the presence of NULL values. The null-matching equality operator is introduced in the front-end and the NOT IN subqueries are rewritten using the null-aware anti-join operator. Change-Id: I5a323357025d77c2143db86e1057999ec8a371c0 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4391 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins	2014-09-20 16:13:49 -07:00
Matthew Jacobs	8de30cbdb6	Simplify FIRST_VALUE analytic function implementation Change-Id: I290adcaf50e9f5d5831eab4d67513d251e5fbe3e Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4418 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins	2014-09-20 16:13:14 -07:00
Matthew Jacobs	9ffb9069b6	Fix multiple bugs in analytic fns BE and improved/consolidated tests 1) Fix ROWS following start bound where window is never fully in partition 2) Fix sum() NULL handling over sliding windows and add/consolidate tests. sum() should return NULL when all non-NULL values are removed. Because sum only stores the current sum as the intermediate value, we can't know if the sum is actually 0 or if there are no non-NULL values in the window. (avg() doesn't have this problem because it explicitly keeps the count of the number of elements in the average as part of the intermediate state.) Instead of changing sum() to have more intermediate state (which would affect aggregations), we can just keep track of the number of calls to Update() and Remove() in the FunctionContextImpl and check in SumRemove() whether or not there are any non-null elements being summed. Added tests (verified with Oracle). 3) Fixed a bug where the state tracking the last result tuple could be wrong, resulting in a crash. 4) IMPALA-1269: Windows from a start offset to CURRENT ROW could produce wrong results between partitions. 5) IMPALA-1273: Incorrect results with very large window and small table. Tests are included for all issues. Change-Id: I0f396c24078a1494fb977e8775f1ca8c530932eb Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4397 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins	2014-09-20 16:12:44 -07:00
Nong Li	6b73eec02d	PHJ: Fix block management when spilling. The previous code did not handle well the case where the spilling happens when building the hash table (i.e. partitioning the build rows fit). This caused the probe partition to be starved causing queries that should be able to run to fail with a not enough buffers error. Change-Id: I3a9a84e8800a72ed3ce6f5ab7ff03bc2d6eb7ad8 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4403 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-09-20 16:12:21 -07:00
Victor Bittorf	6289121261	CHAR(N) Followup Patch This patch addresses: 1. Char doesn't use codegen 2. Not in-lining large CHAR(N) for N > 128 3. Parquet reader/writer for CHAR(N) and VARCHAR(N) Change-Id: I83a29a8bd312841a3e29bfe2243884074570f247 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4280 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins	2014-09-20 16:12:03 -07:00
Skye Wanderman-Milne	2a449651da	Use CRC hash for 0th partition level. Change-Id: Ie845e0edb684f13421eea41327b1571b368db21a Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4370 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-09-20 16:11:40 -07:00
Alex Behm	0fb380961c	IMPALA-1187: Add appx_count_distinct query option to rewrite COUNT(DISTINCT) to NDV(). This patch also fixes IMPALA-1164: NDV() now returns a BIGINT (and not STRING). Change-Id: Ia2a3272204938579d61091ee4f7f2d1cbf38ed55 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4338 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-09-20 16:11:34 -07:00
Alex Behm	ae7f59a65a	Cost-based inversion of outer, semi and cross joins. Change-Id: I7ce8847aadb5028ea5655ef2437ad31ab277e6de Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4323 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-09-20 16:11:25 -07:00
Matthew Jacobs	0facf61296	Analytic Functions: BE support for ROWS windows with arbitrary start bounds Adds support in the BE AnalyticEvalNode for ROWS windows with arbitrary start bounds. If there is a start bound specified a sliding window must be maintained. As input rows are processed they are added to the window. As they expire from the window, they are 'removed' from the current intermediate state of the evaluators (stored in curr_tuple_) by calling AggFnEvaluator::Remove(). This is an initial implementation that keeps the tuples in the window in memory. We can improve this later by using the BufferedTupleStream with an Iterator interface supporting multiple readers. This also fixes IMPALA-1253: LAST_VALUE returns incorrect results Change-Id: Id5daf6c060ab4079bb8dacf2db8992985894a820 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4335 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-09-20 16:08:12 -07:00
Victor Bittorf	a1892a17d5	IMPALA-1248: Fixed CHAR(N) in VALUES clause. Queries like: INSERT INTO table VALUES (CAST("..." AS CHAR(N))) used the codegen path and failed; changed to use the interpreted path. Change-Id: Id80274580df268b3f828dec19a2e0b0578061ca8 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4362 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins	2014-09-20 16:07:16 -07:00
Alex Behm	7355d9c221	IMPALA-1247: In a 2-phase agg the 1st phase should output its intermediate tuple. Change-Id: I8f7ba0551099b6cf524baf6bd6f848d02896418d Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4378 Tested-by: jenkins Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-09-20 16:07:06 -07:00
ishaan	c4b4e010ff	Buffered Tuple Stream fixes. This patch fixes two issues: - Add API to buffered block mgr to allow an atomic Unpin and GetNewBlock. This has the semantics of unpinning a block and giving the buffer to the new block. This is necessary for the tuple stream to make sure another thread does not grab the unpinned block in between. - Buffer management reading an unpinned stream. Before moving onto a new block (and unpinning the current), we need to make sure all the tuples returned from the current block are returned up the operator tree. Change-Id: I95ee58d1019dd971f6a7dc19ecafdfa54cdbf942 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4333 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-09-20 16:05:11 -07:00
Skye Wanderman-Milne	f8905ea485	Fix AVG codegen We weren't returning the right merge function for decimal in GetAvgFunction(). Someday the functions will be registered in the FE like for scalar functions. Change-Id: I1153ef8570b0e78f0925b7d3d58ec3b0fbb2c589 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4336 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-09-20 16:02:47 -07:00
Lenni Kuff	9e43e4b5e8	[CDH5] Add support for SHOW GRANT ROLE <roleName> [ON <privilege spec>] Adds support for displaying all or a subset of the privileges granted to a role. Users have privileges to execute this statement if they are already granted the role or if they are an admin user on the Sentry Policy Service. The output includes: * The target scope of the privilege * The privilege level * The target names in the object hierarchy * Whether the privilege was granted using WITH GRANT OPTION * The create time of the privilege Examples: -- Show all grants in role1 SHOW GRANT ROLE role1 -- Shows all grants in role1 on the database foo SHOW GRANT ROLE role1 on DATABASE foo Output looks like: +----------+------------+-------+-----+-----------+--------------+-------------------------------+ | scope | database | table | uri | privilege | grant_option | create_time | +----------+------------+-------+-----+-----------+--------------+-------------------------------+ | DATABASE | functional | | | ALL | false | Fri, Sep 19 2014 16:13:40.999 | +----------+------------+-------+-----+-----------+--------------+-------------------------------+ Change-Id: I8ef1b87a4c22c8fba4228012668033d7f9d06fcb Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4389 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-09-19 21:08:05 -07:00
Lenni Kuff	e4deef07bf	[CDH5] Add support for WITH GRANT OPTION/REVOKE GRANT OPTION This change adds support for GRANT <privilege> TO <role> WITH GRANT OPTION which allows delegating GRANT/REVOKE authority to non-admin users. Specifically, it allows users who have been granted the specified role to execute GRANT/REVOKE statements on all child objects. For example, you can now do something like: GRANT ALL ON DATABASE foo TO role1 WITH GRANT OPTION and everyone granted role1 will be able to execute GRANT/REVOKE statements on database foo OR any of the tables in the database. It also adds support for REVOKE GRANT OPTION FOR <privilege> FROM <role> which allows removing a previous WITH GRANT OPTION without actually deleting the privilege. Similar to GRANT/REVOKE statements, the actual authorization checks on whether a user should/should not have privileges to execute these options is done at the Sentry Service level. Change-Id: I8757569a3bdb68414e315ef37d6845b1859eb758 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4377 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-09-19 17:45:15 -07:00
Lenni Kuff	293ead3b2a	[CDH5] Authorize SHOW ROLES statements and support SHOW CURRENT ROLES This patch adds the necessary changes required to authorize SHOW ROLES statements. This is not as easy as it could be because the Sentry Service doesn't currently expose the metadata for who is/isn't authorized to execute these statements. To authorize the statements, we need to first make an RPC to the Sentry Service (via the Catalog Server) and then only proceed with the SHOW statement if the check succeeds. We should consider revisiting this approach in the future when more metadata is available from Sentry. Additionally, this patch adds support for SHOW CURRENT ROLES which shows all roles that are currently granted to the current user. Change-Id: Ia01c20d58ab081f49a85566075836d8c6e25dbd4 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4367 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-09-19 05:41:33 -07:00
Alex Behm	5877f12be6	IMPALA-995: Add plan hints embedded in comments and preserve them in views. This patch adds two new hint styles: 1. Traditional commented hint: /* +hint1,hint2,hint3 */ 2. End-of-line commented hint: -- +hint1,hint2,hint3\n We now preserve hints when creating views. We always use the end-of-line commented hint style to allow Hive to read hinted views created by Impala. Hive does not support traditional /* */ comments, and attempts to parse /*+ */ as hints, failing with a parse error on unrecognized hints. This patch also changes Impala to only issue a warning for unrecognized hints instead of throwing an error. This allows Impala to run against hinted views created by Hive. Change-Id: I6e8352442e763c0029f72c17363caa087572dca0 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4235 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4361	2014-09-18 00:36:03 -07:00
Ippokratis Pandis	946aa3089b	Adding support in PHJ for right-{semi,anti} joins. Changes needed for PHJ to support RIGHT {SEMI, ANTI} JOINs. Codegen works as well. Basic parser tests and minimal (end-to-end) query tests. Need to add analyzer tests and add more query tests. Note that in the case of right-{semi,anti} and perhaps also on {right,full}-outer we should not be broadcasting the build side. Change-Id: I6854ee9e4640f809f0350229bcc00811fa474f07 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4288 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4369	2014-09-16 19:42:24 -07:00
Lenni Kuff	ffe9e4b74e	[CDH5] Add support for GRANT/REVOKE to Impala This change adds support for GRANT/REVOKE to Impala via the Sentry Service. This includes support for creating and dropping roles, granting and revoking roles to/from groups, granting/revoking privileges to/from roles, and commands to view role metadata. The specific statements that are added in this patch are: CREATE/DROP ROLE <roleName> SHOW ROLES SHOW ROLE GRANT GROUP <groupName> GRANT/REVOKE ROLE <roleName> TO/FROM GROUP <groupName> GRANT/REVOKE <privilegeSpec> TO/FROM <roleName> It does not include some of the fancier bulk-op syntax like support for granting multiple roles to multiple groups in one statement. This patch does not add support for the WITH GRANT OPTION to delegate GRANT/REVOKE privileges to other users. TODO: * Authorize these statements on the client side. The current Sentry Service design makes it difficult to authorize any GRANT/REVOKE statement on the client (Impala) side. Privilege checks are done within the Sentry Service itself. There are a few different options available to let Impala "fail fast" and those changes will come in a follow on patch. Change-Id: Ic6bd19f5939d3290255222dcc1a42ce95bd345e2	2014-09-13 21:21:10 -07:00
Matthew Jacobs	ea3b70d861	Add agg fns for remaining analytic ranking fns Adds agg fns for FIRST_VALUE, LAST_VALUE, LAG, LEAD. Also adds support for ROWS windows with the end bound as unbounded following as long as the start bound is unbounded preceding. Change-Id: I4856ae580164d17a1bbf7d45010b61f5afa5db50 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4249 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins	2014-09-13 00:19:21 -07:00
Alex Behm	de75278125	Add SHOW ANALYTIC FUNCTIONS and additional analysis checks. Change-Id: Ic1aac60fb9b094349b9cfbec68608ac50fc5660c Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4298 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-09-13 00:19:21 -07:00
Matthew Jacobs	b143c0574d	Fix a few bugs in the AnalyticEvalNode IMPALA-1233: Crash running query with analytic in WITH clause IMPALA-1232: Analytic eval node crashes if cancelling query before Open() Change-Id: I9a263775b8ef670d0f819ed53d0af1eb96edf5c7 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4313 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins	2014-09-13 00:19:20 -07:00
Victor Bittorf	8bebf2b196	CHAR: adding support for CHAR(N) Support for CHAR is implemented as a StringVal in the backend. TODO: 1. Parquet Reader/writer 2. Codegen slot ref 3. Codegen text reader 4. Don't inline large chars 5. update impala-hs2-server.cc with CHAR support Change-Id: Ibba2c89cea971cb740001ea7975bf3e929150471 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4075 Reviewed-by: Nong Li <nong@cloudera.com> Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins	2014-09-13 00:19:20 -07:00
Alex Behm	503201794c	Wrapping up planning of analytic functions. This patch adds support for: - analytic functions in inline views - analytic functions referencing inline views - analytic functions in unions - analytic functions in subqueries/joins - avoid generating plan for non-materialized analytic exprs - predicate assignment and propagation onto and through analytic eval nodes Change-Id: I195d32606af670f216b88e1145177fd1d66456eb Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4173 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-09-13 00:17:40 -07:00
Dimitris Tsirogiannis	e1e874a77f	IMPALA-1212 Accept subquery as LHS or RHS of between operator This commit fixes the issue where an error was thrown if a subquery was used in either side of a between predicate. Between predicates with subqueries are replaced by their corresponding compound predicates during query rewrite. Change-Id: I4315a6e91c9306c6817bf6aa6bc1d0b586a1a067 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4246 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins	2014-09-13 00:17:36 -07:00
Skye Wanderman-Milne	3b7449a59b	Codegen PartitionedHashJoinNode This also reverts back to using CRC hash since FNV is not codegen'd yet. The perf is not as good as the original HJ in a microbenchmark; I haven't run a cluster run yet. Change-Id: Ie4dc983f31631fbc78720425a0e354dd1d3342a6 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4219 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-09-13 00:17:33 -07:00
Matthew Jacobs	571223d3f5	Analytic tests without explicit final ordering should verify after sorting Change-Id: Ic4ebd15af0d027eaded62f181e74bf41de93310a Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4264 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins	2014-09-12 18:17:29 -07:00
Dimitris Tsirogiannis	b670d98e40	IMPALA-1195: IllegalStateException in query with agg scalar subquery This commit fixes IMPALA-1195 in which an exception is thrown when a scalar subquery is in an IS NULL predicate. With this commit we also add support for scalar subqueries in functions and other exprs. Change-Id: Id995e77e6561a6450c4347706e4901fb3e236cfe Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4185 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>	2014-09-12 18:17:28 -07:00
Ippokratis Pandis	ae69ff208b	IMPALA-1177: The AntiJoin code was not resetting correctly the hash table iterator The AntiJoin code path was not resetting the hash table iterator when it was finding a match for the current row. As a result the hash table iterator was pointing to the wrong row when EvalConjuncts was being called. Change-Id: I37bc457ccf999755f7f76ee30b24c5a12cb10a19 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4215 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: jenkins	2014-09-12 18:17:28 -07:00
Alex Behm	3c650c3ba5	IMPALA-1104: Fixes for compute stats on Avro tables. Create Avro tables without col defs. This patch addresses the following issues: 1. Allow creating Avro tables without col defs in Impala. Compute stats works on them. 2. Handle table creation with inconsistent col defs and Avro schema as follows: The table creation will succeed and ignore the col defs in favor of the Avro schema. A warning is issued that the col defs and the Avro schema are inconsistent. Compute stats works on such tables. This patch does not address the issue of compute stats after Avro schema evolution. Change-Id: Iea6b737d238d81491dc2097012ebc149a89d03ba Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4182 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com> Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4250 Tested-by: jenkins	2014-09-11 20:50:05 -07:00
Matthew Jacobs	4c7688332f	Add GetValue() fn for agg fns that can be run as analytic fns and alloc memory The AnalyticEvalNode needs to re-use intermediate state tuples so it cannot call Finalize() for agg fns that clean up intermediate state. Those fns need to have a GetValue() method which just returns the result. This adds a GetValue() method for avg() (all types) and min()/max() (only needed for strings). Change-Id: Iedd6b026a1a256d9577dbb4c37824ac9282319ca Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4199 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins (cherry picked from commit d3fe94e8dba1d7b3698db9849058dacf14657292) Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4237 Reviewed-by: Matthew Jacobs <mj@cloudera.com>	2014-09-11 09:55:21 -07:00
Nong Li	9ce5a7fd13	Disable text writer test. This fails for the same reason as the sequence writer. It passes locally but fails in zlib on the jenkins boxes. I suspect something is wrong with our gzip codec or the version of zlib installed on those machines (we've disabled this for parquet as well). Change-Id: I706186fbb6207fa694b4e61c7114e17c1ffe3482 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4221 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Nong Li <nong@cloudera.com> Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4260 Reviewed-by: Nong Li <nong@cloudera.com>	2014-09-11 08:47:23 -07:00
Matthew Jacobs	ad5a594597	Fix a few bugs in AnalyticEvalNode Fixes IMPALA-1200: resources for tuples may be returned in output batches too early (i.e. the tuple may still be needed for rows that will be returned later). We cannot just return resources after some number of tuples have been allocated as they may still be needed, so this adds a second mem pool for previously allocated tuples that can be transferred to output row batches. We keep track of the last row containing resources in that pool so we can be sure to only transfer the resources once that last row has been returned to the parent. Also addresses IMPALA-1206 (DCHECK failure) Change-Id: I34b823ffb8d54263ea76e071d10ccae1cef0db99 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4187 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com> (cherry picked from commit bc51ebaafea0ba5e1b97f4b3237ecfe241a9e674) Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4224 Tested-by: jenkins	2014-09-09 14:53:52 -07:00
Nong Li	d52a620737	Add support for writing compressed text. Change-Id: I314b925594801ae4b5c47248d998801aa0b37270 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4205 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-09-07 22:08:30 -07:00
Dimitris Tsirogiannis	2ab66c4ca2	Add support for uncorrelated EXISTS subqueries This commit adds support for uncorrelated EXISTS subqueries in Impala. Uncorrelated EXISTS subqueries are rewritten using a CROSS JOIN. Uncorrelated NOT EXISTS subqueries are not supported. Change-Id: I0003dcdc0fa5cc99931b9a9f4deddbcd42572490 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4140 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4186	2014-09-05 12:36:18 -07:00
Alex Behm	321b9b0804	IMPALA-1148: Do not generate a sort node if the sort tuple has no materialized slots. Change-Id: If9d55b54a8305798ab68470a4a698d95ef92ce7a Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4176 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4184	2014-09-05 10:12:55 -07:00
Matthew Jacobs	5bf1c1f223	Analytic Functions: Add rank() and dense_rank() Adds the rank() and dense_rank() analytic functions and makes internal changes to the AggFnEvaluator that are necessary to support calling Finalize() repeatedly (as the AnalyticEvalNode does) on UDAs that destroy state in Finalize(). Rank requires both the current rank and the count at that rank in order to determine the next rank, so the intermediate state is a StringVal containing a struct with these two fields. Aggregate functions (internally only, for now) can expose a GetValue() method which takes an intermediate value and returns a final value without destroying the intermediate state. Finalize() is then used to clean up intermediate state, if necessary. This also adds a second optional, internal-only function for UDAs to allow removing values from intermediate state: Remove(). This will be required for implementing sliding windows later but is added here because the change is nearly identical to that for adding GetValue(). Some cleanup in the AnalyticEvalNode, most notably we avoid allocating tuples to DeepCopy prev_input_row_ between input batches. Instead, we keep the last two child row batches because the prev child row batch owns the resources for prev_input_row_. Change-Id: I5a30eb517a38d369fe63f7af91904a4b9786fadc Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3962 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 137bb45d81ea57655aefbf5cde0cbeab0121b8f0) Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4183	2014-09-05 02:15:42 -07:00
Marcel Kornacker	b68b6dedc1	Planning and grouping of multiple analytic exprs from a select block. This patch adds support for: - Planning of multiple analytic exprs from a select block - Simple grouping of analytic exprs by partition/order/window to reduce data exchanges and sorts Change-Id: Ie2162558b2bc2e6218c30e694393e85cbf3251ff Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4120 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4168	2014-09-04 13:44:33 -07:00
Matthew Jacobs	c322e9590d	Analytic functions: Initial BE support for ROWS windows Adds support in the AnalyticEvalNode for ROWS windows with the start boundary UNBOUNDED PRECEDING, i.e. the end boundary can specify an offset or CURRENT ROW. To reduce complexity where we maintain windows and determine when output results can be produced (ProcessInputBatch), the logic that depends on the window is factored into several functions. The core functionality remains the same: for every input row, produce output results if possible, update the analytic functions, and add the row to the input_stream_ to be returned later when enough results are available. The functions TryFinalizePrevRow, TryFinalizeCurrentRow, and InitializeNewPartition are now called and handle the various window types appropriately. Change-Id: I36cf76bf11d9e8b48d2556169683abcb43c1db7a Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4073 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: jenkins (cherry picked from commit 421a032035fcb13e03f8e7d34b4908f1221fd9f5) Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4163 Reviewed-by: Matthew Jacobs <mj@cloudera.com>	2014-09-04 00:46:23 -07:00
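Two of the commits above (f21aed16fd and 3b5f1d3ab5, IMPALA-1215) rewrite NOT IN subqueries with a null-aware anti-join because SQL's three-valued logic makes NULLs on either side significant. The Python sketch below only illustrates the result a null-aware anti-join has to reproduce; it is not how Impala's partitioned hash join computes it. It assumes `None` models SQL NULL and plain lists model the probe and build inputs, and `not_in_filter` is a hypothetical helper.

```python
from typing import List, Optional


def not_in_filter(probe: List[Optional[int]],
                  build: List[Optional[int]]) -> List[Optional[int]]:
    """Hypothetical sketch: keep probe values where `value NOT IN (build)` is TRUE.

    Under three-valued logic the predicate is TRUE only when the build side
    is empty, or when the value is non-NULL, matches nothing, and the build
    side contains no NULL. Every other case is FALSE or UNKNOWN, so the row
    is filtered out.
    """
    if not build:
        # NOT IN over an empty set is TRUE, even for a NULL probe value.
        return list(probe)
    build_has_null = any(b is None for b in build)
    build_values = {b for b in build if b is not None}
    result = []
    for value in probe:
        if value is None:
            continue  # NULL NOT IN (non-empty set) is UNKNOWN
        if value in build_values:
            continue  # an equal value makes NOT IN FALSE
        if build_has_null:
            continue  # comparing with the NULL build value is UNKNOWN
        result.append(value)
    return result


print(not_in_filter([1, 2, None], [2, 3]))     # [1]
print(not_in_filter([1, 2, None], [2, None]))  # []
print(not_in_filter([1, None], []))            # [1, None]
```

The second call is the case the original rewrite got wrong: a single NULL on the build side means no probe row can satisfy NOT IN.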

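Commit 9ffb9069b6 explains why sum() over a sliding window needs more than the running total: once values are removed, a total of 0 could mean either a real zero sum or a window with no non-NULL values, and the latter must return NULL. A minimal Python sketch of that idea, assuming a per-evaluator count of non-NULL values stands in for the Update()/Remove() call tracking the commit describes in FunctionContextImpl. `SlidingSum` and `windowed_sums` are hypothetical names, and the window is assumed to be ROWS BETWEEN n PRECEDING AND CURRENT ROW.

```python
from collections import deque
from typing import Deque, List, Optional


class SlidingSum:
    """Hypothetical sum() evaluator that supports removing rows from a window.

    The running total alone cannot distinguish "sum is 0" from "no non-NULL
    values", so a count of non-NULL values is kept alongside it.
    """

    def __init__(self) -> None:
        self.total = 0
        self.non_null = 0

    def update(self, value: Optional[int]) -> None:
        if value is not None:
            self.total += value
            self.non_null += 1

    def remove(self, value: Optional[int]) -> None:
        if value is not None:
            self.total -= value
            self.non_null -= 1

    def get_value(self) -> Optional[int]:
        # NULL (None) when every remaining value in the window is NULL.
        return self.total if self.non_null > 0 else None


def windowed_sums(rows: List[Optional[int]], preceding: int) -> List[Optional[int]]:
    """SUM(x) OVER (ROWS BETWEEN `preceding` PRECEDING AND CURRENT ROW)."""
    evaluator = SlidingSum()
    window: Deque[Optional[int]] = deque()
    out: List[Optional[int]] = []
    for value in rows:
        evaluator.update(value)
        window.append(value)
        if len(window) > preceding + 1:
            evaluator.remove(window.popleft())  # oldest row slid out of the window
        out.append(evaluator.get_value())
    return out


print(windowed_sums([5, -5, None, None, 3], preceding=1))  # [5, 0, -5, None, 3]
```

The second output is a genuine 0 from 5 + (-5), while the fourth is None because both rows left in the window are NULL; the running total is 0 in both cases.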

397 Commits