impala

mirror of https://github.com/apache/impala.git synced 2026-01-07 09:02:19 -05:00

Author	SHA1	Message	Date
Thomas Tauber-Marshall	4b486b0f90	IMPALA-1861: Simplify conditionals with constant conditions When there are conditionals with constant values of TRUE or FALSE, we can simplify them during analysis using the ExprRewriter. This patch introduces the SimplifyConditionalsRule which covers IF, OR, AND, CASE, and DECODE. It also introduces NormalizeExprsRule which normalizes AND and OR such that if either child is a BoolLiteral, then the left child is a BoolLiteral. Testing: - Added unit tests to ExprRewriteRulesTest. - Added functional tests to expr.test - Ran FE planner tests and BE expr-test. Change-Id: Id70aaf9fd99f64bd98175b7e2dbba28f350e7d3b Reviewed-on: http://gerrit.cloudera.org:8080/5585 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2017-01-24 03:22:08 +00:00
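A minimal Python sketch of the two rewrites described in IMPALA-1861, limited to IF, AND and OR (CASE and DECODE are omitted). The expression classes below are hypothetical stand-ins, not Impala's frontend `Expr` hierarchy. `normalize` moves a BoolLiteral child to the left, and `simplify` then folds the constant condition.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical toy expression tree (not Impala's Expr classes).
@dataclass(frozen=True)
class BoolLiteral:
    value: bool


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class If:
    cond: "Expr"
    then: "Expr"
    otherwise: "Expr"


Expr = Union[BoolLiteral, Col, And, Or, If]


def normalize(e):
    """If exactly one child of AND/OR is a BoolLiteral, put it on the left."""
    if (isinstance(e, (And, Or)) and isinstance(e.right, BoolLiteral)
            and not isinstance(e.left, BoolLiteral)):
        return type(e)(e.right, e.left)
    return e


def simplify(e):
    """Fold IF/AND/OR nodes whose condition is a constant TRUE or FALSE."""
    if isinstance(e, If):
        cond = simplify(e.cond)
        if isinstance(cond, BoolLiteral):
            return simplify(e.then if cond.value else e.otherwise)
        return If(cond, simplify(e.then), simplify(e.otherwise))
    if isinstance(e, (And, Or)):
        e = normalize(type(e)(simplify(e.left), simplify(e.right)))
        if isinstance(e.left, BoolLiteral):
            # AND: TRUE AND x -> x, FALSE AND x -> FALSE
            # OR:  TRUE OR x -> TRUE, FALSE OR x -> x
            short_circuit = isinstance(e, Or)
            return e.left if e.left.value == short_circuit else e.right
        return e
    return e
```

These folds are also valid under SQL three-valued logic, because FALSE AND NULL is FALSE and TRUE OR NULL is TRUE.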
Dimitris Tsirogiannis	3426a04952	IMPALA-4449: Revisit table locking pattern in the catalog This commit fixes an issue where multiple long-running operations on the same catalog object (e.g. table) can block other catalog operations from making progress. Problem: IMPALA-1480 introduced table level locking that in conjunction with the global catalog lock ensures serialized access to catalog table objects. In some cases (e.g. multiple long running operations on same table), the locking pattern used resulted in the catalog lock being held for a long period of time, thus blocking other catalog operations from making any progress. That resulted in high response times and the system appearing to be hung. Solution: Change the locking pattern in the catalog for protecting table objects so that no operation will hold the catalog lock for a long time if it fails to acquire a table lock. The operation that attempts to acquire a table lock and fails to do so must release the catalog lock and retry. The use of fair locks prevents starvation. The only operation that doesn't follow this retry logic is the getCatalogObjects() call that retrieves a snapshot of the catalog metadata for transmitting to the statestore. Testing: I manually tested this change by running concurrency tests using JMeter and verified that the throughput of catalog operations on a specific table is not affected by other concurrent long running operations (e.g. refresh) on a different table. Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2 Reviewed-on: http://gerrit.cloudera.org:8080/5710 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-20 21:13:37 +00:00
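The retry pattern in IMPALA-4449 can be sketched as follows. The rule is never to wait on a table lock while holding the catalog lock. This is a toy under stated assumptions: `Catalog`, `lock_table`, `retry_interval` and `max_attempts` are hypothetical names. Python's `threading.Lock` has no fairness option, so the sketch leaves out the fair-lock part that the commit relies on to prevent starvation.

```python
import threading
import time


class Catalog:
    """Hypothetical toy catalog; not Impala's CatalogServiceCatalog."""

    def __init__(self):
        self.catalog_lock = threading.Lock()
        self.table_locks = {}  # table name -> threading.Lock

    def add_table(self, name):
        with self.catalog_lock:
            self.table_locks[name] = threading.Lock()

    def lock_table(self, name, retry_interval=0.01, max_attempts=1000):
        """Take the catalog lock, then *try* the table lock without blocking.

        On failure, the catalog lock is released before retrying, so a
        long-running operation on one table cannot stall the whole catalog.
        """
        for _ in range(max_attempts):
            self.catalog_lock.acquire()
            table_lock = self.table_locks[name]
            if table_lock.acquire(blocking=False):
                # This sketch drops the catalog lock as soon as the table lock is held.
                self.catalog_lock.release()
                return table_lock
            self.catalog_lock.release()
            time.sleep(retry_interval)
        raise TimeoutError(f"could not lock table {name}")
```

While one operation holds the lock on `t1`, an operation on `t2` can still acquire its own table lock, and the catalog lock stays free between retries.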
Alex Behm	7438730052	IMPALA-4767: Workaround for HIVE-15653 to preserve table stats. HIVE-15653 is a Hive Metastore bug that results in ALTER TABLE commands wiping the table stats of unpartitioned tables. Until the Hive bug is fixed, this patch adds a workaround to Impala that forces the Metastore to preserve the table stats. Testing: Private core/hdfs run passed. Change-Id: Ic191c765f73624bc716badadd7215c8dca9d6b1f Reviewed-on: http://gerrit.cloudera.org:8080/5731 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-20 01:18:10 +00:00
Jim Apple	6cf3efdfec	IMPALA-4788: Use HashSet in RECOVER PARTITIONS duplicate checks RECOVER PARTITIONS needs to avoid recovering partitions that are already in HMS. Before this patch, that check was done by making a list of the existing partitions and searching that list for each path found while scanning for partitions eligible for recovery. This patch changes the container to a HashSet for performance reasons. Change-Id: I4b9b6f8eb85f854e8c0896c18a231cebe32b4678 Reviewed-on: http://gerrit.cloudera.org:8080/5745 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Jim Apple <jbapple-impala@apache.org>	2017-01-19 22:57:00 +00:00
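The container change in IMPALA-4788 is the classic list-to-hash-set swap. In the sketch below, `partitions_to_recover` and the `year=...` partition strings are hypothetical illustrations. A list membership test scans every element, so checking m discovered paths against n existing partitions costs O(n*m). A hash set makes each lookup O(1) on average, so the total cost is O(n + m).

```python
def partitions_to_recover(existing_partitions, discovered_paths):
    """Return discovered partitions that the metastore does not know about yet.

    Hypothetical helper: building a set once makes each membership check
    O(1) on average instead of a linear scan of the existing list.
    """
    existing = set(existing_partitions)
    return [p for p in discovered_paths if p not in existing]
```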
Alex Behm	3b7a179197	IMPALA-4768: Improve logging of table loading. - Improves the logging for several important events, in particular, during table loading. - Uses LOG.info() for such messages to clarify their intent. The goal is to improve supportability without having to turn on trace debugging which can generate a significant log volume. Change-Id: I8de96d0cb6d09b2272b1925d42cb059367fe7196 Reviewed-on: http://gerrit.cloudera.org:8080/5709 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-17 02:25:11 +00:00
Alex Behm	fa4a054cde	IMPALA-4765: Avoid using several loading threads on one table. When there are multiple concurrent requests to the catalogd to prioritize loading the same table, then several catalog loading threads may end up waiting for that single table to be loaded, effectively reducing the number of catalog loading threads. In extreme examples, this might degrade to serial loading of tables. This patch augments the existing data structures and code to prevent using several loading threads for the same table. Some of the existing data structures and code could be consolidated/simplified but this patch does not try to address that issue to minimize the risk of this change. Testing: I could easily reproduce the bug locally with the steps described in the JIRA. After this patch, I could not observe threads being wasted anymore. Change-Id: Idba5f1808e0b9cbbcf46245834d8ad38d01231cb Reviewed-on: http://gerrit.cloudera.org:8080/5707 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-15 08:38:13 +00:00
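One way to keep several loader threads from piling up on one table is to track which tables are already queued or loading and to ignore duplicate prioritization requests. The sketch below is a minimal version of that idea. `TableLoadQueue` and its methods are hypothetical; the commit does not describe Impala's actual data structures in this detail, and the sketch leaves out how callers wait for the result of a load that is already in flight.

```python
import threading


class TableLoadQueue:
    """Hypothetical sketch: each table is queued/loaded at most once at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._queue = []                  # tables waiting for a loader thread
        self._queued_or_loading = set()   # tables already claimed by the queue

    def prioritize(self, table):
        """Request a load; returns False if the table is already queued or loading."""
        with self._lock:
            if table in self._queued_or_loading:
                return False
            self._queued_or_loading.add(table)
            self._queue.append(table)
            return True

    def next_table(self):
        """Called by a loader thread; returns None when there is nothing to load."""
        with self._lock:
            return self._queue.pop(0) if self._queue else None

    def done(self, table):
        """Called by the loader thread once the table has finished loading."""
        with self._lock:
            self._queued_or_loading.discard(table)
```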
Thomas Tauber-Marshall	89a3d3c1eb	IMPALA-4716: Expr rewrite causes IllegalStateException The DECODE constructor in CaseExpr uses the same decodeExpr object when building the BinaryPredicates that compare the decodeExpr to each 'when' of the DECODE. This causes problems when different BinaryPredicates try to cast the same decodeExpr object to different types during analysis, in this case leading to a Precondition check failure. The solution is to clone the decodeExpr in the DECODE constructor in CaseExpr for each generated BinaryPredicate. Testing: - Added a regression test to exprs.test Change-Id: I4de9ed7118c8d18ec3f02ff74c9cca211c716e51 Reviewed-on: http://gerrit.cloudera.org:8080/5631 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-13 04:44:07 +00:00
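The IMPALA-4716 bug is a shared-mutable-node problem. If every generated predicate references the same expression object, an in-place cast applied while analyzing one predicate also changes the others. Below is a minimal sketch with hypothetical `Expr`/`Eq` classes and a `cast_to` method that mutates the node. It contrasts sharing with cloning per predicate, which is the shape of the fix.

```python
import copy
from dataclasses import dataclass


@dataclass
class Expr:
    """Hypothetical mutable expression node; analysis may cast it in place."""
    text: str
    type: str = "UNKNOWN"

    def cast_to(self, target):
        self.type = target  # mutates this node


@dataclass
class Eq:
    left: Expr
    right: Expr


def decode_to_predicates(decode_expr, when_values, clone=True):
    """Build one 'decode_expr = when' predicate per WHEN value.

    With clone=True each predicate gets its own deep copy of decode_expr,
    so casting one predicate's operand cannot affect another's.
    """
    return [Eq(copy.deepcopy(decode_expr) if clone else decode_expr, Expr(w))
            for w in when_values]
```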
Joe McDonnell	5755261954	IMPALA-4036: invalid SQL generated for partitioned table with comment For a table that has both a table comment and a partition specified, "show create table" incorrectly outputs the comment before the partition. This is not the correct order, and it results in invalid SQL. This commit fixes the ordering (partition comes before comment) and adds tests for this case. Change-Id: I29a33cfd142b473997fdc3acfe3f0966bc7ed784 Reviewed-on: http://gerrit.cloudera.org:8080/5648 Tested-by: Impala Public Jenkins Reviewed-by: Henry Robinson <henry@cloudera.com>	2017-01-12 20:41:35 +00:00
Joe McDonnell	5d028d93b9	IMPALA-4341: Add metadata load to planner timeline This moves the timeline from the Analyzer GlobalState to the AnalysisContext and AnalysisContext.AnalysisResult. When analysis needs to load metadata about missing tables, it marks an event noting the start of metadata load. Then, when metadata load completes (or times out), it marks an event noting that metadata load completed (or timed out). Keeping the timeline on the AnalysisContext means that it persists across attempts at analysis. AnalysisContext.AnalysisResult has a reference to the timeline, so that it persists past analyzeStmt and can be used for the rest of the planning. Here is an example output of the planner timeline after this change: Planner Timeline: 4s371ms - Metadata load started: 41.388ms (41.388ms) - Metadata load finished: 4s260ms (4s219ms) - Analysis finished: 4s296ms (35.693ms) - Equivalence classes computed: 4s315ms (19.062ms) - Single node plan created: 4s323ms (7.812ms) - Runtime filters computed: 4s323ms (777.010us) - Distributed plan created: 4s325ms (1.464ms) - Planning finished: 4s371ms (46.697ms) When there is no need to load metadata, the timeline looks like: Planner Timeline: 13.695ms - Analysis finished: 2.411ms (2.411ms) - Equivalence classes computed: 2.653ms (241.733us) - Single node plan created: 5.641ms (2.987ms) - Runtime filters computed: 5.726ms (85.204us) - Distributed plan created: 6.548ms (821.722us) - Planning finished: 13.695ms (7.147ms) Change-Id: I6f01a35e5f9f5007a0298acfc8e16da00ef99c6c Reviewed-on: http://gerrit.cloudera.org:8080/5685 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-12 03:54:15 +00:00
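Each line of the planner timeline above shows two numbers: the elapsed time since the timeline started and, in parentheses, the delta since the previous event. Below is a minimal sketch of such a timeline with an injectable clock. `Timeline`, `mark` and `render` are hypothetical names, and the output is simplified to plain milliseconds rather than the `4s260ms` style shown above. Because one object records every event, it can be kept on a longer-lived context and survive repeated analysis attempts, which is the point of moving it to AnalysisContext.

```python
class Timeline:
    """Hypothetical event timeline: total elapsed time plus delta per event."""

    def __init__(self, clock):
        self._clock = clock          # callable returning the current time in ms
        self._start = clock()
        self._events = []            # (label, total_ms, delta_ms)

    def mark(self, label):
        now = self._clock() - self._start
        prev = self._events[-1][1] if self._events else 0.0
        self._events.append((label, now, now - prev))

    def render(self):
        return [f"- {label}: {total:.3f}ms ({delta:.3f}ms)"
                for label, total, delta in self._events]
```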
Marcel Kornacker	70ae2e38eb	IMPALA-4739: ExprRewriter fails on HAVING clauses The bug was that expr rewrite rules such as ExtractCommonConjunctRule analyzed their own output, which doesn't work for syntactic elements that allow column aliases, such as the HAVING clause. The fix was to remove the analysis step (the re-analysis happens anyway in AnalysisCtx). Change-Id: Ife74c61f549f620c42f74928f6474e8a5a7b7f00 Reviewed-on: http://gerrit.cloudera.org:8080/5662 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-12 02:31:44 +00:00
Tim Armstrong	11dc6bbb3a	IMPALA-4676: remove the last reference to BlockStorageLocation Change-Id: I3853b806da1cce309c5d7d124adb96aa131eaec2 Reviewed-on: http://gerrit.cloudera.org:8080/5633 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-07 03:21:36 +00:00
Henry Robinson	66260bda9e	Fix Lists import in ExprRewriter.java Change-Id: Ia8f27180b334c509d627bc2f93fc5030a3cf37eb Reviewed-on: http://gerrit.cloudera.org:8080/5606 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-05 21:51:59 +00:00
Tim Armstrong	4ce5213d16	IMPALA-4676: remove vestigial references to getBlockStorageLocations() API The BlockStorageLocation import is unused. Remove validation of config keys that only affect the BlockStorageLocation API. See HDFS-10868 and HDFS-8895. We do not need to validate these keys any more since we don't use that API. These config keys are removed in Hadoop 3 so this patch is required to build against it. Change-Id: Ic12337a9f5b7d910282aaf7d8508a4176cf89cbc Reviewed-on: http://gerrit.cloudera.org:8080/5526 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-05 19:40:08 +00:00
Alex Behm	6d15f03777	IMPALA-3641: Fix catalogd RPC responses to DROP IF EXISTS. The main problem was that the catalogd's response to a DROP IF EXISTS operation included a removed object that was applied to the requesting impalad's catalog cache. In particular, a DROP DATABASE IF EXISTS that did not actually drop anything in the catalogd still returned the object name in the RPC response as a removed object with the current catalog version (i.e., without incrementing the catalog version). The above behavior led to a situation where a drop of a non-existent object overwrote a legitimate entry in an impalad's CatalogDeltaLog. Recall that the version of the dropped object was based on the current catalog version at some point in time, e.g., the same version of a legitimate entry in the CatalogDeltaLog. As a reminder, the CatalogDeltaLog protects deletions from being undone via updates from the statestore. So overwriting an object in the CatalogDeltaLog can lead to a dropped object appearing again with certain timing of a statestore update. Please see the JIRA for an analysis of logging output that shows the bug and its effect. The fix is simple: The RPC response of a DROP IF EXISTS should only contain a removed object if an object was actually removed from the catalogd. This fix, however, introduces a new consistency issue (IMPALA-4727). The new behavior is not ideal, but better than the old behavior, explained as follows: The behavior before this patch is problematic because the drop of a completely unrelated object can affect the consistency of a drop+add on another object. The behavior after this patch is that a drop+add may fail in the add if there is an ill-timed concurrent drop of the same object. Testing: - Unfortunately, I have not been able to reproduce the issue locally despite vigorous attempts and despite knowing what the problem is. Our existing tests seem to reproduce the issue pretty reliably, so it's not clear whether a targeted test is feasible or needed. - An exhaustive test run passed. Change-Id: Icb1f31eb2ecf05b9b51ef4e12e6bb78f44d0cf84 Reviewed-on: http://gerrit.cloudera.org:8080/5556 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-05 04:15:30 +00:00
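The IMPALA-3641 fix comes down to this: a DROP IF EXISTS response carries a removed object only when something was actually removed, and in that case it uses a freshly incremented version. The toy `Catalog` below is hypothetical and only illustrates that rule. It does not model Impala's catalog version protocol.

```python
class Catalog:
    """Hypothetical toy catalog with a global version counter."""

    def __init__(self):
        self.version = 0
        self.tables = {}  # table name -> version at which it was added

    def create(self, name):
        self.version += 1
        self.tables[name] = self.version

    def drop_if_exists(self, name):
        """Return the removed (name, version) objects to put in the RPC response."""
        if name not in self.tables:
            # Nothing was dropped: report nothing rather than echoing the
            # current version, which could collide with a legitimate entry.
            return []
        del self.tables[name]
        self.version += 1
        return [(name, self.version)]
```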
Amos Bird	b3636c97d4	IMPALA-4033: Treat string-partition key values as case sensitive. This commit makes ADD PARTITION operations treat string partition-key values as case sensitive, consistently with other related partition DDL operations. Change-Id: I6fbe67d99df8a50a16a18456fde85d03d622c7a1 Reviewed-on: http://gerrit.cloudera.org:8080/5535 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-12-22 10:45:39 +00:00
Lars Volker	ce9b332ee9	IMPALA-4163: Add sortby() query hint This change introduces the sortby() query plan hint for insert statements. When specified, sortby(a, b) will add an additional sort step to the plan to order data by columns a, b before inserting it into the target table. Change-Id: I37a3ffab99aaa5d5a4fd1ac674b3e8b394a3c4c0 Reviewed-on: http://gerrit.cloudera.org:8080/5051 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2016-12-17 05:37:43 +00:00
Matthew Jacobs	c2faf4a8a1	IMPALA-4662: Fix NULL literal handling in Kudu IN list predicates The KuduScanNode attempts to push IN list predicates to the Kudu scan, but NULL literals cannot be pushed. The code in KuduScanNode needed to check whether any of the literals in the InPredicate is a NullLiteral, in which case the entire IN list should not be pushed to Kudu. The same handling is already in place for binary predicate pushdown. Change-Id: Iaf2c10a326373ad80aef51a85cec64071daefa7b Reviewed-on: http://gerrit.cloudera.org:8080/5505 Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-12-15 23:00:24 +00:00
Sailesh Mukil	ffbdeda946	IMPALA-4611: Checking perms on S3 files is a very expensive no-op We call getPermissions() on partition directories to find out if Impala has access to those files. On S3, this currently is a no-op as the S3A connector does not try to set/get the permissions for S3 objects. So, it always returns the default set of permissions -> 777. However, it still makes a roundtrip to S3, causing a slowdown in the Catalog. We can return the READ_WRITE permission immediately if we know we are accessing an S3 file, thereby avoiding the round trip to S3 for every partition. This will greatly speed up metadata operations for S3 tables and partitions, which are already known to be a big bottleneck. If and when the S3A connector is able to manage permissions in the future, we need to revisit this code. However, as permissions on S3 are unsupported by Impala right now, we might as well gain on perf. Change-Id: If9d1072c185a6162727019cdf1cb34d7f3f1c75c Reviewed-on: http://gerrit.cloudera.org:8080/5449 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Impala Public Jenkins	2016-12-13 01:56:28 +00:00
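The IMPALA-4611 short-circuit can be sketched as a scheme check placed in front of the expensive lookup. `get_permissions`, `fetch_permissions` and the `"READ_WRITE"` string are hypothetical stand-ins for the catalog code and Hadoop FileSystem calls. The check assumes S3 paths use the `s3a://` scheme, as with the S3A connector named in the commit.

```python
from urllib.parse import urlparse

READ_WRITE = "READ_WRITE"  # hypothetical stand-in for the permission constant


def get_permissions(path, fetch_permissions):
    """Skip the remote lookup for S3A paths, which always report 777 anyway.

    fetch_permissions is a hypothetical callable that performs the expensive
    round trip (e.g. a FileSystem status call) for non-S3 paths.
    """
    if urlparse(path).scheme == "s3a":
        return READ_WRITE
    return fetch_permissions(path)
```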
Marcel Kornacker	4b2d76dbb5	IMPALA-4014: Introduce query-wide execution state. This introduces a global structure to coordinate execution of fragment instances on a backend for a single query. New classes: - QueryExecMgr: subsumes FragmentMgr - QueryState - FragmentInstanceState: replaces FragmentExecState Change-Id: I962ae6b7cb7dc0d07fbb8f70317aeb01d88d400b Reviewed-on: http://gerrit.cloudera.org:8080/4418 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2016-12-11 02:29:28 +00:00
Lars Volker	1e683d4ee6	IMPALA-4403: Implement SHOW RANGE PARTITIONS for Kudu tables Change-Id: Idf5b2fdd02938a42fa59ec98884e4ac915dd1f65 Reviewed-on: http://gerrit.cloudera.org:8080/5390 Reviewed-by: Lars Volker <lv@cloudera.com> Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-12-10 00:05:50 +00:00
aphadke	02fc53d5f0	IMPALA-2057: Better error message for incorrect avro decimal column declaration Adding a better error message when logical type is specified at a wrong level or is not specified in an avro decimal column declaration. Change-Id: Iad23706128223b6537d565471ef5d8faa91b0b5a Reviewed-on: http://gerrit.cloudera.org:8080/5255 Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com> Tested-by: Internal Jenkins	2016-12-09 23:14:24 +00:00
Alex Behm	80f85179f9	IMPALA-3126: Conservative assignment of inner-join On-clause predicates. Implements the following conservative but correct policy for assigning predicates from the On-clause of an inner join: If the predicate references an outer-joined tuple, then evaluate it at the inner join that the On-clause belongs to. Cleans up Analyzer.canEvalPredicate(). Change-Id: Idf45323ed9102ffb45c9d94a130ea3692286f215 Reviewed-on: http://gerrit.cloudera.org:8080/4982 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-12-09 02:12:46 +00:00
Tim Armstrong	88448d1d4a	IMPALA-4586: don't constant fold in backend This patch ensures that setting the query option enable_expr_rewrites=false will disable both constant folding in the frontend (which it did already) and constant caching in the backend (which is enabled in this patch). This gives a way for users to revert to the old behaviour of non-deterministic UDFs before these optimisations were added in Impala 2.8. Before this patch, the backend would cache values based on IsConstant(). This meant that there was no way to override caching of values of non-deterministic UDFs, e.g. with enable_expr_rewrites. After this patch, we only cache literal values in the backend. This offers the same performance as before in the common case where the frontend will constant fold the expressions anyway. Also rename some functions to more cleanly separate the backend concepts of "constant" expressions and expressions that can be evaluated without a TupleRow. In a future change (IMPALA-4617) we should remove the IsConstant() analysis logic from the backend entirely and pass the information from the frontend. We should also fix isConstant() in the frontend so that it only returns true when it is safe to constant-fold the expression (IMPALA-4606). Once that is done, we could revert back to using IsConstant() instead of IsLiteral(). Testing: Added targeted test to test constant folding of UDFs: we expect different results depending on whether constant folding is enabled. Also run TestUdfs with expr rewrites enabled and disabled, since this can exercise different code paths. Refactored test_udfs somewhat to avoid running uninteresting combinations of query options for targeted tests and removed some 'drop * if not exists' statements that aren't necessary when using unique_database. This change revealed flakiness in test_mem_limit, which seems to have only worked by coincidence. Updated TrackAllocation() to actually set the query status when a memory limit is exceeded. Looped this test for a while to make sure it isn't flaky any more. Also fix other test bugs where the vector argument is modified in-place, which can leak out to other tests. Change-Id: I0c76e3c8a8d92749256c312080ecd7aac5d99ce7 Reviewed-on: http://gerrit.cloudera.org:8080/5391 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2016-12-08 04:53:53 +00:00
Dimitris Tsirogiannis	5ea1798661	IMPALA-4619: Allow NULL as default value in Kudu tables This commit fixes an issue where an error is thrown if the default value for a Kudu column is set to NULL. Change-Id: Ida27ce56f1dd7603485a69c680db3bcea6702aff Reviewed-on: http://gerrit.cloudera.org:8080/5405 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2016-12-08 04:53:38 +00:00
Alex Behm	eaa14f2750	IMPALA-4614: Set eval cost of timestamp literals. The main issue was that the eval cost was not set for timestamp literals, so a preconditions check was hit when trying to order a list of conjuncts by cost. Another subtle issue made the bug only reproducible by a specific query against a Kudu table in our tests, although the bug is not Kudu specific: The eval cost of Exprs was not recomputed in analyze(), even after resetting an Expr, e.g., during a substitution. As a result, the bug was only reproducible for a list of conjuncts that contained an inferred predicate with a timestamp literal. This patch does not contain a fix for that issue due to its complexity/risk. It is tracked in IMPALA-4620. Testing: Ran planner tests locally. Ran query_test.py locally. A private core/hdfs run passed. Change-Id: Ife30420bafbd1c64a5e3385e5755909110b4b354 Reviewed-on: http://gerrit.cloudera.org:8080/5404 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2016-12-08 04:31:12 +00:00
Bharath Vissapragada	bb63339377	IMPALA-3314: Fix Avro schema loading for partitioned tables. Bug: Commit 6f31c7 fixed a crash when setting Avro schemas for tables with storage altered to Avro file format. However the fix was incomplete for partitioned/multi file format tables since 'hasAvroData_' is not set for all code paths that load the partitioned tables (For example: HdfsTable#loadAllPartitions()). Fix: Moved the code for setting 'hasAvroData_' to addPartition() which is the common logic for all code paths adding new partitions. Also fixed the test coverage gap by adding a new test for partitioned tables altered to Avro format. Change-Id: I7854ff002b2277ec4a5388216218a1d5ad142de8 Reviewed-on: http://gerrit.cloudera.org:8080/5388 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-12-07 09:45:11 +00:00
Dan Burkert	f83652c1da	Replace INTO N BUCKETS with PARTITIONS N in CREATE TABLE This commit also removes the `DISTRIBUTE`, `SPLIT`, and `BUCKETS` keywords that were going to be newly released in Impala 2.6, but are now unused. Additionally, a few remaining uses of the `DISTRIBUTE BY` syntax have been switched to `PARTITION BY`. Change-Id: I32fdd5ef26c532f7a30220db52bdfbf228165922 Reviewed-on: http://gerrit.cloudera.org:8080/5382 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-12-07 07:31:16 +00:00
Alex Behm	6098ac7162	IMPALA-4592: Improve error msg for non-deterministic predicates. Impala cannot correctly evaluate or assign some non-deterministic predicates. This patch improves the error message shown when trying to evaluate such unsupported predicates for the purpose of partition pruning. Change-Id: I94765f62bde94f4faa7fc5c26d928099ca1496d1 Reviewed-on: http://gerrit.cloudera.org:8080/5386 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-12-07 06:27:51 +00:00
Alex Behm	534999382d	IMPALA-4574: Do not treat UUID() like a constant expr. A recent change (IMPALA-1788) led UUID() to be constant folded, and therefore, produce the same value for every invocation across rows. Similar issues might also occur due to the BE optimizing UUID() during codegen of scalar-fn-call.h/cc. The fix is to not treat UUID() like a constant expr in both the FE and BE. Discussion: The fix in this patch is rather blunt, but minimally invasive to reduce the risk of adding new bugs. Ideally, the constness of an Expr should be determined in one place and the FE and BE should agree on which Exprs are constant. I considered the following alternatives but concluded they were too risky: 1. Pass a flag from FE to BE for every Expr indicating its constness. This simple solution would populate a thrift field with the result of Expr.isConstant() for every Expr in an Expr tree. There are several issues. Calling isConstant() for every Expr in an Expr tree is rather expensive due to repeated traversals of the tree. That could be mitigated by populating an isConstant flag during Expr.analyze() to avoid re-computing the constness repeatedly. This requires changes to analyze(), clone(), reset(), and possibly other places for many Exprs. There is potential for missing a place and adding a new bug. 2. The above solution could be limited to only FunctionCallExpr. However, the BE expr type FUNCTION_CALL which maps to scalar-fn-call.h/cc is created from various FE Exprs, not just FunctionCallExpr. So adding a flag only to scalar-fn-call.h/cc would be confusing because it would only sometimes be set in a meaningful way. This seems more confusing than the current straightforward solution. Testing: Added FE and EE tests. Change-Id: If2499f5f6ecdcb098623202c8e6dc2d02727194a Reviewed-on: http://gerrit.cloudera.org:8080/5324 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-12-06 22:03:01 +00:00
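A constness check like the one discussed in IMPALA-4574 can be sketched as a recursive walk. Literals are constant, column references are not, and a function call is constant only if it is deterministic and all of its arguments are constant. The classes and the `NON_DETERMINISTIC` set are hypothetical. The set lists only `uuid` because that is the function the commit names; other non-deterministic functions would need the same treatment.

```python
from dataclasses import dataclass

# Hypothetical: only UUID() is named by the commit; others would be added similarly.
NON_DETERMINISTIC = {"uuid"}


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class SlotRef:
    name: str


@dataclass(frozen=True)
class Call:
    fn: str
    args: tuple = ()


def is_constant(e):
    """True only if e can safely be constant-folded to a single value."""
    if isinstance(e, Literal):
        return True
    if isinstance(e, SlotRef):
        return False
    return (e.fn.lower() not in NON_DETERMINISTIC
            and all(is_constant(a) for a in e.args))
```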
Dimitris Tsirogiannis	cba93f1ac3	IMPALA-4561: Replace DISTRIBUTE BY with PARTITION BY in CREATE TABLE Change-Id: I0e07c41eabb4c8cb95754cf04293cbd9e03d6ab2 Reviewed-on: http://gerrit.cloudera.org:8080/5317 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2016-12-06 10:41:53 +00:00
Alex Behm	f837754377	IMPALA-3167: Fix assignment of WHERE conjunct through grouping agg + OJ. Background: We generally allow the assignment of predicates below the nullable side of a left/right outer join, explained as follows using an example: SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id WHERE t2.int_col < 10 The scan of 't2' picks up 't2.int_col < 10' via Analyzer.getBoundPredicates() and recognizes that the predicate must also be evaluated by a join later, so the predicate is not marked as assigned. The join then picks up the unassigned predicate via Analyzer.getUnassignedConjuncts(). The bug was that our logic for detecting whether a bound predicate must also be evaluated at a join node was flawed because it only considered whether the tuples of the source or destination predicate were outer joined (plus other conditions). The underlying assumption is that either the source or destination tuple are bound by a tuple produced by a TableRef, but in the buggy query the source predicate is bound by an aggregation tuple, so we incorrectly marked the bound predicate as assigned in Analyzer.getBoundPredicates(). The fix is to conservatively not mark bound predicates as assigned if the slots referenced by the predicate have equivalent slots that belong to an outer-joined tuple. As a result, a plan node may pick up the same predicate multiple times, once via Analyzer.getBoundPredicates() and another time via Analyzer.getUnassignedConjuncts(). Those are deduped now. The following example explains the duplicate predicate assignment: SELECT * FROM (SELECT * FROM t t1) a LEFT OUTER JOIN t b ON a.id = b.id WHERE a.id < 10 1. The predicate 'a.id < 10' gets migrated into the inline view. 'a.id < 10' is marked as assigned but is still registered as a single-tid conjunct in the Analyzer for potential propagation. 2. The scan node of 't1' calls Analyzer.getBoundPredicates() and generates 't1.id < 10' based on the source predicate 'a.id < 10'. 3. The scan node of 't1' picks up the migrated conjunct 't1.id < 10' via Analyzer.getUnassignedConjuncts(). Change-Id: I774d13a13ad1e8fe82512df98dc29983bdd232eb Reviewed-on: http://gerrit.cloudera.org:8080/4960 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-12-06 07:24:01 +00:00
Alex Behm	b656f570e9	IMPALA-4578: Pick up bound predicates for Kudu scan nodes. The bug was a simple oversight. In KuduScanNode.init() we forgot to call Analyzer.getBoundPredicates(). Change-Id: I19a38d6ea8cc0d2b0ddc3808d1f9ffef5ce306a8 Reviewed-on: http://gerrit.cloudera.org:8080/5365 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-12-06 06:19:03 +00:00
Dimitris Tsirogiannis	5bb9959fa4	IMPALA-4584: Make alter table operations on Kudu tables synchronous This commit changes the behavior of alter table operations on Kudu tables from asynchronous to synchronous. With this change, alter table operations return when either the operations complete successfully or a timeout is reached. Change-Id: I385bce66691ae9040e72f97557e1bba31009e36b Reviewed-on: http://gerrit.cloudera.org:8080/5364 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2016-12-06 03:53:15 +00:00
Matthew Jacobs	9f387c8583	IMPALA-4571: Push IN predicates to Kudu Fixes the KuduScanNode to convert InPredicates to KuduPredicates and push them to the Kudu scan if possible. An InPredicate can be pushed to the scan if the expression is of the exact form: <SlotRef> IN (<LiteralExpr>, <LiteralExpr>, ...) That means the InPredicate has the following properties: 1) It has a list of literal values (i.e. not a subquery); All values are LiteralExprs (not SlotRefs). 2) Not negative, i.e. only 'IN' supported, not 'NOT IN' 3) The SlotRef is not wrapped in any casts 4) The types of all values match the type of the SlotRef exactly. A planner test was added exercising all supported types as well as exprs where the values would not be supported. TODO: perf testing TODO: consider a limit on the number of list values before keeping the predicate on the Impala scan node (determine from testing) Change-Id: I8988d4819d20d467b48e286917e347ca00f60cf0 Reviewed-on: http://gerrit.cloudera.org:8080/5316 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-12-06 03:24:39 +00:00
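The four pushdown conditions listed in IMPALA-4571 translate directly into an eligibility check. The sketch also rejects NULL literals, following the later IMPALA-4662 row in this log. All classes are hypothetical toys, not Impala's `InPredicate`/`SlotRef`/`LiteralExpr`. Here a subquery is represented by any `values` that is not a tuple, and a NULL literal by `Literal(None, ...)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotRef:
    name: str
    type: str


@dataclass(frozen=True)
class Literal:
    value: object   # None models a NULL literal
    type: str


@dataclass(frozen=True)
class Cast:
    child: object
    type: str


@dataclass(frozen=True)
class InPredicate:
    expr: object
    values: object  # a tuple of exprs; anything else models a subquery
    negated: bool = False


def can_push_to_kudu(pred):
    """Check the pushdown conditions for '<SlotRef> IN (<literal>, ...)'."""
    if pred.negated:                          # 2) only IN, not NOT IN
        return False
    if not isinstance(pred.expr, SlotRef):    # 3) bare SlotRef, not wrapped in a cast
        return False
    if not isinstance(pred.values, tuple):    # 1) a literal list, not a subquery
        return False
    return all(isinstance(v, Literal)         # 1) every value is a literal
               and v.value is not None        # NULL literals cannot be pushed
               and v.type == pred.expr.type   # 4) exact type match
               for v in pred.values)
```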
Dimitris Tsirogiannis	867b2434ca	Additional functional testing for default values on Kudu tables This commit also fixes an issue where an error is thrown if a default value is set for a boolean column on a Kudu table. Change-Id: I25b66275d29d1cf21df14e78ab58f625a83b0725 Reviewed-on: http://gerrit.cloudera.org:8080/5337 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Impala Public Jenkins	2016-12-06 00:04:43 +00:00
Alex Behm	12cc508178	IMPALA-3125: Fix assignment of equality predicates from an outer-join On-clause. Impala used to incorrectly assign On-clause equality predicates from an outer join if those predicates referenced multiple tables, but only one side of the outer join. The fix is to add an additional check in Analyzer.getEqJoinConjuncts() to prevent that incorrect assignment. Change-Id: I719e0eeacccad070b1f9509d80aaf761b572add0 Reviewed-on: http://gerrit.cloudera.org:8080/4986 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-12-05 09:31:25 +00:00
Alex Behm	852e272b32	IMPALA-4303: Do not reset() qualifier of union operands. The bug: We used to reset() the qualifier of union operands to their original value obtained during parsing. This leads to problems when union operands are unnested and we need to rewrite Subqueries. In particular, the first union operand of a nested union was reset() to a null qualifier, but that operand could be somewhere in the middle of the list of unnested operands in the parent. At that point, we've lost information about the qualifier of the unnested operand. The fix: The simplest solution is to not reset() the qualifier. The other alternative would be to reset() the qualifier, but also undo any unnesting. That seems unnecessary and wasteful. Change-Id: I157bb0f08c4a94fd779487d7c23edd64a537a1f6 Reviewed-on: http://gerrit.cloudera.org:8080/4963 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-12-05 00:58:30 +00:00
Dimitris Tsirogiannis	1da57019ad	IMPALA-4579: SHOW CREATE VIEW fails for view containing a subquery This commit fixes an issue where a SHOW CREATE VIEW statement throws an analysis error if the view contains a subquery. Change-Id: I4a89e46a022f0ccec198b6e3e2b30230103831ce Reviewed-on: http://gerrit.cloudera.org:8080/5333 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2016-12-04 08:35:15 +00:00
Alex Behm	7efa08316e	IMPALA-4572: Run COMPUTE STATS on Parquet tables with MT_DOP=4. COMPUTE STATS on Parquet tables is run with MT_DOP=4 by default. COMPUTE STATS on non-Parquet tables will run without MT_DOP. Users can always override the behavior by setting MT_DOP manually. Setting MT_DOP to 0 means a statement will be run in the conventional execution mode (without intra-node parallelism based on multiple fragment instances). Users can set a higher MT_DOP even for Parquet tables. Testing: Added a new test that checks the effective MT_DOP. Locally ran test_mt_dop.py, test_scanners.py, test_nested_types.py, test_compute_stats.py, and test_cancellation.py. Change-Id: I2be3c7c9f3004e9a759224a2e5756eb6e4efa359 Reviewed-on: http://gerrit.cloudera.org:8080/5315 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-12-03 22:28:53 +00:00
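The defaulting rule above fits in a small decision function. The following hedged Java sketch assumes an unset MT_DOP query option is represented as `null`. `MtDopDefaults` and `effectiveMtDop` are hypothetical names, not Impala planner code. The sketch also does not model any restrictions the planner may place on MT_DOP for non-Parquet scans.

```java
/** Hypothetical sketch of how COMPUTE STATS could choose its MT_DOP. */
final class MtDopDefaults {
  /** Default MT_DOP for COMPUTE STATS on Parquet tables, per the commit message. */
  static final int PARQUET_COMPUTE_STATS_MT_DOP = 4;

  private MtDopDefaults() {}

  /**
   * @param isParquet whether the target table is stored as Parquet
   * @param userMtDop MT_DOP set explicitly by the user, or null if unset
   * @return the MT_DOP to use; 0 means conventional execution without
   *         intra-node parallelism based on multiple fragment instances
   */
  static int effectiveMtDop(boolean isParquet, Integer userMtDop) {
    if (userMtDop != null) return userMtDop;  // an explicit user setting wins
    return isParquet ? PARQUET_COMPUTE_STATS_MT_DOP : 0;
  }
}
```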
Bharath Vissapragada	6662c55364	IMPALA-4172/IMPALA-3653: Improvements to block metadata loading This patch improves block metadata loading (locations and disk storage IDs) for partitioned and unpartitioned tables in the Catalog server. Without this patch: We loop through every file in the table/partition directories and call getFileBlockLocations() on it to obtain the block metadata. This results in a large number of RPC calls to the NameNode, especially for tables with a large number of files/partitions. With this patch: We move the block metadata querying to the listStatus() call, which accepts a directory as input and fetches the 'BlockLocation' objects for every file in that directory recursively. This improves metadata loading in the following ways. - For non-partitioned tables, we query all the BlockLocations in the base table directory in a single RPC call and load the corresponding disk IDs. - For partitioned tables, we query the BlockLocations for all the partitions residing under the base table directory in a single RPC and then load every partition with a non-default partition directory separately. - REFRESH on a table reloads the block metadata from scratch for every data file every time, so it can be used instead of INVALIDATE METADATA in situations, such as HDFS block rebalancing, that require a block metadata update. Also, this patch does away with the VolumeIds returned by the HDFS NN and uses the new StorageIDs returned by the BlockLocation class. These StorageIDs are UUID strings and hence are mapped to a per-node 0-based index, as expected by the backend. In upcoming versions of the Hadoop APIs, getFileBlockStorageLocations() is deprecated and listStatus() instead returns BlockLocations with the storage IDs embedded. This patch makes use of this improvement to avoid an additional RPC to the NN to fetch the storage locations. Change-Id: Ie127658172e6e70dae441374530674a4ac9d5d26 Reviewed-on: http://gerrit.cloudera.org:8080/5148 Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com> Tested-by: Internal Jenkins	2016-12-03 21:17:46 +00:00
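The mapping of StorageID UUID strings to a per-node, 0-based index can be sketched in Java. This minimal sketch assumes indexes are assigned densely in first-seen order, separately for each host. `StorageIdIndex` and `indexOf` are hypothetical names, not Impala catalog classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: maps each storage ID (a UUID string) seen on a host
 *  to a dense 0-based index that is local to that host. */
final class StorageIdIndex {
  // host -> (storage ID UUID -> 0-based disk index on that host)
  private final Map<String, Map<String, Integer>> perHost = new HashMap<>();

  /** Returns the existing index for (host, storageId), or assigns the next one. */
  synchronized int indexOf(String host, String storageId) {
    Map<String, Integer> ids = perHost.computeIfAbsent(host, h -> new HashMap<>());
    Integer idx = ids.get(storageId);
    if (idx == null) {
      idx = ids.size();  // next unused index for this host
      ids.put(storageId, idx);
    }
    return idx;
  }
}
```

Keeping the index per host means two hosts can both have a disk with index 0. This matches the commit's description of a per-node index.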
Marcel Kornacker	694d72ea9b	Cleanup of logging output, part 2: downgrading 'debug' to 'trace' All 'debug' output still gets written into the info log. Downgrading to 'trace' to avoid that. Change-Id: If54f9d563be75571c7dc6d99ed13a6e86d9061a9 Reviewed-on: http://gerrit.cloudera.org:8080/5342 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2016-12-03 09:58:52 +00:00
Thomas Tauber-Marshall	7bcb51b152	IMPALA-4357: Fix DROP TABLE to pass analysis if the table fails to load If a table fails to load, e.g. because it was deleted externally from Kudu, we should still allow 'DROP TABLE' to pass analysis. Otherwise, you may be unable to drop tables that are in a bad state. Testing: - Updates existing Kudu tests to reflect the new behavior, and fixes a couple of problems with those tests that were causing them to pass spuriously (as well as fixing the same problem with another test in the file while I'm here). Change-Id: I6b41fc3c0e95508ab67f1d420b033b02ec75a5da Reviewed-on: http://gerrit.cloudera.org:8080/5144 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-12-02 21:58:03 +00:00
Dimitris Tsirogiannis	da34ce9780	IMPALA-4527: Columns in Kudu tables created from Impala default to "NULL" This commit reverts the behavior introduced by IMPALA-3719 which used the Kudu default behavior for column nullability if none was specified in the CREATE TABLE statement. With this commit, non-key columns of Kudu tables that are created from Impala are by default nullable unless specified otherwise. Change-Id: I950d9a9c64e3851e11a641573617790b340ece94 Reviewed-on: http://gerrit.cloudera.org:8080/5259 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-12-02 02:06:22 +00:00
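The resulting default can be stated as a one-line rule. This hedged Java sketch assumes an omitted NULL/NOT NULL clause is passed as `null`, and that primary-key columns are never nullable, as Kudu requires. `KuduColumnDefaults` is a hypothetical name. The sketch does not model validation of conflicting declarations.

```java
/** Hypothetical sketch of nullability defaults for Kudu columns created from Impala. */
final class KuduColumnDefaults {
  private KuduColumnDefaults() {}

  /**
   * @param isPrimaryKey     whether the column is part of the Kudu primary key
   * @param explicitNullable TRUE for NULL, FALSE for NOT NULL, null if omitted
   */
  static boolean isNullable(boolean isPrimaryKey, Boolean explicitNullable) {
    if (isPrimaryKey) return false;  // Kudu key columns are never nullable
    // Non-key columns are nullable unless specified otherwise.
    return explicitNullable == null || explicitNullable;
  }
}
```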
Alex Behm	467642f2ca	Bracketing Java logging output with log level checks part 2. This reduces creation of intermediate objects and improves performance. Change-Id: Ib94b3a20d439d854f579d9086755eb19699fcb68 Reviewed-on: http://gerrit.cloudera.org:8080/5297 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2016-12-01 11:10:02 +00:00
Marcel Kornacker	352833b8cf	Bracketing Java logging output with log level checks. This reduces creation of intermediate objects and improves performance. Change-Id: Ie0f5123dbf2caf3b03183c76820599920baa9785 Reviewed-on: http://gerrit.cloudera.org:8080/5284 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2016-12-01 04:42:38 +00:00
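These two commits check the log level before building a message. When the level is disabled, string concatenation and `toString()` calls are skipped entirely. The sketch below uses `java.util.logging` only so that it runs on the plain JDK. With an SLF4J or Log4j logger, the equivalent guards are `isDebugEnabled()` and `isTraceEnabled()`. `LogGuardDemo` and its call counter are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LogGuardDemo {
  static final Logger LOG = Logger.getLogger(LogGuardDemo.class.getName());

  /** Counts how often the (pretend-expensive) message argument is built. */
  static int describeCalls = 0;

  static String describe(Object o) {
    describeCalls++;
    return "object " + o;
  }

  /** Unguarded: the message string is always built, even if FINEST is disabled. */
  static void logUnguarded(Object o) {
    LOG.finest("planning " + describe(o));
  }

  /** Guarded: the message is built only when FINEST is actually enabled. */
  static void logGuarded(Object o) {
    if (LOG.isLoggable(Level.FINEST)) {
      LOG.finest("planning " + describe(o));
    }
  }
}
```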
Lars Volker	0f62bf35fd	IMPALA-4550: Fix CastExpr analysis for substituted slots During slot substitution, the type of the child of a CastExpr can change. If the previous child type matched the CastExpr, then the cast was flagged as noOp_. During substitution and subsequent re-analysis, the noOp_ flag was not revisited, so no cast was performed even after it had become necessary. The fix is to always set noOp_ to the correct value in CastExpr.analyze(). Change-Id: I7f29cdc359558fad6df455b8eec0e0eaed00e996 Reviewed-on: http://gerrit.cloudera.org:8080/5267 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-30 11:19:36 +00:00
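The core of this fix is that a derived flag must be recomputed on every analysis rather than set once. A minimal Java sketch follows, assuming types are modeled as a plain enum. `SimpleCastExpr`, `ColType`, and `setChildType` are hypothetical stand-ins, not Impala's `CastExpr` API.

```java
/** Hypothetical stand-in for SQL column types. */
enum ColType { INT, BIGINT, STRING }

/** Hypothetical, simplified cast expression illustrating the noOp_ fix. */
final class SimpleCastExpr {
  private final ColType targetType;
  private ColType childType;
  private boolean noOp_;

  SimpleCastExpr(ColType targetType, ColType childType) {
    this.targetType = targetType;
    this.childType = childType;
  }

  /** Models slot substitution replacing the child with a differently typed expr. */
  void setChildType(ColType newChildType) { childType = newChildType; }

  /** Recomputes noOp_ on every call, so re-analysis after substitution
   *  reflects the current child type instead of a stale flag. */
  void analyze() {
    noOp_ = (childType == targetType);
  }

  boolean isNoOp() { return noOp_; }
}
```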
Dimitris Tsirogiannis	9f497ba02f	IMPALA-2890: Support ALTER TABLE statements for Kudu tables With this commit, we add support for additional ALTER TABLE statements against Kudu tables. The newly supported ALTER TABLE operations for Kudu are: - ADD/DROP range partitions. Syntax: ALTER TABLE <tbl_name> ADD [IF NOT EXISTS] RANGE <kudu_partition_spec> ALTER TABLE <tbl_name> DROP [IF EXISTS] RANGE <kudu_partition_spec> - ADD/DROP/RENAME column. Syntax: ALTER TABLE <tbl_name> ADD COLUMNS (col_spec, [col_spec, ...]) ALTER TABLE <tbl_name> DROP COLUMN <col_name> ALTER TABLE <tbl_name> CHANGE COLUMN <old> <new_name> <type> - Rename the Kudu table using the 'kudu.table_name' table property. Example: ALTER TABLE <tbl_name> SET TBLPROPERTIES ('kudu.table_name'='<new_name>') will change the underlying Kudu table name to <new_name>. - Renaming the HMS/Catalog table entry of a Kudu table is supported using the existing ALTER TABLE <tbl_name> RENAME TO <new_tbl_name> syntax. Not supported: - ALTER TABLE <tbl_name> REPLACE COLUMNS Change-Id: I04bc87e04e05da5cc03edec79d13cedfd2012896 Reviewed-on: http://gerrit.cloudera.org:8080/5136 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2016-11-30 04:55:03 +00:00
Taras Bobrovytsky	9032952758	IMPALA-4000: Restricted Sentry authorization for Kudu Tables At this time, there is no comprehensive way of enforcing a Sentry authorization policy against tables stored in Kudu. The following behavior was implemented in this patch: - Only the ALL privilege level can be granted to Kudu tables. Finer-grained levels such as only SELECT or only INSERT are not supported. - Column level permissions on Kudu tables are not supported. - Only users with ALL privileges on SERVER may create external Kudu tables. Change-Id: I183f08ad8ce80deee011a6b90ad67b9cefc0452c Reviewed-on: http://gerrit.cloudera.org:8080/5047 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2016-11-29 17:06:16 +00:00
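The first two restrictions above reduce to a small validation rule for grants on Kudu tables. The Java sketch below is hypothetical. `KuduPrivilegeRules` and the `Privilege` enum are illustrative names, not Sentry or Impala APIs. The sketch encodes only the rules as stated in this commit message.

```java
/** Hypothetical privilege levels, for illustration only. */
enum Privilege { ALL, SELECT, INSERT }

/** Hypothetical sketch of the grant restrictions on Kudu tables. */
final class KuduPrivilegeRules {
  private KuduPrivilegeRules() {}

  /** A grant on a Kudu table is accepted only at the ALL level and only at
   *  table granularity; column-level grants are rejected. */
  static boolean isGrantAllowed(Privilege privilege, boolean columnLevel) {
    return privilege == Privilege.ALL && !columnLevel;
  }
}
```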
Alex Behm	fc61f2a3f7	IMPALA-4523: Correct max VARCHAR size to 65535 (2^16 - 1). Change-Id: If76eb45b01692ed360fad8fa1722d56fa06c6c00 Reviewed-on: http://gerrit.cloudera.org:8080/5204 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-24 22:28:01 +00:00
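For reference, 2^16 - 1 = 65535, the largest value an unsigned 16-bit field can hold. The minimal Java length check below uses that bound. It assumes declared VARCHAR lengths must be at least 1; this lower bound is an assumption of the sketch. `VarcharLimits` is a hypothetical name.

```java
/** Hypothetical sketch of a VARCHAR length check using the corrected bound. */
final class VarcharLimits {
  /** 2^16 - 1 = 65535. */
  static final int MAX_VARCHAR_LENGTH = (1 << 16) - 1;

  private VarcharLimits() {}

  /** The minimum of 1 is an assumption of this sketch. */
  static boolean isValidVarcharLength(int len) {
    return len >= 1 && len <= MAX_VARCHAR_LENGTH;
  }
}
```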
Tim Armstrong	90ebecdd40	IMPALA-4529: speed up parsing of identifiers Instead of using substring(), parseInt() and a try/catch, directly check the character. Change-Id: Iebef43a6a2f7923ca0e9c158d83f5c06f26da0cd Reviewed-on: http://gerrit.cloudera.org:8080/5210 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-24 08:03:28 +00:00
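This speed-up replaces exception-driven parsing with a direct character test. The Java sketch below contrasts the two styles for a hypothetical check of whether an identifier starts with a digit. The method names are illustrative and do not reproduce Impala's code.

```java
/** Hypothetical illustration of the two styles described in the commit. */
final class IdentCheck {
  private IdentCheck() {}

  /** Slow style: allocates a substring and relies on an exception
   *  for the common "not a digit" case. */
  static boolean startsWithDigitSlow(String ident) {
    if (ident.isEmpty()) return false;
    try {
      Integer.parseInt(ident.substring(0, 1));
      return true;
    } catch (NumberFormatException e) {
      return false;
    }
  }

  /** Fast style: inspects the first character directly,
   *  with no allocation and no exceptions. */
  static boolean startsWithDigitFast(String ident) {
    if (ident.isEmpty()) return false;
    char c = ident.charAt(0);
    return c >= '0' && c <= '9';
  }
}
```

The two methods agree on ASCII input. `Integer.parseInt` also accepts non-ASCII Unicode digits, which the direct check deliberately does not.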


1508 Commits