When there are conditionals with constant values of TRUE or
FALSE we can simplify them during analysis using the ExprRewriter.
This patch introduces the SimplifyConditionalsRule, which covers IF,
OR, AND, CASE, and DECODE.
It also introduces NormalizeExprsRule which normalizes AND and OR
such that if either child is a BoolLiteral, then the left child is a
BoolLiteral.
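A minimal sketch of the two rules, written in Python for brevity rather than in the frontend's Java. Expressions are modelled as nested tuples, which is an assumption made for illustration and not Impala's Expr class hierarchy. Only AND, OR and IF are shown; CASE and DECODE are omitted.

```python
# Toy expression encoding (hypothetical, for illustration only):
#   ("lit", True/False), ("col", name),
#   ("and", l, r), ("or", l, r), ("if", cond, then, else)
TRUE = ("lit", True)
FALSE = ("lit", False)


def is_bool_literal(e):
    return e[0] == "lit" and isinstance(e[1], bool)


def normalize(e):
    """For AND/OR, move a BoolLiteral child to the left if there is one."""
    if e[0] in ("and", "or"):
        op, left, right = e
        if is_bool_literal(right) and not is_bool_literal(left):
            return (op, right, left)
    return e


def simplify(e):
    """Bottom-up simplification of AND, OR and IF with constant children."""
    if e[0] in ("and", "or"):
        op, left, right = normalize((e[0], simplify(e[1]), simplify(e[2])))
        if is_bool_literal(left):
            if op == "and":
                # TRUE AND x -> x, FALSE AND x -> FALSE
                return right if left[1] else FALSE
            # TRUE OR x -> TRUE, FALSE OR x -> x
            return TRUE if left[1] else right
        return (op, left, right)
    if e[0] == "if":
        cond, then, other = simplify(e[1]), simplify(e[2]), simplify(e[3])
        if is_bool_literal(cond):
            return then if cond[1] else other
        return ("if", cond, then, other)
    return e
```

Because normalization guarantees that a BoolLiteral, if present, is the left child, the simplification step only has to inspect one side of each AND/OR.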
Testing:
- Added unit tests to ExprRewriteRulesTest.
- Added functional tests to expr.test
- Ran FE planner tests and BE expr-test.
Change-Id: Id70aaf9fd99f64bd98175b7e2dbba28f350e7d3b
Reviewed-on: http://gerrit.cloudera.org:8080/5585
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
This commit fixes an issue where multiple long-running operations on the
same catalog object (e.g. table) can block other catalog operations from
making progress.
Problem:
IMPALA-1480 introduced table level locking that in conjunction with the
global catalog lock ensures serialized access to catalog table objects.
In some cases (e.g. multiple long running operations on same table), the
locking pattern used resulted in the catalog lock being held for
a long period of time, thus blocking other catalog operations from making
any progress. That resulted in high response times and the system
appearing to be hung.
Solution:
Change the locking pattern in the catalog for protecting table objects
so that no operation will hold the catalog lock for a long time if it
fails to acquire a table lock. The operation that attempts to acquire a
table lock and fails to do so must release the catalog lock and retry.
The use of fair locks prevents starvation. The only
operation that doesn't follow this retry logic is the
getCatalogObjects() call that retrieves a snapshot of the catalog
metadata for transmitting to the statestore.
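The retry pattern can be sketched as follows. This is a Python illustration, not the catalog's Java code: the function name, the attempt limit and the back-off are assumptions, and Python's threading.Lock is not a fair lock, so the starvation guarantee mentioned above is not reproduced here.

```python
import threading
import time

# Stand-in for the global catalog lock (hypothetical name).
catalog_lock = threading.Lock()


def try_lock_table(table_lock, max_attempts=100, backoff_s=0.001):
    """Acquire catalog_lock and table_lock together.

    The table lock is only ever attempted without blocking, so the catalog
    lock is never held while waiting for a busy table.
    Returns True with both locks held, or False with neither held.
    """
    for _ in range(max_attempts):
        catalog_lock.acquire()
        if table_lock.acquire(blocking=False):
            return True
        # Table is busy: let other catalog operations proceed, then retry.
        catalog_lock.release()
        time.sleep(backoff_s)
    return False
```

A caller that gets True back is responsible for releasing both locks once its work on the table is done.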
Testing:
I manually tested this change by running concurrency tests using JMeter
and verified that the throughput of catalog operations on a specific table
is not affected by other concurrent long running operations (e.g. refresh)
on a different table.
Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Reviewed-on: http://gerrit.cloudera.org:8080/5710
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
HIVE-15653 is a Hive Metastore bug that results in ALTER TABLE
commands wiping the table stats of unpartitioned tables.
Until the Hive bug is fixed, this patch adds a workaround
to Impala that forces the Metastore to preserve the table stats.
Testing: Private core/hdfs run passed.
Change-Id: Ic191c765f73624bc716badadd7215c8dca9d6b1f
Reviewed-on: http://gerrit.cloudera.org:8080/5731
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
RECOVER PARTITIONS needs to avoid recovering partitions that are
already in HMS. Before this patch, that check was done by making a
list of the existing partitions and searching that list for each
path found in the search for partitions eligible for recovery. This
patch changes the container to a HashSet so that each lookup takes
constant expected time instead of a linear scan.
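The effect of the container change can be illustrated with a short Python sketch. The function name and the representation of a partition as a tuple of key values are assumptions made for this example.

```python
def partitions_to_recover(paths_found, existing_partitions):
    """Return the partitions found on disk that are not yet in HMS.

    Building a set once makes each membership test O(1) on average,
    instead of scanning a list of existing partitions for every path.
    """
    existing = set(existing_partitions)
    return [p for p in paths_found if p not in existing]
```

With n existing partitions and m candidate paths, the list-based check does O(n * m) comparisons, while the set-based check does O(n + m) work on average.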
Change-Id: I4b9b6f8eb85f854e8c0896c18a231cebe32b4678
Reviewed-on: http://gerrit.cloudera.org:8080/5745
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Jim Apple <jbapple-impala@apache.org>
- Improves the logging for several important events,
in particular, during table loading.
- Uses LOG.info() for such messages to clarify their
intent.
The goal is to improve supportability without having
to turn on trace debugging which can generate a
significant log volume.
Change-Id: I8de96d0cb6d09b2272b1925d42cb059367fe7196
Reviewed-on: http://gerrit.cloudera.org:8080/5709
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
When there are multiple concurrent requests to the catalogd to
prioritize loading the same table, then several catalog loading
threads may end up waiting for that single table to be loaded,
effectively reducing the number of catalog loading threads. In
extreme examples, this might degrade to serial loading of tables.
This patch augments the existing data structures and code to
prevent using several loading threads for the same table.
Some of the existing data structures and code could be
consolidated/simplified but this patch does not try to address
that issue to minimize the risk of this change.
Testing: I could easily reproduce the bug locally with the steps
described in the JIRA. After this patch, I could not observe threads
being wasted anymore.
Change-Id: Idba5f1808e0b9cbbcf46245834d8ad38d01231cb
Reviewed-on: http://gerrit.cloudera.org:8080/5707
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
The DECODE constructor in CaseExpr uses the same decodeExpr object when
building the BinaryPredicates that compare the decodeExpr to each 'when'
of the DECODE. This causes problems when different BinaryPredicates try
to cast the same decodeExpr object to different types during analysis,
in this case leading to a Precondition check failure.
The solution is to clone the decodeExpr in the DECODE constructor in
CaseExpr for each generated BinaryPredicate.
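The aliasing problem can be reproduced in miniature. The sketch below is Python with hypothetical class and function names; it only shows how a shared object ends up with a single type while clones keep one type per predicate, and does not model the actual Precondition check.

```python
import copy


class Expr:
    """Toy expression whose type is changed in place by a cast."""

    def __init__(self, name, type_):
        self.name = name
        self.type = type_

    def cast_to(self, new_type):
        self.type = new_type


def build_predicates(decode_expr, whens, clone):
    """Pair the decode expr with each 'when'; optionally clone it per pair."""
    return [(copy.deepcopy(decode_expr) if clone else decode_expr, w)
            for w in whens]


def analyze(preds):
    """Cast each left-hand side to the type of its 'when' expr."""
    for lhs, when in preds:
        lhs.cast_to(when.type)
    return [lhs.type for lhs, _ in preds]
```

Without cloning, the last cast wins for every predicate because they all share one left-hand side object; with cloning, each predicate keeps the type it was analyzed with.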
Testing:
- Added a regression test to exprs.test
Change-Id: I4de9ed7118c8d18ec3f02ff74c9cca211c716e51
Reviewed-on: http://gerrit.cloudera.org:8080/5631
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Impala Public Jenkins
For a table that has both a table comment and a partition specified,
"show create table" incorrectly outputs the comment before the partition.
This is not the correct order, and it results in invalid SQL.
This change fixes the ordering (partition comes before comment) and
adds tests for this case.
Change-Id: I29a33cfd142b473997fdc3acfe3f0966bc7ed784
Reviewed-on: http://gerrit.cloudera.org:8080/5648
Tested-by: Impala Public Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
This moves the timeline from the Analyzer GlobalState to the AnalysisContext
and AnalysisContext.AnalysisResult. When analysis needs to load metadata
about missing tables, it marks an event noting the start of metadata load.
Then, when metadata load completes (or times out), it marks an
event noting that metadata load completed (or timed out). Keeping the
timeline on the AnalysisContext means that it persists across attempts at
analysis. AnalysisContext.AnalysisResult has a reference to the timeline,
so that it persists past analyzeStmt and can be used for the rest of
the planning.
Here is an example output of the planner timeline after this change:
Planner Timeline: 4s371ms
- Metadata load started: 41.388ms (41.388ms)
- Metadata load finished: 4s260ms (4s219ms)
- Analysis finished: 4s296ms (35.693ms)
- Equivalence classes computed: 4s315ms (19.062ms)
- Single node plan created: 4s323ms (7.812ms)
- Runtime filters computed: 4s323ms (777.010us)
- Distributed plan created: 4s325ms (1.464ms)
- Planning finished: 4s371ms (46.697ms)
When there is no need to load metadata, the timeline looks like:
Planner Timeline: 13.695ms
- Analysis finished: 2.411ms (2.411ms)
- Equivalence classes computed: 2.653ms (241.733us)
- Single node plan created: 5.641ms (2.987ms)
- Runtime filters computed: 5.726ms (85.204us)
- Distributed plan created: 6.548ms (821.722us)
- Planning finished: 13.695ms (7.147ms)
Change-Id: I6f01a35e5f9f5007a0298acfc8e16da00ef99c6c
Reviewed-on: http://gerrit.cloudera.org:8080/5685
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Impala Public Jenkins
The bug was that expr rewrite rules such as ExtractCommonConjunctRule
analyzed their own output, which doesn't work for syntactic elements
that allow column aliases, such as the HAVING clause.
The fix was to remove the analysis step (the re-analysis happens anyway
in AnalysisCtx).
Change-Id: Ife74c61f549f620c42f74928f6474e8a5a7b7f00
Reviewed-on: http://gerrit.cloudera.org:8080/5662
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Impala Public Jenkins
The BlockStorageLocation import is unused.
Remove validation of config keys that only affect the
BlockStorageLocation API. See HDFS-10868 and HDFS-8895. We do not
need to validate these keys any more since we don't use that API.
These config keys are removed in Hadoop 3 so this patch is
required to build against it.
Change-Id: Ic12337a9f5b7d910282aaf7d8508a4176cf89cbc
Reviewed-on: http://gerrit.cloudera.org:8080/5526
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
The main problem was that the catalogd's response to
a DROP IF EXISTS operation included a removed object
that was applied to the requesting impalad's catalog cache.
In particular, a DROP DATABASE IF EXISTS that did not actually
drop anything in the catalogd still returned the object name in
the RPC response as a removed object with the *current* catalog
version (i.e., without incrementing the catalog version).
The above behavior led to a situation where a drop of
a non-existent object overwrote a legitimate entry in
an impalad's CatalogDeltaLog. Recall that the version of the
dropped object was based on the current catalog version
at some point in time, e.g., the same version of a
legitimate entry in the CatalogDeltaLog.
As a reminder, the CatalogDeltaLog protects deletions from
being undone via updates from the statestore. So overwriting
an object in the CatalogDeltaLog can lead to a dropped object
appearing again with certain timing of a statestore update.
Please see the JIRA for an analysis of logging output that
shows the bug and its effect.
The fix is simple: The RPC response of a DROP IF EXISTS should
only contain a removed object if an object was actually
removed from the catalogd.
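The fixed behavior can be sketched as follows. This is a Python illustration with hypothetical names; in particular, representing the catalog as a set of names plus an integer version, and bumping the version by one on a real drop, are simplifying assumptions.

```python
def drop_if_exists(catalog_objects, catalog_version, name):
    """Return (new_catalog_version, removed_objects) for a DROP IF EXISTS.

    removed_objects is empty when nothing was dropped, so the requesting
    impalad has nothing to record for a non-existent object.
    """
    if name not in catalog_objects:
        return catalog_version, []
    catalog_objects.remove(name)
    new_version = catalog_version + 1
    return new_version, [(name, new_version)]
```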
This fix, however, introduces a new consistency issue (IMPALA-4727).
The new behavior is not ideal, but better than the old behavior,
explained as follows:
The behavior before this patch is problematic because the drop of a
completely unrelated object can affect the consistency of a drop+add
on another object.
The behavior after this patch is that a drop+add may fail in the add
if there is an ill-timed concurrent drop of the same object.
Testing:
- Unfortunately, I have not been able to reproduce the issue
locally despite vigorous attempts and despite knowing what
the problem is. Our existing tests seem to reproduce the
issue pretty reliably, so it's not clear whether a targeted
test is feasible or needed.
- An exhaustive test run passed.
Change-Id: Icb1f31eb2ecf05b9b51ef4e12e6bb78f44d0cf84
Reviewed-on: http://gerrit.cloudera.org:8080/5556
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This commit makes ADD PARTITION operations treat string partition-key
values as case sensitive, consistent with other related partition DDL
operations.
Change-Id: I6fbe67d99df8a50a16a18456fde85d03d622c7a1
Reviewed-on: http://gerrit.cloudera.org:8080/5535
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This change introduces the sortby() query plan hint for insert
statements. When specified, sortby(a, b) will add an additional sort
step to the plan to order data by columns a, b before inserting it into
the target table.
Change-Id: I37a3ffab99aaa5d5a4fd1ac674b3e8b394a3c4c0
Reviewed-on: http://gerrit.cloudera.org:8080/5051
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
The KuduScanNode attempts to push IN list predicates to the
Kudu scan, but NULL literals cannot be pushed. The code in
KuduScanNode needed to check whether any of the literals in the
InPredicate is a NullLiteral, in which case the entire IN
list should not be pushed to Kudu.
The same handling is already in place for binary predicate
pushdown.
Change-Id: Iaf2c10a326373ad80aef51a85cec64071daefa7b
Reviewed-on: http://gerrit.cloudera.org:8080/5505
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
We call getPermissions() on partition directories to find out if
Impala has access to those files. On S3, this currently is a no-op
as the S3A connector does not try to set/get the permissions for S3
objects. So, it always returns the default set of permissions -> 777.
However, it still makes a round trip to S3, causing a slowdown in the
Catalog.
We can return the READ_WRITE permission immediately if we know we are
accessing an S3 file, thereby avoiding the round trip to S3 for every
partition. This will greatly speed up metadata operations for S3 tables
and partitions, which are already known to be a big bottleneck.
If and when the S3A connector is able to manage permissions in
the future, we need to revisit this code. However, as permissions on
S3 are unsupported by Impala right now, we might as well gain on perf.
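A sketch of the short-circuit, in Python rather than the catalog's Java. The function name, the fetch_permissions callback and the use of the "s3a" URI scheme to recognize S3 paths are assumptions made for illustration.

```python
from urllib.parse import urlparse

READ_WRITE = "READ_WRITE"


def get_partition_access(location, fetch_permissions):
    """Return the access level for a partition directory.

    For S3A locations the answer is fixed, so the remote call made by
    fetch_permissions (a hypothetical callback) is skipped entirely.
    """
    # Assumption: S3 paths are identified by the "s3a" URI scheme.
    if urlparse(location).scheme.lower() == "s3a":
        return READ_WRITE
    return fetch_permissions(location)
```

For a table with thousands of S3 partitions, this removes one round trip per partition from metadata loading.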
Change-Id: If9d1072c185a6162727019cdf1cb34d7f3f1c75c
Reviewed-on: http://gerrit.cloudera.org:8080/5449
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Impala Public Jenkins
This introduces a global structure to coordinate execution
of fragment instances on a backend for a single query.
New classes:
- QueryExecMgr: subsumes FragmentMgr
- QueryState
- FragmentInstanceState: replaces FragmentExecState
Change-Id: I962ae6b7cb7dc0d07fbb8f70317aeb01d88d400b
Reviewed-on: http://gerrit.cloudera.org:8080/4418
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
Adds a better error message for when the logical type is specified at the wrong
level or is not specified in an Avro decimal column declaration.
Change-Id: Iad23706128223b6537d565471ef5d8faa91b0b5a
Reviewed-on: http://gerrit.cloudera.org:8080/5255
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Internal Jenkins
Implements the following conservative but correct policy for assigning
predicates from the On-clause of an inner join:
If the predicate references an outer-joined tuple, then evaluate it at
the inner join that the On-clause belongs to.
Cleans up Analyzer.canEvalPredicate().
Change-Id: Idf45323ed9102ffb45c9d94a130ea3692286f215
Reviewed-on: http://gerrit.cloudera.org:8080/4982
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This patch ensures that setting the query option
enable_expr_rewrites=false will disable both constant folding in the
frontend (which it did already) and constant caching in the backend
(which is enabled in this patch). This gives a way for users to revert
to the old behaviour of non-deterministic UDFs before these
optimisations were added in Impala 2.8.
Before this patch, the backend would cache values based on IsConstant().
This meant that there was no way to override caching of values of
non-deterministic UDFs, e.g. with enable_expr_rewrites.
After this patch, we only cache literal values in the backend. This
offers the same performance as before in the common case where the
frontend will constant fold the expressions anyway.
Also rename some functions to more cleanly separate the backend concepts
of "constant" expressions and expressions that can be evaluated without
a TupleRow. In a future change (IMPALA-4617) we should remove the
IsConstant() analysis logic from the backend entirely and pass the
information from the frontend. We should also fix isConstant() in the
frontend so that it only returns true when it is safe to constant-fold
the expression (IMPALA-4606). Once that is done, we could revert back
to using IsConstant() instead of IsLiteral().
Testing:
Added targeted test to test constant folding of UDFs: we expect
different results depending on whether constant folding is enabled.
Also run TestUdfs with expr rewrites enabled and disabled, since this
can exercise different code paths. Refactored test_udfs somewhat to
avoid running uninteresting combinations of query options for
targeted tests and removed some 'drop * if exists' statements
that aren't necessary when using unique_database.
This change revealed flakiness in test_mem_limit, which seems
to have only worked by coincidence. Updated TrackAllocation() to
actually set the query status when a memory limit is exceeded.
Looped this test for a while to make sure it isn't flaky any
more.
Also fix other test bugs where the vector argument is modified
in-place, which can leak out to other tests.
Change-Id: I0c76e3c8a8d92749256c312080ecd7aac5d99ce7
Reviewed-on: http://gerrit.cloudera.org:8080/5391
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This commit fixes an issue where an error is thrown if the default value
for a Kudu column is set to NULL.
Change-Id: Ida27ce56f1dd7603485a69c680db3bcea6702aff
Reviewed-on: http://gerrit.cloudera.org:8080/5405
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
The main issue was that the eval cost was not set for
timestamp literals, so a preconditions check was hit
when trying to order a list of conjuncts by cost.
Another subtle issue made the bug only reproducible by
a specific query against a Kudu table in our tests,
although the bug is not Kudu specific: The eval cost
of Exprs was not recomputed in analyze(), even after
resetting an Expr, e.g., during a substitution. As a
result, the bug was only reproducible for a list
of conjuncts that contained an inferred predicate
with a timestamp literal.
This patch does not contain a fix for that issue due
to its complexity/risk. It is tracked in IMPALA-4620.
Testing: Ran planner tests locally. Ran query_test.py
locally. A private core/hdfs run passed.
Change-Id: Ife30420bafbd1c64a5e3385e5755909110b4b354
Reviewed-on: http://gerrit.cloudera.org:8080/5404
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Bug: Commit 6f31c7 fixed a crash when setting Avro schemas for
tables with storage altered to Avro file format. However the
fix was incomplete for partitioned/multi file format tables since
'hasAvroData_' is not set for all code paths that load the
partitioned tables (For example: HdfsTable#loadAllPartitions()).
Fix: Moved the code for setting 'hasAvroData_' to addPartition()
which is the common logic for all code paths adding new partitions.
Also fixed the test coverage gap by adding a new test for partitioned
tables altered to Avro format.
Change-Id: I7854ff002b2277ec4a5388216218a1d5ad142de8
Reviewed-on: http://gerrit.cloudera.org:8080/5388
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This commit also removes the `DISTRIBUTE`, `SPLIT`, and
`BUCKETS` keywords that were going to be newly released in Impala 2.6,
but are now unused. Additionally, the few remaining uses of the
`DISTRIBUTE BY` syntax have been switched to `PARTITION BY`.
Change-Id: I32fdd5ef26c532f7a30220db52bdfbf228165922
Reviewed-on: http://gerrit.cloudera.org:8080/5382
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
Impala cannot correctly evaluate or assign some non-deterministic
predicates. This patch improves the error message shown when
trying to evaluate such unsupported predicates for the purpose
of partition pruning.
Change-Id: I94765f62bde94f4faa7fc5c26d928099ca1496d1
Reviewed-on: http://gerrit.cloudera.org:8080/5386
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
A recent change (IMPALA-1788) caused UUID() to be constant folded,
and therefore, produce the same value for every invocation across
rows. Similar issues might also occur due to the BE optimizing
UUID() during codegen of scalar-fn-call.h/cc.
The fix is to not treat UUID() like a constant expr in both
the FE and BE.
Discussion:
The fix in this patch is rather blunt, but minimally invasive to
reduce the risk of adding new bugs. Ideally, the constness of an
Expr should be determined in one place and the FE and BE should agree
on which Exprs are constant. I considered the following alternatives
but concluded they were too risky:
1. Pass a flag from FE to BE for every Expr indicating its constness.
This simple solution would populate a thrift field with the
result of Expr.isConstant() for every Expr in an Expr tree.
There are several issues. Calling isConstant() for every Expr
in an Expr tree is rather expensive due to repeated traversals
of the tree. That could be mitigated by populating an isConstant
flag during Expr.analyze() to avoid re-computing the constness
repeatedly. This requires changes to analyze(), clone(), reset(),
and possibly other places for many Exprs. There is potential
for missing a place and adding a new bug.
2. The above solution could be limited to only FunctionCallExpr.
However, the BE expr type FUNCTION_CALL which maps to
scalar-fn-call.h/cc is created from various FE Exprs, not just
FunctionCallExpr. So adding a flag only to scalar-fn-call.h/cc
would be confusing because it would only sometimes be set
in a meaningful way. This seems more confusing than the current
straightforward solution.
Testing: Added FE and EE tests.
Change-Id: If2499f5f6ecdcb098623202c8e6dc2d02727194a
Reviewed-on: http://gerrit.cloudera.org:8080/5324
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Background: We generally allow the assignment of predicates below the
nullable side of a left/right outer join, explained as follows using an
example:
SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id
WHERE t2.int_col < 10
The scan of 't2' picks up 't2.int_col < 10' via
Analyzer.getBoundPredicates() and recognizes that the predicate must
also be evaluated by a join later, so the predicate is not marked as
assigned. The join then picks up the unassigned predicate via
Analyzer.getUnassignedConjuncts().
The bug was that our logic for detecting whether a bound predicate must
also be evaluated at a join node was flawed because it only considered
whether the tuples of the source or destination predicate were outer
joined (plus other conditions).
The underlying assumption is that either the source or destination tuple
is bound by a tuple produced by a TableRef, but in the buggy query the
source predicate is bound by an aggregation tuple, so we incorrectly
marked the bound predicate as assigned in Analyzer.getBoundPredicates().
The fix is to conservatively not mark bound predicates as assigned if
the slots referenced by the predicate have equivalent slots that
belong to an outer-joined tuple. As a result, a plan node may pick up
the same predicate multiple times, once via
Analyzer.getBoundPredicates() and another time via
Analyzer.getUnassignedConjuncts(). Those are deduped now.
The following example explains the duplicate predicate assignment:
SELECT * FROM (SELECT * FROM t t1) a LEFT OUTER JOIN t b ON a.id = b.id
WHERE a.id < 10
1. The predicate 'a.id < 10' gets migrated into the inline view.
'a.id < 10' is marked as assigned but is still registered as
a single-tid conjunct in the Analyzer for potential propagation.
2. The scan node of 't1' calls Analyzer.getBoundPredicates() and
generates 't1.id < 10' based on the source predicate 'a.id < 10'.
3. The scan node of 't1' picks up the migrated conjunct 't1.id < 10'
via Analyzer.getUnassignedConjuncts().
Change-Id: I774d13a13ad1e8fe82512df98dc29983bdd232eb
Reviewed-on: http://gerrit.cloudera.org:8080/4960
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The bug was a simple oversight. In KuduScanNode.init()
we forgot to call Analyzer.getBoundPredicates().
Change-Id: I19a38d6ea8cc0d2b0ddc3808d1f9ffef5ce306a8
Reviewed-on: http://gerrit.cloudera.org:8080/5365
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This commit changes the behavior of alter table operations on Kudu
tables from asynchronous to synchronous. With this change, alter table
operations return when either the operations complete successfully or
a timeout is reached.
Change-Id: I385bce66691ae9040e72f97557e1bba31009e36b
Reviewed-on: http://gerrit.cloudera.org:8080/5364
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
Fixes the KuduScanNode to convert InPredicates to
KuduPredicates and push them to the Kudu scan if possible.
An InPredicate can be pushed to the scan if the expression is of
the exact form:
<SlotRef> IN (<LiteralExpr>, <LiteralExpr>, ...)
That means the InPredicate has the following properties:
1) It has a list of literal values (i.e. not a subquery);
All values are LiteralExprs (not SlotRefs).
2) Not negative, i.e. only 'IN' supported, not 'NOT IN'
3) The SlotRef is not wrapped in any casts
4) The types of all values match the type of the SlotRef
exactly.
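The four properties above can be sketched as a single check. This is a Python illustration with hypothetical classes; it is not the KuduScanNode code, and type names are plain strings for simplicity.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class SlotRef:
    name: str
    type: str


@dataclass
class Literal:
    value: Any
    type: str


@dataclass
class Cast:
    child: Any
    type: str


@dataclass
class InPredicate:
    lhs: Any
    values: List[Any]
    is_not_in: bool = False


def can_push_to_kudu(pred):
    """True only for <SlotRef> IN (<Literal>, ...) with exactly matching types."""
    if pred.is_not_in:                      # 2) NOT IN is not supported
        return False
    if not isinstance(pred.lhs, SlotRef):   # 3) no cast around the SlotRef
        return False
    # 1) every value is a literal, and 4) its type equals the slot's type
    return all(isinstance(v, Literal) and v.type == pred.lhs.type
               for v in pred.values)
```

Any predicate that fails the check simply stays on the Impala scan node.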
A planner test was added exercising all supported types as
well as exprs where the values would not be supported.
TODO: perf testing
TODO: consider a limit on the number of list values before
keeping the predicate on the Impala scan node
(determine from testing)
Change-Id: I8988d4819d20d467b48e286917e347ca00f60cf0
Reviewed-on: http://gerrit.cloudera.org:8080/5316
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
This commit also fixes an issue where an error is thrown if a default
value is set for a boolean column on a Kudu table.
Change-Id: I25b66275d29d1cf21df14e78ab58f625a83b0725
Reviewed-on: http://gerrit.cloudera.org:8080/5337
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
Impala used to incorrectly assign On-clause equality predicates from an
outer join if those predicates referenced multiple tables, but only one
side of the outer join.
The fix is to add an additional check in Analyzer.getEqJoinConjuncts()
to prevent that incorrect assignment.
Change-Id: I719e0eeacccad070b1f9509d80aaf761b572add0
Reviewed-on: http://gerrit.cloudera.org:8080/4986
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The bug: We used to reset() the qualifier of union operands
to their original value obtained during parsing. This leads to
problems when union operands are unnested and we need to rewrite
Subqueries. In particular, the first union operand of a nested union
was reset() to a null qualifier, but that operand could be somewhere
in the middle of the list of unnested operands in the parent. At that
point, we've lost information about the qualifier of the unnested
operand.
The fix: The simplest solution is to not reset() the qualifier.
The other alternative is to reset() the qualifier, but also
undo any unnesting. That seems unnecessary and wasteful.
Change-Id: I157bb0f08c4a94fd779487d7c23edd64a537a1f6
Reviewed-on: http://gerrit.cloudera.org:8080/4963
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This commit fixes an issue where a SHOW CREATE VIEW statement throws an
analysis error if the view contains a subquery.
Change-Id: I4a89e46a022f0ccec198b6e3e2b30230103831ce
Reviewed-on: http://gerrit.cloudera.org:8080/5333
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
COMPUTE STATS on Parquet tables is run with MT_DOP=4 by default.
COMPUTE STATS on non-Parquet tables will run without MT_DOP.
Users can always override the behavior by setting MT_DOP manually.
Setting MT_DOP to 0 means a statement will be run in the
conventional execution mode (without intra-node parallelism based
on multiple fragment instances). Users can set a higher MT_DOP
even for Parquet tables.
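The selection of the effective MT_DOP can be sketched as below. This is a Python illustration with a hypothetical function name; representing "the user did not set MT_DOP" as None is an assumption, and tables with mixed file formats are not covered.

```python
def effective_mt_dop(file_format, user_mt_dop=None):
    """Pick the MT_DOP used for COMPUTE STATS.

    An explicit user setting always wins; otherwise Parquet tables default
    to 4 and all other formats run in the conventional mode (0).
    """
    if user_mt_dop is not None:
        return user_mt_dop
    return 4 if file_format.upper() == "PARQUET" else 0
```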
Testing: Added a new test that checks the effective MT_DOP.
Locally ran test_mt_dop.py, test_scanners.py, test_nested_types.py,
test_compute_stats.py, and test_cancellation.py.
Change-Id: I2be3c7c9f3004e9a759224a2e5756eb6e4efa359
Reviewed-on: http://gerrit.cloudera.org:8080/5315
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This patch improves the block metadata loading (locations and disk
storage IDs) for partitioned and un-partitioned tables in the Catalog
server.
Without this patch:
------------------
We loop through each and every file in the table/partition directories
and call getFileBlockLocations() on it to obtain the block metadata.
This results in a large number of RPC calls to the Namenode, especially
for tables with a large number of files/partitions.
With this patch:
---------------
We move the block metadata querying to use listStatus() call which
accepts a directory as input and fetches the 'BlockLocation' objects
for every file recursively in that directory. This improves the
metadata loading in the following ways.
- For non-partitioned tables, we query all the BlockLocations in a
single RPC call in the base table directory and load the corresponding
disk IDs.
- For partitioned tables, we query the BlockLocations for all the
partitions residing under the base table directories in a single RPC
and then load every partition with non-default partition directory
separately.
- REFRESH on a table reloads the block metadata from scratch for
every data file every time. So it can be used as a replacement for
invalidate in situations like HDFS block rebalancing, which require
a block metadata update.
Also, this patch does away with VolumeIds returned by the HDFS NN
and uses the new StorageIDs returned by the BlockLocation class.
These StorageIDs are UUID strings and hence are mapped to a
per-node 0-based index as expected by the backend. In the upcoming
versions of Hadoop APIs, getFileBlockStorageLocations() is deprecated
and listStatus() instead returns BlockLocations with storage IDs
embedded. This patch makes use of this improvement to avoid an
additional RPC to the NN to fetch the storage locations.
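The mapping from storage-ID strings to per-node indexes can be sketched as follows. This is a Python illustration; the function name and the (host, storage_id) input shape are assumptions, and indexes are assigned in order of first appearance.

```python
def assign_disk_indexes(block_replicas):
    """Map each (host, storage_id) pair to (host, 0-based index).

    Every host gets its own index space, so the first storage ID seen on a
    host becomes 0, the second 1, and so on.
    """
    per_host = {}
    result = []
    for host, storage_id in block_replicas:
        ids = per_host.setdefault(host, {})
        # len(ids) is evaluated before insertion, so a new ID gets the next index.
        index = ids.setdefault(storage_id, len(ids))
        result.append((host, index))
    return result
```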
Change-Id: Ie127658172e6e70dae441374530674a4ac9d5d26
Reviewed-on: http://gerrit.cloudera.org:8080/5148
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Internal Jenkins
All 'debug' output still gets written into the info log. Downgrading
to 'trace' to avoid that.
Change-Id: If54f9d563be75571c7dc6d99ed13a6e86d9061a9
Reviewed-on: http://gerrit.cloudera.org:8080/5342
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
If a table fails to load, e.g. because it was deleted externally from
Kudu, we should still allow 'DROP TABLE' to pass analysis. Otherwise,
you may be unable to drop tables that are in a bad state.
Testing:
- Updates existing Kudu tests to reflect the new behavior, and fixes
a couple of problems with those tests that were causing them to pass
spuriously (as well as fixing the same problem with another test in
the file while I'm here).
Change-Id: I6b41fc3c0e95508ab67f1d420b033b02ec75a5da
Reviewed-on: http://gerrit.cloudera.org:8080/5144
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This commit reverts the behavior introduced by IMPALA-3719 which used
the Kudu default behavior for column nullability if none was specified
in the CREATE TABLE statement. With this commit, non-key columns of Kudu
tables that are created from Impala are by default nullable unless
specified otherwise.
Change-Id: I950d9a9c64e3851e11a641573617790b340ece94
Reviewed-on: http://gerrit.cloudera.org:8080/5259
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
During slot substitution, the type of the child of a CastExpr can
change. If the previous child type matched the CastExpr, then the cast
was flagged as noOp_. During substitution and subsequent re-analysis
the noOp_ flag was not revisited so that no cast was performed, even
after it had become necessary.
The fix is to always set noOp_ to the correct value in
CastExpr.analyze().
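The stale-flag problem and its fix can be shown in miniature. The class below is a Python toy with the same name as the frontend class, but its fields and behavior are assumptions made for illustration.

```python
class CastExpr:
    """Toy cast whose no-op flag is recomputed on every analysis."""

    def __init__(self, child_type, target_type):
        self.child_type = child_type
        self.target_type = target_type
        self.no_op = False

    def analyze(self):
        # Always derive the flag from the current child type, so a
        # substitution that changes the child cannot leave it stale.
        self.no_op = (self.child_type == self.target_type)
```

The buggy variant corresponds to setting the flag only once and never clearing it, so a cast that was initially redundant stayed skipped after the child's type changed.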
Change-Id: I7f29cdc359558fad6df455b8eec0e0eaed00e996
Reviewed-on: http://gerrit.cloudera.org:8080/5267
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
With this commit, we add support for additional ALTER TABLE statements
against Kudu tables. The newly supported ALTER TABLE operations for Kudu are:
- ADD/DROP range partitions. Syntax:
    ALTER TABLE <tbl_name> ADD [IF NOT EXISTS] RANGE <kudu_partition_spec>
    ALTER TABLE <tbl_name> DROP [IF EXISTS] RANGE <kudu_partition_spec>
- ADD/DROP/RENAME column. Syntax:
    ALTER TABLE <tbl_name> ADD COLUMNS (col_spec, [col_spec, ...])
    ALTER TABLE <tbl_name> DROP COLUMN <col_name>
    ALTER TABLE <tbl_name> CHANGE COLUMN <old> <new_name> <type>
- Rename the Kudu table using the 'kudu.table_name' table property. Example:
    ALTER TABLE <tbl_name> SET TBLPROPERTIES ('kudu.table_name'='<new_name>')
  will change the underlying Kudu table name to <new_name>.
- Renaming the HMS/Catalog table entry of a Kudu table is supported using the
  existing ALTER TABLE <tbl_name> RENAME TO <new_tbl_name> syntax.
Not supported:
- ALTER TABLE <tbl_name> REPLACE COLUMNS
Change-Id: I04bc87e04e05da5cc03edec79d13cedfd2012896
Reviewed-on: http://gerrit.cloudera.org:8080/5136
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
At this time, there is no comprehensive way of enforcing a Sentry
authorization policy against tables stored in Kudu. The following
behavior was implemented in this patch:
- Only the ALL privilege level can be granted to Kudu tables.
Finer-grained levels such as only SELECT or only INSERT are not
supported.
- Column level permissions on Kudu tables are not supported.
- Only users with ALL privileges on SERVER may create external Kudu
tables.
Change-Id: I183f08ad8ce80deee011a6b90ad67b9cefc0452c
Reviewed-on: http://gerrit.cloudera.org:8080/5047
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
Instead of using substring(), parseInt() and a try/catch, directly
check the character.
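The difference between the two approaches can be sketched in Python. The commit does not say which string or character is being tested, so the "does the string start with a digit" check below is an assumption, and slicing plus int() stand in for Java's substring() and parseInt().

```python
def starts_with_digit_slow(s):
    """Old style: slice, parse, and use the exception as the 'no' answer."""
    try:
        int(s[0:1])
        return True
    except ValueError:
        return False


def starts_with_digit(s):
    """New style: compare the character directly, no parsing or exceptions."""
    return len(s) > 0 and "0" <= s[0] <= "9"
```

Both functions agree on ASCII input; the direct check avoids allocating a substring and raising an exception on the common non-digit path.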
Change-Id: Iebef43a6a2f7923ca0e9c158d83f5c06f26da0cd
Reviewed-on: http://gerrit.cloudera.org:8080/5210
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins