When there are conditionals with constant values of TRUE or
FALSE we can simplify them during analysis using the ExprRewriter.
This patch introduces the SimplifyConditionalsRule, which covers IF,
OR, AND, CASE, and DECODE.
It also introduces NormalizeExprsRule which normalizes AND and OR
such that if either child is a BoolLiteral, then the left child is a
BoolLiteral.
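
A minimal sketch of the two rules for AND/OR only, written in Python
rather than the FE's Java. The class and function names (Pred, Compound,
normalize, simplify, rewrite) are hypothetical and are not the actual
ExprRewriter API; IF, CASE, and DECODE are left out.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class BoolLiteral:
    value: bool


@dataclass(frozen=True)
class Pred:
    """Opaque non-constant predicate, e.g. 'x < 10' (hypothetical)."""
    text: str


@dataclass(frozen=True)
class Compound:
    op: str  # "AND" or "OR"
    left: "Expr"
    right: "Expr"


Expr = Union[BoolLiteral, Pred, Compound]


def normalize(e):
    """If only the right child of AND/OR is a BoolLiteral, swap children."""
    if isinstance(e, Compound):
        if isinstance(e.right, BoolLiteral) and not isinstance(e.left, BoolLiteral):
            return Compound(e.op, e.right, e.left)
    return e


def simplify(e):
    """Simplify AND/OR whose left child is a constant TRUE or FALSE."""
    if isinstance(e, Compound) and isinstance(e.left, BoolLiteral):
        lit = e.left.value
        if e.op == "AND":
            return e.right if lit else BoolLiteral(False)
        if e.op == "OR":
            return BoolLiteral(True) if lit else e.right
    return e


def rewrite(e):
    """Rewrite bottom-up: children first, then normalize and simplify."""
    if isinstance(e, Compound):
        e = Compound(e.op, rewrite(e.left), rewrite(e.right))
    return simplify(normalize(e))
```

Because normalization runs first, simplify() only has to inspect the
left child.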
Testing:
- Added unit tests to ExprRewriteRulesTest.
- Added functional tests to expr.test
- Ran FE planner tests and BE expr-test.
Change-Id: Id70aaf9fd99f64bd98175b7e2dbba28f350e7d3b
Reviewed-on: http://gerrit.cloudera.org:8080/5585
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
HIVE-15653 is a Hive Metastore bug that results in ALTER TABLE
commands wiping the table stats of unpartitioned tables.
Until the Hive bug is fixed, this patch adds a workaround
to Impala that forces the Metastore to preserve the table stats.
Testing: Private core/hdfs run passed.
Change-Id: Ic191c765f73624bc716badadd7215c8dca9d6b1f
Reviewed-on: http://gerrit.cloudera.org:8080/5731
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Previously Impala was inconsistent about whether the year 10000 was
supported, as a result of inconsistency in boost, which reported the
maximum year as 9999 but sometimes allowed 10000. This meant that
Impala sometimes accepted the year 10000 and sometimes not.
Use the patched boost version and update tests accordingly.
Testing:
Ran an exhaustive build.
Change-Id: Iaf23b40833017789d879e5da7bb10384129e2d10
Reviewed-on: http://gerrit.cloudera.org:8080/5665
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
The DECODE constructor in CaseExpr uses the same decodeExpr object when
building the BinaryPredicates that compare the decodeExpr to each 'when'
of the DECODE. This causes problems when different BinaryPredicates try
to cast the same decodeExpr object to different types during analysis,
in this case leading to a Precondition check failure.
The solution is to clone the decodeExpr in the DECODE constructor in
CaseExpr for each generated BinaryPredicate.
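
The aliasing problem can be illustrated with a small Python sketch. The
Expr class, cast_to(), and build_predicates() are hypothetical stand-ins,
not the FE classes; cast_to() mutates in place to mimic an analysis-time
cast.

```python
import copy


class Expr:
    def __init__(self, name, type_):
        self.name = name
        self.type = type_

    def cast_to(self, type_):
        # Mutates in place, like a cast applied during analysis.
        self.type = type_


def build_predicates(decode_expr, whens, clone):
    """Pair decode_expr with each 'when', casting the left side to its type."""
    preds = []
    for when in whens:
        # With clone=False every predicate shares one object, so a later
        # cast overwrites the type that an earlier predicate relied on.
        lhs = copy.deepcopy(decode_expr) if clone else decode_expr
        lhs.cast_to(when.type)
        preds.append((lhs, when))
    return preds
```

With clone=True each predicate keeps the type it was cast to.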
Testing:
- Added a regression test to exprs.test
Change-Id: I4de9ed7118c8d18ec3f02ff74c9cca211c716e51
Reviewed-on: http://gerrit.cloudera.org:8080/5631
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Impala Public Jenkins
For a table that has both a table comment and a partition specified,
"show create table" incorrectly outputs the comment before the partition.
This is not the correct order, and it results in invalid SQL.
This patch fixes the ordering (partition comes before comment) and
adds tests for this case.
Change-Id: I29a33cfd142b473997fdc3acfe3f0966bc7ed784
Reviewed-on: http://gerrit.cloudera.org:8080/5648
Tested-by: Impala Public Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
The bug was that expr rewrite rules such as ExtractCommonConjunctRule
analyzed their own output, which doesn't work for syntactic elements
that allow column aliases, such as the HAVING clause.
The fix was to remove the analysis step (the re-analysis happens anyway
in AnalysisCtx).
Change-Id: Ife74c61f549f620c42f74928f6474e8a5a7b7f00
Reviewed-on: http://gerrit.cloudera.org:8080/5662
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Impala Public Jenkins
Refactor BufferedBlockMgr/TmpFileMgr to push more I/O logic into
TmpFileMgr, in anticipation of it being shared with BufferPool.
TmpFileMgr now handles:
* Scratch space allocation and recycling
* Read and write I/O
The interface is also greatly changed so that it is built around Write()
and Read() calls, abstracting away the details of temporary file
allocation from clients. This means the TmpFileMgr::File class can
be hidden from clients.
Write error recovery:
Also implement write error recovery in TmpFileMgr.
If an error occurs while writing to scratch and we have multiple
scratch directories, we will try one of the other directories
before cancelling the query. File-level blacklisting is used to
prevent excessive repeated attempts to resize a scratch file during
a single query. Device-level blacklisting is not implemented because
it is problematic to permanently take a scratch directory out of use.
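
A rough sketch of the recovery policy described above, in Python. The
ScratchFile class, its 'fail' switch, and write_with_recovery() are
hypothetical and only model the control flow, not the TmpFileMgr
interface or its asynchronous I/O.

```python
class ScratchFile:
    def __init__(self, directory, fail=False):
        self.directory = directory
        self.fail = fail          # hypothetical switch simulating a bad device
        self.blacklisted = False
        self.data = []

    def write(self, block):
        if self.fail:
            raise IOError("write failed in " + self.directory)
        self.data.append(block)


def write_with_recovery(files, block):
    """Try each non-blacklisted file once; blacklist any file that fails."""
    last_error = None
    for f in files:
        if f.blacklisted:
            continue
        try:
            f.write(block)
            return f
        except IOError as e:
            # File-level blacklisting: do not retry this file for the
            # rest of the query, but keep the directory itself in use.
            f.blacklisted = True
            last_error = e
    # No usable file left; the caller would cancel the query here.
    raise RuntimeError("no usable scratch file") from last_error
```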
To reduce the number of error paths, all I/O errors are now handled
asynchronously. Previously errors creating or extending the file were
returned synchronously from WriteUnpinnedBlock(). This required
modifying DiskIoMgr to create the file if not present when opened.
Also set the default max_errors value in the thrift definition file,
so that it is in effect for backend tests.
Future Work:
* Support for recycling variable-length scratch file ranges. I omitted
this to avoid making the patch even larger.
Testing:
Updated BufferedBlockMgr unit test to reflect changes in behaviour:
* Scratch space is no longer permanently associated with a block, and
is remapped every time a new block is written to disk.
* Files are now blacklisted - updated existing tests and enabled the
disabled blacklisting test.
Added some basic testing of recycling of scratch file ranges in
the TmpFileMgr unit test.
I also manually tested the code in two ways. First by removing permissions
for /tmp/impala-scratch and ensuring that a spilling query fails cleanly.
Second, by creating a tiny ramdisk (16M) and running with two scratch
directories: one on /tmp and one on the tiny ramdisk. When spilling, an
out of space error is encountered for the tiny ramdisk and impala spills
the remaining data (72M) to /tmp.
Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17
Reviewed-on: http://gerrit.cloudera.org:8080/5141
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This commit makes ADD PARTITION operations treat string partition-key
values as case sensitive, consistent with other related partition DDL
operations.
Change-Id: I6fbe67d99df8a50a16a18456fde85d03d622c7a1
Reviewed-on: http://gerrit.cloudera.org:8080/5535
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This is the second patch to address IMPALA-4684. The first patch exposed
a transient Zookeeper connection error on RHEL7. This patch introduces a
retry (up to 3 times), and somewhat better logging.
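
A generic sketch of such a retry loop in Python. The helper name
call_with_retries and its parameters are hypothetical, and it assumes
"up to 3 times" means three attempts in total.

```python
import time


def call_with_retries(fn, attempts=3, delay_s=0.0, log=print):
    """Call fn(); on an exception, log it and retry, up to 'attempts' tries."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as e:
            log("attempt %d/%d failed: %s" % (attempt, attempts, e))
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay_s)
```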
Tested by running tests against an RHEL7 instance and confirming that
all HBase nodes start up.
Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Reviewed-on: http://gerrit.cloudera.org:8080/5554
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
If an exception (other than NoNodeError) was raised while checking for
HBase nodes, we weren't cleanly stopping the ZooKeeper client, which
in turn created a second exception when the connection was closed.
The second exception masked the original error condition.
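
The pattern can be sketched with try/finally in Python. FakeZkClient and
check_nodes() are hypothetical stand-ins for the real client and script
code; the point is only that the client is stopped on every path and the
original exception is the one that propagates.

```python
class FakeZkClient:
    """Hypothetical stand-in for a ZooKeeper client."""

    def __init__(self):
        self.started = False

    def start(self):
        self.started = True

    def stop(self):
        self.started = False

    def get_children(self, path):
        # Simulates an unexpected error (i.e. not a NoNodeError).
        raise ValueError("unexpected error reading " + path)


def check_nodes(client, path):
    client.start()
    try:
        return client.get_children(path)
    finally:
        # Runs on success and on any exception, so the client is always
        # stopped and the original error is not masked by a second one.
        client.stop()
```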
Tested by forcibly raising unexpected errors while checking for HBase
nodes.
Change-Id: I46a74d018f9169385a9f10a85718044c31a24dbc
Reviewed-on: http://gerrit.cloudera.org:8080/5547
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This change introduces the sortby() query plan hint for insert
statements. When specified, sortby(a, b) will add an additional sort
step to the plan to order data by columns a, b before inserting it into
the target table.
Change-Id: I37a3ffab99aaa5d5a4fd1ac674b3e8b394a3c4c0
Reviewed-on: http://gerrit.cloudera.org:8080/5051
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
The KuduScanNode attempts to push IN list predicates to the
Kudu scan, but NULL literals cannot be pushed. The code in
KuduScanNode needed to check whether any of the literals in the
InPredicate is a NullLiteral, in which case the entire IN
list should not be pushed to Kudu.
The same handling is already in place for binary predicate
pushdown.
Change-Id: Iaf2c10a326373ad80aef51a85cec64071daefa7b
Reviewed-on: http://gerrit.cloudera.org:8080/5505
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
Testing:
Tested that buildall.sh works as expected. Built locally with
IMPALA_MAKE_FLAGS unset to confirm I didn't break anything.
Built locally with
IMPALA_MAKE_FLAGS=--load-average=$IMPALA_BUILD_THREADS and looked
at "ps auxf" output to confirm it's passed through.
Change-Id: I17b13cbaf395f962762d5cff3d650ffb077934a4
Reviewed-on: http://gerrit.cloudera.org:8080/5480
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
IMPALA-4553 made ntp-wait succeed before kudu would start, assuming
ntp-wait was installed, in order to prevent a litany of errors on ec2
about unsynchronized clocks. This patch disables that waiting if no
internet connection is detected in order to make it possible to start
the minicluster when offline.
Change-Id: Ifbb5babebb0ca6d2553be1b001e20e2270e052b6
Reviewed-on: http://gerrit.cloudera.org:8080/5412
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
Implements the following conservative but correct policy for assigning
predicates from the On-clause of an inner join:
If the predicate references an outer-joined tuple, then evaluate it at
the inner join that the On-clause belongs to.
Cleans up Analyzer.canEvalPredicate().
Change-Id: Idf45323ed9102ffb45c9d94a130ea3692286f215
Reviewed-on: http://gerrit.cloudera.org:8080/4982
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
CodegenWriteSlot() receives negative length values for the lengths of
the slots passed to it if the slots contain escape characters. (This
is currently only for non-string types, as we do not codegen string
types with escaped characters). The DelimitedTextParser is responsible
for identifying escape characters and assigning the negative lengths
appropriately. CodegenWriteCompleteTuple() passes this length to
CodegenWriteSlot() as it is. This differs from the behavior of
WriteSlot() where the length passed to it is always positive, as all
the callers of WriteSlot() make sure of that (including
WriteCompleteTuple()).
The IrIsNullString() and IrGenericIsNullString() functions are
responsible for checking if the given data contains a NULL pattern.
They are called by CodegenWriteSlot(). A NULL pattern usually contains
an escaped character which means that the length of that slot will be
a negative length. However, the IrIsNullString() and
IrGenericIsNullString() that take the same length argument from
CodegenWriteSlot() always expect a positive length argument. So, no
slots were ever marked as NULL by these NULL-checking functions when
codegen was enabled.
NULL slots were still detected accidentally because of some incorrect
code in CodegenWriteSlot() that marked invalid slots and NULL slots as
NULL. Therefore, due to this code, even invalid slots were not marked
as invalid and did not return an error. Instead they were just
silently marked as NULL.
This patch makes sure that only positive lengths are passed to
CodegenWriteSlot() so that NULL checking is correct and it also
makes sure that invalid slots are not silently marked as NULL.
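
A simplified Python model of the length handling. The function names are
hypothetical, and the NULL marker b"\\N" (backslash followed by N) is an
assumption made for the sketch.

```python
NULL_PATTERN = b"\\N"  # assumed NULL marker: backslash followed by 'N'


def is_null_string(data, length):
    """Like the NULL-checking helpers: only meaningful for length >= 0."""
    return length >= 0 and data[:length] == NULL_PATTERN


def write_slot(data, length):
    """Return 'NULL' for a NULL field, otherwise the raw field bytes."""
    if is_null_string(data, length):
        return "NULL"
    return data[:length]


def write_complete_tuple(fields):
    """fields: list of (data, length); a negative length flags an escape."""
    # The caller strips the sign, so write_slot() only sees positive lengths.
    return [write_slot(data, abs(length)) for data, length in fields]
```

Passing the negative length straight through makes is_null_string()
return False for every escaped field, which is the behaviour described
above.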
Testing: Re-enabled an older hdfs-scan-node-errors test. Formatted
it to fit new error message format after IMPALA-3859 and IMPALA-3895.
Change-Id: I858e427ad7c2b2da8c2bb657be06b7443655781f
Reviewed-on: http://gerrit.cloudera.org:8080/5377
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
This patch ensures that setting the query option
enable_expr_rewrites=false will disable both constant folding in the
frontend (which it did already) and constant caching in the backend
(which is enabled in this patch). This gives a way for users to revert
to the old behaviour of non-deterministic UDFs before these
optimisations were added in Impala 2.8.
Before this patch, the backend would cache values based on IsConstant().
This meant that there was no way to override caching of values of
non-deterministic UDFs, e.g. with enable_expr_rewrites.
After this patch, we only cache literal values in the backend. This
offers the same performance as before in the common case where the
frontend will constant fold the expressions anyway.
Also rename some functions to more cleanly separate the backend concepts
of "constant" expressions and expressions that can be evaluated without
a TupleRow. In a future change (IMPALA-4617) we should remove the
IsConstant() analysis logic from the backend entirely and pass the
information from the frontend. We should also fix isConstant() in the
frontend so that it only returns true when it is safe to constant-fold
the expression (IMPALA-4606). Once that is done, we could revert back
to using IsConstant() instead of IsLiteral().
Testing:
Added targeted test to test constant folding of UDFs: we expect
different results depending on whether constant folding is enabled.
Also run TestUdfs with expr rewrites enabled and disabled, since this
can exercise different code paths. Refactored test_udfs somewhat to
avoid running uninteresting combinations of query options for
targeted tests and removed some 'drop * if not exists' statements
that aren't necessary when using unique_database.
This change revealed flakiness in test_mem_limit, which seems
to have only worked by coincidence. Updated TrackAllocation() to
actually set the query status when a memory limit is exceeded.
Looped this test for a while to make sure it isn't flaky any
more.
Also fix other test bugs where the vector argument is modified
in-place, which can leak out to other tests.
Change-Id: I0c76e3c8a8d92749256c312080ecd7aac5d99ce7
Reviewed-on: http://gerrit.cloudera.org:8080/5391
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This commit fixes an issue where an error is thrown if the default value
for a Kudu column is set to NULL.
Change-Id: Ida27ce56f1dd7603485a69c680db3bcea6702aff
Reviewed-on: http://gerrit.cloudera.org:8080/5405
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
The main issue was that the eval cost was not set for
timestamp literals, so a preconditions check was hit
when trying to order a list of conjuncts by cost.
Another subtle issue made the bug only reproducible by
a specific query against a Kudu table in our tests,
although the bug is not Kudu specific: The eval cost
of Exprs was not recomputed in analyze(), even after
resetting an Expr, e.g., during a substitution. As a
result, the bug was only reproducible for a list
of conjuncts that contained an inferred predicate
with a timestamp literal.
This patch does not contain a fix for that issue due
to its complexity/risk. It is tracked in IMPALA-4620.
Testing: Ran planner tests locally. Ran query_test.py
locally. A private core/hdfs run passed.
Change-Id: Ife30420bafbd1c64a5e3385e5755909110b4b354
Reviewed-on: http://gerrit.cloudera.org:8080/5404
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
In a recent change (IMPALA-4363) we introduced a change where all file
paths in .test files should be replaced with '__HDFS_FILENAME__'. This
caused problems for tests on non-HDFS file systems and we also lost some
test coverage. This patch fixes the problem by allowing the $DATABASE
template in the catch section of the .test file.
Change-Id: If0f6ae8dea7ac4cdaf0c61ebd8f0c589c353a96e
Reviewed-on: http://gerrit.cloudera.org:8080/5372
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
For LLVM IR UDF, Impalad will link an external LLVM module
in which the IR UDF is defined with the main module. If it
happens that a symbol is defined in both modules, LLVM may
choose to discard the one defined in the external module.
The discarded function and its callee will not be present
in the linked module.
In IMPALA-4595, udf-sample.cc was compiled without any
optimization. Duplicated definitions such as StringVal::null()
may have different inlining levels between the external module
and the main module. When the duplicated definition in
the external module is discarded, some of its callee
functions (which are not inlined) may not be defined in the
main module so they can no longer be located in the linked
module. This trips up some code in LlvmCodegen::LinkModule().
In particular, when parsing for functions in external module
which are materialized during linking, certain functions may
not be present due to the reason above. Impalad will hit
a DCHECK in debug build or crash due to null pointer access
in release build.
This change fixes the problem above by taking into account
that certain functions may not be defined anymore after linking.
This change also fixes two incorrect status propagations in
fe-support.cc.
Change-Id: Iaa056a0c888bfcc95b412e1bc1063bb607b58ab7
Reviewed-on: http://gerrit.cloudera.org:8080/5384
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
Bug: Commit 6f31c7 fixed a crash when setting Avro schemas for
tables with storage altered to Avro file format. However the
fix was incomplete for partitioned/multi file format tables since
'hasAvroData_' is not set for all code paths that load the
partitioned tables (For example: HdfsTable#loadAllPartitions()).
Fix: Moved the code for setting 'hasAvroData_' to addPartition()
which is the common logic for all code paths adding new partitions.
Also fixed the test coverage gap by adding a new test for partitioned
tables altered to Avro format.
Change-Id: I7854ff002b2277ec4a5388216218a1d5ad142de8
Reviewed-on: http://gerrit.cloudera.org:8080/5388
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This commit also removes the `DISTRIBUTE`, `SPLIT`, and
`BUCKETS` keywords, which were going to be newly released in Impala 2.6
but are now unused. Additionally, the few remaining uses of the
`DISTRIBUTE BY` syntax have been switched to `PARTITION BY`.
Change-Id: I32fdd5ef26c532f7a30220db52bdfbf228165922
Reviewed-on: http://gerrit.cloudera.org:8080/5382
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
Impala cannot correctly evaluate or assign some non-deterministic
predicates. This patch improves the error message shown when
trying to evaluate such unsupported predicates for the purpose
of partition pruning.
Change-Id: I94765f62bde94f4faa7fc5c26d928099ca1496d1
Reviewed-on: http://gerrit.cloudera.org:8080/5386
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Bumps the toolchain version to get a newer Kudu build.
Also fixes test failures resulting from changes in Kudu.
Notably error strings have changed (IMPALA-4590) and the
number of replicas must be odd (IMPALA-4589).
Note: The toolchain binaries starting with this build are
now using the toolchain binutils rather than the system
binutils.
Testing: private exhaustive build.
Change-Id: If1912f058c240fbe82b06f77e31add7755289be1
Reviewed-on: http://gerrit.cloudera.org:8080/5369
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
A recent change (IMPALA-1788) led UUID() to be constant folded
and therefore to produce the same value for every invocation across
rows. Similar issues might also occur due to the BE optimizing
UUID() during codegen of scalar-fn-call.h/cc.
The fix is to not treat UUID() like a constant expr in both
the FE and BE.
Discussion:
The fix in this patch is rather blunt, but minimally invasive to
reduce the risk of adding new bugs. Ideally, the constness of an
Expr should be determined in one place and the FE and BE should agree
on which Exprs are constant. I considered the following alternatives
but concluded they were too risky:
1. Pass a flag from FE to BE for every Expr indicating its constness.
This simple solution would populate a thrift field with the
result of Expr.isConstant() for every Expr in an Expr tree.
There are several issues. Calling isConstant() for every Expr
in an Expr tree is rather expensive due to repeated traversals
of the tree. That could be mitigated by populating an isConstant
flag during Expr.analyze() to avoid re-computing the constness
repeatedly. This requires changes to analyze(), clone(), reset(),
and possibly other places for many Exprs. There is potential
for missing a place and adding a new bug.
2. The above solution could be limited to only FunctionCallExpr.
However, the BE expr type FUNCTION_CALL which maps to
scalar-fn-call.h/cc is created from various FE Exprs, not just
FunctionCallExpr. So adding a flag only to scalar-fn-call.h/cc
would be confusing because it would only sometimes be set
in a meaningful way. This seems more confusing than the current
straightforward solution.
Testing: Added FE and EE tests.
Change-Id: If2499f5f6ecdcb098623202c8e6dc2d02727194a
Reviewed-on: http://gerrit.cloudera.org:8080/5324
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Background: We generally allow the assignment of predicates below the
nullable side of a left/right outer join, explained as follows using an
example:
SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id
WHERE t2.int_col < 10
The scan of 't2' picks up 't2.int_col < 10' via
Analyzer.getBoundPredicates() and recognizes that the predicate must
also be evaluated by a join later, so the predicate is not marked as
assigned. The join then picks up the unassigned predicate via
Analyzer.getUnassignedConjuncts().
The bug was that our logic for detecting whether a bound predicate must
also be evaluated at a join node was flawed because it only considered
whether the tuples of the source or destination predicate were outer
joined (plus other conditions).
The underlying assumption is that either the source or destination tuple
is bound by a tuple produced by a TableRef, but in the buggy query the
source predicate is bound by an aggregation tuple, so we incorrectly
marked the bound predicate as assigned in Analyzer.getBoundPredicates().
The fix is to conservatively not mark bound predicates as assigned if
the slots referenced by the predicate have equivalent slots that
belong to an outer-joined tuple. As a result, a plan node may pick up
the same predicate multiple times, once via
Analyzer.getBoundPredicates() and another time via
Analyzer.getUnassignedConjuncts(). Those are deduped now.
The following example explains the duplicate predicate assignment:
SELECT * FROM (SELECT * FROM t t1) a LEFT OUTER JOIN t b ON a.id = b.id
WHERE a.id < 10
1. The predicate 'a.id < 10' gets migrated into the inline view.
'a.id < 10' is marked as assigned but is still registered as
a single-tid conjunct in the Analyzer for potential propagation
2. The scan node of 't1' calls Analyzer.getBoundPredicates() and
generates 't1.id < 10' based on the source predicate 'a.id < 10'.
3. The scan node of 't1' picks up the migrated conjunct 't1.id < 10'
via Analyzer.getUnassignedConjuncts().
Change-Id: I774d13a13ad1e8fe82512df98dc29983bdd232eb
Reviewed-on: http://gerrit.cloudera.org:8080/4960
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The bug was a simple oversight. In KuduScanNode.init()
we forgot to call Analyzer.getBoundPredicates().
Change-Id: I19a38d6ea8cc0d2b0ddc3808d1f9ffef5ce306a8
Reviewed-on: http://gerrit.cloudera.org:8080/5365
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This commit changes the behavior of alter table operations on Kudu
tables from asynchronous to synchronous. With this change, alter table
operations return when either the operations complete successfully or
a timeout is reached.
Change-Id: I385bce66691ae9040e72f97557e1bba31009e36b
Reviewed-on: http://gerrit.cloudera.org:8080/5364
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
Fixes the KuduScanNode to convert InPredicates to
KuduPredicates and push them to the Kudu scan if possible.
An InPredicate can be pushed to the scan if the expression is of
the exact form:
<SlotRef> IN (<LiteralExpr>, <LiteralExpr>, ...)
That means the InPredicate has the following properties:
1) It has a list of literal values (i.e. not a subquery);
All values are LiteralExprs (not SlotRefs).
2) Not negative, i.e. only 'IN' supported, not 'NOT IN'
3) The SlotRef is not wrapped in any casts
4) The types of all values match the type of the SlotRef
exactly.
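
The four conditions can be sketched as a single eligibility check.
This is Python rather than the FE's Java, and the classes below are
hypothetical models, not the planner's Expr hierarchy. The NULL-literal
check reflects the follow-up fix described elsewhere in this log, and
the subquery form of IN is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Literal:
    type: str
    value: Optional[object]  # None models a NULL literal


@dataclass
class SlotRef:
    type: str
    name: str


@dataclass
class Cast:
    child: object
    type: str


@dataclass
class InPredicate:
    lhs: object
    values: List[object]
    negated: bool = False


def can_push_in_predicate(p):
    """True if 'p' has the exact form <SlotRef> IN (<Literal>, ...)."""
    if p.negated:                        # 2) NOT IN is not supported
        return False
    if not isinstance(p.lhs, SlotRef):   # 3) SlotRef must not be wrapped in a cast
        return False
    for v in p.values:
        if not isinstance(v, Literal):   # 1) every value must be a literal
            return False
        if v.value is None:              # a NULL literal blocks the whole list
            return False
        if v.type != p.lhs.type:         # 4) types must match exactly
            return False
    return True
```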
A planner test was added exercising all supported types as
well as exprs where the values would not be supported.
TODO: perf testing
TODO: consider a limit on the number of list values before
keeping the predicate on the Impala scan node
(determine from testing)
Change-Id: I8988d4819d20d467b48e286917e347ca00f60cf0
Reviewed-on: http://gerrit.cloudera.org:8080/5316
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
This commit also fixes an issue where an error is thrown if a default
value is set for a boolean column on a Kudu table.
Change-Id: I25b66275d29d1cf21df14e78ab58f625a83b0725
Reviewed-on: http://gerrit.cloudera.org:8080/5337
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
The bug was that the functions did not check whether the conversion
pushed the value out of range. The fix is to use boost's validation
immediately to check the validity of the timestamp and catch any
exceptions thrown.
It would be preferable to avoid the exceptions, but Boost does not
provide a straightforward way to disable the exceptions or extract
potentially-invalid values from a date object.
Testing:
Added expression tests that exercise out-of-range cases. Also
added additional tests to confirm that date addition and subtraction
weren't affected by similar bugs.
Change-Id: Idc427b06ac33ec874a05cb98d01c00e970d3dde6
Reviewed-on: http://gerrit.cloudera.org:8080/5251
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Impala used to incorrectly assign On-clause equality predicates from an
outer join if those predicates referenced multiple tables, but only one
side of the outer join.
The fix is to add an additional check in Analyzer.getEqJoinConjuncts()
to prevent that incorrect assignment.
Change-Id: I719e0eeacccad070b1f9509d80aaf761b572add0
Reviewed-on: http://gerrit.cloudera.org:8080/4986
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The bug: We used to reset() the qualifier of union operands
to their original value obtained during parsing. This leads to
problems when union operands are unnested and we need to rewrite
Subqueries. In particular, the first union operand of a nested union
was reset() to a null qualifier, but that operand could be somewhere
in the middle of the list of unnested operands in the parent. At that
point, we've lost information about the qualifier of the unnested
operand.
The fix: The simplest solution is to not reset() the qualifier.
The other alternative is to reset() the qualifier, but also
undo any unnesting. That seems unnecessary and wasteful.
Change-Id: I157bb0f08c4a94fd779487d7c23edd64a537a1f6
Reviewed-on: http://gerrit.cloudera.org:8080/4963
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This commit fixes an issue where a SHOW CREATE VIEW statement throws an
analysis error if the view contains a subquery.
Change-Id: I4a89e46a022f0ccec198b6e3e2b30230103831ce
Reviewed-on: http://gerrit.cloudera.org:8080/5333
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
Before this patch, we would simply read the INT96 Parquet timestamp
representation and assume that it's valid. However, not all bit
permutations represent a valid timestamp. One of the boost functions
raised an exception (that we didn't catch) when passed an invalid
boost date object, which resulted in a crash. This patch fixes the
problem by validating that the date falls into the 1400..9999 year
range as we are scanning Parquet.
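
A standalone Python sketch of this kind of range check, not the scanner
code itself. It assumes the usual INT96 layout of 8 little-endian bytes
of nanoseconds-of-day followed by a 4-byte Julian day number; the
function and constant names are hypothetical, and the nanoseconds field
is not validated here.

```python
import struct
from datetime import date

# Julian day number = proleptic Gregorian ordinal + this offset
# (1970-01-01 is Julian day 2440588).
JULIAN_OFFSET = 2440588 - date(1970, 1, 1).toordinal()
MIN_JULIAN_DAY = date(1400, 1, 1).toordinal() + JULIAN_OFFSET
MAX_JULIAN_DAY = date(9999, 12, 31).toordinal() + JULIAN_OFFSET


def decode_int96(raw):
    """Return (date, nanos_of_day), or None if the date is out of range."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    # Validate before building a date object, so that no exception is
    # raised for arbitrary bit patterns.
    if not MIN_JULIAN_DAY <= julian_day <= MAX_JULIAN_DAY:
        return None
    return date.fromordinal(julian_day - JULIAN_OFFSET), nanos_of_day
```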
Change-Id: Ieaab5d33e6f0df831d0e67e1d318e5416ffb90ac
Reviewed-on: http://gerrit.cloudera.org:8080/5343
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
This commit reverts the behavior introduced by IMPALA-3719 which used
the Kudu default behavior for column nullability if none was specified
in the CREATE TABLE statement. With this commit, non-key columns of Kudu
tables that are created from Impala are by default nullable unless
specified otherwise.
Change-Id: I950d9a9c64e3851e11a641573617790b340ece94
Reviewed-on: http://gerrit.cloudera.org:8080/5259
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Fix a test bug where we need to skip nested types tests for the old aggs
and joins.
Fix a product bug where *eos is not initialised by the MT scan node.
This causes incorrect results when the calling ExecNode does not
initialise the eos variable, e.g. the sort node and the old agg and join
nodes.
Testing:
Added a test that reproduces the incorrect results with the sort node
when run under ASAN
Tested the mt_dop tests locally with old aggs and joins to ensure they
pass.
Change-Id: I48c50c8aa0c23710eb099fba252bc3c0cb74b313
Reviewed-on: http://gerrit.cloudera.org:8080/5302
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
The bug was that HdfsScanNodeMt::Close() did not properly
clean up all in-flight resources when called through the
query cancellation path.
The main change is to clean up all resources when passing
a NULL batch into HdfsParquetScanner::Close() which also
needed similar changes in the scanner context.
Testing: Ran test_cancellation.py, test_scanners.py and
test_nested_types.py with MT_DOP=3. Added a test query
with a limit that was failing before.
A regular private hdfs/core test run succeeded.
Change-Id: Ib32f87b3289ed9e8fc2db0885675845e11207438
Reviewed-on: http://gerrit.cloudera.org:8080/5274
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
During slot substitution, the type of the child of a CastExpr can
change. If the previous child type matched the CastExpr, then the cast
was flagged as noOp_. During substitution and subsequent re-analysis
the noOp_ flag was not revisited so that no cast was performed, even
after it had become necessary.
The fix is to always set noOp_ to the correct value in
CastExpr.analyze().
Change-Id: I7f29cdc359558fad6df455b8eec0e0eaed00e996
Reviewed-on: http://gerrit.cloudera.org:8080/5267
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
With this commit, we add support for additional ALTER TABLE statements
against Kudu tables. The new supported ALTER TABLE operations for Kudu are:
- ADD/DROP range partitions. Syntax:
ALTER TABLE <tbl_name> ADD [IF NOT EXISTS] RANGE <kudu_partition_spec>
ALTER TABLE <tbl_name> DROP [IF EXISTS] RANGE <kudu_partition_spec>
- ADD/DROP/RENAME column. Syntax:
ALTER TABLE <tbl_name> ADD COLUMNS (col_spec, [col_spec, ...])
ALTER TABLE <tbl_name> DROP COLUMN <col_name>
ALTER TABLE <tbl_name> CHANGE COLUMN <old> <new_name> <type>
- Rename Kudu table using the 'kudu.table_name' table property. Example:
ALTER TABLE <tbl_name> SET TBLPROPERTIES ('kudu.table_name'='<new_name>'),
will change the underlying Kudu table name to <new_name>.
- Renaming the HMS/Catalog table entry of a Kudu table is supported using the
existing ALTER TABLE <tbl_name> RENAME TO <new_tbl_name> syntax.
Not supported:
- ALTER TABLE <tbl_name> REPLACE COLUMNS
Change-Id: I04bc87e04e05da5cc03edec79d13cedfd2012896
Reviewed-on: http://gerrit.cloudera.org:8080/5136
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
When ntpd is not synchronized, kudu initialization fails on the master
node:
F1129 16:37:28.969956 15230 master_main.cc:68] Check failed:
_s.ok() Bad status: Service unavailable: Cannot initialize clock:
Error reading clock. Clock considered unsynchronized
Change-Id: I371e01e21246a8c0ece98ca7d4bf6761615127b4
Reviewed-on: http://gerrit.cloudera.org:8080/5258
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
The error path in OptimizeLlvmModule() has not worked correctly for a
long time because various places in the code assume that codegen'd
function pointers will be filled in (e.g. ScalarFnCall). Since the
recent change "IMPALA-4397,IMPALA-3259: reduce codegen time and memory"
it is more likely to go down this path.
The cases when errors occur on this path: memory limit exceeded, internal
codegen bugs, and corrupt IR UDFs, are all cases when it is not correct
or safe to continue executing the query, so we should just fail the
query.
Testing:
Add a test where codegen reliably fails with memory limit exceeded.
Change-Id: Ib38d0a44b54c47617cad1b971244f477d344d505
Reviewed-on: http://gerrit.cloudera.org:8080/5211
Reviewed-by: Michael Ho <kwho@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
With this commit we add support for auditing all Kudu-specific
operations and we enable column lineage for INSERT and UPSERT
statements on Kudu tables. No lineage output is generated for DELETE and
UPDATE statements.
Change-Id: Idc4ca1cd63bcfa4370c240a5c4a4126ed6704f4d
Reviewed-on: http://gerrit.cloudera.org:8080/5151
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
Previously, CopyStringVal() mistakenly copied a null
StringVal as an empty string (i.e. a non-null string
with zero length). This change fixes the problem by
distinguishing between these two cases in CopyStringVal()
and handles them properly. Also added a test case for it.
This problem only started showing up recently due to
commit 51268c053f which
calls CopyStringVal() in OffsetFnInit(). All other
pre-existing callers of CopyStringVal() before that
commit checked if 'src' is null before calling it so
the problem never showed up. In that sense, this is
a latent bug exposed by the aforementioned commit.
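
The distinction can be modelled in a few lines of Python, with None
standing in for a null StringVal and b"" for a non-null, zero-length
one. Both function names are hypothetical.

```python
def copy_string_val_buggy(src):
    """Old behaviour: a null source comes back as an empty string."""
    return bytes(src or b"")


def copy_string_val(src):
    """Fixed behaviour: null stays null, empty stays empty."""
    if src is None:
        return None
    return bytes(src)
```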
Change-Id: I3a5b9349dd08556eba5cfedc8c0063cc59f5be03
Reviewed-on: http://gerrit.cloudera.org:8080/5198
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
Adds a new ExprRewriteRule for replacing constant expressions
with their literal equivalent via BE evaluation. Applies the
new rule together with the existing ones on the parse tree,
after analysis.
Limitations
- Constant folding is applied on the unresolved expressions.
As a result, it only works for expressions that are constant
within a single query block, as opposed to expressions that
may become constant after fully substituting inline-view exprs.
- Exprs are not normalized, so some opportunities for constant
folding are missed for certain expr-tree shapes.
This patch includes the following interesting changes:
- Introduces a timestamp literal that can only be produced
by constant folding (not expressible directly via SQL).
- To make sure that rewrites have no user-visible effect,
the original result types and column labels of the top-level
statement are restored after the rewrites are performed.
- Does not fold exprs if their evaluation resulted in a
warning or error, or if the resulting value is not
representable by the corresponding FE LiteralExpr.
- Fixes an existing issue with converting strings between
the FE/BE. Strings produced in the BE that have characters
with a value > 127 are not correctly deserialized into a
Java String via thrift. We detect this case during constant
folding and abandon folding of such exprs.
- Fixes several issues with detecting/reporting errors in
NativeEvalConstExprs().
- Cleans up ExprContext::GetValue() into
ExprContext::GetConstantValue() which clarifies its only use
of evaluating exprs from the FE.
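
A toy Python model of the folding rule, including the cases where
folding is abandoned. Lit, Col, BinOp, and fold() are hypothetical
names, and evaluation happens in Python here rather than in the BE.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: object
    right: object


OPS = {"+": operator.add, "*": operator.mul,
       "/": operator.truediv, "||": operator.add}


def fold(e):
    """Replace constant subtrees with literals, bottom-up."""
    if not isinstance(e, BinOp):
        return e
    left, right = fold(e.left), fold(e.right)
    e = BinOp(e.op, left, right)
    if not (isinstance(left, Lit) and isinstance(right, Lit)):
        return e  # not constant: keep the (partially folded) expr
    try:
        value = OPS[e.op](left.value, right.value)
    except Exception:
        return e  # evaluation failed: do not fold
    if isinstance(value, str) and any(ord(c) > 127 for c in value):
        return e  # mirrors the "characters with a value > 127" case
    return Lit(value)
```

Since exprs are not normalized, fold() leaves (x + 1) + 2 unchanged even
though 1 + 2 could be folded after reassociation.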
Testing:
- Modifies expr-test.cc to run all tests through the constant
folding path.
- Adds basic planner and rewrite rule tests.
- Exhaustive test run passed
Change-Id: If672b703db1ba0bfc26e5b9130161798b40a69e9
Reviewed-on: http://gerrit.cloudera.org:8080/5109
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins