This patch adds a query rewriter to remove redundant explicit casts to
a string type (string, char, varchar) from binary predicates of the form
"cast(<non-const expr> to <string type>) <eq/ne op> <string constant>".
The cast is redundant if the predicate evaluation is the same even if
the cast is removed and the constant is converted to the original type
of the expression. For example:
cast(int_col as string) = '123456' -> int_col = 123456
Performance:
For the following query on a table having 6001215 records -
select * from tpch.lineitem where cast(l_linenumber as string) = '0'
+-----------------+-----------+--------+
| | Scan Time |
+-----------------+-----------+--------+
| | Avg | St dev |
| Without rewrite | 1s406ms | 44ms |
| With rewrite | 1s099ms | 28ms |
+-----------------+-----------+--------+
Testing:
- Added unit tests to ExprRewriteRulesTest
- Added functional test to expr.test
- Current FE planner tests and BE expr-test run successfully with this
change.
Change-Id: I91b7c6452d0693115f9b9ed9ba09f3ffe0f36b2b
Reviewed-on: http://gerrit.cloudera.org:8080/8660
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
Previously, CodegenAnyVal used an LLVM function for floating point
comparison that considered 'nan' = 'nan' to be true. This is
inconsistent with the way we handle 'nan' in the non-codegen path,
where we consider 'nan' = 'nan' to be false, leading to inconsisent
results.
This patch fixes CodegenAnyVal to use an LLVM function for floating
point comparison that considers 'nan' = 'nan' to be false.
Testing:
- Added e2e tests for the two scenarios affected by this: CASE and
joins.
Change-Id: I1bb8e5074b3c939927dedc46bc9db63ca24486a1
Reviewed-on: http://gerrit.cloudera.org:8080/8790
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
The FROM_UNIXTIME(epoch) and FROM_UNIXTIME(epoch, format) produce
different results when epoch is out of range of TimestampValue.
The former produces an empty string, while the latter gives NULL.
The fix is to harmonize the results to NULL.
Testing:
Add unit tests to ExprTest.TimestampFunctions.
Change-Id: Ie3a5e9a9cb39d32993fa2c7f725be44d8b9ce9f2
Reviewed-on: http://gerrit.cloudera.org:8080/8629
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
TimestampValue::FromSubsecondUnixTime() and UtcFromUnixTimeMicros()
are incorrect only in case of the last second of 1399, because these
sub-second values are rounded first towards 1400-01-01 00:00:00, which
is accepted as a valid date, and the sub-second part is subtracted
afterwards, leading to a date outside the valid interval. The maximum
case, 9999-12-31 59:59:59 is a bit different, because as I understand,
with nanosecond precision posix times, the maximum value is actually
10000-01-01. 00:00:00 minus 1 nanosec.
TimestampValue::FromUnixTimeNanos() can create problematic TimestampValues
both <1400 and 10000<=.
These timestamps can cause problems, because most code assumes that if
HasDate/HasTime is true, then it really is a valid timestamp.
To fix this, the posix times are checked in the constructor of
TimestampValue, and if it is outside the valid interval,both time_
and date_ are set to not_a_date_time.
Test:
select cast(-17987443200-0.1 as timestamp);
This query no longer crashes, but returns NULL, similarly to other
< 1400 timestamps.
Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff
Reviewed-on: http://gerrit.cloudera.org:8080/7954
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
No tests were added/dropped or modified. They are consolidated into
fewer .test files.
Change-Id: Idda4b34b5e6e9b5012b177a4c00077aa7fec394c
Reviewed-on: http://gerrit.cloudera.org:8080/8153
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Adds a new expression to represent the following boolean predicate:
<expr> IS [NOT] (TRUE | FALSE | UNKNOWN)
The expression is expanded in the parser to istrue/false for the checks
against true and false respectively and to isnull for the check against
unknown. Compared to the other approaches (rewrites, extended backend expr),
this change is the simplest. Main downside is that error messages are
in terms of the lowered expression.
Testing:
- fe: parser, tosql, analyze exprs
- e2e: query exprs
Change-Id: I9d5fba65ef6c87dfc55a25d2c45246f74eb48c40
Reviewed-on: http://gerrit.cloudera.org:8080/8122
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
The internal representation of the most negative number
in two's complement requires 1 more bit to represent the
positive version. This means ABS() must promote integer
types to the next highest width.
Change-Id: I86cc880e78258d5f90471bd8af4caeb4305eed77
Reviewed-on: http://gerrit.cloudera.org:8080/8004
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
A recent change, IMPALA-5016, added an expr rewrite rule to simplfy
coalesce(). This rule eliminates the coalesce() when its first
parameter (that isn't constant null) is a SlotRef pointing to a
SlotDescriptor that is non-nullable (for example because it is from
a non-nullable Kudu column or because it is from an HDFS partition
column with no null partitions), under the assumption that the SlotRef
could never have a null value.
This assumption is violated when the SlotRef is the output of an
outer join, leading to incorrect results being returned. The problem
is that the nullability of a SlotDescriptor (which determines whether
there is a null indicator bit in the tuple for that slot) is a
slightly different property than the nullability of a SlotRef pointing
to that SlotDescriptor (since the SlotRef can still be NULL if the
entire tuple is NULL).
This patch removes the portion of the rewrite rule that considers
the nullability of the SlotDescriptor. This means that we're missing
out on some optimizations opportunities and we should revisit this in
a way that works with outer joins (IMPALA-5753)
Testing:
- Updated FE tests.
- Added regression tests to exprs.test
Change-Id: I1ca6df949f9d416ab207016236dbcb5886295337
Reviewed-on: http://gerrit.cloudera.org:8080/7567
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
This patch addresses 3 issues:
- SelectList.reset() didn't properly reset some of its members, though
they're documented as needing to be reset. This was causing a crash
when the Planner attempted to make an aggregation node for an agg
function that had been eliminated by expr rewriting. While I'm here,
I added resetting of all of SelectList's members that need to be
reset, and fixed the documentation of one member that shouldn't be
reset.
- SimplifyConditionalsRule was changing the meaning of queries that
contain agg functions, e.g. because "select if(true, 0, sum(id))"
is not equivalent to "select 0". The fix is to not return the
simplfied expr if it removes all aggregates.
- ExprRewriteRulesTest was performing rewrites on the result exprs of
the SelectStmt, which causes problems if the result exprs have been
substituted. In normal query execution, we don't rewrite the result
exprs anyway, so the fix is to match normal query execution and
rewrite the select list exprs.
Testing:
- Added e2e test to exprs.test.
- Added unit test to ExprRewriteRulesTest.
Change-Id: Ic20b1621753980b47a612e0885804363b733f6da
Reviewed-on: http://gerrit.cloudera.org:8080/6653
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
This turned out to be slightly non-trivial as REPLACE is already a
keyword, and thus the parser needs to be tweaked to allow this,
since function names act as bare identifiers.
It was difficult to get this to match performance of regexp_replace.
For expanding patterns, the fact that regexp_replace copies the
expansion inline means that it may in fact win on large strings
with sparse matches that are > dcache size apart. Let's leave
optimizing that for later.
Testing: Added a full test for maximum size strings and got most
of the boundary conditions I could identify. Manually ran queries
on TPC-H dataset in impala to verify both performance and correctness.
Added large string and exprs.test test clauses and ran the tests to
verify they work as expected.
Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Reviewed-on: http://gerrit.cloudera.org:8080/5776
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
In SelectList.reset(), we call reset() on each select list item's
expr. reset() is supposed to remove implicit casts, by returning
the reset expr with implicit cast exprs removed from the tree.
Previously SelectList.reset() ignored the return value of the calls
to Expr.reset(), meaning that if the top-most expr of the select list
item is an implicit cast, it won't actually get removed, which causes
problems with analysis since implicit casts are always treated as
pre-analyzed.
The solution is to set the select list item's exprs to the return
value of reset().
Testing:
- Added a regression test to exprs.test
Change-Id: I16ff88716b185e1d72d2bc603a42bd06c60ec18e
Reviewed-on: http://gerrit.cloudera.org:8080/5917
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
When there are conditionals with constant values of TRUE or
FALSE we can simplify them during analysis using the ExprRewriter.
This patch introduces the SimplifyConditionalsRule with covers IF,
OR, AND, CASE, and DECODE.
It also introduces NormalizeExprsRule which normalizes AND and OR
such that if either child is a BoolLiteral, then the left child is a
BoolLiteral.
Testing:
- Added unit tests to ExprRewriteRulesTest.
- Added functional tests to expr.test
- Ran FE planner tests and BE expr-test.
Change-Id: Id70aaf9fd99f64bd98175b7e2dbba28f350e7d3b
Reviewed-on: http://gerrit.cloudera.org:8080/5585
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
Previously Impala was inconsistent about whether the year 10000 was
supported, as a result of inconsistency in boost, which reported the
maximum year as 9999 but sometimes allowed 10000. This meant that
Impala sometimes accepted the year 10000 and sometimes not.
Use the patched boost version and update tests accordingly.
Testing:
Ran an exhaustive build.
Change-Id: Iaf23b40833017789d879e5da7bb10384129e2d10
Reviewed-on: http://gerrit.cloudera.org:8080/5665
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
The DECODE constructor in CaseExpr uses the same decodeExpr object when
building the BinaryPredicates that compare the decodeExpr to each 'when'
of the DECODE. This causes problems when different BinaryPredicates try
to cast the same decodeExpr object to different types during analysis,
in this case leading to a Precondition check failure.
The solution is to clone the decodeExpr in the DECODE constructor in
CaseExpr for each generated BinaryPredicate.
Testing:
- Added a regression test to exprs.test
Change-Id: I4de9ed7118c8d18ec3f02ff74c9cca211c716e51
Reviewed-on: http://gerrit.cloudera.org:8080/5631
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Impala Public Jenkins
The bug was that expr rewrite rules such as ExtractCommonConjunctRule
analyzed their own output, which doesn't work for syntactic elements
that allow column aliases, such as the HAVING clause.
The fix was to remove the analysis step (the re-analysis happens anyway
in AnalysisCtx).
Change-Id: Ife74c61f549f620c42f74928f6474e8a5a7b7f00
Reviewed-on: http://gerrit.cloudera.org:8080/5662
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Impala Public Jenkins
A recent change (IMPALA-1788) lead UUID() to be constant folded,
and therefore, produce the same value for every invocation across
rows. Similar issues might also occur due to the BE optimizing
UUID() during codegen of scalar-fn-call.h/cc.
The fix is to not treat UUID() like a constant expr in both
the FE and BE.
Discussion:
The fix in this patch is rather blunt, but minimally invasive to
reduce the risk of adding new bugs. Ideally, the constness of an
Expr should be determined in one place and the FE and BE should agree
on which Exprs are constant. I considered the following alternatives
but concluded they were too risky:
1. Pass a flag from FE to BE for ever Expr indicating its constness.
This simple solution would populate a thrift field with the
result of Expr.isConstant() for every Expr in an Expr tree.
There are several issues. Calling isConstant() for every Expr
in an Expr tree is rather expensive due to repeated traversals
of the tree. That could be mitigated by populating an isConstant
flag during Expr.analyze() to avoid re-computing the constness
repeatedly. This requires changes to analyze(), clone(), reset(),
and possibly other places for many Exprs. There is potential
for missing a place and adding a new bug.
2. The above solution could be limited to only FunctionCallExpr.
However, the BE expr type FUNCTION_CALL which maps to
scalar-fn-call.h/cc is created from various FE Exprs, not just
FunctionCallExpr. So adding a flag only to scalar-fn-call.h/cc
would be confusing because it would only sometimes be set
in a meaningful way. This seems more confusing than the current
straightforward solution.
Testing: Added FE and EE tests.
Change-Id: If2499f5f6ecdcb098623202c8e6dc2d02727194a
Reviewed-on: http://gerrit.cloudera.org:8080/5324
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The bugs was that the functions did not check whether the conversion
pushed the value out of range. The fix is to use boost's validation
immediately to check the validity of the timestamp and catch any
exceptions thrown.
It would be preferable to avoid the exceptions, but Boost does not
provide a straightforward way to disable the exceptions or extract
potentially-invalid values from a date object.
Testing:
Added expression tests that exercise out-of-range cases. Also
added additional tests to confirm that date addition and subtraction
weren't affected by similar bugs.
Change-Id: Idc427b06ac33ec874a05cb98d01c00e970d3dde6
Reviewed-on: http://gerrit.cloudera.org:8080/5251
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
During slot substitution, the type of the child of a CastExpr can
change. If the previous child type matched the CastExpr, then the cast
was flagged as noOp_. During substitution and subsequent re-analysis
the noOp_ flag was not revisited so that no cast was performed, even
after it had become necessary.
The fix is to always set noOp_ to the correct value in
CastExpr.analyze().
Change-Id: I7f29cdc359558fad6df455b8eec0e0eaed00e996
Reviewed-on: http://gerrit.cloudera.org:8080/5267
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Adds a new ExprRewriteRule for replacing constant expressions
with their literal equivalent via BE evaluation. Applies the
new rule together with the existing ones on the parse tree,
after analysis.
Limitations
- Constant folding is applied on the unresolved expressions.
As a result, it only works for expressions that are constant
within a single query block, as opposed to expressions that
may become constant after fully substituting inline-view exprs.
- Exprs are not normalized, so some opportunities for constant
folding are missed for certain expr-tree shapes.
This patch includes the following interesting changes:
- Introduces a timestamp literal that can only be produced
by constant folding (not expressible directly via SQL).
- To make sure that rewrites have no user-visible effect,
the original result types and column labels of the top-level
statement are restored after the rewrites are performed.
- Does not fold exprs if their evaluation resulted in a
warning or error, or if the resulting value is not
representable by corresponding FE LiteralExpr.
- Fixes an existing issue with converting strings between
the FE/BE. String produced in the BE that have characters
with a value > 127 are not correctly deserialized into a
Java String via thrift. We detect this case during constant
folding and abandon folding of such exprs.
- Fixes several issues with detecting/reporting errors in
NativeEvalConstExprs().
- Cleans up ExprContext::GetValue() into
ExprContext::GetConstantValue() which clarifies its only use
of evaluating exprs from the FE.
Testing:
- Modifies expr-test.cc to run all tests through the constant
folding path.
- Adds basic planner and rewrite rule tests.
- Exhaustive test run passed
Change-Id: If672b703db1ba0bfc26e5b9130161798b40a69e9
Reviewed-on: http://gerrit.cloudera.org:8080/5109
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This patch fixes two issues around handling of constant expr args.
The patches are combined because they touch some of the same code
and depend on some of the same memory management cleanup.
First, it fixes IMPALA-2379, where constant expr args were not visible
to UDAFs. The issue is that the input exprs need to be opened before
calling the UDAF Init() function.
Second, it avoids overhead from repeated evaluation of constant
arguments for ScalarFnCall expressions on both the codegen'd and
interpreted paths. A common example is an IN predicate with a
long list of constant values.
The interpreted path was inefficient because it always evaluated all
children expressions. Instead in this patch constant args are
evaluated once and cached. The memory management of the AnyVal*
objects was somewhat nebulous - adjusted it so that they're allocated
from ExprContext::mem_pool_, which has the correct lifetime.
The codegen'd path was inefficient only with varargs - with fixed
arguments the LLVM optimiser is able to infer after inlining that
the expressions are constant and remove all evaluation. However,
for varargs it stores the vararg values into a heap-allocated buffer.
The LLVM optimiser is unable to remove these stores because they
have a side-effect that is visible to code outside the function.
The codegen'd path is improved by evaluating varargs into an automatic
buffer that can be optimised out. We also make a small related change
to bake the string constants into the codegen'd code.
Testing:
Ran exhaustive build.
Added regression test for IMPALA-2379 and MemPool test for aligned
allocation. Added a test for in predicates with constant strings.
Perf:
Added a targeted query that demonstrates the improvement. Also manually
validated the non-codegend perf. Also ran TPC-H and targeted perf
queries locally - didn't see any significant changes.
+--------------------+-------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
+--------------------+-------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| TARGETED-PERF(_20) | primitive_filter_in_predicate | parquet / none / none | 1.19 | 9.82 | I -87.85% | 3.82% | 0.71% | 1 | 10 |
+--------------------+-------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
(I) Improvement: TARGETED-PERF(_20) primitive_filter_in_predicate [parquet / none / none] (9.82s -> 1.19s [-87.85%])
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+--------+-----------+
| Operator | % of Query | Avg | Base Avg | Delta(Avg) | StdDev(%) | Max | Base Max | Delta(Max) | #Hosts | #Rows | Est #Rows |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+--------+-----------+
| 01:AGGREGATE | 14.39% | 155.88ms | 214.61ms | -27.37% | 2.68% | 163.38ms | 227.53ms | -28.19% | 1 | 1 | 1 |
| 00:SCAN HDFS | 85.60% | 927.46ms | 9.43s | -90.16% | 4.49% | 1.01s | 9.50s | -89.42% | 1 | 13.77K | 14.05K |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+--------+-----------+
Change-Id: I45c3ed8c9d7a61e94a9b9d6c316e8a53d9ff6c24
Reviewed-on: http://gerrit.cloudera.org:8080/4838
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This changes the code to use the lddqu and movdqu instructions (via
Intel intrinsics) to allow unaligned memory access.
Change-Id: I39b2b47bb717d5ac9727512a24fcf8a8a6a8dcc6
Reviewed-on: http://gerrit.cloudera.org:8080/4205
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Fixed StringToFloatInternal() not to parse strings like "1.23inf"
and "infinite" with leading/trailing garbage as Infinity. These
strings are now rejected with PARSE_FAILURE.
Only "inf" and "infinity" are accepted, parsing is case-insensitive.
"NaN" values are handled similarly: strings with leading/trailing
garbage like "nana" are rejected, parsing is case-insensitive.
Other changes:
- StringToFloatInternal() was cleaned up a bit. Parsing inf and NaN
strings was moved out of the main loop.
- Use std::numeric_limits<T>::infinity() instead of INFINITY macro
and std::numeric_limits<T>::quiet_NaN() instead of NAN macro.
- Fixed another minor bug: multiple dots are allowed when parsing
float values (e.g. "2.1..6" is interpreted as 2.16).
- New BE and E2E tests were added.
Change-Id: I9e17d0f051b300a22a520ce34e276c2d4460d35e
Reviewed-on: http://gerrit.cloudera.org:8080/3791
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
We do not have exceptions enabled for codegen'd code, so exceptions
thrown by functions called by codegen'd functions cannot be caught by
the codegen'd functions. TimestampValue::UnixTimeToPtime() has a
try/catch around boost::posix_time::ptime_from_tm(), but since it was
inlined into the TimestampFunctions::FromUnix() IR the try/catch
didn't work. This patch moves the UnixTimeToPtime() implementation to
the .cc file so it doesn't get included in the IR. It does the same
for TimestampParser::Parse() in case it gets inlined into IR code as
well.
Change-Id: Ic0af73629e1e3b6bf18cbf5d832973712b068527
Reviewed-on: http://gerrit.cloudera.org:8080/2210
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
Enforces that the planner treats IS NOT DISTINCT FROM as eligible for
hash joins, but does not find the minimum spanning tree of
equivalences for use in optimizing query plans; this is left as future
work.
Change-Id: I62c5300b1fbd764796116f95efe36573eed4c8d0
Reviewed-on: http://gerrit.cloudera.org:8080/710
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
This patch implements a new built-in function
regexp_match_count. This function returns the number of
matching occurrences in input.
The regexp_match_count() function has the following syntax:
int = regexp_match_count(string input, string pattern)
int = regexp_match_count(string input, string pattern,
int start_pos, string flags)
The input value specifies the string on which the regular
expression is processed.
The pattern value specifies the regular expression.
The start_pos value specifies the character position
at which to start the search for a match. It is set
to 1 by default if it's not specified.
The flags value (if specified) dictates the behavior of
the regular expression matcher:
m: Specifies that the input data might contain more than
one line so that the '^' and the '$' matches should take
that into account.
i: Specifies that the regex matcher is case insensitive.
c: Specifies that the regex matcher is case sensitive.
n: Specifies that the '.' character matches newlines.
By default, the flag value is set to 'c'. Note that the
flags are consistent with other existing built-in functions
(e.g. regexp_like) so certain flags in IBM netezza such as
's' are not supported to avoid confusion.
Change-Id: Ib33ece0448f78e6a60bf215640f11b5049e47bb5
Reviewed-on: http://gerrit.cloudera.org:8080/1248
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
multi-line text
This patch fixes the option passed to the RE2 regex matcher
so that it will count the newline character '\n' as a valid
candidate for '.'. Previously, that option was set to false
by default, causing multi-line text to fail to match against
patterns with wildcard characters in it. This patch also adds
some tests to address these cases and fixes some typos in
like-predicate.h.
Change-Id: I25367623f587bf151e4c87cc7cb6aec3cd57e41a
Reviewed-on: http://gerrit.cloudera.org:8080/1172
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
Failure due to an issue with NULL tuples (IMPALA-2375)
where NULL tuples come from the right side of a left
outer join where the right side comes from an inline view
which produces 0 slots (e.g. the view selects a constant).
The HJ doesn't handle them correctly because the planner
inserts an IsTupleNull expr. This isn't an issue for the
PHJ because the BufferedTupleStream returns non-NULL Tuple*
ptrs even for tuples with no slots.
Per IMPALA-2375, we're going to address this after 2.3, so
moving this test case into joins-partitioned.test which only
runs on the PHJ.
Change-Id: I64cb7e8ffd60f3379aa8860135db5af8e66d686f
Reviewed-on: http://gerrit.cloudera.org:8080/1231
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
Before, Expr::IsConstant() manually specified the constant Expr
classes, but TupleIsNullPredicate and AnalyticExpr overrode
IsConstant() to always return false (which Expr::IsConstant() didn't
specify). This meant that unless the TupleIsNullPredicate was the root
expr, TupleIsNullPredicate::IsConstant() would never be called and
Expr::IsConstant() would return true. This patch changes
Expr::IsConstant() to recurse on its children, rather than having it
contain the constant logic for all expr types.
Change-Id: I756eb945e04c791eff39c33305fe78d957ec29f4
Reviewed-on: http://gerrit.cloudera.org:8080/1214
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
QueryExecState::FetchRowsInternal() doesn't check the query state after evaluating the
select statement expressions with GetRowValue(). These means that, e.g., UDFs that call
SetError() in the select list will not fail the query.
Change-Id: I120d7abbee2a3ed5c5c66ec0a3a9b6e9a6ab10bf
Reviewed-on: http://gerrit.cloudera.org:8080/815
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
By not using THREAD_LOCAL for its state, btrim() invocations in
multi-threaded contexts (i.e. pushed to the scanner) would have threads
trampling over each other's bitset used to check for trimmed characters.
Testing:
See new test in expr.test:
select count(*) from functional.alltpyes where btrim(string_col, string_col) != ""
.. should give 0 results, but would give > 0 with this bug.
Change-Id: I595e25b1d4fb7c76b846fce837b4ec140f47d43c
Reviewed-on: http://gerrit.cloudera.org:8080/748
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
Use select query instead of insert query to verify constant expression
on partition column.
Change-Id: I442111225e8df29bcc5fe89500d023559bb1c1fb
Reviewed-on: http://gerrit.cloudera.org:8080/707
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
The test file testdata/workloads/functional-query/queries/QueryTest/exprs.test had INSERT
statements in it, which are not supported on S3. This commit gets rid of those statements
and rewrites them with SELECT [...] FROM VALUES(...) so that the tests are compatible on
S3.
Change-Id: I25faacf9fae3780f627afee86dc8c1ede7f6e2a2
Reviewed-on: http://gerrit.cloudera.org:8080/670
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
Added the SPLIT_PART and the REGEXP_LIKE builtin functions and tests for both.
The REGEXP_LIKE has an optional third parameter which if used, uses a different
'prepare' function (RegexpLikePrepare in like-predicate.cc) so that the appropriate
options can be set in the RE2 library.
Added a patch for the RE2 library so that the 'dot matches all' option is exposed
via the RE2 class.
Fixed a bug in the case when the function to be evaluated for the WHERE clause
operates on constants, proper cleanup isn't guaranteed on certain edge cases.
Change-Id: Ia2a8de9eeb2854100a2d949f612cfaba317c5a7b
Reviewed-on: http://gerrit.cloudera.org:8080/501
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
It turns out there is a variety of cases where boost incorrectly adds
intervals if the interval is at (or beyond) an edge case value. This
change defines a max interval and returns NULL if the user supplies
an interval beyond the max.
Change-Id: I4fb6869be22ab06089b66eeffaea04b0c0880080
Reviewed-on: http://gerrit.cloudera.org:8080/492
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Bit and byte functions for compatibility with Teradata: bitand, bitor, bitxor, bitnot,
countset, getbit, setbit, shiftleft, shiftright, rotateleft, rotateright.
Interfaces and behavior follow Teradata documentation.
All bit* functions are compatible with DB2. bitand only is compatible with Oracle.
Change-Id: Idba3fb7beb029de493b602e6279aa68e32688df3
When a UDF with constant parameters in the select list calls SetError(), it does not fail
the query. This is because UnionNode::GetNext() does not check for errors after
UnionNode::EvalAndMaterializeExprs() evaluates the expression, which itself does not
report the error.
Change-Id: I8850cf1a603e320bb23f4a9a4d47600d14590f3a
Implements suffix n! operator for factorial and factorial function.
Slightly refactor operators in fe to share code between unary operators.
Based partially on work by Arthur Peng <arthur.peng@intel.com>.
Change-Id: I71b6c824c59fc5305f16b8c4457805126a1da93b
Reviewed-on: http://gerrit.cloudera.org:8080/531
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Changed the way the function context error message is returned. Also, changed the
exception thrown in SingleNodePlanner from IllegalStateException to AnalysisException
in case of an exception in registerConjuncts().
This commit follows from:
d497ba6cef
This is a new commit since the previous one was closed before making these changes.
Change-Id: Ifa9b7c0884d76b6d7911d8cd80355a8ba13c4c18
Reviewed-on: http://gerrit.cloudera.org:8080/560
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
not done before throwing exception.
When a builtin has an error (in the constant case), it is checked for but the state
cleanup isn't taken care of which results in a DCHECK. When a UDF has an error (in the
constant case), the error does not propagate back up the stack due to a lack of error
checking in ScalarFnCall::Open() after it calls GetConstVal().
Change-Id: Ib500c84a41df574690369f124044991ed8c82cc1
Reviewed-on: http://gerrit.cloudera.org:8080/537
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
Boost handles a couple of edge cases differently than other databases
such as Postgres and MySQL when adding year/month intervals to
timestamps. This change makes Impala consistent for the other databases.
The performance difference was not noticeable (<5% if any).
Change-Id: Icb02a06281b53753938cab88e0d28f20709fee06
Reviewed-on: http://gerrit.cloudera.org:8080/489
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This patch introduces the function GetConstant(), which is used by
expr compute function and UDFs to access query constants. There is a
corresponding GetIrConstant() function that returns the IR versions of
the same constants. Currently the only implemented constants are the
expr's return type and argument types, but other constants can be
easily be added to these functions. Interpreted expr functions run
normally, but cross-compiled functions can be passed to
InlineConstants(), which looks for calls to GetConstant() and replaces
them with the result of calling GetIrConstant().
I used this technique in the decimal functions that previously were
not switching on the type at all. The performance of LeastGreatest()
after this patch is the same as it was before it switched on the type.
Change-Id: I8b55744551830d894318a7bab6b6f045fb8bed41
Reviewed-on: http://gerrit.cloudera.org:8080/352
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
operator with values
This commit fixes an issue where a query is not successfully analyzed if an
IN operator with values appears in a binary predicate.
Change-Id: Ia3b83803a553b9a3b3489382fc53978a720c4b4f
Reviewed-on: http://gerrit.cloudera.org:8080/334
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
The IN predicate wasn't using the decimal type when comparing decimal
values. I benchmarked this on a modified version of TPCDS-Q8 (i.e. a
query with a huge decimal IN predicate) and there is a ~5% performance
degradation with codegen enabled (surprisingly, there appears to be a
slight performance gain with codegen disabled). We should be able to
remove this penalty when we add constant injection via codegen.
Change-Id: Ie1296fd50c68d06a343701442da49fe8d3cd16dd
Reviewed-on: http://gerrit.cloudera.org:8080/230
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
This should fix the last y2k38 problem. Previously calling
unix_timestamp() with a input of '2038-01-19 03:14:08' or later would
return a negative value due to a 32 bit int overflow. This patch
switches from 32 to 64 bit ints.
Change-Id: Ic9180887d6c828f6ecd25435be86fd0bd52d3f0d
Reviewed-on: http://gerrit.cloudera.org:8080/61
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Our .test file parser used to not abort tests when there
is a malformed test/section. This patch changes that behavior
to report an error and treat the test as failed.
Quite a few tests were not well-formed, and were not executed
as a result. This patch fixes those tests.
Arguably, the test file parser should be more flexible in which places
to accept comments, but this patch does not address that problem.
Change-Id: If53358eb0cb958b68e51940b071e64c1d6c3ec6f
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5468
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins