impala

mirror of https://github.com/apache/impala.git synced 2026-01-01 18:00:30 -05:00

Author	SHA1	Message	Date
Bikramjeet Vig	545163bb0a	IMPALA-5929: Remove redundant explicit casts to string This patch adds a query rewriter to remove redundant explicit casts to a string type (string, char, varchar) from binary predicates of the form "cast(<non-const expr> to <string type>) <eq/ne op> <string constant>". The cast is redundant if the predicate evaluation is the same even if the cast is removed and the constant is converted to the original type of the expression. For example: cast(int_col as string) = '123456' -> int_col = 123456 Performance: For the following query on a table having 6001215 records - select * from tpch.lineitem where cast(l_linenumber as string) = '0' +-----------------+-----------+--------+ \| \| Scan Time \| +-----------------+-----------+--------+ \| \| Avg \| St dev \| \| Without rewrite \| 1s406ms \| 44ms \| \| With rewrite \| 1s099ms \| 28ms \| +-----------------+-----------+--------+ Testing: - Added unit tests to ExprRewriteRulesTest - Added functional test to expr.test - Current FE planner tests and BE expr-test run successfully with this change. Change-Id: I91b7c6452d0693115f9b9ed9ba09f3ffe0f36b2b Reviewed-on: http://gerrit.cloudera.org:8080/8660 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-03 01:15:42 +00:00
Thomas Tauber-Marshall	e94c60833a	IMPALA-6069: Fix CodegenAnyVal's handling of 'nan' Previously, CodegenAnyVal used an LLVM function for floating point comparison that considered 'nan' = 'nan' to be true. This is inconsistent with the way we handle 'nan' in the non-codegen path, where we consider 'nan' = 'nan' to be false, leading to inconsistent results. This patch fixes CodegenAnyVal to use an LLVM function for floating point comparison that considers 'nan' = 'nan' to be false. Testing: - Added e2e tests for the two scenarios affected by this: CASE and joins. Change-Id: I1bb8e5074b3c939927dedc46bc9db63ca24486a1 Reviewed-on: http://gerrit.cloudera.org:8080/8790 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-08 22:42:03 +00:00
Jinchul	2fba80ee5e	IMPALA-5146: Fix inconsistent results at FROM_UNIXTIME() The FROM_UNIXTIME(epoch) and FROM_UNIXTIME(epoch, format) produce different results when epoch is out of range of TimestampValue. The former produces an empty string, while the latter gives NULL. The fix is to harmonize the results to NULL. Testing: Add unit tests to ExprTest.TimestampFunctions. Change-Id: Ie3a5e9a9cb39d32993fa2c7f725be44d8b9ce9f2 Reviewed-on: http://gerrit.cloudera.org:8080/8629 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-29 05:22:32 +00:00
Csaba Ringhofer	41f0c6a5a6	IMPALA-5664: Unix time to timestamp conversions may crash Impala TimestampValue::FromSubsecondUnixTime() and UtcFromUnixTimeMicros() are incorrect only in case of the last second of 1399, because these sub-second values are rounded first towards 1400-01-01 00:00:00, which is accepted as a valid date, and the sub-second part is subtracted afterwards, leading to a date outside the valid interval. The maximum case, 9999-12-31 23:59:59, is a bit different, because as I understand, with nanosecond precision posix times, the maximum value is actually 10000-01-01 00:00:00 minus 1 nanosec. TimestampValue::FromUnixTimeNanos() can create problematic TimestampValues both <1400 and 10000<=. These timestamps can cause problems, because most code assumes that if HasDate/HasTime is true, then it really is a valid timestamp. To fix this, the posix times are checked in the constructor of TimestampValue, and if it is outside the valid interval, both time_ and date_ are set to not_a_date_time. Test: select cast(-17987443200-0.1 as timestamp); This query no longer crashes, but returns NULL, similarly to other < 1400 timestamps. Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Reviewed-on: http://gerrit.cloudera.org:8080/7954 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Impala Public Jenkins	2017-10-21 03:13:14 +00:00
Alex Behm	c1781b73b3	Move tests related to the old join node. No tests were added/dropped or modified. They are consolidated into fewer .test files. Change-Id: Idda4b34b5e6e9b5012b177a4c00077aa7fec394c Reviewed-on: http://gerrit.cloudera.org:8080/8153 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-09-28 18:36:17 +00:00
Vuk Ercegovac	646920810f	IMPALA-1767 Adds predicate to test boolean values true, false, unknown. Adds a new expression to represent the following boolean predicate: <expr> IS [NOT] (TRUE \| FALSE \| UNKNOWN) The expression is expanded in the parser to istrue/false for the checks against true and false respectively and to isnull for the check against unknown. Compared to the other approaches (rewrites, extended backend expr), this change is the simplest. Main downside is that error messages are in terms of the lowered expression. Testing: - fe: parser, tosql, analyze exprs - e2e: query exprs Change-Id: I9d5fba65ef6c87dfc55a25d2c45246f74eb48c40 Reviewed-on: http://gerrit.cloudera.org:8080/8122 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-09-23 04:33:20 +00:00
Zachary Amsden	f53ce3b16d	IMPALA-4513: Promote integer types for ABS() The internal representation of the most negative number in two's complement requires 1 more bit to represent the positive version. This means ABS() must promote integer types to the next highest width. Change-Id: I86cc880e78258d5f90471bd8af4caeb4305eed77 Reviewed-on: http://gerrit.cloudera.org:8080/8004 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-09-23 02:41:32 +00:00
Thomas Tauber-Marshall	2ae94e7ead	IMPALA-5725: coalesce() with outer join incorrectly rewritten A recent change, IMPALA-5016, added an expr rewrite rule to simplify coalesce(). This rule eliminates the coalesce() when its first parameter (that isn't constant null) is a SlotRef pointing to a SlotDescriptor that is non-nullable (for example because it is from a non-nullable Kudu column or because it is from an HDFS partition column with no null partitions), under the assumption that the SlotRef could never have a null value. This assumption is violated when the SlotRef is the output of an outer join, leading to incorrect results being returned. The problem is that the nullability of a SlotDescriptor (which determines whether there is a null indicator bit in the tuple for that slot) is a slightly different property than the nullability of a SlotRef pointing to that SlotDescriptor (since the SlotRef can still be NULL if the entire tuple is NULL). This patch removes the portion of the rewrite rule that considers the nullability of the SlotDescriptor. This means that we're missing out on some optimization opportunities and we should revisit this in a way that works with outer joins (IMPALA-5753). Testing: - Updated FE tests. - Added regression tests to exprs.test Change-Id: I1ca6df949f9d416ab207016236dbcb5886295337 Reviewed-on: http://gerrit.cloudera.org:8080/7567 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Tested-by: Impala Public Jenkins	2017-08-04 21:51:19 +00:00
Thomas Tauber-Marshall	915a16345c	IMPALA-5125: SimplifyConditionalsRule incorrectly handles aggregates This patch addresses 3 issues: - SelectList.reset() didn't properly reset some of its members, though they're documented as needing to be reset. This was causing a crash when the Planner attempted to make an aggregation node for an agg function that had been eliminated by expr rewriting. While I'm here, I added resetting of all of SelectList's members that need to be reset, and fixed the documentation of one member that shouldn't be reset. - SimplifyConditionalsRule was changing the meaning of queries that contain agg functions, e.g. because "select if(true, 0, sum(id))" is not equivalent to "select 0". The fix is to not return the simplified expr if it removes all aggregates. - ExprRewriteRulesTest was performing rewrites on the result exprs of the SelectStmt, which causes problems if the result exprs have been substituted. In normal query execution, we don't rewrite the result exprs anyway, so the fix is to match normal query execution and rewrite the select list exprs. Testing: - Added e2e test to exprs.test. - Added unit test to ExprRewriteRulesTest. Change-Id: Ic20b1621753980b47a612e0885804363b733f6da Reviewed-on: http://gerrit.cloudera.org:8080/6653 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Tested-by: Impala Public Jenkins	2017-04-24 21:41:11 +00:00
Zach Amsden	0715a303ea	IMPALA-4729: Implement REPLACE() This turned out to be slightly non-trivial as REPLACE is already a keyword, and thus the parser needs to be tweaked to allow this, since function names act as bare identifiers. It was difficult to get this to match performance of regexp_replace. For expanding patterns, the fact that regexp_replace copies the expansion inline means that it may in fact win on large strings with sparse matches that are > dcache size apart. Let's leave optimizing that for later. Testing: Added a full test for maximum size strings and got most of the boundary conditions I could identify. Manually ran queries on TPC-H dataset in impala to verify both performance and correctness. Added large string and exprs.test test clauses and ran the tests to verify they work as expected. Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Reviewed-on: http://gerrit.cloudera.org:8080/5776 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-15 01:33:23 +00:00
Thomas Tauber-Marshall	3edc9099bc	IMPALA-4849: IllegalStateException from rewritten CASE expr In SelectList.reset(), we call reset() on each select list item's expr. reset() is supposed to remove implicit casts, by returning the reset expr with implicit cast exprs removed from the tree. Previously SelectList.reset() ignored the return value of the calls to Expr.reset(), meaning that if the top-most expr of the select list item is an implicit cast, it won't actually get removed, which causes problems with analysis since implicit casts are always treated as pre-analyzed. The solution is to set the select list item's exprs to the return value of reset(). Testing: - Added a regression test to exprs.test Change-Id: I16ff88716b185e1d72d2bc603a42bd06c60ec18e Reviewed-on: http://gerrit.cloudera.org:8080/5917 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-09 20:36:22 +00:00
Thomas Tauber-Marshall	4b486b0f90	IMPALA-1861: Simplify conditionals with constant conditions When there are conditionals with constant values of TRUE or FALSE we can simplify them during analysis using the ExprRewriter. This patch introduces the SimplifyConditionalsRule which covers IF, OR, AND, CASE, and DECODE. It also introduces NormalizeExprsRule which normalizes AND and OR such that if either child is a BoolLiteral, then the left child is a BoolLiteral. Testing: - Added unit tests to ExprRewriteRulesTest. - Added functional tests to expr.test - Ran FE planner tests and BE expr-test. Change-Id: Id70aaf9fd99f64bd98175b7e2dbba28f350e7d3b Reviewed-on: http://gerrit.cloudera.org:8080/5585 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2017-01-24 03:22:08 +00:00
Tim Armstrong	69859bddfb	IMPALA-4549: consistently treat 9999 as upper bound for timestamp year Previously Impala was inconsistent about whether the year 10000 was supported, as a result of inconsistency in boost, which reported the maximum year as 9999 but sometimes allowed 10000. This meant that Impala sometimes accepted the year 10000 and sometimes not. Use the patched boost version and update tests accordingly. Testing: Ran an exhaustive build. Change-Id: Iaf23b40833017789d879e5da7bb10384129e2d10 Reviewed-on: http://gerrit.cloudera.org:8080/5665 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-19 00:04:27 +00:00
Thomas Tauber-Marshall	89a3d3c1eb	IMPALA-4716: Expr rewrite causes IllegalStateException The DECODE constructor in CaseExpr uses the same decodeExpr object when building the BinaryPredicates that compare the decodeExpr to each 'when' of the DECODE. This causes problems when different BinaryPredicates try to cast the same decodeExpr object to different types during analysis, in this case leading to a Precondition check failure. The solution is to clone the decodeExpr in the DECODE constructor in CaseExpr for each generated BinaryPredicate. Testing: - Added a regression test to exprs.test Change-Id: I4de9ed7118c8d18ec3f02ff74c9cca211c716e51 Reviewed-on: http://gerrit.cloudera.org:8080/5631 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-13 04:44:07 +00:00
Marcel Kornacker	70ae2e38eb	IMPALA-4739: ExprRewriter fails on HAVING clauses The bug was that expr rewrite rules such as ExtractCommonConjunctRule analyzed their own output, which doesn't work for syntactic elements that allow column aliases, such as the HAVING clause. The fix was to remove the analysis step (the re-analysis happens anyway in AnalysisCtx). Change-Id: Ife74c61f549f620c42f74928f6474e8a5a7b7f00 Reviewed-on: http://gerrit.cloudera.org:8080/5662 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-12 02:31:44 +00:00
Alex Behm	534999382d	IMPALA-4574: Do not treat UUID() like a constant expr. A recent change (IMPALA-1788) led UUID() to be constant folded, and therefore, produce the same value for every invocation across rows. Similar issues might also occur due to the BE optimizing UUID() during codegen of scalar-fn-call.h/cc. The fix is to not treat UUID() like a constant expr in both the FE and BE. Discussion: The fix in this patch is rather blunt, but minimally invasive to reduce the risk of adding new bugs. Ideally, the constness of an Expr should be determined in one place and the FE and BE should agree on which Exprs are constant. I considered the following alternatives but concluded they were too risky: 1. Pass a flag from FE to BE for every Expr indicating its constness. This simple solution would populate a thrift field with the result of Expr.isConstant() for every Expr in an Expr tree. There are several issues. Calling isConstant() for every Expr in an Expr tree is rather expensive due to repeated traversals of the tree. That could be mitigated by populating an isConstant flag during Expr.analyze() to avoid re-computing the constness repeatedly. This requires changes to analyze(), clone(), reset(), and possibly other places for many Exprs. There is potential for missing a place and adding a new bug. 2. The above solution could be limited to only FunctionCallExpr. However, the BE expr type FUNCTION_CALL which maps to scalar-fn-call.h/cc is created from various FE Exprs, not just FunctionCallExpr. So adding a flag only to scalar-fn-call.h/cc would be confusing because it would only sometimes be set in a meaningful way. This seems more confusing than the current straightforward solution. Testing: Added FE and EE tests. Change-Id: If2499f5f6ecdcb098623202c8e6dc2d02727194a Reviewed-on: http://gerrit.cloudera.org:8080/5324 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-12-06 22:03:01 +00:00
Tim Armstrong	1495b2007d	IMPALA-4498: crash in to_utc_timestamp/from_utc_timestamp The bug was that the functions did not check whether the conversion pushed the value out of range. The fix is to use boost's validation immediately to check the validity of the timestamp and catch any exceptions thrown. It would be preferable to avoid the exceptions, but Boost does not provide a straightforward way to disable the exceptions or extract potentially-invalid values from a date object. Testing: Added expression tests that exercise out-of-range cases. Also added additional tests to confirm that date addition and subtraction weren't affected by similar bugs. Change-Id: Idc427b06ac33ec874a05cb98d01c00e970d3dde6 Reviewed-on: http://gerrit.cloudera.org:8080/5251 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2016-12-05 23:33:44 +00:00
Lars Volker	0f62bf35fd	IMPALA-4550: Fix CastExpr analysis for substituted slots During slot substitution, the type of the child of a CastExpr can change. If the previous child type matched the CastExpr, then the cast was flagged as noOp_. During substitution and subsequent re-analysis the noOp_ flag was not revisited so that no cast was performed, even after it had become necessary. The fix is to always set noOp_ to the correct value in CastExpr.analyze(). Change-Id: I7f29cdc359558fad6df455b8eec0e0eaed00e996 Reviewed-on: http://gerrit.cloudera.org:8080/5267 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-30 11:19:36 +00:00
Alex Behm	bbf5255d0e	IMPALA-1788: Fold constant expressions. Adds a new ExprRewriteRule for replacing constant expressions with their literal equivalent via BE evaluation. Applies the new rule together with the existing ones on the parse tree, after analysis. Limitations - Constant folding is applied on the unresolved expressions. As a result, it only works for expressions that are constant within a single query block, as opposed to expressions that may become constant after fully substituting inline-view exprs. - Exprs are not normalized, so some opportunities for constant folding are missed for certain expr-tree shapes. This patch includes the following interesting changes: - Introduces a timestamp literal that can only be produced by constant folding (not expressible directly via SQL). - To make sure that rewrites have no user-visible effect, the original result types and column labels of the top-level statement are restored after the rewrites are performed. - Does not fold exprs if their evaluation resulted in a warning or error, or if the resulting value is not representable by corresponding FE LiteralExpr. - Fixes an existing issue with converting strings between the FE/BE. Strings produced in the BE that have characters with a value > 127 are not correctly deserialized into a Java String via thrift. We detect this case during constant folding and abandon folding of such exprs. - Fixes several issues with detecting/reporting errors in NativeEvalConstExprs(). - Cleans up ExprContext::GetValue() into ExprContext::GetConstantValue() which clarifies its only use of evaluating exprs from the FE. Testing: - Modifies expr-test.cc to run all tests through the constant folding path. - Adds basic planner and rewrite rule tests. - Exhaustive test run passed Change-Id: If672b703db1ba0bfc26e5b9130161798b40a69e9 Reviewed-on: http://gerrit.cloudera.org:8080/5109 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-23 21:11:30 +00:00
Tim Armstrong	10fa472fa6	IMPALA-4302,IMPALA-2379: constant expr arg fixes This patch fixes two issues around handling of constant expr args. The patches are combined because they touch some of the same code and depend on some of the same memory management cleanup. First, it fixes IMPALA-2379, where constant expr args were not visible to UDAFs. The issue is that the input exprs need to be opened before calling the UDAF Init() function. Second, it avoids overhead from repeated evaluation of constant arguments for ScalarFnCall expressions on both the codegen'd and interpreted paths. A common example is an IN predicate with a long list of constant values. The interpreted path was inefficient because it always evaluated all children expressions. Instead in this patch constant args are evaluated once and cached. The memory management of the AnyVal* objects was somewhat nebulous - adjusted it so that they're allocated from ExprContext::mem_pool_, which has the correct lifetime. The codegen'd path was inefficient only with varargs - with fixed arguments the LLVM optimiser is able to infer after inlining that the expressions are constant and remove all evaluation. However, for varargs it stores the vararg values into a heap-allocated buffer. The LLVM optimiser is unable to remove these stores because they have a side-effect that is visible to code outside the function. The codegen'd path is improved by evaluating varargs into an automatic buffer that can be optimised out. We also make a small related change to bake the string constants into the codegen'd code. Testing: Ran exhaustive build. Added regression test for IMPALA-2379 and MemPool test for aligned allocation. Added a test for in predicates with constant strings. Perf: Added a targeted query that demonstrates the improvement. Also manually validated the non-codegen'd perf. Also ran TPC-H and targeted perf queries locally - didn't see any significant changes.
+--------------------+-------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+ \| Workload \| Query \| File Format \| Avg(s) \| Base Avg(s) \| Delta(Avg) \| StdDev(%) \| Base StdDev(%) \| Num Clients \| Iters \| +--------------------+-------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+ \| TARGETED-PERF(_20) \| primitive_filter_in_predicate \| parquet / none / none \| 1.19 \| 9.82 \| I -87.85% \| 3.82% \| 0.71% \| 1 \| 10 \| +--------------------+-------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+ (I) Improvement: TARGETED-PERF(_20) primitive_filter_in_predicate [parquet / none / none] (9.82s -> 1.19s [-87.85%]) +--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+--------+-----------+ \| Operator \| % of Query \| Avg \| Base Avg \| Delta(Avg) \| StdDev(%) \| Max \| Base Max \| Delta(Max) \| #Hosts \| #Rows \| Est #Rows \| +--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+--------+-----------+ \| 01:AGGREGATE \| 14.39% \| 155.88ms \| 214.61ms \| -27.37% \| 2.68% \| 163.38ms \| 227.53ms \| -28.19% \| 1 \| 1 \| 1 \| \| 00:SCAN HDFS \| 85.60% \| 927.46ms \| 9.43s \| -90.16% \| 4.49% \| 1.01s \| 9.50s \| -89.42% \| 1 \| 13.77K \| 14.05K \| +--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+--------+-----------+ Change-Id: I45c3ed8c9d7a61e94a9b9d6c316e8a53d9ff6c24 Reviewed-on: http://gerrit.cloudera.org:8080/4838 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-11-08 02:44:51 +00:00
Bharath Vissapragada	64c394827a	IMPALA-4196: Cross compile bit-byte-functions Change-Id: I5a1291bfd202b500405a884e4a62f0ca2447244a Reviewed-on: http://gerrit.cloudera.org:8080/4557 Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com> Tested-by: Internal Jenkins	2016-10-01 01:42:21 +00:00
Jim Apple	20ef3b016e	IMPALA-4058: ByteSwap256 assumed memory was 16-byte aligned. This changes the code to use the lddqu and movdqu instructions (via Intel intrinsics) to allow unaligned memory access. Change-Id: I39b2b47bb717d5ac9727512a24fcf8a8a6a8dcc6 Reviewed-on: http://gerrit.cloudera.org:8080/4205 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-09-02 01:47:08 +00:00
Attila Jeges	211f60d831	IMPALA-1731,IMPALA-3868: Float values are not parsed correctly Fixed StringToFloatInternal() not to parse strings like "1.23inf" and "infinite" with leading/trailing garbage as Infinity. These strings are now rejected with PARSE_FAILURE. Only "inf" and "infinity" are accepted, parsing is case-insensitive. "NaN" values are handled similarly: strings with leading/trailing garbage like "nana" are rejected, parsing is case-insensitive. Other changes: - StringToFloatInternal() was cleaned up a bit. Parsing inf and NaN strings was moved out of the main loop. - Use std::numeric_limits<T>::infinity() instead of INFINITY macro and std::numeric_limits<T>::quiet_NaN() instead of NAN macro. - Fixed another minor bug: multiple dots are allowed when parsing float values (e.g. "2.1..6" is interpreted as 2.16). - New BE and E2E tests were added. Change-Id: I9e17d0f051b300a22a520ce34e276c2d4460d35e Reviewed-on: http://gerrit.cloudera.org:8080/3791 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Internal Jenkins	2016-08-24 03:34:01 +00:00
Jim Apple	1c16dd0cf8	IMPALA-2107: Add Base64 encoder/decoder Change-Id: I911451c5d68e8ae9d352abfcf4d5ff36484f0bf3 Reviewed-on: http://gerrit.cloudera.org:8080/2633 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 14:17:32 -07:00
Skye Wanderman-Milne	5a81d2db88	IMPALA-2184: don't inline timestamp methods with try/catch blocks in IR We do not have exceptions enabled for codegen'd code, so exceptions thrown by functions called by codegen'd functions cannot be caught by the codegen'd functions. TimestampValue::UnixTimeToPtime() has a try/catch around boost::posix_time::ptime_from_tm(), but since it was inlined into the TimestampFunctions::FromUnix() IR the try/catch didn't work. This patch moves the UnixTimeToPtime() implementation to the .cc file so it doesn't get included in the IR. It does the same for TimestampParser::Parse() in case it gets inlined into IR code as well. Change-Id: Ic0af73629e1e3b6bf18cbf5d832973712b068527 Reviewed-on: http://gerrit.cloudera.org:8080/2210 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Internal Jenkins	2016-02-19 00:03:23 -08:00
Jim Apple	1a3d7ffd4f	IMPALA-2147: Support IS [NOT] DISTINCT FROM and "<=>" predicates Enforces that the planner treats IS NOT DISTINCT FROM as eligible for hash joins, but does not find the minimum spanning tree of equivalences for use in optimizing query plans; this is left as future work. Change-Id: I62c5300b1fbd764796116f95efe36573eed4c8d0 Reviewed-on: http://gerrit.cloudera.org:8080/710 Reviewed-by: Jim Apple <jbapple@cloudera.com> Tested-by: Internal Jenkins	2016-01-14 05:45:22 +00:00
Michael Ho	34a94c2503	IMPALA-2404: Implements built-in function regexp_match_count This patch implements a new built-in function regexp_match_count. This function returns the number of matching occurrences in input. The regexp_match_count() function has the following syntax: int = regexp_match_count(string input, string pattern) int = regexp_match_count(string input, string pattern, int start_pos, string flags) The input value specifies the string on which the regular expression is processed. The pattern value specifies the regular expression. The start_pos value specifies the character position at which to start the search for a match. It is set to 1 by default if it's not specified. The flags value (if specified) dictates the behavior of the regular expression matcher: m: Specifies that the input data might contain more than one line so that the '^' and the '$' matches should take that into account. i: Specifies that the regex matcher is case insensitive. c: Specifies that the regex matcher is case sensitive. n: Specifies that the '.' character matches newlines. By default, the flag value is set to 'c'. Note that the flags are consistent with other existing built-in functions (e.g. regexp_like) so certain flags in IBM netezza such as 's' are not supported to avoid confusion. Change-Id: Ib33ece0448f78e6a60bf215640f11b5049e47bb5 Reviewed-on: http://gerrit.cloudera.org:8080/1248 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-10-27 10:11:13 +00:00
Michael Ho	26690ff1cd	IMPALA-2204: Underscore in like predicate does not work for multi-line text This patch fixes the option passed to the RE2 regex matcher so that it will count the newline character '\n' as a valid candidate for '.'. Previously, that option was set to false by default, causing multi-line text to fail to match against patterns with wildcard characters in them. This patch also adds some tests to address these cases and fixes some typos in like-predicate.h. Change-Id: I25367623f587bf151e4c87cc7cb6aec3cd57e41a Reviewed-on: http://gerrit.cloudera.org:8080/1172 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Internal Jenkins	2015-10-14 19:16:48 +00:00
Matthew Jacobs	16759d7989	IMPALA-2529: expr test case fails on non-partitioned HJ Failure due to an issue with NULL tuples (IMPALA-2375) where NULL tuples come from the right side of a left outer join where the right side comes from an inline view which produces 0 slots (e.g. the view selects a constant). The HJ doesn't handle them correctly because the planner inserts an IsTupleNull expr. This isn't an issue for the PHJ because the BufferedTupleStream returns non-NULL Tuple* ptrs even for tuples with no slots. Per IMPALA-2375, we're going to address this after 2.3, so moving this test case into joins-partitioned.test which only runs on the PHJ. Change-Id: I64cb7e8ffd60f3379aa8860135db5af8e66d686f Reviewed-on: http://gerrit.cloudera.org:8080/1231 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2015-10-12 14:41:05 -07:00
Skye Wanderman-Milne	cfe1e38d6e	IMPALA-2495: make Expr::IsConstant() recurse on children Before, Expr::IsConstant() manually specified the constant Expr classes, but TupleIsNullPredicate and AnalyticExpr overrode IsConstant() to always return false (which Expr::IsConstant() didn't specify). This meant that unless the TupleIsNullPredicate was the root expr, TupleIsNullPredicate::IsConstant() would never be called and Expr::IsConstant() would return true. This patch changes Expr::IsConstant() to recurse on its children, rather than having it contain the constant logic for all expr types. Change-Id: I756eb945e04c791eff39c33305fe78d957ec29f4 Reviewed-on: http://gerrit.cloudera.org:8080/1214 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Internal Jenkins	2015-10-09 16:47:46 -07:00
Sailesh Mukil	0d46129458	IMPALA-1746: QueryExecState doesn't check for query cancellation or errors QueryExecState::FetchRowsInternal() doesn't check the query state after evaluating the select statement expressions with GetRowValue(). This means that, e.g., UDFs that call SetError() in the select list will not fail the query. Change-Id: I120d7abbee2a3ed5c5c66ec0a3a9b6e9a6ab10bf Reviewed-on: http://gerrit.cloudera.org:8080/815 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2015-09-22 10:58:33 -07:00
Henry Robinson	8809567e82	IMPALA-2290: Fix btrim() thread-safety. By not using THREAD_LOCAL for its state, btrim() invocations in multi-threaded contexts (i.e. pushed to the scanner) would have threads trampling over each other's bitset used to check for trimmed characters. Testing: See new test in expr.test: select count(*) from functional.alltypes where btrim(string_col, string_col) != "" .. should give 0 results, but would give > 0 with this bug. Change-Id: I595e25b1d4fb7c76b846fce837b4ec140f47d43c Reviewed-on: http://gerrit.cloudera.org:8080/748 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Henry Robinson <henry@cloudera.com>	2015-09-09 04:15:30 +00:00
aacalfa	5e733e8d62	IMPALA-2190: Complete conversion functions between timestamp, unixtime, and string dates Change-Id: I48a446f19c7634477f175d0defa8779dd70a392f Reviewed-on: http://gerrit.cloudera.org:8080/654 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-09-07 07:07:20 +00:00
Juan Yu	c66785be4a	IMPALA-2227: S3:query_test.test_queries.TestQueries.test_exprs failure Use select query instead of insert query to verify constant expression on partition column. Change-Id: I442111225e8df29bcc5fe89500d023559bb1c1fb Reviewed-on: http://gerrit.cloudera.org:8080/707 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2015-08-29 00:40:41 +00:00
Juan Yu	d42ecb310a	IMPALA-1756: Add test case for partition insert query Change-Id: I4879d8fe7221b551898fa9fa94076bb9b0804f06 Reviewed-on: http://gerrit.cloudera.org:8080/696 Reviewed-by: Juan Yu <jyu@cloudera.com> Tested-by: Internal Jenkins	2015-08-27 18:50:58 +00:00
Sailesh Mukil	1a9fc47295	IMPALA-2227: S3: query_test.test_queries.TestQueries.test_exprs failure The test file testdata/workloads/functional-query/queries/QueryTest/exprs.test had INSERT statements in it, which are not supported on S3. This commit gets rid of those statements and rewrites them with SELECT [...] FROM VALUES(...) so that the tests are compatible on S3. Change-Id: I25faacf9fae3780f627afee86dc8c1ede7f6e2a2 Reviewed-on: http://gerrit.cloudera.org:8080/670 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2015-08-26 00:36:51 +00:00
Sailesh Mukil	1c46cab5c6	IMPALA-2084: SPLIT_PART and REGEXP_LIKE functions for Tableau pushdown Added the SPLIT_PART and the REGEXP_LIKE builtin functions and tests for both. REGEXP_LIKE has an optional third parameter which, if used, selects a different 'prepare' function (RegexpLikePrepare in like-predicate.cc) so that the appropriate options can be set in the RE2 library. Added a patch for the RE2 library so that the 'dot matches all' option is exposed via the RE2 class. Fixed a bug where, when the function evaluated for the WHERE clause operates on constants, proper cleanup wasn't guaranteed in certain edge cases. Change-Id: Ia2a8de9eeb2854100a2d949f612cfaba317c5a7b Reviewed-on: http://gerrit.cloudera.org:8080/501 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2015-08-18 09:07:34 +00:00
Casey Ching	cf60967b7e	IMPALA-1675: Avoid overflow when adding large intervals to TIMESTAMPs It turns out there is a variety of cases where boost incorrectly adds intervals if the interval is at (or beyond) an edge case value. This change defines a max interval and returns NULL if the user supplies an interval beyond the max. Change-Id: I4fb6869be22ab06089b66eeffaea04b0c0880080 Reviewed-on: http://gerrit.cloudera.org:8080/492 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-08-16 12:09:24 +00:00
Tim Armstrong	e151ebaa71	IMPALA-1001: Bit and byte manipulation functions Bit and byte functions for compatibility with Teradata: bitand, bitor, bitxor, bitnot, countset, getbit, setbit, shiftleft, shiftright, rotateleft, rotateright. Interfaces and behavior follow Teradata documentation. All bit* functions are compatible with DB2. bitand only is compatible with Oracle. Change-Id: Idba3fb7beb029de493b602e6279aa68e32688df3	2015-07-28 08:11:01 -07:00
Sailesh Mukil	8a01527bad	IMPALA-2141: UnionNode::GetNext() doesn't check for query errors When a UDF with constant parameters in the select list calls SetError(), it does not fail the query. This is because UnionNode::GetNext() does not check for errors after UnionNode::EvalAndMaterializeExprs() evaluates the expression, which itself does not report the error. Change-Id: I8850cf1a603e320bb23f4a9a4d47600d14590f3a	2015-07-27 22:09:19 -07:00
Tim Armstrong	822cb8f5e2	IMPALA-1660: Netezza compatibility - factorial Implements suffix n! operator for factorial and factorial function. Slightly refactor operators in fe to share code between unary operators. Based partially on work by Arthur Peng <arthur.peng@intel.com>. Change-Id: I71b6c824c59fc5305f16b8c4457805126a1da93b Reviewed-on: http://gerrit.cloudera.org:8080/531 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2015-07-27 19:03:48 +00:00
Sailesh Mukil	c21c080a46	IMPALA-1756: Constant expressions not checked for errors, no state cleanup on exception. Changed the way the function context error message is returned. Also, changed the exception thrown in SingleNodePlanner from IllegalStateException to AnalysisException in case of an exception in registerConjuncts(). This commit follows from: `d497ba6cef` This is a new commit since the previous one was closed before making these changes. Change-Id: Ifa9b7c0884d76b6d7911d8cd80355a8ba13c4c18 Reviewed-on: http://gerrit.cloudera.org:8080/560 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-07-24 19:04:38 +00:00
Sailesh Mukil	6d7bb76e87	IMPALA-1756: Constant filter expressions are not checked for errors and state cleanup is not done before throwing exception. When a builtin has an error (in the constant case), it is checked for but the state cleanup isn't taken care of which results in a DCHECK. When a UDF has an error (in the constant case), the error does not propagate back up the stack due to a lack of error checking in ScalarFnCall::Open() after it calls GetConstVal(). Change-Id: Ib500c84a41df574690369f124044991ed8c82cc1 Reviewed-on: http://gerrit.cloudera.org:8080/537 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2015-07-21 04:01:39 +00:00
Casey Ching	a6d534682b	IMPALA-2086, IMPALA-2090: Avoid boost year/month interval logic Boost handles a couple of edge cases differently than other databases such as Postgres and MySQL when adding year/month intervals to timestamps. This change makes Impala consistent with the other databases. The performance difference was not noticeable (<5% if any). Change-Id: Icb02a06281b53753938cab88e0d28f20709fee06 Reviewed-on: http://gerrit.cloudera.org:8080/489 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-07-20 10:16:54 +00:00
Skye Wanderman-Milne	7801aa499f	Use codegen to inject runtime constants in exprs This patch introduces the function GetConstant(), which is used by expr compute functions and UDFs to access query constants. There is a corresponding GetIrConstant() function that returns the IR versions of the same constants. Currently the only implemented constants are the expr's return type and argument types, but other constants can easily be added to these functions. Interpreted expr functions run normally, but cross-compiled functions can be passed to InlineConstants(), which looks for calls to GetConstant() and replaces them with the result of calling GetIrConstant(). I used this technique in the decimal functions that previously were not switching on the type at all. The performance of LeastGreatest() after this patch is the same as it was before it switched on the type. Change-Id: I8b55744551830d894318a7bab6b6f045fb8bed41 Reviewed-on: http://gerrit.cloudera.org:8080/352 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Internal Jenkins	2015-05-15 02:24:04 +00:00
Dimitris Tsirogiannis	d8e5bbe2da	IMPALA-1949: Analysis exception when a binary operator contains an IN operator with values This commit fixes an issue where a query is not successfully analyzed if an IN operator with values appears in a binary predicate. Change-Id: Ia3b83803a553b9a3b3489382fc53978a720c4b4f Reviewed-on: http://gerrit.cloudera.org:8080/334 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2015-04-14 03:54:33 +00:00
Skye Wanderman-Milne	9d6586cdb8	Addendum to IMPALA-1755 patch This patch introduces SetLookup functionality for timestamp and decimal types, as well as addressing remaining code review comments. Change-Id: Ied40d2d55adbdea891ff2ab97b30f0d3986645f9 Reviewed-on: http://gerrit.cloudera.org:8080/245 Tested-by: Internal Jenkins Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>	2015-03-20 14:37:23 -07:00
Skye Wanderman-Milne	5118c55a0a	IMPALA-1810: IN predicate was not comparing DecimalVals correctly The IN predicate wasn't using the decimal type when comparing decimal values. I benchmarked this on a modified version of TPCDS-Q8 (i.e. a query with a huge decimal IN predicate) and there is a ~5% performance degradation with codegen enabled (surprisingly, there appears to be a slight performance gain with codegen disabled). We should be able to remove this penalty when we add constant injection via codegen. Change-Id: Ie1296fd50c68d06a343701442da49fe8d3cd16dd Reviewed-on: http://gerrit.cloudera.org:8080/230 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Internal Jenkins	2015-03-20 14:37:18 -07:00
casey	dbc504fad1	IMPALA-1579: UNIX_TIMESTAMP() should return BIGINTs instead of INTs This should fix the last y2k38 problem. Previously, calling unix_timestamp() with an input of '2038-01-19 03:14:08' or later would return a negative value due to a 32-bit int overflow. This patch switches from 32-bit to 64-bit ints. Change-Id: Ic9180887d6c828f6ecd25435be86fd0bd52d3f0d Reviewed-on: http://gerrit.cloudera.org:8080/61 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-02-16 00:59:34 +00:00
Alex Behm	f696861c5c	Throw error on unrecognized test sections. Our .test file parser used to not abort tests when there is a malformed test/section. This patch changes that behavior to report an error and treat the test as failed. Quite a few tests were not well-formed, and were not executed as a result. This patch fixes those tests. Arguably, the test file parser should be more flexible in which places to accept comments, but this patch does not address that problem. Change-Id: If53358eb0cb958b68e51940b071e64c1d6c3ec6f Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5468 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-12-02 18:08:09 -08:00
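
The 32-bit overflow described in the IMPALA-1579 entry above can be reproduced outside Impala. The following is a minimal sketch, not Impala's implementation: the function names `unix_timestamp_32` and `unix_timestamp_64` are hypothetical, and the input string is assumed to be a UTC timestamp.

```python
import calendar
import ctypes
from datetime import datetime


def unix_timestamp_64(ts: str) -> int:
    """Seconds since the Unix epoch for a 'YYYY-MM-DD HH:MM:SS' string (assumed UTC)."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return calendar.timegm(dt.timetuple())


def unix_timestamp_32(ts: str) -> int:
    """Same value, but wrapped into a signed 32-bit int to mimic an INT return type."""
    return ctypes.c_int32(unix_timestamp_64(ts)).value


if __name__ == "__main__":
    for ts in ("2038-01-19 03:14:07", "2038-01-19 03:14:08"):
        print(ts, unix_timestamp_32(ts), unix_timestamp_64(ts))
```

The last second that fits in a signed 32-bit integer is 03:14:07 on 2038-01-19; one second later the 32-bit value wraps to a negative number while the 64-bit value keeps counting.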

80 Commits