impala

mirror of https://github.com/apache/impala.git synced 2026-01-04 00:00:56 -05:00

Author	SHA1	Message	Date
Sailesh Mukil	1c46cab5c6	IMPALA-2084: SPLIT_PART and REGEXP_LIKE functions for Tableau pushdown Added the SPLIT_PART and the REGEXP_LIKE builtin functions and tests for both. The REGEXP_LIKE has an optional third parameter which if used, uses a different 'prepare' function (RegexpLikePrepare in like-predicate.cc) so that the appropriate options can be set in the RE2 library. Added a patch for the RE2 library so that the 'dot matches all' option is exposed via the RE2 class. Fixed a bug in the case when the function to be evaluated for the WHERE clause operates on constants, proper cleanup isn't guaranteed on certain edge cases. Change-Id: Ia2a8de9eeb2854100a2d949f612cfaba317c5a7b Reviewed-on: http://gerrit.cloudera.org:8080/501 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2015-08-18 09:07:34 +00:00
Casey Ching	cf60967b7e	IMPALA-1675: Avoid overflow when adding large intervals to TIMESTAMPs It turns out there is a variety of cases where boost incorrectly adds intervals if the interval is at (or beyond) an edge case value. This change defines a max interval and returns NULL if the user supplies an interval beyond the max. Change-Id: I4fb6869be22ab06089b66eeffaea04b0c0880080 Reviewed-on: http://gerrit.cloudera.org:8080/492 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-08-16 12:09:24 +00:00
Tim Armstrong	e151ebaa71	IMPALA-1001: Bit and byte manipulation functions Bit and byte functions for compatibility with Teradata: bitand, bitor, bitxor, bitnot, countset, getbit, setbit, shiftleft, shiftright, rotateleft, rotateright. Interfaces and behavior follow Teradata documentation. All bit* functions are compatible with DB2. bitand only is compatible with Oracle. Change-Id: Idba3fb7beb029de493b602e6279aa68e32688df3	2015-07-28 08:11:01 -07:00
Sailesh Mukil	8a01527bad	IMPALA-2141: UnionNode::GetNext() doesn't check for query errors When a UDF with constant parameters in the select list calls SetError(), it does not fail the query. This is because UnionNode::GetNext() does not check for errors after UnionNode::EvalAndMaterializeExprs() evaluates the expression, which itself does not report the error. Change-Id: I8850cf1a603e320bb23f4a9a4d47600d14590f3a	2015-07-27 22:09:19 -07:00
Tim Armstrong	822cb8f5e2	IMPALA-1660: Netezza compatibility - factorial Implements suffix n! operator for factorial and factorial function. Slightly refactor operators in fe to share code between unary operators. Based partially on work by Arthur Peng <arthur.peng@intel.com>. Change-Id: I71b6c824c59fc5305f16b8c4457805126a1da93b Reviewed-on: http://gerrit.cloudera.org:8080/531 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2015-07-27 19:03:48 +00:00
Sailesh Mukil	c21c080a46	IMPALA-1756: Constant expressions not checked for errors, no state cleanup on exception. Changed the way the function context error message is returned. Also, changed the exception thrown in SingleNodePlanner from IllegalStateException to AnalysisException in case of an exception in registerConjuncts(). This commit follows from: `d497ba6cef` This is a new commit since the previous one was closed before making these changes. Change-Id: Ifa9b7c0884d76b6d7911d8cd80355a8ba13c4c18 Reviewed-on: http://gerrit.cloudera.org:8080/560 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-07-24 19:04:38 +00:00
Sailesh Mukil	6d7bb76e87	IMPALA-1756: Constant filter expressions are not checked for errors and state cleanup is not done before throwing exception. When a builtin has an error (in the constant case), it is checked for but the state cleanup isn't taken care of which results in a DCHECK. When a UDF has an error (in the constant case), the error does not propagate back up the stack due to a lack of error checking in ScalarFnCall::Open() after it calls GetConstVal(). Change-Id: Ib500c84a41df574690369f124044991ed8c82cc1 Reviewed-on: http://gerrit.cloudera.org:8080/537 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2015-07-21 04:01:39 +00:00
Casey Ching	a6d534682b	IMPALA-2086, IMPALA-2090: Avoid boost year/month interval logic Boost handles a couple of edge cases differently than other databases such as Postgres and MySQL when adding year/month intervals to timestamps. This change makes Impala consistent with the other databases. The performance difference was not noticeable (<5% if any). Change-Id: Icb02a06281b53753938cab88e0d28f20709fee06 Reviewed-on: http://gerrit.cloudera.org:8080/489 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-07-20 10:16:54 +00:00
Skye Wanderman-Milne	7801aa499f	Use codegen to inject runtime constants in exprs This patch introduces the function GetConstant(), which is used by expr compute function and UDFs to access query constants. There is a corresponding GetIrConstant() function that returns the IR versions of the same constants. Currently the only implemented constants are the expr's return type and argument types, but other constants can easily be added to these functions. Interpreted expr functions run normally, but cross-compiled functions can be passed to InlineConstants(), which looks for calls to GetConstant() and replaces them with the result of calling GetIrConstant(). I used this technique in the decimal functions that previously were not switching on the type at all. The performance of LeastGreatest() after this patch is the same as it was before it switched on the type. Change-Id: I8b55744551830d894318a7bab6b6f045fb8bed41 Reviewed-on: http://gerrit.cloudera.org:8080/352 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Internal Jenkins	2015-05-15 02:24:04 +00:00
Dimitris Tsirogiannis	d8e5bbe2da	IMPALA-1949: Analysis exception when a binary operator contains an IN operator with values This commit fixes an issue where a query is not successfully analyzed if an IN operator with values appears in a binary predicate. Change-Id: Ia3b83803a553b9a3b3489382fc53978a720c4b4f Reviewed-on: http://gerrit.cloudera.org:8080/334 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2015-04-14 03:54:33 +00:00
Skye Wanderman-Milne	9d6586cdb8	Addendum to IMPALA-1755 patch This patch introduces SetLookup functionality for timestamp and decimal types, as well as addressing remaining code review comments. Change-Id: Ied40d2d55adbdea891ff2ab97b30f0d3986645f9 Reviewed-on: http://gerrit.cloudera.org:8080/245 Tested-by: Internal Jenkins Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>	2015-03-20 14:37:23 -07:00
Skye Wanderman-Milne	5118c55a0a	IMPALA-1810: IN predicate was not comparing DecimalVals correctly The IN predicate wasn't using the decimal type when comparing decimal values. I benchmarked this on a modified version of TPCDS-Q8 (i.e. a query with a huge decimal IN predicate) and there is a ~5% performance degradation with codegen enabled (surprisingly, there appears to be a slight performance gain with codegen disabled). We should be able to remove this penalty when we add constant injection via codegen. Change-Id: Ie1296fd50c68d06a343701442da49fe8d3cd16dd Reviewed-on: http://gerrit.cloudera.org:8080/230 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Internal Jenkins	2015-03-20 14:37:18 -07:00
casey	dbc504fad1	IMPALA-1579: UNIX_TIMESTAMP() should return BIGINTs instead of INTs This should fix the last y2k38 problem. Previously calling unix_timestamp() with an input of '2038-01-19 03:14:08' or later would return a negative value due to a 32-bit int overflow. This patch switches from 32-bit to 64-bit ints. Change-Id: Ic9180887d6c828f6ecd25435be86fd0bd52d3f0d Reviewed-on: http://gerrit.cloudera.org:8080/61 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-02-16 00:59:34 +00:00
Alex Behm	f696861c5c	Throw error on unrecognized test sections. Our .test file parser used to not abort tests when there is a malformed test/section. This patch changes that behavior to report an error and treat the test as failed. Quite a few tests were not well-formed, and were not executed as a result. This patch fixes those tests. Arguably, the test file parser should be more flexible in which places to accept comments, but this patch does not address that problem. Change-Id: If53358eb0cb958b68e51940b071e64c1d6c3ec6f Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5468 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-12-02 18:08:09 -08:00
Skye Wanderman-Milne	2bfb69523f	IMPALA-1508: don't JIT TimestampFunctions::DateAddSub For some reason, the try/catch added to fix IMPALA-1493 doesn't work when we JIT the function. Fixing this in the JIT'd code will take some time, so for now just don't JIT the function. Change-Id: I7b2801027db0a9deb19b477c1a4ca0bdad77a825 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5383 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-11-23 21:36:03 -08:00
Nong Li	e2d7fb6402	Some test case cleanup. Change-Id: Ic29b7c1f5fd714a1e2cc41bf0e55c0d11c782862 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4791 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5090 Reviewed-by: Nong Li <nong@cloudera.com>	2014-11-03 22:33:08 -08:00
Martin Grund	6e0c1c26c9	IMPALA-1424: abs() function retains input type This patch modifies the abs() built-in function so that it retains the type of the input argument for the return type in the same way as Postgres does. Change-Id: I1750237b85bedbc3ce9d52330ac4d458b0aada3a Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4980 Reviewed-by: Martin Grund <mgrund@cloudera.com> Tested-by: jenkins (cherry picked from commit 424b359ab0a4f621f2865844c3293f2c80e0867f) Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4996	2014-10-28 08:07:21 -07:00
casey	9a72c28832	Add DECODE builtin This adds DECODE functionality into the existing CaseExpr class. There will be no separate backend implementation for DECODE; it will be sent to the backend as a CASE expr so the existing codegen function can be used. Because Oracle does cast checking during execution and Impala does cast checking during analysis, some uses of DECODE that are valid in Oracle are invalid in Impala. Ex: SELECT DECODE(foo, bar, int_col, baz, string_col_containing_only_ints) FROM ... would be run on Oracle. If string_col_containing_only_ints actually contained non-INTs, an error would be thrown during execution and no results would be returned. In Impala an error is thrown during analysis. If a CAST was added to the STRING column, a cast failure would result in NULL. Change-Id: Ia08cc2389abb6f843bba117e7091c659ad25ff41 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4334 Tested-by: jenkins Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Casey Ching <casey@cloudera.com>	2014-09-26 12:26:46 -07:00
Skye Wanderman-Milne	559b83d3d0	Expr refactoring This patch changes the interface for evaluating expressions, in order to allow for thread-safe expression evaluations and easier codegen. Thread safety is achieved via the ExprContext class, a light-weight container for expression tree evaluation state. Codegen is easier because more expressions can be cross-compiled to IR. See expr.h and expr-context.h for an overview of the API changes. See sort-exec-exprs.cc for a simple example of the new interface and hdfs-scanner.cc for a more complicated example. This patch has not been completely code reviewed and may need further cleanup/stylistic work, as well as additional perf work. Change-Id: I3e3baf14ebffd2687533d0cc01a6fb8ac4def849 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3459 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-08-17 12:44:44 -07:00
Skye Wanderman-Milne	7a0cc27fd1	Convert math functions to the UDF interface. Also adds FunctionContext::GetNumArgs() method to the public UDF API. Change-Id: I76e21814e423f075a0a22b4e924c1d3ec26daba7 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3410 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-08-17 12:44:32 -07:00
Paden Tomasello	67d23c2d4b	Modified Case expression tests in exprs.test Change-Id: I65cee2e14291db8bf14a428715b08dac475b863a Reviewed-on: http://gerrit.ent.cloudera.com:8080/3485 Reviewed-by: Paden Tomasello <paden.tomasello@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3601	2014-07-24 12:34:02 -07:00
Paden Tomasello	3d173e65d2	Adding Codegen function and tests for CASE expressions. Change-Id: Ib52b3e3f12b35e2c0a60ef94501c20ef83abdfe5 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3187 Reviewed-by: Paden Tomasello <paden.tomasello@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3498	2014-07-18 12:03:58 -07:00
Dimitris Tsirogiannis	5a6f53db16	Add partition pruning tests The following changes are included in this commit: 1. Modified the alltypesagg table to include an additional partition key that has nulls. 2. Added a number of tests in hdfs.test that exercise the partition pruning logic (see IMPALA-887). 3. Modified all the tests that are affected by the change in alltypesagg. Change-Id: I1a769375aaa71273341522eb94490ba5e4c6f00d Reviewed-on: http://gerrit.ent.cloudera.com:8080/2874 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3236	2014-06-24 02:14:27 -07:00
Nong Li	8f4dc0f2f0	IMPALA-974: Switch from FloatLiteral to DecimalLiteral. Float/Doubles are lossy so using those as the default literal type is problematic. Change-Id: I5a619dd931d576e2e6cd7774139e9bafb9452db9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2758 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-05-31 22:19:06 -07:00
Victor Bittorf	c13a1d080e	IMPALA-938: Fix implicit casting in timestamp arithmetic exprs. Change-Id: I7e875ec2251e9782c98b60195ecbc92258b63b5c Reviewed-on: http://gerrit.ent.cloudera.com:8080/2657 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins (cherry picked from commit 8822401dbb65d9b4d996d5bb78ac3aca1aa2dbac) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2671	2014-05-23 14:11:35 -07:00
Victor Bittorf	0bb66ef327	Adding aliases ADD_MONTHS and SUB_MONTHS This is a request for consistency with oracle. Change-Id: I463a66694a068cd773532d8f6f853a4b089b918a Reviewed-on: http://gerrit.ent.cloudera.com:8080/2400 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins (cherry picked from commit 1f0b643789596f96c54580b8c5262fada4dfc958) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2502	2014-05-09 17:35:29 -07:00
Victor Bittorf	46151dc7dd	Adding EXTRACT builtin. Change-Id: I6de20f336ecdfa3acd8d3a9166cff4a062baaacc Reviewed-on: http://gerrit.ent.cloudera.com:8080/2247 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins (cherry picked from commit f233955020ffbd1023f2d6adbbfb22e267986305) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2370	2014-04-25 15:38:51 -07:00
Matthew Jacobs	967346b0c4	IMPALA-630: Add fn to get the PID of the impalad to which the user is connected Change-Id: I2d8b304bfb22883489bbbbe33e07478d164583b9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1127 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-01-08 10:54:37 -08:00
Chris Channing	7e98708d7d	IMPALA-114: Add support for custom date/time formats This change set adds support for dealing with custom date/time formats in Impala. The following date/time tokens are supported: y – Year M – Month d – Day H – Hour m – Minute s – Second S – Fractional second The token names and usage have been modeled on the SimpleDateFormat class used in Java. This allows the use of repeating tokens to indicate zero padding for an output scenario (TS -> String) and a guide for reading data to a given length in a parsing scenario. Representing literal months is achieved by specifying three repeating tokens e.g. yyyy-MMM-dd -> 2013-Nov-21. Formatting character groups can appear in any order along with any separators e.g. yyyy/MM/dd dd-MMM-yy (dd)(MM)(yyyy) HH:mm:sss ..etc.. The following features are not supported with this patch: - Long literal months e.g. MMMM - Nested strings e.g. “Year: “ yyyy “Month: “ mm “Day: “ dd - Lazy formatting Change-Id: Ibba2eaed366fd736b921b31b8d0d517ac1248bca Reviewed-on: http://gerrit.ent.cloudera.com:8080/1001 Reviewed-by: Christopher Channing <cchanning@cloudera.com> Tested-by: Christopher Channing <cchanning@cloudera.com>	2014-01-08 10:54:34 -08:00
Alex Behm	dd0409e9d6	IMPALA-509: Minimal type promotion for arithmetic exprs. Change-Id: I576fe9baf3bae7d46ee08e29ececc4adda97e9df Reviewed-on: http://gerrit.ent.cloudera.com:8080/1078 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:54:30 -08:00
Alex Behm	1497002013	Added SHOW TABLE/COLUMN STATS command. Fixed the following stats-related bugs: - Per-partition row count was not distributed properly via CatalogService - HBase column stats were not loaded and distributed properly Enhancements to test framework: - Allow regex specification of expected row or column values - Fixed expected results of some tests because the test framework did not catch that they were incorrect Change-Id: I1fa8e710bbcf0ddb62b961fdd26ecd9ce7b75d51 Reviewed-on: http://gerrit.ent.cloudera.com:8080/813 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:51 -08:00
Alex Behm	33000b8c15	Fixed codegen of floating-point modulo. Change-Id: Idd28c6a71a659471aa632a6e26d970557daeb3bf Reviewed-on: http://gerrit.ent.cloudera.com:8080/385 Tested-by: jenkins Reviewed-by: Marcel Kornacker <marcel@cloudera.com>	2014-01-08 10:52:46 -08:00
Nong Li	ce092065be	Fix bug with how exec sets if the conjuncts are thread safe.	2014-01-08 10:50:53 -08:00
Skye Wanderman-Milne	0c343913fa	IMPALA-266: Round() does not output the right precision	2014-01-08 10:50:02 -08:00
Skye Wanderman-Milne	04bee45af5	Update query test to use dayofyear()	2014-01-08 10:49:47 -08:00
Alex Behm	1b2e8280d4	Fix NULL issues.	2014-01-08 10:49:32 -08:00
Lenni Kuff	5f81becd84	Create tables used by insert tests in a supported insert format	2014-01-08 10:49:00 -08:00
Henry Robinson	8d87972695	Improve parser coverage This patch adds support for the following SQL constructs - Unary + operator - The ALL keyword, in SELECT ALL and SELECT aggregate_func(ALL ) - REAL and INTEGER as type synonyms for DOUBLE and INT respectively - The AS keyword after a table spec. e.g. SELECT FROM tbl AS t0	2014-01-08 10:48:54 -08:00
ishaan	09d6d931f4	Change the way data is loaded	2014-01-08 10:48:09 -08:00
Lenni Kuff	ef48f65e76	Add test framework for running Impala query tests via Python This is the first set of changes required to start getting our functional test infrastructure moved from JUnit to Python. After investigating a number of options, I decided to go with a python test executor named py.test (http://pytest.org/). It is very flexible, open source (MIT licensed), and will enable us to do some cool things like parallel test execution. As part of this change, we now use our "test vectors" for query test execution. This will be very nice because it means if you load the "core" dataset you know you will be able to run the "core" query tests (specified by --exploration_strategy when running the tests). You will see that now each combination of table format + query exec options is treated like an individual test case. This will make it much easier to debug exactly where something failed. These new tests can be run using the script at tests/run-tests.sh	2014-01-08 10:46:50 -08:00
Marcel Kornacker	ea050a43ad	Switching over backend runtime structures to new planner. Added container-util.h	2014-01-08 10:46:20 -08:00
Nong Li	4d0319d32b	Fix null string parsing.	2014-01-08 10:44:40 -08:00
Alexander Behm	ee705e3083	Added timestamp arithmetic expressions.	2014-01-08 10:44:31 -08:00
Lenni Kuff	04edc8f534	Update benchmark tests to run against generic workload, data loading with scale factor, +more This change updates the run-benchmark script to enable it to target one or more workloads. Now benchmarks can be run like: ./run-benchmark --workloads=hive-benchmark,tpch We look up the workload in the workloads directory, then read the associated query .test files and start executing them. To ensure the queries are not duplicated between benchmark and query tests, I moved all existing queries (under fe/src/test/resources/*) to the workloads directory. You do NOT need to look through all the .test files, I've just moved them. The one new file is the 'hive-benchmark.test' which contains the hive benchmark queries. Also added support for generating schema for different scale factors as well as executing against these scale factors. For example, let's say we have a dataset with a scale factor called "SF1". We would first generate the schema using: ./generate_schema_statements --workload=<workload> --scale_factor="SF3" This will create tables with names that are unique from the other scale factors. Run the generated .sql file to load the data. Alternatively, the data can be loaded by running a new python script: ./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor] For example: load-data.sh -w tpch -e core -s SF3 Then run against this: ./run-benchmark --workloads=<workload> --scale_factor=SF3 This changeset also includes a few other minor tweaks to some of the test scripts. Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6	2014-01-08 10:44:22 -08:00

44 Commits
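IMPALA-2084 (1c46cab5c6) adds SPLIT_PART and a REGEXP_LIKE that takes an optional options argument. The Python sketch below illustrates what those two functions are meant to do. It rests on several assumptions:

- SPLIT_PART is modeled on PostgreSQL's function of the same name: the field index is 1-based, and an index past the last field yields an empty string.
- The REGEXP_LIKE option letters c/i/m/n are assumed to follow Impala's documented options. Here `n` stands in for RE2's "dot matches all" setting that the commit exposes.
- Python's `re` module substitutes for RE2, so this is an analogy rather than the Impala implementation.

```python
import re

def split_part(source: str, delimiter: str, index: int) -> str:
    """Return the index-th field (1-based) of source split on delimiter.

    Assumption: PostgreSQL-style semantics; an index past the last field
    yields ''. Non-positive indexes are simply rejected in this sketch.
    """
    if index < 1:
        raise ValueError("index must be >= 1")
    if delimiter == "":
        # str.split rejects an empty separator; treat the string as one field.
        fields = [source]
    else:
        fields = source.split(delimiter)
    return fields[index - 1] if index <= len(fields) else ""


# Assumed option letters for the optional third REGEXP_LIKE argument.
_OPTION_FLAGS = {
    "c": 0,                   # case-sensitive (default); clears 'i' below
    "i": int(re.IGNORECASE),  # case-insensitive
    "m": int(re.MULTILINE),   # ^ and $ also match at line breaks
    "n": int(re.DOTALL),      # '.' also matches newline ("dot matches all")
}

def regexp_like(source: str, pattern: str, options: str = "") -> bool:
    """True if pattern matches anywhere in source (a partial match)."""
    flags = 0
    for opt in options:
        if opt not in _OPTION_FLAGS:
            raise ValueError(f"unsupported option: {opt!r}")
        if opt == "c":
            flags &= ~int(re.IGNORECASE)  # a later 'c' cancels an earlier 'i'
        else:
            flags |= _OPTION_FLAGS[opt]
    return re.search(pattern, source, flags) is not None


print(split_part("a,b,,c", ",", 4))             # c
print(regexp_like("foo\nbar", "foo.bar"))       # False
print(regexp_like("foo\nbar", "foo.bar", "n"))  # True
```

Without `n`, `.` stops at the newline, which is why the commit needed to patch RE2 to expose that option.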
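IMPALA-1001 (e151ebaa71) adds Teradata-compatible bit functions. It defers exact behavior to the Teradata documentation, so the Python sketch below is only a rough model of three of them (countset, getbit, rotateleft). It rests on these assumptions:

- Each function works on the two's-complement bit pattern of a fixed-width integer type, 8 bits for TINYINT here.
- Bit position 0 is the least significant bit.
- Rotation wraps within the type's width.
- Treating negative positions as right rotations is this sketch's own choice.

```python
def _to_unsigned(value: int, width: int) -> int:
    """Two's-complement bit pattern of value in a width-bit integer."""
    return value & ((1 << width) - 1)

def _to_signed(bits: int, width: int) -> int:
    """Interpret a width-bit pattern as a signed two's-complement value."""
    return bits - (1 << width) if bits & (1 << (width - 1)) else bits

def countset(value: int, width: int = 8) -> int:
    """Number of 1 bits in the width-bit representation of value."""
    return bin(_to_unsigned(value, width)).count("1")

def getbit(value: int, position: int, width: int = 8) -> int:
    """Bit at position (0 = least significant, an assumption here)."""
    if not 0 <= position < width:
        raise ValueError("position out of range")
    return (_to_unsigned(value, width) >> position) & 1

def rotateleft(value: int, positions: int, width: int = 8) -> int:
    """Rotate left within width bits; bits shifted out re-enter at the low end."""
    positions %= width  # in this sketch a negative count becomes a right rotation
    bits = _to_unsigned(value, width)
    mask = (1 << width) - 1
    rotated = ((bits << positions) | (bits >> (width - positions))) & mask
    return _to_signed(rotated, width)

print(countset(-1))         # 8
print(getbit(5, 2))         # 1
print(rotateleft(64, 1))    # -128
print(rotateleft(-128, 1))  # 1
```

The last two lines show why the type width matters: a 1 bit rotated into the sign position makes a TINYINT negative, and rotating it once more brings it back to the low end.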
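IMPALA-2086/IMPALA-2090 (a6d534682b) do not name the edge cases where Boost differs. One well-known difference is the end-of-month snapping documented for Boost.Date_Time's `months` arithmetic: a start date on the last day of a month lands on the last day of the target month. PostgreSQL and MySQL instead keep the day of month and clamp it to the target month's length.

The Python sketch below contrasts the two rules. It is an assumption, not something the commit confirms, that this is one of the cases being fixed. Year intervals are treated here as multiples of 12 months.

```python
import calendar
from datetime import date

def _last_day(year: int, month: int) -> int:
    return calendar.monthrange(year, month)[1]

def add_months_clamped(d: date, months: int) -> date:
    """PostgreSQL/MySQL-style: keep the day of month, clamped to the target month."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)  # floor division also handles negatives
    month = month_index + 1
    return date(year, month, min(d.day, _last_day(year, month)))

def add_months_snap_to_end(d: date, months: int) -> date:
    """Boost-style sketch: a month-end start date always lands on a month end."""
    result = add_months_clamped(d, months)
    if d.day == _last_day(d.year, d.month):
        result = result.replace(day=_last_day(result.year, result.month))
    return result

start = date(2015, 2, 28)                          # last day of February 2015
print(add_months_clamped(start, 1))                # 2015-03-28
print(add_months_snap_to_end(start, 1))            # 2015-03-31
print(add_months_clamped(date(2016, 2, 29), 12))   # 2017-02-28
```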
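The y2k38 overflow fixed by IMPALA-1579 (dbc504fad1) comes down to simple arithmetic. Read as UTC (an assumption of this sketch), '2038-01-19 03:14:08' is exactly 2^31 seconds after the Unix epoch, one more than the largest signed 32-bit value. The Python sketch below emulates storing that count in a 32-bit int.

Strictly, signed overflow is undefined behavior in C++. The two's-complement wraparound shown here is just what typically happens, and it produces the negative value the commit describes.

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1

def to_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed two's-complement int."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value

def unix_seconds(ts: datetime) -> int:
    """Seconds since the Unix epoch, treating a naive ts as UTC."""
    return int(ts.replace(tzinfo=timezone.utc).timestamp())

last_ok = datetime(2038, 1, 19, 3, 14, 7)
first_bad = datetime(2038, 1, 19, 3, 14, 8)

for ts in (last_ok, first_bad):
    seconds = unix_seconds(ts)
    print(ts, seconds, to_int32(seconds))
# 2038-01-19 03:14:07 2147483647 2147483647
# 2038-01-19 03:14:08 2147483648 -2147483648
```

A 64-bit result, as in the BIGINT change, holds 2147483648 without wrapping.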
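IMPALA-114 (7e98708d7d) describes SimpleDateFormat-style patterns. A run of one repeated letter forms a single token whose length sets the zero padding, and three M's give an abbreviated month name. The Python sketch below shows that tokenize-then-format idea, limited to the letters the commit lists. It rests on these assumptions:

- `yy` renders as a two-digit year.
- `S` renders as the leading digits of the fractional second.
- Month abbreviations are hard-coded in English.

None of these is confirmed Impala behavior.

```python
from datetime import datetime
from itertools import groupby

# Pattern letters listed in the IMPALA-114 commit message.
FIELDS = {"y": "year", "M": "month", "d": "day", "H": "hour",
          "m": "minute", "s": "second", "S": "fraction"}
# Hard-coded English abbreviations, so output does not depend on the locale.
MONTH_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def tokenize(fmt: str):
    """Group runs of one character: pattern letters become (field, length)."""
    tokens = []
    for ch, run in groupby(fmt):
        length = len(list(run))
        if ch in FIELDS:
            tokens.append((FIELDS[ch], length))
        else:
            tokens.append(("literal", ch * length))
    return tokens

def format_ts(ts: datetime, fmt: str) -> str:
    out = []
    for kind, arg in tokenize(fmt):
        if kind == "literal":
            out.append(arg)
        elif kind == "month" and arg == 3:
            out.append(MONTH_ABBR[ts.month - 1])
        elif kind == "year" and arg == 2:
            out.append(f"{ts.year % 100:02d}")
        elif kind == "fraction":
            # Leading `arg` digits of the fractional second (microsecond precision).
            out.append(f"{ts.microsecond:06d}".ljust(arg, "0")[:arg])
        else:
            value = {"year": ts.year, "month": ts.month, "day": ts.day,
                     "hour": ts.hour, "minute": ts.minute,
                     "second": ts.second}[kind]
            out.append(str(value).zfill(arg))  # repeat count = zero-padded width
    return "".join(out)

ts = datetime(2013, 11, 21, 9, 5, 7)
print(format_ts(ts, "yyyy-MMM-dd"))     # 2013-Nov-21
print(format_ts(ts, "(dd)(MM)(yyyy)"))  # (21)(11)(2013)
print(format_ts(ts, "HH:mm:ss"))        # 09:05:07
```

Grouping runs first is what lets separators and field order vary freely, as in the commit's `(dd)(MM)(yyyy)` example.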