156 Commits

Author SHA1 Message Date
Taras Bobrovytsky
8fec1911e5 IMPALA-6230, IMPALA-6468: Fix the output type of round() and related fns
Before this patch, the output type of round(), ceil(), floor(), and trunc() was
not always the same as the input type. It was also inconsistent in
general. For example, round(double) returned an integer, but
round(double, int) returned a double.

After looking at other database systems, we decided on the guideline
that the output type should be the same as the input type. In
this patch, we change the behavior of the previously mentioned functions
so that if a double is given, a double is returned.

We also modify the rounding behavior to always round away from zero.
Before, we were rounding towards positive infinity in some cases.
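A minimal sketch of the intended DOUBLE semantics, written in Python rather than Impala's C++; the name round_double() is hypothetical. The output stays a float, and ties round away from zero (in Python's decimal module, ROUND_HALF_UP means exactly that). Going through str(x) treats the float's shortest decimal repr as exact, which is a simplifying assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_double(x: float, digits: int = 0) -> float:
    """Round a DOUBLE to `digits` decimal places, ties away from zero.

    Returns a float for float input, so the output type matches the
    input type (illustrative sketch, not Impala's implementation).
    """
    quantum = Decimal(1).scaleb(-digits)  # 10 ** -digits, e.g. Decimal('0.1')
    # str(x) is the shortest repr, so 2.5 is treated as exactly 2.5.
    return float(Decimal(str(x)).quantize(quantum, rounding=ROUND_HALF_UP))

for v in (2.5, -2.5, 0.5, -0.5, 1.25):
    print(v, round_double(v), round_double(v, 1))
```

By contrast, Python's built-in round() rounds ties to even, so round(2.5) == 2.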

Testing:
- Updated tests
- Ran an exhaustive build which passed.

Cherry-picks: not for 2.x

Change-Id: I77541678012edab70b182378b11ca8753be53f97
Reviewed-on: http://gerrit.cloudera.org:8080/9346
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2018-03-24 04:43:01 +00:00
Greg Rahn
d91df9b63f IMPALA-6537: Add missing ODBC scalar functions
This patch contains the following builtin function changes:

New aliases for existing functions:
- LEFT() same as STRLEFT()
- RIGHT() same as STRRIGHT()
- WEEK() same as WEEKOFYEAR()

New functions:
- QUARTER()
- MONTHNAME()

Refactors:
- Remove TimestampFunctions::DayName and add LongDayName to match pattern of
  TimestampFunctions::ShortDayName

Additionally, it adds QUARTER as a unit for EXTRACT() and DATE_PART().
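A sketch of the month arithmetic behind QUARTER() and MONTHNAME(), assuming MONTHNAME() returns the full English month name; the Python functions below are illustrative, not Impala's TimestampFunctions code.

```python
from datetime import datetime

MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)

def quarter(ts: datetime) -> int:
    # Months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4.
    return (ts.month - 1) // 3 + 1

def monthname(ts: datetime) -> str:
    # Fixed English names, independent of the process locale.
    return MONTH_NAMES[ts.month - 1]

print(quarter(datetime(2018, 2, 23)), monthname(datetime(2018, 2, 23)))  # 1 February
```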

Testing:
- manual testing comparing the translated ODBC functions to the
  non-translated ones
- added at least one new expr-test for aliases
- new expr-tests added for new functions

Change-Id: Ia60af2b4de8c098be7ecb3e60840e459ae10d499
Reviewed-on: http://gerrit.cloudera.org:8080/9376
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2018-02-23 07:19:07 +00:00
Jinchul
1b1087eb05 IMPALA-3282: Adds regexp_escape built-in function
Escapes the following characters, which are special in the RE2 library:
.\+*?[^]$(){}=!<>|:-
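A minimal sketch of the escaping rule: every character from the list above gets a backslash prefix. Python's re module is used only to show that the escaped pattern then matches the original text literally; it is not RE2, and regexp_escape() here is an illustration rather than Impala's code.

```python
import re

# The special characters listed above (raw string keeps the backslash).
RE2_SPECIAL = set(r".\+*?[^]$(){}=!<>|:-")

def regexp_escape(s: str) -> str:
    """Prefix every RE2 special character in `s` with a backslash."""
    return "".join("\\" + ch if ch in RE2_SPECIAL else ch for ch in s)

escaped = regexp_escape("1+1=2?")
print(escaped)                                # 1\+1\=2\?
print(bool(re.fullmatch(escaped, "1+1=2?")))  # True
```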

Testing:
Added unit tests to ExprTest.StringRegexpFunctions
Added E2E tests to exprs.test

Change-Id: I84c3e0ded26f6eb20794c38b75be9b25cd111e4b
Reviewed-on: http://gerrit.cloudera.org:8080/8900
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-02-01 05:14:14 +00:00
Zoram Thanga
c2d27ca823 IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
This patch generalizes ltrim()/rtrim() functions to accept a second
argument that specifies the set of characters to be removed from the
leading/trailing end of the target string:

ltrim(string text[, characters text])
rtrim(string text[, characters text])

A common string trimming method has been added to StringFunctions,
which is called from the general ltrim/rtrim/btrim functions. The
functions also share prepare and close operations.
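A minimal sketch of one shared trimming helper backing ltrim/rtrim/btrim, assuming a single space as the default character set; _trim() and its keyword flags are hypothetical names. Like the patch, it treats the input as plain characters (ASCII in Impala's case) and removes any character that appears in the set, not a fixed substring.

```python
def _trim(s: str, chars: str, leading: bool, trailing: bool) -> str:
    """Shared helper: strip any character in `chars` from the chosen ends."""
    start, end = 0, len(s)
    if leading:
        while start < end and s[start] in chars:
            start += 1
    if trailing:
        while end > start and s[end - 1] in chars:
            end -= 1
    return s[start:end]

def ltrim(s: str, chars: str = " ") -> str:
    return _trim(s, chars, leading=True, trailing=False)

def rtrim(s: str, chars: str = " ") -> str:
    return _trim(s, chars, leading=False, trailing=True)

def btrim(s: str, chars: str = " ") -> str:
    return _trim(s, chars, leading=True, trailing=True)

print(ltrim("xxhixx", "x"), rtrim("xxhixx", "x"), btrim("  hi  "))
```

Python's own str.lstrip(chars), str.rstrip(chars), and str.strip(chars) follow the same set-of-characters semantics.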

New StringFunctions tests have been added to ExprTest for the new
forms of ltrim() and rtrim(). New tests to cover handling of special
characters have also been added.

Note that our string handling functions only work with the ASCII
character set. Handling of other character sets lies outside the
scope of this patch.

The existing ltrim()/rtrim()/trim() functions that take only one
argument have been updated to use the more general methods.

Testing: Queries like the following were run on a 1.5-billion-row
tpch_parquet.lineitem table, with the old and new implementations
to ensure there is no performance regression:

  1. select count(trim(l_shipinstruct)), count(trim(l_returnflag)), ...
  2. select count(*) from t where trim(l_shipinstruct) = '' and ...

Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Reviewed-on: http://gerrit.cloudera.org:8080/8349
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-23 23:44:46 +00:00
Jinchul
6041865031 IMPALA-3651: Adds murmur_hash() built-in function
murmur_hash() relies on HashUtil::MurmurHash2_64, which implements the
64-bit version of MurmurHash2.
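For reference, a pure-Python sketch of MurmurHash64A, Austin Appleby's 64-bit MurmurHash2 variant; that this is the variant behind HashUtil::MurmurHash2_64 is an assumption. The sketch assumes little-endian 8-byte reads, as on x86, and a seed of 0; the seed Impala actually passes is not stated here, so values need not match murmur_hash() output. The unsigned result is reinterpreted as signed, as a BIGINT would hold it.

```python
MASK64 = (1 << 64) - 1
M = 0xC6A4A7935BD1E995  # MurmurHash64A multiplier
R = 47                  # MurmurHash64A shift

def murmur_hash2_64(data: bytes, seed: int = 0) -> int:
    """MurmurHash64A of `data`, reinterpreted as a signed 64-bit integer."""
    length = len(data)
    h = (seed ^ (length * M)) & MASK64
    n_blocks = length // 8
    for i in range(n_blocks):
        # Each 8-byte block is read as a little-endian unsigned 64-bit word.
        k = int.from_bytes(data[8 * i:8 * i + 8], "little")
        k = (k * M) & MASK64
        k ^= k >> R
        k = (k * M) & MASK64
        h ^= k
        h = (h * M) & MASK64
    tail = data[8 * n_blocks:]
    if tail:
        # Fold in the remaining 1..7 bytes, lowest byte first.
        for i, b in enumerate(tail):
            h ^= b << (8 * i)
        h = (h * M) & MASK64
    # Final avalanche.
    h ^= h >> R
    h = (h * M) & MASK64
    h ^= h >> R
    return h - (1 << 64) if h >= (1 << 63) else h

print(murmur_hash2_64("impala".encode("utf-8")))
```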

Testing:
Added unit tests for primitive types: ExprTest.MurmurHashFunction
Added E2E tests to exprs.test

Change-Id: I14d56ffb8fab256f3f66a2669271fd4b3c50cc29
Reviewed-on: http://gerrit.cloudera.org:8080/8893
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-10 20:17:26 +00:00
Zachary Amsden
f53ce3b16d IMPALA-4513: Promote integer types for ABS()
In two's complement, the absolute value of the most negative
number of a type needs one more bit than that type provides.
This means ABS() must promote integer types to the next wider
type.
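A small sketch of the overflow, modeling fixed-width two's-complement integers in Python; wrap() and abs_same_width() are illustrative helpers. Taking the absolute value of TINYINT -128 within 8 bits wraps back to -128, while the same computation at 16 bits (SMALLINT) gives 128. The commit does not say what BIGINT is promoted to, so it is left out.

```python
def wrap(value: int, bits: int) -> int:
    """Interpret `value` modulo 2**bits as a two's-complement signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value

def abs_same_width(x: int, bits: int) -> int:
    # ABS computed without widening: the result must fit in `bits` bits.
    return wrap(-x if x < 0 else x, bits)

# Width of each integer type and the width of the type ABS() promotes it to.
PROMOTE = {"TINYINT": (8, 16), "SMALLINT": (16, 32), "INT": (32, 64)}

for name, (bits, wider) in PROMOTE.items():
    most_negative = -(1 << (bits - 1))
    print(name, abs_same_width(most_negative, bits), abs_same_width(most_negative, wider))
```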

Change-Id: I86cc880e78258d5f90471bd8af4caeb4305eed77
Reviewed-on: http://gerrit.cloudera.org:8080/8004
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-09-23 02:41:32 +00:00
Philip Zeyliger
02302b7cfe IMPALA-5211: Simplifying nullif conditional.
This commit:
* Converts nullif(x, y) into if(x IS DISTINCT FROM y, x, NULL).
* Rewrites x IS DISTINCT FROM y -> FALSE if x.equals(y).
* Removes backend implementation of nullif.

As is the case with all conversions, the original nullif(...) is
replaced with if(...) in error messages, explain plans, and so on.

It's important and subtle that the conversion uses "x IS DISTINCT FROM y"
rather than "x != y" so that the simplification can be made while
handling null values correctly. ("x != x" may be either false or NULL,
but "x IS DISTINCT FROM x" is always false.)
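A sketch of the three-valued logic behind the rewrite, with Python's None standing in for SQL NULL; the function names are illustrative. is_distinct_from(x, x) is False for every x, including None, which is what makes the x.equals(y) simplification safe, whereas ne(x, x) can be None.

```python
from typing import Any, Optional

def ne(x: Any, y: Any) -> Optional[bool]:
    """SQL `x != y`: NULL (None) if either side is NULL."""
    if x is None or y is None:
        return None
    return x != y

def is_distinct_from(x: Any, y: Any) -> bool:
    """SQL `x IS DISTINCT FROM y`: never NULL; two NULLs are not distinct."""
    if x is None and y is None:
        return False
    if x is None or y is None:
        return True
    return x != y

def nullif(x: Any, y: Any) -> Any:
    # The rewrite: if(x IS DISTINCT FROM y, x, NULL).
    return x if is_distinct_from(x, y) else None

print(ne(None, None), is_distinct_from(None, None), nullif(1, 1), nullif(1, 2))
```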

Testing:
* Added new tests to ExprRewriteRulesTests for nullif and the
  if(x IS DISTINCT FROM y, ...) simplification.
* New test for the rewrite in ParserTest.
* Added an nvl2() test, incidentally.
* Confirmed (using EclEmma, which uses the JaCoCo engine) that coverage is good.
* Ran the tests.

Change-Id: Id91ca968a0c0be44e1ec54ad8602f91a5cb2e0e5
Reviewed-on: http://gerrit.cloudera.org:8080/7829
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-09-15 22:48:52 +00:00
Sandeep Akinapelli
f538b43911 IMPALA-5317: add DATE_TRUNC() function
Added the builtin function date_trunc().
Reuses many of the Trunc functions already implemented for trunc(), including
the truncate units, but not strToTruncUnit.
Added checks to ensure that truncation results that fall outside of
the POSIX timestamp range are returned as NULL.
Added a ctest for the date_trunc function.
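A sketch of DATE_TRUNC() semantics for a subset of units, under stated assumptions: the (unit, timestamp) argument order follows the PostgreSQL convention, weeks start on Monday, and MIN_TS is a hypothetical lower bound for the supported TIMESTAMP range. Results below that bound come back as None, mirroring the NULL checks described above.

```python
from datetime import datetime, timedelta
from typing import Optional

MIN_TS = datetime(1400, 1, 1)  # hypothetical lower bound of the valid range

def date_trunc(unit: str, ts: datetime) -> Optional[datetime]:
    unit = unit.lower()
    if unit == "year":
        result = ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    elif unit == "month":
        result = ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    elif unit == "week":
        # Midnight of the same or preceding Monday (weekday() == 0 is Monday).
        monday = ts - timedelta(days=ts.weekday())
        result = monday.replace(hour=0, minute=0, second=0, microsecond=0)
    elif unit == "day":
        result = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elif unit == "hour":
        result = ts.replace(minute=0, second=0, microsecond=0)
    elif unit == "minute":
        result = ts.replace(second=0, microsecond=0)
    elif unit == "second":
        result = ts.replace(microsecond=0)
    else:
        raise ValueError(f"unsupported unit: {unit}")
    # Truncated values outside the supported range become NULL (None).
    return result if result >= MIN_TS else None

print(date_trunc("week", datetime(2017, 9, 7, 1, 29, 1)))
```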

Change-Id: I953ba006cbb166dcc78e8c0c12dfbf70f093b584
Reviewed-on: http://gerrit.cloudera.org:8080/7313
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-09-07 01:29:01 +00:00
Jinchul
15e6cf8fd0 IMPALA-5529: Add additional function signatures for TRUNC()
The following signatures are added:
+--------------+----------------------------------+-------------+---------------+
| return type  | signature                        | binary type | is persistent |
+--------------+----------------------------------+-------------+---------------+
| DECIMAL(*,*) | trunc(DECIMAL(*,*))              | BUILTIN     | true          |
| DECIMAL(*,*) | trunc(DECIMAL(*,*), BIGINT)      | BUILTIN     | true          |
| DECIMAL(*,*) | trunc(DECIMAL(*,*), INT)         | BUILTIN     | true          |
| DECIMAL(*,*) | trunc(DECIMAL(*,*), SMALLINT)    | BUILTIN     | true          |
| DECIMAL(*,*) | trunc(DECIMAL(*,*), TINYINT)     | BUILTIN     | true          |
| BIGINT       | trunc(DOUBLE)                    | BUILTIN     | true          |
+--------------+----------------------------------+-------------+---------------+
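A sketch of what the new signatures compute, using Python's decimal module; trunc_decimal() and trunc_double() are illustrative names, and only non-negative digit counts are exercised here. ROUND_DOWN in the decimal module truncates toward zero, so negative values lose digits the same way positive ones do.

```python
import math
from decimal import Decimal, ROUND_DOWN

def trunc_decimal(d: Decimal, digits: int = 0) -> Decimal:
    """Drop digits beyond `digits` decimal places, rounding toward zero."""
    quantum = Decimal(1).scaleb(-digits)  # e.g. digits=2 -> Decimal('0.01')
    return d.quantize(quantum, rounding=ROUND_DOWN)

def trunc_double(x: float) -> int:
    """trunc(DOUBLE) -> BIGINT as listed in the table: integer toward zero."""
    return math.trunc(x)

print(trunc_decimal(Decimal("-123.456"), 1), trunc_double(-2.7))
```

The trunc(DOUBLE) -> BIGINT row reflects the behavior at the time; the IMPALA-6230 entry at the top of this log later made double inputs return DOUBLE.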

Tests:
* Adds tests for the new builtin trunc()/dtrunc()

Change-Id: I856da9f817b948de3c72af60a0742b128398b4cf
Reviewed-on: http://gerrit.cloudera.org:8080/7450
Tested-by: Impala Public Jenkins
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
2017-07-29 20:53:45 +00:00
Matthew Jacobs
7a1ff1e5e9 IMPALA-5539: Fix Kudu timestamp with -use_local_tz_for_unix_ts
The -use_local_tz_for_unix_timestamp_conversion flag exists
to specify if TIMESTAMPs should be interpreted as localtime
or UTC when converting to/from Unix time via builtins:
  from_unixtime(bigint unixtime)
  unix_timestamp(string datetime[, ...])
  unix_timestamp(timestamp datetime)

However, the KuduScanner was calling into code that, when
the gflag above was set, interpreted Unix times as local
time.  Unfortunately the write path (KuduTableSink) and some
FE TIMESTAMP code (see KuduUtil.java) did not have this
behavior, i.e. we were handling the gflag inconsistently.
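A sketch of a conversion that never consults the local time zone, which is the consistent behavior the fix is after. The Python helper borrows the name of the new unix_micros_to_utc_timestamp() builtin but is only an illustration; it uses integer arithmetic relative to the epoch, so the result is UTC regardless of any flag.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # naive datetime, interpreted as UTC

def unix_micros_to_utc_timestamp(micros: int) -> datetime:
    # Pure offset from the epoch: no local time zone is involved,
    # whatever -use_local_tz_for_unix_timestamp_conversion is set to.
    return EPOCH + timedelta(microseconds=micros)

print(unix_micros_to_utc_timestamp(1_500_000_000_000_000))  # 2017-07-14 02:40:00
```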

Tests:
* Adds a custom cluster test to run Kudu test cases with
  -use_local_tz_for_unix_timestamp_conversion.
* Adds tests for the new builtin
  unix_micros_to_utc_timestamp() which run in a custom
  cluster test (added test_local_tz_conversion.py) as well
  as in the regular tests (added to test_exprs.py).

Change-Id: I423a810427353be76aa64442044133a9a22cdc9b
Reviewed-on: http://gerrit.cloudera.org:8080/7311
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-07-19 22:17:13 +00:00
Bikramjeet Vig
9037b8e385 IMPALA-3504: UDF for current timestamp in UTC
This change adds a UDF "utc_timestamp" which returns the current
date and time in UTC. Example query:

select utc_timestamp();

+-------------------------------+
| utc_timestamp()               |
+-------------------------------+
| 2017-06-15 17:36:39.290773000 |
+-------------------------------+

Change-Id: I969fc805922f2bb9c8101e84f85ff2cc3b1b6729
Reviewed-on: http://gerrit.cloudera.org:8080/7203
Tested-by: Impala Public Jenkins
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
2017-07-06 23:04:28 +00:00
Vincent Tran
d5b6cb903d IMPALA-5316: Adds last_day() function
This change adds the last_day() function.
The function takes exactly one TIMESTAMP argument
and returns a TIMESTAMP that is the last day of the
input date's calendar month.
The function will return NULL when:
  1) The input argument cannot be implicitly cast to
     a TIMESTAMP.
  2) The TIMESTAMP argument is missing a date component.
  3) The TIMESTAMP argument is outside of the supported range:
         between 1400-01-31 00:00:00 and 9999-12-31 23:59:59
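A sketch of last_day(), assuming the result's time of day is reset to midnight; the range checks listed above are omitted for brevity, and None stands in for NULL.

```python
import calendar
from datetime import datetime
from typing import Optional

def last_day(ts: Optional[datetime]) -> Optional[datetime]:
    if ts is None:  # NULL input (e.g. a failed implicit cast) -> NULL
        return None
    # monthrange() returns (weekday of the 1st, number of days in the month).
    days_in_month = calendar.monthrange(ts.year, ts.month)[1]
    # Assumption: the time-of-day component is reset to midnight.
    return datetime(ts.year, ts.month, days_in_month)

print(last_day(datetime(2016, 2, 10, 13, 5)))  # 2016-02-29 00:00:00
```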

Change-Id: I429c8734bddca3c37a2eedc211a16a4ffcb04370
Reviewed-on: http://gerrit.cloudera.org:8080/6991
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
2017-06-15 04:51:49 +00:00
Matthew Jacobs
2dcbefc652 IMPALA-5338: Fix Kudu timestamp column default values
While support for TIMESTAMP columns in Kudu tables has been
committed (IMPALA-5137), it does not support TIMESTAMP
column default values.

This adds CREATE TABLE syntax for specifying the default
values, but more importantly it fixes the loading of Kudu
tables that may have had default values set on
UNIXTIME_MICROS columns, e.g. if the table was created via
the python client. This involves fixing KuduColumn to hide
the LiteralExpr representing the default value because it
will be a BIGINT if the column type is TIMESTAMP. It is only
needed to call toSql() and toStringValue(), so helper
functions are added to KuduColumn to encapsulate special
logic for TIMESTAMP.

TODO: Add support and tests for ALTER setting the default
value (when IMPALA-4622 is committed).

Change-Id: I655910fb4805bb204a999627fa9f68e43ea8aaf2
Reviewed-on: http://gerrit.cloudera.org:8080/6936
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
2017-06-02 01:47:48 +00:00
Matthew Jacobs
24c77f194b IMPALA-5137: Support pushing TIMESTAMP predicates to Kudu
This change builds on the support for reading and writing
TIMESTAMP columns to Kudu tables (see [1]), adding support
for pushing TIMESTAMP predicates to Kudu for scans.

Binary predicates and IN list predicates are supported.

Testing: Added some planner and EE tests to validate the
behavior.

1: https://gerrit.cloudera.org/#/c/6526/

Change-Id: I08b6c8354a408e7beb94c1a135c23722977246ea
Reviewed-on: http://gerrit.cloudera.org:8080/6789
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-18 21:09:51 +00:00
Zach Amsden
0715a303ea IMPALA-4729: Implement REPLACE()
This turned out to be slightly non-trivial because REPLACE is already a
keyword and function names act as bare identifiers, so the parser
needs to be tweaked to allow this.

It was difficult to get this to match the performance of regexp_replace.
For expanding patterns, the fact that regexp_replace copies the
expansion inline means that it may in fact win on large strings
with sparse matches that are more than the data cache (dcache) size
apart. Let's leave optimizing that for later.

Testing: Added a full test for maximum-size strings and covered most
of the boundary conditions I could identify. Manually ran queries
on the TPC-H dataset in Impala to verify both performance and correctness.
Added large string and exprs.test test clauses and ran the tests to
verify they work as expected.

Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Reviewed-on: http://gerrit.cloudera.org:8080/5776
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
2017-02-15 01:33:23 +00:00
Zoltan Ivanfi
64ffbc4b51 IMPALA-3973: add position and occurrence to instr()
Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Reviewed-on: http://gerrit.cloudera.org:8080/4094
Tested-by: Internal Jenkins
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
2016-09-13 20:28:27 +00:00
Zoltan Ivanfi
c23dc3a53a IMPALA-1659: Netezza compatibility functions: metadata
Added the SQL functions current_catalog(), current_user(), and session_user() as
aliases of existing functions, and added a new SQL function, current_sid().

Change-Id: I9b5d1009bbf42acc175a942d2df484e1c64822ca
Reviewed-on: http://gerrit.cloudera.org:8080/4063
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
2016-08-26 16:29:24 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace the existing ASF license text with the one given
   on the website, or add it where it is missing.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Hayabusa-intel
4e7172f6f5 IMPALA-2459: Implement next_day date/time UDF
Returns the date of the weekday that follows a particular date.
The weekday argument is a string literal indicating the day of the week.
This argument is case-insensitive. Available values are:
"Sunday"/"SUN", "Monday"/"MON", "Tuesday"/"TUE",
"Wednesday"/"WED", "Thursday"/"THU", "Friday"/"FRI", "Saturday"/"SAT".
For example, the first Saturday after Wednesday, 25 December 2013
is on 28 December 2013.
select next_day('2013-12-25','Saturday') returns '2013-12-28 00:00:00'
select next_day(to_timestamp('08-1987-21', 'MM-yyyy-dd'), 'FRIDAY')
returns '1987-08-28 00:00:00'
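A sketch of next_day() consistent with the two examples above: the result is strictly after the input date (1 to 7 days later), names are case-insensitive, and both full names and three-letter abbreviations are accepted. Resetting the time to midnight and returning None for unrecognized names are assumptions of this sketch.

```python
from datetime import datetime, timedelta
from typing import Optional

_DAYS = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY", "SUNDAY"]
# Full names and 3-letter abbreviations, indexed like datetime.weekday() (Monday=0).
_DAY_INDEX = {**{d: i for i, d in enumerate(_DAYS)}, **{d[:3]: i for i, d in enumerate(_DAYS)}}

def next_day(ts: datetime, weekday: str) -> Optional[datetime]:
    target = _DAY_INDEX.get(weekday.upper())
    if target is None:
        return None  # assumption: unrecognized names yield NULL
    # 1..7 days ahead; the same weekday maps to a full week later.
    days_ahead = (target - ts.weekday() - 1) % 7 + 1
    return (ts + timedelta(days=days_ahead)).replace(hour=0, minute=0, second=0, microsecond=0)

print(next_day(datetime(2013, 12, 25), "Saturday"))  # 2013-12-28 00:00:00
```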

Change-Id: I2721d236c096639a9e7d2df8a45ca888c6b3e83e
Reviewed-on: http://gerrit.cloudera.org:8080/1943
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
2016-06-09 04:30:48 -07:00
Jim Apple
1c16dd0cf8 IMPALA-2107: Add Base64 encoder/decoder
Change-Id: I911451c5d68e8ae9d352abfcf4d5ff36484f0bf3
Reviewed-on: http://gerrit.cloudera.org:8080/2633
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:32 -07:00
Thomas Tauber-Marshall
1c98ec7f81 IMPALA-1772: Add additional date/time functions.
Implemented the 'millisecond' built-in function, which takes
a timestamp and returns an integer representing its
millisecond portion.

Other functions pending.

Change-Id: I3bafc6aaf80d1d8d2a634d120d9dbdb954d3f0c4
Reviewed-on: http://gerrit.cloudera.org:8080/2148
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
2016-03-08 03:12:51 +00:00
Hayabusa-intel
df599b79d9 IMPALA-1477: implement UUID function
Utilize Boost UUID libraries to generate UUID values.
Usage: select uuid();

Change-Id: I932f78952d65f4073d8177c6e80693586e6285cb
Reviewed-on: http://gerrit.cloudera.org:8080/647
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-02-03 10:10:28 +00:00
Michael Ho
f3e7274342 IMPALA-2711: Fix memory leak in Rand().
MathFunctions::RandPrepare() allocates a 4-byte seed and
stores it in the FunctionContext's thread-local state.
However, it was never freed. This change fixes the problem
by adding a close function for Rand() so it has a chance to
free the seed. A new test is also added to verify the fix.

Change-Id: Ibcc2e1ca0d052b86defe80aad471f9fdaac5a453
Reviewed-on: http://gerrit.cloudera.org:8080/1855
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
2016-01-26 11:53:38 +00:00
Jim Apple
1a3d7ffd4f IMPALA-2147: Support IS [NOT] DISTINCT FROM and "<=>" predicates
Ensures that the planner treats IS NOT DISTINCT FROM as eligible for
hash joins, but does not find the minimum spanning tree of
equivalences for use in optimizing query plans; this is left as future
work.

Change-Id: I62c5300b1fbd764796116f95efe36573eed4c8d0
Reviewed-on: http://gerrit.cloudera.org:8080/710
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
2016-01-14 05:45:22 +00:00
Michael Ho
34a94c2503 IMPALA-2404: Implements built-in function regexp_match_count
This patch implements a new built-in function
regexp_match_count. This function returns the number of
matching occurrences in input.

The regexp_match_count() function has the following syntax:

int = regexp_match_count(string input, string pattern)
int = regexp_match_count(string input, string pattern,
    int start_pos, string flags)

The input value specifies the string on which the regular
expression is processed.

The pattern value specifies the regular expression.

The start_pos value specifies the character position
at which to start the search for a match. It is set
to 1 by default if it's not specified.

The flags value (if specified) dictates the behavior of
the regular expression matcher:

m: Specifies that the input data might contain more than
one line, so '^' and '$' should match at the start and end
of each line.

i: Specifies that the regex matcher is case insensitive.

c: Specifies that the regex matcher is case sensitive.

n: Specifies that the '.' character matches newlines.

By default, the flag value is set to 'c'. Note that the
flags are consistent with other existing built-in functions
(e.g. regexp_like), so certain IBM Netezza flags such as
's' are not supported, to avoid confusion.
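A sketch of the counting semantics, using Python's re module as a stand-in for RE2 (the two engines differ in syntax details, and this sketch does not try to reproduce how Impala treats zero-length matches). Flags are applied left to right, so 'c' after 'i' switches case sensitivity back on; that ordering rule is an assumption.

```python
import re

def regexp_match_count(text: str, pattern: str, start_pos: int = 1, flags: str = "c") -> int:
    multiline = ignorecase = dotall = False
    for f in flags:
        if f == "m":
            multiline = True      # '^' and '$' match at line boundaries
        elif f == "i":
            ignorecase = True     # case-insensitive
        elif f == "c":
            ignorecase = False    # case-sensitive (the default)
        elif f == "n":
            dotall = True         # '.' also matches newlines
        else:
            raise ValueError(f"unsupported flag: {f!r}")
    re_flags = ((re.MULTILINE if multiline else 0)
                | (re.IGNORECASE if ignorecase else 0)
                | (re.DOTALL if dotall else 0))
    compiled = re.compile(pattern, re_flags)
    # start_pos is 1-based; Pattern.finditer's pos argument is 0-based.
    return sum(1 for _ in compiled.finditer(text, start_pos - 1))

print(regexp_match_count("abcAbc", "abc", 1, "i"))  # 2
```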

Change-Id: Ib33ece0448f78e6a60bf215640f11b5049e47bb5
Reviewed-on: http://gerrit.cloudera.org:8080/1248
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2015-10-27 10:11:13 +00:00
aacalfa
5e733e8d62 IMPALA-2190: Complete conversion functions between timestamp, unixtime, and string dates
Change-Id: I48a446f19c7634477f175d0defa8779dd70a392f
Reviewed-on: http://gerrit.cloudera.org:8080/654
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2015-09-07 07:07:20 +00:00
Tim Armstrong
d7e52e336a IMPALA-1660: addendum: add more aliases
Add the missing dfloor alias. This should have been added as part of
IMPALA-1660 as an alias for floor(double) but was overlooked.

Also add aliases for decimal versions of functions where they exist.

Change-Id: Icb790745714882248d365274e95d45eaaf0ba133
Reviewed-on: http://gerrit.cloudera.org:8080/697
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-01 10:46:16 +00:00
Feni Chawla
2db0371a26 IMPALA-2033: Netezza compatibility date/time related functions.
Added the INT_MONTHS_BETWEEN, TIMEOFDAY, TIMESTAMP_CMP, and MONTHS_BETWEEN functions.

Change-Id: I44834c84e21856568613938418947c532e7fbd2e
Reviewed-on: http://gerrit.cloudera.org:8080/642
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
2015-08-27 04:17:05 +00:00
Sailesh Mukil
1c46cab5c6 IMPALA-2084: SPLIT_PART and REGEXP_LIKE functions for Tableau pushdown
Added the SPLIT_PART and the REGEXP_LIKE builtin functions and tests for both.
REGEXP_LIKE has an optional third parameter; when it is supplied, a different
'prepare' function (RegexpLikePrepare in like-predicate.cc) is used so that the
appropriate options can be set in the RE2 library.

Added a patch for the RE2 library so that the 'dot matches all' option is exposed
via the RE2 class.

Fixed a bug where, if the function evaluated for the WHERE clause operates on
constants, proper cleanup was not guaranteed in certain edge cases.

Change-Id: Ia2a8de9eeb2854100a2d949f612cfaba317c5a7b
Reviewed-on: http://gerrit.cloudera.org:8080/501
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2015-08-18 09:07:34 +00:00
Casey Ching
cf60967b7e IMPALA-1675: Avoid overflow when adding large intervals to TIMESTAMPs
It turns out there is a variety of cases where boost incorrectly adds
intervals if the interval is at (or beyond) an edge case value. This
change defines a max interval and returns NULL if the user supplies
an interval beyond the max.

Change-Id: I4fb6869be22ab06089b66eeffaea04b0c0880080
Reviewed-on: http://gerrit.cloudera.org:8080/492
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2015-08-16 12:09:24 +00:00
Feni Chawla
9428448146 IMPALA-2034: Netezza compatibility char functions for ASCII and UTF-8 strings: CHR and BTRIM
Change-Id: I76bf9ba76172b9f1a192ee0936d73718808c0fbd
Reviewed-on: http://gerrit.cloudera.org:8080/529
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
2015-08-06 02:24:24 +00:00
Casey Ching
074e5b4349 Remove hashbang from non-script python files
Many python files had a hashbang and the executable bit set even though
they were not intended to be run as standalone scripts. That makes
determining which python files are actually scripts very difficult.
A future patch will update the hashbang in real python scripts so they
use $IMPALA_HOME/bin/impala-python.

Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba
Reviewed-on: http://gerrit.cloudera.org:8080/599
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2015-08-04 05:26:07 +00:00
Tim Armstrong
e151ebaa71 IMPALA-1001: Bit and byte manipulation functions
Bit and byte functions for compatibility with Teradata: bitand, bitor, bitxor, bitnot,
countset, getbit, setbit, shiftleft, shiftright, rotateleft, rotateright.
Interfaces and behavior follow Teradata documentation.

All bit* functions are compatible with DB2. Only bitand is compatible with Oracle.
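A sketch of a few of these functions on fixed-width integers, assuming Teradata-style semantics where the bit width is that of the argument's type (modeled here by a hypothetical bits parameter, defaulting to 8 for TINYINT) and bit position 0 is the least significant bit.

```python
def _unsigned(x: int, bits: int) -> int:
    # Two's-complement bit pattern of x at the given width.
    return x & ((1 << bits) - 1)

def _signed(u: int, bits: int) -> int:
    # Reinterpret a width-`bits` bit pattern as a signed value.
    return u - (1 << bits) if u >= (1 << (bits - 1)) else u

def countset(x: int, bits: int = 8) -> int:
    """Number of 1 bits in x's two's-complement representation."""
    return bin(_unsigned(x, bits)).count("1")

def getbit(x: int, pos: int, bits: int = 8) -> int:
    """Bit at position `pos`, counting from 0 at the least significant end."""
    return (_unsigned(x, bits) >> pos) & 1

def rotateleft(x: int, shift: int, bits: int = 8) -> int:
    """Rotate left within `bits` bits; bits shifted out re-enter on the right."""
    u = _unsigned(x, bits)
    shift %= bits
    rotated = ((u << shift) | (u >> (bits - shift))) & ((1 << bits) - 1)
    return _signed(rotated, bits)

print(countset(-1), getbit(5, 0), rotateleft(-128, 1))
```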

Change-Id: Idba3fb7beb029de493b602e6279aa68e32688df3
2015-07-28 08:11:01 -07:00
Tim Armstrong
822cb8f5e2 IMPALA-1660: Netezza compatibility - factorial
Implements the postfix n! operator for factorial, and the factorial() function.

Slightly refactors operators in the fe to share code between unary operators.

Based partially on work by Arthur Peng <arthur.peng@intel.com>.

Change-Id: I71b6c824c59fc5305f16b8c4457805126a1da93b
Reviewed-on: http://gerrit.cloudera.org:8080/531
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-07-27 19:03:48 +00:00
Tim Armstrong
e5cc539d3f IMPALA-1660: Netezza math function aliases
Add aliases for existing functions for Netezza compatibility:
dceil->ceil, dtrunc->truncate, dexp->exp, dlog1->ln, dlog10->log10, dpow->pow,
fpow->pow, dsqrt->sqrt, random->rand.

Change-Id: I97da27b676d4e07e55735540f494bdb873f7ed61
Reviewed-on: http://gerrit.cloudera.org:8080/559
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-07-23 21:56:33 +00:00
Casey Ching
a6d534682b IMPALA-2086, IMPALA-2090: Avoid boost year/month interval logic
Boost handles a couple of edge cases differently than other databases
such as Postgres and MySQL when adding year/month intervals to
timestamps. This change makes Impala consistent with those databases.
The performance difference was not noticeable (<5% if any).
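A sketch of PostgreSQL/MySQL-style month addition, in which the day is clamped to the last day of the target month (2015-01-31 plus one month is 2015-02-28, and 2015-02-28 plus one month is 2015-03-28). The commit does not list the exact edge cases where Boost differed, so this only illustrates the target behavior; add_month_interval() is an illustrative name.

```python
import calendar
from datetime import datetime

def add_month_interval(ts: datetime, months: int) -> datetime:
    # Count months from year 0 so negative intervals work too.
    total = ts.year * 12 + (ts.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    # Clamp the day to the last day of the target month (e.g. Jan 31 -> Feb 28).
    day = min(ts.day, calendar.monthrange(year, month)[1])
    # Replace year, month, and day together to avoid an invalid intermediate date.
    return ts.replace(year=year, month=month, day=day)

print(add_month_interval(datetime(2015, 1, 31), 1))  # 2015-02-28 00:00:00
```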

Change-Id: Icb02a06281b53753938cab88e0d28f20709fee06
Reviewed-on: http://gerrit.cloudera.org:8080/489
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2015-07-20 10:16:54 +00:00
Arthur Peng
a275f2a751 IMPALA-1660: Netezza math functions support
1. Add cot function
2. Add double type support in truncate

Change-Id: Id48c58b7778a31edfbda8982f7a8c3d05a1ad14e
2015-07-16 22:29:26 -07:00
Casey Ching
f351119730 Add section in builtin function registry for invisible functions
An upcoming patch will add a function that will not be user-visible.
This patch allows a non-visible function to be added in the same way
that visible functions are added (using impala_functions.py).

Change-Id: I70971ced0d595a7aaa975985e589d2676423e221
Reviewed-on: http://gerrit.cloudera.org:8080/528
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2015-07-15 23:43:55 +00:00
Henry Robinson
79913b01e6 IMPALA-2064: Add effective_user() builtin
The user() builtin always returns the connected user. However, if the
client wants to see which user its queries are actually delegated to,
there was no easy way to do that.

This patch adds effective_user(), which returns the proxy delegated user
for authorization purposes. If no delegated user is set, the effective
user is the same as that returned from user().

The only way to test this is via a new custom cluster test, which sets
impala.doas.user so that the effective user might be different from the
connected one.

Change-Id: I7048c27c6808a6986dbe1246929816176dca9f76
Reviewed-on: http://gerrit.cloudera.org:8080/458
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
2015-06-16 23:42:40 +00:00
Feni Chawla
d4817b697f IMPALA-1771: Add support for hyperbolic functions sinh(), cosh(), tanh(), and the atan2() function
Change-Id: Iedd89629b36ec4f5ef270e5eff48371e075ad3ff
Reviewed-on: http://gerrit.cloudera.org:8080/409
Tested-by: Internal Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2015-05-27 15:54:55 +00:00
zuowang
bcd4c1de21 IMPALA-1662: Netezza compatibility functions: Logic
Logic functions for improved Netezza compatibility:
ISFALSE
ISNOTFALSE
ISNOTTRUE
ISTRUE
NONNULLVALUE
NULLVALUE
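A sketch of these predicates, assuming they follow SQL's IS [NOT] TRUE, IS [NOT] FALSE, and IS [NOT] NULL semantics and therefore never return NULL themselves; None stands in for NULL, and the Python functions are illustrations only.

```python
from typing import Any, Optional

# SQL BOOLEAN modeled as Optional[bool]; None stands for NULL.
def istrue(b: Optional[bool]) -> bool:
    return b is True

def isnottrue(b: Optional[bool]) -> bool:
    return b is not True

def isfalse(b: Optional[bool]) -> bool:
    return b is False

def isnotfalse(b: Optional[bool]) -> bool:
    return b is not False

def nullvalue(x: Any) -> bool:
    return x is None

def nonnullvalue(x: Any) -> bool:
    return x is not None

print(istrue(None), isnottrue(None), isnotfalse(None))
```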

Change-Id: I1d5eef793ce99e286c340dc5c63839dc5eb2655d
Reviewed-on: http://gerrit.cloudera.org:8080/21
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
2015-03-04 21:09:06 +00:00
casey
dbc504fad1 IMPALA-1579: UNIX_TIMESTAMP() should return BIGINTs instead of INTs
This should fix the last Y2K38 problem. Previously, calling
unix_timestamp() with an input of '2038-01-19 03:14:08' or later would
return a negative value due to a 32-bit int overflow. This patch
switches from 32-bit to 64-bit ints.
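A sketch of the overflow: '2038-01-19 03:14:08', interpreted as UTC, is exactly 2**31 seconds after the epoch, one past the largest signed 32-bit value, so truncating it to 32 bits wraps it to a negative number. to_int32() is an illustrative helper modeling the old INT return type.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_int32(v: int) -> int:
    """Wrap an integer to a signed 32-bit value, as an overflowing INT would."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v >= (1 << 31) else v

ts = datetime(2038, 1, 19, 3, 14, 8, tzinfo=timezone.utc)
seconds = (ts - EPOCH) // timedelta(seconds=1)  # exact integer division
print(seconds, to_int32(seconds))               # 2147483648 -2147483648
```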

Change-Id: Ic9180887d6c828f6ecd25435be86fd0bd52d3f0d
Reviewed-on: http://gerrit.cloudera.org:8080/61
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2015-02-16 00:59:34 +00:00
zuowang
60f6373a56 IMPALA-899: Add Mod() UDF
It is an overloaded function that supports parameters of TINYINT, SMALLINT,
INT, BIGINT, FLOAT, DOUBLE, and DECIMAL types.

Change-Id: I040753ea06d5402b2f67751f8c4d5d49fa539952
Reviewed-on: http://gerrit.cloudera.org:8080/22
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
2015-02-14 03:07:25 +00:00
Arthur Peng
26386eb41d IMPALA-1597: Add typeOf() builtin
typeOf() returns the type of the given expression.
e.g. typeOf(bigint_col) -> "BIGINT"

Change-Id: I4c12d6fb2759af38a941c92d0f20a6faa000f996
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5915
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
2015-02-03 00:41:12 -08:00
Martin Grund
6e0c1c26c9 IMPALA-1424: abs() function retains input type
This patch modifies the abs() built-in function so that it
retains the type of the input argument for the return type
in the same way as Postgres does.

Change-Id: I1750237b85bedbc3ce9d52330ac4d458b0aada3a
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4980
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 424b359ab0a4f621f2865844c3293f2c80e0867f)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4996
2014-10-28 08:07:21 -07:00
casey
543fa73f3a Standardize EXTRACT and add DATE_PART
The standard EXTRACT syntax is EXTRACT(<TIME UNIT> FROM <TIMESTAMP>) but
it was implemented as a regular function EXTRACT(<STRING>, <TIMESTAMP>).
The existing function will continue to be supported. We could deprecate
it but it doesn't seem like much of a burden to keep.

Adding DATE_PART is easy since it is functionally the same as the EXTRACT
function. The only difference is in the call signature. Besides the
difference in name, the arguments are reversed. Otherwise the two
functions are equivalent.

Change-Id: Ia6f9156624ed901723672469f94205c704839248
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4579
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-10-08 16:44:37 -07:00
Victor Bittorf
177337dc33 length(CHAR(*)) function ignores spaces
Fixed length() to behave like Postgres for CHAR(*), which ignores trailing padding spaces.

Change-Id: I817c74a4af73a3c0a9f5ea1d8ee36b520936395f
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4572
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: Victor Bittorf <victor.bittorf@cloudera.com>
2014-10-06 15:10:53 -07:00
Skye Wanderman-Milne
559b83d3d0 Expr refactoring
This patch changes the interface for evaluating expressions, in order
to allow for thread-safe expression evaluations and easier
codegen. Thread safety is achieved via the ExprContext class, a
light-weight container for expression tree evaluation state. Codegen
is easier because more expressions can be cross-compiled to IR.

See expr.h and expr-context.h for an overview of the API
changes. See sort-exec-exprs.cc for a simple example of the new
interface and hdfs-scanner.cc for a more complicated example.

This patch has not been completely code reviewed and may need further
cleanup/stylistic work, as well as additional perf work.

Change-Id: I3e3baf14ebffd2687533d0cc01a6fb8ac4def849
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3459
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-08-17 12:44:44 -07:00
Skye Wanderman-Milne
f062a22997 Convert string functions to UDF interface
This also switches to using the re2 library for regular expression
functions instead of boost.

Change-Id: I4c3ae72ff2f7cbd5b1a2be719275f1b2e25f8ab2
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3412
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-08-17 12:44:38 -07:00
Skye Wanderman-Milne
7a0cc27fd1 Convert math functions to the UDF interface.
Also adds FunctionContext::GetNumArgs() method to the public UDF API.

Change-Id: I76e21814e423f075a0a22b4e924c1d3ec26daba7
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3410
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-08-17 12:44:32 -07:00