Added the SQL functions current_catalog(), current_user() and session_user() as
aliases to existing ones and a new SQL function current_sid().
Change-Id: I9b5d1009bbf42acc175a942d2df484e1c64822ca
Reviewed-on: http://gerrit.cloudera.org:8080/4063
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace the existing ASF license text with the one given on the
website, or add that text where it is missing.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Returns the date of the weekday that follows a particular date.
The weekday argument is a string literal indicating the day of the week.
The argument is case-insensitive. Available values are:
"Sunday"/"SUN", "Monday"/"MON", "Tuesday"/"TUE",
"Wednesday"/"WED", "Thursday"/"THU", "Friday"/"FRI", "Saturday"/"SAT".
For example, the first Saturday after Wednesday, 25 December 2013
is on 28 December 2013.
select next_day('2013-12-25','Saturday') returns '2013-12-28 00:00:00'
select next_day(to_timestamp('08-1987-21', 'MM-yyyy-dd'), 'FRIDAY')
returns '1987-08-28 00:00:00'
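The semantics above can be modeled outside Impala with a short Python
sketch. It is not the backend implementation; dropping the time of day
from the input is an assumption made here so that the result matches the
'00:00:00' shown in the examples.

```python
from datetime import datetime, timedelta

# Full names and three-letter abbreviations, keyed in lower case so that
# lookups are case-insensitive. Values follow datetime.weekday() (Monday = 0).
_WEEKDAYS = {}
for index, name in enumerate(
    ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
):
    _WEEKDAYS[name] = index
    _WEEKDAYS[name[:3]] = index


def next_day(ts, weekday):
    """Return midnight of the first `weekday` strictly after `ts`."""
    target = _WEEKDAYS[weekday.lower()]
    # 1..7 days ahead: the same weekday maps to a full week later, never 0.
    days_ahead = (target - ts.weekday() - 1) % 7 + 1
    midnight = datetime(ts.year, ts.month, ts.day)  # assumption: time is dropped
    return midnight + timedelta(days=days_ahead)


print(next_day(datetime(2013, 12, 25), "Saturday"))  # 2013-12-28 00:00:00
print(next_day(datetime.strptime("08-1987-21", "%m-%Y-%d"), "FRIDAY"))  # 1987-08-28 00:00:00
```

The second call starts on a Friday, so the result is a full week later
rather than the input date itself.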
Change-Id: I2721d236c096639a9e7d2df8a45ca888c6b3e83e
Reviewed-on: http://gerrit.cloudera.org:8080/1943
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
Implemented the 'millisecond' built-in function, which takes
a timestamp and returns an integer representing its
millisecond portion.
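As a rough illustration of the intended result (a Python model, not the
backend code), the millisecond portion is the sub-second part of the
timestamp truncated to whole milliseconds. Truncation rather than
rounding is an assumption of this sketch.

```python
from datetime import datetime


def millisecond(ts):
    """Return the millisecond portion (0..999) of a timestamp."""
    # datetime stores sub-second precision in microseconds.
    return ts.microsecond // 1000


print(millisecond(datetime(2016, 1, 1, 12, 0, 0, 123456)))  # 123
```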
Other functions pending.
Change-Id: I3bafc6aaf80d1d8d2a634d120d9dbdb954d3f0c4
Reviewed-on: http://gerrit.cloudera.org:8080/2148
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
MathFunctions::RandPrepare() allocates a 4-byte seed and
stores it in the FunctionContext's thread local state.
However, it was never freed. This change fixes the problem
by adding a close function for Rand() so it has a chance to
free the seed. A new test is also added to verify the fix.
Change-Id: Ibcc2e1ca0d052b86defe80aad471f9fdaac5a453
Reviewed-on: http://gerrit.cloudera.org:8080/1855
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
Enforces that the planner treats IS NOT DISTINCT FROM as eligible for
hash joins, but does not find the minimum spanning tree of
equivalences for use in optimizing query plans; this is left as future
work.
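To illustrate why IS NOT DISTINCT FROM fits a hash join, here is a small
Python sketch with None standing in for SQL NULL. The function name and
the null_safe parameter are hypothetical and only serve the example; the
planner and executor work differently in detail.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key, null_safe=False):
    """Join two lists of dict rows on one key column.

    null_safe=False models "=": a NULL key never matches anything.
    null_safe=True models IS NOT DISTINCT FROM: NULL matches NULL.
    """
    # Build phase: hash the right side on its key.
    table = defaultdict(list)
    for row in right:
        key = row[right_key]
        if key is None and not null_safe:
            continue  # under "=", NULL keys can never find a partner
        table[key].append(row)

    # Probe phase: look up each left row in the hash table.
    result = []
    for row in left:
        key = row[left_key]
        if key is None and not null_safe:
            continue
        for match in table.get(key, []):
            result.append((row, match))
    return result


left = [{"a": 1}, {"a": None}]
right = [{"b": 1}, {"b": None}]
print(len(hash_join(left, right, "a", "b")))                  # 1
print(len(hash_join(left, right, "a", "b", null_safe=True)))  # 2
```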
Change-Id: I62c5300b1fbd764796116f95efe36573eed4c8d0
Reviewed-on: http://gerrit.cloudera.org:8080/710
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
This patch implements a new built-in function
regexp_match_count. This function returns the number of
matching occurrences in input.
The regexp_match_count() function has the following syntax:
int = regexp_match_count(string input, string pattern)
int = regexp_match_count(string input, string pattern,
int start_pos, string flags)
The input value specifies the string on which the regular
expression is processed.
The pattern value specifies the regular expression.
The start_pos value specifies the character position
at which to start the search for a match. It is set
to 1 by default if it's not specified.
The flags value (if specified) dictates the behavior of
the regular expression matcher:
m: Specifies that the input data might contain more than
one line so that the '^' and the '$' matches should take
that into account.
i: Specifies that the regex matcher is case insensitive.
c: Specifies that the regex matcher is case sensitive.
n: Specifies that the '.' character matches newlines.
By default, the flag value is set to 'c'. Note that the
flags are consistent with other existing built-in functions
(e.g. regexp_like), so certain IBM Netezza flags such as
's' are not supported, to avoid confusion.
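The description above can be approximated with Python's re module. This
is only a model: Impala uses RE2, whose syntax differs from Python's in
places, and two points here are assumptions rather than documented
behavior: that matches are counted without overlap, and that a later
'c' or 'i' in the flag string overrides an earlier one.

```python
import re


def regexp_match_count(input_str, pattern, start_pos=1, flags="c"):
    """Count non-overlapping matches of `pattern` from a 1-based position."""
    if start_pos < 1:
        raise ValueError("start_pos is 1-based and must be >= 1")
    re_flags = 0
    for flag in flags:
        if flag == "m":
            re_flags |= re.MULTILINE    # '^' and '$' match at line breaks
        elif flag == "i":
            re_flags |= re.IGNORECASE   # case-insensitive matching
        elif flag == "c":
            re_flags &= ~re.IGNORECASE  # case-sensitive matching (default)
        elif flag == "n":
            re_flags |= re.DOTALL       # '.' also matches newlines
        else:
            raise ValueError("unsupported flag: %r" % flag)
    compiled = re.compile(pattern, re_flags)
    # Convert the 1-based start_pos to Python's 0-based offset.
    return sum(1 for _ in compiled.finditer(input_str, start_pos - 1))


print(regexp_match_count("aAa", "a"))             # 2
print(regexp_match_count("aAa", "a", 1, "i"))     # 3
print(regexp_match_count("abcabc", "abc", 2, "c"))  # 1
```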
Change-Id: Ib33ece0448f78e6a60bf215640f11b5049e47bb5
Reviewed-on: http://gerrit.cloudera.org:8080/1248
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Add in missing dfloor alias. This should have been added as part of
IMPALA-1660 as an alias for floor(double) but was overlooked.
Also add in aliases for decimal versions of functions where they exist.
Change-Id: Icb790745714882248d365274e95d45eaaf0ba133
Reviewed-on: http://gerrit.cloudera.org:8080/697
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Added the SPLIT_PART and the REGEXP_LIKE builtin functions and tests for both.
The REGEXP_LIKE has an optional third parameter which if used, uses a different
'prepare' function (RegexpLikePrepare in like-predicate.cc) so that the appropriate
options can be set in the RE2 library.
Added a patch for the RE2 library so that the 'dot matches all' option is exposed
via the RE2 class.
Fixed a bug where proper cleanup wasn't guaranteed in certain edge cases
when the function evaluated for the WHERE clause operates on constants.
Change-Id: Ia2a8de9eeb2854100a2d949f612cfaba317c5a7b
Reviewed-on: http://gerrit.cloudera.org:8080/501
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
It turns out there is a variety of cases where boost incorrectly adds
intervals if the interval is at (or beyond) an edge case value. This
change defines a max interval and returns NULL if the user supplies
an interval beyond the max.
Change-Id: I4fb6869be22ab06089b66eeffaea04b0c0880080
Reviewed-on: http://gerrit.cloudera.org:8080/492
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Many python files had a hashbang and the executable bit set though
they were not intended to be run as standalone scripts. That makes
determining which python files are actually scripts very difficult.
A future patch will update the hashbang in real python scripts so they
use $IMPALA_HOME/bin/impala-python.
Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba
Reviewed-on: http://gerrit.cloudera.org:8080/599
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
Bit and byte functions for compatibility with Teradata: bitand, bitor, bitxor, bitnot,
countset, getbit, setbit, shiftleft, shiftright, rotateleft, rotateright.
Interfaces and behavior follow Teradata documentation.
All bit* functions are compatible with DB2. Only bitand is compatible with Oracle.
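A few of these functions can be modeled in Python for a fixed-width
two's-complement integer. The sketch assumes a TINYINT-like width of 8
bits by default, bit position 0 as the least significant bit, and a
default value of 1 for setbit; those conventions are taken from memory of
the Teradata documentation and should be checked there. Argument
validation is omitted.

```python
def _to_unsigned(x, width):
    """Reinterpret a signed integer as its `width`-bit pattern."""
    return x & ((1 << width) - 1)


def _to_signed(u, width):
    """Reinterpret a `width`-bit pattern as a signed integer."""
    return u - (1 << width) if u & (1 << (width - 1)) else u


def countset(x, width=8):
    """Number of bits set to 1."""
    return bin(_to_unsigned(x, width)).count("1")


def getbit(x, pos, width=8):
    """Value (0 or 1) of the bit at `pos`, with 0 the least significant."""
    return (_to_unsigned(x, width) >> pos) & 1


def setbit(x, pos, value=1, width=8):
    """Copy of `x` with the bit at `pos` set to `value`."""
    u = _to_unsigned(x, width)
    u = (u | (1 << pos)) if value else (u & ~(1 << pos))
    return _to_signed(_to_unsigned(u, width), width)


def rotateleft(x, n, width=8):
    """Rotate the bit pattern left by `n`, wrapping high bits around."""
    u = _to_unsigned(x, width)
    n %= width
    return _to_signed(_to_unsigned((u << n) | (u >> (width - n)), width), width)


print(countset(-1))        # 8
print(setbit(0, 7))        # -128
print(rotateleft(-128, 1)) # 1
```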
Change-Id: Idba3fb7beb029de493b602e6279aa68e32688df3
Implements the suffix n! operator for factorial and the factorial function.
Slightly refactor operators in fe to share code between unary operators.
Based partially on work by Arthur Peng <arthur.peng@intel.com>.
Change-Id: I71b6c824c59fc5305f16b8c4457805126a1da93b
Reviewed-on: http://gerrit.cloudera.org:8080/531
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Boost handles a couple of edge cases differently than other databases
such as Postgres and MySQL when adding year/month intervals to
timestamps. This change makes Impala consistent with the other databases.
The performance difference was not noticeable (<5% if any).
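This message does not list the edge cases. One well-known difference is
that boost's month arithmetic snaps an end-of-month date to the end of
the target month, while Postgres and MySQL keep the day of month and
only clamp it when the target month is shorter. The Python sketch below
models the latter behavior; add_months is a hypothetical helper, not an
Impala function.

```python
import calendar
from datetime import datetime


def add_months(ts, months):
    """Add a (possibly negative) number of months, clamping the day."""
    # Work in "months since year 0" so that year rollover is automatic.
    total = ts.year * 12 + (ts.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    # Keep the day of month; clamp only if the target month is shorter.
    return ts.replace(year=year, month=month, day=min(ts.day, last_day))


print(add_months(datetime(2013, 1, 31), 1))  # 2013-02-28 00:00:00
print(add_months(datetime(2013, 2, 28), 1))  # 2013-03-28 00:00:00
```

With end-of-month snapping, the second call would instead yield
2013-03-31.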
Change-Id: Icb02a06281b53753938cab88e0d28f20709fee06
Reviewed-on: http://gerrit.cloudera.org:8080/489
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
An upcoming patch will add a function that will not be user visible.
This patch allows a non-visible function to be added in the same way
that visible functions are added (using impala_functions.py).
Change-Id: I70971ced0d595a7aaa975985e589d2676423e221
Reviewed-on: http://gerrit.cloudera.org:8080/528
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
The user() builtin always returns the connected user. However, if the
client wants to see which user its queries are actually delegated to,
there was no easy way to do that.
This patch adds effective_user(), which returns the proxy delegated user
for authorization purposes. If no delegated user is set, the effective
user is the same as that returned from user().
The only way to test this is via a new custom cluster test, which sets
impala.doas.user so that the effective user might be different from the
connected one.
Change-Id: I7048c27c6808a6986dbe1246929816176dca9f76
Reviewed-on: http://gerrit.cloudera.org:8080/458
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
This should fix the last y2k38 problem. Previously calling
unix_timestamp() with an input of '2038-01-19 03:14:08' or later would
return a negative value due to a 32 bit int overflow. This patch
switches from 32 to 64 bit ints.
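The overflow is easy to reproduce outside Impala. The Python sketch
below treats the input as UTC (an assumption of the example) and uses
ctypes to mimic storing the result in a 32-bit signed int; the helper
names are illustrative only.

```python
import calendar
import ctypes
from datetime import datetime


def unix_seconds(ts):
    """Seconds since the epoch for a UTC timestamp (unbounded Python int)."""
    return calendar.timegm(ts.timetuple())


def as_int32(value):
    """Mimic storing `value` in a 32-bit signed integer (wraps on overflow)."""
    return ctypes.c_int32(value).value


last_ok = unix_seconds(datetime(2038, 1, 19, 3, 14, 7))
first_bad = unix_seconds(datetime(2038, 1, 19, 3, 14, 8))

print(last_ok, as_int32(last_ok))      # 2147483647 2147483647
print(first_bad, as_int32(first_bad))  # 2147483648 -2147483648
```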
Change-Id: Ic9180887d6c828f6ecd25435be86fd0bd52d3f0d
Reviewed-on: http://gerrit.cloudera.org:8080/61
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
typeOf() returns the type of the given expression.
e.g. typeOf(bigint_col) -> "BIGINT"
Change-Id: I4c12d6fb2759af38a941c92d0f20a6faa000f996
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5915
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
This patch modifies the abs() built-in function so that it
retains the type of the input argument for the return type
in the same way as Postgres does.
Change-Id: I1750237b85bedbc3ce9d52330ac4d458b0aada3a
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4980
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 424b359ab0a4f621f2865844c3293f2c80e0867f)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4996
The standard EXTRACT syntax is EXTRACT(<TIME UNIT> FROM <TIMESTAMP>) but
it was implemented as a regular function EXTRACT(<STRING>, <TIMESTAMP>).
The existing function will continue to be supported. We could deprecate
it but it doesn't seem like much of a burden to keep.
Adding DATE_PART is easy since it is functionally the same as the EXTRACT
function. The only difference is in the call signature. Besides the
difference in name, the arguments are reversed. Otherwise the two
functions are equivalent.
Change-Id: Ia6f9156624ed901723672469f94205c704839248
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4579
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
This patch changes the interface for evaluating expressions, in order
to allow for thread-safe expression evaluations and easier
codegen. Thread safety is achieved via the ExprContext class, a
light-weight container for expression tree evaluation state. Codegen
is easier because more expressions can be cross-compiled to IR.
See expr.h and expr-context.h for an overview of the API
changes. See sort-exec-exprs.cc for a simple example of the new
interface and hdfs-scanner.cc for a more complicated example.
This patch has not been completely code reviewed and may need further
cleanup/stylistic work, as well as additional perf work.
Change-Id: I3e3baf14ebffd2687533d0cc01a6fb8ac4def849
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3459
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
This also switches to using the re2 library for regular expression
functions instead of boost.
Change-Id: I4c3ae72ff2f7cbd5b1a2be719275f1b2e25f8ab2
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3412
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
This patch also adds a GetReturnType() method to FunctionContext. This
is staging for the expr refactoring.
Change-Id: I854e79ded409e151663c4ec99c4e08631ad9e03e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3234
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
FromUtc and ToUtc use thirdparty libraries which use inline asm which
isn't currently supported with JIT. The UDFs are included in this
commit, but the function symbols were not changed in
impala_functions.py
Change-Id: I0824a434d4a26a39abf29bc6e47d51b5ad7991d6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3390
Reviewed-by: Paden Tomasello <paden.tomasello@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 8e149ccd78010b7a22d6fff1b0de5614848b02ac)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3548
Float/Doubles are lossy so using those as the default literal type
is problematic.
Change-Id: I5a619dd931d576e2e6cd7774139e9bafb9452db9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2758
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
This patch allows the text scanner to read 'inf' or 'Infinity' from a
row and correctly translate it into floating-point infinity. It also
adds is_inf() and is_nan() builtins.
Finally, we change the text table writer to write Infinity and NaN for
compatibility with Hive.
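The read and write paths described above can be sketched in Python,
whose float() happens to accept the same spellings. The helper names are
hypothetical, and writing "-Infinity" for negative infinity is an
assumption, since only Infinity and NaN are mentioned here.

```python
import math


def parse_text_float(field):
    """Parse a text field; 'inf' and 'Infinity' both yield infinity."""
    return float(field)


def is_inf(x):
    return math.isinf(x)


def is_nan(x):
    return math.isnan(x)


def write_text_float(x):
    """Format a float for a text table using Hive-style spellings."""
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "Infinity" if x > 0 else "-Infinity"  # sign handling assumed
    return repr(x)


print(is_inf(parse_text_float("inf")))       # True
print(is_inf(parse_text_float("Infinity")))  # True
print(write_text_float(float("nan")))        # NaN
```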
In the future, we might consider adding nan / inf literals to our
grammar (postgres has this, see:
http://www.postgresql.org/docs/9.3/static/datatype-numeric.html).
Change-Id: I796f2852b3c6c3b72e9aae9dd5ad228d188a6ea3
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2393
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 58091355142cadd2b74874d9aa7c8ab6bf3efe2f)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2483
It would have been convenient today to know the largest values that
Impala accepts for its integer types. This patch adds max and min
builtins for our numeric types.
[localhost:21000] > select max_bigint(), max_int(), max_smallint(),
max_tinyint();
Query: select max_bigint(), max_int(), max_smallint(), max_tinyint()
+---------------------+------------+----------------+---------------+
| max_bigint()        | max_int()  | max_smallint() | max_tinyint() |
+---------------------+------------+----------------+---------------+
| 9223372036854775807 | 2147483647 | 32767          | 127           |
+---------------------+------------+----------------+---------------+
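The values in the output above follow directly from the widths of the
two's-complement integer types. A short Python check, with max_value and
min_value as illustrative helper names rather than Impala functions:

```python
# Bit widths of the signed integer types.
_BITS = {"tinyint": 8, "smallint": 16, "int": 32, "bigint": 64}


def max_value(type_name):
    """Largest value of a signed two's-complement type."""
    return 2 ** (_BITS[type_name] - 1) - 1


def min_value(type_name):
    """Smallest value of a signed two's-complement type."""
    return -(2 ** (_BITS[type_name] - 1))


for name in ("bigint", "int", "smallint", "tinyint"):
    print(name, min_value(name), max_value(name))
```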
Change-Id: I6df6df2728197529c6375dbb1b7d3c9ddb9833d2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2381
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2398