impala

mirror of https://github.com/apache/impala.git synced 2026-01-28 00:04:45 -05:00

Author	SHA1	Message	Date
Adam Tamas	1bafb7bd29	IMPALA-9531: Dropped support for dateless timestamps Removed the support for dateless timestamps. During dateless timestamp casts, if the format doesn't contain a date part we get an error during tokenization of the format. If the input string doesn't contain a date part then we get a NULL result. Examples: select cast('01:02:59' as timestamp); This will come back as a NULL value. select to_timestamp('01:01:01', 'HH:mm:ss'); select cast('01:02:59' as timestamp format 'HH12:MI:SS'); select cast('12 AM' as timestamp FORMAT 'AM.HH12'); These will come back with parsing errors. Casting from a table will generate similar results. Testing: Modified the previous tests related to dateless timestamps. Added a test to read from tables that still contain dateless timestamps and covered the timestamp to string path when no date tokens are requested in the output string. Change-Id: I48c49bf027cc4b917849b3d58518facba372b322 Reviewed-on: http://gerrit.cloudera.org:8080/15866 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>	2020-07-08 19:32:15 +00:00
Gabor Kaszab	7e456dfa9d	IMPALA-9632: Implement ds_hll_sketch() and ds_hll_estimate() These functions can be used to get cardinality estimates of data using the HLL algorithm from Apache DataSketches. ds_hll_sketch() receives a dataset, e.g. a column from a table, and returns a serialized HLL sketch in string format. This can be written to a table or be fed directly to ds_hll_estimate(), which returns the cardinality estimate for that sketch. Compared to ndv() these functions bring more flexibility: once we have fed data to the sketch it can be written to a table, and next time we can skip scanning through the dataset and simply return the estimate using the sketch. This doesn't come for free, however, as performance measurements show that ndv() is 2x-3.5x faster than sketching. On the other hand, if we query the estimate from an existing sketch then the runtime is negligible. Another flexibility with these sketches is that they can be merged together, so e.g. if we had saved a sketch for each of the partitions of a table then they can be combined with each other based on the query without touching the actual data. DataSketches HLL is sensitive to the order of the data fed to the sketch, and as a result running these algorithms in Impala gets non-deterministic results within the error bounds of the algorithm. In terms of correctness DataSketches HLL is most of the time within 2% of the correct result, but there are occasional spikes where the difference is bigger but never goes out of the range of 5%. Even though the DataSketches HLL algorithm could be parameterized, currently this implementation hard-codes these parameters and uses HLL_4 and lg_k=12. For more details about Apache DataSketches' HLL implementation see: https://datasketches.apache.org/docs/HLL/HLL.html Testing: - Added some tests running estimates for small datasets where the amount of data is small enough to get the correct results. - Ran manual tests on TPCH25.lineitem to compare performance with ndv().
Depending on data characteristics ndv() appears 2x-3.5x faster. The lower the cardinality of the dataset the bigger the difference between the 2 algorithms is. - Ran manual tests on TPCH25.lineitem and functional_parquet.alltypes to compare correctness with ndv(). See results above. Change-Id: Ic602cb6eb2bfbeab37e5e4cba11fbf0ca40b03fe Reviewed-on: http://gerrit.cloudera.org:8080/16000 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2020-07-07 14:11:21 +00:00
stiga-huang	0936384271	IMPALA-9010: Add builtin mask functions There're 6 builtin GenericUDFs for column masking in Hive: mask_show_first_n(value, charCount, upperChar, lowerChar, digitChar, otherChar, numberChar) mask_show_last_n(value, charCount, upperChar, lowerChar, digitChar, otherChar, numberChar) mask_first_n(value, charCount, upperChar, lowerChar, digitChar, otherChar, numberChar) mask_last_n(value, charCount, upperChar, lowerChar, digitChar, otherChar, numberChar) mask_hash(value) mask(value, upperChar, lowerChar, digitChar, otherChar, numberChar, dayValue, monthValue, yearValue) Description of the parameters: value - value to mask. Supported types: TINYINT, SMALLINT, INT, BIGINT, STRING, VARCHAR, CHAR, DATE(only for mask()). charCount - number of characters. Default value: 4 upperChar - character to replace upper-case characters with. Specify -1 to retain original character. Default value: 'X' lowerChar - character to replace lower-case characters with. Specify -1 to retain original character. Default value: 'x' digitChar - character to replace digit characters with. Specify -1 to retain original character. Default value: 'n' otherChar - character to replace all other characters with. Specify -1 to retain original character. Default value: -1 numberChar - character to replace digits in a number with. Valid values: 0-9. Default value: '1' dayValue - value to replace day field in a date with. Specify -1 to retain original value. Valid values: 1-31. Default value: 1 monthValue - value to replace month field in a date with. Specify -1 to retain original value. Valid values: 0-11. Default value: 0 yearValue - value to replace year field in a date with. Specify -1 to retain original value. 
Default value: 1 In Hive, these functions accept variable length of arguments in non-restricted types: mask_show_first_n(val) mask_show_first_n(val, 8) mask_show_first_n(val, 8, 'X', 'x', 'n') mask_show_first_n(val, 8, 'x', 'x', 'x', 'x', 2) mask_show_first_n(val, 8, 'x', -1, 'x', 'x', '9') The arguments of upperChar, lowerChar, digitChar, otherChar and numberChar can be in string or numeric types. Impala doesn't support Hive GenericUDFs, so we lack these mask functions to support Ranger column masking policies. On the other hand, we want the masking functions to be evaluated in the C++ builtin logic rather than calling out to java UDFs for performance. This patch introduces our builtin implementation of them. We currently don't have a corresponding framework for GenericUDF (IMPALA-9271), so we implement these by overloads. However, it may require hundreds of overloads to cover all possible combinations. We just implement some important overloads, including - those used by Ranger default masking policies, - those with simple arguments that may be useful for users, - an overload with all arguments in int type for full functionality. Char arguments need to be converted to their ASCII value. Tests: - Add BE tests in expr-test Change-Id: Ica779a1bf63a085d51f3b533f654cbaac102a664 Reviewed-on: http://gerrit.cloudera.org:8080/14963 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-01-17 15:34:34 +00:00
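To make the masking rules in IMPALA-9010 concrete, here is a minimal Python sketch of the string case of mask_show_first_n() and mask_show_last_n(). It is not the C++ builtin: it assumes ASCII input, uses None as a stand-in for the -1 "retain original character" value, and ignores the numeric and DATE overloads. The helper name _mask_char is hypothetical.

```python
from typing import Optional


def _mask_char(ch: str, upper: Optional[str], lower: Optional[str],
               digit: Optional[str], other: Optional[str]) -> str:
    """Replace one ASCII character according to its class (hypothetical helper)."""
    if "A" <= ch <= "Z":
        repl = upper
    elif "a" <= ch <= "z":
        repl = lower
    elif "0" <= ch <= "9":
        repl = digit
    else:
        repl = other
    # None plays the role of -1 in the commit message: keep the original character.
    return ch if repl is None else repl


def mask_show_first_n(value: Optional[str], char_count: int = 4,
                      upper: Optional[str] = "X", lower: Optional[str] = "x",
                      digit: Optional[str] = "n",
                      other: Optional[str] = None) -> Optional[str]:
    """Leave the first char_count characters visible and mask the rest."""
    if value is None:
        return None
    keep = max(char_count, 0)
    masked = "".join(_mask_char(c, upper, lower, digit, other) for c in value[keep:])
    return value[:keep] + masked


def mask_show_last_n(value: Optional[str], char_count: int = 4,
                     upper: Optional[str] = "X", lower: Optional[str] = "x",
                     digit: Optional[str] = "n",
                     other: Optional[str] = None) -> Optional[str]:
    """Leave the last char_count characters visible and mask the rest."""
    if value is None:
        return None
    split = max(len(value) - max(char_count, 0), 0)
    masked = "".join(_mask_char(c, upper, lower, digit, other) for c in value[:split])
    return masked + value[split:]
```

The defaults mirror the parameter descriptions in the commit message (charCount 4, 'X', 'x', 'n', and other characters retained).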
norbert.luksa	a862282811	IMPALA-8709: Add Damerau-Levenshtein edit distance built-in function This patch adds new built-in functions to calculate restricted Damerau-Levenshtein edit distance (optimal string alignment). Implemented as dle_dst() and damerau_levenshtein(). If either value is NULL or both values are NULL, returns NULL, which differs from Netezza's dle_dst(), which returns the length of the not NULL value or 0 if both values are NULL. The NULL behavior matches the existing levenshtein() function. Also cleans up levenshtein tests. Testing: - Added unit tests to expr-test.cc - Manual testing on over 1400 string pairs from http://marvin.cs.uidaho.edu/misspell.html and results match Netezza Change-Id: Ib759817ec15e7075bf49d51e494e45c8af4db94d Reviewed-on: http://gerrit.cloudera.org:8080/13794 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-11-22 21:39:21 +00:00
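The restricted Damerau-Levenshtein (optimal string alignment) distance named in IMPALA-8709 can be sketched as follows. This is a textbook dynamic-programming version in Python, not the code from the patch; the function name osa_distance is hypothetical, and None stands in for SQL NULL.

```python
from typing import Optional


def osa_distance(a: Optional[str], b: Optional[str]) -> Optional[int]:
    """Optimal string alignment distance; returns None if either input is None."""
    if a is None or b is None:
        return None
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] is the distance between the first i chars of a and first j chars of b.
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

The "restricted" part shows up on inputs such as 'ca' and 'abc': no substring is edited more than once, so the result is 3 rather than the 2 that unrestricted Damerau-Levenshtein would give.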
luksan47	8db7f27ddd	IMPALA-8752: Added Jaro-Winkler edit distance and similarity built-in function The added functions return the Jaro/Jaro-Winkler similarity/distance of two strings. The algorithm calculates the Jaro similarity of the strings, then adds more weight to the result if there are common prefixes (Jaro-Winkler). For more detail, see: https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance Extended the algorithm with another optional parameter: boost threshold. The prefix weight will only be applied if the Jaro similarity exceeds the given threshold. By default, its value is 0.7. The new built-in functions are: * jaro_distance, jaro_dst * jaro_similarity, jaro_sim * jaro_winkler_distance, jw_dst * jaro_winkler_similarity, jw_sim Testing: * Added unit tests to expr-test.cc * Manual testing over 1400 word pairs from http://marvin.cs.uidaho.edu/misspell.html Results match Apache commons Change-Id: I64d7f461516c5e66cc27d62612bc8cc0e8f0178c Reviewed-on: http://gerrit.cloudera.org:8080/13870 Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-08-13 18:25:32 +00:00
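A minimal Python sketch of the Jaro-Winkler computation with the boost threshold described in IMPALA-8752. The scaling factor of 0.1 and the prefix cap of 4 characters are the usual Winkler defaults and are assumptions here, as are the parameter names; only the 0.7 boost threshold comes from the commit message.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Plain Jaro similarity in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and close enough to each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    half_transpositions = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3.0


def jaro_winkler_similarity(s1: str, s2: str, scaling: float = 0.1,
                            boost_threshold: float = 0.7) -> float:
    """Jaro similarity plus a common-prefix bonus above the boost threshold."""
    jaro = jaro_similarity(s1, s2)
    if jaro <= boost_threshold:
        return jaro  # the prefix weight only applies above the threshold
    prefix = 0
    for c1, c2 in zip(s1[:4], s2[:4]):
        if c1 != c2:
            break
        prefix += 1
    return jaro + prefix * scaling * (1.0 - jaro)


def jaro_winkler_distance(s1: str, s2: str) -> float:
    """Distance is the complement of similarity."""
    return 1.0 - jaro_winkler_similarity(s1, s2)
```

For the classic pair 'MARTHA' and 'MARHTA' this gives a Jaro similarity of about 0.944 and a Jaro-Winkler similarity of about 0.961.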
Attila Jeges	f40935a30e	IMPALA-7369: part 2: Add INTERVAL expr support and built-in functions for DATE This change implements INTERVAL expression support for DATE type and adds several DATE related built-in functions. The efficiency of the DateValue::ToYearMonthDay() function used in many of the built-in functions below was also improved. The following functions are supported in Hive: INT YEAR(DATE d) Extracts year of the 'd' date, returns it as an int in 0-9999 range. INT MONTH(DATE d) Extracts month of the 'd' date and returns it as an int in 1-12 range. INT DAY(DATE d), INT DAYOFMONTH(DATE d) Extracts day-of-month of the 'd' date and returns it as an int in 1-31 range. INT QUARTER(DATE d) Extracts quarter of the 'd' date and returns it as an int in 1-4 range. INT DAYOFWEEK(DATE d) Extracts day-of-week of the 'd' date and returns it as an int in 1-7 range. 1 is Sunday and 7 is Saturday. INT DAYOFYEAR(DATE d) Extracts day-of-year of the 'd' date and returns it as an int in 1-366 range. INT WEEKOFYEAR(DATE d) Extracts week-of-year of the 'd' date and returns it as an int in 1-53 range. STRING DAYNAME(DATE d) Returns the day field from a 'd' date, converted to the string corresponding to that day name. The range of return values is "Sunday" to "Saturday". STRING MONTHNAME(DATE d) Returns the month field from a 'd' date, converted to the string corresponding to that month name. The range of return values is "January" to "December". DATE NEXT_DAY(DATE d, STRING weekday) Returns the first date which is later than 'd' and named as 'weekday'. 'weekday' is 3 letters or full name of the day of the week. DATE LAST_DAY(DATE d) Returns the last day of the month which the 'd' date belongs to. INT DATEDIFF(DATE d1, DATE d2) Returns the number of days from 'd1' date to 'd2' date. DATE CURRENT_DATE() Returns the current date (in the local time zone). 
INT INT_MONTHS_BETWEEN(DATE d1, DATE d2) Returns the number of months between 'd1' and 'd2' dates, as an int representing only the full months that passed. If 'd1' represents an earlier date than 'd2', the result is negative. DOUBLE MONTHS_BETWEEN(DATE d1, DATE d2) Returns the number of months between 'd1' and 'd2' dates. Can include a fractional part representing extra days in addition to the full months between the dates. The fractional component is computed by dividing the difference in days by 31 (regardless of the month). If 'd1' represents an earlier date than 'd2', the result is negative. DATE ADD_YEARS(DATE d, INT/BIGINT num_years), DATE SUB_YEARS(DATE d, INT/BIGINT num_years) Adds/subtracts a specified number of years to a 'd' date value. DATE ADD_MONTHS(DATE d, INT/BIGINT num_months), DATE SUB_MONTHS(DATE d, INT/BIGINT num_months) Adds/subtracts a specified number of months to a date value. If 'd' is the last day of a month, the returned date will fall on the last day of the target month too. DATE ADD_DAYS(DATE d, INT/BIGINT num_days), DATE SUB_DAYS(DATE d, INT/BIGINT num_days) Adds/subtracts a specified number of days to a date value. DATE ADD_WEEKS(DATE d, INT/BIGINT num_weeks), DATE SUB_WEEKS(DATE d, INT/BIGINT num_weeks) Adds/subtracts a specified number of weeks to a date value. The following function doesn't exist in Hive but supported by Amazon Redshift INT DATE_CMP(DATE d1, DATE d2) Compares 'd1' and 'd2' dates. Returns: 1. NULL, if either 'd1' or 'd2' is NULL 2. -1 if d1 < d2 3. 1 if d1 > d2 4. 0 if d1 == d2 (https://docs.aws.amazon.com/redshift/latest/dg/r_DATE_CMP.html) Change-Id: If404bffdaf055c769e79ffa8f193bac415cfdd1a Reviewed-on: http://gerrit.cloudera.org:8080/13648 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-25 23:06:25 +00:00
Attila Jeges	f0678b06e6	IMPALA-7369: part 1: Implement TRUNC, DATE_TRUNC, EXTRACT, DATE_PART functions for DATE These functions are somewhat similar in that each of them takes a DATE argument and a time unit to work with. They work identically to the corresponding TIMESTAMP functions. The only difference is that the DATE functions don't accept time-of-day units. TRUNC(DATE d, STRING unit) Truncates a DATE value to the specified time unit. The 'unit' argument is case insensitive. This argument string can be one of: SYYYY, YYYY, YEAR, SYEAR, YYY, YY, Y: Year. Q: Quarter. MONTH, MON, MM, RM: Month. DDD, DD, J: Day. DAY, DY, D: Starting day (Monday) of the week. WW: Truncates to the most recent date, no later than 'd', which is on the same day of the week as the first day of year. W: Truncates to the most recent date, no later than 'd', which is on the same day of the week as the first day of month. The implementation mirrors Impala's TRUNC(TIMESTAMP ts, STRING unit) function. Hive and Oracle SQL have a similar function too. Reference: http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions201.htm . DATE_TRUNC(STRING unit, DATE d) Truncates a DATE value to the specified precision. The 'unit' argument is case insensitive. This argument string can be one of: DAY, WEEK, MONTH, YEAR, DECADE, CENTURY, MILLENNIUM. The implementation mirrors Impala's DATE_TRUNC(STRING unit, TIMESTAMP ts) function. Vertica has a similar function too. Reference: https://my.vertica.com/docs/8.1.x/HTML/index.htm#Authoring/SQLReferenceManual/Functions/Date-Time/DATE_TRUNC.htm . EXTRACT(DATE d, STRING unit), EXTRACT(unit FROM DATE d) Returns one of the numeric date fields from a DATE value. The 'unit' string can be one of YEAR, QUARTER, MONTH, DAY. This argument value is case-insensitive. The implementation mirrors Impala's EXTRACT(TIMESTAMP ts, STRING unit). Hive and Oracle SQL have a similar function too.
Reference: http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions050.htm . DATE_PART(STRING unit, DATE date) Similar to EXTRACT(), with the argument order reversed. Supports the same date units as EXTRACT(). The implementation mirrors Impala's DATE_PART(STRING unit, TIMESTAMP ts) function. Change-Id: I843358a45eb5faa2c134994600546fc1d0a797c8 Reviewed-on: http://gerrit.cloudera.org:8080/13363 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-05 14:23:23 +00:00
Attila Jeges	b5805de3e6	IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form yyyy-MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is 0000-01-01 to 9999-12-31. This initial DATE type support covers TEXT and HBASE fileformats only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types (note that invalid casts will fail with an error just like invalid DECIMAL_V2 casts, while failed casts to other types do not lead to a warning or error): - from STRING to DATE. The string value must be formatted as yyyy-MM-dd HH:mm:ss.SSSSSSSSS. The date component is mandatory, the time component is optional. If the time component is present, it will be truncated silently. - from DATE to STRING. The resulting string value is formatted as yyyy-MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since STRING -> DATE, STRING -> TIMESTAMP and DATE -> TIMESTAMP implicit conversions are now all possible, the existing function overload resolution logic is not adequate anymore. For example, it resolves the if(false, '2011-01-01', DATE '1499-02-02') function call to the if(BOOLEAN, TIMESTAMP, TIMESTAMP) version of the overloaded function, instead of the if(BOOLEAN, DATE, DATE) version.
This is clearly wrong, so the function overload resolution logic had to be changed to resolve function calls to the best-fit overloaded function definition if there are multiple applicable candidates. An overloaded function definition is an applicable candidate for a function call if each actual parameter in the function call either matches the corresponding formal parameter's type (without casting) or is implicitly castable to that type. When looking for the best-fit applicable candidate, a parameter match score (i.e. the number of actual parameters in the function call that match their corresponding formal parameter's type without casting) is calculated and the applicable candidate with the highest parameter match score is chosen. There's one more issue that the new resolution logic has to address: if two applicable candidates have the same parameter match score and the only difference between the two is that the first one requires a STRING -> TIMESTAMP implicit cast for some of its parameters while the second one requires a STRING -> DATE implicit cast for the same parameters, then the first candidate has to be chosen so as not to break backward compatibility. E.g.: the year('2019-02-15') function call must resolve to year(TIMESTAMP) instead of year(DATE). Note that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. When the resolution algorithm considers overloaded function definitions, first it orders them lexicographically by the types in their parameter lists. To ensure the backward compatible behavior, the PrimitiveType.DATE enum value has to come after PrimitiveType.TIMESTAMP. - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support.
- Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' (and the corresponding HBASE table 'functional_hbase.date_tbl') was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT and HBASE fileformats only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add Kudu support. - Optionally support Avro and ORC. For further details, see IMPALA-6169. Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Reviewed-on: http://gerrit.cloudera.org:8080/12481 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-04-23 13:33:57 +00:00
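The overload resolution rule described in IMPALA-7368 (applicable candidates, parameter match score, and the TIMESTAMP-before-DATE tie-break) can be sketched in Python as follows. This is an illustration of the rule as stated in the commit message, not the frontend code: the type names are plain strings, the cast table lists only the three implicit casts the message mentions, and TYPE_ORDER is a hypothetical ordering in which only the relative position of TIMESTAMP and DATE is taken from the source.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical enum order; the source only requires DATE to come after TIMESTAMP.
TYPE_ORDER = ["BOOLEAN", "STRING", "TIMESTAMP", "DATE"]

# Implicit casts named in the commit message, as (from_type, to_type) pairs.
IMPLICIT_CASTS = {
    ("STRING", "DATE"),
    ("STRING", "TIMESTAMP"),
    ("DATE", "TIMESTAMP"),
}

Signature = Tuple[str, ...]


def is_applicable(args: Sequence[str], params: Signature) -> bool:
    """Every argument matches its parameter type or is implicitly castable to it."""
    return len(args) == len(params) and all(
        a == p or (a, p) in IMPLICIT_CASTS for a, p in zip(args, params)
    )


def match_score(args: Sequence[str], params: Signature) -> int:
    """Number of arguments that match their parameter type without a cast."""
    return sum(a == p for a, p in zip(args, params))


def resolve(args: Sequence[str], overloads: List[Signature]) -> Optional[Signature]:
    """Pick the applicable overload with the highest match score."""
    # Ordering by type position makes TIMESTAMP overloads come before DATE ones,
    # so a strict '>' comparison keeps the TIMESTAMP overload on ties.
    ordered = sorted(overloads, key=lambda ps: [TYPE_ORDER.index(p) for p in ps])
    best: Optional[Signature] = None
    best_score = -1
    for params in ordered:
        if is_applicable(args, params):
            score = match_score(args, params)
            if score > best_score:
                best, best_score = params, score
    return best
```

With these definitions, a call with argument types (BOOLEAN, STRING, DATE) resolves to the (BOOLEAN, DATE, DATE) overload because it scores 2 against 1, while a single STRING argument offered to both (DATE) and (TIMESTAMP) overloads resolves to (TIMESTAMP).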
Greg Rahn	ba9b78c103	IMPALA-7759: Add Levenshtein edit distance built-in function This patch adds new built-in functions to calculate Levenshtein edit distance. Implemented as levenshtein() to match PostgreSQL in both functionality and name, and also added the le_dst() alias for Netezza compatibility. Note that levenshtein() differs in functionality: if either value is NULL or both values are NULL, levenshtein() returns NULL, whereas Netezza's le_dst() returns the length of the not NULL value or 0 if both values are NULL. Testing: - Added unit tests to expr-test.cc - Manual test on 966289 string pairs and results match PostgreSQL - Added changes to qgen tests for PostgreSQL comparison Change-Id: I549d33ab7cebfa10db2934461c8ec91e2cc1cdcb Reviewed-on: http://gerrit.cloudera.org:8080/11793 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-12-02 10:39:44 +00:00
stiga-huang	ddef2cb9b1	IMPALA-376: add built-in functions for parsing JSON This patch implements the same function as Hive UDF get_json_object. We reuse RapidJson to parse the json string. In order to track the memory used in RapidJson, we wrap FunctionContext into an allocator. get_json_object accepts two parameters: a json string and a selector (json path). We parse the json string into a Document tree and then perform BFS according to the selector. For example, to process get_json_object('[{\"a\":1}, {\"a\":2}, {\"a\":3}]', '$[].a'), we first perform '$[]' to extract all the items in the root array. Then we get a queue consisting of {a:1},{a:2},{a:3} and perform the '.a' selector on all values in the queue. The final result is 1,2,3 in the queue. As there are multiple results, they should be encapsulated into an array. The output result is the string '[1,2,3]'. More examples can be found in expr-test.cc. Test: * Add unit tests in expr-test * Add e2e tests in exprs.test * Add tests in test_alloc_fail.py to check handling of out of memory Change-Id: I6a9d3598cb3beca0865a7edb094f3a5b602dbd2f Reviewed-on: http://gerrit.cloudera.org:8080/10950 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-09-29 11:59:03 +00:00
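The queue-based walk described in IMPALA-376 can be illustrated with a small Python sketch. It does not parse the selector string: the selector is assumed to be pre-tokenized into hypothetical ("all_items", None) and ("key", name) steps, and Python's json module stands in for RapidJson. How a single result is rendered is also an assumption; only the multi-result case is taken from the commit message.

```python
import json
from typing import Any, List, Optional, Tuple

Step = Tuple[str, Optional[str]]


def select_json(json_text: str, steps: List[Step]) -> Optional[str]:
    """Apply each selector step to every value currently in the queue."""
    queue: List[Any] = [json.loads(json_text)]
    for kind, arg in steps:
        next_queue: List[Any] = []
        for node in queue:
            if kind == "all_items" and isinstance(node, list):
                next_queue.extend(node)       # expand every array element
            elif kind == "key" and isinstance(node, dict) and arg in node:
                next_queue.append(node[arg])  # descend into the named field
        queue = next_queue
    if not queue:
        return None
    # Multiple results are wrapped in an array, as the commit message describes.
    result = queue if len(queue) > 1 else queue[0]
    return json.dumps(result, separators=(",", ":"))
```

Running the commit's example through this sketch, the steps "all items of the root array" followed by "field a" turn the three-object array into '[1,2,3]'.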
Andrew Sherman	15e40a3c94	IMPALA-589: Add sql function returning the impalad coordinator hostname. In every execution of an Impala query, one of the impalad daemons acts as the coordinator node. In some cases, such as when using a proxy, a user cannot predict which host will act as the coordinator. To aid in diagnosis, we provide a sql function which returns the name of the host on which the coordinator is running. EXTERNAL DESCRIPTION: Add a builtin function called coordinator(), which returns the name of the host which is running the impalad that is acting as the coordinator for the current query. TESTING: - Added a basic unit test for the new function. - Added a unit test which simulates the case when coord_address is unset. - Added a query that uses coordinator() to exprs.test - Hand tested in a development deployment. - Ran regression tests and got a clean run. Change-Id: I94d6e2664ba659b48df53c5c06f67b502c533e47 Reviewed-on: http://gerrit.cloudera.org:8080/11459 Reviewed-by: Thomas Marshall <thomasmarshall@cmu.edu> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-09-24 20:26:44 +00:00
Adam Holley	7002ad3b35	IMPALA-7398: Add logged_in_user alias for effective_user This patch adds an alias to the effective_user function so that views created in Hive using the logged_in_user function will work in Impala. Example: CREATE VIEW foo.view1 AS SELECT * FROM foo.table1 WHERE name=logged_in_user(); Tests: - Added function and ran delegation tests - Ran backend tests - Ran custom-cluster tests including delegation Change-Id: Id63f243e0fffbe2798f1f9dbc4cc3ebe9d9529a6 Reviewed-on: http://gerrit.cloudera.org:8080/11184 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-08-11 06:02:52 +00:00
aphadke	dde930830b	IMPALA-4848: Add WIDTH_BUCKET() function Syntax: width_bucket(expr decimal, min_val decimal, max_val decimal, num_buckets int) This function creates equiwidth histograms, where the histogram range is divided into num_buckets buckets having identical sizes. This function returns the bucket in which the expr value would fall. min_val and max_val are the minimum and maximum value of the histogram range respectively. -> This function returns NULL if expr is a NULL. -> This function returns 0 if expr < min_val -> This function returns num_buckets + 1 if expr > max_val E.g. [localhost:21000] > select width_bucket(8, 1, 20, 3); +---------------------------+ | width_bucket(8, 1, 20, 3) | +---------------------------+ | 2 | +---------------------------+ Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f Reviewed-on: http://gerrit.cloudera.org:8080/6023 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-06-30 04:47:23 +00:00
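The bucket arithmetic behind WIDTH_BUCKET() can be sketched in a few lines of Python. This is an illustration of the rules listed in IMPALA-4848 rather than the backend code: None stands in for NULL, exact rational arithmetic stands in for DECIMAL, and the ValueError for invalid ranges is an assumption, since the commit message does not say how such input is handled.

```python
import math
from fractions import Fraction
from typing import Optional


def width_bucket(expr: Optional[int], min_val: int, max_val: int,
                 num_buckets: int) -> Optional[int]:
    """Return the 1-based equiwidth bucket that expr falls into."""
    if expr is None:
        return None
    if num_buckets <= 0 or min_val >= max_val:
        # Assumption: invalid histogram definitions are rejected.
        raise ValueError("need num_buckets > 0 and min_val < max_val")
    if expr < min_val:
        return 0                    # underflow bucket
    if expr >= max_val:
        return num_buckets + 1      # overflow bucket
    # Position of expr inside the range, scaled to the number of buckets.
    offset = Fraction(expr - min_val) * num_buckets / (max_val - min_val)
    return math.floor(offset) + 1
```

For the commit's example, (8 - 1) * 3 / (20 - 1) is about 1.1, so the value lands in bucket 2. A value exactly equal to max_val is placed in the overflow bucket here, which is also what the scaling formula would produce.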
Tim Armstrong	2995be8238	IMPALA-5607: part 1: breaking extract/date_part changes This is the compatibility-breaking part of Jinchul Kim's change to add additional units. To support nanoseconds we need to widen the output type of these functions. We also change the meaning of "milliseconds" to include the seconds component. Cherry-picks: not for 2.x Change-Id: I42d83712d9bb3a4900bec38a9c009dcf2a1fe019 Reviewed-on: http://gerrit.cloudera.org:8080/9957 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-04-10 04:00:37 +00:00
Taras Bobrovytsky	8fec1911e5	IMPALA-6230, IMPALA-6468: Fix the output type of round() and related fns Before this patch, the output type of round() ceil() floor() trunc() was not always the same as the input type. It was also inconsistent in general. For example, round(double) returned an integer, but round(double, int) returned a double. After looking at other database systems, we decided that the guideline should be that the output type should be the same as the input type. In this patch, we change the behavior of the previously mentioned functions so that if a double is given then a double is returned. We also modify the rounding behavior to always round away from zero. Before, we were rounding towards positive infinity in some cases. Testing: - Updated tests - Ran an exhaustive build which passed. Cherry-picks: not for 2.x Change-Id: I77541678012edab70b182378b11ca8753be53f97 Reviewed-on: http://gerrit.cloudera.org:8080/9346 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2018-03-24 04:43:01 +00:00
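The difference between the two rounding behaviors mentioned in IMPALA-6230 only shows on negative ties. The Python sketch below contrasts them; both function names are hypothetical, and the floor-based formula is a simple illustration that ignores floating-point edge cases just below .5.

```python
import math


def round_toward_positive_infinity_on_ties(x: float) -> float:
    """Old-style behavior: ties always move up, so -2.5 becomes -2.0."""
    return float(math.floor(x + 0.5))


def round_away_from_zero_on_ties(x: float) -> float:
    """New-style behavior: ties move away from zero, so -2.5 becomes -3.0."""
    # Round the magnitude, then restore the sign of the input.
    return math.copysign(math.floor(abs(x) + 0.5), x)
```

Both functions return a float for a float input, in line with the guideline that the output type follows the input type.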
Greg Rahn	d91df9b63f	IMPALA-6537: Add missing ODBC scalar functions This patch contains the following builtin function changes: New aliases for existing functions: - LEFT() same as STRLEFT() - RIGHT() same as STRRIGHT() - WEEK() same as WEEKOFYEAR() New functions: - QUARTER() - MONTHNAME() Refactors: - Remove TimestampFunctions::DayName and add LongDayName to match pattern of TimestampFunctions::ShortDayName Additionally, it adds the unit of QUARTER to EXTRACT() and DATE_PART() Testing: - manual testing comparing the translated ODBC functions to the non-translated ones - added at least one new expr-test for aliases - new expr-tests added for new functions Change-Id: Ia60af2b4de8c098be7ecb3e60840e459ae10d499 Reviewed-on: http://gerrit.cloudera.org:8080/9376 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2018-02-23 07:19:07 +00:00
Jinchul	1b1087eb05	IMPALA-3282: Adds regexp_escape built-in function Escapes the following special characters in RE2 library: .\+*?[^]$(){}=!<>\|:- Testing: Add some unit tests into ExprTest.StringRegexpFunctions Add some E2E tests into exprs.test Change-Id: I84c3e0ded26f6eb20794c38b75be9b25cd111e4b Reviewed-on: http://gerrit.cloudera.org:8080/8900 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-02-01 05:14:14 +00:00
Zoram Thanga	c2d27ca823	IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters. This patch generalizes ltrim()/rtrim() functions to accept a second argument that specifies the set of characters to be removed from the leading/trailing end of the target string: ltrim(string text[, characters text]) rtrim(string text[, characters text]) A common string trimming method has been added to StringFunctions, which is called from the general ltrim/rtrim/btrim functions. The functions also share prepare and close operations. New StringFunctions tests have been added to ExprTest for the new forms of ltrim() and rtrim(). New tests to cover handling of special characters have also been added. Note that our string handling functions only work with the ASCII character set. Handling of other character sets lies outside the scope of this patch. The existing ltrim()/rtrim()/trim() functions that take only one argument have been updated to use the more general methods. Testing: Queries like the following were run on a 1.5-billion row tpch_parquet.lineitem table, with the old and new implementations to ensure there is no performance regression: 1. select count(trim(l_shipinstruct)), count(trim(l_returnflag)), ... 2. select count(*) from t where trim(l_shipinstruct) = '' and ... Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79 Reviewed-on: http://gerrit.cloudera.org:8080/8349 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-23 23:44:46 +00:00
Jinchul	6041865031	IMPALA-3651: Adds murmur_hash() built-in function murmur_hash relies on HashUtil::MurmurHash2_64, which is the 64-bit version of MurmurHash2. Testing: Add unit tests for primitive types: ExprTest.MurmurHashFunction Add E2E tests into exprs.test Change-Id: I14d56ffb8fab256f3f66a2669271fd4b3c50cc29 Reviewed-on: http://gerrit.cloudera.org:8080/8893 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-10 20:17:26 +00:00
Zachary Amsden	f53ce3b16d	IMPALA-4513: Promote integer types for ABS() The internal representation of the most negative number in two's complement requires 1 more bit to represent the positive version. This means ABS() must promote integer types to the next highest width. Change-Id: I86cc880e78258d5f90471bd8af4caeb4305eed77 Reviewed-on: http://gerrit.cloudera.org:8080/8004 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-09-23 02:41:32 +00:00
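The overflow that IMPALA-4513 avoids can be reproduced with a small Python sketch that simulates fixed-width two's complement integers. The helper names are hypothetical, and doubling the bit width is used as a stand-in for "the next highest width" (for example TINYINT to SMALLINT).

```python
def wrap_signed(value: int, bits: int) -> int:
    """Interpret value as a two's complement integer of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def abs_same_width(value: int, bits: int) -> int:
    """ABS() whose result must fit in the input width; overflows on the minimum."""
    return wrap_signed(abs(value), bits)


def abs_promoted(value: int, bits: int) -> int:
    """ABS() whose result is stored in the next wider type."""
    return wrap_signed(abs(value), bits * 2)
```

An 8-bit value ranges from -128 to 127, so the absolute value of -128 wraps back to -128 at the same width but is representable as 128 once promoted to 16 bits.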
Philip Zeyliger	02302b7cfe	IMPALA-5211: Simplifying nullif conditional. This commit: * Converts nullif(x, y) into if(x IS DISTINCT FROM y, x, NULL). * Rewrites x IS DISTINCT FROM y -> FALSE if x.equals(y). * Removes backend implementation of nullif. As is the case with all conversions, the original nullif(...) is replaced with if(...) in error messages, explain plans, and so on. It's important and subtle that the conversion uses "x IS DISTINCT FROM y" rather than "x != y" so that the simplification can be made while handling null values correctly. ("x != x" may be either false or null, but x is distinct from x is always false.) Testing: * Added new tests to ExprRewriteRulesTests for nullif and the if(x distinct from y, ...) simplification. * New test for the rewrite in ParserTest. * Adds an nvl2() test, incidentally. * Confirmed (using EclEmma, which uses the JaCoCo engine) that coverage is good. * Ran the tests. Change-Id: Id91ca968a0c0be44e1ec54ad8602f91a5cb2e0e5 Reviewed-on: http://gerrit.cloudera.org:8080/7829 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-09-15 22:48:52 +00:00
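Why the IMPALA-5211 rewrite needs IS DISTINCT FROM rather than != can be seen by modelling SQL's three-valued logic in Python, with None standing in for NULL. The function names below are hypothetical and model the semantics only, not the frontend rewrite rule.

```python
from typing import Any, Optional


def sql_not_equal(x: Any, y: Any) -> Optional[bool]:
    """x != y under SQL semantics: NULL if either side is NULL."""
    if x is None or y is None:
        return None
    return x != y


def is_distinct_from(x: Any, y: Any) -> bool:
    """x IS DISTINCT FROM y: never NULL, and two NULLs are not distinct."""
    if x is None or y is None:
        return not (x is None and y is None)
    return x != y


def sql_if(cond: Optional[bool], then_val: Any, else_val: Any) -> Any:
    """if(cond, a, b): a NULL condition selects the else branch."""
    return then_val if cond is True else else_val


def nullif(x: Any, y: Any) -> Any:
    """nullif(x, y) expressed as if(x IS DISTINCT FROM y, x, NULL)."""
    return sql_if(is_distinct_from(x, y), x, None)
```

Because is_distinct_from(x, x) is False for every x, including None, an expression of that shape can be folded to FALSE, whereas sql_not_equal(x, x) is None when x is None and cannot be folded.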
Sandeep Akinapelli	f538b43911	IMPALA-5317: add DATE_TRUNC() function Added a UDF builtin function date_trunc. Reuses many of the Trunc functions already implemented for trunc(), including the truncate unit, except strToTruncUnit. Added checks to ensure that truncation results that fall outside of the posix timestamp range are returned as NULL. Added ctest for the date_trunc function. Change-Id: I953ba006cbb166dcc78e8c0c12dfbf70f093b584 Reviewed-on: http://gerrit.cloudera.org:8080/7313 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-09-07 01:29:01 +00:00
Jinchul	15e6cf8fd0	IMPALA-5529: Add additional function signatures for TRUNC() The following signatures to be added: +--------------+----------------------------------+-------------+---------------+ \| return type \| signature \| binary type \| is persistent \| +--------------+----------------------------------+-------------+---------------+ \| DECIMAL(,) \| trunc(DECIMAL(,)) \| BUILTIN \| true \| \| DECIMAL(,) \| trunc(DECIMAL(,), BIGINT) \| BUILTIN \| true \| \| DECIMAL(,) \| trunc(DECIMAL(,), INT) \| BUILTIN \| true \| \| DECIMAL(,) \| trunc(DECIMAL(,), SMALLINT) \| BUILTIN \| true \| \| DECIMAL(,) \| trunc(DECIMAL(,), TINYINT) \| BUILTIN \| true \| \| BIGINT \| trunc(DOUBLE) \| BUILTIN \| true \| +--------------+----------------------------------+-------------+---------------+ Tests: * Adds tests for the new builtin trunc()/dtrunc() Change-Id: I856da9f817b948de3c72af60a0742b128398b4cf Reviewed-on: http://gerrit.cloudera.org:8080/7450 Tested-by: Impala Public Jenkins Reviewed-by: Matthew Jacobs <mj@cloudera.com>	2017-07-29 20:53:45 +00:00
Matthew Jacobs	7a1ff1e5e9	IMPALA-5539: Fix Kudu timestamp with -use_local_tz_for_unix_ts The -use_local_tz_for_unix_timestamp_conversion flag exists to specify if TIMESTAMPs should be interpreted as localtime or UTC when converting to/from Unix time via builtins: from_unixtime(bigint unixtime) unix_timestamp(string datetime[, ...]) unix_timestamp(timestamp datetime) However, the KuduScanner was calling into code that, when the gflag above was set, interpreted Unix times as local time. Unfortunately the write path (KuduTableSink) and some FE TIMESTAMP code (see KuduUtil.java) did not have this behavior, i.e. we were handling the gflag inconsistently. Tests: * Adds a custom cluster test to run Kudu test cases with -use_local_tz_for_unix_timestamp_conversion. * Adds tests for the new builtin unix_micros_to_utc_timestamp() which run in a custom cluster test (added test_local_tz_conversion.py) as well as in the regular tests (added to test_exprs.py). Change-Id: I423a810427353be76aa64442044133a9a22cdc9b Reviewed-on: http://gerrit.cloudera.org:8080/7311 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-07-19 22:17:13 +00:00
Bikramjeet Vig	9037b8e385	IMPALA-3504: UDF for current timestamp in UTC This change adds a UDF "utc_timestamp" which returns the current date and time in UTC. Example query: select utc_timestamp(); +-------------------------------+ \| utc_timestamp() \| +-------------------------------+ \| 2017-06-15 17:36:39.290773000 \| +-------------------------------+ Change-Id: I969fc805922f2bb9c8101e84f85ff2cc3b1b6729 Reviewed-on: http://gerrit.cloudera.org:8080/7203 Tested-by: Impala Public Jenkins Reviewed-by: Matthew Jacobs <mj@cloudera.com>	2017-07-06 23:04:28 +00:00
Vincent Tran	d5b6cb903d	IMPALA-5316: Adds last_day() function This change adds the last_day() function. The function takes exactly one TIMESTAMP argument and returns a TIMESTAMP that is the last date of the input date's calendar month. The function will return NULL when: 1) The input argument cannot be implicitly cast to a TIMESTAMP. 2) The TIMESTAMP argument is missing a date component. 3) The TIMESTAMP argument is outside of the supported range: between 1400-01-31 00:00:00 and 9999-12-31 23:59:59 Change-Id: I429c8734bddca3c37a2eedc211a16a4ffcb04370 Reviewed-on: http://gerrit.cloudera.org:8080/6991 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Impala Public Jenkins	2017-06-15 04:51:49 +00:00
Matthew Jacobs	2dcbefc652	IMPALA-5338: Fix Kudu timestamp column default values While support for TIMESTAMP columns in Kudu tables has been committed (IMPALA-5137), it does not support TIMESTAMP column default values. This supports CREATE TABLE syntax to specify the default values, but more importantly this fixes the loading of Kudu tables that may have had default values set on UNIXTIME_MICROS columns, e.g. if the table was created via the python client. This involves fixing KuduColumn to hide the LiteralExpr representing the default value because it will be a BIGINT if the column type is TIMESTAMP. It is only needed to call toSql() and toStringValue(), so helper functions are added to KuduColumn to encapsulate special logic for TIMESTAMP. TODO: Add support and tests for ALTER setting the default value (when IMPALA-4622 is committed). Change-Id: I655910fb4805bb204a999627fa9f68e43ea8aaf2 Reviewed-on: http://gerrit.cloudera.org:8080/6936 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Impala Public Jenkins	2017-06-02 01:47:48 +00:00
Matthew Jacobs	24c77f194b	IMPALA-5137: Support pushing TIMESTAMP predicates to Kudu This change builds on the support for reading and writing TIMESTAMP columns to Kudu tables (see [1]), adding support for pushing TIMESTAMP predicates to Kudu for scans. Binary predicates and IN list predicates are supported. Testing: Added some planner and EE tests to validate the behavior. 1: https://gerrit.cloudera.org/#/c/6526/ Change-Id: I08b6c8354a408e7beb94c1a135c23722977246ea Reviewed-on: http://gerrit.cloudera.org:8080/6789 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Impala Public Jenkins	2017-05-18 21:09:51 +00:00
Zach Amsden	0715a303ea	IMPALA-4729: Implement REPLACE() This turned out to be slightly non-trivial as REPLACE is already a keyword, and thus the parser needs to be tweaked to allow this, since function names act as bare identifiers. It was difficult to get this to match performance of regexp_replace. For expanding patterns, the fact that regexp_replace copies the expansion inline means that it may in fact win on large strings with sparse matches that are > dcache size apart. Let's leave optimizing that for later. Testing: Added a full test for maximum size strings and got most of the boundary conditions I could identify. Manually ran queries on TPC-H dataset in impala to verify both performance and correctness. Added large string and exprs.test test clauses and ran the tests to verify they work as expected. Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Reviewed-on: http://gerrit.cloudera.org:8080/5776 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-15 01:33:23 +00:00
Thomas Tauber-Marshall	b2c2fe7813	IMPALA-3786: Replace "cloudera" with "apache" (part 2) As part of the ASF transition, we need to replace references to Cloudera in Impala with references to Apache. This primarily means changing Java package names from com.cloudera.impala.* to org.apache.impala.* A prior patch renamed all the files as necessary, and this patch performs the actual code changes. Most of the changes in this patch were generated with some commands of the form: find . \| grep "\.java\\|\.py\\|\.h\\|\.cc" \| \ xargs sed -i s/'com\(.\)cloudera\(\.\)impala/org\1apache\2impala/g along with some manual fixes. After this patch, the remaining references to Cloudera in the repo mostly fall into the categories: - External components that have cloudera in their own package names, eg. com.cloudera.kudu/llama - URLs, eg. https://repository.cloudera.com/ Change-Id: I0d35fa6602a7fc0c212b2ef5e2b3322b77dde7e2 Reviewed-on: http://gerrit.cloudera.org:8080/3937 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Reviewed-by: Jim Apple <jbapple@cloudera.com> Tested-by: Internal Jenkins	2016-09-29 21:14:13 +00:00
Zoltan Ivanfi	64ffbc4b51	IMPALA-3973: add position and occurrence to instr() Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Reviewed-on: http://gerrit.cloudera.org:8080/4094 Tested-by: Internal Jenkins Reviewed-by: Matthew Jacobs <mj@cloudera.com>	2016-09-13 20:28:27 +00:00
Zoltan Ivanfi	c23dc3a53a	IMPALA-1659: Netezza compatibility functions: metadata Added the SQL functions current_catalog(), current_user() and session_user() as aliases to existing ones and a new SQL function current_sid(). Change-Id: I9b5d1009bbf42acc175a942d2df484e1c64822ca Reviewed-on: http://gerrit.cloudera.org:8080/4063 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2016-08-26 16:29:24 +00:00
Dan Hecht	ffa7829b70	IMPALA-3918: Remove Cloudera copyrights and add ASF license header For files that have a Cloudera copyright (and no other copyright notice), make changes to follow the ASF source file header policy here: http://www.apache.org/legal/src-headers.html#headers Specifically: 1) Remove the Cloudera copyright. 2) Modify NOTICE.txt according to http://www.apache.org/legal/src-headers.html#notice to follow that format and add a line for Cloudera. 3) Replace or add the existing ASF license text with the one given on the website. Much of this change was automatically generated via: git grep -li 'Copyright.Cloudera' > modified_files.txt cat modified_files.txt \| xargs perl -n -i -e 'print unless m#Copyright.Cloudera#i;' cat modified_files.txt \| xargs fix_apache_license.py [1] Some manual fixups were performed following those steps, especially when license text was completely missing from the file. [1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor modification to ORIG_LICENSE to match Impala's license text. Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86 Reviewed-on: http://gerrit.cloudera.org:8080/3779 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-08-09 08:19:41 +00:00
Hayabusa-intel	4e7172f6f5	IMPALA-2459: Implement next_day date/time UDF Returns the date of the weekday that follows a particular date. The weekday argument is a string literal indicating the day of the week. Also this argument is case-insensitive. Available values are: "Sunday"/"SUN", "Monday"/"MON", "Tuesday"/"TUE", "Wednesday"/"WED", "Thursday"/"THU", "Friday"/"FRI", "Saturday"/"SAT". For example, the first Saturday after Wednesday, 25 December 2013 is on 28 December 2013. select next_day('2013-12-25','Saturday') returns '2013-12-28 00:00:00' select next_day(to_timestamp('08-1987-21', 'MM-yyyy-dd'), 'FRIDAY') returns '1987-08-28 00:00:00' Change-Id: I2721d236c096639a9e7d2df8a45ca888c6b3e83e Reviewed-on: http://gerrit.cloudera.org:8080/1943 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Lars Volker <lv@cloudera.com>	2016-06-09 04:30:48 -07:00
Jim Apple	1c16dd0cf8	IMPALA-2107: Add Base64 encoder/decoder Change-Id: I911451c5d68e8ae9d352abfcf4d5ff36484f0bf3 Reviewed-on: http://gerrit.cloudera.org:8080/2633 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 14:17:32 -07:00
Thomas Tauber-Marshall	1c98ec7f81	IMPALA-1772: Add additional date/time functions. Implemented the 'millisecond' built-in function, which takes a timestamp and returns an integer representing its millisecond portion. Other functions pending. Change-Id: I3bafc6aaf80d1d8d2a634d120d9dbdb954d3f0c4 Reviewed-on: http://gerrit.cloudera.org:8080/2148 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2016-03-08 03:12:51 +00:00
Hayabusa-intel	df599b79d9	IMPALA-1477: implement UUID function Utilize Boost UUID libraries to generate UUID values. Usage: select uuid(); Change-Id: I932f78952d65f4073d8177c6e80693586e6285cb Reviewed-on: http://gerrit.cloudera.org:8080/647 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-02-03 10:10:28 +00:00
Michael Ho	f3e7274342	IMPALA-2711: Fix memory leak in Rand(). MathFunctions::RandPrepare() allocates a 4-byte seed and stores it in the FunctionContext's thread local state. However, it was never freed. This change fixes the problem by adding a close function for Rand() so it has a chance to free the seed. A new test is also added to verify the fix. Change-Id: Ibcc2e1ca0d052b86defe80aad471f9fdaac5a453 Reviewed-on: http://gerrit.cloudera.org:8080/1855 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Internal Jenkins	2016-01-26 11:53:38 +00:00
Jim Apple	1a3d7ffd4f	IMPALA-2147: Support IS [NOT] DISTINCT FROM and "<=>" predicates Enforces that the planner treats IS NOT DISTINCT FROM as eligible for hash joins, but does not find the minimum spanning tree of equivalences for use in optimizing query plans; this is left as future work. Change-Id: I62c5300b1fbd764796116f95efe36573eed4c8d0 Reviewed-on: http://gerrit.cloudera.org:8080/710 Reviewed-by: Jim Apple <jbapple@cloudera.com> Tested-by: Internal Jenkins	2016-01-14 05:45:22 +00:00
Michael Ho	34a94c2503	IMPALA-2404: Implements built-in function regexp_match_count This patch implements a new built-in function regexp_match_count. This function returns the number of matching occurrences in input. The regexp_match_count() function has the following syntax: int = regexp_match_count(string input, string pattern) int = regexp_match_count(string input, string pattern, int start_pos, string flags) The input value specifies the string on which the regular expression is processed. The pattern value specifies the regular expression. The start_pos value specifies the character position at which to start the search for a match. It is set to 1 by default if it's not specified. The flags value (if specified) dictates the behavior of the regular expression matcher: m: Specifies that the input data might contain more than one line so that the '^' and the '$' matches should take that into account. i: Specifies that the regex matcher is case insensitive. c: Specifies that the regex matcher is case sensitive. n: Specifies that the '.' character matches newlines. By default, the flag value is set to 'c'. Note that the flags are consistent with other existing built-in functions (e.g. regexp_like) so certain flags in IBM netezza such as 's' are not supported to avoid confusion. Change-Id: Ib33ece0448f78e6a60bf215640f11b5049e47bb5 Reviewed-on: http://gerrit.cloudera.org:8080/1248 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-10-27 10:11:13 +00:00
aacalfa	5e733e8d62	IMPALA-2190: Complete conversion functions between timestamp, unixtime, and string dates Change-Id: I48a446f19c7634477f175d0defa8779dd70a392f Reviewed-on: http://gerrit.cloudera.org:8080/654 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-09-07 07:07:20 +00:00
Tim Armstrong	d7e52e336a	IMPALA-1660: addendum: add more aliases Add in missing dfloor alias. This should have been added as part of IMPALA-1660 as an alias for floor(double) but was overlooked. Also add in aliases for decimal versions of functions where they exist. Change-Id: Icb790745714882248d365274e95d45eaaf0ba133 Reviewed-on: http://gerrit.cloudera.org:8080/697 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-09-01 10:46:16 +00:00
Feni Chawla	2db0371a26	IMPALA-2033: Netezza compatibility date/time related functions. Added INT_MONTHS_BETWEEN, TIMEOFDAY, TIMESTAMP_CMP, MONTHS_BETWEEN functions Change-Id: I44834c84e21856568613938418947c532e7fbd2e Reviewed-on: http://gerrit.cloudera.org:8080/642 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Internal Jenkins	2015-08-27 04:17:05 +00:00
Sailesh Mukil	1c46cab5c6	IMPALA-2084: SPLIT_PART and REGEXP_LIKE functions for Tableau pushdown Added the SPLIT_PART and the REGEXP_LIKE builtin functions and tests for both. REGEXP_LIKE has an optional third parameter which, if used, selects a different 'prepare' function (RegexpLikePrepare in like-predicate.cc) so that the appropriate options can be set in the RE2 library. Added a patch for the RE2 library so that the 'dot matches all' option is exposed via the RE2 class. Fixed a bug where, when the function evaluated for the WHERE clause operates on constants, proper cleanup wasn't guaranteed in certain edge cases. Change-Id: Ia2a8de9eeb2854100a2d949f612cfaba317c5a7b Reviewed-on: http://gerrit.cloudera.org:8080/501 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2015-08-18 09:07:34 +00:00
Casey Ching	cf60967b7e	IMPALA-1675: Avoid overflow when adding large intervals to TIMESTAMPs It turns out there is a variety of cases where boost incorrectly adds intervals if the interval is at (or beyond) an edge case value. This change defines a max interval and returns NULL if the user supplies an interval beyond the max. Change-Id: I4fb6869be22ab06089b66eeffaea04b0c0880080 Reviewed-on: http://gerrit.cloudera.org:8080/492 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-08-16 12:09:24 +00:00
Feni Chawla	9428448146	IMPALA-2034: Netezza compatibility char functions for ASCII and UTF-8 strings: CHR and BTRIM Change-Id: I76bf9ba76172b9f1a192ee0936d73718808c0fbd Reviewed-on: http://gerrit.cloudera.org:8080/529 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Internal Jenkins	2015-08-06 02:24:24 +00:00
Casey Ching	074e5b4349	Remove hashbang from non-script python files Many python files had a hashbang and the executable bit set though they were not intended to be run as standalone scripts. That makes determining which python files are actually scripts very difficult. A future patch will update the hashbang in real python scripts so they use $IMPALA_HOME/bin/impala-python. Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba Reviewed-on: http://gerrit.cloudera.org:8080/599 Reviewed-by: Casey Ching <casey@cloudera.com> Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2015-08-04 05:26:07 +00:00
Tim Armstrong	e151ebaa71	IMPALA-1001: Bit and byte manipulation functions Bit and byte functions for compatibility with Teradata: bitand, bitor, bitxor, bitnot, countset, getbit, setbit, shiftleft, shiftright, rotateleft, rotateright. Interfaces and behavior follow Teradata documentation. All bit* functions are compatible with DB2. Only bitand is compatible with Oracle. Change-Id: Idba3fb7beb029de493b602e6279aa68e32688df3	2015-07-28 08:11:01 -07:00
Tim Armstrong	822cb8f5e2	IMPALA-1660: Netezza compatibility - factorial Implements suffix n! operator for factorial and factorial function. Slightly refactor operators in fe to share code between unary operators. Based partially on work by Arthur Peng <arthur.peng@intel.com>. Change-Id: I71b6c824c59fc5305f16b8c4457805126a1da93b Reviewed-on: http://gerrit.cloudera.org:8080/531 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2015-07-27 19:03:48 +00:00
Tim Armstrong	e5cc539d3f	IMPALA-1660: Netezza math function aliases Add aliases for existing functions for Netezza compatibility: dceil->ceil, dtrunc->truncate, dexp->exp, dlog1->ln, dlog10->log10, dpow->pow, fpow->pow, dsqrt->sqrt, random->rand. Change-Id: I97da27b676d4e07e55735540f494bdb873f7ed61 Reviewed-on: http://gerrit.cloudera.org:8080/559 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2015-07-23 21:56:33 +00:00
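
Note on IMPALA-5211 (02302b7cfe) above: the commit message says it is "important and subtle" that nullif(x, y) is rewritten with IS DISTINCT FROM rather than !=. The sketch below is not Impala code. It is a small Python model of SQL three-valued logic, with None standing in for NULL, and every function name in it (sql_not_equal, is_distinct_from, sql_if, nullif) is a hypothetical helper introduced only for illustration.

```python
from typing import Any, Optional


def sql_not_equal(x: Any, y: Any) -> Optional[bool]:
    """Model of SQL `x != y`: the result is NULL (None) if either side is NULL."""
    if x is None or y is None:
        return None
    return x != y


def is_distinct_from(x: Any, y: Any) -> bool:
    """Model of SQL `x IS DISTINCT FROM y`: never NULL; two NULLs are not distinct."""
    if x is None or y is None:
        return (x is None) != (y is None)
    return x != y


def sql_if(cond: Optional[bool], then_val: Any, else_val: Any) -> Any:
    """Model of if(cond, a, b): the else branch is taken when cond is FALSE or NULL."""
    return then_val if cond is True else else_val


def nullif(x: Any, y: Any) -> Any:
    """nullif(x, y) written as if(x IS DISTINCT FROM y, x, NULL), as in the commit."""
    return sql_if(is_distinct_from(x, y), x, None)


if __name__ == "__main__":
    # `x != x` is NULL when x is NULL, so it cannot be folded to FALSE ...
    print(sql_not_equal(None, None))
    # ... whereas `x IS DISTINCT FROM x` is FALSE for every x, NULL included.
    print(is_distinct_from(None, None))
    print(nullif(1, 1), nullif(1, 2), nullif(1, None))
```

In this model, `x IS DISTINCT FROM x` can always be folded to FALSE, which is the simplification the commit describes. A variant built on `x != y` would also return NULL for nullif(1, NULL), where this model returns 1.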


133 Commits