impala

mirror of https://github.com/apache/impala.git synced 2026-02-01 12:00:22 -05:00

Author	SHA1	Message	Date
Tamas Mate	423b087762	IMPALA-11520: Remove functional.unsupported_types misc test IMPALA-9482 added support to the remaining Hive types and removed the functional.unsupported_types table. There was a reference remaining in a misc test. test_misc is not marked as exhaustive but it only runs in exhaustive builds. Change-Id: I65b6ea5ac742fbcc427ad41741d347558cb7d110 Reviewed-on: http://gerrit.cloudera.org:8080/18896 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-08-25 16:24:41 +00:00
Csaba Ringhofer	7ca11dfc7f	IMPALA-9482: Support for BINARY columns This patch adds support for BINARY columns for all table formats with the exception of Kudu. In Hive the main difference between STRING and BINARY is that STRING is assumed to be UTF8 encoded, while BINARY can be any byte array. Some other differences in Hive: - BINARY can be only cast from/to STRING - Only a small subset of built-in STRING functions support BINARY. - In several file formats (e.g. text) BINARY is base64 encoded. - No NDV is calculated during COMPUTE STATISTICS. As Impala doesn't treat STRINGs as UTF8, BINARY and STRING become nearly identical, especially from the backend's perspective. For this reason, BINARY is implemented a bit differently compared to other types: while the frontend treats STRING and BINARY as two separate types, most of the backend uses PrimitiveType::TYPE_STRING for BINARY too, e.g. in SlotDesc. Only the following parts of backend need to differentiate between STRING and BINARY: - table scanners - table writers - HS2/Beeswax service These parts have access to column metadata, which allows adding special handling for BINARY. Only a very few builtins are allowed for BINARY at the moment: - length - min/max/count - coalesce and similar "selector" functions Other STRING functions can be only used by casting to STRING first. Adding support for more of these functions is very easy, as simply the BINARY type has to be "connected" to the already existing STRING function's signature. Functions where the result depends on utf8_mode need to ensure that with BINARY it always works as if utf8_mode=0 (for example length() is mapped to bytes() as length counts utf8 chars if utf8_mode=1). All kinds of UDFs (native, Hive legacy, Hive generic) support BINARY, though in case of legacy Hive UDFs it is only supported if the argument and return types are set explicitly to ensure backward compatibility. See IMPALA-11340 for details. 
The original plan was to behave as close to Hive as possible, but I realized that Hive has more relaxed casting rules than Impala, which led to STRING<->BINARY casts being necessary in more cases in Impala. This was needed to disallow passing a BINARY to functions that expect a STRING argument. An example for the difference is that in INSERT ... VALUES () string literals need to be explicitly cast to BINARY, while this is not needed in Hive. Testing: - Added functional.binary_tbl for all file formats (except Kudu) to test scanning. - Removed functional.unsupported_types and related tests, as now Impala supports all (non-complex) types that Hive does. - Added FE/EE tests mainly based on the ones added to the DATE type Change-Id: I36861a9ca6c2047b0d76862507c86f7f153bc582 Reviewed-on: http://gerrit.cloudera.org:8080/16066 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-08-19 13:55:42 +00:00
Tamas Mate	dc133d9513	IMPALA-10499: Fix failing test_misc This change modifies the result type of the misc test which was failing. Testing: - executed the misc tests with exhaustive exploration strategy Change-Id: Ibe95f4bc3521f49d19e6da53deb904a25ac30982 Reviewed-on: http://gerrit.cloudera.org:8080/17066 Reviewed-by: Aman Sinha <amsinha@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-02-15 22:25:41 +00:00
Tamas Mate	701714b10a	IMPALA-10379: Add missing HiveLexer classes to shared-deps HIVE-19064 introduced additional lexer classes that are required during runtime. This commit adds the missing HiveLexer lexer classes to the shared-deps. Without these classes queries such as 'select 1 as "``"' would fail with 'NoClassDefFoundError'. Testing: - added a misc.test to verify that the classes are available and that IMPALA-9641 is fixed by HIVE-19064 Change-Id: I6e3a00335983f26498c1130ab9f109f6e67256f5 Reviewed-on: http://gerrit.cloudera.org:8080/17019 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-02-07 05:20:48 +00:00
Qifan Chen	6dbf1ca09c	IMPALA-6628: Use unqualified table references in .test files run from test_queries.py This fix modified the following tests launched from test_queries.py by removing references to database 'functional' whenever possible. The objective of the change is to allow more testing coverage with different databases than the single 'functional' database. In the fix, neither new tables were added nor expected results were altered. empty.test inline-view-limit.test inline-view.test limit.test misc.test sort.test subquery-single-node.test subquery.test top-n.test union.test with-clause.test It was determined that other tests in testdata/workloads/functional-query/queries/QueryTest do not refer to 'functional' or the references are a must for some reason. Testing Ran query_tests on these changed tests with exhaustive exploration strategy. Change-Id: Idd50eaaaba25e3bedc2b30592a314d2b6b83f972 Reviewed-on: http://gerrit.cloudera.org:8080/16603 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-10-21 05:20:33 +00:00
Attila Jeges	b5805de3e6	IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form yyyy-MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is 0000-01-01 to 9999-12-31. This initial DATE type support covers TEXT and HBASE fileformats only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types (note that invalid casts will fail with an error just like invalid DECIMAL_V2 casts, while failed casts to other types do not lead to a warning or error): - from STRING to DATE. The string value must be formatted as yyyy-MM-dd HH:mm:ss.SSSSSSSSS. The date component is mandatory, the time component is optional. If the time component is present, it will be truncated silently. - from DATE to STRING. The resulting string value is formatted as yyyy-MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since STRING -> DATE, STRING -> TIMESTAMP and DATE -> TIMESTAMP implicit conversions are now all possible, the existing function overload resolution logic is not adequate anymore. For example, it resolves the if(false, '2011-01-01', DATE '1499-02-02') function call to the if(BOOLEAN, TIMESTAMP, TIMESTAMP) version of the overloaded function, instead of the if(BOOLEAN, DATE, DATE) version. 
This is clearly wrong, so the function overload resolution logic had to be changed to resolve function calls to the best-fit overloaded function definition if there are multiple applicable candidates. An overloaded function definition is an applicable candidate for a function call if each actual parameter in the function call either matches the corresponding formal parameter's type (without casting) or is implicitly castable to that type. When looking for the best-fit applicable candidate, a parameter match score (i.e. the number of actual parameters in the function call that match their corresponding formal parameter's type without casting) is calculated and the applicable candidate with the highest parameter match score is chosen. There's one more issue that the new resolution logic has to address: if two applicable candidates have the same parameter match score and the only difference between the two is that the first one requires a STRING -> TIMESTAMP implicit cast for some of its parameters while the second one requires a STRING -> DATE implicit cast for the same parameters then the first candidate has to be chosen not to break backward compatibility. E.g. the year('2019-02-15') function call must resolve to year(TIMESTAMP) instead of year(DATE). Note that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. When the resolution algorithm considers overloaded function definitions, first it orders them lexicographically by the types in their parameter lists. To ensure the backward compatible behavior, the PrimitiveType.DATE enum value has to come after PrimitiveType.TIMESTAMP. - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. 
- Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' (and the corresponding HBASE table 'functional_hbase.date_tbl') was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT and HBASE fileformats only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add Kudu support. - Optionally support Avro and ORC. For further details, see IMPALA-6169. Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Reviewed-on: http://gerrit.cloudera.org:8080/12481 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-04-23 13:33:57 +00:00
Tim Armstrong	85166afa8a	IMPALA-6374: fix handling of commas in .test files The .test file parser implemented an unconventional method for parsing single-quoted strings in comma-separated value format. This didn't handle trailing commas in the string correctly. This commit switches to using a conventional method for parsing comma-separated value format: * Commas enclosed by single quotes are not treated as field separators * Single quotes can be escaped within a string by doubling them. I looked into using Python's .csv module for this, but it wouldn't work without modifying the test file format more because it automatically discards the quotes during parsing, which are actually semantically important in .test files. E.g. without the quotes we can't distinguish between the literal string 'regex:...' and the regex regex:.... Testing: Ran exhaustive tests and fixed .test files that required modifications. Will rerun before merging. Added a couple of tests to exercise edge cases in the test file parser. Change-Id: I18ddcb0440490ddf8184be66d3681038a1615dd9 Reviewed-on: http://gerrit.cloudera.org:8080/11800 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Tim Armstrong <tarmstrong@cloudera.com>	2018-10-30 22:17:49 +00:00
Taras Bobrovytsky	eb8120d218	IMPALA-3812: Fix error message for unsupported types Before this patch an unclear error message was returned if DATE or DATETIME appeared in the select list after a star expansion. This was because DATE and DATETIME PrimitiveType was serialized as INVALID_TYPE. This is fixed by serializing correctly. Change-Id: I9019b4bfd219f94e554c795befd3ff5e39706ea9 Reviewed-on: http://gerrit.cloudera.org:8080/4859 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-17 05:31:34 +00:00
Alex Behm	1a4a830a6d	IMPALA-2776: Remove escapechartesttable and associated tests. The original purpose of the escapechartesttable was to test Impala's behavior on text tables that have the same character as line terminator and escape character. Recent changes in Hive have made creating such a table impossible because 1) Only newline is allowed as the line terminator 2) Newline is forbidden as the escape character See HIVE-11785 for details on the Hive changes. This commit removes escapechartesttable and all associated tests, but does not add the same enforcement rules as Hive. These enforcement rules should be added in a follow-on change. Change-Id: I2bd9755f4c2cc3d7dfd8d67c3759885951550f08 Reviewed-on: http://gerrit.cloudera.org:8080/1690 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2016-01-05 06:04:41 +00:00
Taras Bobrovytsky	75691156be	IMPALA-2239: update misc.test to match the new .test file format Change-Id: Ia5b9925628b415c306f320ef186246179e38f73b Reviewed-on: http://gerrit.cloudera.org:8080/684 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-08-25 00:12:52 +00:00
Alex Behm	325f5a4551	[CDH5] Fix exhaustive test runs: Correct malformed test section. Change-Id: Ief7128b8d21144199c629ee002c81b0930d2fc14 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5496 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-12-04 18:23:00 -08:00
Skye Wanderman-Milne	e60bf29a96	IMPALA-13: Use SSE string functions that take an explicit length This patch modifies DelimitedTextParser and StringValue to work with data containing null characters by using SSE instructions that take a length, rather than expecting null-terminated strings. It also adds some other minor changes to correctly handle data with nulls and to facilitate testing. I checked the execution time of a count() and a select() limit 1 query locally, and saw no difference for either text or sequence files. Change-Id: Ia920b35bea7048aa286f39ec83e313c2a39251d1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2110 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/2181	2014-04-11 11:16:24 -07:00
Nong Li	53d7bbb97a	[CDH5] Impala changes for updated thirdparty components. Changes include: - version changes in impala-config - version changes in various loading scripts - hbase jars are no longer in hive/lib - mini-llama script changes - updates due to sentry api changes - JDBC tests disabled - unsupported types tests disabled. Change-Id: If8cf1b7ad8e22aa4d23094b9a4b1047f7e9d93ee	2014-01-15 15:12:13 -08:00
Alex Behm	14557c7bab	IMPALA-297: Remove distinction between value_expr and expr in parser.	2014-01-08 10:50:08 -08:00
Alex Behm	5db3f2cdf5	IMPALA-227: SELECT * on partitioned table returns columns in different order than Hive.	2014-01-08 10:49:48 -08:00
Alex Behm	be03e6c21c	IMPALA-138: Error messages for unknown column types are particularly bad.	2014-01-08 10:48:53 -08:00
Alex Behm	a01573af63	IMPALA-65: Add MySQL-style string literals with escaping.	2014-01-08 10:48:51 -08:00
ishaan	09d6d931f4	Change the way data is loaded	2014-01-08 10:48:09 -08:00
Lenni Kuff	30dbf59ef2	Final changes to enable Python test infrastructure and tests With this change the Python tests will now be called as part of buildall and the corresponding Java tests have been disabled. The new tests can also be invoked calling ./tests/run-tests.sh directly. This includes a fix from Nong that caused wrong results for limit on non-io manager formats.	2014-01-08 10:46:57 -08:00
Lenni Kuff	ef48f65e76	Add test framework for running Impala query tests via Python This is the first set of changes required to start getting our functional test infrastructure moved from JUnit to Python. After investigating a number of options, I decided to go with a python test executor named py.test (http://pytest.org/). It is very flexible, open source (MIT licensed), and will enable us to do some cool things like parallel test execution. As part of this change, we now use our "test vectors" for query test execution. This will be very nice because it means if you load the "core" dataset you know you will be able to run the "core" query tests (specified by --exploration_strategy when running the tests). You will see that now each combination of table format + query exec options is treated like an individual test case. This will make it much easier to debug exactly where something failed. These new tests can be run using the script at tests/run-tests.sh	2014-01-08 10:46:50 -08:00
Nong Li	34879a4ddc	Fix IMP-297	2014-01-08 10:46:44 -08:00
Michael Ubell	477422beda	IMP-380 handle '\r' at end of row.	2014-01-08 10:46:14 -08:00
Henry Robinson	3519701529	Support backtick quoting for identifiers	2014-01-08 10:46:00 -08:00
Michael Ubell	5f951ffc4a	Handle missing columns at the end of a row	2014-01-08 10:45:11 -08:00
Nong Li	4c9c82910a	Text parser fix for columns off end.	2014-01-08 10:44:40 -08:00
Nong Li	4d0319d32b	Fix null string parsing.	2014-01-08 10:44:40 -08:00
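
The overload resolution rule described in the IMPALA-7368 message can be sketched roughly as follows. This is only an illustration under stated assumptions, not Impala's frontend code: the names `TYPE_ORDER`, `IMPLICIT_CASTS`, `match_score` and `resolve` are hypothetical, the cast table lists only the three implicit conversions named in the message, and the relative order of the types is invented except for TIMESTAMP coming before DATE.

```python
# Hypothetical type ordering; the only constraint taken from the commit
# message is that TIMESTAMP sorts before DATE.
TYPE_ORDER = ["BOOLEAN", "STRING", "TIMESTAMP", "DATE"]

# Implicit casts named in the commit message (source type, target type).
IMPLICIT_CASTS = {
    ("STRING", "DATE"),
    ("STRING", "TIMESTAMP"),
    ("DATE", "TIMESTAMP"),
}


def is_applicable(args, params):
    """Each argument matches its parameter type or is implicitly castable."""
    if len(args) != len(params):
        return False
    return all(a == p or (a, p) in IMPLICIT_CASTS for a, p in zip(args, params))


def match_score(args, params):
    """Number of arguments that match their parameter type without a cast."""
    return sum(1 for a, p in zip(args, params) if a == p)


def resolve(args, candidates):
    """Pick the applicable candidate with the highest match score.

    Candidates are first ordered lexicographically by parameter types, so
    on a tie the TIMESTAMP overload is seen (and kept) before the DATE one.
    """
    ordered = sorted(candidates, key=lambda ps: [TYPE_ORDER.index(p) for p in ps])
    best, best_score = None, -1
    for params in ordered:
        if not is_applicable(args, params):
            continue
        score = match_score(args, params)
        if score > best_score:  # strict '>' keeps the earlier candidate on ties
            best, best_score = params, score
    return best
```

With these assumptions, a call shaped like if(BOOLEAN, STRING, DATE) picks the if(BOOLEAN, DATE, DATE) overload because it has two exact matches instead of one, while a call shaped like year(STRING) ties at zero and falls back to year(TIMESTAMP) through the ordering.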

26 Commits
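
The quoting rules listed in the IMPALA-6374 message (commas inside single quotes are not field separators, a single quote is escaped by doubling it, and the quotes themselves are kept because they are semantically important) can be illustrated with a small splitter. This is a minimal sketch, not the parser used by Impala's test framework; the function name `split_row` is hypothetical.

```python
def split_row(line):
    """Split a comma-separated row, keeping single quotes in the fields."""
    fields = []
    current = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "'":
            if in_quotes and i + 1 < len(line) and line[i + 1] == "'":
                # A doubled quote inside a quoted string is an escaped quote;
                # keep it as written and skip both characters.
                current.append("''")
                i += 2
                continue
            # Opening or closing quote: toggle state but keep the character.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "," and not in_quotes:
            # An unquoted comma ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields
```

Because the quotes survive splitting, a caller can still tell the literal string 'regex:...' apart from an unquoted regex:... field, which is the distinction the commit message says Python's csv module would lose.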