impala

mirror of https://github.com/apache/impala.git synced 2026-01-07 00:02:28 -05:00

Author	SHA1	Message	Date
Joe McDonnell	5bb988b1c5	IMPALA-4996: Single-threaded KuduScanNode This introduces KuduScanNodeMt, the single-threaded version of KuduScanNode that materializes the tuples in GetNext(). KuduScanNodeMt is enabled by the same condition as HdfsScanNodeMt: mt_dop is greater than or equal to 1. To share code between the two implementations, KuduScanNode and KuduScanNodeMt are now subclasses of KuduScanNodeBase, which implements the shared code. The KuduScanner is minimally impacted, as it already had the required GetNext interface. Since the KuduClient is a heavy-weight object, it is now shared at the QueryState level. We try to share the KuduClient as much as possible, but there are times when the KuduClient cannot be shared. Each Kudu table has master addresses stored in the Hive Metastore. We only share KuduClients for tables that have an identical value for the master addresses. In the ideal case, every Kudu table will have the same value, but there is no explicit guarantee of this. The testing for this is a modified version of kudu-scan-node.test run with various mt_dop values. Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a Reviewed-on: http://gerrit.cloudera.org:8080/6312 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Impala Public Jenkins	2017-03-17 19:33:31 +00:00
Taras Bobrovytsky	529a5f99b9	IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 16 elements. Once the buffer becomes full, we reallocate a new buffer with double capacity and copy the original buffer into the new one. We continue doubling the buffer size until the buffer has room for 20,000 elements as before. Testing: Added some EE APPX_MEDIAN() tests on larger datasets that exercise the resize code path. Perf Benchrmark (about 35,000 elements per bucket): SELECT MAX(a) from ( SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t BEFORE: 11s067ms Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail ------------------------------------------------------------------------------------------------------------------------- 06:AGGREGATE 1 124.726us 124.726us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 29.544us 29.544us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE 3 86.406us 120.372us 3 1 44.00 KB 10.00 MB 04:AGGREGATE 3 1s840ms 2s824ms 2.00K -1 1.02 GB 128.00 MB FINALIZE 03:EXCHANGE 3 1s163ms 1s989ms 6.00K -1 0 0 HASH(c1) 01:AGGREGATE 3 3s356ms 3s416ms 6.00K -1 1.95 GB 128.00 MB STREAMING 00:SCAN HDFS 3 64.962ms 65.490ms 65.54M -1 25.97 MB 64.00 MB tpcds_10_parquet.benchmark AFTER: 9s465ms Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail ------------------------------------------------------------------------------------------------------------------------ 06:AGGREGATE 1 73.961us 73.961us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 18.101us 18.101us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE 3 75.795us 83.969us 3 1 44.00 KB 10.00 MB 04:AGGREGATE 3 1s608ms 2s683ms 2.00K -1 1.02 GB 128.00 MB FINALIZE 03:EXCHANGE 3 826.683ms 1s322ms 6.00K -1 0 0 HASH(c1) 01:AGGREGATE 3 2s457ms 2s672ms 6.00K -1 3.14 GB 128.00 MB STREAMING 00:SCAN HDFS 3 81.514ms 89.056ms 65.54M -1 25.94 MB 64.00 MB tpcds_10_parquet.benchmark Memory Benchmark (about 12 elements per bucket): SELECT MAX(a) FROM ( SELECT ss_customer_sk, APPX_MEDIAN(ss_sold_date_sk) as a FROM tpcds_parquet.store_sales GROUP BY ss_customer_sk) t BEFORE: 7s477ms Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail --------------------------------------------------------------------------------------------------------------------- 06:AGGREGATE 1 114.686us 114.686us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 18.214us 18.214us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE 3 147.055us 165.464us 3 1 28.00 KB 10.00 MB 04:AGGREGATE 3 2s043ms 2s147ms 14.82K -1 4.94 GB 128.00 MB FINALIZE 03:EXCHANGE 3 840.528ms 943.254ms 15.61K -1 0 0 HASH(ss_customer_sk) 01:AGGREGATE 3 1s769ms 1s869ms 15.61K -1 5.32 GB 128.00 MB STREAMING 00:SCAN HDFS 3 17.941ms 37.109ms 183.59K -1 1.94 MB 16.00 MB tpcds_parquet.store_sales AFTER: 434ms Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail --------------------------------------------------------------------------------------------------------------------- 06:AGGREGATE 1 125.915us 125.915us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 72.179us 72.179us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE 3 79.054us 83.385us 3 1 28.00 KB 10.00 MB 04:AGGREGATE 3 6.559ms 7.669ms 14.82K -1 17.32 MB 128.00 MB FINALIZE 03:EXCHANGE 3 67.370us 85.068us 15.60K -1 0 0 HASH(ss_customer_sk) 01:AGGREGATE 3 19.245ms 24.472ms 15.60K -1 9.48 MB 128.00 MB STREAMING 00:SCAN HDFS 3 53.173ms 55.844ms 183.59K -1 1.18 MB 16.00 MB tpcds_parquet.store_sales Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Reviewed-on: http://gerrit.cloudera.org:8080/6025 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Impala Public Jenkins	2017-03-16 05:59:40 +00:00
Joe McDonnell	6441ca65bd	IMPALA-5039: Fix variability in parquet dictionary filtering test The tests for dictionary filtering look at how many row groups are processed and how many are filtered by matching text in the profile. However, the number of row groups processed and filtered by any individual fragment depends on how the work is split and how many impalads are running. This causes variability in the test output. To fix this, the test needs a way to aggregate the results across fragments. This fix introduces the following syntax for specifying these aggregates: aggregate(function_name, field_name): expected_value This searches the runtime profile for lines that contain 'field_name: number'. It skips the averaged fragment, as this is derived from all the other fragments. Currently, only SUM is implemented, and the expected_value is required to be an integer. It should be easy to implement other interesting functions like COUNT and MIN/MAX. It would also be possible to extend it to floats. Switching the dictionary filtering tests over to this new syntax eliminates the variability in the tests. Change-Id: I6b7b84d973b3ac678a24e82900f2637d569158bb Reviewed-on: http://gerrit.cloudera.org:8080/6301 Tested-by: Impala Public Jenkins Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2017-03-13 17:37:15 +00:00
Alex Behm	7d8acee814	IMPALA-4725: Query option to control Parquet array resolution. Summary of changes: Introduces a new query option PARQUET_ARRAY_RESOLUTION to control the path-resolution behavior for Parquet files with nested arrays. The values are: - THREE_LEVEL Assumes arrays are encoded with the 3-level representation. Also resolves arrays encoded with a single level. Does not attempt a 2-level resolution. - TWO_LEVEL Assumes arrays are encoded with the 2-level representation. Also resolves arrays encoded with a single level. Does not attempt a 3-level resolution. - TWO_LEVEL_THEN_THREE_LEVEL First tries to resolve assuming the 2-level representation, and if unsuccessful, tries the 3-level representation. Also resolves arrays encoded with a single level. This is the current Impala behavior and is used as the default value for compatibility. Note that 'failure' to resolve a schema path with a given array-resolution policy does not necessarily mean a warning or error is returned by the query. A mismatch might be treated like a missing field which is necessary to support schema evolution. There is no way to reliably distinguish the 'bad resolution' and 'legitimately missing field' cases. The new query option is independent of and can be combined with the existing PARQUET_FALLBACK_SCHEMA_RESOLUTION. Background: Arrays can be represented in several ways in Parquet: - Three Level Encoding (standard) - Two Level Encoding (legacy) - One Level Encoding (legacy) More details are in the "Lists" section of the spec: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md Unfortunately, there is no reliable metadata within Parquet files to indicate which encoding was used. There is even the possibility of having mixed encodings within the same file if there are multiple arrays. As a result, Impala currently tries to auto-detect the file encoding when resolving a schema path in a Parquet file using the TWO_LEVEL_THEN_THREE_LEVEL policy. However, regardless of whether a Parquet data file uses the 2-level or 3-level encoding, the index-based resolution may return incorrect results if the representation in the Parquet file does not exactly match the attempted array-resoution policy. Intuitively, when attempting a 2-level resolution on a 3-level file, the matched schema node may not be deep enough in the schema tree, but could still be a scalar node with expected type. Similarly, when attempting a 3-level resolution on a 2-level file a level may be incorrectly skipped. The name-based policy generally does not have this problem because it avoids traversing incorrect schema paths. However, the index-based resoution allows a different set of schema-evolution operations, so just using name-based resolution is not an acceptable workaround in all cases. Testing: - Added new Parquet data files that show how incorrect results can be returned with a mismatched file encoding and resolution policy. Added both 2-level and 3-level versions of the data. - Added a new test in test_nested_types.py that shows the behavior with the new PARQUET_ARRAY_RESOLUTION query option. - Locally ran test_scanners.py and test_nested_types.py on core. Change-Id: I4f32e19ec542d4d485154c9d65d0f5e3f9f0a907 Reviewed-on: http://gerrit.cloudera.org:8080/6250 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-03-09 05:07:44 +00:00
Joe McDonnell	9b923a1a27	IMPALA-4624: Implement Parquet dictionary filtering Here is a basic summary of the changes: Frontend looks for conjuncts that operate on a single slot and pass a map from slot id to the conjunct index through thrift to the backend. The conjunct indices are the indices into the normal PlanNode conjuncts list. The conjuncts need to satisfy certain conditions: 1. They are bound on a single slot 2. They are deterministic (no random functions) 3. They evaluate to FALSE on a NULL input. This is because the dictionary does not include NULLs, so any condition that evaluates to TRUE on NULL cannot be evaluated by looking only at the dictionary. The backend converts the indices into ExprContexts. These are cloned in the scanner threads. The dictionary read codepath has been removed from ReadDataPage into its own function, InitDictionary. This has also been turned into its own step in row group initialization. ReadDataPage will not see any dictionary pages unless the parquet file is invalid. For dictionary filtering, we initialize dictionaries only as needed to evaluate the conjuncts. The Parquet scanner evaluates the dictionary filter conjuncts on the dictionary to see if any dictionary entry passes. If no entry passes, the row group is eliminated. If the row group passes the dictionary filtering, then we initialize all remaining dictionaries. Dictionary filtering is controlled by a new query option, parquet_dictionary_filtering, which is on by default. Since column chunks can have a mixture of encodings, dictionary filtering uses three tests to determine whether this is purely dictionary encoded: 1. If the encoding_stats is in the parquet file, then use it to determine if there are only dictionary encoded pages (i.e. there are no data pages with an encoding other than PLAIN_DICTIONARY). -OR- 2. If the encoding stats are not present, then look at the encodings. The column is purely dictionary encoded if: a) PLAIN_DICTIONARY is present AND b) Only PLAIN_DICTIONARY, RLE, or BIT_PACKED encodings are listed -OR- 3. If this file was written by an older version of Impala, then we know that dictionary failover happens when the dictionary reaches 40,000 values. Dictionary filtering can proceed as long as the dictionary is smaller than that. parquet-mr writes the encoding list correctly in the current version in our environment (1.5.0). This means that check #2 works on some existing files (potentially most existing parquet-mr files). parquet-mr writes the encoding stats starting in 1.9.0. This is the version where check #1 will start working. Impala's parquet writer now implements both, so either check above will work. Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7 Reviewed-on: http://gerrit.cloudera.org:8080/5904 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Impala Public Jenkins	2017-03-06 23:20:34 +00:00
Nathan Salmon	34353218ce	IMPALA-4675: Case-insensitive matching of Parquet fields. The query option PARQUET_FALLBACK_SCHEMA_RESOLUTION allows matching of Parquet fields by name instead of by index (the default). Parquet column names are case sensitive, but Impala treats db/table/column/field names as case-insensitive. Today, there is no way today to select Parquet columns with mixed casing via SQL using the name-based field resolution policy. This patch changes the matching of Parquet fields to be case-insensitive. Testing: - Modified the data files backing complextypestbl to contain fields with mixed casing. - Several existing tests run against this table, including the test for name-based resolution. - I confirmed that without this fix, the existing name-based resolution tests fail on the modified data files. - I locally ran test_scanners.py and test_nested_types.py on exhaustive with this fix. Change-Id: I87395f84ba29b4c3d8e41be1ea4e89e500b8a9f4 Reviewed-on: http://gerrit.cloudera.org:8080/5891 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-03-03 10:20:07 +00:00
Lars Volker	996fb5eab3	IMPALA-2328: Address additional comments - test_parquet_stats.py was missing and the tests weren't run during GVO. - The tests in parquet_stats.test assume that the queries were executed in a single fragment, so they now run with 'num_nodes = 1'. - Parquet columns are now resolved correctly. - Parquet files with missing columns are now handled correctly. - Predicates with implicit casts can now be evaluated against parquet::Statistics. - This change also cleans up some old friend declarations I came across. Change-Id: I54c205fad7afc4a0b0a7d0f654859de76db29a02 Reviewed-on: http://gerrit.cloudera.org:8080/6147 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Impala Public Jenkins	2017-03-03 02:34:10 +00:00
Dan Hecht	bf2e897209	IMPALA-4810: add DECIMAL test case to strict_mode tests The string parsing code already errors if the decimal column either overflows or underflows (i.e. loses scale). Let's just add a test case. Change-Id: Idd66c0fb5a4d201919d39f73dea08b87339d6469 Reviewed-on: http://gerrit.cloudera.org:8080/6150 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-03-03 01:43:42 +00:00
Matthew Jacobs	3a2a380cf7	IMPALA-4616: Add missing Kudu column options Adds support for missing Kudu column options in ALTER TABLE (it was there in CREATE TABLE already): * encoding * compression * block_size Also adds support for adding nullable columns with default values. All of the above was not originally implemented due to limitations in the Kudu client, but have since been fixed: KUDU-1746, KUDU-1747 Testing: Updates and adds relevant test cases. Change-Id: I96a0fce7f6bc0c086b259d3119daa72c94b0af7b Reviewed-on: http://gerrit.cloudera.org:8080/6220 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Impala Public Jenkins	2017-03-03 01:29:14 +00:00
Zach Amsden	c87ab35af1	IMPALA-4813: Round on divide and multiply Address rounding on divide and multiply when results are truncated. Testing: Manually ran some divides that should overflow, then added the results to the test. Made the decimal-test use rounding behavior by default, and now the error margin of the test has decreased. Initial perf results: Multiply is totall uninteresting so far, all implementations return the same values in the same time: +-------------------------+-----------------------------------+ \| sum(l_quantity * l_tax) \| sum(l_extendedprice * l_discount) \| +-------------------------+-----------------------------------+ \| 61202493.3700 \| 114698450836.4234 \| +-------------------------+-----------------------------------+ Fetched 1 row(s) in 1.13s Divide shows no regression from prior with DECIMAL_V2 off: +-----------------------------+-----------------------------------+ \| sum(l_quantity / l_tax) \| sum(l_extendedprice / l_discount) \| +-----------------------------+-----------------------------------+ \| 46178777464.523809516381723 \| 61076151920731.010714279183910 \| +-----------------------------+-----------------------------------+ before: Fetched 1 row(s) in 13.08s after: Fetched 1 row(s) in 13.06s And with DECIMAL_V2 on: +-----------------------------+-----------------------------------+ \| sum(l_quantity / l_tax) \| sum(l_extendedprice / l_discount) \| +-----------------------------+-----------------------------------+ \| 46178777464.523809523847285 \| 61076151920731.010714285714202 \| +-----------------------------+-----------------------------------+ Fetched 1 row(s) in 16.06s So the performance regression is not as bad as expected. Still, divide performance could use some work. Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312 Reviewed-on: http://gerrit.cloudera.org:8080/6132 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-03-02 20:12:05 +00:00
Lars Volker	1417d764e4	IMPALA-4982: Add parquet stats test IMPALA-2328 added support for skipping row groups based on parquet::Statistics. This change adds a test for root-level scalar columns of parquet files with nested types. Change-Id: If81c8a1ecea937794885d4e5e7bf765bd238f5fb Reviewed-on: http://gerrit.cloudera.org:8080/6130 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Impala Public Jenkins	2017-03-01 02:00:19 +00:00
Lars Volker	749a55c4ad	IMPALA-2328: Read support for min/max Parquet statistics This change adds support for skipping row groups based on Parquet row group statistics. With this change we only support reading statistics from Parquet files for numerical types (bool, integer, floating point) and for simple predicates of the forms <slot> <op> <constant> or <constant> <op> <slot>, where <op> is LT, LE, GE, GT, and EQ. Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Reviewed-on: http://gerrit.cloudera.org:8080/6032 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-23 11:16:17 +00:00
Taras Bobrovytsky	d07580c171	IMPALA-4962: Fix SHOW COLUMN STATS for HS2 Impala incorrectly returned NULLs in the "Max Size" column of the SHOW COLUMN STATS result when executed through the HS2 interface. The issue was that the column was specified to be type INT in the result schema, but the actual type of the contents that we inserted into it was "long". The reason why this is not an issue in Impala shell is because we stringify the contents without inspecting the metadata for beeswax results. The issue was fixed by changing the type from INT to BIGINT. Change-Id: I419657744635dfdc2e1562fe60a597617fff446e Reviewed-on: http://gerrit.cloudera.org:8080/6109 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-22 23:10:34 +00:00
Michael Ho	637cc3e447	IMPALA-4821: Update AVG() for DECIMAL_V2 This change implements the DECIMAL_V2's behavior for AVG(). The differences with DECIMAL_V1 are: 1. The output type has a minimum scale of 6. This is similar to MS SQL's behavior which takes the max of 6 and the input type's scale. We deviate from MS SQL in the output's precision which is always set to 38. We use the smallest precision which can store the output. A key insight is that the output of AVG() is no wider than the inputs. Precision only needs to be adjusted when the scale is augmented. Using a smaller precision avoids potential loss of precision in subsequent decimal operations (e.g. division) if AVG() is a subexpression. Please note that the output type is different from SUM()/COUNT() as the latter can have a much larger scale. 2. Due to a minimum of 6 decimal places for the output, AVG() for decimal values whose whole number part exceeds 32 decimal places (e.g. DECIMAL(38,4), DECIMAL(33,0)) will always overflow as the scale is augmented to 6. Certain decimal types which work with AVG() in DECIMAL_V1 no longer work in DECIMAL_V2. Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f Reviewed-on: http://gerrit.cloudera.org:8080/6038 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-22 06:31:14 +00:00
Alex Behm	d845413ab8	IMPALA-4854: Fix incremental stats with complex types. The bug: Compute incremental stats used to always do a full stats recomputation for tables with complex types. The logic for detecting schema changes (e.g. an added column) did not take into consideration that columns with complex types are ignored in the stats computation, and should therefore not be recognized as a new column that does not yet have stats. Testing: - Added a new regression test - Locally ran test_compute_stats.py and the FE tests Change-Id: I6e0335048d688ee25ff55c6628d0f6f8ecc1dd8a Reviewed-on: http://gerrit.cloudera.org:8080/6033 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-17 06:02:48 +00:00
Zach Amsden	0715a303ea	IMPALA-4729: Implement REPLACE() This turned out to be slightly non-trivial as REPLACE is already a keyword, and thus the parser needs to be tweaked to allow this, since function names act as bare identifiers. It was difficult to get this to match performance of regexp_replace. For expanding patterns, the fact that regexp_replace copies the expansion inline means that it may in fact win on large strings with sparse matches that are > dcache size apart. Let's leave optimizing that for later. Testing: Added a full test for maximum size strings and got most of the boundary conditions I could identify. Manually ran queries on TPC-H dataset in impala to verify both performance and correctness. Added large string and exprs.test test clauses and ran the tests to verify they work as expected. Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Reviewed-on: http://gerrit.cloudera.org:8080/5776 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-15 01:33:23 +00:00
Dan Hecht	a53eeb2068	IMPALA-4370: Divide and modulo result types for DECIMAL version V2 Implement the new DECIMAL return type rules for divide and modulo expressions, active when query option DECIMAL_V2=1. See the comment in the code for more details. A couple of examples that show why new return type rules for divide are desirable. For modulo, the return types are actually equivalent, though the rules are expressed differently to have consistency with how precision fixups are handled for each version. DECIMAL Version 1: +-------------------------------------------------------+ \| cast(1 as decimal(20,0)) / cast(3 as decimal(20,0)) \| +-----------------------------------------------------+ \| 0 \| +-------------------------------------------------------+ DECIMAL Version 2: +-------------------------------------------------------+ \| cast(1 as decimal(20,0)) / cast(3 as decimal(20,0)) \| +-----------------------------------------------------+ \| 0.333333333333333333 \| +-------------------------------------------------------+ DECIMAL Version 1: +-------------------------------------------------------+ \| cast(1 as decimal(6,0)) / cast(0.1 as decimal(38,38)) \| +-------------------------------------------------------+ \| NULL \| +-------------------------------------------------------+ WARNINGS: UDF WARNING: Expression overflowed, returning NULL DECIMAL Version 2: +-------------------------------------------------------+ \| cast(1 as decimal(6,0)) / cast(0.1 as decimal(38,38)) \| +-------------------------------------------------------+ \| 10.000000 \| +-------------------------------------------------------+ Change-Id: I83e7f7787edfa4b4bddc25945090542a0e90881b Reviewed-on: http://gerrit.cloudera.org:8080/5952 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-14 18:40:54 +00:00
Sailesh Mukil	1f80396b20	IMPALA-4897: AnalysisException: specified cache pool does not exist A few tests were added with IMPALA-1670 that made use of HDFS caching. This patch moves these tests to a new file and only executes them when the default filesystem is HDFS. There was also a bug where the tests used absolute locations instead of locations relative to the table they were in which could easily collide with locations of other tables if they raced. That has been fixed too. Also added a testcase for testing alter table ADD multiple PARTITIONS for non-HDFS filesystems. Change-Id: Iefe61556bc28ae320f3f41fdc930d37b258d970a Reviewed-on: http://gerrit.cloudera.org:8080/5972 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-14 05:56:33 +00:00
Michael Ho	f982c3f76e	IMPALA-2020, IMPALA-4809: Codegen support for DECIMAL_V2 Currently, codegen supports converting type attributes (e.g. decimal type's precision and scale, type's size) obtained via calls to FunctionContextImpl::GetFnAttr() (previously Expr::GetConstantInt()) in cross-compiled code to runtime constants. This change extends this support for the query option DECIMAL_V2. To test this change, this change also handles a subset of IMPALA-2020: casting between decimal values is updated to support rounding (instead of truncation) when decimal_v2 is true. This change also cleans up the existing code by moving the codegen logic Expr::InlineConstant() to the codegen module and the type related logic in Expr::GetConstantInt() to FunctionContextImpl. Change-Id: I2434d240f65b81389b8a8ba027f980a0e1d1f981 Reviewed-on: http://gerrit.cloudera.org:8080/5950 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-11 07:07:45 +00:00
Tim Armstrong	d2d3f4c1a6	IMPALA-1430,IMPALA-4878,IMPALA-4879: codegen native UDAs This uses the existing infrastructure for codegening builtin UDAs and for codegening calls to UDFs. GetUdf() is refactored to support both UDFs and UDAs. IR UDAs are still not allowed by the frontend. It's unclear if we want to enable them going forward because of the difficulties in testing and supporting IR UDFs/UDAs. This also fixes some bugs with the Get*Type() methods of FunctionContext. GetArgType() and related methods now always return the logical input types of the aggregate function. Getting the tests to pass required fixing IMPALA-4878 because they called GetIntermediateType(). Testing: test_udfs.py tests UDAs with codegen enabled and disabled. Added some asserts to test UDAs to check that the correct types are passed in via the FunctionContext. Change-Id: Id1708eaa96eb76fb9bec5eeabf209f81c88eec2f Reviewed-on: http://gerrit.cloudera.org:8080/5161 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-10 02:18:32 +00:00
Thomas Tauber-Marshall	3edc9099bc	IMPALA-4849: IllegalStateException from rewritten CASE expr In SelectList.reset(), we call reset() on each select list item's expr. reset() is supposed to remove implicit casts, by returning the reset expr with implicit cast exprs removed from the tree. Previously SelectList.reset() ignored the return value of the calls to Expr.reset(), meaning that if the top-most expr of the select list item is an implicit cast, it won't actually get removed, which causes problems with analysis since implicit casts are always treated as pre-analyzed. The solution is to set the select list item's exprs to the return value of reset(). Testing: - Added a regression test to exprs.test Change-Id: I16ff88716b185e1d72d2bc603a42bd06c60ec18e Reviewed-on: http://gerrit.cloudera.org:8080/5917 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-09 20:36:22 +00:00
aphadke	c671ee1ece	IMPALA-4738: STDDEV_SAMP should return NULL for single record input In calculating the STDDEV_SAMP/VARIANCE of N rows a divion by N-1 rows is involved. Hence STDDEV_SAMP/VARIANCE for a single row involves a division by 0. This change returns a NULL instead of a 0 when calculating STDDEV_SAMP/VARIANCE for a single row. STDDEV_POP/VARIANCE_POP for single row will still return a 0 since this does not involve a division by 0. This matches the postgres behavior. Change-Id: Ide8af752cd8a2e554a2cd5a1ec948967a80de1fe Reviewed-on: http://gerrit.cloudera.org:8080/5800 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-07 04:19:06 +00:00
Thomas Tauber-Marshall	6a9df54096	IMPALA-3524: Don't process spilled partitions with 0 probe rows In the partitioned hash join node, if a spilled partition has no probe rows, building the hash table is unnecessary. For some build types (right outer, right anti, and full outer), we still need to process the build side to output unmatched rows (in this case, all rows since there were no probe rows to match). Testing: Added some cases to spilling.test. Manually tested these cases for performance, and they all show around a 6% improvement. Change-Id: I175b32dd9031e51218b38c37693ac3e31dfab47b Reviewed-on: http://gerrit.cloudera.org:8080/5389 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2017-02-06 20:22:33 +00:00
Attila Jeges	c452595bff	IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION Just like Hive, Implala should support multiple partitions in ALTER TABLE ADD PARTITION statements. The syntax is as follows: ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec1 [location_spec1] [cache_spec1] PARTITION partition_spec2 [location_spec2] [cache_spec2] ... Grammar was modified to handle the new syntax. Introduced PartitionDef class to capture the repeatable part of the statement. TPartitionDef is the name of the corresponding thrift class. AlterTableAddPartitionStmt and CatalogOpExecutor classes were also modified to work with a list of partitions. Duplicate partition specs are rejected in AlterTableAddPartitionStmt.analyze(). Added FE, E2E and integration tests. Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750 Reviewed-on: http://gerrit.cloudera.org:8080/4144 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-04 01:47:23 +00:00
Michael Ho	1d933919ee	IMPALA-4705, IMPALA-4779, IMPALA-4780: Fix some Expr bugs with codegen This change fixes expr-test.cc to work with codegen as it's originally intended. Fixing it uncovers a couple of bugs fixed in this patch: IMPALA-4705: When an IR function is materialized, its function body is parsed to find all its callee functions to be materialized too. However, the old code doesn't detect callee fnctions referenced indirectly (e.g. a callee function passed as argument to another function). This change fixes the problem above inspecting the use lists of llvm::Function objects. When parsing the bitcode module into memory, LLVM already establishes a use list for each llvm::Value object which llvm::Function is a subclass of. A use list contains all the locations in the module in which the Value is referenced. For a llvm::Function object, that would be its call sites and constant expressions referencing the functions. By using the use lists of llvm::Function in the module, a global map is established at Impala initialization time to map functions to their corresponding callee functions. This map is then used when materializing a function to ensure all its callee functions are also materialized recursively. IMPALA-4779: conditional function isfalse(), istrue(), isnotfalse(), isnotrue() aren't cross-compiled so they will lead to unexpected query failure when codegen is enabled. This change will cross-compile these functions. IMPALA-4780: next_day() always returns NULL when codegen is enabled. The bound checks for next_day() use some class static variables initialized in the global constructors (@llvm.global_ctors). However, we never execute the global constructors before calling the JIT compiled functions. This causes these variables to remain as zero, causing all executions of next_day() to fail the bound checks. The reason why these class static variables aren't compiled as global constants in LLVM IR is that TimestampFunctions::MIN_YEAR is not a compile time constant. This change fixes the problem above by setting TimestampFunctions::MIN_YEAR to a known constant value. A DCHECK is added to verify that it matches the value defined in the boost library. Change-Id: I40fdb035a565ae2f9c9fbf4db48a548653ef7608 Reviewed-on: http://gerrit.cloudera.org:8080/5732 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-03 23:35:25 +00:00
Thomas Tauber-Marshall	4b486b0f90	IMPALA-1861: Simplify conditionals with constant conditions When there are conditionals with constant values of TRUE or FALSE we can simplify them during analysis using the ExprRewriter. This patch introduces the SimplifyConditionalsRule with covers IF, OR, AND, CASE, and DECODE. It also introduces NormalizeExprsRule which normalizes AND and OR such that if either child is a BoolLiteral, then the left child is a BoolLiteral. Testing: - Added unit tests to ExprRewriteRulesTest. - Added functional tests to expr.test - Ran FE planner tests and BE expr-test. Change-Id: Id70aaf9fd99f64bd98175b7e2dbba28f350e7d3b Reviewed-on: http://gerrit.cloudera.org:8080/5585 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2017-01-24 03:22:08 +00:00
Alex Behm	7438730052	IMPALA-4767: Workaround for HIVE-15653 to preserve table stats. HIVE-15653 is a Hive Metastore bug that results in ALTER TABLE commands wiping the table stats of unpartitioned tables. Until the Hive bug is fixed, this patch adds a workaround to Impala that forces the Metastore to preserve the table stats. Testing: Private core/hdfs run passed. Change-Id: Ic191c765f73624bc716badadd7215c8dca9d6b1f Reviewed-on: http://gerrit.cloudera.org:8080/5731 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-20 01:18:10 +00:00
Tim Armstrong	69859bddfb	IMPALA-4549: consistently treat 9999 as upper bound for timestamp year Previously Impala was inconsistent about whether the year 10000 was supported, as a result of inconsistency in boost, which reported the maximum year as 9999 but sometimes allowed 10000. This meant that Impala sometimes accepted the year 10000 and sometimes not. Use the patched boost version and update tests accordingly. Testing: Ran an exhaustive build. Change-Id: Iaf23b40833017789d879e5da7bb10384129e2d10 Reviewed-on: http://gerrit.cloudera.org:8080/5665 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-19 00:04:27 +00:00
Thomas Tauber-Marshall	89a3d3c1eb	IMPALA-4716: Expr rewrite causes IllegalStateException The DECODE constructor in CaseExpr uses the same decodeExpr object when building the BinaryPredicates that compare the decodeExpr to each 'when' of the DECODE. This causes problems when different BinaryPredicates try to cast the same decodeExpr object to different types during analysis, in this case leading to a Precondition check failure. The solution is to clone the decodeExpr in the DECODE constructor in CaseExpr for each generated BinaryPredicate. Testing: - Added a regression test to exprs.test Change-Id: I4de9ed7118c8d18ec3f02ff74c9cca211c716e51 Reviewed-on: http://gerrit.cloudera.org:8080/5631 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-13 04:44:07 +00:00
Joe McDonnell	5755261954	IMPALA-4036: invalid SQL generated for partitioned table with comment For a table that has both a table comment and a partition specified, "show create table" incorrectly outputs the comment before the partition. This is not the correct order, and it results in an invalid SQL. This transaction fixes the ordering (partition comes before comment) and adds tests for this case. Change-Id: I29a33cfd142b473997fdc3acfe3f0966bc7ed784 Reviewed-on: http://gerrit.cloudera.org:8080/5648 Tested-by: Impala Public Jenkins Reviewed-by: Henry Robinson <henry@cloudera.com>	2017-01-12 20:41:35 +00:00
Marcel Kornacker	70ae2e38eb	IMPALA-4739: ExprRewriter fails on HAVING clauses The bug was that expr rewrite rules such as ExtractCommonConjunctRule analyzed their own output, which doesn't work for syntactic elements that allow column aliases, such as the HAVING clause. The fix was to remove the analysis step (the re-analysis happens anyway in AnalysisCtx). Change-Id: Ife74c61f549f620c42f74928f6474e8a5a7b7f00 Reviewed-on: http://gerrit.cloudera.org:8080/5662 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-12 02:31:44 +00:00
Tim Armstrong	95ed4434f2	IMPALA-3202,IMPALA-2079: rework scratch file I/O Refactor BufferedBlockMgr/TmpFileMgr to push more I/O logic into TmpFileMgr, in anticipation of it being shared with BufferPool. TmpFileMgr now handles: * Scratch space allocation and recycling * Read and write I/O The interface is also greatly changed so that it is built around Write() and Read() calls, abstracting away the details of temporary file allocation from clients. This means the TmpFileMgr::File class can be hidden from clients. Write error recovery: Also implement write error recovery in TmpFileMgr. If an error occurs while writing to scratch and we have multiple scratch directories, we will try one of the other directories before cancelling the query. File-level blacklisting is used to prevent excessive repeated attempts to resize a scratch file during a single query. Device-level blacklisting is not implemented because it is problematic to permanently take a scratch directory out of use. To reduce the number of error paths, all I/O errors are now handled asynchronously. Previously errors creating or extending the file were returned synchronously from WriteUnpinnedBlock(). This required modifying DiskIoMgr to create the file if not present when opened. Also set the default max_errors value in the thrift definition file, so that it is in effect for backend tests. Future Work: * Support for recycling variable-length scratch file ranges. I omitted this to avoid making the patch even large. Testing: Updated BufferedBlockMgr unit test to reflect changes in behaviour: * Scratch space is no longer permanently associated with a block, and is remapped every time a new block is written to disk . * Files are now blacklisted - updated existing tests and enable the disable blacklisting test. Added some basic testing of recycling of scratch file ranges in the TmpFileMgr unit test. I also manually tested the code in two ways. First by removing permissions for /tmp/impala-scratch and ensuring that a spilling query fails cleanly. Second, by creating a tiny ramdisk (16M) and running with two scratch directories: one on /tmp and one on the tiny ramdisk. When spilling, an out of space error is encountered for the tiny ramdisk and impala spills the remaining data (72M) to /tmp. Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17 Reviewed-on: http://gerrit.cloudera.org:8080/5141 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-05 02:26:24 +00:00
Amos Bird	b3636c97d4	IMPALA-4033: Treat string-partition key values as case sensitive. This commit makes ADD PARTITION operations treat string partition-key values as case sensitive in consistent with other related partition DDL operations. Change-Id: I6fbe67d99df8a50a16a18456fde85d03d622c7a1 Reviewed-on: http://gerrit.cloudera.org:8080/5535 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-12-22 10:45:39 +00:00
Lars Volker	ce9b332ee9	IMPALA-4163: Add sortby() query hint This change introduces the sortby() query plan hint for insert statements. When specified, sortby(a, b) will add an additional sort step to the plan to order data by columns a, b before inserting it into the target table. Change-Id: I37a3ffab99aaa5d5a4fd1ac674b3e8b394a3c4c0 Reviewed-on: http://gerrit.cloudera.org:8080/5051 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2016-12-17 05:37:43 +00:00
Lars Volker	1e683d4ee6	IMPALA-4403: Implement SHOW RANGE PARTITIONS for Kudu tables Change-Id: Idf5b2fdd02938a42fa59ec98884e4ac915dd1f65 Reviewed-on: http://gerrit.cloudera.org:8080/5390 Reviewed-by: Lars Volker <lv@cloudera.com> Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-12-10 00:05:50 +00:00
Tim Armstrong	88448d1d4a	IMPALA-4586: don't constant fold in backend This patch ensures that setting the query option enable_expr_rewrites=false will disable both constant folding in the frontend (which it did already) and constant caching in the backend (which is enabled in this patch). This gives a way for users to revert to the old behaviour of non-deterministic UDFs before these optimisations were added in Impala 2.8. Before this patch, the backend would cache values based on IsConstant(). This meant that there was no way to override caching of values of non-deterministic UDFs, e.g. with enable_expr_rewrites. After this patch, we only cache literal values in the backend. This offers the same performance as before in the common case where the frontend will constant fold the expressions anyway. Also rename some functions to more cleanly separate the backend concepts of "constant" expressions and expressions that can be evaluated without a TupleRow. In a future change (IMPALA-4617) we should remove the IsConstant() analysis logic from the backend entirely and pass the information from the frontend. We should also fix isConstant() in the frontend so that it only returns true when it is safe to constant-fold the expression (IMPALA-4606). Once that is done, we could revert back to using IsConstant() instead of IsLiteral(). Testing: Added targeted test to test constant folding of UDFs: we expect different results depending on whether constant folding is enabled. Also run TestUdfs with expr rewrites enabled and disabled, since this can exercise different code paths. Refactored test_udfs somewhat to avoid running uninteresting combinations of query options for targeted tests and removed some 'drop * if not exists' statements that aren't necessary when using unique_database. This change revealed flakiness in test_mem_limit, which seems to have only worked by coincidence. Updated TrackAllocation() to actually set the query status when a memory limit is exceeded. Looped this test for a while to make sure it isn't flaky any more. Also fix other test bugs where the vector argument is modified in-place, which can leak out to other tests. Change-Id: I0c76e3c8a8d92749256c312080ecd7aac5d99ce7 Reviewed-on: http://gerrit.cloudera.org:8080/5391 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2016-12-08 04:53:53 +00:00
Dimitris Tsirogiannis	5ea1798661	IMPALA-4619: Allow NULL as default value in Kudu tables This commit fixes an issue where an error is thrown if the default value for a Kudu column is set to NULL. Change-Id: Ida27ce56f1dd7603485a69c680db3bcea6702aff Reviewed-on: http://gerrit.cloudera.org:8080/5405 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2016-12-08 04:53:38 +00:00
Taras Bobrovytsky	1083639ff2	IMPALA-4585: Allow the $DATABASE template in the CATCH section In a recent change (IMPALA-4363) we introduced a change where all file paths in .test files should be replaced with '__HDFS_FILENAME__'. This caused problems for tests on non-HDFS file systems and we also lost some test coverage. This patch fixes the problem by allowing the $DATABASE template in the catch section of the .test file. Change-Id: If0f6ae8dea7ac4cdaf0c61ebd8f0c589c353a96e Reviewed-on: http://gerrit.cloudera.org:8080/5372 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2016-12-08 02:20:50 +00:00
Michael Ho	9337518137	IMPALA-4595: Ignore discarded functions after linking For LLVM IR UDF, Impalad will link an external LLVM module in which the IR UDF is defined with the main module. If it happens that a symbol is defined in both modules, LLVM may choose to discard the one defined in the external module. The discarded function and its callee will not be present in the linked module. In IMPALA-4595, udf-sample.cc was compiled without any optimization. Duplicated definition such as StringVal::null() may have different inlining level between the external module and the main module. When the duplicated definition in the external module is discarded, some of its callee functions (which are not inlined) may not be defined in the main module so they can no longer be located in the linked module. This trips up some code in the LlvmCodegen::LinkModule(). In particular, when parsing for functions in external module which are materialized during linking, certain functions may not be present due to the reason above. Impalad will hit a DCHECK in debug build or crash due to null pointer access in release build. This change fixes the problem above by taking into account that certain functions may not be defined anymore after linking. This change also fixes two incorrect status propagation in fe-support.cc. Change-Id: Iaa056a0c888bfcc95b412e1bc1063bb607b58ab7 Reviewed-on: http://gerrit.cloudera.org:8080/5384 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins	2016-12-07 22:52:35 +00:00
Bharath Vissapragada	bb63339377	IMPALA-3314: Fix Avro schema loading for partitioned tables. Bug: Commit 6f31c7 fixed a crash when setting Avro schemas for tables with storage altered to Avro file format. However the fix was incomplete for partitioned/multi file format tables since 'hasAvroData_' is not set for all code paths that load the partitioned tables (For example: HdfsTable#loadAllPartitions()). Fix: Moved the code for setting 'hasAvroData_' to addPartition() which is the common logic for all code paths adding new partitions. Also fixed the test coverage gap by adding a new test for partitioned tables altered to Avro format. Change-Id: I7854ff002b2277ec4a5388216218a1d5ad142de8 Reviewed-on: http://gerrit.cloudera.org:8080/5388 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-12-07 09:45:11 +00:00
Dan Burkert	f83652c1da	Replace INTO N BUCKETS with PARTITIONS N in CREATE TABLE This commit also removes the now unused `DISTRIBUTE`, `SPLIT`, and `BUCKETS` keywords that were going to be newly released in Impala 2.6, but are now unused. Additionally, a few remaining uses of the `DISTRIBUTE BY` syntax has been switched to `PARTITION BY`. Change-Id: I32fdd5ef26c532f7a30220db52bdfbf228165922 Reviewed-on: http://gerrit.cloudera.org:8080/5382 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-12-07 07:31:16 +00:00
Matthew Jacobs	5188f879a7	IMPALA-4477: Bump Kudu version to latest master (60aa54e) Bumps the toolchain version to get a newer Kudu build. Also fixes test failures resulting from changes in Kudu. Notably error strings have changed (IMPALA-4590) and the number of replicas must be odd (IMPALA-4589). Note: The toolchain binaries starting with this build are now using the toolchain binutils rather than the system binutils. Testing: private exhaustive build. Change-Id: If1912f058c240fbe82b06f77e31add7755289be1 Reviewed-on: http://gerrit.cloudera.org:8080/5369 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-12-07 05:11:13 +00:00
Alex Behm	534999382d	IMPALA-4574: Do not treat UUID() like a constant expr. A recent change (IMPALA-1788) lead UUID() to be constant folded, and therefore, produce the same value for every invocation across rows. Similar issues might also occur due to the BE optimizing UUID() during codegen of scalar-fn-call.h/cc. The fix is to not treat UUID() like a constant expr in both the FE and BE. Discussion: The fix in this patch is rather blunt, but minimally invasive to reduce the risk of adding new bugs. Ideally, the constness of an Expr should be determined in one place and the FE and BE should agree on which Exprs are constant. I considered the following alternatives but concluded they were too risky: 1. Pass a flag from FE to BE for ever Expr indicating its constness. This simple solution would populate a thrift field with the result of Expr.isConstant() for every Expr in an Expr tree. There are several issues. Calling isConstant() for every Expr in an Expr tree is rather expensive due to repeated traversals of the tree. That could be mitigated by populating an isConstant flag during Expr.analyze() to avoid re-computing the constness repeatedly. This requires changes to analyze(), clone(), reset(), and possibly other places for many Exprs. There is potential for missing a place and adding a new bug. 2. The above solution could be limited to only FunctionCallExpr. However, the BE expr type FUNCTION_CALL which maps to scalar-fn-call.h/cc is created from various FE Exprs, not just FunctionCallExpr. So adding a flag only to scalar-fn-call.h/cc would be confusing because it would only sometimes be set in a meaningful way. This seems more confusing than the current straightforward solution. Testing: Added FE and EE tests. Change-Id: If2499f5f6ecdcb098623202c8e6dc2d02727194a Reviewed-on: http://gerrit.cloudera.org:8080/5324 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-12-06 22:03:01 +00:00
Dimitris Tsirogiannis	cba93f1ac3	IMPALA-4561: Replace DISTRIBUTE BY with PARTITION BY in CREATE TABLE Change-Id: I0e07c41eabb4c8cb95754cf04293cbd9e03d6ab2 Reviewed-on: http://gerrit.cloudera.org:8080/5317 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2016-12-06 10:41:53 +00:00
Dimitris Tsirogiannis	5bb9959fa4	IMPALA-4584: Make alter table operations on Kudu tables synchronous This commit changes the behavior of alter table operations on Kudu tables from asynchronous to synchronous. With this change, alter table operations return when either the operations complete successfully or a timeout is reached. Change-Id: I385bce66691ae9040e72f97557e1bba31009e36b Reviewed-on: http://gerrit.cloudera.org:8080/5364 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2016-12-06 03:53:15 +00:00
Dimitris Tsirogiannis	867b2434ca	Additional functional testing for default values on Kudu tables This commit also fixes an issue where an error is thrown if a default value is set for a boolean column on a Kudu table. Change-Id: I25b66275d29d1cf21df14e78ab58f625a83b0725 Reviewed-on: http://gerrit.cloudera.org:8080/5337 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Impala Public Jenkins	2016-12-06 00:04:43 +00:00
Tim Armstrong	1495b2007d	IMPALA-4498: crash in to_utc_timestamp/from_utc_timestamp The bugs was that the functions did not check whether the conversion pushed the value out of range. The fix is to use boost's validation immediately to check the validity of the timestamp and catch any exceptions thrown. It would be preferable to avoid the exceptions, but Boost does not provide a straightforward way to disable the exceptions or extract potentially-invalid values from a date object. Testing: Added expression tests that exercise out-of-range cases. Also added additional tests to confirm that date addition and subtraction weren't affected by similar bugs. Change-Id: Idc427b06ac33ec874a05cb98d01c00e970d3dde6 Reviewed-on: http://gerrit.cloudera.org:8080/5251 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2016-12-05 23:33:44 +00:00
Dimitris Tsirogiannis	1da57019ad	IMPALA-4579: SHOW CREATE VIEW fails for view containing a subquery This commit fixes an issue where a SHOW CREATE VIEW statement throws an analysis error if the view contains a subquery. Change-Id: I4a89e46a022f0ccec198b6e3e2b30230103831ce Reviewed-on: http://gerrit.cloudera.org:8080/5333 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2016-12-04 08:35:15 +00:00
Taras Bobrovytsky	858f5c2197	IMPALA-4363: Add Parquet timestamp validation Before this patch, we would simply read the INT96 Parquet timestamp representation and assume that it's valid. However, not all bit permutations represent a valid timestamp. One of the boost functions raised an exception (that we didn't catch) when passed an invalid boost date object, which resulted in a crash. This patch fixes problem by validating that the date falls into 1400..9999 year range as we are scanning Parquet. Change-Id: Ieaab5d33e6f0df831d0e67e1d318e5416ffb90ac Reviewed-on: http://gerrit.cloudera.org:8080/5343 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2016-12-03 06:41:07 +00:00
Dimitris Tsirogiannis	da34ce9780	IMPALA-4527: Columns in Kudu tables created from Impala default to "NULL" This commit reverts the behavior introduced by IMPALA-3719 which used the Kudu default behavior for column nullability if none was specified in the CREATE TABLE statement. With this commit, non-key columns of Kudu tables that are created from Impala are by default nullable unless specified otherwise. Change-Id: I950d9a9c64e3851e11a641573617790b340ece94 Reviewed-on: http://gerrit.cloudera.org:8080/5259 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-12-02 02:06:22 +00:00

1 2 3 4 5 ...

880 Commits