Commit Graph

1125 Commits

Yongjun Zhang
dbe0f86e00 IMPALA-6442: Misleading file offset reporting in error messages.
The error message described in IMPALA-6442 incorrectly reported the file offset where the
Parquet footer starts, as if the offset were counted from the end of the file instead of
from the beginning. The fix changes the reported file offset to be counted from the
beginning of the Parquet file.

Testing:
Created a small table that contains one row of data with a single BIGINT column and
stored it as Parquet. Manually changed the footer size field to be
  1) smaller than the original footer size by 1, to trigger the error message fixed by
this JIRA and verify that the fix reports the offset correctly;
  2) bigger than the file size, to trigger another related error message.

Change-Id: I35235e99ea9ceb0d31961dd3b8069f7194f5a2de
Reviewed-on: http://gerrit.cloudera.org:8080/11379
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-10 16:09:41 +00:00
Fredy Wijaya
9934b473b7 IMPALA-6373: Allow primitive type widening on parquet tables
This patch implements support for primitive type widening on parquet
tables. Only conversions that incur no loss of precision are
supported:
- tinyint (INT32) -> smallint (INT32), int (INT32), bigint (INT64),
                     double (DOUBLE)
- smallint (INT32) -> int (INT32), bigint (INT64), double (DOUBLE)
- int (INT32) -> bigint (INT64), double (DOUBLE)
- float (FLOAT) -> double (DOUBLE)
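
A minimal sketch of the widening this enables (the table name is
illustrative, not from this patch):

    -- The declared column type can now be widened past the type the
    -- Parquet file was written with, without breaking the scan:
    CREATE TABLE widen_demo (c SMALLINT) STORED AS PARQUET;
    INSERT INTO widen_demo VALUES (42);
    ALTER TABLE widen_demo CHANGE c c BIGINT;
    SELECT c FROM widen_demo;  -- the INT32-encoded value reads back as BIGINT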

Testing:
- Added BE test
- Added E2E test
- Ran core tests

Change-Id: If93394b035c64cf6fc5f37b54d29c034cc1f86e4
Reviewed-on: http://gerrit.cloudera.org:8080/11268
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-23 15:55:53 +00:00
Zoltan Borok-Nagy
e27954a5aa IMPALA-6709: Simplify tests that copy local files to tables
We had quite a few tests that created a table and used
"hdfs dfs -copyFromLocal" to copy data files to the
warehouse directory for this table.

This operation needs some boilerplate code, which I
refactored into the new functions
create_table_from_parquet() and
create_table_and_copy_files().

Change-Id: Ie00a4561825facf8abe2e8e74a6b6e93194f416f
Reviewed-on: http://gerrit.cloudera.org:8080/11127
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-22 18:08:20 +00:00
Vincent Tran
3ff4cde772 IMPALA-6844: Fix possible NULL dereference in to_date() builtin
If result.ptr allocation fails for some reason inside the StringVal
constructor, we still overwrite result.len and continue.

This change checks that the StringVal pointer is not NULL before
dereferencing it, and returns NULL if it is.

Testing: Added a test case for the to_date() function to
alloc-fail-init.test to leverage the fault injection flag
--stress_fn_ctx_alloc.

Change-Id: I14cfb29a592885bb2f39958c8644f93db5220a68
Reviewed-on: http://gerrit.cloudera.org:8080/11286
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2018-08-21 22:06:02 +00:00
Tim Armstrong
bddd7def99 IMPALA-7465: fix test_kudu_scan_mem_usage
The issue was that the row batch queue could grow a lot if the consumer
was slow.

Also add an additional test to exercise the OOM code path in Kudu for
completeness.

Testing:
Added sleep to kudu-scan-node.cc that reproduced the problem.
Looped modified test to flush out flakiness.

Change-Id: Ic4a95b6b6d96a447df68ef4912a86f1e11f219ca
Reviewed-on: http://gerrit.cloudera.org:8080/11285
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-21 21:38:11 +00:00
Todd Lipcon
7c0450d6d4 IMPALA-7439. CREATE DATABASE creates a catalog entry with missing location
Previously, when creating a new database, the CatalogOpExecutor would
create an HMS Database object, issue the HMS createDatabase call, and
then create a Catalog entry from that same Database object. The
resulting Catalog entry would be missing certain fields that are
auto-created by the HMS itself, most importantly the location field.

The code for CTAS seems to have contained a workaround for this issue
ever since catalogd was first introduced: rather than using the location
stored in the Db object, it would re-fetch the Database from HMS.

Now that this is fixed, that workaround can be removed and some code
simplified.

A new test verifies that a newly-created database has the appropriate
location, and existing CTAS tests verify that functionality didn't
regress.
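
A minimal sketch of the behavior being verified (the database name is
illustrative):

    CREATE DATABASE loc_demo;
    -- The catalog entry now carries the HMS-assigned location, so the
    -- location field below should be populated:
    DESCRIBE DATABASE loc_demo;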

Change-Id: I13df31cee1e5768b073e0e35c4c16ebf1892be23
Reviewed-on: http://gerrit.cloudera.org:8080/11229
Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-17 00:54:28 +00:00
Tim Armstrong
7ccf736908 IMPALA-7096: restore scanner thread memory heuristics
This restores some of the heuristics removed in IMPALA-4835 that help
prevent scans from hitting OOM conditions. The heuristics are implemented
at the query level rather than in each scan node in isolation.

Introduce a ScannerMemLimiter class that belongs to the QueryState and
tracks the amount of memory estimated to be consumed by all scanner
threads running for the query on the current backend.

Also check soft memory limits to see if scanner threads should be
started or the current scanner thread should stop.

The long-term plan is to switch to the MT scan node implementations.
When that happens this code can be removed. In the meantime this
code is imperfect but will help avoid OOM in many scenarios.

Testing:
Added regression tests for HDFS and Kudu where we previously could
run out of memory with a low mem_limit.

Manual testing:
* Ran query tests with --thread_creation_fault_injection=true for a
  bit, confirmed no crashes.
* ran single-node stress test for Kudu and Parquet for 10-20 min each.

Change-Id: Ib9907fa8c4d2b0b85f67f4f160899c1c258ad82b
Reviewed-on: http://gerrit.cloudera.org:8080/11103
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-16 21:25:00 +00:00
Tim Armstrong
29905d1ea4 IMPALA-7442: reduce mem requirement of semi-joins-exhaustive
The test started running into IMPALA-7446, maybe because of
a timing change. This appears to have always been possible.

The fix is to reduce the memory requirement of the test.
IMPALA-2256 is no longer really possible because the
BufferedTupleStream code was simplified to avoid the 32-bit
row index limitation, so we're not losing important coverage
on the current code with this change.

Testing:
Ran test in a loop to confirm it did not OOM.

Change-Id: I9d9480cad6bf8222abe990e7046498a0531e2849
Reviewed-on: http://gerrit.cloudera.org:8080/11223
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-16 00:21:54 +00:00
Michael Brown
cddb35be92 IMPALA-7445: separate skippable tables from test_resource_limits
Separate the HBase and Kudu tables used in test_resource_limits into
their own files so they are skippable in environments where those
services are not used.

Change-Id: I02d9fd0b48817f755e1eee2ae4613e2089fa1973
Reviewed-on: http://gerrit.cloudera.org:8080/11221
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-15 04:15:46 +00:00
Tianyi Wang
3cb784310f IMPALA-7347: Ignore numFilesErasureCoded in TestShowCreateTable
This table property only exists for HDFS tables. To get the test to
work on local-filesystem tables, it needs to be ignored.

Change-Id: Icc8494fb91c4777cee662a97f750486aa8e79a8e
Reviewed-on: http://gerrit.cloudera.org:8080/11192
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-13 21:36:16 +00:00
Tim Armstrong
b7d509d761 IMPALA-7231: group plan nodes into pipelines
This adds some informational output to explain plans and
sends the information to the backend.

The idea is that this will make it easier to explain how Impala's
pipelined execution works and also enable future work on profile
analysis that can more intelligently group plan nodes.

Tests:
* Updated planner tests to include new output.

Change-Id: I1d10eb14d997242f445e5c5fc5362d5410370721
Reviewed-on: http://gerrit.cloudera.org:8080/10848
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-10 19:31:30 +00:00
Tim Armstrong
3e17705eca IMPALA-6034: Add scanned bytes limits per query
This adds support for aggregate resource limits at runtime, specified
via query options. If a query exceeds a limit it is terminated. The
checks are periodic so the query may go somewhat over the limits.

SCAN_BYTES_LIMIT is exposed as an advanced query option.

CPU_LIMIT_S is hidden as a development query option because it is flawed
- the CPU user/sys time is only updated upon thread completion, so in
many cases the limit will not take effect until well after the resources
have been used. IMPALA-7318 tracks enabling this.

The query profile is updated to include query-wide and per-backend metrics
for CPU and scanned bytes. Example from "select count(*) from
tpch_parquet.lineitem":

    Per Node Peak Memory Usage: tarmstrong-box:22000(289.50 KB) tarmstrong-box:22001(249.50 KB) tarmstrong-box:22002(249.50 KB)
    Per Node Bytes Read: tarmstrong-box:22000(100.00 KB) tarmstrong-box:22001(100.00 KB) tarmstrong-box:22002(100.00 KB)
    Per Node User Time: tarmstrong-box:22000(40.000ms) tarmstrong-box:22001(32.000ms) tarmstrong-box:22002(24.000ms)
    Per Node System Time: tarmstrong-box:22000(0.000ns) tarmstrong-box:22001(0.000ns) tarmstrong-box:22002(0.000ns)
     - FiltersReceived: 0 (0)
     - FinalizationTimer: 0.000ns
     - NumBackends: 3 (3)
     - NumFragmentInstances: 4 (4)
     - NumFragments: 2 (2)
     - TotalBytesRead: 300.00 KB (307200)
     - TotalCpuTime: 96.000ms
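
A minimal usage sketch (the limit value and its suffix syntax are
illustrative):

    SET SCAN_BYTES_LIMIT=100g;
    -- Any query in this session that scans more than ~100GB is
    -- terminated; the check is periodic, so it may overshoot slightly.
    SELECT count(*) FROM tpch_parquet.lineitem;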

Testing:
Added tests for various permutations of CPU_LIMIT_S and
SCAN_BYTES_LIMIT.

Based on a previous patch by Mostafa Mokhtar
<mmokhtar@cloudera.com>

Change-Id: I3e85f80b70b3fce47e637e9322ed0316ee84f6a9
Reviewed-on: http://gerrit.cloudera.org:8080/11081
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-09 22:18:01 +00:00
Todd Lipcon
d958b4779c IMPALA-7411, IMPALA-7414. Fix failing tests on local filesystem
The new test added by IMPALA-7308 failed on local-filesystem builds
because the warehouse path is not directly at /test-warehouse. This
fix prefixes the path appropriately with $FILESYSTEM_PREFIX.

Additionally, the fix for IMPALA-5542 made a similar mistake
constructing a path on the Python side of a test case. Fixed by using
the get_fs_path function.

Change-Id: I6922e24a74576d0d000e8e2645a235868583c1e1
Reviewed-on: http://gerrit.cloudera.org:8080/11164
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-09 19:18:08 +00:00
Tim Armstrong
a6c356850b IMPALA-7403: fix child batch mem mgmt in analytic
The core of the fix is in ProcessChildBatches(), where we copy
'prev_input_tuple_' to 'prev_input_tuple_pool_' and reset
the child batch *before* the call to child(0)->GetNext(). This
solves a couple of problems:
* prev_input_tuple_ may be referencing memory from the child
  that had the needs_deep_copy() flag set and therefore will
  be freed or recycled when calling child(0)->GetNext() again.
* 'prev_child_batch_' may have been holding onto resources that
  the child had flushed, which means they need to be freed before
  the next GetNext() call.

Also refactors the logic around child_cmp_row_ to make the variable
lifetime and data flow clearer.

Testing:
Add regression test. The test passes both with this patch alone and with
IMPALA-7333 reapplied.

Change-Id: I09eb6213d47287f2addb72f8c1304085d2d48c55
Reviewed-on: http://gerrit.cloudera.org:8080/11155
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-08 20:51:45 +00:00
Todd Lipcon
4aec50484a IMPALA-7308. Support Avro tables in LocalCatalog
This adds support for loading Avro-formatted tables in LocalCatalog. In
the case that the table properties indicate a table is Avro-formatted,
the semantics are identical to the existing catalog implementation:

- if an explicit avro schema is specified, it overrides the schema
  provided by the HMS
- if no explicit avro schema is specified, one is inferred, and then the
  inferred schema takes the place of the one provided by the HMS (thus
  promoting columns like TINYINT to INT)
- on COMPUTE STATS, if any discrepancy is discovered between the HMS
  schema and the inferred schema, an error is emitted.
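
A minimal sketch of the explicit-schema override (the table name and
Avro schema are illustrative):

    CREATE TABLE avro_demo (i INT) STORED AS AVRO TBLPROPERTIES (
      'avro.schema.literal'=
      '{"type":"record","name":"r","fields":[{"name":"i","type":"long"}]}');
    -- 'i' is reported as BIGINT: the explicit Avro schema overrides
    -- the HMS column type.
    DESCRIBE avro_demo;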

The semantics for LocalCatalog are slightly different in the case of
tables which have not been configured as Avro format on the table level:

The existing implementation has the behavior that, when a table is
loaded, all partitions are inspected, and, if any partition is
discovered with Avro format, the above rules are applied. This has some
very unexpected results, described in an earlier email to
dev@impala.apache.org [1]. To summarize that email thread, the existing
behavior was deemed unintuitive and inconsistent with Hive.
Additionally, this behavior requires loading all partitions up-front,
which gets in the way of the goal of lazy/granular metadata loading in
LocalCatalog.

Thus, the LocalCatalog implementation differs as follows:

- the "schema override" behavior ONLY occurs if the Avro file format has
  been selected at a table level.

- if an Avro partition is added to a non-Avro table, and that partition
  has a schema that isn't compatible with the table's schema, an error
  will occur on read.

The thread additionally discusses adding an error message on "alter" to
prevent users from adding an Avro partition to a table with an
incompatible schema. To keep the scope of this patch minimal, that is
not yet implemented here. I filed IMPALA-7309 to change the behavior of
the existing catalog implementation to match.

A new test verifies the behavior, set to 'xfail' when running on the
existing catalog implementation.

[1] https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@%3Cdev.impala.apache.org%3E

Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Reviewed-on: http://gerrit.cloudera.org:8080/10970
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>
2018-08-07 17:38:04 +00:00
Tim Armstrong
a8d7a50bd5 IMPALA-7354: planner test resource estimates for more workloads
Adds the resource estimates for key benchmark workloads:
TPC-H, TPC-DS, TPC-H Nested and TPC-H Kudu to the planner
test so that we can track changes in resource requirements
and estimates for these queries.

Also don't show decimal places for MB and KB estimates. The
estimates are not accurate to that level and displaying
extra precision has some disadvantages:
* It falsely suggests to readers that the estimates have a high level
  of precision.
* It increases the odds of small variations in file sizes, etc.,
  causing test failures.

Also fixed a regex in the stress test that didn't escape the decimal
point correctly.

Testing:
Ran core tests.

Change-Id: I6a9f836699200ea87fb03bf36abad0e23949ac26
Reviewed-on: http://gerrit.cloudera.org:8080/11087
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-07 08:34:33 +00:00
Thomas Tauber-Marshall
70bf9ea296 IMPALA-7397: fix DCHECK in impala::AggregationNode::Close
As part of IMPALA-110, all of the logic of performing aggregations was
refactored out of the aggregation ExecNode and into Aggregators. Each
Aggregator manages its own memory, so a DCHECK was added in
AggregationNode::Close to ensure that no allocations were
made from the regular ExecNode mem pools.

This DCHECK is violated if the node was assigned conjuncts that
allocate memory in Prepare() - even though the conjuncts are evaluated
in the Aggregator, we still initialize them in ExecNode::Prepare.

This patch solves the problem by creating a copy of the TPlanNode
without the conjuncts to pass to ExecNode. In the future, when
TAggregator is introduced, we can get rid of this by directly
assigning conjuncts to Aggregators.

Note that this doesn't affect StreamingAggregationNode, which never
has conjuncts assigned to it, but this patch fixes an incorrect DCHECK
that enforces this.

Testing:
- Added a regression test for this case.

Change-Id: I65ae00ac23a62ab9f4a7ff06ac9ad5cece80c39d
Reviewed-on: http://gerrit.cloudera.org:8080/11132
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-06 23:45:08 +00:00
Csaba Ringhofer
dc32bf7703 IMPALA-7362: Add query option to set timezone
This change adds a new query option "timezone" which
defines the timezone used for utc<->local conversions.
The main goal is to simplify testing, but some users may
also find it useful, so it is added as a "general" query
option.

Examples:
set timezone=UTC;
set timezone="Europe/Budapest"

The timezones are validated, but as query options are not
sent to the coordinator immediately, the error checking
will only happen when running a query.

Leading/trailing " and 'characters are stripped because the
/ character cannot be entered unquoted in some contexts.

Currently the timezone has effect in the following cases:
- function now()
- conversions between unix time and timestamp, if flag
  use_local_tz_for_unix_timestamp_conversions is true
- reading parquet timestamps written by Hive, if flag
  convert_legacy_hive_parquet_utc_timestamps is true

In the near future Parquet timestamps' isAdjustedToUTC
property will be supported, which will decide whether
to do utc->local conversion on a per file+column basis.
That conversion will also be affected.
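
A minimal sketch of the option and the cases it affects (values are
illustrative):

    SET TIMEZONE="Asia/Tokyo";
    SELECT now();             -- uses the session's timezone
    SELECT from_unixtime(0);  -- affected only if the flag
                              -- use_local_tz_for_unix_timestamp_conversions
                              -- is true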

Testing:
- Extended test_local_tz_conversion.py to actually
  test utc<->local conversion. Until now the effect
  of flag use_local_tz_for_unix_timestamp_conversions
  was practically untested.
- Added a shell test to check that the default of the
  query option is the system's timezone.
- Added a shell test to check timezone validation.

Change-Id: I73de86eff096e1c581d3b56a0d9330d686f77272
Reviewed-on: http://gerrit.cloudera.org:8080/11064
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-03 17:45:25 +00:00
Tianyi Wang
fb3d47d356 IMPALA-7347: Update tests to accommodate HIVE-18118
HIVE-18118 adds 'numFilesErasureCoded' to table properties. This patch
adds it to test_show_create_table to work with the latest Hive.

Change-Id: I6aae402dd38374de90b35c32166a9507e6eb29f9
Reviewed-on: http://gerrit.cloudera.org:8080/11108
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-02 20:25:31 +00:00
Zoltan Borok-Nagy
7917eac0ad IMPALA-5542: Impala cannot scan Parquet decimal stored as int64_t/int32_t
The Decimal type in Parquet is a logical type. That means
the Parquet file stores some physical/primitive type that
is annotated by the DECIMAL tag to make it behave like
decimals.

The allowed physical types for decimals are INT32, INT64,
FIXED, and BINARY. Before this commit Impala could only
read decimals stored as FIXED or BINARY.

Spark decided to write decimals as INT32 or INT64 when
their precision allows it:
(1 <= precision <= 9) ==> INT32
(10 <= precision <= 18) ==> INT64

I updated our column readers to accept INT32 and INT64
as valid physical types for decimals.

Testing:
* extended parquet-plain-test.cc
* added Parquet files generated by Spark 2.3.1
  and updated test_scanners.py

Change-Id: Ib8c41bfc7c1664bdba5099d3893dc8dbe4304794
Reviewed-on: http://gerrit.cloudera.org:8080/11000
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-02 20:21:12 +00:00
Fredy Wijaya
a203733fac IMPALA-7295: Remove IMPALA_MINICLUSTER_PROFILE=2
This patch removes the use of IMPALA_MINICLUSTER_PROFILE. The code that
uses IMPALA_MINICLUSTER_PROFILE=2 is removed, and the code for
IMPALA_MINICLUSTER_PROFILE=3 becomes the default. To avoid too many code
changes in this patch, there is no code change for the shims; the shims
for IMPALA_MINICLUSTER_PROFILE=3 automatically become the default
implementation.

Testing:
- Ran core and exhaustive tests

Change-Id: Iba4a81165b3d2012dc04d4115454372c41e39f08
Reviewed-on: http://gerrit.cloudera.org:8080/10940
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-14 01:03:18 +00:00
Michael Brown
3da2dc63fe IMPALA-6810: runtime_row_filters.test: omit pool name in pattern
Some downstream tests run this with a fair-scheduler.xml set that, while
not changing admission control behavior, does change the name of the
pool. Omit the pool name to permit that downstream test to succeed.

Testing:
- local with change in minicluster
- downstream in environment as well

Change-Id: I3fe6beb169dc6bfefabde9dc7a4632c1a5e63fa7
Reviewed-on: http://gerrit.cloudera.org:8080/10942
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-13 20:45:37 +00:00
Thomas Tauber-Marshall
e2aafae204 IMPALA-7251: Fix QueryMaintenance calls in Aggregators
A recent change, IMPALA-110 (part 2), refactored
PartitionedAggregationNode into several classes, including a new type
'Aggregator'. During this refactor, code that makes local allocations
while evaluating exprs was moved from the ExecNode (now
AggregationNode/StreamingAggregationNode) into the Aggregators, but
code related to cleaning these allocations up (i.e. QueryMaintenance())
was not, resulting in some queries using an excessive amount of
memory.

This patch removes all calls to QueryMaintenance() from the exec nodes
and moves them into the Aggregators.

Testing:
- Added new test cases with a mem limit that fails if the expr
  allocations aren't released in a timely manner.
- Passed a full exhaustive run.

Change-Id: I4dac2bb0a15cdd7315ee15608bae409c125c82f5
Reviewed-on: http://gerrit.cloudera.org:8080/10871
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-13 03:04:35 +00:00
poojanilangekar
c6f9b61ec2 IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
This change ensures that the planner computes parquet conjuncts
only for scans containing parquet files. It also handles the
PARQUET_DICTIONARY_FILTERING and PARQUET_READ_STATISTICS query
options in the planner.
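
A minimal sketch of the planner-side handling (the query is
illustrative):

    SET PARQUET_DICTIONARY_FILTERING=false;
    SET PARQUET_READ_STATISTICS=false;
    -- The plan now omits the corresponding parquet predicates instead
    -- of merely skipping them at execution time:
    EXPLAIN SELECT count(*) FROM functional_parquet.alltypes
    WHERE int_col = 1;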

Testing was carried out independently on parquet and non-parquet
scans:
  1. Parquet scans were tested via the existing parquet-filtering
     planner test. Additionally, a new test
     [parquet-filtering-disabled] was added to ensure that the
     explain plan generated skips parquet predicates based on the
     query options.
  2. Non-parquet scans were tested manually to ensure that the
     functions to compute parquet conjuncts were not invoked.
     Additional test cases were added to the parquet-filtering
     planner test to scan non-parquet tables and ensure that the
     plans do not contain conjuncts based on parquet statistics.
  3. A parquet partition was added to the alltypesmixedformat
     table in the functional database. Planner tests were added
     to ensure that Parquet conjuncts are constructed only when
     the Parquet partition is included in the query.

Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Reviewed-on: http://gerrit.cloudera.org:8080/10704
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-06 02:06:50 +00:00
Tianyi Wang
61e6a47776 IMPALA-7236: Fix the parsing of ALLOW_ERASURE_CODED_FILES
This patch adds a missing "break" statement in a switch statement
changed by IMPALA-7102.
Also fixes a non-deterministic test case.

Change-Id: Ife1e791541e3f4fed6bec00945390c7d7681e824
Reviewed-on: http://gerrit.cloudera.org:8080/10857
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-03 23:49:44 +00:00
Bikramjeet Vig
30e82c63ec IMPALA-7190: Remove unsupported format writer support
This patch removes write support for unsupported formats like Sequence,
Avro and compressed text. Also, the related query options
ALLOW_UNSUPPORTED_FORMATS and SEQ_COMPRESSION_MODE have been migrated
to the REMOVED query options type.

Testing:
Ran exhaustive build.

Change-Id: I821dc7495a901f1658daa500daf3791b386c7185
Reviewed-on: http://gerrit.cloudera.org:8080/10823
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-03 20:34:27 +00:00
Taras Bobrovytsky
8060f4d50e IMPALA-7102 (Part 1): Disable reading of erasure coding by default
In this patch we add a query option, ALLOW_ERASURE_CODED_FILES, that
allows us to enable or disable support for erasure-coded files. Even
though Impala should already be able to handle HDFS erasure-coded
files, this feature hasn't been tested thoroughly yet. Also, Impala
lacks metrics, observability and DDL commands related to erasure
coding. This is a query option instead of a startup flag because we
want to make it possible for advanced users to enable the feature.

We may also need a follow-on patch to disable the write path with
this flag.
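
A minimal usage sketch (the table name is illustrative):

    SET ALLOW_ERASURE_CODED_FILES=true;
    -- Without the option, scans over erasure-coded HDFS files are
    -- rejected for the session:
    SELECT count(*) FROM table_on_ec_storage;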

Cherry-picks: not for 2.x

Change-Id: Icd3b1754541262467a6e67068b0b447882a40fb3
Reviewed-on: http://gerrit.cloudera.org:8080/10646
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-29 23:26:35 +00:00
poojanilangekar
e988c36bf0 IMPALA-6305: Allow column definitions in ALTER VIEW
This change adds support to change column definitions in ALTER VIEW
statements. This support only required minor changes in the parser
and the AlterViewStmt constructor.

Here's an example syntax:
    alter view foo (a, b comment 'helloworld') as
    select * from bar;

    describe foo;
    +------+--------+------------+
    | name | type   | comment    |
    +------+--------+------------+
    | a    | string |            |
    | b    | string | helloworld |
    +------+--------+------------+

The following tests were modified:
1. ParserTest - To check that the parser handles column definitions
   for alter view statements.
2. AnalyzerDDLTest - To ensure that the analyzer supports the
   change column definitions parsed.
3. TestDdlStatements - To verify the end-to-end functioning of
   ALTER VIEW statements with change column definitions.
4. AuthorizationTest - To ensure that alter view statements with
   column definitions check permissions as expected.

Change-Id: I6073444a814a24d97e80df15fcd39be2812f63fc
Reviewed-on: http://gerrit.cloudera.org:8080/10720
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-27 03:41:47 +00:00
Attila Jeges
17749dbcfc IMPALA-3307: Add support for IANA time-zone db
Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.

- Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to
  specify an HDFS/S3/ADLS path to a zip archive that contains the
  shared compiled IANA time-zone database. If the startup flag is set,
  impalad will use the specified time-zone database. Otherwise,
  impalad will use the default /usr/share/zoneinfo time-zone database.

- Introduces a new startup flag (--hdfs_zone_alias_conf) to impalad to
  specify an HDFS/S3/ADLS path to a shared config file that contains
  definitions for non-standard time-zone aliases.

- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.

- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.

- Adds a new ZipUtil class to extract files from a zip archive. The
  implementation is not vulnerable to Zip Slip.

Cherry-picks: not for 2.x.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Reviewed-on: http://gerrit.cloudera.org:8080/9986
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Attila Jeges <attilaj@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-22 13:18:58 +00:00
Tim Armstrong
894ab8e980 IMPALA-7115: set a default THREAD_RESERVATION_LIMIT value
The value is chosen to allow only queries that have a reasonable chance
of succeeding, albeit with poor performance because of the high
number of threads.

Testing:
Added a test to make sure that the default value rejects a large query.

Change-Id: I31d3fa3f6305c360922649dba53a9026c9563384
Reviewed-on: http://gerrit.cloudera.org:8080/10628
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-19 03:02:49 +00:00
Michael Ho
51ff47d05e IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send()
This change codegens the hash partitioning logic of
KrpcDataStreamSender::Send() when the partitioning strategy
is HASH_PARTITIONED. It does so by unrolling the loop which
evaluates each row against the partitioning expressions and
hashes the result. It also replaces the number of channels
of that sender with a constant at runtime.

With this change, we get reasonable speedup with some benchmarks:

+------------+-----------------------+---------+------------+------------+----------------+
| Workload   | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+------------+-----------------------+---------+------------+------------+----------------+
| TPCH(_300) | parquet / none / none | 20.03   | -6.44%     | 13.56      | -7.15%         |
+------------+-----------------------+---------+------------+------------+----------------+

+---------------------+-----------------------+---------+------------+------------+----------------+
| Workload            | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+---------------------+-----------------------+---------+------------+------------+----------------+
| TARGETED-PERF(_300) | parquet / none / none | 58.59   | -5.56%     | 12.28      | -5.30%         |
+---------------------+-----------------------+---------+------------+------------+----------------+

+-------------------------+-----------------------+---------+------------+------------+----------------+
| Workload                | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+-------------------------+-----------------------+---------+------------+------------+----------------+
| TPCDS-UNMODIFIED(_1000) | parquet / none / none | 15.60   | -3.10%     | 7.16       | -4.33%         |
+-------------------------+-----------------------+---------+------------+------------+----------------+

+-------------------+-----------------------+---------+------------+------------+----------------+
| Workload          | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+-------------------+-----------------------+---------+------------+------------+----------------+
| TPCH_NESTED(_300) | parquet / none / none | 30.93   | -3.02%     | 17.46      | -4.71%         |
+-------------------+-----------------------+---------+------------+------------+----------------+

Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Reviewed-on: http://gerrit.cloudera.org:8080/10421
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-14 23:37:00 +00:00
Tim Armstrong
d8ed07f112 IMPALA-6035: Add query options to limit thread reservation
Adds two options: THREAD_RESERVATION_LIMIT and
THREAD_RESERVATION_AGGREGATE_LIMIT, which are both enforced by admission
control based on planner resource requirements and the schedule. The
mechanism used is the same as the minimum reservation checks.

THREAD_RESERVATION_LIMIT limits the total number of reserved threads in
fragments scheduled on a single backend.
THREAD_RESERVATION_AGGREGATE_LIMIT limits the sum of reserved threads
across all fragments.

This also slightly improves the minimum reservation error message to
include the host name.
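
A minimal usage sketch (the limit values are illustrative):

    SET THREAD_RESERVATION_LIMIT=24;            -- per-backend cap
    SET THREAD_RESERVATION_AGGREGATE_LIMIT=96;  -- cap across all fragments
    -- Admission control now rejects any query whose fragments would
    -- reserve more threads than either cap allows.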

Testing:
Added end-to-end tests that exercise the code paths.

Ran core tests.

Change-Id: I5b5bbbdad5cd6b24442eb6c99a4d38c2ad710007
Reviewed-on: http://gerrit.cloudera.org:8080/10365
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-14 03:25:55 +00:00
Thomas Tauber-Marshall
bf2124bf30 IMPALA-6929: Support multi-column range partitions for Kudu
Kudu allows specifying range partitions over multiple columns. Impala
already has support for doing this when the partitions are specified
with '=', but if the partitions are specified with '<' or '<=', the
parser would return an error.

This patch modifies the parser to allow for creating Kudu tables like:
create table kudu_test (a int, b int, primary key(a, b))
  partition by range(a, b) (partition (0, 0) <= values < (1, 1));
and similarly to alter partitions like:
alter table kudu_test add range partition (1, 1) <= values < (2, 2);

Testing:
- Modified functional_kudu.jointbl's schema so that we have a table
  in functional with a multi-column range partition to test things
  against.
- Added FE and E2E tests for CREATE and ALTER.

Change-Id: I0141dd3344a4f22b186f513b7406f286668ef1e7
Reviewed-on: http://gerrit.cloudera.org:8080/10441
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-13 00:10:13 +00:00
Zoltan Borok-Nagy
e6ca7ca14d IMPALA-7108: IllegalStateException hit during CardinalityCheckNode.<init>
Since IMPALA-6314, we set LIMIT 2 on runtime scalar subqueries
in StmtRewriter.mergeExpr(). We do that because later we add a
CardinalityCheckNode on top of such subqueries, and with
LIMIT 2 we can still check whether they return more than one row.

In the constructor of CardinalityCheckNode there is a
precondition that checks if the child node has LIMIT 2 to
be certain that we've set the limit for all the necessary
cases.

However, some subqueries will get a LIMIT 1 later, breaking the
precondition in CardinalityCheckNode. An example of such a
subquery is a select stmt that selects from an inline view
that returns a single row:

select * from functional.alltypes
where int_col = (select f.id from (
                 select * from functional.alltypes limit 1) f);

Note that we shouldn't add a CardinalityCheckNode to the plan
of this query in the first place. To generate a proper plan I
updated SelectStmt.returnsSingleRow() because this method didn't
handle this case well.

I also changed the precondition from
  Preconditions.checkState(child.getLimit() == 2);
to
  Preconditions.checkState(child.getLimit() <= 2);
in order to be more permissive.

I added tests for the aforementioned query.

Change-Id: I82a7a3fe26db3e12131c030c4ad055a9c4955407
Reviewed-on: http://gerrit.cloudera.org:8080/10605
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-08 20:15:50 +00:00
Lars Volker
c9e8f2f7e7 IMPALA-7008: Rewrite query to make it not return 100M rows
One query in spilling.test is expected to fail. When it does not fail,
it returned 100M rows, which would then cause the Python test code to
consume memory until it gets OOM-killed by the kernel.

To fix this, we rewrite the query. I tested this locally to make sure
that the query still fails as expected on HDFS.

Change-Id: I31956d3092a7e69b979f631df3a6dfda14ebe140
Reviewed-on: http://gerrit.cloudera.org:8080/10597
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-05 01:14:35 +00:00
Thomas Tauber-Marshall
ba7893cb9e IMPALA-6338: Disable more flaky bloom filter tests
Until IMPALA-6338 is fixed, temporarily disable tests that are
affected by it - any test that has a 'limit' and relies on the
contents of the runtime profile. This patch disables the runtime
profile check for all such tests in bloom_filter.test

Change-Id: Ifc9da892efa3b27d63056ad8e3befac82808ffdb
Reviewed-on: http://gerrit.cloudera.org:8080/10530
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-30 08:00:50 +00:00
Tim Armstrong
f4f28d310c IMPALA-6941: load more text scanner compression plugins
Add extensions for LZ4 and ZSTD (which are supported by Hadoop).
Even without a plugin this results in better behaviour, because
we no longer try to treat files with these (previously unknown)
extensions as uncompressed text.

Also allow loading tables containing files with unsupported
compression types. There was weird behaviour before we knew
of the file extension but didn't support querying the table -
the catalog would load the table but the impalad would fail
processing the catalog update. The simplest way to fix it
is to just allow loading the tables.

Similarly, make the "LOAD DATA" operation more permissive -
we can copy files into a directory even if we can't
decompress them.

Switch to always checking the plugin version - running a mismatched
plugin is inherently unsafe.
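
A minimal sketch of the more permissive behavior (path and table name
are illustrative):

    -- Copying no longer fails just because the file's compression
    -- suffix is known but unsupported:
    LOAD DATA INPATH '/tmp/staging/data.txt.zst' INTO TABLE text_demo;
    -- Querying the table may still fail if no plugin can decompress it.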

Testing:
Positive case where LZO is loaded is exercised. Added
coverage for negative case where LZO is disabled.

Fixed test gaps:
* Querying LZO table with LZO plugin not available.
* Interacting with tables with known but unsupported text
  compressions.
* Querying files with unknown compression suffixes (which are
  treated as uncompressed text).

Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6
Reviewed-on: http://gerrit.cloudera.org:8080/10165
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-18 03:44:46 +00:00
Zoltan Borok-Nagy
ccf19f9f8f IMPALA-5842: Write page index in Parquet files
This commit builds on the previous work of
Pooja Nilangekar: https://gerrit.cloudera.org/#/c/7464/

The commit implements the write path of PARQUET-922:
"Add column indexes to parquet.thrift". As specified in the
parquet-format, Impala writes the page indexes just before
the footer. This allows much more efficient page filtering
than using the same information from the 'statistics' field
of DataPageHeader.

I updated Pooja's python tests as well.

Change-Id: Icbacf7fe3b7672e3ce719261ecef445b16f8dec9
Reviewed-on: http://gerrit.cloudera.org:8080/9693
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-17 20:22:02 +00:00
Lars Volker
a64cfc523e IMPALA-7032: Disable codegen for CHAR type null literals
Analogous to IMPALA-6435, we have to disable codegen for CHAR type null
literals. Otherwise we will crash in
impala::NullLiteral::GetCodegendComputeFn().
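
A minimal sketch of the crashing shape (the query is illustrative):

    -- Previously crashed in impala::NullLiteral::GetCodegendComputeFn();
    -- codegen is now disabled for CHAR-typed NULL literals:
    SELECT CAST(NULL AS CHAR(4));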

This change adds a test to make sure that the crash is fixed.

Change-Id: I34033362263cf1292418f69c5ca1a3b84aed39a9
Reviewed-on: http://gerrit.cloudera.org:8080/10409
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
2018-05-16 00:00:15 +00:00
Zoltan Borok-Nagy
fab65d4479 IMPALA-7022: TestQueries.test_subquery: Subquery must not return more than one row
TestQueries.test_subquery sometimes fails during exhaustive
tests.

In the tests we expect to catch an exception that is
prefixed by the "Query aborted:" string. The prefix is
usually added by impala_beeswax.py::wait_for_completion(),
but in rare cases it isn't added.

From the point of view of the test it is irrelevant whether the
exception is prefixed with "Query aborted:" or not, so I removed it
from the expected exception string.

Change-Id: I3b8655ad273b1dd7a601099f617db609e4a4797b
Reviewed-on: http://gerrit.cloudera.org:8080/10407
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2018-05-15 23:37:06 +00:00
Tim Armstrong
3661100fa3 IMPALA-6645: Enable disk spill encryption by default
Perf:
Targeted benchmarks with a heavily spilling query on a machine
with PCLMULQDQ support show < 5% of CPU time spent in encryption and
decryption. PCLMULQDQ was introduced in AMD Bulldozer (c. 2011)
and Intel Westmere (c. 2010).

Testing:
Ran core tests with the change.

Updated the custom cluster test to exercise the non-default
configuration.

Change-Id: Iee4be2a95d689f66c3663d99e4df0fb3968893a9
Reviewed-on: http://gerrit.cloudera.org:8080/10345
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2018-05-15 22:23:14 +00:00
Tim Armstrong
e12ee485cf IMPALA-6957: calc thread resource requirement in planner
This only factors in fragment execution threads. E.g. this does *not*
try to account for the number of threads on the old Thrift RPC
code path if that is enabled.

This is loosely related to the old VCores estimate, but is different in
that it:
* Directly ties into the notion of required threads in
  ThreadResourceMgr.
* Is a strict upper bound on the number of such threads, rather than
  an estimate.

Does not include "optional" threads. ThreadResourceMgr in the backend
bounds the number of "optional" threads per impalad, so the number of
execution threads on a backend is limited by

  sum(required threads per query) +
      CpuInfo::num_cores() * FLAGS_num_threads_per_core

DCHECKS in the backend enforce that the calculation is correct. They
were actually hit in KuduScanNode because of some races in thread
management leading to multiple "required" threads running. Now the
first thread in the multithreaded scans never exits, which means
that it's always safe for any of the other threads to exit early,
which simplifies the logic a lot.

Testing:
Updated planner tests.

Ran core tests.

Change-Id: I982837ef883457fa4d2adc3bdbdc727353469140
Reviewed-on: http://gerrit.cloudera.org:8080/10256
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-12 01:43:37 +00:00
Tim Armstrong
25c13bfdd6 IMPALA-7010: don't run memory usage tests on non-HDFS
Moved a number of tests with tuned mem_limits. In some cases
this required separating the tests from non-tuned functional
tests.

TestQueryMemLimit used only very high and very low limits, so it
seemed safe to run in all configurations.

Change-Id: I9686195a29dde2d87b19ef8bb0e93e08f8bee662
Reviewed-on: http://gerrit.cloudera.org:8080/10370
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-11 22:41:49 +00:00
Sailesh Mukil
f13abdca67 IMPALA-6975: TestRuntimeRowFilters.test_row_filters failing with Memory limit exceeded
This test has started failing relatively frequently. We think that
this may be due to timing differences in when RPCs arrive, caused by
the recent changes with KRPC.

Increasing the memory limit should allow this test to pass
consistently.

Change-Id: Ie39482e2a0aee402ce156b11cce51038cff5e61a
Reviewed-on: http://gerrit.cloudera.org:8080/10315
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-05 03:01:35 +00:00
Thomas Tauber-Marshall
ba84ad03cb IMPALA-6954: Fix problems with CTAS into Kudu with an expr rewrite
This patch fixes two problems:
- Previously a CTAS into a Kudu table where an expr rewrite occurred
  would create an unpartitioned table, due to the partition info being
  reset in TableDataLayout and then never reconstructed. Since the
  Kudu partition info is set by the parser and never changes, the
  solution is to not reset it.
- Previously a CTAS into a Kudu table with a range partition where an
  expr rewrite occurred would fail with an analysis exception due to
  a Precondition check in RangePartition.analyze that checked that
  the RangePartition wasn't already analyzed, as the analysis can't
  be done twice. Since the state in RangePartition never changes, it
  doesn't need to be reanalyzed and we can just return instead of
  failing on the check.

Testing:
- Added an e2e test that creates a partitioned Kudu table with a CTAS
  with a rewrite, and checks that the expected partitions are created.

Change-Id: I731743bd84cc695119e99342e1b155096147f0ed
Reviewed-on: http://gerrit.cloudera.org:8080/10251
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-02 02:54:23 +00:00
Tim Armstrong
418c705787 IMPALA-6679,IMPALA-6678: reduce scan reservation
This has two related changes.

IMPALA-6679: defer scanner reservation increases
------------------------------------------------
When starting each scan range, check to see how big the initial scan
range is (the full thing for row-based formats, the footer for
Parquet) and determine whether more reservation would be useful.

For Parquet, base the ideal reservation on the actual column layout
of each file. This avoids reserving memory that we won't use for
the actual files that we're scanning. This also avoid the need to
estimate ideal reservation in the planner.

We also release scanner thread reservations above the minimum as
soon as threads complete, so that resources can be released slightly
earlier.

IMPALA-6678: estimate Parquet column size for reservation
---------------------------------------------------------
This change also reduces reservation computed by the planner in certain
cases by estimating the on-disk size of column data based on stats. It
also reduces the default per-column reservation to 4MB since it appears
that < 8MB columns are generally common in practice and the method for
estimating column size is biased towards over-estimating. There are two
main cases to consider for the performance implications:
* Memory is available to improve query perf - if we underestimate, we
  can increase the reservation so we can do "efficient" 8MB I/Os for
  large columns.
* The ideal reservation is not available - query performance is affected
  because we can't overlap I/O and compute as much and may do smaller
  I/Os (probably 4MB). However, we should avoid pathological behaviour
  like tiny I/Os.

When stats are not available, we just default to reserving 4MB per
column, which typically is more memory than required. When stats are
available, the memory required can be reduced below that when the
heuristics tell us with high confidence that the column data for most
or all files is smaller than 4MB.

The stats-based heuristic could reduce scan performance if both the
conservative heuristics significantly underestimate the column size
and memory is constrained such that we can't increase the scan
reservation at runtime (in which case the memory might be used by
a different operator or scanner thread).

Observability:
Added counters to track when threads were not spawned due to reservation
and to track when reservation increases are requested and denied. These
allow determining if performance may have been affected by memory
availability.

Testing:
Updated test_mem_usage_scaling.py memory requirements and added steps
to regenerate the requirements. Loops test for a while to flush out
flakiness.

Added targeted planner and query tests for reservation calculations and
increases.

Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842
Reviewed-on: http://gerrit.cloudera.org:8080/9757
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-04-28 23:41:39 +00:00
Tim Armstrong
93d714c645 IMPALA-6560: fix regression test for IMPALA-2376
The test is modified to increase the size of collections allocated.
num_nodes and mt_dop query options are set to make execution as
deterministic as possible.

I looped the test overnight to try to flush out flakiness.

Adds support for row_regex lines in CATCH sections so that we can
match a larger part of the error message.

Change-Id: I024cb6b57647902b1735defb885cd095fd99738c
Reviewed-on: http://gerrit.cloudera.org:8080/9681
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2018-04-28 23:41:39 +00:00
Tim Armstrong
d7bba82192 IMPALA-6587: free buffers before ScanRange::Cancel() returns
ScanRange::Cancel() now waits until an in-flight read finishes so
that the disk I/O buffer being processed by the disk thread is
freed when Cancel() returns.

The fix is to set a 'read_in_flight_' flag on the scan range
while the disk thread is doing the read. Cancel() blocks until
read_in_flight_ == false.

The code is refactored to move more logic into ScanRange and
to avoid holding RequestContext::lock_ for longer than necessary.

Testing:
Added query test that reproduces the issue.

Added a unit test and a stress option that reproduces the problem in a
targeted way.

Ran disk-io-mgr-stress test for a few hours. Ran it under TSAN and
inspected output to make sure there were no non-benign data races.

Change-Id: I87182b6bd51b5fb0b923e7e4c8d08a44e7617db2
Reviewed-on: http://gerrit.cloudera.org:8080/9680
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2018-04-28 23:41:39 +00:00
Tim Armstrong
fb5dc9eb48 IMPALA-4835: switch I/O buffers to buffer pool
This is the following squashed patches that were reverted.

I will fix the known issues with some follow-on patches.

======================================================================
IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation

In preparation for switching the I/O mgr to the buffer pool, this
removes and cleans up a lot of code so that the switchover patch starts
from a cleaner slate.

* Remove the free buffer cache (which will be replaced by buffer pool's
  own caching).
* Make memory limit exceeded error checking synchronous (in anticipation
  of having to propagate buffer pool errors synchronously).
* Simplify error propagation - remove the (ineffectual) code that
  enqueued BufferDescriptors containing error statuses.
* Document locking scheme better in a few places, make it part of the
  function signature when it seemed reasonable.
* Move ReturnBuffer() to ScanRange, because it is intrinsically
  connected with the lifecycle of a scan range.
* Separate external ReturnBuffer() and internal CleanUpBuffer()
  interfaces - previously callers of ReturnBuffer() were fudging
  the num_buffers_in_reader accounting to make the external interface work.
* Eliminate redundant state in ScanRange: 'eosr_returned_' and
  'is_cancelled_'.
* Clarify the logic around calling Close() for the last
  BufferDescriptor.
  -> There appeared to be an implicit assumption that buffers would be
     freed in the order they were returned from the scan range, so that
     the "eos" buffer was returned last. Instead just count the number
     of outstanding buffers to detect the last one.
  -> Touching the is_cancelled_ field without holding a lock was hard to
     reason about - violated locking rules and it was unclear that it
     was race-free.
* Remove DiskIoMgr::Read() to simplify the interface. It is trivial to
  inline at the callsites.

This will probably regress performance somewhat because of the cache
removal, so my plan is to merge it around the same time as switching
the I/O mgr to allocate from the buffer pool. I'm keeping the patches
separate to make reviewing easier.

Testing:
* Ran exhaustive tests
* Ran the disk-io-mgr-stress-test overnight

======================================================================
IMPALA-4835: Part 2: Allocate scan range buffers upfront

This change is a step towards reserving memory for buffers from the
buffer pool and constraining per-scanner memory requirements. This
change restructures the DiskIoMgr code so that each ScanRange operates
with a fixed set of buffers that are allocated upfront and recycled as
the I/O mgr works through the ScanRange.

One major change is that ScanRanges get blocked when a buffer is not
available and get unblocked when a client returns a buffer via
ReturnBuffer(). I was able to remove the logic to maintain the
blocked_ranges_ list by instead adding a separate set with all ranges
that are active.

There is also some miscellaneous cleanup included - e.g. reducing the
amount of code devoted to maintaining counters and metrics.

One tricky part of the existing code was that it called
IssueInitialRanges() with empty lists of files and depended on
DiskIoMgr::AddScanRanges() to not check for cancellation in that case.
See IMPALA-6564/IMPALA-6588. I changed the logic to not try to issue
ranges for empty lists of files.

I plan to merge this along with the actual buffer pool switch, but
separated it out to allow review of the DiskIoMgr changes separate from
other aspects of the buffer pool switchover.

Testing:
* Ran core and exhaustive tests.

======================================================================
IMPALA-4835: Part 3: switch I/O buffers to buffer pool

This is the final patch to switch the Disk I/O manager to allocate all
buffer from the buffer pool and to reserve the buffers required for
a query upfront.

* The planner reserves enough memory to run a single scanner per
  scan node.
* The multi-threaded scan node must increase reservation before
  spinning up more threads.
* The scanner implementations must be careful to stay within their
  assigned reservation.

The row-oriented scanners were most straightforward, since they only
have a single scan range active at a time. A single I/O buffer is
sufficient to scan the whole file but more I/O buffers can improve I/O
throughput.

Parquet is more complex because it issues a scan range per column and
the sizes of the columns on disk are not known during planning. To
deal with this, the reservation in the frontend is based on a
heuristic involving the file size and # columns. The Parquet scanner
can then divvy up reservation to columns based on the size of column
data on disk.

I adjusted how the 'mem_limit' is divided between buffer pool and non
buffer pool memory for low mem_limits to account for the increase in
buffer pool memory.

Testing:
* Added more planner tests to cover reservation calcs for scan node.
* Test scanners for all file formats with the reservation denial debug
  action, to test behaviour when the scanners hit reservation limits.
* Updated memory and buffer pool limits for tests.
* Added unit tests for dividing reservation between columns in parquet,
  since the algorithm is non-trivial.

Perf:
I ran TPC-H and targeted perf locally comparing with master. Both
showed small improvements of a few percent and no regressions of
note. Cluster perf tests showed no significant change.

Change-Id: I3ef471dc0746f0ab93b572c34024fc7343161f00
Reviewed-on: http://gerrit.cloudera.org:8080/9679
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2018-04-28 23:41:39 +00:00
Taras Bobrovytsky
d0f838b66a IMPALA-6340,IMPALA-6518: Check that decimal types are compatible in FE
In this patch we implement strict decimal type checking in the FE in
various situations when DECIMAL_V2 is enabled. What is affected:
- Union. If we union two decimals and it is not possible to come up
  with a decimal that will be able to contain all the digits, an error
  is thrown. For example, the union(decimal(20, 10), decimal(20, 20))
  returns decimal(30, 20). However, for union(decimal(38, 0),
  decimal(38, 38)) the ideal return type would be decimal(76,38), but
  this is too large, so an error is thrown.
- Insert. If we are inserting a decimal value into a column where we are
  not guaranteed that all digits will fit, an error is thrown. For
  example, inserting a decimal(38,0) value into a decimal(38,38) column.
- Functions such as coalesce(). If we are unable to determine the output
  type that guarantees that all digits will fit from all the arguments,
  an error is thrown. For example,
  coalesce(decimal(38,38), decimal(38,0)) will throw an error.
- Hash Join. When joining on two decimals, if a type cannot be
  determined that both columns can be cast to, we throw an error.
  For example, join on decimal(38,0) and decimal(38,38) will result
  in an error.

To avoid these errors, you need to use CAST() on some of the decimals.
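
A minimal sketch of the new strictness and the workaround (DECIMAL_V2
enabled; values are illustrative):

    -- No decimal type can hold all digits of both operands, so this errors:
    SELECT CAST(1 AS DECIMAL(38,0)) AS c
    UNION SELECT CAST(0.5 AS DECIMAL(38,38));
    -- Explicitly casting to a common type resolves it:
    SELECT CAST(1 AS DECIMAL(38,1)) AS c
    UNION SELECT CAST(CAST(0.5 AS DECIMAL(38,38)) AS DECIMAL(38,1));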

In this patch we also change the output decimal calculation of decimal
round, truncate and related functions. If these functions are a no-op,
the resulting decimal type is the same as the input type.

Testing:
- Ran a core build which passed.

Change-Id: Id406f4189e01a909152985fabd5cca7a1527a568
Reviewed-on: http://gerrit.cloudera.org:8080/9930
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-04-28 03:33:02 +00:00