impala

mirror of https://github.com/apache/impala.git synced 2025-12-20 10:29:58 -05:00

Author	SHA1	Message	Date
Tim Armstrong	c4e7977f5e	IMPALA-10351,IMPALA-9812: enable mt_dop for DML by default This allows setting mt_dop for any query with any configuration. Before this patch it was not supported for DML. --unlock_mt_dop and --mt_dop_auto_fallback are now ignored. Testing: * Updated tests to reflect new behaviour. * Removed irrelevant tests for fallback/validation. * Ran exhaustive tests. Change-Id: I66331481260fe4b69d9e95b0200029b14d230ade Reviewed-on: http://gerrit.cloudera.org:8080/16775 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-11-25 03:14:37 +00:00
Joe McDonnell	9125de7ae3	IMPALA-9318: Add admission control setting to cap MT_DOP This introduces the max-mt-dop setting for admission control. If a statement runs with an MT_DOP setting that exceeds the max-mt-dop, then the MT_DOP setting is downgraded to the max-mt-dop value. If max-mt-dop is set to a negative value, no limit is applied. max-mt-dop is set via the llama-site.xml and can be set at the daemon level or at the resource pool level. When there is no max-mt-dop setting, it defaults to -1, so no limit is applied. The max-mt-dop is evaluated once prior to query planning. The MT_DOP settings for queries past planning are not reevaluated if the policy changes. If a statement is downgraded, its runtime profile contains a message explaining the downgrade: MT_DOP limited by admission control: Requested MT_DOP=9 reduced to MT_DOP=4. Testing: - Added custom cluster test with various max-mt-dop settings - Ran core tests Change-Id: I3affb127a5dca517591323f2b1c880aa4b38badd Reviewed-on: http://gerrit.cloudera.org:8080/16020 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-06-09 16:26:23 +00:00
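The capping rule described in IMPALA-9318 can be sketched in a few lines. This is only an illustration of the rule as the commit message states it, not Impala's implementation; the function name `apply_max_mt_dop` and its return shape are hypothetical.

```python
def apply_max_mt_dop(requested_mt_dop, max_mt_dop=-1):
    """Hypothetical helper: return (effective_mt_dop, profile_message).

    max_mt_dop defaults to -1, which per the commit message means
    "no limit is applied". profile_message is None when nothing changed.
    """
    # A negative cap disables the limit; a request within the cap is kept.
    if max_mt_dop < 0 or requested_mt_dop <= max_mt_dop:
        return requested_mt_dop, None
    # Otherwise downgrade to the cap and record why, using the message
    # format quoted in the commit message.
    message = (
        "MT_DOP limited by admission control: Requested MT_DOP=%d "
        "reduced to MT_DOP=%d." % (requested_mt_dop, max_mt_dop)
    )
    return max_mt_dop, message
```

Calling `apply_max_mt_dop(9, 4)` yields the downgraded value 4 together with the message quoted in the commit, while `apply_max_mt_dop(9)` leaves the request untouched.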
Adam Tamas	7295edcc26	IMPALA-9680: Fixed compressed inserts failing Modified the insert testfiles to get which database they need to use for 'CREATE TABLE LIKE' dynamically. Tests: Did targeted exhaustive testruns in test_insert.py and test_mt_dop.py and did a full exhaustive testrun. Change-Id: Ib3c7ba02190f57a7ed40311c95a3dd9eca9b474d Reviewed-on: http://gerrit.cloudera.org:8080/15816 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2020-05-11 19:32:08 +00:00
Adam Tamas	c32849a391	IMPALA-8980: Remove functional*.alltypesinsert from EE tests -Modified ‘test_insert.py’ so the tests can run in parallel. -Every test will create its own temporary tables for insert testing. -Replaced the SETUP tags with Truncate table QUERY statements. -Because the SETUP tag is not used anymore, the corresponding code was removed. -A test query in ‘insert.test’ was incorrect, so it was modified to test for the right behavior. Testing: -tests/run-tests.py query_test/test_insert.py -impala-py.test tests/query_test/test_insert.py -the same for test_insert_permutation.py and test_load.py Change-Id: I257e936868917a2fcc6c030f6c855b247e8a0eea Reviewed-on: http://gerrit.cloudera.org:8080/15529 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-04-14 12:18:21 +00:00
Tim Armstrong	ab7e209d1b	IMPALA-9099: allow mt_dop for joins without feature flag This allows running any read-only query with mt_dop > 0. Before this patch, no joins were allowed with mt_dop > 0. Previous patches, particularly IMPALA-9156, added significantly more code coverage for multithreading+joins. It should be safe to allow enabling on a query-by-query basis. Many improvements are still planned - see IMPALA-3902. So behaviour and performance characteristics of mt_dop > 0 with more complex plans and joins will continue to change. Testing: Updated the mt_dop validation tests and removed a redundant planner test that doesn't provide much additional coverage of the validation support. Ran exhaustive tests. Change-Id: I9c6566abb239db0e775f2beaa25a62c36313cd6f Reviewed-on: http://gerrit.cloudera.org:8080/15545 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-03-31 20:45:59 +00:00
Tim Armstrong	0bb056e525	IMPALA-4224: execute separate join builds fragments This enables parallel plans with the join build in a separate fragment and fixes all of the ensuing fallout. After this change, mt_dop plans with joins have separate build fragments. There is still a 1:1 relationship between join nodes and builders, so the builders are only accessed by the join node's thread after it is handed off. This lets us defer the work required to make PhjBuilder and NljBuilder safe to be shared between nodes. Planner changes: * Combined the parallel and distributed planning code paths. * Misc fixes to generate reasonable thrift structures in the query exec requests, i.e. containing the right nodes. * Fixes to resource calculations for the separate build plans. Calculate separate join/build resource consumption. Simplified the resource estimation by calculating resource consumption for each fragment separately, and assuming that all fragments hit their peak resource consumption at the same time. IMPALA-9255 is the follow-on to make the resource estimation more accurate. Scheduler changes: * Various fixes to handle multiple TPlanExecInfos correctly, which are generated by the planner for the different cohorts. * Add logic to colocate build fragments with parent fragments. Runtime filter changes: * Build sinks now produce runtime filters, which required planner and coordinator fixes to handle. DataSink changes: * Close the input plan tree before calling FlushFinal() to release resources. This depends on Send() not holding onto references to input batches, which was true except for NljBuilder. This invariant is documented. Join builder changes: * Add a common base class for PhjBuilder and NljBuilder with functions to handle synchronisation with the join node. * Close plan tree earlier in FragmentInstanceState::Exec() so that peak resource requirements are lower. * The NLJ always copies input batches, so that it can close its input tree. 
JoinNode changes: * Join node blocks waiting for build-side to be ready, then eventually signals that it's done, allowing the builder to be cleaned up. * NLJ and PHJ nodes handle both the integrated builder and the external builder. There is a 1:1 relationship between the node and the builder, so we don't deal with thread safety yet. * Buffer reservations are transferred between the builder and join node when running with the separate builder. This is not really necessary right now, since it is all single-threaded, but will be important for the shared broadcast. - The builder transfers memory for probe buffers to the join node at the end of each build phase. - At end of each probe phase, reservation needs to be handed back to builder (or released). ExecSummary changes: * The summary logic was modified to handle connecting fragments via join builds. The logic is an extension of what was used for exchanges. Testing: * Enable --unlock_mt_dop for end-to-end tests * Migrate some tests to run as part of end-to-end tests instead of custom cluster. * Add mt_dop dimension to various end-to-end tests to provide coverage of join queries, spill-to-disk and cancellation. * Ran a single node TPC-H and TPC-DS stress test with mt_dop=0 and mt_dop=4. Perf: * Ran TPC-H scale factor 30 locally with mt_dop=0. No significant change. Change-Id: I4403c8e62d9c13854e7830602ee613f8efc80c58 Reviewed-on: http://gerrit.cloudera.org:8080/14859 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-02-20 01:51:54 +00:00
Tim Armstrong	f38da0df8e	IMPALA-4400: aggregate runtime filters locally Move RuntimeFilterBank to QueryState(). Implement fine-grained locking for each filter to mitigate any increased lock contention from the change. Make RuntimeFilterBank handle multiple producers of the same filter, e.g. multiple instances of a partitioned join. It computes the expected number of filters upfront then sends the filter to the coordinator once all the local instances have been merged together. The merging can be done in parallel locally to improve latency of filter propagation. Add Or() methods to MinMaxFilter and BloomFilter, since we now need to merge those, not just the thrift versions. Update coordinator filter routing to expect only one instance of a filter from each producer backend and to only send one instance to each consumer backend (instead of sending one per fragment). Update memory reservations and estimates to be lower to account for sharing of filters between fragment instances. mt_dop plans are modified to show these shared and non-shared resources separately. Enable waiting for runtime filters for kudu scanner with mt_dop. Made min/max filters const-correct. Testing * Added unit tests for Or() methods. * Added some additional e2e test coverage for mt_dop queries * Updated planner tests with new estimates and reservation. * Ran a single node 3-impalad stress test with TPC-H kudu and TPC-DS parquet. * Ran exhaustive tests. * Ran core tests with ASAN. Perf * Did a single-node perf run on TPC-H with default settings. No perf change. 
* Single-node perf run with mt_dop=8 showed significant speedups:
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 10.14   | -7.29%     | 5.05       | -11.68%        |
+----------+-----------------------+---------+------------+------------+----------------+
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+---------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval    |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+---------+
| TPCH(30) | TPCH-Q7  | parquet / none / none | 38.87  | 38.44       | +1.13%     | 7.17%     | * 10.92% *     | 20    | +0.72%         | 0.72    | 0.39    |
| TPCH(30) | TPCH-Q1  | parquet / none / none | 4.28   | 4.26        | +0.50%     | 1.92%     | 1.09%          | 20    | +0.03%         | 0.31    | 1.01    |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 2.32   | 2.32        | +0.05%     | 2.01%     | 1.89%          | 20    | -0.03%         | -0.36   | 0.08    |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 3.73   | 3.75        | -0.42%     | 0.84%     | 1.05%          | 20    | -0.25%         | -0.77   | -1.40   |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 9.80   | 9.83        | -0.38%     | 0.51%     | 0.80%          | 20    | -0.32%         | -1.30   | -1.81   |
| TPCH(30) | TPCH-Q2  | parquet / none / none | 1.98   | 2.00        | -1.32%     | 1.74%     | 2.81%          | 20    | -0.64%         | -1.71   | -1.79   |
| TPCH(30) | TPCH-Q6  | parquet / none / none | 1.22   | 1.25        | -2.14%     | 2.66%     | 4.15%          | 20    | -0.96%         | -2.00   | -1.95   |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 5.13   | 5.22        | -1.65%     | 1.20%     | 1.40%          | 20    | -1.76%         | -3.34   | -4.02   |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 2.46   | 2.56        | -4.13%     | 2.49%     | 1.99%          | 20    | -4.31%         | -4.04   | -5.94   |
| TPCH(30) | TPCH-Q9  | parquet / none / none | 81.63  | 85.07       | -4.05%     | 4.94%     | 3.06%          | 20    | -5.46%         | -3.28   | -3.21   |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 5.07   | 5.50        | I -7.92%   | 0.96%     | 1.33%          | 20    | I -8.51%       | -5.27   | -22.14  |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 24.00  | 26.24       | I -8.57%   | 0.46%     | 0.38%          | 20    | I -9.34%       | -5.27   | -67.47  |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 8.66   | 9.50        | I -8.86%   | 0.62%     | 0.44%          | 20    | I -9.75%       | -5.27   | -55.17  |
| TPCH(30) | TPCH-Q3  | parquet / none / none | 6.01   | 6.70        | I -10.19%  | 1.01%     | 0.90%          | 20    | I -11.25%      | -5.27   | -35.76  |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 2.98   | 3.39        | I -12.23%  | 1.48%     | 1.48%          | 20    | I -13.56%      | -5.27   | -27.75  |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.69   | 2.00        | I -15.55%  | 1.63%     | 1.47%          | 20    | I -18.09%      | -5.27   | -34.60  |
| TPCH(30) | TPCH-Q4  | parquet / none / none | 2.42   | 2.87        | I -15.69%  | 1.48%     | 1.26%          | 20    | I -18.61%      | -5.27   | -39.50  |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 4.64   | 6.27        | I -26.02%  | 1.35%     | 0.73%          | 20    | I -35.37%      | -5.27   | -94.07  |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 3.19   | 4.37        | I -27.01%  | 1.54%     | 0.99%          | 20    | I -36.85%      | -5.27   | -80.74  |
| TPCH(30) | TPCH-Q5  | parquet / none / none | 4.57   | 6.39        | I -28.36%  | 1.04%     | 0.75%          | 20    | I -39.56%      | -5.27   | -120.02 |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 3.15   | 4.71        | I -33.06%  | 1.59%     | 1.31%          | 20    | I -49.43%      | -5.27   | -87.64  |
| TPCH(30) | TPCH-Q8  | parquet / none / none | 5.25   | 7.95        | I -33.95%  | 0.95%     | 0.53%          | 20    | I -51.11%      | -5.27   | -185.02 |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+---------+
Change-Id: Iabeeab5eec869ff2197250ad41c1eb5551704acc Reviewed-on: http://gerrit.cloudera.org:8080/14538 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-01-29 00:58:24 +00:00
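The local aggregation described in IMPALA-4400 (merge every local producer's filter, then send a single merged filter to the coordinator) can be illustrated with a toy model. Everything below is an assumption made for illustration: the class name `LocalFilterAggregator`, the use of a Python integer as a stand-in for a Bloom filter's bit array, and the `send_to_coordinator` callback are hypothetical and do not reflect Impala's C++ RuntimeFilterBank.

```python
import threading


def min_max_or(a, b):
    """Merge two (min, max) filters into one that covers both ranges."""
    return (min(a[0], b[0]), max(a[1], b[1]))


class LocalFilterAggregator:
    """Toy per-filter aggregator; one instance (and one lock) per filter."""

    def __init__(self, expected_producers, send_to_coordinator):
        self._lock = threading.Lock()       # fine-grained: per filter
        self._pending = expected_producers  # computed up front
        self._bits = 0                      # int used as a bit array
        self._send = send_to_coordinator

    def update(self, bits):
        """Or() one producer's filter in; send once when all have arrived."""
        with self._lock:
            if self._pending == 0:
                raise RuntimeError("more updates than expected producers")
            self._bits |= bits
            self._pending -= 1
            done = self._pending == 0
            merged = self._bits
        if done:
            # Sent outside the lock, exactly once per filter per backend.
            self._send(merged)
```

With three expected producers, nothing is sent after the first two `update()` calls; the third call triggers a single send of the bitwise OR of all three inputs.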
Tim Armstrong	023e92f5e7	IMPALA-9302: disable ineffective filters for mt_dop > 0 The optimisation of disabling ineffective row-level runtime filters was not implemented in the MT scan code paths, because the ProcessSplit() functions, where it was implemented, are not used for mt_dop > 0. This change adds it to HdfsScanner::GetNext(), which is used for mt_dop > 0 but not mt_dop = 0. Testing: Run existing runtime row filters test with mt_dop. This reproduced the issue before I fixed it. Change-Id: I8a55a9d4ac9e0d93cb3675dd2d5da086cb7d941d Reviewed-on: http://gerrit.cloudera.org:8080/15065 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-01-18 02:05:54 +00:00
Sahil Takiar	e8fda1f224	IMPALA-9117, IMPALA-7726: Fixed a few unit tests for ABFS This patch makes the following changes / fixes when running Impala tests on ABFS: * Skips some tests in test_lineage.py that don't work on ABFS / ADLS (they were already skipped for S3) * Skips some tests in test_mt_dop.py; the test creates a directory that ends with a period (and ABFS does not support writing files or directories that end with a period) * Removes the ABFS skip flag SkipIfABFS.trash ("IMPALA-7726: Drop with purge tests fail against ABFS due to trash misbehavior"); I removed these flags and looped the tests overnight with no failures, so it is likely whatever bug was causing this has now been fixed * Now that HADOOP-15860 has been resolved, and the agreed upon behavior for ABFS is that it will fail if a client tries to write a file / directory that ends with a period, I added a new entry to the SkipIfABFS class called file_or_folder_name_ends_with_period and applied it where necessary Testing: * Ran core tests on ABFS Change-Id: I18ae5b0f7de6aa7628a1efd780ff30a0cc3c5285 Reviewed-on: http://gerrit.cloudera.org:8080/14636 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-11-06 05:44:01 +00:00
Tim Armstrong	e1f9bd6349	IMPALA-9087: disable mt_dop_fallback for non-HDFS The test was tuned for the 3 node minicluster and failed with HDFS erasure coding. Change-Id: I13c31b843bc5a8d5624a8985c3e0c7bcc2b936e6 Reviewed-on: http://gerrit.cloudera.org:8080/14543 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Tested-by: Tim Armstrong <tarmstrong@cloudera.com>	2019-10-24 23:49:06 +00:00
Tim Armstrong	96143cfcf1	IMPALA-8997: auto fallback to mt_dop=0 Add a temporary --mt_dop_auto_fallback to allow a graceful transition to using mt_dop for workloads. When this flag is set, DML queries and joins that would otherwise fail with an error when run with mt_dop > 0 fall back to running with mt_dop = 0. This means that a user can set mt_dop for their queries and it will only take effect when supported. The behaviour generally does not change when this flag is not set, with a couple of exceptions: * I made mt_dop automatic for compute stats on all file formats * mt_dop is allowed for single node plans with inserts. The quirky validatePlan() logic previously disallowed this but allowed joins in single node plans. The checks added by this patch can be removed safely once mt_dop is supported by default for all queries. This includes some cleanup: * isDmlStmt() was stale and incorrectly implemented. * Various TreeNode methods did not return instances of subclasses of the requested class, which was strange. This fix is required to make 'contains(JoinNode.class)' work correctly. I checked the callsites of the fixed functions and none of them would be affected by this change because they specified a terminal class without any subclasses. I didn't actually use this fix in the end (I had to write a custom tree traversal in hasUnsupportedMtDopJoin()), but figured I would leave the improvement in here. Testing: Add some basic functional tests ensuring that the fallback takes effect. Run basic join and insert tests with this flag enabled. Change-Id: Ie0d73d8744059874293697c8e104891a10dba04d Reviewed-on: http://gerrit.cloudera.org:8080/14344 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-10-22 00:31:01 +00:00
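The fallback rule introduced by IMPALA-8997 can be summarised as a small decision function. This sketch models only the behaviour stated in that commit message at the time of that commit (later commits in this list remove the restriction); the function name `effective_mt_dop` and its parameters are hypothetical, and the special cases for compute stats and single-node inserts are deliberately left out.

```python
def effective_mt_dop(mt_dop, is_dml, has_join, auto_fallback):
    """Hypothetical model of --mt_dop_auto_fallback.

    Returns the mt_dop value the query would run with, or raises if the
    query is unsupported with mt_dop > 0 and fallback is disabled.
    """
    unsupported = mt_dop > 0 and (is_dml or has_join)
    if not unsupported:
        return mt_dop
    if auto_fallback:
        # Flag set: quietly run with mt_dop = 0 instead of failing.
        return 0
    # Flag not set: the query fails with an error.
    raise ValueError("mt_dop > 0 is not supported for this query")
```

A user who sets mt_dop=4 session-wide would therefore see it take effect for a plain scan, while a join or an insert silently runs with mt_dop=0 when the flag is enabled.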
Tim Armstrong	9090fc239b	IMPALA-8097: mt_dop for all queries via hidden flag --unlock_mt_dop=true unlocks mt_dop for all queries including joins and inserts. This disables the parallel plans with separate join builds when running standalone, because these are not executable until IMPALA-4224 is implemented. Inserts work without modification - they were disabled because of lack of testing and the possibility for generating many small files with unpartitioned inserts - see IMPALA-8125. Testing: Add custom cluster test that exercises joins, runtime filters and inserts as a sanity check for the flag. Ran exhaustive build. Manually ran TPC-H and TPC-DS tests against a minicluster with mt_dop = 4. Change-Id: I72f0b02a005e8bf22fd17b8fb5aabf8c0d9b6b15 Reviewed-on: http://gerrit.cloudera.org:8080/12257 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-03-19 01:20:13 +00:00

12 Commits