Commit Graph

12 Commits

Author SHA1 Message Date
Tim Armstrong
c4e7977f5e IMPALA-10351,IMPALA-9812: enable mt_dop for DML by default
This allows setting mt_dop for any query with any configuration.
Before this patch it was not supported for DML.

--unlock_mt_dop and --mt_dop_auto_fallback are now ignored.

Testing:
* Updated tests to reflect new behaviour.
* Removed irrelevant tests for fallback/validation.
* Ran exhaustive tests.

Change-Id: I66331481260fe4b69d9e95b0200029b14d230ade
Reviewed-on: http://gerrit.cloudera.org:8080/16775
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-25 03:14:37 +00:00
Joe McDonnell
9125de7ae3 IMPALA-9318: Add admission control setting to cap MT_DOP
This introduces the max-mt-dop setting for admission
control. If a statement runs with an MT_DOP setting that
exceeds the max-mt-dop, then the MT_DOP setting is
downgraded to the max-mt-dop value. If max-mt-dop is set
to a negative value, no limit is applied. max-mt-dop is
set via the llama-site.xml and can be set at the daemon
level or at the resource pool level. When there is no
max-mt-dop setting, it defaults to -1, so no limit is
applied. The max-mt-dop is evaluated once prior to query
planning. The MT_DOP settings for queries past planning
are not reevaluated if the policy changes.

If a statement is downgraded, it's runtime profile contains
a message explaining the downgrade:
MT_DOP limited by admission control: Requested MT_DOP=9 reduced to MT_DOP=4.

Testing:
 - Added custom cluster test with various max-mt-dop settings
 - Ran core tests

Change-Id: I3affb127a5dca517591323f2b1c880aa4b38badd
Reviewed-on: http://gerrit.cloudera.org:8080/16020
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-06-09 16:26:23 +00:00
Adam Tamas
7295edcc26 IMPALA-9680: Fixed compressed inserts failing
Modified the insert testfiles to get which database they need to
use for 'CREATE TABLE LIKE' dynamically.

Tests:
Did targeted exhaustive testruns in test_insert.py and
test_mt_dop.py and did a full exhaustive testrun.

Change-Id: Ib3c7ba02190f57a7ed40311c95a3dd9eca9b474d
Reviewed-on: http://gerrit.cloudera.org:8080/15816
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2020-05-11 19:32:08 +00:00
Adam Tamas
c32849a391 IMPALA-8980: Remove functional*.alltypesinsert from EE tests
-Modified the ‘test_insert.py’ so the tests can run parallel.
  -Every test will create its own temporary tables for insert testing.
-Swapped out the  SETUP tags to Truncate table QUERY statement.
  -Becouse the SETUP tag is not used anymore, the correspondig
  code was removed.
-A test query in ‘insert.test’. The test was incorrect so modified
to test for the right behavior.

Testing:
-tests/run-tests.py query_test/test_insert.py
-impala-py.test tests/query_test/test_insert.py
-the same for test_insert_permutation.py and test_load.py

Change-Id: I257e936868917a2fcc6c030f6c855b247e8a0eea
Reviewed-on: http://gerrit.cloudera.org:8080/15529
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-14 12:18:21 +00:00
Tim Armstrong
ab7e209d1b IMPALA-9099: allow mt_dop for joins without feature flag
This allows running *any* read-only query with mt_dop > 0.
Before this patch, no joins were allowed with mt_dop > 0.

Previous patches, particularly IMPALA-9156, added significantly
more code coverage for multithreading+joins. It should be safe to
allow enabling on a query-by-query basis. Many improvements are
still planned - see IMPALA-3902. So behaviour and performance
characteristics of mt_dop > 0 with more complex plans and joins
will continue to change.

Testing:
Updated the mt_dop validation tests and remove redundant planner test
that doesn't provide much additional coverage of the validation
support.

Ran exhaustive tests.

Change-Id: I9c6566abb239db0e775f2beaa25a62c36313cd6f
Reviewed-on: http://gerrit.cloudera.org:8080/15545
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-03-31 20:45:59 +00:00
Tim Armstrong
0bb056e525 IMPALA-4224: execute separate join builds fragments
This enables parallel plans with the join build in a
separate fragment and fixes all of the ensuing fallout.
After this change, mt_dop plans with joins have separate
build fragments. There is still a 1:1 relationship between
join nodes and builders, so the builders are only accessed
by the join node's thread after it is handed off. This lets
us defer the work required to make PhjBuilder and NljBuilder
safe to be shared between nodes.

Planner changes:
* Combined the parallel and distributed planning code paths.
* Misc fixes to generate reasonable thrift structures in the
  query exec requests, i.e. containing the right nodes.
* Fixes to resource calculations for the separate build plans.
** Calculate separate join/build resource consumption.
** Simplified the resource estimation by calculating resource
   consumption for each fragment separately, and assuming that
   all fragments hit their peak resource consumption at the
   same time. IMPALA-9255 is the follow-on to make the resource
   estimation more accurate.

Scheduler changes:
* Various fixes to handle multiple TPlanExecInfos correctly,
  which are generated by the planner for the different cohorts.
* Add logic to colocate build fragments with parent fragments.

Runtime filter changes:
* Build sinks now produce runtime filters, which required
  planner and coordinator fixes to handle.

DataSink changes:
* Close the input plan tree before calling FlushFinal() to release
  resources. This depends on Send() not holding onto references
  to input batches, which was true except for NljBuilder. This
  invariant is documented.

Join builder changes:
* Add a common base class for PhjBuilder and NljBuilder with
  functions to handle synchronisation with the join node.
* Close plan tree earlier in FragmentInstanceState::Exec()
  so that peak resource requirements are lower.
* The NLJ always copies input batches, so that it can close
  its input tree.

JoinNode changes:
* Join node blocks waiting for build-side to be ready,
  then eventually signals that it's done, allowing the builder
  to be cleaned up.
* NLJ and PHJ nodes handle both the integrated builder and
  the external builder. There is a 1:1 relationship between
  the node and the builder, so we don't deal with thread safety
  yet.
* Buffer reservations are transferred between the builder and join
  node when running with the separate builder. This is not really
  necessary right now, since it is all single-threaded, but will
  be important for the shared broadcast.
  - The builder transfers memory for probe buffers to the join node
    at the end of each build phase.
  - At end of each probe phase, reservation needs to be handed back
    to builder (or released).

ExecSummary changes:
* The summary logic was modified to handle connecting fragments
  via join builds. The logic is an extension of what was used
  for exchanges.

Testing:
* Enable --unlock_mt_dop for end-to-end tests
* Migrate some tests to run as part of end-to-end tests instead of
  custom cluster.
* Add mt_dop dimension to various end-to-end tests to provide
  coverage of join queries, spill-to-disk and cancellation.
* Ran a single node TPC-H and TPC-DS stress test with mt_dop=0
  and mt_dop=4.

Perf:
* Ran TPC-H scale factor 30 locally with mt_dop=0. No significant
  change.

Change-Id: I4403c8e62d9c13854e7830602ee613f8efc80c58
Reviewed-on: http://gerrit.cloudera.org:8080/14859
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-20 01:51:54 +00:00
Tim Armstrong
f38da0df8e IMPALA-4400: aggregate runtime filters locally
Move RuntimeFilterBank to QueryState(). Implement fine-grained
locking for each filter to mitigate any increased lock
contention from the change.

Make RuntimeFilterBank handle multiple producers of the
same filter, e.g. multiple instances of a partitioned
join. It computes the expected number of filters upfront
then sends the filter to the coordinator once all the
local instances have been merged together. The merging
can be done in parallel locally to improve latency of
filter propagation.

Add Or() methods to MinMaxFilter and BloomFilter, since
we now need to merge those, not just the thrift versions.

Update coordinator filter routing to expect only one
instance of a filter from each producer backend and
to only send one instance to each consumer backend
(instead of sending one per fragment).

Update memory reservations and estimates to be lower
to account for sharing of filters between fragment
instances. mt_dop plans are modified to show these
shared and non-shared resources separately.

Enable waiting for runtime filters for kudu scanner
with mt_dop.

Made min/max filters const-correct.

Testing
* Added unit tests for Or() methods.
* Added some additional e2e test coverage for mt_dop queries
* Updated planner tests with new estimates and reservation.
* Ran a single node 3-impalad stress test with TPC-H kudu and
  TPC-DS parquet.
* Ran exhaustive tests.
* Ran core tests with ASAN.

Perf
* Did a single-node perf run on TPC-H with default settings. No perf change.
* Single-node perf run with mt_dop=8 showed significant speedups:

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 10.14   | -7.29%     | 5.05       | -11.68%        |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+---------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval    |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+---------+
| TPCH(30) | TPCH-Q7  | parquet / none / none | 38.87  | 38.44       |   +1.13%   |   7.17%   | * 10.92% *     | 20    |   +0.72%       | 0.72    | 0.39    |
| TPCH(30) | TPCH-Q1  | parquet / none / none | 4.28   | 4.26        |   +0.50%   |   1.92%   |   1.09%        | 20    |   +0.03%       | 0.31    | 1.01    |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 2.32   | 2.32        |   +0.05%   |   2.01%   |   1.89%        | 20    |   -0.03%       | -0.36   | 0.08    |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 3.73   | 3.75        |   -0.42%   |   0.84%   |   1.05%        | 20    |   -0.25%       | -0.77   | -1.40   |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 9.80   | 9.83        |   -0.38%   |   0.51%   |   0.80%        | 20    |   -0.32%       | -1.30   | -1.81   |
| TPCH(30) | TPCH-Q2  | parquet / none / none | 1.98   | 2.00        |   -1.32%   |   1.74%   |   2.81%        | 20    |   -0.64%       | -1.71   | -1.79   |
| TPCH(30) | TPCH-Q6  | parquet / none / none | 1.22   | 1.25        |   -2.14%   |   2.66%   |   4.15%        | 20    |   -0.96%       | -2.00   | -1.95   |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 5.13   | 5.22        |   -1.65%   |   1.20%   |   1.40%        | 20    |   -1.76%       | -3.34   | -4.02   |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 2.46   | 2.56        |   -4.13%   |   2.49%   |   1.99%        | 20    |   -4.31%       | -4.04   | -5.94   |
| TPCH(30) | TPCH-Q9  | parquet / none / none | 81.63  | 85.07       |   -4.05%   |   4.94%   |   3.06%        | 20    |   -5.46%       | -3.28   | -3.21   |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 5.07   | 5.50        | I -7.92%   |   0.96%   |   1.33%        | 20    | I -8.51%       | -5.27   | -22.14  |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 24.00  | 26.24       | I -8.57%   |   0.46%   |   0.38%        | 20    | I -9.34%       | -5.27   | -67.47  |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 8.66   | 9.50        | I -8.86%   |   0.62%   |   0.44%        | 20    | I -9.75%       | -5.27   | -55.17  |
| TPCH(30) | TPCH-Q3  | parquet / none / none | 6.01   | 6.70        | I -10.19%  |   1.01%   |   0.90%        | 20    | I -11.25%      | -5.27   | -35.76  |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 2.98   | 3.39        | I -12.23%  |   1.48%   |   1.48%        | 20    | I -13.56%      | -5.27   | -27.75  |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.69   | 2.00        | I -15.55%  |   1.63%   |   1.47%        | 20    | I -18.09%      | -5.27   | -34.60  |
| TPCH(30) | TPCH-Q4  | parquet / none / none | 2.42   | 2.87        | I -15.69%  |   1.48%   |   1.26%        | 20    | I -18.61%      | -5.27   | -39.50  |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 4.64   | 6.27        | I -26.02%  |   1.35%   |   0.73%        | 20    | I -35.37%      | -5.27   | -94.07  |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 3.19   | 4.37        | I -27.01%  |   1.54%   |   0.99%        | 20    | I -36.85%      | -5.27   | -80.74  |
| TPCH(30) | TPCH-Q5  | parquet / none / none | 4.57   | 6.39        | I -28.36%  |   1.04%   |   0.75%        | 20    | I -39.56%      | -5.27   | -120.02 |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 3.15   | 4.71        | I -33.06%  |   1.59%   |   1.31%        | 20    | I -49.43%      | -5.27   | -87.64  |
| TPCH(30) | TPCH-Q8  | parquet / none / none | 5.25   | 7.95        | I -33.95%  |   0.95%   |   0.53%        | 20    | I -51.11%      | -5.27   | -185.02 |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+---------+

Change-Id: Iabeeab5eec869ff2197250ad41c1eb5551704acc
Reviewed-on: http://gerrit.cloudera.org:8080/14538
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-01-29 00:58:24 +00:00
Tim Armstrong
023e92f5e7 IMPALA-9302: disable ineffective filters for mt_dop > 0
The optimisation of disabling ineffective row-level
runtime filters was not implemented in the MT scan
code paths, because the ProcessSplit() functions,
where it was implemented, are not used for mt_dop > 0.

This change adds it to HdfsScanner::GetNext(), which
is used for mt_dop > 0 but not mt_dop = 0.

Testing:
Run existing runtime row filters test with mt_dop.
This reproduced the issue before I fixed it.

Change-Id: I8a55a9d4ac9e0d93cb3675dd2d5da086cb7d941d
Reviewed-on: http://gerrit.cloudera.org:8080/15065
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-01-18 02:05:54 +00:00
Sahil Takiar
e8fda1f224 IMPALA-9117, IMPALA-7726: Fixed a few unit tests for ABFS
This test makes the following changes / fixes when running Impala tests
on ABFS:
* Skips some tests in test_lineage.py that don't work on ABFS / ADLS
(they were already skipped for S3)
* Skips some tests in test_mt_dop.py; the test creates a directory that
ends with a period (and ABFS does not support writing files or
directories that end with a period)
* Removes the ABFS skip flag SkipIfABFS.trash (IMPALA-7726: Drop with
purge tests fail against ABFS due to trash misbehavior"); I removed
these flags and looped the tests overnight with no failures, so it is
likely whatever bug was causing this has now been fixed
* Now that HADOOP-15860 has been resolved, and the agreed upon behavior
for ABFS is that it will fail if a client tries to write a file /
directory that ends with a period, I added a new entry to the SkipIfABFS
class called file_or_folder_name_ends_with_period and applied it where
necessary

Testing:
* Ran core tests on ABFS

Change-Id: I18ae5b0f7de6aa7628a1efd780ff30a0cc3c5285
Reviewed-on: http://gerrit.cloudera.org:8080/14636
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-11-06 05:44:01 +00:00
Tim Armstrong
e1f9bd6349 IMPALA-9087: disable mt_dop_fallback for non-HDFS
The test was tuned for the 3 node minicluster and failed with
HDFS erasure coding.

Change-Id: I13c31b843bc5a8d5624a8985c3e0c7bcc2b936e6
Reviewed-on: http://gerrit.cloudera.org:8080/14543
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2019-10-24 23:49:06 +00:00
Tim Armstrong
96143cfcf1 IMPALA-8997: auto fallback to mt_dop=0
Add a temporary --mt_dop_auto_fallback to allow a graceful transition to
using mt_dop for workloads. When this flag is set, DML queries and joins
that would otherwise fail with an error when run with mt_dop > 0 fall
back to running with mt_dop = 0. This means that a user can set mt_dop
for their queries and it will only take effect when supported.

The behaviour generally does not change when this flag is not set,
with a couple of exceptions:
* I made mt_dop automatic for compute stats on all file formats
* mt_dop is allowed for single node plans with inserts. The
  quirky validatePlan() logic previously disallowed this but
  allowed joins in single node plans.

The checks added by this patch can be removed safely once mt_dop is
supported by default for all queries.

This includes some cleanup:
* isDmlStmt() was stale and incorrectly implemented.
* Various TreeNode methods did not return instances of subclasses of
  the requested class, which was strange. This fix is required to
  make 'contains(JoinNode.class)' work correctly. I checked the
  callsites of the fixed functions and none of them would be affected
  by this change because they specified a terminal class without
  any subclasses.
  I didn't actually use this fix in the end (I had to write a custom
  tree traversal in hasUnsupportedMtDopJoin()), but figured I would
  leave the improvement in here.

Testing:
Add some basic functional tests ensuring that the fallback takes
effect.

Run basic join and insert tests with this flag enabled.

Change-Id: Ie0d73d8744059874293697c8e104891a10dba04d
Reviewed-on: http://gerrit.cloudera.org:8080/14344
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-10-22 00:31:01 +00:00
Tim Armstrong
9090fc239b IMPALA-8097: mt_dop for all queries via hidden flag
--unlock_mt_dop=true unlocks mt_dop for all queries
including joins and inserts.

This disables the parallel plans with separate join builds
when running standalone, because these are not executable
until IMPALA-4224 is implemented. Inserts work without
modification - they were disabled because of lack of
testing and the possibility for generating many small
files with unpartitioned inserts - see IMPALA-8125.

Testing:
Add custom cluster test that exercise joins, runtime filters
and inserts as a sanity check for the flag.

Ran exhaustive build.

Manually ran TPC-H and TPC-DS tests against a minicluster
with mt_dop = 4.

Change-Id: I72f0b02a005e8bf22fd17b8fb5aabf8c0d9b6b15
Reviewed-on: http://gerrit.cloudera.org:8080/12257
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-03-19 01:20:13 +00:00