This allows setting mt_dop for any query with any configuration.
Before this patch it was not supported for DML.
--unlock_mt_dop and --mt_dop_auto_fallback are now ignored.
Testing:
* Updated tests to reflect new behaviour.
* Removed irrelevant tests for fallback/validation.
* Ran exhaustive tests.
Change-Id: I66331481260fe4b69d9e95b0200029b14d230ade
Reviewed-on: http://gerrit.cloudera.org:8080/16775
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This introduces the max-mt-dop setting for admission
control. If a statement runs with an MT_DOP setting that
exceeds the max-mt-dop, then the MT_DOP setting is
downgraded to the max-mt-dop value. If max-mt-dop is set
to a negative value, no limit is applied. max-mt-dop is
set via the llama-site.xml and can be set at the daemon
level or at the resource pool level. When there is no
max-mt-dop setting, it defaults to -1, so no limit is
applied. The max-mt-dop is evaluated once prior to query
planning. The MT_DOP settings for queries past planning
are not reevaluated if the policy changes.
If a statement is downgraded, its runtime profile contains
a message explaining the downgrade:
MT_DOP limited by admission control: Requested MT_DOP=9 reduced to MT_DOP=4.
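
The downgrade rule described above can be sketched as follows. This is a
minimal Python illustration, not the Impala implementation: the function
name apply_max_mt_dop and its signature are hypothetical, and only the
message text is taken from the example above.

```python
def apply_max_mt_dop(requested_mt_dop, max_mt_dop=-1):
    """Return (effective_mt_dop, profile_message_or_None).

    Hypothetical helper. A negative max_mt_dop means no limit is applied,
    which is also the default when max-mt-dop is not configured.
    """
    if max_mt_dop < 0 or requested_mt_dop <= max_mt_dop:
        # No limit configured, or the request is already within the limit.
        return requested_mt_dop, None
    message = (
        "MT_DOP limited by admission control: "
        "Requested MT_DOP={0} reduced to MT_DOP={1}.".format(
            requested_mt_dop, max_mt_dop))
    return max_mt_dop, message
```

In this sketch the limit is applied once, before planning, so a later
policy change would not affect a statement that has already been planned.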
Testing:
- Added custom cluster test with various max-mt-dop settings
- Ran core tests
Change-Id: I3affb127a5dca517591323f2b1c880aa4b38badd
Reviewed-on: http://gerrit.cloudera.org:8080/16020
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Modified the insert testfiles to get which database they need to
use for 'CREATE TABLE LIKE' dynamically.
Tests:
Did targeted exhaustive testruns in test_insert.py and
test_mt_dop.py and did a full exhaustive testrun.
Change-Id: Ib3c7ba02190f57a7ed40311c95a3dd9eca9b474d
Reviewed-on: http://gerrit.cloudera.org:8080/15816
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
-Modified 'test_insert.py' so that the tests can run in parallel.
-Every test now creates its own temporary tables for insert testing.
-Replaced the SETUP tags with Truncate table QUERY statements.
-Because the SETUP tag is no longer used, the corresponding
code was removed.
-Fixed a test query in 'insert.test'. The test was incorrect, so it
was modified to test for the right behavior.
Testing:
-tests/run-tests.py query_test/test_insert.py
-impala-py.test tests/query_test/test_insert.py
-the same for test_insert_permutation.py and test_load.py
Change-Id: I257e936868917a2fcc6c030f6c855b247e8a0eea
Reviewed-on: http://gerrit.cloudera.org:8080/15529
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This allows running *any* read-only query with mt_dop > 0.
Before this patch, no joins were allowed with mt_dop > 0.
Previous patches, particularly IMPALA-9156, added significantly
more code coverage for multithreading+joins. It should be safe to
allow enabling on a query-by-query basis. Many improvements are
still planned - see IMPALA-3902. So behaviour and performance
characteristics of mt_dop > 0 with more complex plans and joins
will continue to change.
Testing:
Updated the mt_dop validation tests and removed a redundant planner test
that doesn't provide much additional coverage of the validation
support.
Ran exhaustive tests.
Change-Id: I9c6566abb239db0e775f2beaa25a62c36313cd6f
Reviewed-on: http://gerrit.cloudera.org:8080/15545
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This enables parallel plans with the join build in a
separate fragment and fixes all of the ensuing fallout.
After this change, mt_dop plans with joins have separate
build fragments. There is still a 1:1 relationship between
join nodes and builders, so the builders are only accessed
by the join node's thread after it is handed off. This lets
us defer the work required to make PhjBuilder and NljBuilder
safe to be shared between nodes.
Planner changes:
* Combined the parallel and distributed planning code paths.
* Misc fixes to generate reasonable thrift structures in the
query exec requests, i.e. containing the right nodes.
* Fixes to resource calculations for the separate build plans.
** Calculate separate join/build resource consumption.
** Simplified the resource estimation by calculating resource
consumption for each fragment separately, and assuming that
all fragments hit their peak resource consumption at the
same time. IMPALA-9255 is the follow-on to make the resource
estimation more accurate.
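
The simplified estimate described above amounts to summing per-fragment
peaks. A minimal Python sketch, assuming a hypothetical FragmentResources
record (none of these names come from the planner code):

```python
from collections import namedtuple

# Hypothetical record: one entry per plan fragment, including separate
# join build fragments.
FragmentResources = namedtuple("FragmentResources", ["name", "peak_mem_bytes"])


def estimate_query_peak(fragments):
    """Conservative estimate: assume every fragment peaks at the same time."""
    return sum(f.peak_mem_bytes for f in fragments)


example_fragments = [
    FragmentResources("scan", 64),
    FragmentResources("join-probe", 128),
    FragmentResources("join-build", 256),
]
```

The sum is an upper bound on the real peak whenever fragments do not all
peak simultaneously, which is why the estimate is conservative.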
Scheduler changes:
* Various fixes to handle multiple TPlanExecInfos correctly,
which are generated by the planner for the different cohorts.
* Add logic to colocate build fragments with parent fragments.
Runtime filter changes:
* Build sinks now produce runtime filters, which required
planner and coordinator fixes to handle.
DataSink changes:
* Close the input plan tree before calling FlushFinal() to release
resources. This depends on Send() not holding onto references
to input batches, which was true except for NljBuilder. This
invariant is documented.
Join builder changes:
* Add a common base class for PhjBuilder and NljBuilder with
functions to handle synchronisation with the join node.
* Close plan tree earlier in FragmentInstanceState::Exec()
so that peak resource requirements are lower.
* The NLJ always copies input batches, so that it can close
its input tree.
JoinNode changes:
* Join node blocks waiting for build-side to be ready,
then eventually signals that it's done, allowing the builder
to be cleaned up.
* NLJ and PHJ nodes handle both the integrated builder and
the external builder. There is a 1:1 relationship between
the node and the builder, so we don't deal with thread safety
yet.
* Buffer reservations are transferred between the builder and join
node when running with the separate builder. This is not really
necessary right now, since it is all single-threaded, but will
be important for the shared broadcast.
- The builder transfers memory for probe buffers to the join node
at the end of each build phase.
- At end of each probe phase, reservation needs to be handed back
to builder (or released).
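
The handoff between a separate builder and its join node can be sketched
with two events. This is an illustrative Python model only: the class and
method names are hypothetical and it does not model buffer reservations.

```python
import threading


class SeparateBuilder:
    """Hypothetical stand-in for a join builder running in its own fragment."""

    def __init__(self):
        self._build_ready = threading.Event()
        self._probe_done = threading.Event()
        self.rows = []
        self.closed = False

    def run(self, build_rows):
        # Copy the input so the input plan tree could be closed early.
        self.rows = list(build_rows)
        self._build_ready.set()
        # Stay alive until the join node signals that it is done.
        self._probe_done.wait()
        self.rows = []
        self.closed = True

    def wait_for_build(self):
        self._build_ready.wait()
        return self.rows

    def release(self):
        self._probe_done.set()


def run_join(build_rows, probe_rows):
    builder = SeparateBuilder()
    build_thread = threading.Thread(target=builder.run, args=(build_rows,))
    build_thread.start()
    # The join node blocks until the build side is ready.
    build_side = set(builder.wait_for_build())
    matches = [row for row in probe_rows if row in build_side]
    # Signal completion so the builder can be cleaned up.
    builder.release()
    build_thread.join()
    return matches, builder.closed
```

Because there is a 1:1 relationship between node and builder here, only
one consumer ever waits on the builder, so no further locking is needed.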
ExecSummary changes:
* The summary logic was modified to handle connecting fragments
via join builds. The logic is an extension of what was used
for exchanges.
Testing:
* Enable --unlock_mt_dop for end-to-end tests
* Migrate some tests to run as part of end-to-end tests instead of
custom cluster.
* Add mt_dop dimension to various end-to-end tests to provide
coverage of join queries, spill-to-disk and cancellation.
* Ran a single node TPC-H and TPC-DS stress test with mt_dop=0
and mt_dop=4.
Perf:
* Ran TPC-H scale factor 30 locally with mt_dop=0. No significant
change.
Change-Id: I4403c8e62d9c13854e7830602ee613f8efc80c58
Reviewed-on: http://gerrit.cloudera.org:8080/14859
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The optimisation of disabling ineffective row-level
runtime filters was not implemented in the MT scan
code paths, because the ProcessSplit() functions,
where it was implemented, are not used for mt_dop > 0.
This change adds it to HdfsScanner::GetNext(), which
is used for mt_dop > 0 but not mt_dop = 0.
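
The idea of disabling an ineffective row-level filter inside a
GetNext()-style loop can be sketched as below. This is a simplified Python
model: the class, the rejection-ratio threshold and the check interval are
assumptions made for illustration, not Impala's actual values or code.

```python
class RowFilterStats:
    """Tracks how selective a row-level filter is (hypothetical helper)."""

    def __init__(self, min_reject_ratio=0.1, check_interval=16):
        self.min_reject_ratio = min_reject_ratio
        self.check_interval = check_interval
        self.enabled = True
        self.considered = 0
        self.rejected = 0

    def update(self, passed):
        self.considered += 1
        if not passed:
            self.rejected += 1
        # Periodically disable the filter if it rejects too few rows.
        if self.considered % self.check_interval == 0:
            if self.rejected / float(self.considered) < self.min_reject_ratio:
                self.enabled = False


def scan_rows(rows, row_filter, stats):
    """Yield rows that pass the filter, skipping evaluation once disabled."""
    for row in rows:
        if stats.enabled:
            passed = row_filter(row)
            stats.update(passed)
            if not passed:
                continue
        yield row
```

Once the filter is disabled, rows flow through without paying the
per-row evaluation cost.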
Testing:
Run existing runtime row filters test with mt_dop.
This reproduced the issue before I fixed it.
Change-Id: I8a55a9d4ac9e0d93cb3675dd2d5da086cb7d941d
Reviewed-on: http://gerrit.cloudera.org:8080/15065
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch makes the following changes / fixes when running Impala tests
on ABFS:
* Skips some tests in test_lineage.py that don't work on ABFS / ADLS
(they were already skipped for S3)
* Skips some tests in test_mt_dop.py; the test creates a directory that
ends with a period (and ABFS does not support writing files or
directories that end with a period)
* Removes the ABFS skip flag SkipIfABFS.trash ("IMPALA-7726: Drop with
purge tests fail against ABFS due to trash misbehavior"); I removed
these flags and looped the tests overnight with no failures, so it is
likely whatever bug was causing this has now been fixed
* Now that HADOOP-15860 has been resolved, and the agreed upon behavior
for ABFS is that it will fail if a client tries to write a file /
directory that ends with a period, I added a new entry to the SkipIfABFS
class called file_or_folder_name_ends_with_period and applied it where
necessary
Testing:
* Ran core tests on ABFS
Change-Id: I18ae5b0f7de6aa7628a1efd780ff30a0cc3c5285
Reviewed-on: http://gerrit.cloudera.org:8080/14636
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Add a temporary --mt_dop_auto_fallback to allow a graceful transition to
using mt_dop for workloads. When this flag is set, DML queries and joins
that would otherwise fail with an error when run with mt_dop > 0 fall
back to running with mt_dop = 0. This means that a user can set mt_dop
for their queries and it will only take effect when supported.
The behaviour generally does not change when this flag is not set,
with a couple of exceptions:
* I made mt_dop automatic for compute stats on all file formats
* mt_dop is allowed for single node plans with inserts. The
quirky validatePlan() logic previously disallowed this but
allowed joins in single node plans.
The checks added by this patch can be removed safely once mt_dop is
supported by default for all queries.
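
The fallback decision described above can be sketched as follows. This is
a hedged Python illustration with hypothetical names; it ignores the two
exceptions listed above (compute stats and single node plans with
inserts).

```python
def effective_mt_dop(requested_mt_dop, is_dml, has_join, auto_fallback):
    """Return the mt_dop the statement would run with (hypothetical helper).

    Raises ValueError where the statement would fail with an error.
    """
    unsupported = is_dml or has_join
    if requested_mt_dop == 0 or not unsupported:
        return requested_mt_dop
    if auto_fallback:
        # Models --mt_dop_auto_fallback: run with mt_dop = 0 instead of failing.
        return 0
    raise ValueError("mt_dop > 0 is not supported for this statement")
```

With the flag set, a user can set mt_dop for every query and it only takes
effect where it is supported.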
This includes some cleanup:
* isDmlStmt() was stale and incorrectly implemented.
* Various TreeNode methods did not return instances of subclasses of
the requested class, which was strange. This fix is required to
make 'contains(JoinNode.class)' work correctly. I checked the
callsites of the fixed functions and none of them would be affected
by this change because they specified a terminal class without
any subclasses.
I didn't actually use this fix in the end (I had to write a custom
tree traversal in hasUnsupportedMtDopJoin()), but figured I would
leave the improvement in here.
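
The TreeNode fix can be illustrated by contrasting an exact class match
with a subclass-aware match. The sketch below is Python rather than the
frontend's Java, the node classes are simplified stand-ins, and the
assumption that the old behaviour was an exact class comparison is an
interpretation of the description above.

```python
class PlanNode:
    def __init__(self, *children):
        self.children = list(children)

    def contains_exact(self, cls):
        """Assumed old behaviour: only matches nodes of exactly this class."""
        return type(self) is cls or any(
            child.contains_exact(cls) for child in self.children)

    def contains(self, cls):
        """Fixed behaviour: also matches instances of subclasses."""
        return isinstance(self, cls) or any(
            child.contains(cls) for child in self.children)


class ScanNode(PlanNode):
    pass


class JoinNode(PlanNode):
    pass


class HashJoinNode(JoinNode):
    pass


plan = HashJoinNode(ScanNode(), ScanNode())
```

Here plan.contains(JoinNode) is true while plan.contains_exact(JoinNode)
is false. Callers that pass a terminal class such as ScanNode get the same
answer from both methods, matching the observation above that existing
callsites are unaffected.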
Testing:
Add some basic functional tests ensuring that the fallback takes
effect.
Run basic join and insert tests with this flag enabled.
Change-Id: Ie0d73d8744059874293697c8e104891a10dba04d
Reviewed-on: http://gerrit.cloudera.org:8080/14344
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
--unlock_mt_dop=true unlocks mt_dop for all queries
including joins and inserts.
This disables the parallel plans with separate join builds
when running standalone, because these are not executable
until IMPALA-4224 is implemented. Inserts work without
modification - they were disabled because of lack of
testing and the possibility for generating many small
files with unpartitioned inserts - see IMPALA-8125.
Testing:
Add custom cluster test that exercises joins, runtime filters
and inserts as a sanity check for the flag.
Ran exhaustive build.
Manually ran TPC-H and TPC-DS tests against a minicluster
with mt_dop = 4.
Change-Id: I72f0b02a005e8bf22fd17b8fb5aabf8c0d9b6b15
Reviewed-on: http://gerrit.cloudera.org:8080/12257
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>