This patch limits the number of rows a query can produce by tracking
the row count at the PlanRootSink level. When NUM_ROWS_PRODUCED_LIMIT
is set, the query is cancelled once its execution produces more rows
than the specified limit. The limit applies only when results are
returned to a client, e.g. for a SELECT query but not for an INSERT
query.
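The row-count check described above can be sketched as follows. This is
an illustrative sketch only, not the PlanRootSink code: the class and
method names are hypothetical, and treating a limit of 0 as "no limit" is
an assumption.

```python
class RowLimitExceeded(Exception):
    """Raised when a query produces more rows than the configured limit."""


class RowLimitingSink:
    """Hypothetical stand-in for the root sink that hands rows to the client."""

    def __init__(self, num_rows_produced_limit=0):
        # Assumption: a limit of 0 means the check is disabled.
        self.limit = num_rows_produced_limit
        self.rows_produced = 0

    def send(self, batch):
        """Count the rows in `batch`; fail once the running total exceeds the limit."""
        self.rows_produced += len(batch)
        if self.limit > 0 and self.rows_produced > self.limit:
            raise RowLimitExceeded(
                "produced %d rows, limit is %d" % (self.rows_produced, self.limit))
        return batch
```

Because the count lives in the sink that returns rows to the client, a
statement that writes its rows elsewhere (such as an INSERT) would never
pass through this check.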
Testing:
Added tests to query-resource-limits.test to verify that the rows
produced limit is honored.
Manually tested on various combinations of tables, file formats
and NUM_ROWS_PRODUCED_LIMIT values.
Change-Id: I7b22dbe130a368f4be1f3662a559eb9aae7f0c1d
Reviewed-on: http://gerrit.cloudera.org:8080/12328
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_resource_limits was failing in release builds because the queries
used were finishing earlier than expected. As a result, fragment
instances could not send enough updates to the coordinator to hit the
limits used in the tests. This patch adds a deterministic sleep to the
queries, which gives the coordinator enough time to catch up on
reports.
Testing:
Checked that tests passed on release builds.
Change-Id: I4a47391e52f3974db554dfc0d38139d3ee18a1b4
Reviewed-on: http://gerrit.cloudera.org:8080/11933
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes one of the tests in test_resource_limits that expects a
query to run for more than 2 seconds but currently fails because the
query sometimes completes sooner.
Change-Id: I2ba7080f62f0af3e16ef6c304463ebf78dec1b0c
Reviewed-on: http://gerrit.cloudera.org:8080/11741
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds support for aggregate resource limits at runtime, specified
via query options. If a query exceeds a limit, it is terminated. The
checks are periodic, so the query may go somewhat over the limits.
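A minimal sketch of why a periodic check can overshoot, assuming the
limit is evaluated each time a backend reports its cumulative counters.
The class and method names are hypothetical, and treating 0 as "no limit"
is an assumption; this is not the coordinator's actual code.

```python
class ScanBytesLimitChecker:
    """Hypothetical coordinator-side check of an aggregate scan-bytes limit."""

    def __init__(self, scan_bytes_limit):
        # Assumption: a limit of 0 disables the check.
        self.scan_bytes_limit = scan_bytes_limit
        self.bytes_read_by_backend = {}

    def on_report(self, backend, cumulative_bytes_read):
        """Record a backend's latest cumulative counter.

        Returns True if the aggregate now exceeds the limit, i.e. the
        query should be terminated.
        """
        self.bytes_read_by_backend[backend] = cumulative_bytes_read
        total = sum(self.bytes_read_by_backend.values())
        return self.scan_bytes_limit > 0 and total > self.scan_bytes_limit
```

The limit is only evaluated when a report arrives, so by the time the
check fires the aggregate may already exceed the limit by whatever was
consumed since the previous report.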
SCAN_BYTES_LIMIT is exposed as an advanced query option.
CPU_LIMIT_S is hidden as a development query option because it is
flawed: the CPU user/sys time is only updated upon thread completion,
so in many cases the limit will not take effect until well after the
resources have been used. IMPALA-7318 tracks enabling this option.
The query profile is updated to include query-wide and per-backend
metrics for CPU and scanned bytes. Example from "select count(*) from
tpch_parquet.lineitem":
Per Node Peak Memory Usage: tarmstrong-box:22000(289.50 KB) tarmstrong-box:22001(249.50 KB) tarmstrong-box:22002(249.50 KB)
Per Node Bytes Read: tarmstrong-box:22000(100.00 KB) tarmstrong-box:22001(100.00 KB) tarmstrong-box:22002(100.00 KB)
Per Node User Time: tarmstrong-box:22000(40.000ms) tarmstrong-box:22001(32.000ms) tarmstrong-box:22002(24.000ms)
Per Node System Time: tarmstrong-box:22000(0.000ns) tarmstrong-box:22001(0.000ns) tarmstrong-box:22002(0.000ns)
- FiltersReceived: 0 (0)
- FinalizationTimer: 0.000ns
- NumBackends: 3 (3)
- NumFragmentInstances: 4 (4)
- NumFragments: 2 (2)
- TotalBytesRead: 300.00 KB (307200)
- TotalCpuTime: 96.000ms
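As a consistency check on the example above, the per-node values can be
summed and compared with the query-wide counters. The helper names below
are hypothetical and the regular expressions only handle the "KB" and
"ms" units that appear in this particular excerpt.

```python
import re


def sum_per_node_kb(line):
    """Sum the 'host:port(N KB)' entries of a per-node profile line, in bytes."""
    values = re.findall(r"\((\d+(?:\.\d+)?) KB\)", line)
    return int(sum(float(v) for v in values) * 1024)


def sum_per_node_ms(line):
    """Sum the 'host:port(Nms)' entries of a per-node profile line, in ms."""
    values = re.findall(r"\((\d+(?:\.\d+)?)ms\)", line)
    return sum(float(v) for v in values)


bytes_read_line = (
    "Per Node Bytes Read: tarmstrong-box:22000(100.00 KB) "
    "tarmstrong-box:22001(100.00 KB) tarmstrong-box:22002(100.00 KB)")
user_time_line = (
    "Per Node User Time: tarmstrong-box:22000(40.000ms) "
    "tarmstrong-box:22001(32.000ms) tarmstrong-box:22002(24.000ms)")

total_bytes_read = sum_per_node_kb(bytes_read_line)  # 307200
total_user_ms = sum_per_node_ms(user_time_line)      # 96.0
```

The sums agree with TotalBytesRead (307200) and, since every per-node
system time in the excerpt is zero, with TotalCpuTime (96.000ms).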
Testing:
Added tests for various permutations of CPU_LIMIT_S and
SCAN_BYTES_LIMIT.
Based on a previous patch by Mostafa Mokhtar
<mmokhtar@cloudera.com>
Change-Id: I3e85f80b70b3fce47e637e9322ed0316ee84f6a9
Reviewed-on: http://gerrit.cloudera.org:8080/11081
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>