If RM and per-query memory limits were enabled at the same time, the
per-query limit would be ignored if RM wanted to expand the memory
allocation. This change adds an optional reservation limit to a
memtracker. The original limit goes back to being a hard limit -
i.e. any attempt to consume more than that amount results in
failure. The RM reservation limit is the RM-allocated memory limit. If
that is exceeded it triggers the ExpandRmReservation() method, which tries
to retrieve more memory as long as the hard limit is observed.
The net effect is that per-query memory limits have the intended,
hard-limit effect, while the RM limits coexist nicely and can expand
with more memory as required.
At the same time, we change the precedence of various ways of suggesting
an initial reservation size so that the user can change the reservation
size via a query option (MEM_RESERVATION_SIZE).
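A minimal sketch of the consume path this implies, assuming simplified
MemTracker internals (the real class has more state and is thread-safe;
only ExpandRmReservation() is named in the description above):

    #include <cstdint>

    // Illustrative sketch only, not the actual MemTracker.
    struct MemTracker {
      int64_t consumption = 0;
      int64_t hard_limit = 0;       // per-query limit; never exceeded
      int64_t rm_reservation = 0;   // RM-granted bytes; may grow on demand

      // Hypothetical stand-in for the RM round trip: grow the reservation,
      // but never past the hard limit.
      bool ExpandRmReservation(int64_t delta) {
        if (rm_reservation + delta > hard_limit) return false;
        rm_reservation += delta;  // pretend the RM granted the expansion
        return true;
      }

      bool TryConsume(int64_t bytes) {
        int64_t next = consumption + bytes;
        if (next > hard_limit) return false;  // hard-limit failure
        if (next > rm_reservation &&
            !ExpandRmReservation(next - rm_reservation)) {
          return false;
        }
        consumption = next;
        return true;
      }
    };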
Change-Id: I41bfa4eb1336810a8a5946f6be3472111a052144
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3134
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
The runtime profile as we present it is not very useful, and its structure
makes it hard to consume. This patch adds a new client-facing, schematized
set of counters collected from the runtime profiles. For example, with this
structure it would be easy for the shell to fetch the stats of a running
query and print a useful progress report, or to check the most relevant
metrics when diagnosing issues.
Here's an example of the output for one of the tpch queries:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
------------------------------------------------------------------------------------------------------------------------
09:MERGING-EXCHANGE 1 79.738us 79.738us 5 5 0 -1.00 B UNPARTITIONED
05:TOP-N 3 84.693us 88.810us 5 5 12.00 KB 120.00 B
04:AGGREGATE 3 5.263ms 6.432ms 5 5 44.00 KB 10.00 MB MERGE FINALIZE
08:AGGREGATE 3 16.659ms 27.444ms 52.52K 600.12K 3.20 MB 15.11 MB MERGE
07:EXCHANGE 3 2.644ms 5.1ms 52.52K 600.12K 0 0 HASH(o_orderpriority)
03:AGGREGATE 3 342.913ms 966.291ms 52.52K 600.12K 10.80 MB 15.11 MB
02:HASH JOIN 3 2s165ms 2s171ms 144.87K 600.12K 13.63 MB 941.01 KB INNER JOIN, BROADCAST
|--06:EXCHANGE 3 8.296ms 8.692ms 57.22K 15.00K 0 0 BROADCAST
| 01:SCAN HDFS 2 1s412ms 1s978ms 57.22K 15.00K 24.21 MB 176.00 MB tpch.orders o
00:SCAN HDFS 3 8s032ms 8s558ms 3.79M 600.12K 32.29 MB 264.00 MB tpch.lineitem l
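As a rough illustration of the kind of aggregation this schema enables
(field names here are illustrative, not the actual counter structure):

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Given per-host runtimes for one operator, derive the "Avg Time" and
    // "Max Time" columns shown above.
    struct OperatorTimes { int64_t avg_ns = 0; int64_t max_ns = 0; };

    OperatorTimes AggregateTimes(const std::vector<int64_t>& per_host_ns) {
      OperatorTimes t;
      if (per_host_ns.empty()) return t;
      int64_t sum = 0;
      for (int64_t ns : per_host_ns) {
        sum += ns;
        t.max_ns = std::max(t.max_ns, ns);
      }
      t.avg_ns = sum / static_cast<int64_t>(per_host_ns.size());
      return t;
    }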
Change-Id: Iaad4b9dd577c375006313f19442bee6d3e27246a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2964
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Enable order-by without limit
Added BufferedBlockMgr to allocate buffers and spill to disk.
Added Sorter for the external sort implementation.
Added a new SortNode execution node that completely sorts its input.
Changes to enable writing in the IoMgr went in a separate patch.
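For intuition, a self-contained sketch of the spill-then-merge pattern the
Sorter implements (the real code sorts rows in BufferedBlockMgr blocks and
spills them to disk; here "spilling" just collects sorted runs in memory):

    #include <algorithm>
    #include <functional>
    #include <queue>
    #include <tuple>
    #include <vector>

    std::vector<int> ExternalSort(std::vector<int> input, size_t run_size) {
      // Phase 1: produce sorted runs no larger than the memory budget.
      std::vector<std::vector<int>> runs;
      for (size_t i = 0; i < input.size(); i += run_size) {
        size_t end = std::min(input.size(), i + run_size);
        std::vector<int> run(input.begin() + i, input.begin() + end);
        std::sort(run.begin(), run.end());
        runs.push_back(std::move(run));  // in Impala: blocks written to disk
      }
      // Phase 2: k-way merge via a min-heap of (value, run, offset) heads.
      using Head = std::tuple<int, size_t, size_t>;
      std::priority_queue<Head, std::vector<Head>, std::greater<Head>> heap;
      for (size_t r = 0; r < runs.size(); ++r) {
        if (!runs[r].empty()) heap.emplace(runs[r][0], r, 0);
      }
      std::vector<int> out;
      while (!heap.empty()) {
        auto [v, r, i] = heap.top();
        heap.pop();
        out.push_back(v);
        if (i + 1 < runs[r].size()) heap.emplace(runs[r][i + 1], r, i + 1);
      }
      return out;
    }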
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1539
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
Conflicts:
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
Change-Id: I3ece32affe5b006f53bbdfcc03ded01471e818ac
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2900
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
Make new partition directories created by INSERT inherit their parent's permissions
This patch adds --insert_inherit_permissions. If true, all
new partition directories created by INSERT will inherit their
permissions from their parent. When false, the directories are created
with the default permissions.
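A sketch of the inheritance step using the libhdfs C API (error handling
trimmed; the flag check happens in the caller):

    #include <hdfs.h>

    // After creating a new partition directory, copy the parent directory's
    // permission bits onto it. Only invoked when --insert_inherit_permissions
    // is true; otherwise the directory keeps the default permissions.
    void InheritParentPermissions(hdfsFS fs, const char* parent_dir,
                                  const char* new_dir) {
      hdfsFileInfo* info = hdfsGetPathInfo(fs, parent_dir);
      if (info == nullptr) return;       // parent missing; keep defaults
      short perms = info->mPermissions;  // parent's mode bits
      hdfsFreeFileInfo(info, 1);
      hdfsChmod(fs, new_dir, perms);
    }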
Change-Id: Ib2b4c251e51ea5048387169678e8dde34ecfe5f6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1917
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
Admission control now supports configuring pools via a fair scheduler
allocation configuration, so the old pool configuration mechanism is no
longer needed. This also renames the "yarn_pool" query option to the more
general "request_pool", since it can also be used to configure the admission
controller when RM/Yarn is not used. Similarly, the query profile now shows
the pool as "Request Pool" rather than "Yarn Pool".
Change-Id: Id2cefb77ccec000e8df954532399d27eb18a2309
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1668
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 8d59416fb519ec357f23b5267949fd9682c9d62f)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1759
Impala reserves resources from YARN via Llama and handles resource
preemptions by cancelling affected queries. Adds the Impala Resource
Broker for interacting with Llama. Refactors the scheduler and coordinator
to move fragment-to-host assignment logic into the scheduler. The local
test setup uses MiniLLama.
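A hypothetical outline of the broker's role (the names below are
illustrative stand-ins, not the actual Llama/Thrift API):

    #include <cstdint>
    #include <string>

    struct Reservation { std::string id; };  // illustrative handle

    class ResourceBroker {
     public:
      // Blocks until Llama grants (or rejects) the query's resources.
      bool Reserve(const std::string& pool, int64_t mem_bytes, int vcores,
                   Reservation* granted);

      // Invoked on a preemption notification from Llama; the coordinator
      // then cancels every fragment of the affected query.
      void OnPreempted(const Reservation& preempted);
    };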
Change-Id: Ic7b0fe43de52d30f4207b4e65cce7e6a294e54e1
This change adds support for cluster-synchronized catalog operations. This
provides the guarantee that after a catalog op completes, all other
subscribers to the catalog topic have also processed that update. This is
useful when load balancing, because a common workflow is to target a
different impalad for each statement executed.
For example if each of the following were executed sequentially, but targeting
a different node:
1) CREATE TABLE Foo
2) INSERT INTO Foo
3) SELECT * FROM Foo
4) INSERT INTO Foo ....
Since both the INSERT and the CREATE update the catalog, this would not work
as expected without this patch: the user might either get a "table not found"
error or be missing partition information from the INSERT.
The downside is that this approach to DDL takes a bit longer because we need to wait
until all subscribers have processed an update. If all nodes are healthy, this overhead
should not be significantly longer than the current DDL time. However, a single bad node
might slow down or completely block the completion of all DDL operations. By
default this feature is disabled, but it can be enabled using a new query
option: SYNCED_DDL=1.
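The wait this implies can be pictured as follows (a sketch with illustrative
names, not the actual catalog/statestore API): after a DDL produces catalog
version v, block until every subscriber reports a processed version >= v.

    #include <cstdint>
    #include <vector>

    bool AllSubscribersCaughtUp(const std::vector<int64_t>& subscriber_versions,
                                int64_t ddl_version) {
      for (int64_t seen : subscriber_versions) {
        if (seen < ddl_version) return false;  // one bad node blocks the DDL
      }
      return true;
    }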
To test this, the base test suite was updated to support selecting a random impalad
to execute each query section in a query test file. This is currently only enabled
for the insert and DDL tests, but could be leveraged by more tests in the future.
TODO: Add additional failure tests around this functionality.
TODO: Add an explicit "sync" statement so users do not need to run all their DDL
in this mode (since it is slower).
Change-Id: I45e757a931bf2a4740cc0cdd1e76ce49a1e22b83
Reviewed-on: http://gerrit.ent.cloudera.com:8080/899
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
This patch goes some way to improving recovery after an INSERT
fails. Inserts now write intermediate results to
<table_dir>/.impala_insert_staging. After execution completes, either
successfully or not, the query-specific directory under that directory
is deleted.
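A sketch of that cleanup using the libhdfs C API (path layout as described
above; error handling trimmed):

    #include <hdfs.h>
    #include <string>

    // Whether the INSERT succeeded or failed, remove the query-specific
    // subdirectory under <table_dir>/.impala_insert_staging.
    void CleanupInsertStaging(hdfsFS fs, const std::string& table_dir,
                              const std::string& query_id) {
      std::string staging_dir =
          table_dir + "/.impala_insert_staging/" + query_id;
      hdfsDelete(fs, staging_dir.c_str(), /*recursive=*/1);
    }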
This doesn't completely solve cleanup (although it goes as far as IMPALA-449
suggests). Two things to do in the future:
* Have each backend delete its own staging files on error. The difficulty is
that backends currently don't know whether they were cancelled because of an
error or because a LIMIT was reached.
* If the operation to move files to their final destinations fails during
FinalizeQuery(), the coordinator should perform compensating actions and
delete the files that were already moved.
Note: We also considered a query-wide and impalad-wide option to change
the staging dir. There are advantages to this (all intermediate results
go to a known location which is easy to clean up on failure), but also
security and other operational concerns. Worth revisiting in the future.
Change-Id: Ia54cf36db6a382e359877f87d7d40aad7fdb77be
Reviewed-on: http://gerrit.ent.cloudera.com:8080/670
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
- new class MemLimit
- new query flag MEM_LIMIT
- implementation of impalad flag mem_limit
Still missing:
- parsing a mem limit spec that contains "M/G", as in: 1.25G
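A sketch of what that missing parsing could look like (accepts plain bytes
or an M/G suffix, e.g. "512M" or "1.25G"; returns -1 on a malformed spec):

    #include <cctype>
    #include <cstdint>
    #include <cstdlib>
    #include <string>

    int64_t ParseMemSpec(const std::string& spec) {
      if (spec.empty()) return -1;
      char* end = nullptr;
      double value = strtod(spec.c_str(), &end);
      if (end == spec.c_str() || value < 0) return -1;
      int64_t multiplier = 1;
      if (*end != '\0') {
        switch (toupper(*end)) {
          case 'M': multiplier = 1LL << 20; break;
          case 'G': multiplier = 1LL << 30; break;
          default: return -1;
        }
        if (*(end + 1) != '\0') return -1;  // trailing junk
      }
      return static_cast<int64_t>(value * multiplier);
    }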
This patch implements the HiveServer2 API.
We have tested it with Lenni's patch against the tpch workload. It has also
been tested manually against Hive's beeline with queries and metadata operations.
All of the HiveServer2 code is implemented in impala-hs2-server.cc. Beeswax
code is refactored to impala-beeswax-server.cc.
HiveServer2 has a few more metadata operations. These operations go through
impala-hs2-server to the ddl-executor and then to the FE. The logic is
implemented in fe/src/main/java/com/cloudera/impala/service/MetadataOp.java.
Because of the Thrift union issue, I had to modify the generated C++ files.
Therefore, all of the HiveServer2 Thrift-generated C++ code is checked into
be/src/service/hiveserver2/. Once the Thrift issue is resolved, I'll remove
these files.
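As a rough illustration of that routing (the type and method names below are
simplified stand-ins, not the exact Thrift or executor signatures):

    // The HS2 handler wraps the request and forwards it, via the ddl
    // executor and JNI, to MetadataOp.java in the FE.
    void ImpalaServer::GetSchemas(TGetSchemasResp& resp,
                                  const TGetSchemasReq& req) {
      TMetadataOpRequest op;                     // illustrative wrapper
      op.opcode = TMetadataOpcode::GET_SCHEMAS;  // illustrative enum
      op.get_schemas_req = req;
      resp = ExecMetadataOpViaFrontend(op);      // JNI hop into the Java FE
    }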
Change-Id: I9a8fe5a09bf250ddc43584249bdc87b6da5a5881
- created new class PlanFragment, which encapsulates everything having to do with a single
plan fragment, including its partition, output exprs, destination node, etc.
- created new class DataPartition
- explicit classes for fragment and plan node ids, to avoid getting them
mixed up, which is easy to do with plain ints (see the sketch after this list)
- added an IdGenerator class
- moved PlanNode.ExplainPlanLevel to Types.thrift, so it can also be used for
PlanFragment.getExplainString()
- Changed planner interface to return scan ranges with a complete list of server locations,
instead of making a server assignment.
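The typed-id idea from the list above, sketched in C++ (the real classes
live in the frontend; names here just mirror the description):

    #include <cstdint>

    // Distinct types for plan node and fragment ids make mixing them up a
    // compile error instead of a silent bug, as it would be with plain ints.
    template <typename Tag>
    struct Id {
      explicit Id(int32_t v) : value(v) {}
      int32_t value;
    };
    struct PlanNodeTag {};
    struct FragmentTag {};
    using PlanNodeId = Id<PlanNodeTag>;
    using FragmentId = Id<FragmentTag>;

    template <typename IdType>
    class IdGenerator {
     public:
      IdType Next() { return IdType(next_++); }
     private:
      int32_t next_ = 0;
    };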
Also included: cleaned up AggregateInfo:
- the 2nd phase of a DISTINCT aggregation is now captured separately from a merge aggregation.
- moved analysis functionality into AggregateInfo
Removed broken test cases from the functional-planner workload (they are
handled correctly in functional-newplanner).
- added DataStreamMgr::Cancel(), which is used to propagate cancellation from the
coordinator to all (possibly blocked) ExchangeNodes
- all exec nodes now check for cancellation before they do anything that might block for a while
- fixed up logic related to async cancellation
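The "check before blocking" rule can be sketched like this (RuntimeState here
is a simplified stand-in for the real class):

    #include <atomic>

    struct RuntimeState {
      std::atomic<bool> cancelled{false};
      bool is_cancelled() const { return cancelled.load(); }
    };

    // Every exec node tests the flag before any call that might block;
    // DataStreamMgr::Cancel() separately wakes receivers already blocked.
    bool GetNextChecksCancellation(RuntimeState* state) {
      if (state->is_cancelled()) return false;  // bail out before blocking
      // ... safe to block waiting for a row batch here ...
      return true;
    }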
Added support for async query execution via beeswax interface:
- implemented ImpalaServer::query()
- QueryExecState now tracks beeswax's idea of the query state
- ImpalaServer::get_state() now returns the actual state
Fixed handling of ExecNode::Close():
- needs to be called for the entire plan tree, regardless of what fails (can't use
RETURN_IF_ERROR() inside of it)
- needs to be called for every Open() call by the coordinator/ImpalaServer
- made the coordinator asynchronous
- renamed ImpalaBackendService to ImpalaInternalService;
- new class ImpalaServer implements ImpalaService and ImpalaInternalService
- renaming ImpalaInternalService fields to conform to c++ style
- merged impala-service.{cc,h} and backend-service.{cc,h} into impala-server.{cc,h}
- added TStatusCode field to Status.ErrorDetail
- removed ImpalaInternalService.CloseChannel
Also removed JdbcDriverTest.java
The ODBC test suite (CDH/hive-odbc-test) passes, except for tests with
"create table" and "show table". We should have a nightly regression run of
the ODBC tests against impalad.
There are still a few issues:
1. running with num_nodes > 0 crashes the coordinator;
2. workarounds for a few ODBC JIRAs;
3. no tests for bool/timestamp because ODBC doesn't support them.
review: issue 110
At the same time, this patch removes the partitionKeyRegex in favour
of explicitly sending a list of literal expressions for each file path
from the front end.
- breaks out ImpalaService implementation into impala-service.{cc,h} and
completes the implementation (minus cancellation)
- reorg of testutil/QueryExecutor: now we have a QueryExecutorIf with two implementations,
InProcessQueryExecutor (the existing one) and ImpaladQueryExecutor (which
executes against a running impalad process)
- added the flag --backends="host:port,host:port,...", which TestEnv uses to create clients for ImpalaBackendServices
running on those nodes; this is just a hack to enable runquery for multi-node execution (see the parsing sketch at the end)
- impalad-main.cc: main() of impala daemon, which will export both ImpalaService and
ImpalaBackendService (but at the moment only does the latter; everything related to ImpalaService is commented out)
- com.cloudera.impala.service.Frontend: API to the frontend functionality; invoked by impalad via jni; ignore for now
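The parsing sketch referenced above: splitting a --backends value into
(host, port) pairs (illustrative helper, not the actual TestEnv code):

    #include <sstream>
    #include <string>
    #include <utility>
    #include <vector>

    std::vector<std::pair<std::string, int>> ParseBackends(
        const std::string& flag_value) {
      std::vector<std::pair<std::string, int>> backends;
      std::stringstream ss(flag_value);
      std::string entry;  // one "host:port"
      while (std::getline(ss, entry, ',')) {
        size_t colon = entry.rfind(':');
        if (colon == std::string::npos) continue;  // skip malformed entries
        backends.emplace_back(entry.substr(0, colon),
                              std::stoi(entry.substr(colon + 1)));
      }
      return backends;
    }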