mirror of https://github.com/apache/impala.git synced 2026-01-07 00:02:28 -05:00

Tim Armstrong 64fd0115e5 IMPALA-4862: make resource profile consistent with backend behaviour

This moves away from the PipelinedPlanNodeSet approach of enumerating
sets of concurrently-executing nodes because unions would force
creating many overlapping sets of nodes. The new approach computes
the peak resources during Open() and the peak resources between Open()
and Close() (i.e. while calling GetNext()) bottom-up for each plan node
in a fragment. The fragment resources are then combined to produce the
query resources.

The basic assumptions for the new resource estimates are:
* Resources are acquired during or after the first call to Open()
  and released in Close().
* Blocking nodes call Open() on their child before acquiring
  their own resources (this required some backend changes).
* Blocking nodes call Close() on their children before returning
  from Open().
* The peak resource consumption of the query is the sum of the peak
  resource consumption of the independent fragments (except for the
  parallel join build plans, where we can assume there will be
  synchronisation). This is conservative, but we don't synchronise
  fragment Open() and Close() across exchanges, so we can't make
  stronger assumptions in general.
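
The assumptions above can be illustrated with a minimal sketch. It is not the planner code from this change: the names `PlanNode`, `Peak`, `compute_peak` and `query_peak` are hypothetical, resources are reduced to a single byte count, and only single-child plan nodes are handled (no unions, joins, or parallel join build plans).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical plan node; own_bytes is held from Open() until Close()."""
    name: str
    own_bytes: int
    blocking: bool = False
    children: List["PlanNode"] = field(default_factory=list)


@dataclass
class Peak:
    during_open: int  # peak while the node's Open() is running
    after_open: int   # peak between Open() and Close(), i.e. during GetNext()

    @property
    def overall(self) -> int:
        return max(self.during_open, self.after_open)


def compute_peak(node: PlanNode) -> Peak:
    """Compute peaks bottom-up for a chain of single-child nodes."""
    if not node.children:
        # A leaf may acquire its resources at any point from Open() onwards.
        return Peak(node.own_bytes, node.own_bytes)
    if len(node.children) != 1:
        raise ValueError("this sketch only handles single-child nodes")
    child = compute_peak(node.children[0])
    if node.blocking:
        # The child is opened first, then consumed while this node holds its
        # own resources, then closed before this node's Open() returns.
        during_open = max(child.during_open, child.after_open + node.own_bytes)
        return Peak(during_open, node.own_bytes)
    # A streaming node keeps its child open for its whole lifetime.
    return Peak(child.during_open + node.own_bytes,
                child.after_open + node.own_bytes)


def query_peak(fragment_roots: List[PlanNode]) -> int:
    """Conservatively sum the peaks of independent fragments."""
    return sum(compute_peak(root).overall for root in fragment_roots)


scan = PlanNode("scan", 10)
inner_sort = PlanNode("inner sort", 100, blocking=True, children=[scan])
outer_sort = PlanNode("outer sort", 50, blocking=True, children=[inner_sort])
pipeline_peak = compute_peak(outer_sort)
```

In this sketch the two stacked blocking nodes peak at 150 bytes rather than the naive sum of 160, because the scan is closed before the outer node acquires its resources.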

Also compute the sum of minimum reservations. This will be useful
in the backend to determine exactly when all of the initial
reservations have been claimed from a shared pool of initial reservations.
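
As a rough illustration of why the sum is useful, the sketch below tracks claims against a pool sized to the sum of minimum reservations. The class `InitialReservationPool` and its methods are hypothetical and do not correspond to any backend API.

```python
from typing import Iterable


class InitialReservationPool:
    """Hypothetical pool sized to the sum of all minimum reservations."""

    def __init__(self, min_reservations: Iterable[int]) -> None:
        self.unclaimed = sum(min_reservations)

    def claim(self, nbytes: int) -> None:
        # Each operator claims its own minimum reservation exactly once.
        if nbytes < 0 or nbytes > self.unclaimed:
            raise ValueError("claim does not fit the unclaimed reservation")
        self.unclaimed -= nbytes

    def all_claimed(self) -> bool:
        # True exactly when every initial reservation has been handed out.
        return self.unclaimed == 0


pool = InitialReservationPool([64, 128, 32])
pool.claim(64)
pool.claim(128)
partially_claimed = pool.all_claimed()
pool.claim(32)
fully_claimed = pool.all_claimed()
```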

Testing:
* Updated planner tests to reflect behavioural changes.
* Added extra resource requirement planner tests for unions, subplans,
  pipelines of blocking operators, and bushy join plans.
* Added single-node plans to resource-requirements tests. These have
  more complex plan trees inside a single fragment, which is useful
  for testing the peak resource requirement logic.

Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d
Reviewed-on: http://gerrit.cloudera.org:8080/7223
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins

2017-07-12 01:17:24 +00:00

IMPALA-4862: make resource profile consistent with backend behaviour

2017-07-12 01:17:24 +00:00

bin

Bump Kudu version to 1070e76

2017-07-11 20:06:20 +00:00

cmake_modules

IMPALA-4758: (2/2) Impala-side changes to build with latest gutil

2017-03-29 02:52:34 +00:00

common

IMPALA-4862: make resource profile consistent with backend behaviour

2017-07-12 01:17:24 +00:00

docs

IMPALA-3603 [DOCS] Document handling of NaN values

2017-07-11 00:26:13 +00:00

ext-data-source

IMPALA-5224: remove defunct codehaus repository

2017-04-20 03:24:33 +00:00

IMPALA-4862: make resource profile consistent with backend behaviour

2017-07-12 01:17:24 +00:00

infra

IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies

2017-05-26 07:52:40 +00:00

shell

IMPALA-5507: Add clear description to help information of KEYVAL option

2017-07-11 11:05:14 +00:00

ssh_keys

Move ssh keys from bin directory to fix packaging build break

2014-01-08 10:44:12 -08:00

testdata

IMPALA-4862: make resource profile consistent with backend behaviour

2017-07-12 01:17:24 +00:00

tests

IMPALA-5640: re-enable gzip for parquet insert tests

2017-07-12 00:18:44 +00:00

www

IMPALA-5643: Add total number of threads created per group to /threadz

2017-07-11 05:49:27 +00:00

.clang-format

Match .clang-format more closely to actual practice.

2016-10-14 00:08:17 +00:00

.clang-tidy

IMPALA-3200: move bufferpool under runtime

2016-11-22 07:31:34 +00:00

.gitignore

IMPALA-4653: fix sticky config variable problem

2017-01-05 01:43:36 +00:00

buildall.sh

Add a build flag for the undefined behavior sanitizer, aka "ubsan".

2017-03-01 03:09:17 +00:00

CMakeLists.txt

IMPALA-4029: Reduce memory requirements for storing file metadata

2017-05-10 09:23:05 +00:00

DISCLAIMER

IMPALA-3808: Add incubating DISCLAIMER from the Incubator Branding Guide

2016-09-02 02:12:45 +00:00

EXPORT_CONTROL.md

IMPALA-4406: Add cryptography export control notice

2016-11-04 18:26:40 +00:00

LICENSE.txt

IMPALA-4669: [KUTIL] Import kudu_util library from kudu@314c9d8

2017-06-17 00:42:48 +00:00

LOGS.md

Consolidate test and cluster logs under a single directory.

2016-03-28 19:23:22 +00:00

NOTICE.txt

2017-01-29 00:01:03 +00:00

README.md

IMPALA-4512: Add a script that builds Impala on stock Ubuntu 14.04.

2016-11-29 22:10:58 +00:00

README.md

Welcome to Impala

Lightning-fast, distributed SQL queries for petabytes of data stored in Apache Hadoop clusters.

Impala is a modern, massively-distributed, massively-parallel, C++ query engine that lets you analyze, transform and combine data from a variety of data sources:

* Best-of-breed performance and scalability.
* Support for data stored in HDFS, Apache HBase and Amazon S3.
* Wide analytic SQL support, including window functions and subqueries.
* On-the-fly code generation using LLVM to generate CPU-efficient code tailored specifically to each individual query.
* Support for the most commonly-used Hadoop file formats, including the Apache Parquet (incubating) project.
* Apache-licensed, 100% open source.

More about Impala

To learn more about Impala as a business user, or to try Impala live or in a VM, please visit the Impala homepage.

If you are interested in contributing to Impala as a developer, or learning more about Impala's internals and architecture, visit the Impala wiki.

Supported Platforms

Impala only supports Linux at the moment.

Build Instructions

See bin/bootstrap_build.sh.

Export Control Notice

This distribution uses cryptographic software and may be subject to export controls. Please refer to EXPORT_CONTROL.md for more information.

Languages

C++ 49.3%

Java 30.4%

Python 14.5%

JavaScript 1.3%

C 1.2%

Other 3.2%