Files
impala/tests/custom_cluster/test_alloc_fail.py
Tim Armstrong c14a090400 IMPALA-5844: use a MemPool for expr result allocations
This is also a step towards IMPALA-2399 (remove QueryMaintenance()).

"local" allocations containing expression results (either intermediate
or final results) have the following properties:
* They are usually small allocations
* They can be made frequently (e.g. every function call)
* They are owned and managed by the Impala runtime
* They are freed in bulk at various points in query execution.

A MemPool (i.e. bump allocator) is the right mechanism to manage
allocations with the above properties. Before this patch
FunctionContext's used a FreePool + vector of allocations to emulate the
above behaviour. This patch switches to using a MemPool to bring these
allocations in line with the rest of the codebase.

The steps required to do this conversion.
* Use a MemPool for FunctionContext local allocations.
* Identify appropriate MemPools for all of the local allocations from
  function contexts so that the memory lifetime is correct.
* Various cleanup and documentation of existing MemPools.
* Replaces calls to FreeLocalAllocations() with calls to
  MemPool::Clear()

More involved surgery was required in a few places:
* Made the Sorter own its comparator, exprs and MemPool.
* Remove FunctionContextImpl::ReallocateLocal() and just have
  StringFunctions::Replace() do the doubling itself to avoid
  the need for a special interface. Worst-case this doubles
  the memory requirements for Replace() since n / 2 + n / 4
  + n / 8 + .... bytes of memory could be wasted instead of recycled
  for an n-byte output string.
* Provide a way redirect agg fn Serialize()/Finalize() allocations
  to come directly from the output RowBatch's MemPool. This is
  also potentially applicable to other places where we currently
  copy out strings from local allocations, e.g.
  AnalyticEvalNode::AddResultTuple() and Tuple::MaterializeExprs().
* --stress_free_pool_alloc was changed to instead intercept at the
  FunctionContext layer so that it retains the old behaviour even
  though allocations do not all come from FreePools.

The "local" allocation concept was not exposed directly in udf.h so this
patch also renames them to better reflect that they're used for expr
results.

Testing:
* ran exhaustive and ASAN

Change-Id: I4ba5a7542ed90a49a4b5586c040b5985a7d45b61
Reviewed-on: http://gerrit.cloudera.org:8080/8025
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-10-06 00:01:08 +00:00

47 lines
2.1 KiB
Python

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
import pytest
from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
from tests.common.skip import SkipIfBuildType
@SkipIfBuildType.not_dev_build
class TestAllocFail(CustomClusterTestSuite):
"""Tests for handling malloc() failure for UDF/UDA"""
@classmethod
def get_workload(self):
return 'functional-query'
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args("--stress_fn_ctx_alloc=1")
def test_alloc_fail_init(self, vector):
self.run_test_case('QueryTest/alloc-fail-init', vector)
# TODO: Rewrite or remove the non-deterministic test.
@pytest.mark.xfail(run=True, reason="IMPALA-2925: the execution is not deterministic "
"so some tests sometimes don't fail as expected")
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args("--stress_fn_ctx_alloc=3")
def test_alloc_fail_update(self, vector, unique_database):
# Note that this test relies on pre-aggregation to exercise the Serialize() path so
# query option 'num_nodes' must not be 1. CustomClusterTestSuite.add_test_dimensions()
# already filters out vectors with 'num_nodes' != 0 so just assert it here.
assert vector.get_value('exec_option')['num_nodes'] == 0
self.run_test_case('QueryTest/alloc-fail-update', vector, use_db=unique_database)