impala/tests/custom_cluster/test_krpc_metrics.py
Michael Ho 2ece4c9b2e IMPALA-8341: Data cache for remote reads
This is a patch based on PhilZ's prototype: https://gerrit.cloudera.org/#/c/12683/

This change implements an IO data cache which is backed by
local storage. It relies implicitly on the OS page cache to
shuffle data between memory and the storage device. This is
useful for caching data read from remote
filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS).

A data cache is divided into one or more partitions based on
the configuration string, which is a comma-separated list of
directories followed by a colon and the storage capacity per
directory. An example configuration string:
  --data_cache_config=/data/0,/data/1:150GB

In the configuration above, the cache may use up to 300GB of
storage space in total, with at most 150GB in each of /data/0
and /data/1.
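
As an illustration only (the cache itself is implemented in the C++
backend, and the helper name below is hypothetical), a configuration
string in this format could be parsed as follows:

  def parse_data_cache_config(config):
    """Split '/data/0,/data/1:150GB' into (['/data/0', '/data/1'], '150GB')."""
    dirs, _, per_dir_capacity = config.rpartition(':')
    return dirs.split(','), per_dir_capacity

  dirs, capacity = parse_data_cache_config('/data/0,/data/1:150GB')
  assert dirs == ['/data/0', '/data/1'] and capacity == '150GB'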

Each partition has a meta-data cache which tracks the mappings
of cache keys to the locations of the cached data. A cache key
is a tuple of (file's name, file's modification time, file offset)
and a cache entry is a tuple of (backing file, offset in the backing
file, length of the cached data, optional checksum). Note that the
cache currently doesn't support overlapping ranges. In other
words, if the cache contains an entry for a file covering the
range [m, m+4MB), a lookup for [m+4K, m+8K) will miss. In
practice, this hasn't been a problem, but it may require further
evaluation in the future.
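
Conceptually, the meta-data cache behaves like the following Python
sketch (names and values are illustrative only):

  cache = {}
  key = ('/bucket/file.parq', 1556871582, 0)   # (file name, mtime, file offset)
  cache[key] = ('backing-file-0', 0, 4 * 1024 * 1024, None)
  # A lookup for [4096, 8192) uses a different offset in the key, so it misses:
  assert ('/bucket/file.parq', 1556871582, 4096) not in cache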

Each partition stores its set of cached data in backing files created
on local storage. When inserting new data into the cache, the data is
appended to the current backing file in use. The storage consumption
of each cache entry counts towards the quota of that partition. When a
partition reaches its capacity, the least recently used (LRU) data in
that partition is evicted. Evicted data is removed from the underlying
storage by punching holes in the backing file it's stored in. Once a
backing file reaches a certain size (4TB by default), new data stops
being appended to it and a new file is created instead. Note that due
to hole punching, the backing files are actually sparse. When the
number of backing files in a partition exceeds
--data_cache_max_files_per_partition, files are deleted in the order
in which they were created. Stale cache entries that reference deleted
files are erased lazily or evicted due to inactivity.
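
On Linux, hole punching of this kind is done with fallocate(2) and
FALLOC_FL_PUNCH_HOLE. A minimal, Linux-only Python sketch of the idea
(the backend does this in C++):

  import ctypes, os

  FALLOC_FL_KEEP_SIZE = 0x01    # from <linux/falloc.h>
  FALLOC_FL_PUNCH_HOLE = 0x02
  libc = ctypes.CDLL('libc.so.6', use_errno=True)

  def punch_hole(fd, offset, length):
    """Deallocate [offset, offset+length) so evicted data frees disk space."""
    ret = libc.fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                         ctypes.c_long(offset), ctypes.c_long(length))
    if ret != 0:
      errno = ctypes.get_errno()
      raise OSError(errno, os.strerror(errno))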

Optionally, checksumming can be enabled to verify that data read
from the cache is consistent with what was inserted, and that
multiple attempted insertions with the same cache key carry the
same content. Checksumming is enabled by default in debug builds.
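
The idea, sketched in Python with zlib.crc32 standing in for whatever
checksum the backend actually uses:

  import zlib

  data = b'bytes being inserted into the cache'
  stored_checksum = zlib.crc32(data)   # recorded in the cache entry on insert
  read_back = data                     # bytes later returned by a lookup
  assert zlib.crc32(read_back) == stored_checksum   # verified on read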

To probe for cached data, the Lookup() interface is used; to
insert data into the cache, the Store() interface is used. Please
note that eviction currently happens inline during Store().
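
The typical read path pairs the two calls: probe first, and on a miss
read from the remote filesystem and insert. A runnable Python sketch
with a stand-in class (the real interface is C++; all names here are
hypothetical):

  class FakeCache(object):
    """Stand-in for the real cache interface, for illustration only."""
    def __init__(self):
      self._entries = {}
    def Lookup(self, filename, mtime, offset, length):
      return self._entries.get((filename, mtime, offset))
    def Store(self, filename, mtime, offset, buf):
      # In the real cache, eviction of LRU entries happens inline here.
      self._entries[(filename, mtime, offset)] = buf

  cache = FakeCache()
  if cache.Lookup('/bucket/f.parq', 1556871582, 0, 4096) is None:
    data = b'bytes read from the remote filesystem'
    cache.Store('/bucket/f.parq', 1556871582, 0, data)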

This patch also adds two startup flags to start-impala-cluster.py:
'--data_cache_dir' specifies the base directory in which each
impalad creates its caching directory.
'--data_cache_size' specifies the capacity string for each cache
directory.
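
For example (the directory and size below are illustrative):
  bin/start-impala-cluster.py --data_cache_dir=/tmp --data_cache_size=500MB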

Testing done:
- added new BE and EE tests
- exhaustive (debug, release) builds with cache enabled
- core ASAN build with cache enabled

Perf:
- 16-stream TPC-DS at 3TB on a 20-node S3 cluster shows about a 30%
improvement over runs without the cache, with a cache size of 150GB
per node. Performance is at parity with an HDFS cluster that uses
EBS as the storage.

Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Reviewed-on: http://gerrit.cloudera.org:8080/12987
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-03 19:39:42 +00:00

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

import pytest
import time
from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
from tests.common.impala_cluster import ImpalaCluster
from tests.common.skip import SkipIf, SkipIfBuildType
from tests.verifiers.mem_usage_verifier import MemUsageVerifier


class TestKrpcMetrics(CustomClusterTestSuite):
  """Test for KRPC metrics that require special arguments during cluster startup."""
  RPCZ_URL = 'http://localhost:25000/rpcz?json'
  TEST_QUERY = 'select count(*) from tpch_parquet.lineitem l1 \
      join tpch_parquet.lineitem l2 where l1.l_orderkey = l2.l_orderkey;'

  @classmethod
  def get_workload(cls):
    return 'functional-query'

  @classmethod
  def setup_class(cls):
    if cls.exploration_strategy() != 'exhaustive':
      pytest.skip('runs only in exhaustive')
    super(TestKrpcMetrics, cls).setup_class()

  @pytest.mark.execute_serially
  @CustomClusterTestSuite.with_args('-datastream_service_queue_mem_limit=1B \
                                     -datastream_service_num_svc_threads=1')
  def test_krpc_queue_overflow_rpcz(self, vector):
    """Test that rejected RPCs show up on the /rpcz debug web page.
    """
    def get_rpc_overflows():
      rpcz = self.get_debug_page(self.RPCZ_URL)
      assert len(rpcz['services']) > 0
      for s in rpcz['services']:
        if s['service_name'] == 'impala.DataStreamService':
          return int(s['rpcs_queue_overflow'])
      assert False, "Could not find DataStreamService metrics"

    before = get_rpc_overflows()
    assert before == 0
    # The 1B queue memory limit and single service thread force incoming
    # RPCs to be rejected, which should surface as queue overflows.
    self.client.execute(self.TEST_QUERY)
    after = get_rpc_overflows()
    assert before < after

  @pytest.mark.execute_serially
  @CustomClusterTestSuite.with_args('-datastream_service_queue_mem_limit=1B \
                                     -datastream_service_num_svc_threads=1')
  def test_krpc_queue_overflow_metrics(self, vector):
    """Test that rejected RPCs show up on the /metrics debug web page.
    """
    metric_name = 'rpc.impala.DataStreamService.rpcs_queue_overflow'
    before = self.get_metric(metric_name)
    assert before == 0
    self.client.execute(self.TEST_QUERY)
    after = self.get_metric(metric_name)
    assert before < after

  @pytest.mark.execute_serially
  def test_krpc_service_queue_metrics(self, vector):
    """Test that memory usage metrics for the data stream service queue show up
    on the /metrics debug web page.
    """
    self.client.execute(self.TEST_QUERY)
    assert self.get_metric('mem-tracker.DataStreamService.current_usage_bytes') >= 0
    assert self.get_metric('mem-tracker.DataStreamService.peak_usage_bytes') > 0