Files
impala/tests/query_test/test_compute_stats.py
Alex Behm 93e5b262c2 Added COMPUTE STATS command for gathering table and column stats.
A compute stats command computes the table and column stats for a given
table and persists them in the metastore.
The table stats consist of the per-partition and per-table row count.
The column stats are computed on a per-table basis and consist of the
number of distinct values and the number of NULLs per column.

This patch introduces a new 'child query' concept that
compute stats utilizes. Child queries are cancelled
if the parent query is cancelled. A compute stats stmt is
executed by the following query hirarchy:
parent: compute stats query (DDL)
- child: compute table stats query (QUERY)
- child: compute column stats query (QUERY)

The new child query concept is necessary to decouple child query fetches
from parent query fetches, i.e., we could not execute a child query as
part of the original compute stats query, because then a client could
fetch the results we need for updating the Metastore statistics. The
reason why our existing CTAS works without this decoupling
is that its insert 'child query' is not fetchable.

Change-Id: I560533e3cb09bcbbdb3eea7fcf0b460bc6b36dcd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/873
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:14 -08:00

48 lines
1.8 KiB
Python

#!/usr/bin/env python
# Copyright (c) 2012 Cloudera, Inc. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from tests.common.test_vector import *
from tests.common.impala_test_suite import *
# Tests the COMPUTE STATS command for gathering table and column stats.
# TODO: Merge this test file with test_col_stats.py
class TestComputeStats(ImpalaTestSuite):
TEST_DB_NAME = "compute_stats_db"
@classmethod
def get_workload(self):
return 'functional-query'
@classmethod
def add_test_dimensions(cls):
super(TestComputeStats, cls).add_test_dimensions()
cls.TestMatrix.add_dimension(create_single_exec_option_dimension())
# Do not run these tests using all dimensions because the expected results
# are different for different file formats.
cls.TestMatrix.add_constraint(lambda v:\
v.get_value('table_format').file_format == 'text' and\
v.get_value('table_format').compression_codec == 'none')
def setup_method(self, method):
# cleanup and create a fresh test database
self.cleanup_db(self.TEST_DB_NAME)
self.client.refresh()
self.execute_query("create database %s" % (self.TEST_DB_NAME))
def teardown_method(self, method):
self.cleanup_db(self.TEST_DB_NAME)
def test_compute_stats(self, vector):
self.run_test_case('QueryTest/compute-stats', vector)