mirror of
https://github.com/apache/impala.git
synced 2025-12-19 18:12:08 -05:00
This patch refreshes the compute_table_stats.py script with the following
changes:
- Cap parallelism at IMPALA_BUILD_THREADS if the --parallelism argument
  is not set.
- Change its default connection to hs2, leveraging the existing
  ImpylaHS2Connection.
- Change OptionParser to ArgumentParser.
- Use impala-python3 to run the script.
- Add --exclude_table_names to skip running COMPUTE STATS on certain
  tables/views.
- continue_on_error is False by default.
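The parallelism default described in the first bullet can be sketched in shell. This is only an illustration, not the script's actual code: the helper name effective_parallelism is hypothetical, and the sketch assumes the uncapped default is the machine's CPU count, which the commit message does not state.

```shell
# Hypothetical helper illustrating the default described above.
#   $1: value passed via --parallelism (empty if not set)
#   $2: CPU count (ASSUMED to be the uncapped default)
#   $3: cap, e.g. the value of IMPALA_BUILD_THREADS
effective_parallelism() {
  local requested="$1" cpus="$2" cap="$3"
  if [ -n "${requested}" ]; then
    # An explicit --parallelism value is used as given.
    echo "${requested}"
  elif [ "${cpus}" -lt "${cap}" ]; then
    # Fewer CPUs than the cap: the cap has no effect.
    echo "${cpus}"
  else
    # Otherwise the default is limited to the cap.
    echo "${cap}"
  fi
}
```

For example, effective_parallelism "" 16 4 prints 4, while effective_parallelism 8 16 4 prints 8 because an explicit request bypasses the cap.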
This patch also improves query handle logging in ImpylaHS2Connection. A
collect_profile_and_log argument is added to control whether to pull
logs and the runtime profile at the end of __fetch_results(). The default
behavior remains unchanged.
Skip COMPUTE STATS for functional_kudu.alltypesagg and
functional_kudu.manynulls because it is invalid to run COMPUTE STATS
over a view.
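The effect of --exclude_table_names can be pictured as a filter over the candidate table list. The sketch below is an illustration in shell only; filter_tables is a hypothetical helper, and exact, case-sensitive name matching is an assumption rather than something the patch description confirms.

```shell
# Hypothetical helper: print every name from the comma-separated list $1
# that does not appear in the comma-separated exclude list $2.
# Matching is exact and case-sensitive (an assumption).
filter_tables() {
  local tables="$1" excluded="$2" name result=""
  local IFS=','
  for name in ${tables}; do
    case ",${excluded}," in
      *",${name},"*) ;;  # name is on the exclude list: skip it
      *) result="${result:+${result},}${name}" ;;
    esac
  done
  echo "${result}"
}
```

With an illustrative input, filter_tables "alltypes,alltypesagg,manynulls" "alltypesagg,manynulls" prints alltypes. Wrapping both sides in commas keeps alltypes from being mistaken for a prefix match of alltypesagg.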
Customize hive-site.xml to set datanucleus.connectionPool.maxPoolSize
to 30 and hikaricp.connectionTimeout to 60000 ms. Also set hive.log.dir
to ${IMPALA_CLUSTER_LOGS_DIR}/hive.
Testing:
Repeatedly ran compute-table-stats.sh from a cold state and confirmed that
no error occurs. The following script does so from an active minicluster:
cd $IMPALA_HOME
./bin/start-impala-cluster.py --kill
./testdata/bin/kill-hive-server.sh
./testdata/bin/run-hive-server.sh
./bin/start-impala-cluster.py
./testdata/bin/compute-table-stats.sh > /tmp/compute-stats.txt 2>&1
grep error /tmp/compute-stats.txt
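When automating the check above, note that grep exits with status 1 when nothing matches, so a clean run would look like a failure to a caller using set -e. A minimal sketch of a wrapper with the inverted meaning follows; log_is_clean is a hypothetical helper, and the case-insensitive match is an assumption.

```shell
# Hypothetical helper: succeed only if the log at $1 is readable and
# contains no line matching "error" (case-insensitive by assumption).
# Returns 0 for a clean log, 1 if an error line is found, and 2 if the
# log cannot be read.
log_is_clean() {
  local log="$1"
  [ -r "${log}" ] || return 2
  if grep -qi 'error' "${log}"; then
    return 1
  fi
  return 0
}
```

A repeat loop around the steps above could then call log_is_clean /tmp/compute-stats.txt after each iteration and stop at the first non-zero status.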
Core tests ran and passed.
Change-Id: I1ebf02f95b957e7dda3a30622b87e8fca3197699
Reviewed-on: http://gerrit.cloudera.org:8080/22231
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
55 lines
2.3 KiB
Bash
Executable File
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# Runs compute table stats over a curated set of Impala test tables.
#
set -euo pipefail
. $IMPALA_HOME/bin/report_build_error.sh
setup_report_build_error

. ${IMPALA_HOME}/bin/impala-config.sh > /dev/null 2>&1

# TODO: We need a better way of managing how these get set. See IMPALA-4346
IMPALAD_HS2=${IMPALAD_HS2:-localhost:21050}

COMPUTE_STATS_SCRIPT="${IMPALA_HOME}/tests/util/compute_table_stats.py\
 --impalad=${IMPALAD_HS2}"

# Run compute stats over as many of the tables used in the Planner tests as possible.
${COMPUTE_STATS_SCRIPT} --db_names=functional\
  --table_names="alltypes,alltypesagg,alltypesaggmultifilesnopart,alltypesaggnonulls,
alltypessmall,alltypestiny,jointbl,dimtbl,stringpartitionkey,nulltable,nullrows,
date_tbl,chars_medium,part_strings_with_quotes,alltypes_date_partition,
alltypes_date_partition_2,mv1_alltypes_jointbl,binary_tbl,binary_tbl_big"
${COMPUTE_STATS_SCRIPT} --db_names=functional_parquet \
  --table_names="unique_with_nulls"

# We cannot load HBase on s3 and isilon yet.
if [ "${TARGET_FILESYSTEM}" = "hdfs" ]; then
  ${COMPUTE_STATS_SCRIPT} --db_names=functional_hbase\
    --table_names="alltypessmall,stringids"
fi
${COMPUTE_STATS_SCRIPT} --db_names=tpch,tpch_parquet,tpch_orc_def \
  --table_names=customer,lineitem,nation,orders,part,partsupp,region,supplier
${COMPUTE_STATS_SCRIPT} --db_names="tpch_nested_parquet,tpch_kudu,tpcds,tpcds_parquet,\
tpcds_partitioned_parquet_snap"
${COMPUTE_STATS_SCRIPT} --db_names=functional_kudu \
  --exclude_table_names="alltypesagg,manynulls"
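
The --table_names value for the functional database spans several lines inside one quoted string, so the argument reaches compute_table_stats.py with embedded newlines. How the Python script trims that whitespace is not shown here. As a sketch under that caveat, a caller could normalize such a list in shell before passing it on; normalize_list is a hypothetical helper, not part of the Impala tree.

```shell
# Hypothetical helper: remove all whitespace (spaces, tabs, newlines)
# from a comma-separated list so it becomes one clean token.
normalize_list() {
  printf '%s' "$1" | tr -d '[:space:]'
}

# Illustrative multi-line list in the same style as the script above.
TABLES="alltypes,alltypesagg,
alltypessmall,alltypestiny"
NORMALIZED_TABLES="$(normalize_list "${TABLES}")"
```

After this runs, NORMALIZED_TABLES holds alltypes,alltypesagg,alltypessmall,alltypestiny on a single line, which could be passed as --table_names="${NORMALIZED_TABLES}".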