IMPALA-11306: Create symlink for dataset of scale factor 1

single_node_perf_run.py and load-data.py can fail if the user sets the
scale factor argument to 1. This is because
generate-schema-statements.py inserts the scale factor into the
database name (i.e., "tpch1"), but the preload script omits the scale
factor when creating the dataset directory (i.e., "tpch"). This patch
fixes the issue by additionally creating a symlink when the scale
factor is 1.
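
The mismatch and the fix can be reproduced in isolation. The following is a
minimal sketch, not the actual preload script: WORK_DIR, EXPECTED_DIR, and
the lineitem.tbl placeholder file are hypothetical names introduced for
illustration, and it assumes the loader looks for a directory whose name
carries the scale factor suffix.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for the variables used by the real scripts.
SCALE_FACTOR=1
WORK_DIR="$(mktemp -d)"
TPC_H_DATA="${WORK_DIR}/tpch"                 # directory created without the suffix
EXPECTED_DIR="${TPC_H_DATA}${SCALE_FACTOR}"   # suffixed name, e.g. ".../tpch1"

# Simulate the preload step: the data lands in the unsuffixed directory.
mkdir -p "${TPC_H_DATA}"
touch "${TPC_H_DATA}/lineitem.tbl"

# Before the fix, the suffixed path does not exist.
if [ ! -e "${EXPECTED_DIR}" ]; then
  echo "missing before fix: ${EXPECTED_DIR}"
fi

# The fix: for scale factor 1, point the suffixed name at the real directory.
# rm -rf on the link path removes a stale link or directory, not the target.
if [ "${SCALE_FACTOR}" -eq 1 ]; then
  rm -rf "${EXPECTED_DIR}"
  ln -s "${TPC_H_DATA}" "${EXPECTED_DIR}"
fi

# The data is now reachable through the suffixed name.
ls "${EXPECTED_DIR}/"
```

Quoting the variable expansions is a defensive choice in this sketch so that
paths containing spaces still work; the patch itself leaves them unquoted.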

Testing:
- Tested manually by running the following command:
  ./bin/load-data.py --scale_factor=1 --workloads=targeted-perf \
    --table_formats=text/none/none

Change-Id: I76c9c90b243df6213626e11652cfed59643aed2c
Reviewed-on: http://gerrit.cloudera.org:8080/18545
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Riza Suminto
2022-05-19 15:08:12 -07:00
committed by Impala Public Jenkins
parent b58966b983
commit ad915ca58e
2 changed files with 12 additions and 0 deletions

@@ -42,6 +42,12 @@ echo "Generating TPC-DS data into ${TPC_DS_DATA}"
# Delete any preexisting data or symlinks
rm -rf ${TPC_DS_DATA}
mkdir -p ${TPC_DS_DATA}
# Create symlink if scale factor is 1
if [ ${SCALE_FACTOR} -eq 1 ]
then
  rm -rf ${TPC_DS_DATA}${SCALE_FACTOR}
  ln -s ${TPC_DS_DATA} ${TPC_DS_DATA}${SCALE_FACTOR}
fi
cd ${TPC_DS_DATA}
# dsdgen uses fixed size buffers that cause bizarre issues if the path to the

@@ -50,6 +50,12 @@ echo "Generating TPC-H data into ${TPC_H_DATA}"
chmod +w ${TPC_H_DATA} || true
rm -rf ${TPC_H_DATA}
mkdir -p ${TPC_H_DATA}
# Create symlink if scale factor is 1
if [ ${SCALE_FACTOR} -eq 1 ]
then
  rm -rf ${TPC_H_DATA}${SCALE_FACTOR}
  ln -s ${TPC_H_DATA} ${TPC_H_DATA}${SCALE_FACTOR}
fi
cd ${TPC_H_DATA}
if [ -t 1 ]