Files
impala/tests/util/parse_util.py
Taras Bobrovytsky 2159beee89 IMPALA-4467: Add support for DML statements in stress test
- Add support for insert, upsert, update and and delete statements.
- Add support for compute stats with mt_dop query options.
- Update impyla version in order to be able to have access to query
  error text for DML queries.
- Made flake8 fixes. flake8 on this file is clean.

For every Kudu table in the databases, we make a copy and add a
'_original' suffix to the table name. The DML queries will only make
modifications to the non original table, the original table will never
be modified. The orignal tables could be used to bring the non-original
table to the inital state. Two flags were added for doing this:
--reset-databases-before-binary-search and
--reset-databases-after-binary-search.

The DML queries are generated based on the mod values passed in with the
following flag: --dml-mod-values 11 13 17. For each mod value 4 DML
queries are generated. The DML operations will touch table rows where
primary_key % mod_value = 0. So, the larger the mod value, the more rows
would be affected. The DML queries are generated in such a way that the
data for the insert, upsert, and update queries is taken from the table
with the _original suffix. The stress test generates DML queries for
only kudu databases. For example, --tpch-kudu-db=tpch_100_kudu
--tpch-db=tpch_100 --generate-dml-queries would only generate queries
for the tpch_100_kudu database.

Here's an example of a full call with the new options that runs the
stress test on the local mini cluster:
./concurrent_select.py \
    --tpch-kudu-db=tpch_kudu \
    --generate-dml-queries \
    --dml-mod-values 11 13 17 \
    --generate-compute-stats-queries \
    --select-probability=0.5 \
    --mem-limit-padding-pct=25 \
    --mem-limit-padding-abs=50 \
    --reset-databases-before-binary-search \
    --reset-databases-after-binary-search

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Reviewed-on: http://gerrit.cloudera.org:8080/5093
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins
2016-12-20 01:33:01 +00:00

71 lines
2.1 KiB
Python

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
import re
from datetime import datetime
NEW_GLOG_ENTRY_PATTERN = re.compile(r"[IWEF](?P<Time>\d{4} \d{2}:\d{2}:\d{2}\.\d{6}).*")
def parse_glog(text, start_time=None):
'''Parses the log 'text' and returns a list of log entries. If a 'start_time' is
provided only log entries that are after the time will be returned.
'''
year = datetime.now().year
found_start = False
log = list()
entry = None
for line in text.splitlines():
if not found_start:
found_start = line.startswith("Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu")
continue
match = NEW_GLOG_ENTRY_PATTERN.match(line)
if match:
if entry:
log.append("\n".join(entry))
if not start_time or start_time <= datetime.strptime(
match.group("Time"), "%m%d %H:%M:%S.%f").replace(year):
entry = [line]
else:
entry = None
elif entry:
entry.append(line)
if entry:
log.append("\n".join(entry))
return log
def parse_mem_to_mb(mem, units):
mem = float(mem)
if mem <= 0:
return
units = units.strip().upper() if units else ""
if units.endswith("B"):
units = units[:-1]
if not units:
mem /= 10 ** 6
elif units == "K":
mem /= 10 ** 3
elif units == "M":
pass
elif units == "G":
mem *= 10 ** 3
elif units == "T":
mem *= 10 ** 6
else:
raise Exception('Unexpected memory unit "%s"' % units)
return int(mem)