impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 18:12:08 -05:00

Files

Michael Smith 166b39547e IMPALA-14553: Run schema eval concurrently

The majority of time spent in generate-schema-statements.py is in
eval_section for schema operations that shell out, often uploading files
via the hadoop CLI or generating data files. These operations should be
independent.

Runs eval_section at the beginning so we don't repeat it for each row in
test_vectors, and executes them in parallel via a ThreadPool. Defaults
to NUM_CONCURRENT_TESTS threads because the underlying operations have
some concurrency to them (such as HDFS mirroring writes).

Also collects existing tables into a set to optimize lookup.

Reduces generate-schema-statements by ~60%, from 2m30s to 1m. Confirmed
that contents of logs/data_loading/sql/functional are identical.

Change-Id: I2a78d05fd6a0005c83561978713237da2dde6af2
Reviewed-on: http://gerrit.cloudera.org:8080/23627
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>

2025-11-17 16:34:22 +00:00