This commit improves the performance of DDL statements on tables with
large number of partitions. Previously, the catalog would force-reload
the entire table metadata during the execution of DDL and insert
statements, causing significant delays for tables with large number of
partitions. With this commit the catalog is reusing any cached table
entries to partially reload table metadata for only those partitions
that have been modified. With this change we've improved the performance
of some DDL and insert statements by at least 4-5X.
This commit also adds basic table-level locking to protect table
metadata from concurrent DDL operations.
Preliminary performance measurements
-----------------------------------
Workload: insert into table partition () select ... limit 10
Iterations: 10
Num partitions OLD (avg time sec) NEW (avg time sec)
1K 1.15 0.45
5K 3.65 0.9
10K 5.75 1.38
15K 10.1 2.02
30K 25.4 4.46
Workload: alter table partition() set location...
Iterations: 10
Num partitions OLD (avg time sec) NEW (avg time sec)
1K 0.8 0.47
5K 4.3 0.71
10K 7.1 1.2
15K 13.2 1.8
30K 26.8 3.4
Change-Id: I4da7fb6df0a71162b0cb60e6025a4019cb9572bf
Reviewed-on: http://gerrit.cloudera.org:8080/1706
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins