This commit is the first step in improving the performance of partition
pruning. Currently, Impala can prune approximately 10K partitions per
sec, thereby introducing significant overhead for huge table with a
large number of partitions. With this commit we reduce that overhead by
3X by batching the partition pruning calls to the backend.
Change-Id: I3303bfc7fb6fe014790f58a5263adeea94d0fe7d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2608
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2687
When updating partition metadata as part of COMPUTE STATS we would previously
attempt to update all partitions at once. This could lead to HMS socket timeouts
and also could run into issues if there were > 32K partitions.
In this change we now update the partitions in batches, with a max size of 500
partitions per batch. We also compare whether the row count has changed and only
update partitions that have been modified.
Change-Id: If7bfcc30f86fc2fdd79855b981067ac29a47b5e1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1913
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1918
This updates how Impala fetches partition metadata from the Hive Metastore to fetch
partitions in batches, rather than all at once. This helps reduce the load on the
HMS and also lets Impala scale to above 32K partitions. The downside is that it
may require additional RPCs to get all the partitions.
This is done by first querying the metastore to get all the partition names that
exist, then splitting the list of names into seperate batches to get the actual
partition metadata.
Impala uses a default size of 1000 partitions per batch, but it can be configured
by setting the 'hive.metastore.batch.retrieve.table.partition.max' parameter
in the hive-site.xml config file.
Change-Id: Ide0ec30ef8a9e00f79c26551aa8e5e7814c73034
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1662
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1698
When a HMS connection failed to open, an unchecked exception was being thrown. This
wasn't getting handled properly and was causing the loading threads to die. This fixes
the problem by ensuring the loading threads catch all types of exceptions and also
fixes the TableLoader to return an IncompleteTable should a HMS connection failure
occur.
Change-Id: I3b696fd8ef12aa6749b602324dcdfe4d27c935ee
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1609
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1627
Adds a new client API for retrieving all user defined functions (aggregate and scalar)
in a database. This is a requirement from CM Backup Disaster and Recovery.
Change-Id: I4e33d714795fe808370262f36218ea112f67ec30
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1271
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins