Today, Impala populates the 'rawDataSize' property during COMPUTE STATS
for the purpose of extrapolating row counts based on file sizes.

After this patch, Impala populates 'totalSize' instead of
'rawDataSize'. 'rawDataSize' is no longer populated or used by Impala.

Intended meaning/use of tblproperties:
- 'rawDataSize' is the estimated in-memory size of a table
  (without encoding and compression)
- 'totalSize' represents the on-disk size

Using the fields correctly is important for compatibility with other
users of the HMS such as Hive and SparkSQL. For example, SparkSQL
relies on 'totalSize' for join ordering.

Testing:
- core/hdfs run passed

Change-Id: If7c2c4e1e99b297c849f9f0d18b2bef34ad811c6
Reviewed-on: http://gerrit.cloudera.org:8080/8110
Tested-by: Impala Public Jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
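
For illustration only, a minimal sketch of how a row count could be extrapolated from the on-disk size recorded in 'totalSize'. This is not Impala's implementation. The function name extrapolate_row_count and its parameters are hypothetical, and the linear scaling by file size is an assumption based on the description above. 'numRows' is the standard HMS row-count property.

```python
def extrapolate_row_count(tbl_properties, current_total_bytes):
    """Estimate the current row count from stats taken at COMPUTE STATS time.

    Hypothetical helper: assumes the row count scales linearly with the
    on-disk size, i.e. rows ~= numRows * current_bytes / totalSize.
    Returns None when the required properties are missing or unusable.
    """
    try:
        num_rows = int(tbl_properties["numRows"])
        total_size = int(tbl_properties["totalSize"])
    except (KeyError, ValueError):
        # Property absent or not an integer: no basis for extrapolation.
        return None
    if num_rows < 0 or total_size <= 0:
        # Negative or zero stats mean "unknown"; avoid dividing by zero.
        return None
    return round(num_rows * current_total_bytes / total_size)


# Stats recorded when the table was 50000 bytes on disk; it has since
# grown to 75000 bytes.
stats = {"numRows": "1000", "totalSize": "50000"}
print(extrapolate_row_count(stats, 75000))
```

A table that carries only 'rawDataSize' yields None in this sketch, because the in-memory size is not comparable to current file sizes.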