Today, Impala populates the 'rawDataSize' property
during COMPUTE STATS for the purpose of extrapolating
row counts based on file sizes.
After this patch, Impala populates 'totalSize' instead of
'rawDataSize'. Impala no longer populates or uses 'rawDataSize'.
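As a rough illustration of extrapolating row counts from file
sizes, the sketch below scales the row count recorded at
COMPUTE STATS time by the ratio of the current on-disk size to
the recorded size. The function and parameter names are
hypothetical and are not taken from the Impala code base; the
actual planner logic may differ.

```python
def extrapolate_row_count(num_rows_at_stats, size_at_stats, current_size):
    """Estimate the current row count from on-disk sizes.

    Hypothetical helper for illustration only:
    num_rows_at_stats -- row count recorded when stats were computed
    size_at_stats     -- on-disk size in bytes recorded at the same time
    current_size      -- current on-disk size in bytes
    Returns -1 when the recorded stats cannot be used.
    """
    if num_rows_at_stats < 0 or size_at_stats <= 0:
        return -1  # no usable stats, so the row count is unknown
    rows_per_byte = num_rows_at_stats / size_at_stats
    return round(rows_per_byte * current_size)


if __name__ == "__main__":
    # Table had 1000 rows in 4000 bytes at stats time and is now 6000 bytes.
    print(extrapolate_row_count(1000, 4000, 6000))
```

The estimate is only meaningful if both sizes are measured the
same way, which is why the recorded value should be the on-disk
size rather than an in-memory estimate.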
Intended meaning/use of tblproperties:
- 'rawDataSize' is the estimated in-memory size of a table
(without encoding and compression)
- 'totalSize' represents the on-disk size
Using the fields correctly is important for compatibility
with other users of the HMS such as Hive and SparkSQL.
For example, SparkSQL relies on 'totalSize' for
join ordering.
Testing:
- core/hdfs run passed
Change-Id: If7c2c4e1e99b297c849f9f0d18b2bef34ad811c6
Reviewed-on: http://gerrit.cloudera.org:8080/8110
Tested-by: Impala Public Jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>