impala/testdata/workloads/tpch/tpch_core.csv at eaefbb90cefc65fc7a0520b8a18f8bf4d03d99ad

mirror of https://github.com/apache/impala.git synced 2025-12-30 03:01:44 -05:00

Bikramjeet Vig 36b4ea6f65 IMPALA-1683: Allow REFRESH on a single partition

Previously, the only way to refresh metadata for a partition was to refresh
the whole table. This is relatively time-consuming, especially if
there are many partitions and only one needs to be refreshed.
This patch allows the client to REFRESH a single partition by using the
following syntax:
REFRESH [database_name.]table_name PARTITION (partition_spec)
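
As an illustration of the syntax above, here is a minimal Python sketch that assembles such a statement from a partition spec. The helper name `build_refresh_partition`, the table `sales`, and the partition columns are hypothetical and not part of the patch; the quoting is deliberately simplistic and should not be used with untrusted input.

```python
def build_refresh_partition(table, partition_spec, database=None):
    """Build a REFRESH ... PARTITION (...) statement string.

    partition_spec maps partition column names to values (hypothetical helper).
    """
    if not partition_spec:
        raise ValueError("partition_spec must name at least one partition column")

    def render(value):
        # Quote strings with single quotes; leave other values (e.g. ints) bare.
        if isinstance(value, str):
            return "'" + value.replace("'", "\\'") + "'"
        return str(value)

    spec = ", ".join(
        "{}={}".format(col, render(val)) for col, val in partition_spec.items()
    )
    qualified = "{}.{}".format(database, table) if database else table
    return "REFRESH {} PARTITION ({})".format(qualified, spec)


# Hypothetical table partitioned by year and month.
print(build_refresh_partition("sales", {"year": 2016, "month": 7}, database="db"))
```

The resulting string could then be submitted through whatever client is in use (for example impala-shell).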

Testing:
Added parsing and authorization tests in ParserTest.java and
AuthorizationTest.java respectively. A new test file
"test_refresh_partition.py" was added for testing functionality.

Performance:
For a table with 10000 partitions and 1 file per partition:

                     execResetMetadata()       Total Execution Time
Refresh Table              3795 ms                   4630 ms
Refresh Partition            42 ms                    680 ms

The time spent in execResetMetadata() improves by a factor of about 90x, but
because of an overhead of about 640 ms in this case (680 ms total minus 42 ms
in execResetMetadata()), the end-to-end improvement is about 7x. As the size
of the table and the number of partitions increase, this improvement would be
more significant.
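
The ratios quoted above follow directly from the measurements in the table. A short Python check of the arithmetic (the variable names are chosen here for illustration only):

```python
# Measurements from the table above, in milliseconds.
reset_table_ms, total_table_ms = 3795, 4630
reset_partition_ms, total_partition_ms = 42, 680

# Speedup of the metadata reset alone versus the whole statement.
reset_speedup = reset_table_ms / reset_partition_ms
total_speedup = total_table_ms / total_partition_ms

# Time spent outside execResetMetadata() when refreshing one partition.
overhead_partition_ms = total_partition_ms - reset_partition_ms

print("reset speedup: {:.1f}x".format(reset_speedup))
print("total speedup: {:.1f}x".format(total_speedup))
print("overhead: {} ms".format(overhead_partition_ms))
```

The overhead comes to 638 ms, which the message rounds to about 640 ms.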

Change-Id: Ia9aa25d190ada367fbebaca47ae8b2cafbea16fb
Reviewed-on: http://gerrit.cloudera.org:8080/3813
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins

2016-07-29 23:57:50 +00:00

658 B

# Manually created file.
file_format:text, dataset:tpch, compression_codec:none, compression_type:none
file_format:text, dataset:tpch, compression_codec:gzip, compression_type:block
file_format:seq, dataset:tpch, compression_codec:gzip, compression_type:block
file_format:seq, dataset:tpch, compression_codec:snap, compression_type:block
file_format:rc, dataset:tpch, compression_codec:none, compression_type:none
file_format:avro, dataset:tpch, compression_codec: none, compression_type: none
file_format:avro, dataset:tpch, compression_codec: snap, compression_type: block
file_format:parquet, dataset:tpch, compression_codec: none, compression_type: none
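
Each non-comment line of the file is a comma-separated list of `key:value` pairs, with inconsistent spacing after the colon on some lines. The following is a minimal sketch of how such lines could be read into dictionaries. It is not Impala's actual loader; the function name `parse_test_vectors` is an assumption made for this example.

```python
def parse_test_vectors(text):
    """Parse 'key:value, key:value' lines into a list of dicts.

    Blank lines and lines starting with '#' are skipped. Whitespace around
    keys and values is stripped, so 'compression_codec: snap' and
    'compression_codec:snap' parse the same way.
    """
    vectors = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        vector = {}
        for field in line.split(","):
            key, _, value = field.partition(":")
            vector[key.strip()] = value.strip()
        vectors.append(vector)
    return vectors


SAMPLE = """# Manually created file.
file_format:text, dataset:tpch, compression_codec:none, compression_type:none
file_format:avro, dataset:tpch, compression_codec: snap, compression_type: block
"""

for vector in parse_test_vectors(SAMPLE):
    print(vector["file_format"], vector["compression_codec"], vector["compression_type"])
```

Stripping whitespace on both sides of the colon means the differing spacing in the last lines of the file does not affect the parsed values.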