Taras Bobrovytsky
529a5f99b9
IMPALA-4787: Optimize APPX_MEDIAN() memory usage
Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.
This patch fixes this by initially allocating memory for only 16 elements.
Once the buffer becomes full, we allocate a new buffer with double the
capacity and copy the contents of the old buffer into it. We continue
doubling the buffer size until the buffer has room for 20,000 elements,
as before.
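The growth policy described above can be sketched in C++ roughly as follows. This is a hypothetical illustration, not Impala's actual code. The class name `GrowingReservoir` is assumed, and so is clamping the final doubling to exactly 20,000 elements. Once the reservoir is full, the sketch replaces elements with standard reservoir sampling (Algorithm R), which is also an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <random>
#include <utility>

// Hypothetical sketch: a per-group reservoir whose buffer starts at 16
// elements and doubles on demand, up to a 20,000-element cap (assumed clamp).
template <typename T>
class GrowingReservoir {
 public:
  static constexpr std::size_t kInitialCapacity = 16;
  static constexpr std::size_t kMaxCapacity = 20000;

  explicit GrowingReservoir(std::uint64_t seed = 42)
      : buf_(new T[kInitialCapacity]), capacity_(kInitialCapacity), rng_(seed) {}

  // Adds one input value to the sample.
  void Update(const T& v) {
    ++seen_;
    if (size_ < kMaxCapacity) {
      // Still filling: grow the buffer only when it is full.
      if (size_ == capacity_) Grow();
      buf_[size_++] = v;
      return;
    }
    // Reservoir full: Algorithm R keeps each value with probability k/seen_
    // (assumed replacement policy, not taken from the commit).
    std::uniform_int_distribution<std::uint64_t> dist(0, seen_ - 1);
    std::uint64_t j = dist(rng_);
    if (j < kMaxCapacity) buf_[j] = v;
  }

  std::size_t size() const { return size_; }
  std::size_t capacity() const { return capacity_; }
  int resizes() const { return resizes_; }

 private:
  // Allocates a buffer with double the capacity (clamped to the cap) and
  // copies the existing elements into it.
  void Grow() {
    assert(size_ == capacity_);
    std::size_t doubled = capacity_ * 2;
    std::size_t new_cap = doubled < kMaxCapacity ? doubled : kMaxCapacity;
    std::unique_ptr<T[]> bigger(new T[new_cap]);
    std::copy(buf_.get(), buf_.get() + size_, bigger.get());
    buf_ = std::move(bigger);
    capacity_ = new_cap;
    ++resizes_;
  }

  std::unique_ptr<T[]> buf_;
  std::size_t capacity_;
  std::size_t size_ = 0;
  std::uint64_t seen_ = 0;
  int resizes_ = 0;
  std::mt19937_64 rng_;
};
```

Under these assumptions, a group with about 12 values (as in the memory benchmark) never leaves its initial 16-element buffer. A group with about 35,000 values (as in the perf benchmark) grows 11 times (16, 32, ..., 16,384, then 20,000) before replacement sampling takes over.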
Testing:
Added some EE APPX_MEDIAN() tests on larger datasets that exercise the
resize code path.
Perf Benchmark (about 35,000 elements per bucket):
SELECT MAX(a) from (
SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t
BEFORE: 11s067ms
Operator      #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem  Est. Peak Mem  Detail
-------------------------------------------------------------------------------------------------------------------------
06:AGGREGATE  1       124.726us  124.726us  1        1           28.00 KB  -1.00 B        FINALIZE
05:EXCHANGE   1       29.544us   29.544us   3        1           0         -1.00 B        UNPARTITIONED
02:AGGREGATE  3       86.406us   120.372us  3        1           44.00 KB  10.00 MB
04:AGGREGATE  3       1s840ms    2s824ms    2.00K    -1          1.02 GB   128.00 MB      FINALIZE
03:EXCHANGE   3       1s163ms    1s989ms    6.00K    -1          0         0              HASH(c1)
01:AGGREGATE  3       3s356ms    3s416ms    6.00K    -1          1.95 GB   128.00 MB      STREAMING
00:SCAN HDFS  3       64.962ms   65.490ms   65.54M   -1          25.97 MB  64.00 MB       tpcds_10_parquet.benchmark
AFTER: 9s465ms
Operator      #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------
06:AGGREGATE  1       73.961us   73.961us   1        1           28.00 KB  -1.00 B        FINALIZE
05:EXCHANGE   1       18.101us   18.101us   3        1           0         -1.00 B        UNPARTITIONED
02:AGGREGATE  3       75.795us   83.969us   3        1           44.00 KB  10.00 MB
04:AGGREGATE  3       1s608ms    2s683ms    2.00K    -1          1.02 GB   128.00 MB      FINALIZE
03:EXCHANGE   3       826.683ms  1s322ms    6.00K    -1          0         0              HASH(c1)
01:AGGREGATE  3       2s457ms    2s672ms    6.00K    -1          3.14 GB   128.00 MB      STREAMING
00:SCAN HDFS  3       81.514ms   89.056ms   65.54M   -1          25.94 MB  64.00 MB       tpcds_10_parquet.benchmark
Memory Benchmark (about 12 elements per bucket):
SELECT MAX(a) FROM (
SELECT ss_customer_sk, APPX_MEDIAN(ss_sold_date_sk) as a
FROM tpcds_parquet.store_sales
GROUP BY ss_customer_sk) t
BEFORE: 7s477ms
Operator      #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem  Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------------
06:AGGREGATE  1       114.686us  114.686us  1        1           28.00 KB  -1.00 B        FINALIZE
05:EXCHANGE   1       18.214us   18.214us   3        1           0         -1.00 B        UNPARTITIONED
02:AGGREGATE  3       147.055us  165.464us  3        1           28.00 KB  10.00 MB
04:AGGREGATE  3       2s043ms    2s147ms    14.82K   -1          4.94 GB   128.00 MB      FINALIZE
03:EXCHANGE   3       840.528ms  943.254ms  15.61K   -1          0         0              HASH(ss_customer_sk)
01:AGGREGATE  3       1s769ms    1s869ms    15.61K   -1          5.32 GB   128.00 MB      STREAMING
00:SCAN HDFS  3       17.941ms   37.109ms   183.59K  -1          1.94 MB   16.00 MB       tpcds_parquet.store_sales
AFTER: 434ms
Operator      #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem  Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------------
06:AGGREGATE  1       125.915us  125.915us  1        1           28.00 KB  -1.00 B        FINALIZE
05:EXCHANGE   1       72.179us   72.179us   3        1           0         -1.00 B        UNPARTITIONED
02:AGGREGATE  3       79.054us   83.385us   3        1           28.00 KB  10.00 MB
04:AGGREGATE  3       6.559ms    7.669ms    14.82K   -1          17.32 MB  128.00 MB      FINALIZE
03:EXCHANGE   3       67.370us   85.068us   15.60K   -1          0         0              HASH(ss_customer_sk)
01:AGGREGATE  3       19.245ms   24.472ms   15.60K   -1          9.48 MB   128.00 MB      STREAMING
00:SCAN HDFS  3       53.173ms   55.844ms   183.59K  -1          1.18 MB   16.00 MB       tpcds_parquet.store_sales
Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Reviewed-on: http://gerrit.cloudera.org:8080/6025
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-16 05:59:40 +00:00