mirror of
https://github.com/apache/impala.git
synced 2025-12-31 15:00:10 -05:00
We were previously wasting memory by always reading into 8MB IO buffers, even when the data read was much less than 8MB. With this patch, the IO manager picks a buffer size closer to the actual amount being read (we don't use the exact size so we can continue to recycle buffers). The minimum IO buffer size is determined via the --min_buffer_size flag, and the max IO buffer size via the --read_size flag. This technique also helps with IMPALA-652, since short columns will not use as much memory as before (we will not use considerably more memory than the size of the table). This patch also changes StringBuffer to use a doubling strategy so it doesn't end up allocating many large unused buffers, and has the scanner context use the requested length as the sync read size if it's larger than the size produced by read_past_size_cb(). These changes help prevent the boundary buffer in the scanner context from allocating excess memory. Change-Id: I0efb3b023ddfddb08bca22d5cb5f9511fb4d6c50 Reviewed-on: http://gerrit.ent.cloudera.com:8080/938 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins
10 lines
119 B
Plaintext
10 lines
119 B
Plaintext
====
|
|
---- QUERY
|
|
# string_col is 10MB
|
|
select length(string_col) from widerow;
|
|
---- RESULTS
|
|
10485760
|
|
---- TYPES
|
|
INT
|
|
====
|