mirror of
https://github.com/apache/impala.git
synced 2026-01-01 00:00:20 -05:00
This is an experimental implementation of external sorting. This patch includes the following additions:
(1) creation and implementation of the Sorter interface, which can sort Impala Tuples,
(2) normalization of Tuples to allow memcmp-able sorting.
(3) a testing framework for the Sorter,
(4) a benchmark to compare the current state of the Sorter with other sorts,
(5) an implementation of a Vector which can store data whose size is only known at runtime,
(6) a sorting algorithm (basically a dumbed-down STL sort) which can operate over such a vector,
(7) implementation of a simple in-memory Merger, and
(8) logic to stream blocks of data in and out of memory for the actual external merging.
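As an illustration of item (2), here is a minimal Python sketch of what "memcmp-able" normalization can look like. It is not the patch's code: the function names, the two-column key layout, and the fixed 8-byte string width are all assumptions made for the example. The idea is to encode each column so that a plain byte-wise comparison of the encoded keys gives the same order as comparing the original values.

```python
import struct


def normalize_int32(value: int) -> bytes:
    # Shift the signed 32-bit range onto the unsigned range (equivalent to
    # flipping the sign bit), then store big-endian so that byte-wise
    # comparison matches numeric order.
    return struct.pack(">I", value + (1 << 31))


def normalize_string(value: str, width: int) -> bytes:
    # Hypothetical fixed-width encoding: truncate to `width` bytes and pad
    # with zero bytes. Strings that differ only after `width` bytes compare
    # equal here, so a real sorter would need a tie-break for that case.
    raw = value.encode("utf-8")[:width]
    return raw.ljust(width, b"\x00")


def normalize_key(int_col: int, string_col: str, width: int = 8) -> bytes:
    # Concatenate the per-column encodings in sort-key order.
    return normalize_int32(int_col) + normalize_string(string_col, width)


rows = [(2, "1"), (-5, "b"), (2, "0"), (10, "1"), (-5, "a")]
sorted_by_key = sorted(rows, key=lambda r: normalize_key(*r))
```

Python compares `bytes` objects lexicographically by unsigned byte value, which is the same order `memcmp` produces for equal-length buffers, so `sorted_by_key` stands in for a memcmp-based sort.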
I have a local branch for experimental optimizations and benchmarking -- this should be considered
a "basic", working sort.
The following optimizations have been implemented:
(i) Optionally extracting keys instead of writing them in place.
(ii) Optionally and opportunistically parallelizing run building (sorting and preparing for output).
(iii) Maximize disk IO and minimize buffer recycling by writing buffers out, but also keeping
them in memory until right when they're needed.
(iv) Prepare auxiliary data backwards so the buffers can be released as we go, and still
go out in an order which preserves the first buffers of the run.
(v) Always merge maximum number of runs at a time, taking from the next merge level if
available.
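One possible reading of optimization (v), sketched in Python under stated assumptions: sorted runs sit in a single queue, each merge step takes up to `max_fan_in` runs from the front, and the merged result goes to the back. Because results rejoin the same queue, a step that finds too few runs left at the current level fills its batch from the next level. The name `merge_runs`, the `max_fan_in` parameter, and the use of in-memory lists in place of disk-backed runs are all hypothetical; the actual Merger in the patch may be organized differently.

```python
import heapq
from collections import deque


def merge_runs(runs, max_fan_in):
    """Merge sorted runs, combining at most max_fan_in runs per step."""
    if max_fan_in < 2:
        raise ValueError("max_fan_in must be at least 2")
    pending = deque(list(run) for run in runs)
    if not pending:
        return []
    while len(pending) > 1:
        # Take as many runs as the fan-in allows. Runs produced by earlier
        # merges are already at the back of the queue, so a short level is
        # topped up from the next one.
        batch = [pending.popleft() for _ in range(min(max_fan_in, len(pending)))]
        # heapq.merge performs a k-way merge of already-sorted inputs.
        pending.append(list(heapq.merge(*batch)))
    return pending[0]


example_runs = [[1, 4, 9], [2, 3], [0, 8], [5], [6, 7]]
merged = merge_runs(example_runs, max_fan_in=3)
```

With five runs and a fan-in of 3, the first step merges three original runs and the second step merges the remaining two together with the result of the first.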
Change-Id: I1d7304d54d73152da929b1efffc1e851e5fb8fd4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/126
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Aaron Davidson <aaron.davidson@cloudera.com>
52 lines
1.1 KiB
Plaintext
====
---- QUERY
# Basic test on querying a view.
select count(int_col), count(bigint_col) from functional.alltypes_view
---- RESULTS
7300,7300
---- TYPES
BIGINT, BIGINT
====
---- QUERY
# Using views in union.
select bigint_col, string_col from functional.alltypes_view order by id limit 2
union all (select * from functional.complex_view) order by 1, 2 limit 10
---- RESULTS
0,'0'
2,'0'
2,'1'
10,'1'
---- TYPES
BIGINT, STRING
====
---- QUERY
# Using a view in subquery.
select t.* from (select * from functional.complex_view) t
order by t.abc limit 10;
---- RESULTS
2,'1'
2,'0'
---- TYPES
BIGINT, STRING
====
---- QUERY
# Using multiple views in a join.
select count(*) from functional.alltypes_view t1, functional.alltypes_view_sub t2
where t1.id < 10 and t2.x < 5 and t1.id = t2.x
---- RESULTS
3650
---- TYPES
BIGINT
====
---- QUERY
# Self-join of a view to make sure the join op is properly set
# in the cloned view instances.
select count(*) from functional.alltypes_view t1
left outer join functional.alltypes_view t2 on t1.id+10 = t2.id
full outer join functional.alltypes_view t3 on t2.id+20 = t3.id
---- RESULTS
7330
---- TYPES
BIGINT
====