impala

mirror of https://github.com/apache/impala.git synced 2026-01-06 06:01:03 -05:00

Bikramjeet Vig 3d65f856f7 IMPALA-6621: Improve set lookup performance for in-predicate evaluation

Currently, when using the SET_LOOKUP strategy for in-predicates in Impala,
we use a std::set object for checking membership. This patch takes a
hybrid approach based on benchmarking results and uses boost::flat_set
for int, big int, and float datatypes and boost::unordered_set for the
rest (tiny int, small int, double, string, timestamp, decimal).
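
The per-type container choice described above can be sketched roughly as
follows. This is a minimal illustration, not the code from the patch: it
uses only the standard library, with a sorted std::vector plus binary
search standing in for boost::flat_set and std::unordered_set standing in
for boost::unordered_set. The names SortedVectorSet, HashSet, and
InListSet are hypothetical and do not come from the Impala source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a flat set: a sorted, de-duplicated vector
// searched with binary search. Assumes the values are totally ordered
// (for float, this means no NaN values in the list).
template <typename T>
class SortedVectorSet {
 public:
  explicit SortedVectorSet(std::vector<T> values) : values_(std::move(values)) {
    std::sort(values_.begin(), values_.end());
    values_.erase(std::unique(values_.begin(), values_.end()), values_.end());
    assert(std::is_sorted(values_.begin(), values_.end()));
  }
  bool Contains(const T& v) const {
    return std::binary_search(values_.begin(), values_.end(), v);
  }

 private:
  std::vector<T> values_;
};

// Hypothetical stand-in for the hash-based branch.
template <typename T>
class HashSet {
 public:
  explicit HashSet(const std::vector<T>& values)
      : values_(values.begin(), values.end()) {}
  bool Contains(const T& v) const { return values_.count(v) > 0; }

 private:
  std::unordered_set<T> values_;
};

// Compile-time selection mirroring the split in the commit message:
// int, big int, and float use the sorted container; everything else
// falls back to the hash-based one.
template <typename T> struct InListSet { using type = HashSet<T>; };
template <> struct InListSet<int32_t> { using type = SortedVectorSet<int32_t>; };
template <> struct InListSet<int64_t> { using type = SortedVectorSet<int64_t>; };
template <> struct InListSet<float> { using type = SortedVectorSet<float>; };

static_assert(std::is_same<InListSet<double>::type, HashSet<double>>::value,
              "double uses the hash-based set");
```

With this sketch, InListSet<int64_t>::type resolves to the sorted-vector
variant and InListSet<std::string>::type to the hash-based one, so callers
only ever invoke Contains().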

The intent of this change is to fix a performance regression introduced
by upgrading the toolchain to LLVM 5.0.1 (IMPALA-5980).

Performance:
Ran a query for each data type with a large in-predicate containing
500 elements on a single node with mt_dop set to 1.

+-----------+---------------+----------+---------------+----------+
| Data Type | LLVM 3 hybrid |  LLVM 3  | LLVM 5 hybrid |  LLVM 5  |
+-----------+---------------+----------+---------------+----------+
|           Table used: tpch100_parquet.lineitem                  |
+-----------+---------------+----------+---------------+----------+
| big int   | 17s782ms      | 13s941ms | 13s201ms      | 25s604ms |
| string    | 40s750ms      | 64s      | 40s723ms      | 73s      |
| decimal   | 13s929ms      | 22s272ms | 13s710ms      | 34s338ms |
| int       | 19s368ms      | 11s308ms | 9s169ms       | 15s254ms |
+-----------+---------------+----------+---------------+----------+
|           Table used: alltypes with 33638400 rows               |
+-----------+---------------+----------+---------------+----------+
| double    | 5s726ms       | 5s894ms  | 5s595ms       | 6s592ms  |
| small int | 4s776ms       | 5s057ms  | 4s740ms       | 5s358ms  |
| float     | 7s223ms       | 6s397ms  | 6s287ms       | 6s926ms  |
+-----------+---------------+----------+---------------+----------+
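
For readers who want to get a feel for the container trade-off outside of
Impala, a standalone micro-benchmark along the following lines may help.
It is only a sketch under stated assumptions: it probes a 500-element
list of 64-bit integers with std::set, a sorted std::vector, and
std::unordered_set, and it does not reproduce the query-level numbers in
the table above (no codegen, no Parquet scan, no Boost). All names in it
(TimeLookups, MakeValues, RunInListBenchmark) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>
#include <unordered_set>
#include <vector>

struct LookupResult {
  std::size_t hits;  // number of probes found in the list
  double millis;     // wall-clock time spent probing
};

// Times one membership function over all probe values.
template <typename ContainsFn>
LookupResult TimeLookups(const std::vector<int64_t>& probes, ContainsFn contains) {
  const auto start = std::chrono::steady_clock::now();
  std::size_t hits = 0;
  for (int64_t p : probes) hits += contains(p) ? 1 : 0;
  const std::chrono::duration<double, std::milli> elapsed =
      std::chrono::steady_clock::now() - start;
  return {hits, elapsed.count()};
}

// Deterministic pseudo-random values in [0, max_value].
std::vector<int64_t> MakeValues(std::size_t n, int64_t max_value, uint64_t seed) {
  std::mt19937_64 rng(seed);
  std::uniform_int_distribution<int64_t> dist(0, max_value);
  std::vector<int64_t> out(n);
  for (auto& v : out) v = dist(rng);
  return out;
}

struct BenchmarkReport {
  LookupResult tree;        // std::set
  LookupResult sorted_vec;  // sorted vector + binary search
  LookupResult hashed;      // std::unordered_set
};

BenchmarkReport RunInListBenchmark(std::size_t list_size, std::size_t num_probes) {
  const std::vector<int64_t> in_list = MakeValues(list_size, 10000, 1);
  const std::vector<int64_t> probes = MakeValues(num_probes, 10000, 2);

  const std::set<int64_t> tree(in_list.begin(), in_list.end());
  std::vector<int64_t> sorted_vec(in_list);
  std::sort(sorted_vec.begin(), sorted_vec.end());
  const std::unordered_set<int64_t> hashed(in_list.begin(), in_list.end());

  BenchmarkReport report{
      TimeLookups(probes, [&](int64_t v) { return tree.count(v) > 0; }),
      TimeLookups(probes, [&](int64_t v) {
        return std::binary_search(sorted_vec.begin(), sorted_vec.end(), v);
      }),
      TimeLookups(probes, [&](int64_t v) { return hashed.count(v) > 0; })};

  // All three containers must agree on membership.
  assert(report.tree.hits == report.sorted_vec.hits);
  assert(report.tree.hits == report.hashed.hits);
  return report;
}
```

Calling RunInListBenchmark(500, 10000000) and printing the three millis
fields gives a rough comparison; absolute timings depend heavily on the
compiler, optimization level, and hardware.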

Also added a targeted perf query that uses a large in-predicate
over a decimal column.

Testing:
- Ran expr-test and test_exprs successfully.

Change-Id: Ifd1627d779d10a16468cc3c2d0bc26a497e048df
Reviewed-on: http://gerrit.cloudera.org:8080/9570
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins

2018-03-21 00:40:10 +00:00

queries

IMPALA-6621: Improve set lookup performance for in-predicate evaluation

2018-03-21 00:40:10 +00:00

targeted-perf_core.csv

IMPALA-3200: more buffer pool end-to-end tests

2017-08-07 00:57:46 +00:00

targeted-perf_dimensions.csv

IMPALA-3200: more buffer pool end-to-end tests

2017-08-07 00:57:46 +00:00

targeted-perf_exhaustive.csv

IMPALA-3200: more buffer pool end-to-end tests