Mirror of https://github.com/apache/impala.git, synced 2026-01-06 06:01:03 -05:00
Currently, when the SET_LOOKUP strategy is used for IN predicates,
Impala checks membership with a std::set object. Based on benchmarking
results, this patch takes a hybrid approach: it uses
boost::container::flat_set for the int, big int, and float data types
and boost::unordered_set for the rest (tiny int, small int, double,
string, timestamp, decimal). The intent of this change is to fix a
regression seen when upgrading the toolchain to LLVM 5.0.1
(IMPALA-5980).

Performance:
Ran one query per data type with a large IN predicate containing 500
elements, on a single node with mt_dop set to 1.

+-----------+---------------+----------+---------------+----------+
| Data Type | LLVM 3 hybrid | LLVM 3   | LLVM 5 hybrid | LLVM 5   |
+-----------+---------------+----------+---------------+----------+
| Table used: tpch100_parquet.lineitem                            |
+-----------+---------------+----------+---------------+----------+
| big int   | 17s782ms      | 13s941ms | 13s201ms      | 25s604ms |
| string    | 40s750ms      | 64s      | 40s723ms      | 73s      |
| decimal   | 13s929ms      | 22s272ms | 13s710ms      | 34s338ms |
| int       | 19s368ms      | 11s308ms | 9s169ms       | 15s254ms |
+-----------+---------------+----------+---------------+----------+
| Table used: alltypes with 33638400 rows                         |
+-----------+---------------+----------+---------------+----------+
| double    | 5s726ms       | 5s894ms  | 5s595ms       | 6s592ms  |
| small int | 4s776ms       | 5s057ms  | 4s740ms       | 5s358ms  |
| float     | 7s223ms       | 6s397ms  | 6s287ms       | 6s926ms  |
+-----------+---------------+----------+---------------+----------+

Also added a targeted perf query that uses a large IN predicate over a
decimal column.

Testing:
- Ran expr-test and test_exprs successfully.

Change-Id: Ifd1627d779d10a16468cc3c2d0bc26a497e048df
Reviewed-on: http://gerrit.cloudera.org:8080/9570
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
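
Illustration (not the code from this patch): a minimal, standard-library-only sketch of the per-type set selection described in the message. It assumes a sorted std::vector with binary search as a stand-in for boost::container::flat_set and std::unordered_set as a stand-in for boost::unordered_set. The names SortedVectorSet, HashSet, UseSortedVector, and InPredicateSet are hypothetical, and the mapping of int, big int, and float to int32_t, int64_t, and float is an assumption made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <unordered_set>
#include <utility>
#include <vector>

// Sorted-vector set: a std-only stand-in for boost::container::flat_set.
// Elements live in contiguous memory and lookups are binary searches.
// Assumes the values have a strict weak ordering (e.g. no NaN floats).
template <typename T>
class SortedVectorSet {
 public:
  explicit SortedVectorSet(std::vector<T> values) : values_(std::move(values)) {
    std::sort(values_.begin(), values_.end());
    // Drop duplicates so the container behaves like a set.
    values_.erase(std::unique(values_.begin(), values_.end()), values_.end());
    assert(std::is_sorted(values_.begin(), values_.end()));
  }

  bool Contains(const T& v) const {
    return std::binary_search(values_.begin(), values_.end(), v);
  }

 private:
  std::vector<T> values_;
};

// Hash-based set: a stand-in for boost::unordered_set.
template <typename T>
class HashSet {
 public:
  explicit HashSet(const std::vector<T>& values)
      : values_(values.begin(), values.end()) {}

  bool Contains(const T& v) const { return values_.count(v) != 0; }

 private:
  std::unordered_set<T> values_;
};

// Hypothetical trait: true for the types the message assigns to the flat set
// (int, big int, float); every other type falls through to the hash set.
template <typename T>
struct UseSortedVector
    : std::integral_constant<bool, std::is_same<T, int32_t>::value ||
                                       std::is_same<T, int64_t>::value ||
                                       std::is_same<T, float>::value> {};

// Compile-time choice of the set implementation for a given value type.
template <typename T>
using InPredicateSet =
    typename std::conditional<UseSortedVector<T>::value, SortedVectorSet<T>,
                              HashSet<T>>::type;
```

With these definitions, InPredicateSet<int64_t> resolves to the sorted-vector set and InPredicateSet<std::string> to the hash set. Both expose the same Contains() call, so the caller that evaluates the IN predicate does not need to know which one it received.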