Mirror of https://github.com/apache/impala.git (synced 2026-01-27 06:10:53 -05:00)
Currently, we use a std::unordered_set<int64_t> for all numeric types (including the DATE type). This wastes space for small data types such as tinyint, smallint and int. This patch extends the base InListFilter class with native implementations for the different data types.

For string-type in-list filters, this patch uses impala::StringValue instead of std::string. This simplifies the Insert() method, which improves codegen time. To use impala::StringValue, this patch switches the set implementation to boost::unordered_set, which is also what InPredicate uses.

Another benefit of impala::StringValue is that the strings can easily be maintained in a MemPool. When a new batch of values is inserted, the new values go into a temporary set, and their string pointers still reference the original tuple values. At the end of each batch, MaterializeValues() is invoked to copy the strings into the filter's own MemPool. This is more memory-friendly than the original approach, because the strings of a whole batch can be allocated at once.

Tests:
 - Add unit tests for different types of in-list filters

Change-Id: Id434a542b2ced64efa3bfc974cb565b94a4193e9
Reviewed-on: http://gerrit.cloudera.org:8080/18433
Reviewed-by: Qifan Chen <qchen@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
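The idea behind the per-type native implementations can be sketched as a class template over the element type. This is a minimal sketch under assumptions, not Impala's code. NativeInListFilter and the TinyIntInListFilter-style aliases are hypothetical names, and std::unordered_set is used for brevity. The sketch only shows that each stored element keeps its native width instead of being widened to int64_t. The per-node overhead of the hash set itself is unchanged.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hypothetical sketch: an in-list filter that stores values in their native
// width T instead of widening every numeric type to int64_t.
template <typename T>
class NativeInListFilter {
 public:
  // Insert one value; duplicates are ignored by the set.
  void Insert(T v) { values_.insert(v); }

  // Returns true if the value is in the list, i.e. the row may pass.
  bool Find(T v) const { return values_.count(v) > 0; }

  std::size_t NumValues() const { return values_.size(); }

  // Payload bytes per element (excludes the hash set's per-node overhead).
  static constexpr std::size_t kElementBytes = sizeof(T);

 private:
  std::unordered_set<T> values_;
};

// Hypothetical aliases for the SQL integer types named in the message.
using TinyIntInListFilter = NativeInListFilter<std::int8_t>;
using SmallIntInListFilter = NativeInListFilter<std::int16_t>;
using IntInListFilter = NativeInListFilter<std::int32_t>;
using BigIntInListFilter = NativeInListFilter<std::int64_t>;
```

With this layout, a tinyint filter stores 1-byte keys where a shared int64_t set would store 8 bytes per key. The trade-off is one template instantiation per type instead of a single class.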
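The batch-wise string handling (a temporary set of borrowed pointers, then one copy per batch in MaterializeValues()) can be sketched as follows. Several stand-ins are assumptions, not Impala's types:
 - std::string_view stands in for impala::StringValue, since both are non-owning pointer/length pairs.
 - std::unordered_set stands in for boost::unordered_set.
 - SimplePool is a hypothetical stand-in for MemPool.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <string_view>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a MemPool: each Allocate() returns one
// contiguous chunk, and all chunks are freed when the pool is destroyed.
class SimplePool {
 public:
  char* Allocate(std::size_t bytes) {
    chunks_.push_back(std::make_unique<char[]>(bytes));
    return chunks_.back().get();
  }
  std::size_t NumChunks() const { return chunks_.size(); }

 private:
  std::vector<std::unique_ptr<char[]>> chunks_;
};

// Hypothetical string in-list filter holding non-owning string views.
class StringInListFilter {
 public:
  // Cheap insert: records only a view into the caller's (tuple) memory.
  void Insert(std::string_view v) {
    if (values_.count(v) == 0) pending_.insert(v);
  }

  // End of batch: copy every new string into the pool with a single
  // allocation, then publish the owned copies in values_.
  void MaterializeValues() {
    std::size_t total = 0;
    for (std::string_view v : pending_) total += v.size();
    char* dst = total > 0 ? pool_.Allocate(total) : nullptr;
    for (std::string_view v : pending_) {
      if (!v.empty()) std::memcpy(dst, v.data(), v.size());
      values_.insert(std::string_view(dst, v.size()));
      dst += v.size();
    }
    pending_.clear();
  }

  // Lookups only see values that have been materialized.
  bool Find(std::string_view v) const { return values_.count(v) > 0; }
  std::size_t NumValues() const { return values_.size(); }
  std::size_t NumPoolChunks() const { return pool_.NumChunks(); }

 private:
  SimplePool pool_;
  std::unordered_set<std::string_view> pending_;  // borrowed pointers
  std::unordered_set<std::string_view> values_;   // point into pool_
};
```

After MaterializeValues() returns, the caller's batch memory can be released, because every stored view now points into the filter's own pool. In this sketch, each batch costs at most one pool allocation, however many distinct strings it adds.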