A handful of fixes to codegen memory usage:
* Delete the IR module when we're done with it (it can be fairly large)
* Track the compiled code size (typically not that large, but it can add
up if there are many fragments).
* Estimate optimisation memory requirements and track them in the memory
tracker. The estimate is very crude, but much better than not tracking
this memory at all. A combined sketch of these memory fixes follows
this list.
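
A rough sketch of the memory fixes above. MemTracker's Consume()/Release()
calls are real Impala interfaces, but EstimateOptimizationMemory(),
OptimizeAndJit() and the member names are illustrative placeholders, not
the actual LlvmCodeGen code:

  // Crude estimate: assume the optimiser needs memory roughly
  // proportional to the size of the IR module (placeholder helper).
  int64_t est_bytes = EstimateOptimizationMemory(module_.get());
  mem_tracker_->Consume(est_bytes);
  // Placeholder: run the optimisation passes and JIT the module.
  int64_t code_size = OptimizeAndJit(module_.get());
  mem_tracker_->Release(est_bytes);
  // The IR module can be fairly large; free it once the JITed code exists.
  module_.reset();
  // The machine code is typically small, but many fragments add up.
  mem_tracker_->Consume(code_size);
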
A handful of fixes to improve codegen time/cost, particularly targeted
at compute stats workloads:
* Avoid over-inlining when there are many aggregate functions, conjuncts,
etc. by adding "NoInline" attributes; see the first sketch after this
list.
* Don't codegen non-grouping merge aggregations. They will only process
one row per Impala daemon, so codegen is not worth it.
* Make the HLL algorithm more efficient by specialising the hash function
based on the decimal width; see the second sketch after this list.
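
A minimal sketch of the NoInline change, assuming we hold llvm::Function
pointers for the codegen'd helpers. The threshold and container names are
illustrative; only llvm::Function::addFnAttr() and
llvm::Attribute::NoInline are real LLVM APIs:

  // With many aggregate functions or conjuncts, inlining every helper
  // into the caller makes optimisation time blow up. Past a threshold,
  // mark the helpers NoInline so they stay as separate functions.
  if (helper_fns.size() > kMaxInlinedHelpers) {  // illustrative threshold
    for (llvm::Function* fn : helper_fns) {
      fn->addFnAttr(llvm::Attribute::NoInline);
    }
  }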
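
And a sketch of the idea behind the HLL hash specialisation: switching on
the decimal's storage width lets each branch call the hash with a
compile-time-constant length, which codegen can then fully specialise.
HashBytes() is an illustrative stand-in for the actual hash helper:

  // Decimals are stored in 4, 8 or 16 bytes depending on precision.
  uint64_t HashDecimal(const void* value, int byte_width, uint64_t seed) {
    switch (byte_width) {
      case 4:  return HashBytes(value, 4, seed);
      case 8:  return HashBytes(value, 8, seed);
      case 16: return HashBytes(value, 16, seed);
      default: return HashBytes(value, byte_width, seed);
    }
  }
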
Limitations:
* This doesn't tackle over-inlining of large expr trees, but a similar
approach will be used there in a follow-on patch.
Perf:
Compute stats on functional_parquet.widetable_1000_cols goes from over a
minute of codegen to ~5s of codegen on my machine. Local perf runs of
TPC-H and targeted perf showed no regressions and some moderate
improvements (1-2%).
Also ran an experiment to understand the perf consequences of disabling
inlining. I manually set CODEGEN_INLINE_EXPRS_THRESHOLD to 0 and ran:
  drop stats tpch_20_parquet.lineitem;
  compute stats tpch_20_parquet.lineitem;
There was no difference in time spent in the agg node: 30.7s with
inlining vs. 30.5s without.
Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Reviewed-on: http://gerrit.cloudera.org:8080/4956
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins