Commit Graph

1125 Commits

Yongjun Zhang
dbe0f86e00 IMPALA-6442: Misleading file offset reporting in error messages.
The error message described in IMPALA-6442 incorrectly reported the file offset where the
Parquet footer starts, as if the offset were counted from the end of the file instead of
from the beginning. The fix changes the reported file offset to be counted from the
beginning of the Parquet file.

Testing:
Created a small table that contains one row of data with a single BIGINT column and
stored it as Parquet. Manually changed the footer size field to be
  1) smaller than the original footer size by 1, to trigger the error message fixed by
this JIRA and verify that the fix reports the offset correctly;
  2) bigger than the file size, to trigger another related error message.

Change-Id: I35235e99ea9ceb0d31961dd3b8069f7194f5a2de
Reviewed-on: http://gerrit.cloudera.org:8080/11379
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-10 16:09:41 +00:00
Fredy Wijaya
9934b473b7 IMPALA-6373: Allow primitive type widening on parquet tables
This patch implements support for primitive type widening on parquet
tables. Only conversions that incur no loss of precision are
supported:
- tinyint (INT32) -> smallint (INT32), int (INT32), bigint (INT64),
                     double (DOUBLE)
- smallint (INT32) -> int (INT32), bigint (INT64), double (DOUBLE)
- int (INT32) -> bigint (INT64), double (DOUBLE)
- float (FLOAT) -> double (DOUBLE)
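
A minimal sketch of the widening this enables (the table name is
illustrative, not from this patch):

    -- The declared column type can now be widened past the type the
    -- Parquet file was written with, without breaking the scan:
    CREATE TABLE widen_demo (c SMALLINT) STORED AS PARQUET;
    INSERT INTO widen_demo VALUES (42);
    ALTER TABLE widen_demo CHANGE c c BIGINT;
    SELECT c FROM widen_demo;  -- the INT32-encoded value reads back as BIGINT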

Testing:
- Added BE test
- Added E2E test
- Ran core tests

Change-Id: If93394b035c64cf6fc5f37b54d29c034cc1f86e4
Reviewed-on: http://gerrit.cloudera.org:8080/11268
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-23 15:55:53 +00:00
Zoltan Borok-Nagy
e27954a5aa IMPALA-6709: Simplify tests that copy local files to tables
We had quite a few tests that created a table and used
"hdfs dfs -copyFromLocal" to copy data files to the
warehouse directory for this table.

This operation needs some boilerplate code, which I
refactored into the new functions
create_table_from_parquet() and
create_table_and_copy_files().

Change-Id: Ie00a4561825facf8abe2e8e74a6b6e93194f416f
Reviewed-on: http://gerrit.cloudera.org:8080/11127
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-22 18:08:20 +00:00
Vincent Tran
3ff4cde772 IMPALA-6844: Fix possible NULL dereference in to_date() builtin
If result.ptr allocation fails for some reason inside the StringVal
constructor, we still overwrite result.len and continue.

This change checks that the StringVal pointer is not NULL before
dereferencing it, and returns NULL if it is.

Testing: Added a test case for the to_date() function to
alloc-fail-init.test to leverage the fault injection flag
--stress_fn_ctx_alloc.

Change-Id: I14cfb29a592885bb2f39958c8644f93db5220a68
Reviewed-on: http://gerrit.cloudera.org:8080/11286
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2018-08-21 22:06:02 +00:00
Tim Armstrong
bddd7def99 IMPALA-7465: fix test_kudu_scan_mem_usage
The issue was that the row batch queue could grow a lot if the consumer
was slow.

Also add an additional test to exercise the OOM code path in Kudu for
completeness.

Testing:
Added sleep to kudu-scan-node.cc that reproduced the problem.
Looped modified test to flush out flakiness.

Change-Id: Ic4a95b6b6d96a447df68ef4912a86f1e11f219ca
Reviewed-on: http://gerrit.cloudera.org:8080/11285
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-21 21:38:11 +00:00
Todd Lipcon
7c0450d6d4 IMPALA-7439. CREATE DATABASE creates a catalog entry with missing location
Previously, when creating a new database, the CatalogOpExecutor would
create an HMS Database object, issue the HMS createDatabase call, and
then create a Catalog entry from that same Database object. The
resulting Catalog entry would be missing certain fields that are
auto-created by the HMS itself, most importantly the location field.

The code for CTAS seems to have contained a workaround for this issue
ever since catalogd was first introduced: rather than using the location
stored in the Db object, it would re-fetch the Database from HMS.

Now that this is fixed, that workaround can be removed and some code
simplified.

A new test verifies that a newly-created database has the appropriate
location, and existing CTAS tests verify that functionality didn't
regress.
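
A minimal sketch of the behavior being verified (the database name is
illustrative):

    CREATE DATABASE loc_demo;
    -- The catalog entry now carries the HMS-assigned location, so the
    -- location field below should be populated:
    DESCRIBE DATABASE loc_demo;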

Change-Id: I13df31cee1e5768b073e0e35c4c16ebf1892be23
Reviewed-on: http://gerrit.cloudera.org:8080/11229
Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-17 00:54:28 +00:00
Tim Armstrong
7ccf736908 IMPALA-7096: restore scanner thread memory heuristics
This restores some of the heuristics removed in IMPALA-4835 that help
prevent scans from hitting OOM conditions. The heuristics are implemented
at the query level rather than in each scan node in isolation.

Introduce a ScannerMemLimiter class that belongs to the QueryState and
tracks the amount of memory estimated to be consumed by all scanner
threads running for the query on the current backend.

Also check soft memory limits to see if scanner threads should be
started or the current scanner thread should stop.

The long-term plan is to switch to the MT scan node implementations.
When that happens this code can be removed. In the meantime this
code is imperfect but will help avoid OOM in many scenarios.

Testing:
Added regression tests for HDFS and Kudu where we previously could
run out of memory with a low mem_limit.

Manual testing:
* Ran query tests with --thread_creation_fault_injection=true for a
  bit, confirmed no crashes.
* ran single-node stress test for Kudu and Parquet for 10-20 min each.

Change-Id: Ib9907fa8c4d2b0b85f67f4f160899c1c258ad82b
Reviewed-on: http://gerrit.cloudera.org:8080/11103
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-16 21:25:00 +00:00
Tim Armstrong
29905d1ea4 IMPALA-7442: reduce mem requirement of semi-joins-exhaustive
The test started running into IMPALA-7446, maybe because of
a timing change. This appears to have always been possible.

The fix is to reduce the memory requirement of the test.
IMPALA-2256 is no longer really possible because the
BufferedTupleStream code was simplified to avoid the 32-bit
row index limitation, so we're not losing important coverage
on the current code with this change.

Testing:
Ran test in a loop to confirm it did not OOM.

Change-Id: I9d9480cad6bf8222abe990e7046498a0531e2849
Reviewed-on: http://gerrit.cloudera.org:8080/11223
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-16 00:21:54 +00:00
Michael Brown
cddb35be92 IMPALA-7445: separate skippable tables from test_resource_limits
Separate the HBase and Kudu tables used in test_resource_limits into
their own files so they are skippable in environments where those
services are not used.

Change-Id: I02d9fd0b48817f755e1eee2ae4613e2089fa1973
Reviewed-on: http://gerrit.cloudera.org:8080/11221
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-15 04:15:46 +00:00
Tianyi Wang
3cb784310f IMPALA-7347: Ignore numFilesErasureCoded in TestShowCreateTable
This table property only exists for HDFS tables. To get the test to
work on local-filesystem tables, it needs to be ignored.

Change-Id: Icc8494fb91c4777cee662a97f750486aa8e79a8e
Reviewed-on: http://gerrit.cloudera.org:8080/11192
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-13 21:36:16 +00:00
Tim Armstrong
b7d509d761 IMPALA-7231: group plan nodes into pipelines
This adds some informational output to explain plans and
sends the information to the backend.

The idea is that this will make it easier to explain how Impala's
pipelined execution works and also enable future work on profile
analysis that can more intelligently group plan nodes.

Tests:
* Updated planner tests to include new output.

Change-Id: I1d10eb14d997242f445e5c5fc5362d5410370721
Reviewed-on: http://gerrit.cloudera.org:8080/10848
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-10 19:31:30 +00:00
Tim Armstrong
3e17705eca IMPALA-6034: Add scanned bytes limits per query
This adds support for aggregate resource limits at runtime, specified
via query options. If a query exceeds a limit it is terminated. The
checks are periodic so the query may go somewhat over the limits.

SCAN_BYTES_LIMIT is exposed as an advanced query option.

CPU_LIMIT_S is hidden as a development query option because it is flawed
- the CPU user/sys time is only updated upon thread completion, so in
many cases the limit will not take effect until well after the resources
have been used. IMPALA-7318 tracks enabling this.

The query profile is updated to include query-wide and per-backend metrics
for CPU and scanned bytes. Example from "select count(*) from
tpch_parquet.lineitem":

    Per Node Peak Memory Usage: tarmstrong-box:22000(289.50 KB) tarmstrong-box:22001(249.50 KB) tarmstrong-box:22002(249.50 KB)
    Per Node Bytes Read: tarmstrong-box:22000(100.00 KB) tarmstrong-box:22001(100.00 KB) tarmstrong-box:22002(100.00 KB)
    Per Node User Time: tarmstrong-box:22000(40.000ms) tarmstrong-box:22001(32.000ms) tarmstrong-box:22002(24.000ms)
    Per Node System Time: tarmstrong-box:22000(0.000ns) tarmstrong-box:22001(0.000ns) tarmstrong-box:22002(0.000ns)
     - FiltersReceived: 0 (0)
     - FinalizationTimer: 0.000ns
     - NumBackends: 3 (3)
     - NumFragmentInstances: 4 (4)
     - NumFragments: 2 (2)
     - TotalBytesRead: 300.00 KB (307200)
     - TotalCpuTime: 96.000ms
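
A minimal usage sketch (the limit value and its suffix syntax are
illustrative):

    SET SCAN_BYTES_LIMIT=100g;
    -- Any query in this session that scans more than ~100GB is
    -- terminated; the check is periodic, so it may overshoot slightly.
    SELECT count(*) FROM tpch_parquet.lineitem;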

Testing:
Added tests for various permutations of CPU_LIMIT_S and
SCAN_BYTES_LIMIT.

Based on a previous patch by Mostafa Mokhtar
<mmokhtar@cloudera.com>

Change-Id: I3e85f80b70b3fce47e637e9322ed0316ee84f6a9
Reviewed-on: http://gerrit.cloudera.org:8080/11081
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-09 22:18:01 +00:00
Todd Lipcon
d958b4779c IMPALA-7411, IMPALA-7414. Fix failing tests on local filesystem
The new test added by IMPALA-7308 failed on local-filesystem builds
because the warehouse path is not directly at /test-warehouse. This
fix prefixes the path appropriately with $FILESYSTEM_PREFIX.

Additionally, the fix for IMPALA-5542 made a similar mistake
constructing a path on the Python side of a test case. Fixed by using
the get_fs_path function.

Change-Id: I6922e24a74576d0d000e8e2645a235868583c1e1
Reviewed-on: http://gerrit.cloudera.org:8080/11164
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-09 19:18:08 +00:00
Tim Armstrong
a6c356850b IMPALA-7403: fix child batch mem mgmt in analytic
The core of the fix is in ProcessChildBatches(), where we copy
'prev_input_tuple_' to 'prev_input_tuple_pool_' and reset
the child batch *before* the call to child(0)->GetNext(). This
solves a couple of problems:
* prev_input_tuple_ may be referencing memory from the child
  that had the needs_deep_copy() flag set and therefore will
  be freed or recycled when calling child(0)->GetNext() again.
* 'prev_child_batch_' may have been holding onto resources that
  the child had flushed, which means they need to be freed before
  the next GetNext() call.

Also refactors the logic around child_cmp_row_ to make the variable
lifetime and data flow clearer.

Testing:
Add regression test. The test passes both with this patch alone and with
IMPALA-7333 reapplied.

Change-Id: I09eb6213d47287f2addb72f8c1304085d2d48c55
Reviewed-on: http://gerrit.cloudera.org:8080/11155
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-08 20:51:45 +00:00
Todd Lipcon
4aec50484a IMPALA-7308. Support Avro tables in LocalCatalog
This adds support for loading Avro-formatted tables in LocalCatalog. In
the case that the table properties indicate a table is Avro-formatted,
the semantics are identical to the existing catalog implementation:

- if an explicit avro schema is specified, it overrides the schema
  provided by the HMS
- if no explicit avro schema is specified, one is inferred, and then the
  inferred schema takes the place of the one provided by the HMS (thus
  promoting columns like TINYINT to INT)
- on COMPUTE STATS, if any discrepancy is discovered between the HMS
  schema and the inferred schema, an error is emitted.
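
A minimal sketch of the explicit-schema override (the table name and
Avro schema are illustrative):

    CREATE TABLE avro_demo (i INT) STORED AS AVRO TBLPROPERTIES (
      'avro.schema.literal'=
      '{"type":"record","name":"r","fields":[{"name":"i","type":"long"}]}');
    -- 'i' is reported as BIGINT: the explicit Avro schema overrides
    -- the HMS column type.
    DESCRIBE avro_demo;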

The semantics for LocalCatalog are slightly different in the case of
tables which have not been configured as Avro format on the table level:

The existing implementation has the behavior that, when a table is
loaded, all partitions are inspected, and, if any partition is
discovered with Avro format, the above rules are applied. This has some
very unexpected results, described in an earlier email to
dev@impala.apache.org [1]. To summarize that email thread, the existing
behavior was deemed unintuitive and inconsistent with Hive.
Additionally, this behavior requires loading all partitions up-front,
which gets in the way of the goal of lazy/granular metadata loading in
LocalCatalog.

Thus, the LocalCatalog implementation differs as follows:

- the "schema override" behavior ONLY occurs if the Avro file format has
  been selected at a table level.

- if an Avro partition is added to a non-Avro table, and that partition
  has a schema that isn't compatible with the table's schema, an error
  will occur on read.

The thread additionally discusses adding an error message on "alter" to
prevent users from adding an Avro partition to a table with an
incompatible schema. To keep the scope of this patch minimal, that is
not yet implemented here. I filed IMPALA-7309 to change the behavior of
the existing catalog implementation to match.

A new test verifies the behavior, set to 'xfail' when running on the
existing catalog implementation.

[1] https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@%3Cdev.impala.apache.org%3E

Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Reviewed-on: http://gerrit.cloudera.org:8080/10970
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>
2018-08-07 17:38:04 +00:00
Tim Armstrong
a8d7a50bd5 IMPALA-7354: planner test resource estimates for more workloads
Adds the resource estimates for key benchmark workloads:
TPC-H, TPC-DS, TPC-H Nested and TPC-H Kudu to the planner
test so that we can track changes in resource requirements
and estimates for these queries.

Also don't show decimal places for MB and KB estimates. The
estimates are not accurate to that level and displaying
extra precision has some disadvantages:
* It falsely suggests to readers that the estimates have a high level
  of precision.
* It increases the odds of small variations in file sizes, etc.,
  causing test failures.

Also fixed a regex in the stress test that didn't escape the decimal
point correctly.

Testing:
Ran core tests.

Change-Id: I6a9f836699200ea87fb03bf36abad0e23949ac26
Reviewed-on: http://gerrit.cloudera.org:8080/11087
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-07 08:34:33 +00:00
Thomas Tauber-Marshall
70bf9ea296 IMPALA-7397: fix DCHECK in impala::AggregationNode::Close
As part of IMPALA-110, all of the logic of performing aggregations was
refactored out of the aggregation ExecNode and into Aggregators. Each
Aggregator manages its own memory, so a DCHECK was added in
AggregationNode::Close to ensure that no allocations were
made from the regular ExecNode mem pools.

This DCHECK is violated if the node was assigned conjuncts that
allocate memory in Prepare() - even though the conjuncts are evaluated
in the Aggregator, we still initialize them in ExecNode::Prepare.

This patch solves the problem by creating a copy of the TPlanNode
without the conjuncts to pass to ExecNode. In the future, when
TAggregator is introduced, we can get rid of this by directly
assigning conjuncts to Aggregators.

Note that this doesn't affect StreamingAggregationNode, which never
has conjuncts assigned to it, but this patch fixes an incorrect DCHECK
that enforces this.

Testing:
- Added a regression test for this case.

Change-Id: I65ae00ac23a62ab9f4a7ff06ac9ad5cece80c39d
Reviewed-on: http://gerrit.cloudera.org:8080/11132
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-06 23:45:08 +00:00
Csaba Ringhofer
dc32bf7703 IMPALA-7362: Add query option to set timezone
This change adds a new query option "timezone" which
defines the timezone used for utc<->local conversions.
The main goal is to simplify testing, but some users may
also find it useful, so it is added as a "general" query
option.

Examples:
set timezone=UTC;
set timezone="Europe/Budapest"

The timezones are validated, but as query options are not
sent to the coordinator immediately, the error checking
will only happen when running a query.

Leading/trailing " and 'characters are stripped because the
/ character cannot be entered unquoted in some contexts.

Currently the timezone has effect in the following cases:
- function now()
- conversions between unix time and timestamp, if flag
  use_local_tz_for_unix_timestamp_conversions is true
- reading parquet timestamps written by Hive, if flag
  convert_legacy_hive_parquet_utc_timestamps is true

In the near future Parquet timestamps' isAdjustedToUTC
property will be supported, which will decide whether
to do utc->local conversion on a per file+column basis.
That conversion will also be affected.
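
A minimal sketch of the option and the cases it affects (values are
illustrative):

    SET TIMEZONE="Asia/Tokyo";
    SELECT now();             -- uses the session's timezone
    SELECT from_unixtime(0);  -- affected only if the flag
                              -- use_local_tz_for_unix_timestamp_conversions
                              -- is true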

Testing:
- Extended test_local_tz_conversion.py to actually
  test utc<->local conversion. Until now the effect
  of flag use_local_tz_for_unix_timestamp_conversions
  was practically untested.
- Added a shell test to check that the default of the
  query option is the system's timezone.
- Added a shell test to check timezone validation.

Change-Id: I73de86eff096e1c581d3b56a0d9330d686f77272
Reviewed-on: http://gerrit.cloudera.org:8080/11064
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-03 17:45:25 +00:00
Tianyi Wang
fb3d47d356 IMPALA-7347: Update tests to accommodate HIVE-18118
HIVE-18118 adds 'numFilesErasureCoded' to table properties. This patch
adds it to test_show_create_table to work with the latest Hive.

Change-Id: I6aae402dd38374de90b35c32166a9507e6eb29f9
Reviewed-on: http://gerrit.cloudera.org:8080/11108
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-02 20:25:31 +00:00
Zoltan Borok-Nagy
7917eac0ad IMPALA-5542: Impala cannot scan Parquet decimal stored as int64_t/int32_t
The Decimal type in Parquet is a logical type. That means
the Parquet file stores some physical/primitive type that
is annotated by the DECIMAL tag to make it behave like
decimals.

The allowed physical types for decimals are INT32, INT64,
FIXED, and BINARY. Before this commit Impala could only
read decimals stored as FIXED or BINARY.

Spark decided to write decimals as INT32 or INT64 when
their precision allows it:
(1 <= precision <= 9) ==> INT32
(10 <= precision <= 18) ==> INT64

I updated our column readers to accept INT32 and INT64
as valid physical types for decimals.

Testing:
* extended parquet-plain-test.cc
* added Parquet files generated by Spark 2.3.1
  and updated test_scanners.py

Change-Id: Ib8c41bfc7c1664bdba5099d3893dc8dbe4304794
Reviewed-on: http://gerrit.cloudera.org:8080/11000
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-02 20:21:12 +00:00
Fredy Wijaya
a203733fac IMPALA-7295: Remove IMPALA_MINICLUSTER_PROFILE=2
This patch removes the use of IMPALA_MINICLUSTER_PROFILE. The code that
uses IMPALA_MINICLUSTER_PROFILE=2 is removed, and the code for
IMPALA_MINICLUSTER_PROFILE=3 becomes the default. To avoid too many code
changes in this patch, there is no code change for the shims; the shims
for IMPALA_MINICLUSTER_PROFILE=3 automatically become the default
implementation.

Testing:
- Ran core and exhaustive tests

Change-Id: Iba4a81165b3d2012dc04d4115454372c41e39f08
Reviewed-on: http://gerrit.cloudera.org:8080/10940
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-14 01:03:18 +00:00
Michael Brown
3da2dc63fe IMPALA-6810: runtime_row_filters.test: omit pool name in pattern
Some downstream tests run this with a fair-scheduler.xml set that, while
not changing admission control behavior, does change the name of the
pool. Omit the pool name to permit that downstream test to succeed.

Testing:
- local with change in minicluster
- downstream in environment as well

Change-Id: I3fe6beb169dc6bfefabde9dc7a4632c1a5e63fa7
Reviewed-on: http://gerrit.cloudera.org:8080/10942
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-13 20:45:37 +00:00
Thomas Tauber-Marshall
e2aafae204 IMPALA-7251: Fix QueryMaintenance calls in Aggregators
A recent change, IMPALA-110 (part 2), refactored
PartitionedAggregationNode into several classes, including a new type
'Aggregator'. During this refactor, code that makes local allocations
while evaluating exprs was moved from the ExecNode (now
AggregationNode/StreamingAggregationNode) into the Aggregators, but
code related to cleaning these allocations up (i.e. QueryMaintenance())
was not, resulting in some queries using an excessive amount of
memory.

This patch removes all calls to QueryMaintenance() from the exec nodes
and moves them into the Aggregators.

Testing:
- Added new test cases with a mem limit that fails if the expr
  allocations aren't released in a timely manner.
- Passed a full exhaustive run.

Change-Id: I4dac2bb0a15cdd7315ee15608bae409c125c82f5
Reviewed-on: http://gerrit.cloudera.org:8080/10871
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-13 03:04:35 +00:00
poojanilangekar
c6f9b61ec2 IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
This change ensures that the planner computes parquet conjuncts
only for scans containing parquet files. It also handles the
PARQUET_DICTIONARY_FILTERING and PARQUET_READ_STATISTICS query
options in the planner.
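
A minimal sketch of the planner-side handling (the query is
illustrative):

    SET PARQUET_DICTIONARY_FILTERING=false;
    SET PARQUET_READ_STATISTICS=false;
    -- The plan now omits the corresponding parquet predicates instead
    -- of merely skipping them at execution time:
    EXPLAIN SELECT count(*) FROM functional_parquet.alltypes
    WHERE int_col = 1;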

Testing was carried out independently on parquet and non-parquet
scans:
  1. Parquet scans were tested via the existing parquet-filtering
     planner test. Additionally, a new test
     [parquet-filtering-disabled] was added to ensure that the
     explain plan generated skips parquet predicates based on the
     query options.
  2. Non-parquet scans were tested manually to ensure that the
     functions to compute parquet conjuncts were not invoked.
     Additional test cases were added to the parquet-filtering
     planner test to scan non-parquet tables and ensure that the
     plans do not contain conjuncts based on parquet statistics.
  3. A parquet partition was added to the alltypesmixedformat
     table in the functional database. Planner tests were added
     to ensure that Parquet conjuncts are constructed only when
     the Parquet partition is included in the query.

Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Reviewed-on: http://gerrit.cloudera.org:8080/10704
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-06 02:06:50 +00:00
Tianyi Wang
61e6a47776 IMPALA-7236: Fix the parsing of ALLOW_ERASURE_CODED_FILES
This patch adds a missing "break" statement in a switch statement
changed by IMPALA-7102.
Also fixes a non-deterministic test case.

Change-Id: Ife1e791541e3f4fed6bec00945390c7d7681e824
Reviewed-on: http://gerrit.cloudera.org:8080/10857
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-03 23:49:44 +00:00
Bikramjeet Vig
30e82c63ec IMPALA-7190: Remove unsupported format writer support
This patch removes write support for unsupported formats like Sequence,
Avro and compressed text. Also, the related query options
ALLOW_UNSUPPORTED_FORMATS and SEQ_COMPRESSION_MODE have been migrated
to the REMOVED query options type.

Testing:
Ran exhaustive build.

Change-Id: I821dc7495a901f1658daa500daf3791b386c7185
Reviewed-on: http://gerrit.cloudera.org:8080/10823
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-03 20:34:27 +00:00
Taras Bobrovytsky
8060f4d50e IMPALA-7102 (Part 1): Disable reading of erasure coding by default
In this patch we add a query option, ALLOW_ERASURE_CODED_FILES, that
allows us to enable or disable support for erasure-coded files. Even
though Impala should already be able to handle HDFS erasure-coded
files, this feature hasn't been tested thoroughly yet. Also, Impala
lacks metrics, observability and DDL commands related to erasure
coding. This is a query option instead of a startup flag because we
want to make it possible for advanced users to enable the feature.

We may also need a follow-on patch to disable the write path with
this flag.
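
A minimal usage sketch (the table name is illustrative):

    SET ALLOW_ERASURE_CODED_FILES=true;
    -- Without the option, scans over erasure-coded HDFS files are
    -- rejected for the session:
    SELECT count(*) FROM table_on_ec_storage;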

Cherry-picks: not for 2.x

Change-Id: Icd3b1754541262467a6e67068b0b447882a40fb3
Reviewed-on: http://gerrit.cloudera.org:8080/10646
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-29 23:26:35 +00:00
poojanilangekar
e988c36bf0 IMPALA-6305: Allow column definitions in ALTER VIEW
This change adds support to change column definitions in ALTER VIEW
statements. This support only required minor changes in the parser
and the AlterViewStmt constructor.

Here's an example syntax:
    alter view foo (a, b comment 'helloworld') as
    select * from bar;

    describe foo;
    +------+--------+------------+
    | name | type   | comment    |
    +------+--------+------------+
    | a    | string |            |
    | b    | string | helloworld |
    +------+--------+------------+

The following tests were modified:
1. ParserTest - To check that the parser handles column definitions
   for alter view statements.
2. AnalyzerDDLTest - To ensure that the analyzer supports the
   change column definitions parsed.
3. TestDdlStatements - To verify the end-to-end functioning of
   ALTER VIEW statements with change column definitions.
4. AuthorizationTest - To ensure that alter view statements with
   column definitions check permissions as expected.

Change-Id: I6073444a814a24d97e80df15fcd39be2812f63fc
Reviewed-on: http://gerrit.cloudera.org:8080/10720
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-27 03:41:47 +00:00
Attila Jeges
17749dbcfc IMPALA-3307: Add support for IANA time-zone db
Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.

- Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to
  specify an HDFS/S3/ADLS path to a zip archive that contains the
  shared compiled IANA time-zone database. If the startup flag is set,
  impalad will use the specified time-zone database. Otherwise,
  impalad will use the default /usr/share/zoneinfo time-zone database.

- Introduces a new startup flag (--hdfs_zone_alias_conf) to impalad to
  specify an HDFS/S3/ADLS path to a shared config file that contains
  definitions for non-standard time-zone aliases.

- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.

- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.

- Adds a new ZipUtil class to extract files from a zip archive. The
  implementation is not vulnerable to Zip Slip.

Cherry-picks: not for 2.x.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Reviewed-on: http://gerrit.cloudera.org:8080/9986
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Attila Jeges <attilaj@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-22 13:18:58 +00:00
Tim Armstrong
894ab8e980 IMPALA-7115: set a default THREAD_RESERVATION_LIMIT value
The value is chosen to allow only queries that have a reasonable chance
of succeeding, albeit with poor performance because of the high
number of threads.

Testing:
Added a test to make sure that the default value rejects a large query.

Change-Id: I31d3fa3f6305c360922649dba53a9026c9563384
Reviewed-on: http://gerrit.cloudera.org:8080/10628
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-19 03:02:49 +00:00
Michael Ho
51ff47d05e IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send()
This change codegens the hash partitioning logic of
KrpcDataStreamSender::Send() when the partitioning strategy
is HASH_PARTITIONED. It does so by unrolling the loop which
evaluates each row against the partitioning expressions and
hashes the result. It also replaces the number of channels
of that sender with a constant at runtime.

With this change, we get reasonable speedup with some benchmarks:

+------------+-----------------------+---------+------------+------------+----------------+
| Workload   | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+------------+-----------------------+---------+------------+------------+----------------+
| TPCH(_300) | parquet / none / none | 20.03   | -6.44%     | 13.56      | -7.15%         |
+------------+-----------------------+---------+------------+------------+----------------+

+---------------------+-----------------------+---------+------------+------------+----------------+
| Workload            | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+---------------------+-----------------------+---------+------------+------------+----------------+
| TARGETED-PERF(_300) | parquet / none / none | 58.59   | -5.56%     | 12.28      | -5.30%         |
+---------------------+-----------------------+---------+------------+------------+----------------+

+-------------------------+-----------------------+---------+------------+------------+----------------+
| Workload                | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+-------------------------+-----------------------+---------+------------+------------+----------------+
| TPCDS-UNMODIFIED(_1000) | parquet / none / none | 15.60   | -3.10%     | 7.16       | -4.33%         |
+-------------------------+-----------------------+---------+------------+------------+----------------+

+-------------------+-----------------------+---------+------------+------------+----------------+
| Workload          | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+-------------------+-----------------------+---------+------------+------------+----------------+
| TPCH_NESTED(_300) | parquet / none / none | 30.93   | -3.02%     | 17.46      | -4.71%         |
+-------------------+-----------------------+---------+------------+------------+----------------+

Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Reviewed-on: http://gerrit.cloudera.org:8080/10421
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-14 23:37:00 +00:00
Tim Armstrong
d8ed07f112 IMPALA-6035: Add query options to limit thread reservation
Adds two options: THREAD_RESERVATION_LIMIT and
THREAD_RESERVATION_AGGREGATE_LIMIT, which are both enforced by admission
control based on planner resource requirements and the schedule. The
mechanism used is the same as the minimum reservation checks.

THREAD_RESERVATION_LIMIT limits the total number of reserved threads in
fragments scheduled on a single backend.
THREAD_RESERVATION_AGGREGATE_LIMIT limits the sum of reserved threads
across all fragments.

This also slightly improves the minimum reservation error message to
include the host name.
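
A minimal usage sketch (the limit values are illustrative):

    SET THREAD_RESERVATION_LIMIT=24;            -- per-backend cap
    SET THREAD_RESERVATION_AGGREGATE_LIMIT=96;  -- cap across all fragments
    -- Admission control now rejects any query whose fragments would
    -- reserve more threads than either cap allows.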

Testing:
Added end-to-end tests that exercise the code paths.

Ran core tests.

Change-Id: I5b5bbbdad5cd6b24442eb6c99a4d38c2ad710007
Reviewed-on: http://gerrit.cloudera.org:8080/10365
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-14 03:25:55 +00:00
Thomas Tauber-Marshall
bf2124bf30 IMPALA-6929: Support multi-column range partitions for Kudu
Kudu allows specifying range partitions over multiple columns. Impala
already has support for doing this when the partitions are specified
with '=', but if the partitions are specified with '<' or '<=', the
parser would return an error.

This patch modifies the parser to allow for creating Kudu tables like:
create table kudu_test (a int, b int, primary key(a, b))
  partition by range(a, b) (partition (0, 0) <= values < (1, 1));
and similarly to alter partitions like:
alter table kudu_test add range partition (1, 1) <= values < (2, 2);

Testing:
- Modified functional_kudu.jointbl's schema so that we have a table
  in functional with a multi-column range partition to test things
  against.
- Added FE and E2E tests for CREATE and ALTER.

Change-Id: I0141dd3344a4f22b186f513b7406f286668ef1e7
Reviewed-on: http://gerrit.cloudera.org:8080/10441
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-13 00:10:13 +00:00
Zoltan Borok-Nagy
e6ca7ca14d IMPALA-7108: IllegalStateException hit during CardinalityCheckNode.<init>
Since IMPALA-6314, we set LIMIT 2 on runtime scalar subqueries
in StmtRewriter.mergeExpr(). We do that because later we add a
CardinalityCheckNode on top of such subqueries, and with
LIMIT 2 we can still check whether they return more than one row.

In the constructor of CardinalityCheckNode there is a
precondition that checks if the child node has LIMIT 2 to
be certain that we've set the limit for all the necessary
cases.

However, some subqueries will get a LIMIT 1 later, breaking the
precondition in CardinalityCheckNode. An example of such a
subquery is a select stmt that selects from an inline view
that returns a single row:

select * from functional.alltypes
where int_col = (select f.id from (
                 select * from functional.alltypes limit 1) f);

Note that we shouldn't add a CardinalityCheckNode to the plan
of this query in the first place. To generate a proper plan I
updated SelectStmt.returnsSingleRow() because this method didn't
handle this case well.

I also changed the precondition from
  Preconditions.checkState(child.getLimit() == 2);
to
  Preconditions.checkState(child.getLimit() <= 2);
in order to be more permissive.

I added tests for the aforementioned query.

Change-Id: I82a7a3fe26db3e12131c030c4ad055a9c4955407
Reviewed-on: http://gerrit.cloudera.org:8080/10605
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-08 20:15:50 +00:00
Lars Volker
c9e8f2f7e7 IMPALA-7008: Rewrite query to make it not return 100M rows
One query in spilling.test is expected to fail. When it does not fail,
it returned 100M rows, which would then cause the Python test code to
consume memory until it gets OOM-killed by the kernel.

To fix this, we rewrite the query. I tested this locally to make sure
that the query still fails as expected on HDFS.

Change-Id: I31956d3092a7e69b979f631df3a6dfda14ebe140
Reviewed-on: http://gerrit.cloudera.org:8080/10597
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-05 01:14:35 +00:00
Thomas Tauber-Marshall
ba7893cb9e IMPALA-6338: Disable more flaky bloom filter tests
Until IMPALA-6338 is fixed, temporarily disable tests that are
affected by it - any test that has a 'limit' and relies on the
contents of the runtime profile. This patch disables the runtime
profile check for all such tests in bloom_filter.test

Change-Id: Ifc9da892efa3b27d63056ad8e3befac82808ffdb
Reviewed-on: http://gerrit.cloudera.org:8080/10530
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-30 08:00:50 +00:00
Tim Armstrong
f4f28d310c IMPALA-6941: load more text scanner compression plugins
Add extensions for LZ4 and ZSTD (which are supported by Hadoop).
Even without a plugin this results in better behaviour, because
we no longer try to treat files with these (previously unknown)
extensions as uncompressed text.

Also allow loading tables containing files with unsupported
compression types. There was weird behaviour before we knew
of the file extension but didn't support querying the table -
the catalog would load the table but the impalad would fail
processing the catalog update. The simplest way to fix it
is to just allow loading the tables.

Similarly, make the "LOAD DATA" operation more permissive -
we can copy files into a directory even if we can't
decompress them.

Switch to always checking the plugin version - running a mismatched
plugin is inherently unsafe.
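
A minimal sketch of the more permissive behavior (path and table name
are illustrative):

    -- Copying no longer fails just because the file's compression
    -- suffix is known but unsupported:
    LOAD DATA INPATH '/tmp/staging/data.txt.zst' INTO TABLE text_demo;
    -- Querying the table may still fail if no plugin can decompress it.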

Testing:
Positive case where LZO is loaded is exercised. Added
coverage for negative case where LZO is disabled.

Fixed test gaps:
* Querying LZO table with LZO plugin not available.
* Interacting with tables with known but unsupported text
  compressions.
* Querying files with unknown compression suffixes (which are
  treated as uncompressed text).

Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6
Reviewed-on: http://gerrit.cloudera.org:8080/10165
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-18 03:44:46 +00:00
Zoltan Borok-Nagy
ccf19f9f8f IMPALA-5842: Write page index in Parquet files
This commit builds on the previous work of
Pooja Nilangekar: https://gerrit.cloudera.org/#/c/7464/

The commit implements the write path of PARQUET-922:
"Add column indexes to parquet.thrift". As specified in the
parquet-format, Impala writes the page indexes just before
the footer. This allows much more efficient page filtering
than using the same information from the 'statistics' field
of DataPageHeader.

I updated Pooja's python tests as well.

Change-Id: Icbacf7fe3b7672e3ce719261ecef445b16f8dec9
Reviewed-on: http://gerrit.cloudera.org:8080/9693
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-17 20:22:02 +00:00
Lars Volker
a64cfc523e IMPALA-7032: Disable codegen for CHAR type null literals
Analogous to IMPALA-6435, we have to disable codegen for CHAR type null
literals. Otherwise we will crash in
impala::NullLiteral::GetCodegendComputeFn().
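
A minimal sketch of the crashing shape (the query is illustrative):

    -- Previously crashed in impala::NullLiteral::GetCodegendComputeFn();
    -- codegen is now disabled for CHAR-typed NULL literals:
    SELECT CAST(NULL AS CHAR(4));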

This change adds a test to make sure that the crash is fixed.

Change-Id: I34033362263cf1292418f69c5ca1a3b84aed39a9
Reviewed-on: http://gerrit.cloudera.org:8080/10409
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
2018-05-16 00:00:15 +00:00
Zoltan Borok-Nagy
fab65d4479 IMPALA-7022: TestQueries.test_subquery: Subquery must not return more than one row
TestQueries.test_subquery sometimes fails during exhaustive
tests.

In the tests we expect to catch an exception that is
prefixed by the "Query aborted:" string. The prefix is
usually added by impala_beeswax.py::wait_for_completion(),
but in rare cases it isn't added.

From the point of view of the test it is irrelevant whether the
exception is prefixed with "Query aborted:" or not, so I removed it
from the expected exception string.

Change-Id: I3b8655ad273b1dd7a601099f617db609e4a4797b
Reviewed-on: http://gerrit.cloudera.org:8080/10407
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2018-05-15 23:37:06 +00:00
Tim Armstrong
3661100fa3 IMPALA-6645: Enable disk spill encryption by default
Perf:
Targeted benchmarks with a heavily spilling query on a machine
with PCLMULQDQ support show < 5% of CPU time spent in encryption and
decryption. PCLMULQDQ was introduced in AMD Bulldozer (c. 2011)
and Intel Westmere (c. 2010).

Testing:
Ran core tests with the change.

Updated the custom cluster test to exercise the non-default
configuration.

Change-Id: Iee4be2a95d689f66c3663d99e4df0fb3968893a9
Reviewed-on: http://gerrit.cloudera.org:8080/10345
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2018-05-15 22:23:14 +00:00
Tim Armstrong
e12ee485cf IMPALA-6957: calc thread resource requirement in planner
This only factors in fragment execution threads. E.g. this does *not*
try to account for the number of threads on the old Thrift RPC
code path if that is enabled.

This is loosely related to the old VCores estimate, but is different in
that it:
* Directly ties into the notion of required threads in
  ThreadResourceMgr.
* Is a strict upper bound on the number of such threads, rather than
  an estimate.

Does not include "optional" threads. ThreadResourceMgr in the backend
bounds the number of "optional" threads per impalad, so the number of
execution threads on a backend is limited by

  sum(required threads per query) +
      CpuInfo::num_cores() * FLAGS_num_threads_per_core

DCHECKS in the backend enforce that the calculation is correct. They
were actually hit in KuduScanNode because of some races in thread
management leading to multiple "required" threads running. Now the
first thread in the multithreaded scans never exits, which means
that it's always safe for any of the other threads to exit early,
which simplifies the logic a lot.

Testing:
Updated planner tests.

Ran core tests.

Change-Id: I982837ef883457fa4d2adc3bdbdc727353469140
Reviewed-on: http://gerrit.cloudera.org:8080/10256
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-12 01:43:37 +00:00
Tim Armstrong
25c13bfdd6 IMPALA-7010: don't run memory usage tests on non-HDFS
Moved a number of tests with tuned mem_limits. In some cases
this required separating the tests from non-tuned functional
tests.

TestQueryMemLimit used only very high and very low limits, so it
seemed safe to run in all configurations.

Change-Id: I9686195a29dde2d87b19ef8bb0e93e08f8bee662
Reviewed-on: http://gerrit.cloudera.org:8080/10370
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-11 22:41:49 +00:00
Sailesh Mukil
f13abdca67 IMPALA-6975: TestRuntimeRowFilters.test_row_filters failing with Memory limit exceeded
This test has started failing relatively frequently. We think that
this may be due to timing differences in when RPCs arrive, caused by
the recent changes with KRPC.

Increasing the memory limit should allow this test to pass
consistently.

Change-Id: Ie39482e2a0aee402ce156b11cce51038cff5e61a
Reviewed-on: http://gerrit.cloudera.org:8080/10315
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-05 03:01:35 +00:00
Thomas Tauber-Marshall
ba84ad03cb IMPALA-6954: Fix problems with CTAS into Kudu with an expr rewrite
This patch fixes two problems:
- Previously a CTAS into a Kudu table where an expr rewrite occurred
  would create an unpartitioned table, due to the partition info being
  reset in TableDataLayout and then never reconstructed. Since the
  Kudu partition info is set by the parser and never changes, the
  solution is to not reset it.
- Previously a CTAS into a Kudu table with a range partition where an
  expr rewrite occurred would fail with an analysis exception due to
  a Precondition check in RangePartition.analyze that checked that
  the RangePartition wasn't already analyzed, as the analysis can't
  be done twice. Since the state in RangePartition never changes, it
  doesn't need to be reanalyzed and we can just return instead of
  failing on the check.

Testing:
- Added an e2e test that creates a partitioned Kudu table with a CTAS
  with a rewrite, and checks that the expected partitions are created.

Change-Id: I731743bd84cc695119e99342e1b155096147f0ed
Reviewed-on: http://gerrit.cloudera.org:8080/10251
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-02 02:54:23 +00:00
Tim Armstrong
418c705787 IMPALA-6679,IMPALA-6678: reduce scan reservation
This has two related changes.

IMPALA-6679: defer scanner reservation increases
------------------------------------------------
When starting each scan range, check to see how big the initial scan
range is (the full thing for row-based formats, the footer for
Parquet) and determine whether more reservation would be useful.

For Parquet, base the ideal reservation on the actual column layout
of each file. This avoids reserving memory that we won't use for
the actual files that we're scanning. This also avoid the need to
estimate ideal reservation in the planner.

We also release scanner thread reservations above the minimum as
soon as threads complete, so that resources can be released slightly
earlier.

IMPALA-6678: estimate Parquet column size for reservation
---------------------------------------------------------
This change also reduces reservation computed by the planner in certain
cases by estimating the on-disk size of column data based on stats. It
also reduces the default per-column reservation to 4MB since it appears
that < 8MB columns are generally common in practice and the method for
estimating column size is biased towards over-estimating. There are two
main cases to consider for the performance implications:
* Memory is available to improve query perf - if we underestimate, we
  can increase the reservation so we can do "efficient" 8MB I/Os for
  large columns.
* The ideal reservation is not available - query performance is affected
  because we can't overlap I/O and compute as much and may do smaller
  I/Os (probably 4MB). However, we should avoid pathological behaviour
  like tiny I/Os.

When stats are not available, we just default to reserving 4MB per
column, which typically is more memory than required. When stats are
available, the memory required can be reduced below that when the
heuristics tell us with high confidence that the column data for most
or all files is smaller than 4MB.

The stats-based heuristic could reduce scan performance if both the
conservative heuristics significantly underestimate the column size
and memory is constrained such that we can't increase the scan
reservation at runtime (in which case the memory might be used by
a different operator or scanner thread).

Observability:
Added counters to track when threads were not spawned due to reservation
and to track when reservation increases are requested and denied. These
allow determining if performance may have been affected by memory
availability.

Testing:
Updated test_mem_usage_scaling.py memory requirements and added steps
to regenerate the requirements. Loops test for a while to flush out
flakiness.

Added targeted planner and query tests for reservation calculations and
increases.

Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842
Reviewed-on: http://gerrit.cloudera.org:8080/9757
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-04-28 23:41:39 +00:00
Tim Armstrong
93d714c645 IMPALA-6560: fix regression test for IMPALA-2376
The test is modified to increase the size of collections allocated.
num_nodes and mt_dop query options are set to make execution as
deterministic as possible.

I looped the test overnight to try to flush out flakiness.

Adds support for row_regex lines in CATCH sections so that we can
match a larger part of the error message.

Change-Id: I024cb6b57647902b1735defb885cd095fd99738c
Reviewed-on: http://gerrit.cloudera.org:8080/9681
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2018-04-28 23:41:39 +00:00
Tim Armstrong
d7bba82192 IMPALA-6587: free buffers before ScanRange::Cancel() returns
ScanRange::Cancel() now waits until an in-flight read finishes so
that the disk I/O buffer being processed by the disk thread is
freed when Cancel() returns.

The fix is to set a 'read_in_flight_' flag on the scan range
while the disk thread is doing the read. Cancel() blocks until
read_in_flight_ == false.

The code is refactored to move more logic into ScanRange and
to avoid holding RequestContext::lock_ for longer than necessary.

Testing:
Added query test that reproduces the issue.

Added a unit test and a stress option that reproduces the problem in a
targeted way.

Ran disk-io-mgr-stress test for a few hours. Ran it under TSAN and
inspected output to make sure there were no non-benign data races.

Change-Id: I87182b6bd51b5fb0b923e7e4c8d08a44e7617db2
Reviewed-on: http://gerrit.cloudera.org:8080/9680
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2018-04-28 23:41:39 +00:00
Tim Armstrong
fb5dc9eb48 IMPALA-4835: switch I/O buffers to buffer pool
This is the following squashed patches that were reverted.

I will fix the known issues with some follow-on patches.

======================================================================
IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation

In preparation for switching the I/O mgr to the buffer pool, this
removes and cleans up a lot of code so that the switchover patch starts
from a cleaner slate.

* Remove the free buffer cache (which will be replaced by buffer pool's
  own caching).
* Make memory limit exceeded error checking synchronous (in anticipation
  of having to propagate buffer pool errors synchronously).
* Simplify error propagation - remove the (ineffectual) code that
  enqueued BufferDescriptors containing error statuses.
* Document locking scheme better in a few places, make it part of the
  function signature when it seemed reasonable.
* Move ReturnBuffer() to ScanRange, because it is intrinsically
  connected with the lifecycle of a scan range.
* Separate external ReturnBuffer() and internal CleanUpBuffer()
  interfaces - previously callers of ReturnBuffer() were fudging
  the num_buffers_in_reader accounting to make the external interface work.
* Eliminate redundant state in ScanRange: 'eosr_returned_' and
  'is_cancelled_'.
* Clarify the logic around calling Close() for the last
  BufferDescriptor.
  -> There appeared to be an implicit assumption that buffers would be
     freed in the order they were returned from the scan range, so that
     the "eos" buffer was returned last. Instead just count the number
     of outstanding buffers to detect the last one.
  -> Touching the is_cancelled_ field without holding a lock was hard to
     reason about - violated locking rules and it was unclear that it
     was race-free.
* Remove DiskIoMgr::Read() to simplify the interface. It is trivial to
  inline at the callsites.

This will probably regress performance somewhat because of the cache
removal, so my plan is to merge it around the same time as switching
the I/O mgr to allocate from the buffer pool. I'm keeping the patches
separate to make reviewing easier.

Testing:
* Ran exhaustive tests
* Ran the disk-io-mgr-stress-test overnight

======================================================================
IMPALA-4835: Part 2: Allocate scan range buffers upfront

This change is a step towards reserving memory for buffers from the
buffer pool and constraining per-scanner memory requirements. This
change restructures the DiskIoMgr code so that each ScanRange operates
with a fixed set of buffers that are allocated upfront and recycled as
the I/O mgr works through the ScanRange.

One major change is that ScanRanges get blocked when a buffer is not
available and get unblocked when a client returns a buffer via
ReturnBuffer(). I was able to remove the logic to maintain the
blocked_ranges_ list by instead adding a separate set with all ranges
that are active.

There is also some miscellaneous cleanup included - e.g. reducing the
amount of code devoted to maintaining counters and metrics.

One tricky part of the existing code was that it called
IssueInitialRanges() with empty lists of files and depended on
DiskIoMgr::AddScanRanges() to not check for cancellation in that case.
See IMPALA-6564/IMPALA-6588. I changed the logic to not try to issue
ranges for empty lists of files.

I plan to merge this along with the actual buffer pool switch, but
separated it out to allow review of the DiskIoMgr changes separate from
other aspects of the buffer pool switchover.

Testing:
* Ran core and exhaustive tests.

======================================================================
IMPALA-4835: Part 3: switch I/O buffers to buffer pool

This is the final patch to switch the Disk I/O manager to allocate all
buffer from the buffer pool and to reserve the buffers required for
a query upfront.

* The planner reserves enough memory to run a single scanner per
  scan node.
* The multi-threaded scan node must increase reservation before
  spinning up more threads.
* The scanner implementations must be careful to stay within their
  assigned reservation.

The row-oriented scanners were most straightforward, since they only
have a single scan range active at a time. A single I/O buffer is
sufficient to scan the whole file but more I/O buffers can improve I/O
throughput.

Parquet is more complex because it issues a scan range per column and
the sizes of the columns on disk are not known during planning. To
deal with this, the reservation in the frontend is based on a
heuristic involving the file size and # columns. The Parquet scanner
can then divvy up reservation to columns based on the size of column
data on disk.

I adjusted how the 'mem_limit' is divided between buffer pool and non
buffer pool memory for low mem_limits to account for the increase in
buffer pool memory.

Testing:
* Added more planner tests to cover reservation calcs for scan node.
* Test scanners for all file formats with the reservation denial debug
  action, to test behaviour when the scanners hit reservation limits.
* Updated memory and buffer pool limits for tests.
* Added unit tests for dividing reservation between columns in parquet,
  since the algorithm is non-trivial.

Perf:
I ran TPC-H and targeted perf locally comparing with master. Both
showed small improvements of a few percent and no regressions of
note. Cluster perf tests showed no significant change.

Change-Id: I3ef471dc0746f0ab93b572c34024fc7343161f00
Reviewed-on: http://gerrit.cloudera.org:8080/9679
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2018-04-28 23:41:39 +00:00
Taras Bobrovytsky
d0f838b66a IMPALA-6340,IMPALA-6518: Check that decimal types are compatible in FE
In this patch we implement strict decimal type checking in the FE in
various situations when DECIMAL_V2 is enabled. What is affected:
- Union. If we union two decimals and it is not possible to come up
  with a decimal that will be able to contain all the digits, an error
  is thrown. For example, the union(decimal(20, 10), decimal(20, 20))
  returns decimal(30, 20). However, for union(decimal(38, 0),
  decimal(38, 38)) the ideal return type would be decimal(76,38), but
  this is too large, so an error is thrown.
- Insert. If we are inserting a decimal value into a column where we are
  not guaranteed that all digits will fit, an error is thrown. For
  example, inserting a decimal(38,0) value into a decimal(38,38) column.
- Functions such as coalesce(). If we are unable to determine the output
  type that guarantees that all digits will fit from all the arguments,
  an error is thrown. For example,
  coalesce(decimal(38,38), decimal(38,0)) will throw an error.
- Hash Join. When joining on two decimals, if a type cannot be
  determined that both columns can be cast to, we throw an error.
  For example, join on decimal(38,0) and decimal(38,38) will result
  in an error.

To avoid these errors, you need to use CAST() on some of the decimals.
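
A minimal sketch of the new strictness and the workaround (DECIMAL_V2
enabled; values are illustrative):

    -- No decimal type can hold all digits of both operands, so this errors:
    SELECT CAST(1 AS DECIMAL(38,0)) AS c
    UNION SELECT CAST(0.5 AS DECIMAL(38,38));
    -- Explicitly casting to a common type resolves it:
    SELECT CAST(1 AS DECIMAL(38,1)) AS c
    UNION SELECT CAST(CAST(0.5 AS DECIMAL(38,38)) AS DECIMAL(38,1));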

In this patch we also change the output decimal calculation of decimal
round, truncate and related functions. If these functions are a no-op,
the resulting decimal type is the same as the input type.

Testing:
- Ran a core build which passed.

Change-Id: Id406f4189e01a909152985fabd5cca7a1527a568
Reviewed-on: http://gerrit.cloudera.org:8080/9930
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-04-28 03:33:02 +00:00