Commit Graph

1297 Commits

Author SHA1 Message Date
Attila Jeges
2cd7a2b77a IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589
With HIVE-22589 Hive3 switched back to using Julian Calendar for
historical dates by default which caused an Impala test failure
around Avro DATE values.

Change-Id: I51dd933867ea7877235e7f6e1f2b56711dca107e
Reviewed-on: http://gerrit.cloudera.org:8080/15564
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-03-27 17:16:00 +00:00
Tamas Mate
7dd13f7278 IMPALA-5308: Resolve confusing Kudu SHOW TABLE STATS output
This change modifies the output of the SHOW TABLE STATS and SHOW
PARTITIONS for Kudu tables.
 - PARTITIONS: the #Row column has been removed
 - TABLE STATS: instead of showing partition informations it returns a
 resultset similar to HDFS table stats, #Rows, #Partitions, Size, Format
 and Location

Example outputs can be seen in the doc changes.

Testing:
* kudu_stats.test is modified to verify the new result set
* kudu_partition_ddl.test is modified to verify the new partitions style
* Updated unit test with the new error message

Change-Id: Ice4b8df65f0a53fe14b8fbe35d82c9887ab9a041
Reviewed-on: http://gerrit.cloudera.org:8080/15199
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-03-18 18:05:34 +00:00
Tim Armstrong
08acccf9eb IMPALA-9156: share broadcast join builds
The scheduler will only create one join build finstance per
backend in cases where this is supported.

The builder is aware of the number of finstances executing the
probe and hands off the build data structures to the builders.

Nested loop join requires minimal modifications because the
build data structures are read-only after initial construction.
The only significant change is that memory can't be transferred
to the multiple consumers, so MarkNeedsDeepCopy() needs to be
used instead.

Hash join requires additional synchronisation because the
spilling algorithm mutates build-side data structures. This
patch adds synchronisation so that rebuilding spilled
partitions is done in a thread-safe manner, using a single
thread. This uses the CyclicBarrier added in an earlier patch.

Threads blocked on CyclicBarrier need to be cancellable,
which is handled by cancelling the barrier when cancelling
fragments on the backend.

BufferPool now correctly handles multiple threads calling
CleanPages() concurrently, which makes other methods thread-safe.

Update planner to cost broadcast join and estimate memory
consumption based on a single instance per node.

Planner estimates of number of instances are improved. Instead of
assuming mt_dop instances per node, use the total number of input
splits (also called scan ranges in places) as an upper bound on
the number of instances generated by scans. These instance
estimates from the scan nodes are then propagated up the
plan tree in the same way as the numNodes estimates. The instance
estimate for the join build fragment is fixed to be based on
the destination fragment.

The profile now correctly accounts for time waiting for the
builder, counting it in inactive time and showing it in the
node timeline. Additional improvements/cleanup to the time
accounting are deferring until IMPALA-9422.

Testing:
* Updated planner tests
* Ran a single node stress test with TPC-H and TPC-DS
* Add a targeted test for spilling broadcast joins, both repartitioning
  and not repartitioning.
* Add a targeted test for a spilling broadcast join with empty probe
* Add a targeted test for spilling broadcast join with empty build
  partitions.
* Add a broadcast join to test_cancellation and test_failpoints.

Perf:

I did a single node run on my desktop:
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 6.26    | -15.70%    | 4.63       | -16.16%        |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+---------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval    |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+---------+
| TPCH(30) | TPCH-Q21 | parquet / none / none | 24.97  | 23.25       | R +7.38%   |   0.51%   |   0.22%        | 5     | R +6.95%       | 2.31    | 27.93   |
| TPCH(30) | TPCH-Q4  | parquet / none / none | 2.83   | 2.79        |   +1.31%   |   1.86%   |   0.36%        | 5     |   +1.88%       | 1.15    | 1.53    |
| TPCH(30) | TPCH-Q6  | parquet / none / none | 1.28   | 1.28        |   -0.01%   |   1.64%   |   1.63%        | 5     |   -0.11%       | -0.58   | -0.01   |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 2.65   | 2.68        |   -0.94%   |   0.84%   |   1.46%        | 5     |   -0.21%       | -0.87   | -1.25   |
| TPCH(30) | TPCH-Q1  | parquet / none / none | 4.69   | 4.72        |   -0.56%   |   1.29%   |   0.52%        | 5     |   -1.04%       | -1.15   | -0.89   |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 10.64  | 10.80       |   -1.48%   |   0.61%   |   0.60%        | 5     |   -1.39%       | -1.73   | -3.91   |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 4.11   | 4.32        |   -4.92%   |   0.05%   |   0.40%        | 5     |   -4.93%       | -2.31   | -27.46  |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 3.47   | 3.67        | I -5.41%   |   0.81%   |   0.03%        | 5     | I -5.70%       | -2.31   | -15.75  |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 7.58   | 8.14        | I -6.93%   |   3.13%   |   2.62%        | 5     | I -9.31%       | -2.02   | -3.96   |
| TPCH(30) | TPCH-Q9  | parquet / none / none | 15.59  | 17.02       | I -8.38%   |   0.95%   |   0.43%        | 5     | I -8.92%       | -2.31   | -19.37  |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 2.90   | 3.25        | I -10.93%  |   1.42%   |   4.41%        | 5     | I -10.28%      | -2.31   | -5.33   |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 2.69   | 3.13        | I -14.31%  |   4.50%   |   1.40%        | 5     | I -17.79%      | -2.31   | -7.80   |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 2.50   | 3.03        | I -17.54%  |   0.10%   |   0.79%        | 5     | I -20.55%      | -2.31   | -49.31  |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 4.76   | 5.92        | I -19.52%  |   0.78%   |   0.33%        | 5     | I -24.31%      | -2.31   | -61.63  |
| TPCH(30) | TPCH-Q2  | parquet / none / none | 2.56   | 3.33        | I -23.18%  |   2.13%   |   0.85%        | 5     | I -30.39%      | -2.31   | -28.14  |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 12.59  | 16.41       | I -23.26%  |   1.73%   |   0.90%        | 5     | I -30.43%      | -2.31   | -32.36  |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.83   | 2.41        | I -24.04%  |   1.83%   |   2.22%        | 5     | I -30.48%      | -2.31   | -20.54  |
| TPCH(30) | TPCH-Q8  | parquet / none / none | 4.43   | 5.94        | I -25.33%  |   0.96%   |   0.54%        | 5     | I -34.54%      | -2.31   | -63.01  |
| TPCH(30) | TPCH-Q5  | parquet / none / none | 3.81   | 5.37        | I -29.08%  |   1.43%   |   0.69%        | 5     | I -41.47%      | -2.31   | -53.11  |
| TPCH(30) | TPCH-Q7  | parquet / none / none | 13.34  | 21.49       | I -37.92%  |   0.46%   |   0.30%        | 5     | I -60.69%      | -2.31   | -203.08 |
| TPCH(30) | TPCH-Q3  | parquet / none / none | 4.73   | 7.73        | I -38.81%  |   4.90%   |   1.35%        | 5     | I -61.68%      | -2.31   | -26.40  |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 3.71   | 6.61        | I -43.83%  |   1.63%   |   0.09%        | 5     | I -77.12%      | -2.31   | -106.61 |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+---------+

Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7
Reviewed-on: http://gerrit.cloudera.org:8080/15096
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-03-17 23:29:45 +00:00
Volodymyr Verovkin
6fdc644fed IMPALA-8800: Added support of Kudu DATE type to Impala
This patch supports reading and writing DATE values
to Kudu tables. It does not add min-max filter runtime
support, but there is followup JIRA IMPALA-9294.
Corresponding Kudu JIRA is KUDU-2632.

Change-Id: I91656749a58ac769b54c2a63bdd4f85c89520b32
Reviewed-on: http://gerrit.cloudera.org:8080/14705
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-03-12 14:26:13 +00:00
stiga-huang
9672d94596 IMPALA-7784: Use unescaped string in partition pruning + fix duplicatedly unescaping strings
String values from external systems (HDFS, Hive, Kudu, etc.) are already
unescaped, the same as string values in Thrift objects deserialized in
coordinators. We should mark needsUnescaping_ as false in creating
StringLiterals for these values (in LiteralExpr#create()).

When comparing StringLiterals in partition pruning, we should also use
the unescaped values if needsUnescaping_ is true.

Tests:
 - Add tests for partition pruning on unescaped strings.
 - Add test coverage for all existing code paths using
   LiteralExpr#create().
 - Run core tests

Change-Id: Iea8070f16a74f9aeade294504f2834abb8b3b38f
Reviewed-on: http://gerrit.cloudera.org:8080/15278
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-03-09 06:29:35 +00:00
Riza Suminto
a007e3caf8 IMPALA-8674: fix bug where REMOTE runtime filter always marked disabled
When a runtime filter has remote target, coordinator will Disable the
FilterState upon arrival of the last filter update to prevent another
update towards that filter. As consequence, such runtime filter will
always be displayed as disabled in runtime profile (Enabled column is
equal to false in Final filter table), when in reality the runtime
filter has heard back from all pending backends and complete. The
Enabled column should correctly distinguish between failed runtime
filter vs complete runtime filter. To do so, we add
all_updates_received_ flag in FilterState class and set it to true
after filter received enough filter update from pending backends to
proceed. If all_updates_received_ is true, then that runtime filter is
considered as enabled.

Testing:
- Add row regex in runtime_filters.test, query 6, to verify REMOTE
  runtime filter is marked as enabled in final filter table
- Run and pass test_runtime_filters.py
- Run and pass core tests

Change-Id: I82a5a776103abd0a6d73336bebc65e22b4e13fef
Reviewed-on: http://gerrit.cloudera.org:8080/15308
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-03-07 00:00:22 +00:00
Tim Armstrong
1ea013cac8 IMPALA-9452: slightly reduce reservation for test_spilling_aggs
As far as I can tell, the query failed to spill because the
pre-agg was able to release reservation before the post-agg
needed it. Probably there is some variance because of buffering
in the exchange.

This change slightly reduces the reservation to minimise the
chance of this recurring.

Also remove a duplicated instance of this test.

Change-Id: Ifb8376e2e12d3f73d6c0e27c697be4fc86f9c755
Reviewed-on: http://gerrit.cloudera.org:8080/15339
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-03-03 06:12:24 +00:00
Csaba Ringhofer
2c54dbe225 IMPALA-9385: Unix time conversion cleanup + ORC fix
ORC scanner uses TimestampValue::FromUnixTimeNanos() to convert
sec + nano representation to Impala's TimestampValue (day + nano).
FromUnixTimeNanos was affected by flag
use_local_tz_for_unix_timestamp_conversions, while that global option
should not affect ORC. By default there was no conversion, but if the
flag is 1, then timestamps were interpreted as UTC and converted to
local time.

This could be solved by creating a UTC version of FromUnixTimeNanos,
but I decided to change the interface in the hope of making To/From
timestamp functions less confusing.

Changes:
- Fixed the bug by passing UTC as timezone in the ORC scanner.
- Changed the interface of these TimestampValue functions to expect
  a timezone pointer, interpret null as UTC and skip conversion. It
  would be also possible to pass the actual UTC timezone and check
  for this in the functions, but I guess it is easier to optimize
  the inlined functions this way.
- Moved the checking of use_local_tz_for_unix_timestamp_conversions to
  RuntimeState and added property time_zone_for_unix_time_conversions()
  to return the timezone to use in Unix time conversions. This made
  TimestampValue's interface clearer and makes it easy to replace the
  flag with a query option if we want to.
- Changed RuntimeState and the Parquet scanner to skip timezone
  conversion if convert_legacy_hive_parquet_utc_timestamps=1 but the
  timezone is UTC. This allows users to avoid the performance penalty
  of this flag by setting query option timezone to UTC in their
  session (IMPALA-7557). CCTZ is not good at this, actually
  conversions are slower with fixed offset timezones (including UTC)
  than with timezones that have DST/historical rule changes.

Postponed changes:
- Didn't remove the UTC versions of the functions yet, as that would
  require changing (and possibly rethinking) several BE tests and
  benchmarks (IMPALA-9409).

Tests:
- Added regression test for Orc and other file formats to
  check that they are not affected by this flag.
- Extended test_hive_parquet_timestamp_conversion.py to cover the case
  when convert_legacy_hive_parquet_utc_timestamps=1 and timezone=UTC.
  Also did some cleanup there to use query option timezone instead of
  env var TZ.

Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Reviewed-on: http://gerrit.cloudera.org:8080/15222
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-22 02:02:56 +00:00
Tim Armstrong
0bb056e525 IMPALA-4224: execute separate join builds fragments
This enables parallel plans with the join build in a
separate fragment and fixes all of the ensuing fallout.
After this change, mt_dop plans with joins have separate
build fragments. There is still a 1:1 relationship between
join nodes and builders, so the builders are only accessed
by the join node's thread after it is handed off. This lets
us defer the work required to make PhjBuilder and NljBuilder
safe to be shared between nodes.

Planner changes:
* Combined the parallel and distributed planning code paths.
* Misc fixes to generate reasonable thrift structures in the
  query exec requests, i.e. containing the right nodes.
* Fixes to resource calculations for the separate build plans.
** Calculate separate join/build resource consumption.
** Simplified the resource estimation by calculating resource
   consumption for each fragment separately, and assuming that
   all fragments hit their peak resource consumption at the
   same time. IMPALA-9255 is the follow-on to make the resource
   estimation more accurate.

Scheduler changes:
* Various fixes to handle multiple TPlanExecInfos correctly,
  which are generated by the planner for the different cohorts.
* Add logic to colocate build fragments with parent fragments.

Runtime filter changes:
* Build sinks now produce runtime filters, which required
  planner and coordinator fixes to handle.

DataSink changes:
* Close the input plan tree before calling FlushFinal() to release
  resources. This depends on Send() not holding onto references
  to input batches, which was true except for NljBuilder. This
  invariant is documented.

Join builder changes:
* Add a common base class for PhjBuilder and NljBuilder with
  functions to handle synchronisation with the join node.
* Close plan tree earlier in FragmentInstanceState::Exec()
  so that peak resource requirements are lower.
* The NLJ always copies input batches, so that it can close
  its input tree.

JoinNode changes:
* Join node blocks waiting for build-side to be ready,
  then eventually signals that it's done, allowing the builder
  to be cleaned up.
* NLJ and PHJ nodes handle both the integrated builder and
  the external builder. There is a 1:1 relationship between
  the node and the builder, so we don't deal with thread safety
  yet.
* Buffer reservations are transferred between the builder and join
  node when running with the separate builder. This is not really
  necessary right now, since it is all single-threaded, but will
  be important for the shared broadcast.
  - The builder transfers memory for probe buffers to the join node
    at the end of each build phase.
  - At end of each probe phase, reservation needs to be handed back
    to builder (or released).

ExecSummary changes:
* The summary logic was modified to handle connecting fragments
  via join builds. The logic is an extension of what was used
  for exchanges.

Testing:
* Enable --unlock_mt_dop for end-to-end tests
* Migrate some tests to run as part of end-to-end tests instead of
  custom cluster.
* Add mt_dop dimension to various end-to-end tests to provide
  coverage of join queries, spill-to-disk and cancellation.
* Ran a single node TPC-H and TPC-DS stress test with mt_dop=0
  and mt_dop=4.

Perf:
* Ran TPC-H scale factor 30 locally with mt_dop=0. No significant
  change.

Change-Id: I4403c8e62d9c13854e7830602ee613f8efc80c58
Reviewed-on: http://gerrit.cloudera.org:8080/14859
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-20 01:51:54 +00:00
Csaba Ringhofer
796f5cc43d IMPALA-8909: Fix incorrectly omitting implicit cast in pre-insert sort
Inserts can add a sort node that orders the rows by partitioning
and Kudu primary key columns (aka. clustered insert). The issue
occurred when the target column was a timestamp and the source
was an expression that returned a string (e.g. concat()). Impala
adds an implicit cast to convert the strings to timestamps before
sorting, but this cast was incorrectly removed later during expression
substitution.

This led to hitting a DCHECK in debug builds and a (not too
informative) error message in release mode.

Note that the cast in question is not visible in EXPLAIN outputs.
Explain should contain implicit casts from explain_level=2 since
https://gerrit.cloudera.org/#/c/11719/ , but it is still not shown
in some expressions. I consider this to be a separate issue.

Testing:
- added an EE test that used to crash
- ran planner / sort / kudu_insert tests

Change-Id: Icca8ab1456a3b840a47833119c9d4fd31a1fff90
Reviewed-on: http://gerrit.cloudera.org:8080/15217
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-17 21:20:26 +00:00
stiga-huang
cad156181b IMPALA-9304: Support starting Hive with Ranger in minicluster
Add a new flag -with_ranger in testdata/bin/run-hive-server.sh to start
Hive with Ranger integration. The relative configuration files are
generated in bin/create-test-configuration.sh using a new varient
ranger_auth in hive-site.xml.py. Only Hive3 is supported.

Current limitation:
Can't use different username in Beeline by the -n option. "select
current_user()" keeps returning my username, while "select
logged_in_user()" can return the username given by -n option but it's
not used in authorization.

Tests:
 - Ran bin/create-test-configuration.sh and verified the generated
   hive-site_ranger_auth.xml contains Ranger configurations.
 - Ran testdata/bin/run-hive-server.sh -with_ranger. Verified column
   masking and row filtering policies took effect in Beeline.
 - Added test in test_ranger.py for this mode.

Change-Id: I01e3a195b00a98388244a922a1a79e65146cec42
Reviewed-on: http://gerrit.cloudera.org:8080/15189
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-14 04:26:16 +00:00
Yanjia Li
ea0e1def61 IMPALA-8778: Support Apache Hudi Read Optimized Table
Hudi Read Optimized Table contains multiple versions of parquet files,
in order to load the table correctly, Impala needs to recognize Hudi Read
Optimized Table as a HdfsTable and load the latest version of the file
using HoodieROTablePathFilter.

Tests
 - Unit test for Hudi in FileMetadataLoader
 - Create table tests in functional_schema_template.sql
 - Query tests in hudi-parquet.test

Change-Id: I65e146b347714df32fe968409ef2dde1f6a25cdf
Reviewed-on: http://gerrit.cloudera.org:8080/14711
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-11 15:08:39 +00:00
wzhou-code
ebc2c366f5 IMPALA-8110: Fix Parquet min/max filters for narrowed integer types
This patch adds validation for the paired stats values of tinyint
and smallint column data type when reading min/max column stats
value from Parquet file.

Testing:
 - Added automatic test cases in parquet-stats.test for column data
   type been changed from int to tinyint, from smallint to tinyint
   and from int to smallint.
 - Passed EE tests.
 - Passed all core tests.

Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Reviewed-on: http://gerrit.cloudera.org:8080/15087
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-09 23:18:20 +00:00
Tim Armstrong
5b233150d2 IMPALA-9309: fix uninitialised pointer in NestedLoopJoinNode
This was reproducible with a failpoint, but test_failpoints.py
does not run with the right set of parameters.

Testing:
Add regression test that reproduces the issue.

Change-Id: I6324d27e93b710e806d8599f775d0f7d575186cb
Reviewed-on: http://gerrit.cloudera.org:8080/15147
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-05 22:00:48 +00:00
stiga-huang
a17bea916f IMPALA-9330: Fix column masking in nested tables + enable column masking by default
Column masking policies on primitive columns of a table which contains
nested types (though they won't be masked) will cause query failures.
To be specifit, if tableA(id int, int_array array<int>) has a masking
policy on column "id", all queries on "tableA" will fail, e.g.
  select id from tableA;
  select t.id, a.item from tableA t, t.int_array a;

Column masking is implemented by wrapping the underlying table/view with
a table masking view. However, as we don't support nested types in
SelectList, the table masking view can't expose nested columns of the
masked table, which causes collection refs not being resolved correctly.

This patch fixes the issue by 2 steps:
1) Expose nested columns of the underlying table in the output Type of
   the table masking view (see InlineViewRef#createTupleDescriptor()).
   So nested Paths in the original query block can be resolved.
2) For such kind of Paths, resolved them again inside the table masking
   view. So they can point to the underlying table as what they mean
   (see Analyzer#resolvePathWithMasking()). TupleDescriptor of such kind
   of table masking view won't be materialized since the view is simple
   enough that its query plan is just a ScanNode of the underlying
   table. The whole query plan can be stitched as if the table is not
   masked.
Note that one day when we support nested columns in SelectList, we may
don't need these 2 hacks.

This patch also adds some TRACE level loggings to improve debuggability,
and enables column masking by default.

Test changes in TestRanger.test_column_masking:
 - Add column masking policy on a table containing nested types.
 - Add queries on the masked tables. Some queries are borrowed from
   existing tests for nested types.

Tests:
 - Run CORE tests.

Change-Id: I1cc5565c64c1a4a56445b8edde59b1168f387791
Reviewed-on: http://gerrit.cloudera.org:8080/15108
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-05 04:18:28 +00:00
Tim Armstrong
7b280e5841 IMPALA-9349: free output_unmatched_batch_ buffers promptly in PHJ
This fixes a subtle memory managment issue where freeing of a
buffer is delayed longer than it should be. This means that
the full buffer pool reservation is not available for
repartitioning, which can lead to crashes or hang for
very specific queries.

The fix is to transfer resources from output_unmatched_batch_
as soon as the last row from the batch is appended to the
output batch.

This bug would only be triggered by join modes that output
unmatched rows from the right side (RIGHT OUTER JOIN,
FULL OUTER JOIN, RIGHT ANTI JOIN) *and* have an empty
probe side (otherwise unmatched rows are output by
iterating over the hash table).

Testing:
Added DCHECKs to check that all resources are available
before repartitioning.

Added a regression test that triggered the bug.

Change-Id: Ie13b51d4d909afb0fe2e7b7dc00b085c51058fed
Reviewed-on: http://gerrit.cloudera.org:8080/15142
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-01 02:25:13 +00:00
Tim Armstrong
f38da0df8e IMPALA-4400: aggregate runtime filters locally
Move RuntimeFilterBank to QueryState(). Implement fine-grained
locking for each filter to mitigate any increased lock
contention from the change.

Make RuntimeFilterBank handle multiple producers of the
same filter, e.g. multiple instances of a partitioned
join. It computes the expected number of filters upfront
then sends the filter to the coordinator once all the
local instances have been merged together. The merging
can be done in parallel locally to improve latency of
filter propagation.

Add Or() methods to MinMaxFilter and BloomFilter, since
we now need to merge those, not just the thrift versions.

Update coordinator filter routing to expect only one
instance of a filter from each producer backend and
to only send one instance to each consumer backend
(instead of sending one per fragment).

Update memory reservations and estimates to be lower
to account for sharing of filters between fragment
instances. mt_dop plans are modified to show these
shared and non-shared resources separately.

Enable waiting for runtime filters for kudu scanner
with mt_dop.

Made min/max filters const-correct.

Testing
* Added unit tests for Or() methods.
* Added some additional e2e test coverage for mt_dop queries
* Updated planner tests with new estimates and reservation.
* Ran a single node 3-impalad stress test with TPC-H kudu and
  TPC-DS parquet.
* Ran exhaustive tests.
* Ran core tests with ASAN.

Perf
* Did a single-node perf run on TPC-H with default settings. No perf change.
* Single-node perf run with mt_dop=8 showed significant speedups:

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 10.14   | -7.29%     | 5.05       | -11.68%        |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+---------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval    |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+---------+
| TPCH(30) | TPCH-Q7  | parquet / none / none | 38.87  | 38.44       |   +1.13%   |   7.17%   | * 10.92% *     | 20    |   +0.72%       | 0.72    | 0.39    |
| TPCH(30) | TPCH-Q1  | parquet / none / none | 4.28   | 4.26        |   +0.50%   |   1.92%   |   1.09%        | 20    |   +0.03%       | 0.31    | 1.01    |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 2.32   | 2.32        |   +0.05%   |   2.01%   |   1.89%        | 20    |   -0.03%       | -0.36   | 0.08    |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 3.73   | 3.75        |   -0.42%   |   0.84%   |   1.05%        | 20    |   -0.25%       | -0.77   | -1.40   |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 9.80   | 9.83        |   -0.38%   |   0.51%   |   0.80%        | 20    |   -0.32%       | -1.30   | -1.81   |
| TPCH(30) | TPCH-Q2  | parquet / none / none | 1.98   | 2.00        |   -1.32%   |   1.74%   |   2.81%        | 20    |   -0.64%       | -1.71   | -1.79   |
| TPCH(30) | TPCH-Q6  | parquet / none / none | 1.22   | 1.25        |   -2.14%   |   2.66%   |   4.15%        | 20    |   -0.96%       | -2.00   | -1.95   |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 5.13   | 5.22        |   -1.65%   |   1.20%   |   1.40%        | 20    |   -1.76%       | -3.34   | -4.02   |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 2.46   | 2.56        |   -4.13%   |   2.49%   |   1.99%        | 20    |   -4.31%       | -4.04   | -5.94   |
| TPCH(30) | TPCH-Q9  | parquet / none / none | 81.63  | 85.07       |   -4.05%   |   4.94%   |   3.06%        | 20    |   -5.46%       | -3.28   | -3.21   |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 5.07   | 5.50        | I -7.92%   |   0.96%   |   1.33%        | 20    | I -8.51%       | -5.27   | -22.14  |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 24.00  | 26.24       | I -8.57%   |   0.46%   |   0.38%        | 20    | I -9.34%       | -5.27   | -67.47  |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 8.66   | 9.50        | I -8.86%   |   0.62%   |   0.44%        | 20    | I -9.75%       | -5.27   | -55.17  |
| TPCH(30) | TPCH-Q3  | parquet / none / none | 6.01   | 6.70        | I -10.19%  |   1.01%   |   0.90%        | 20    | I -11.25%      | -5.27   | -35.76  |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 2.98   | 3.39        | I -12.23%  |   1.48%   |   1.48%        | 20    | I -13.56%      | -5.27   | -27.75  |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.69   | 2.00        | I -15.55%  |   1.63%   |   1.47%        | 20    | I -18.09%      | -5.27   | -34.60  |
| TPCH(30) | TPCH-Q4  | parquet / none / none | 2.42   | 2.87        | I -15.69%  |   1.48%   |   1.26%        | 20    | I -18.61%      | -5.27   | -39.50  |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 4.64   | 6.27        | I -26.02%  |   1.35%   |   0.73%        | 20    | I -35.37%      | -5.27   | -94.07  |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 3.19   | 4.37        | I -27.01%  |   1.54%   |   0.99%        | 20    | I -36.85%      | -5.27   | -80.74  |
| TPCH(30) | TPCH-Q5  | parquet / none / none | 4.57   | 6.39        | I -28.36%  |   1.04%   |   0.75%        | 20    | I -39.56%      | -5.27   | -120.02 |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 3.15   | 4.71        | I -33.06%  |   1.59%   |   1.31%        | 20    | I -49.43%      | -5.27   | -87.64  |
| TPCH(30) | TPCH-Q8  | parquet / none / none | 5.25   | 7.95        | I -33.95%  |   0.95%   |   0.53%        | 20    | I -51.11%      | -5.27   | -185.02 |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+---------+

Change-Id: Iabeeab5eec869ff2197250ad41c1eb5551704acc
Reviewed-on: http://gerrit.cloudera.org:8080/14538
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-01-29 00:58:24 +00:00
Anurag Mantripragada
567b3cd04c IMPALA-9311: Store SQLPrimaryKeys in canonical order.
HMS seems to be returning SQLPrimaryKeys in inconsistent orders.
This makes some of the primary keys tests flaky. This change sorts
the list of primary keys and stores them in canonical order within
Impala.

Testing:
- Modified the tests that were relying on HMS to return same order
  every time.
- Ran parametrized job.

Change-Id: I0f798d7a2659c6cd061002db151f3fa787eb6370
Reviewed-on: http://gerrit.cloudera.org:8080/15106
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2020-01-27 21:48:23 +00:00
Joe McDonnell
0163a10332 IMPALA-9068: Use different directories for external vs managed warehouse
Hive 3 changed the typical storage model for tables to split them
between two directories:
 - hive.metastore.warehouse.dir stores managed tables (which is now
   defined to be only transactional tables)
 - hive.metastore.warehouse.external.dir stores external tables
   (everything that is not a transactional table)
In more recent commits of Hive, there is now validation that the
external tables cannot be stored in the managed directory. In order
to adopt these newer versions of Hive, we need to use separate
directories for external vs managed warehouses.

Most of our test tables are not transactional, so they would reside
in the external directory. To keep the test changes small, this uses
/test-warehouse for the external directory and /test-warehouse/managed
for the managed directory. Having the managed directory be a subdirectory
of /test-warehouse means that the data snapshot code should not need to
change.

The Hive 2 configuration doesn't change as it does not have this concept.

Since this changes the dataload layout, this also sets the CDH_MAJOR_VERSION
to 7 for USE_CDP_HIVE=true. This means that dataload will uses a separate
location for data as compared to USE_CDP_HIVE=false. That should reduce
conflicts between the two configurations.

Testing:
 - Ran exhaustive tests with USE_CDP_HIVE=false
 - Ran exhaustive tests with USE_CDP_HIVE=true (with current Hive version)
 - Verified that dataload succeeds and tests are able to run with a newer
   Hive version.

Change-Id: I3db69f1b8ca07ae98670429954f5f7a1a359eaec
Reviewed-on: http://gerrit.cloudera.org:8080/15026
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-01-24 17:29:15 +00:00
Anurag Mantripragada
cfe60858da IMPALA-9158: Support loading primary key/foreign key constraints
in LocalCatalog Mode.

This change add a new method 'loadConstraints()' to the MetaProvider
interface.

1. In CatalogdMetaProvider implementation, we fetch the primary key
  (PK) and foreign key(FK) information via the GetPartialCatalogObject()
  RPC to the catalogd. This is modified to include PK/FK information.
  This is because, on catalog side we eagerly load PK/FK information
  which can be sent over to local catalog in a single RPC to Catalog.
  This information is then stored in TableMetaRef object for future
  consumers.
2. In the DirectMetaProvider implementation, we make two RPCs to HMS
  to directly get PK/FK information.

Load constraints can be extended to include other constraints later
(for ex: unique constraints.)

Testing:
- Added tests in LocalCatalogTest, CatalogTest and PartialCatalogInfoTest
- This change also modifies the toSqlUtil for show create table
  statements. Added a test for the same.

Change-Id: I7ea7e1bacf6eb502c67caf310a847b32687e0d58
Reviewed-on: http://gerrit.cloudera.org:8080/14731
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-01-18 03:36:37 +00:00
Gabor Kaszab
63f52518ab IMPALA-8801: Date type support for ORC scanner
Implements read path for the date type in ORC scanner. The internal
representation of a date is an int32 meaning the number of days since
Unix epoch using proleptic Gregorian calendar.

Similarly to the Parquet implementation (IMPALA-7370) this
representation introduces an interoperability issue between Impala
and older versions of Hive (before 3.1). For more details see the
commit message of the mentioned Parquet implementation.

Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Reviewed-on: http://gerrit.cloudera.org:8080/14982
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-01-17 18:54:33 +00:00
stiga-huang
d66610837e IMPALA-9009: Core support for Ranger column masking
Ranger provides column masking policies about how to show masked values
to specific users when reading specific columns. This patch adds support
to rewrite the query AST based on column masking policies.

We perform the column masking policies by replacing the TableRef with a
subquery doing the masking. For instance, the following query
  select c_id, c_name from customer c join orders on c_id = o_cid
will be transfomed into
  select c_id, c_name  from (
    select mask1(c_id) as c_id, mask2(c_name) as c_name from customer
  ) c
  join orders
  on c_id = o_cid

The transfomation is done in AST resolution. Just like view resolution,
if the table needs masking we replace it with a subquery(InlineViewRef)
containing the masking expressions.

This patch only adds support for mask types that don't require builtin
mask functions. So currently supported masking types are MASK_NULL and
CUSTOM.

Current Limitations:
 - Users are required to have privileges on all columns of a masked
   table(IMPALA-9223), since the table mask subquery contains all the
   columns.

Tests:
 - Add e2e tests for masked results
 - Run core tests

Change-Id: I4cad60e0e69ea573b7ecfc011b142c46ef52ed61
Reviewed-on: http://gerrit.cloudera.org:8080/14894
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-01-15 05:59:11 +00:00
norbert.luksa
0511b44f92 IMPALA-8046: Support CREATE TABLE from an ORC file
Impala supports creating a table using the schema of a file.
However, only Parquet is supported currently. This commit adds
support for creating tables from ORC files

The change relies on the ORC Java API with version 1.5 or
greater, because of a bug in earlier versions. Therefore, ORC is
listed as an external dependency, instead of relying on Hive's
ORC version (from Hive3, Hive also lists it as a dependency).

Also, the commit performs a little clean-up on the ParquetHelper
class, renaming it to ParquetSchemaExtractor and removing outdated
comments.

To create a table from an ORC file, run:
CREATE TABLE tablename LIKE ORC '/path/to/file'

Tests:
 * Added analysis tests for primitive and complex types.
 * Added e2e tests for creating tables from ORC files.

Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
Reviewed-on: http://gerrit.cloudera.org:8080/14811
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-01-14 17:15:17 +00:00
Vihang Karajgaonkar
6ebea33a9d IMPALA-9092: Add support for creating external Kudu table
In HMS-3 the translation layer converts a managed kudu table into an
external kudu table and adds additional table property
'external.table.purge' to 'true'. This means any installation which
is using HMS-3 (or a Hive version which has HIVE-22158) will always
create Kudu tables as external tables. This is problematic since the
output of show create table will now be different and may confuse
the users.

In order to improve the user experience of such synchronized tables
(external tables with external.table.purge property set to true),
this patch adds support in Impala to create
external Kudu tables. Previous versions of Impala disallowed
creating a external Kudu table if the Kudu table did not exist.
After this patch, Impala will check if the Kudu table exists and if
it does not it will create a Kudu table based on the schema provided
in the create table statement. The command will error out if the Kudu
table already exists. However, this applies to only the synchronized
tables. Previous way to create a pure external table behaves the
same.

Following syntax of creating a synchronized table is now allowed:

CREATE EXTERNAL TABLE foo (
  id int PRIMARY KEY,
  name string)
PARTITION BY HASH PARTITIONS 8
STORED AS KUDU
TBLPROPERTIES ('external.table.purge'='true')

The syntax is very similar to creating a managed table, except for
the EXTERNAL keyword and additional table property. A synchronized
table will behave similar to managed Kudu tables (drops and renames
are allowed). The output of show create table on a synchronized
table will display the full column and partition spec similar to the
managed tables.

Testing:
1. After the CDP version bump all of the existing Kudu tables now
create synchronized tables so there is good coverage there.
2. Added additional tests which create synchronized tables and
compares the show create table output.
3. Ran exhaustive tests with both CDP and CDH builds.

Change-Id: I76f81d41db0cf2269ee1b365857164a43677e14d
Reviewed-on: http://gerrit.cloudera.org:8080/14750
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-12-13 23:02:13 +00:00
norbert.luksa
bf031a2142 IMPALA-6660: Change -0/+0 floating point to compare as equal in hash table
Currently -0.0/+0.0 values are hashed to different values due to
their different binary representation, while -0.0==+0.0 is true in
C++. This caused them to be distinct values in hash maps despite
being treated as equal in comparisons.

This commit fixes the hashing of -0.0/+0.0, thus changing the
behaviour of hash joins and aggregations (since aggregations
follow the behaviour of the join). That way, the canonical form for
-0/+0 is changed to +0.

Tests:
 - Added e2e tests for aggregation (group by and distinct) and
   join queries with -0.0 and +0.0 present.

Change-Id: I6bb1a817c81c452d041238c19cb6c9f602a5d565
Reviewed-on: http://gerrit.cloudera.org:8080/14588
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-11-26 19:14:24 +00:00
norbert.luksa
2114fc6155 IMPALA-4618: Fixing #Hosts and adding #Instances in exec summary
When mt_dop > 0, the summary is reporting the number of fragment
instances, instead of the number of hosts as the header would
imply.

This commit fixes the issue so the number of hosts will be shown
under the #Hosts column. The commit also adds an #Inst column
where the number of instances are shown (current behaviour).

Tests:
 * Changed profile tests with mt_dop > 0.
 * Updated benchmark tests and shell tests accordingly.

Change-Id: I3bdf9a06d9bd842b2397cd16c28294b6bec7af69
Reviewed-on: http://gerrit.cloudera.org:8080/14715
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-11-26 07:28:23 +00:00
Alice Fan
a0ebeef63b IMPALA-9023: Fix IllegalStateException in SimplifyConditionalsRule
For 'case' function, if left WHEN condition is true, SimplifyConditionalsRule
will cast the THEN result-expression to original expression's type before return
result. In this Jira, we would like to remove the cast function for two reasons:
1. SimplifyConditionalsRule only applys to analyzed expression, which means
expression has already been casted to compatible type before it reaches the
expression rewrite step.
2. The cast function will cause IllegalStateException when 'CASE WHEN TRUE'
appearing in the where conjunction. For example:
Query: select * from functional.alltypessmall where case when true then id < 50 END
ERROR: IllegalStateException: null

Testing:
- Added e2e test to exprs.test
- Added unit test to ExprRewriteRulesTest
- Added unit test to ExprRewriterTest

Change-Id: I640d577200e76121c72685e4aaba1ef312a2d8b4
Reviewed-on: http://gerrit.cloudera.org:8080/14540
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-11-07 23:56:06 +00:00
Gabor Kaszab
050dcf912e IMPALA-9093: Change ACID tests to upgrade external tables
Due to Hive-22158 all non-ACID tables are treated as external tables
instead of being managed tables. The ACID tests occasionally upgrade
non-ACID tables to ACID tables but that is not allowed for external
tables. Since all non-ACID tables are external due to HIVE-22158 some
of the ACID tests started to fail after a CDP_BUILD_NUMBER bump that
brought in a Hive version containing the mentioned change.

The fix is to set 'EXTERNAL' table property to false in the same step
when upgrading the table to ACID. Also in the tests this step is
executed from HIVE instead of Impala.

Tested with the original CDP_BUILD_NUMBER in bin/impala-config.sh and
also tested after bumping that number to 1579022.

Change-Id: I796403e04b3f06c99131db593473d5438446d5fd
Reviewed-on: http://gerrit.cloudera.org:8080/14633
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
2019-11-07 14:30:47 +00:00
Tim Armstrong
b02f514583 IMPALA-8999: make union scheduling work with mt_dop
This change unifies mt_dop scheduling between the
union and scan cases.

Testing:
Manually checked that fragments with unions get parallelised
to the correct degree, both as a result of scans within the
fragment and input fragments.

Extend TestMtDopAdmissionSlots (renamed to TestMtDopScheduling)
to confirm that queries that were not parallelised before are
now parallelised. These tests verify the number of instances
of each operator using the ExecSummary embedded in the profile.

Change-Id: I0d2e9c86b530da3053e49d42b837dca0b1348ff2
Reviewed-on: http://gerrit.cloudera.org:8080/14384
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-10-24 04:41:58 +00:00
Tim Armstrong
9549d3859e IMPALA-9019: fix runtime filter propagation with mt_dop
The bug was that filter routing table construction removed
filters from the TPlanNode structure for the join when
a finstance was not a producer of that filter. The
TPlanNode is shared between all instances of a fragment
on a backend, so this meant that the filter was removed
for all instances on that backend, often meaning that
no filters would be produced at all.

It was awkward fixing the bug within the framework of
the current data structures, where the routing table
is keyed by filter_id, so I ended up refactoring
the routing table somewhat. This also allowed
fixing a TODO about O(n^2) construction of the
routing table.

Testing:
Add regression test that timed out without fix.

Perf:
Ran a single node TPC-H workload with scale factor
30. No perf change.

Change-Id: I26e3628a982d5d9b8b24eb96b28aff11f8aa6669
Reviewed-on: http://gerrit.cloudera.org:8080/14511
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-10-22 04:21:51 +00:00
Tim Armstrong
96143cfcf1 IMPALA-8997: auto fallback to mt_dop=0
Add a temporary --mt_dop_auto_fallback to allow a graceful transition to
using mt_dop for workloads. When this flag is set, DML queries and joins
that would otherwise fail with an error when run with mt_dop > 0 fall
back to running with mt_dop = 0. This means that a user can set mt_dop
for their queries and it will only take effect when supported.

The behaviour generally does not change when this flag is not set,
with a couple of exceptions:
* I made mt_dop automatic for compute stats on all file formats
* mt_dop is allowed for single node plans with inserts. The
  quirky validatePlan() logic previously disallowed this but
  allowed joins in single node plans.

The checks added by this patch can be removed safely once mt_dop is
supported by default for all queries.

This includes some cleanup:
* isDmlStmt() was stale and incorrectly implemented.
* Various TreeNode methods did not return instances of subclasses of
  the requested class, which was strange. This fix is required to
  make 'contains(JoinNode.class)' work correctly. I checked the
  callsites of the fixed functions and none of them would be affected
  by this change because they specified a terminal class without
  any subclasses.
  I didn't actually use this fix in the end (I had to write a custom
  tree traversal in hasUnsupportedMtDopJoin()), but figured I would
  leave the improvement in here.

Testing:
Add some basic functional tests ensuring that the fallback takes
effect.

Run basic join and insert tests with this flag enabled.

Change-Id: Ie0d73d8744059874293697c8e104891a10dba04d
Reviewed-on: http://gerrit.cloudera.org:8080/14344
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-10-22 00:31:01 +00:00
norbert.luksa
3f1d047224 IMPALA-6501: Optimize count(star) for Kudu scans
IMPALA-5036 added an optimisation for count(star) in Parquet scans
that avoids materialising dummy rows. This change provides similar
optimization for Kudu tables.

Instead of materializing empty rows when computing count star, we use
the NumRows field from the Kudu API. The Kudu scanner tuple is
modified to have one slot into which we will write the
num rows statistic. The aggregate function is changed from count to a
special sum function that gets initialized to 0.

Tests:
 * Added end-to-end tests
 ̣* Added planner tests
 * Run performance tests on tpch.lineitem Kudu table with 25 set as
   scaling factor, on 1 node, with mt_dop set to 1, just to measure
   the speedup gained when scanning. Counting the rows before the
   optimization took around 400ms, and around 170ms after.

Change-Id: Ic99e0f954d0ca65779bd531ca79ace1fcb066fb9
Reviewed-on: http://gerrit.cloudera.org:8080/14347
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-10-21 20:23:56 +00:00
Anurag Mantripragada
378a108ecc IMPALA-9070: Include table location in lineage for 'CREATE EXTERNAL
TABLE' DDL.

Atlas needs table location to establish lineage between a newly
created external table and its table location.

The table location information is not available until the createTable
catalog op succeeds. After this change, location information is sent
to the backend in the TDDLExecResponse message which adds it to the
lineage graph. This information is sent only for create external
table queries.

Testing:
Added a test to verify the tableLocation field is populated for a
create external table query lineage. Also, modified the
lineage.test file to include location information for all lineages.

Change-Id: If02b0cc16d52c1956298171628f5737cab62ce9f
Reviewed-on: http://gerrit.cloudera.org:8080/14515
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-10-21 20:12:47 +00:00
Zoltan Borok-Nagy
999becd252 IMPALA-9055: Impala shouldn't set expiration to NEVER for cache directives.
In HdfsCachingUtil we set the expiration of cache directives to never.
This works well until the cache pool has max TTL set. Once max TTL is
set Impala will get an exception when it tries to add caching for tables
or partitions.

I changed HdfsCachingUtil to not set the expiration. This way the cache
directive inherits the expiration from the cache pool.

Testing
Added e2e test that creates a table in a cache pool that has max TTL.

Change-Id: I475b92704b19e337b2e62f766e5b978585bf6583
Reviewed-on: http://gerrit.cloudera.org:8080/14485
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-10-18 08:15:00 +00:00
Anurag Mantripragada
b64efa76f7 IMPALA-9053: DDLs should generate lineage graphs.
DDLs like 'create table' should generate minimal lineage graphs so
that consumers like Atlas can use information like 'queryText' to
establish lineages.

This change adds a call to the computeLineageGraph() method during
analysis phase of createTable which populates the graph with basic
information like queryText. If it is a CTAS, this graph is enhanced
in the "insert" phase with dependencies.

Testing:
Add an EE test to verify lineage information and also to check it
is flushed to disk properly.

Change-Id: Ia6c7ed9fe3265fd777fe93590cf4eb2d9ba0dd1e
Reviewed-on: http://gerrit.cloudera.org:8080/14458
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-10-17 20:11:06 +00:00
Tim Armstrong
b0c6740fae IMPALA-8998: admission control accounting for mt_dop
This integrates mt_dop with the "slots" mechanism that's used
for non-default executor groups.

The idea is simple - the degree of parallelism on a backend
determines the number of slots consumed. The effective
degree of parallelism is used, not the raw mt_dop setting.
E.g. if the query only has a single input split and executes
only a single fragment instance on a host, we don't want
to count the full mt_dop value for admission control.

--admission_control_slots is added as a new flag that
replaces --max_concurrent_queries, since the name better
reflects the concept. --max_concurrent_queries is kept
for backwards compatibility and has the same meaning
as --admission_control_slots.

The admission control logic is extended to take this into
account. We also add an immediate rejection code path
since it is now possible for queries to not be admittable
based on the # of available slots.

We only factor in the "width" of the plan - i.e. the number
of instances of fragments. We don't account for the number
of distinct fragments, since they may not actually execute
in parallel with each other because of dependencies.

This number is added to the per-host profile as the
"AdmissionSlots" counter.

Testing:
Added unit tests for rejection and queue/admit checks.

Also includes a fix for IMPALA-9054 where we increase
the timeout.

Added end-to-end tests:
* test_admission_slots in test_mt_dop.py that checks the
  admission slot calculation via the profile.
* End-to-end admission test that exercises the admit
  immediately and queueing code paths.

Added checks to test_verify_metrics (which runs after
end-to-end tests) to ensure that the per-backend
slots in use goes to 0 when the cluster is quiesced.

Change-Id: I7b6b6262ef238df26b491352656a26e4163e46e5
Reviewed-on: http://gerrit.cloudera.org:8080/14357
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2019-10-16 20:26:11 +00:00
Zoltan Borok-Nagy
241adb116c IMPALA-8742: Switch to ScanRange::bytes_to_read() instead of len()
IMPALA-7543 introduced sub-ranges in scan ranges. These are smaller
parts of the scan ranges that actually need to be read, other parts
of the scan range can be skipped. Currently sub-ranges are only used
in the Parquet scanner during page filtering.

With sub-ranges the scan range has a new field 'bytes_to_read_', that
is the sum of the lengths of the sub-ranges. Or, if there are no
sub-ranges, 'bytes_to_read_' equals to field 'len_' which is the length
of the whole scan range.

At some parts of Impala ScanRange::len() is being used instead of
ScanRange::bytes_to_read(). It doesn't cause a bug because only the
Parquet scanner uses sub-ranges, i.e. bytes_to_read() usually equals to
len(). The Parquet scanner also doesn't hit the bug because it tracks
which pages it reads.

However, it can be a potential source of bugs in the future to leave
the invocations of len() instead of bytes_to_read(). Also, the scanners
might allocate more memory than needed. At couple of places we still
need to invoke len(), e.g. when we test scan-range containment (for
local splits), or when we test whether a scan range contains the
mid-point of a Parquet row group.

Testing:
Added a scanner reservation test.
Ran the exhaustive tests.

Change-Id: Ie896db3f4b5f3e2272d81c2d360049af09c41d9c
Reviewed-on: http://gerrit.cloudera.org:8080/14348
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-10-16 17:57:09 +00:00
Yongzhi Chen
0cd690dcbc IMPALA-9025: Handle AnalysisException caused by expr_rewrites
properly

When the optimizer rewrites conjunct exprs with constant values,
a new expr may cause AnalysisException. In this case,
the conjuncts should use the original expr, not the intermediate
expr produced by propagateConstants. Fixed optimizeConjuncts
to handle this scenario properly.

Tests:
Add unit test for alias.
Ran exhaustive tests.

Change-Id: Ic57bf3f4cdabfe9c5bb304d735bfbf1c0ca7a274
Reviewed-on: http://gerrit.cloudera.org:8080/14403
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-10-16 02:04:12 +00:00
norbert.luksa
f65c2a754f IMPALA-8498: Write column index for floating types when NaN is not present
IMPALA-7307 disabled column index writing for floating point columns
until PARQUET-1222 is resolved. However, the problematic values are
only the NaNs. Therefore we can write column index if NaNs are not
present in data.

Testing:
  * Added tests which should fail if a column index is
    present while having NaN values in the column.

Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Reviewed-on: http://gerrit.cloudera.org:8080/14264
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-10-14 15:21:16 +00:00
Sahil Takiar
ac87278b16 IMPALA-8950: Add -d, -f options to hdfs copyFromLocal, put, cp
Add the -d option and -f option to the following commands:

`hdfs dfs -copyFromLocal <localsrc> URI`
`hdfs dfs -put [ - | <localsrc1> .. ]. <dst>`
`hdfs dfs -cp URI [URI ...] <dest>`

The -d option "Skip[s] creation of temporary file with the suffix
._COPYING_." which improves performance of these commands on S3 since S3
does not support metadata only renames.

The -f option "Overwrites the destination if it already exists" combined
with HADOOP-13884 this improves issues seen with S3 consistency issues by
avoiding a HEAD request to check if the destination file exists or not.

Added the method 'copy_from_local' to the BaseFilesystem class.
Re-factored most usages of the aforementioned HDFS commands to use
the filesystem_client. Some usages were not appropriate / worth
refactoring, so occasionally this patch just adds the '-d' and '-f'
options explicitly. All calls to '-put' were replaced with
'copyFromLocal' because they both copy files from the local fs to a HDFS
compatible target fs.

Since WebHDFS does not have good support for copying files, this patch
removes the copy functionality from the PyWebHdfsClientWithChmod.
Re-factored the hdfs_client so that it uses a DelegatingHdfsClient
that delegates to either the HadoopFsCommandLineClient or
PyWebHdfsClientWithChmod.

Testing:
* Ran core tests on HDFS and S3

Change-Id: I0d45db1c00554e6fb6bcc0b552596d86d4e30144
Reviewed-on: http://gerrit.cloudera.org:8080/14311
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-10-05 00:04:08 +00:00
Attila Jeges
684a54a89e IMPALA-7368: Change supported year range for DATE values to 1..9999
Before this patch the supported year range for DATE type started with
year 0. This contradicts the ANSI SQL standard that defines the valid
DATE value range to be 0001-01-01 to 9999-12-31.

Change-Id: Iefdf1c036834763f52d44d0c39a25a1f04e41e07
Reviewed-on: http://gerrit.cloudera.org:8080/14349
Reviewed-by: Attila Jeges <attilaj@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-10-04 18:36:22 +00:00
Attila Jeges
27fa27e808 IMPALA-8198: DATE: Read from avro.
This change is a follow-up to IMPALA-7368 and adds support for DATE
type to the avro scanner.

Similarly to parquet, avro uses DATE logical type for dates. DATE
logical type annotates an INT32 that stores the number of days since
the unix epoch, 1 January 1970.

This representation introduces an avro interoperability issue between
Impala and older versions of Hive:
- Before version 3.1, Hive used Julian calendar to represent dates
  up to 1582-10-05 and Gregorian calendar for dates starting with
  1582-10-15. Dates between 1582-10-05 and 1582-10-15 were lost.
- Impala uses proleptic Gregorian calendar, extending the Gregorian
  calendar backward to dates preceding its official introduction in
  1582-10-15.
This means that pre-1582-10-15 dates written to an avro table by Hive
will be read back incorrectly by Impala.

Note that Hive 3.1 switched to proleptic Gregorian calendar too, so
for Hive 3.1+ this is no longer an issue.

Dependency changes:
- BE uses avro 1.7.4-p5 from native-toolchain.

Change-Id: I7a9d5b93a22cf3a00244037e187f8c145cacc959
Reviewed-on: http://gerrit.cloudera.org:8080/13944
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-09-27 17:18:35 +00:00
Tim Armstrong
548106f5e1 IMPALA-8451,IMPALA-8905: enable admission control for dockerised tests
This gives us some additional coverage for using admission
control in a simple but realistic configuration.

What are the implications of this change for test stability and
flakiness?

On one hand were are adding some more unpredictability
to tests, because they may be queued for an arbitrary amount of
time. On the other, we can prevent queries from contending over
memory. Currently we rely on luck to prevent concurrent queries
from forcing each other out-of-memory.

I think the unpredictability from the queueing is
preferable, because we can generally work around these by
fixing tests that are sensitive to being queued, whereas
contention over memory requires us to use crude workarounds
like forcing tests to execute serially.

Added observability for the configured queue wait time for each pool.
I noticed that I did not have a direct way to observe the effective
value when I set configs. This is IMPALA-8905.

I had to tweak tests in a few ways:
* Tests with large strings needed higher memory limits.
* Hardcoded instances of default-pool had to handle root.default
  as well.
* test_query_mem_limit needed to run without a mem_limit. I
  created a special pool root.no-limits with no memory limits
  to allow that.

Testing:
Ran the dockerised build 5-6 times to flush out flaky tests.

Change-Id: I7517673f9e348780fcf7cd6ce1f12c9c5a55373a
Reviewed-on: http://gerrit.cloudera.org:8080/13942
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-09-27 01:54:39 +00:00
norbert.luksa
288c8c41b5 IMPALA-8755: Frontend support for Z-ordering
Extended the SQL grammar with an optional and a default flag for
SORT BY, namely ZORDER and LEXICAL. If set, the new 'sort.algorithm'
table property will be set to ZORDER and the information will sink
down to the backend. The default order is indicated by LEXICAL
and can be omitted. Examples are:

CREATE TABLE t (a INT, b INT) PARTITIONED BY (c INT)
  SORT BY ZORDER (a, b);
CREATE TABLE t SORT BY ZORDER (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY ZORDER (id,zip);

ALTER TABLE t SORT BY ZORDER (int_col,id);

The following two are the same statements:
CREATE TABLE t (a INT, b INT) SORT BY (a, b);
CREATE TABLE t (a INT, b INT) SORT BY LEXICAL (a, b);

For strings, varchars, floats and doubles Z-ordering is currently
not supported. It's not suitable for strings and varchars, but
support can be added for floats and doubles later. The supported
types are: boolean, int types, decimals, date, timestamp, and char.

Currently ZORDER has the same functionality as a simple SORT BY clause,
therefore hidden behind a feature flag: unlock_zorder. The custom
sorting with Z-ordering will be in a different commit later.

Testing:
 * Added tests for the ZORDER option for every SORT BY test.
 * Modified some tests by adding the LEXICAL option.
 * The .test workloads are temporarily put in separate test files
   in order to set up the feature flag. These tests are run from
   tests/custom_cluster/test_zorder.py which is a duplication of
   the relevant tests, but with CustomClusterTestSuite decorator.

Change-Id: Ie122002ca8f52ca2c1e1ec8ff1d476ae1f4f875d
Reviewed-on: http://gerrit.cloudera.org:8080/13955
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-09-26 18:35:06 +00:00
Zoltan Borok-Nagy
ab975c9517 IMPALA-8969: Grouping aggregator can cause segmentation fault when doing multiple aggregations
Grouping aggregator always tried to serialize the 0th tuple regardless
of the aggregation index. This could lead to a segmentation fault
because the 0th tuple might be null.

Testing:
Added a query that triggers the error to multiple-distinct-aggs.test

Change-Id: I7acdd40c63166cd4986e546a992c0816f94823d5
Reviewed-on: http://gerrit.cloudera.org:8080/14290
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-09-25 16:58:14 +00:00
Tim Armstrong
3c193c33b2 IMPALA-2138: part 2: clean up result expr handling
The main refactoring is to move result expressions into the
DataSink implementations, which is where they are used
in the backend. This will make it easier to explicitly collect
all the expressions in the plan tree for the purposes of
projection. Previously the expressions were owned by
the PlanFragment and passed into the DataSink.

Show result exprs in explain plan of the table sinks
at higher verbosity.

Change-Id: I163a393b5ce6b8a926b3fee9b4b920e31d6846b2
Reviewed-on: http://gerrit.cloudera.org:8080/14270
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-09-25 01:06:19 +00:00
Tim Armstrong
983e3a66de IMPALA-2138: part 1: initial cleanup
This is a mixed bag of simplifications, debugging improvements
and test fixes that came up in the projection work.

I had to update some planner tests because some expressions
now include their arguments. Various things in the planner
tests were stale, so there are spurious changes in the
expected output that are ignored by the plan verification.

Change-Id: I75d2c8cab79988300c1a9c6c23d6ccea53da7d23
Reviewed-on: http://gerrit.cloudera.org:8080/14265
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-09-20 22:15:41 +00:00
Gabor Kaszab
bca1b43efb IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
This enhancement introduces FORMAT clause for CAST() operator that is
applicable for casts between string types and timestamp types. Instead
of accepting SimpleDateFormat patterns the FORMAT clause supports
datetime patterns following the ISO:SQL:2016 standard.
Note, the CAST() operator without the FORMAT clause still uses
Impala's implementation of SimpleDateFormat handling. Similarly, the
existing conversion functions such as to_timestamp(), from_timestamp()
etc. remain unchanged and use SimpleDateFormat. Contrary to how these
functions work the FORMAT clause must specify a string literal and
cannot be used with any other kind of a string expression.

Milestone 1 contains all the format tokens covered by the SQL
standard. Further milestones will add more functionality on top of
this list to cover functionality provided by other RDBMS systems.

List of tokens implemented by this change:
- YYYY, YYY, YY, Y: Year tokens
- RRRR, RR: Round year tokens
- MM: Month (1-12)
- DD: Day (1-31)
- DDD: Day of year (1-366)
- HH, HH12: Hour of day (1-12)
- HH24: Hour of day (0-23)
- MI: Minute (0-59)
- SS: Second (0-59)
- SSSSS: Second of day (0-86399)
- FF, FF1, ..., FF9: Fractional second
- AM, PM, A.M., P.M.: Meridiem indicators
- TZH: Timezone hour (-99-+99)
- TZM: Timezone minute (0-99)
- Separators: - . / , ' ; : space
- ISO8601 date indicators (T, Z)

Some notes about the matching algorithm:
- The parsing algorithm uses these tokens in a case insensitive
  manner.
- The separators are interchangeable with each other. For example a
  '-' separator in the format will match with a '.' character in the
  input.
- The length of the separator sequences is handled flexibly meaning
  that a single separator character in the format for instance would
  match with a multi-separator sequence in the input.
- In a string type to timestamp conversion the timezone offset tokens
  are parsed, expected to match with the input but they don't adjust
  the result as the input is already expected to be in UTC format.

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-DD-YYYY');
SELECT CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP
    FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM');
SELECT CAST(timestamp_column as STRING
    FORMAT "YYYY MM HH12 YY") from some_table;

Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Reviewed-on: http://gerrit.cloudera.org:8080/13722
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-09-19 18:46:19 +00:00
Bharath Vissapragada
32dbf5d78a IMPALA-8572: Log query events before unregister.
Currently, the query events (audits and lineages) are logged as a
part of query unregistration. This delays the event logging in cases
where the Unregister() is delayed by client for some reason (ex: Hue
does not call Unregister until the browser tab is closed) or the client
goes away without calling Unregister and the query timeout kicks in.

This patch moves this event logging to an earlier stage in the query
lifecycle. Moved the event logging related code into ClientRequestState
for easier code refactoring.

The conditions under which the events are logged are slightly
modified by this patch. Without the patch, events are logged for
unsuccessful queries if atleast a single fetch is perfomed. This patch
relaxes this guarantee to log events for any query that reaches
the FINISHED state (rows are available to fetch by the client) and does
not wait for a fetch to be performed. This simplifies the coordinator
state machine by avoiding unnecessary synchronization.

Added some test coverage for coordinator side code paths for writing
lineages. fe specific lineage tests only verified the correctness of
lineage created but did not test whether it was being flushed correctly
to the disk.

Change-Id: I639b9c1acb9806b29292cd85be2863688453ca2e
Reviewed-on: http://gerrit.cloudera.org:8080/14143
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-09-12 02:02:44 +00:00
stiga-huang
bb5d3c255e IMPALA-8718: project out collection slots in analytic's sort tuple
Subplan node is mainly used to extract collection values. It evaluates
its right plan tree (usually a nested loop join) for every row from its
left child (usually a scan producing tuples with collection values), and
returns those rows produced by the right child. Each row (TupleRow)
produced by the join node consists of several tuples from the join
operands. So the scan node tuple that contains collection values will be
part of the output of the join node, then become part of the output of
the subplan node.

When generating analytic plan, a TupleDescriptor for sort is created
based on the materialized slots of the input. If the input comes from a
subplan node, there are collection slots in it. These collection slots
will be picked out into the sort tuple, and occur in the smap of it.
Then the output smap of the analytic plan will contain the collection
slot consequently. This causes IllegalStateException if the analytic
plan is the nullable side of an outer join. The exception is thrown when
we are checking the necessary of adding a TupleIsNullPredicate for each
output slot.

We should project out the collection slots in creating the sort tuple of
analytic plan to avoid causing such an exception. Projecting out them is
safe since outputs of the analytic node must be in the select list of
the block with the analytic, and we don't allow collection types to be
returned from a select block, and also don't support any builtin or UDF
functions that take collection types as an argument.

Tests
 - Add Planner test in analytic-fns.test with VALIDATE_CARDINALITY
 enabled. Also fix some incorrect row-sizes of existing tests.
 - Add e2e test in nested-types-runtime.test to verify that collection
 slots are projected out.

Change-Id: I7edf74ff0f603dfd33ff546e61545bc724990655
Reviewed-on: http://gerrit.cloudera.org:8080/14135
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-09-11 18:18:12 +00:00