Added an expression rewrite rule to convert a disjunctive predicate to
conjunctive normal form (CNF). Converting to CNF enables multi-table
predicates that were only evaluated by a Join operator to be converted
into either single-table conjuncts that are eligible for predicate pushdown
to the scan operator or other multi-table conjuncts that are eligible to
be pushed to a Join below. This helps improve performance for such queries.
Since converting to CNF expands the number of expressions, we place a
limit on the maximum number of CNF exprs (each AND is counted as 1 CNF expr)
that are considered. Once the MAX_CNF_EXPRS limit (default is unlimited) is
exceeded, whatever expression was supplied to the rule is returned without
further transformation. A setting of -1 or 0 allows unlimited number of
CNF exprs to be created upto int32 max. Another option ENABLE_CNF_REWRITES
enables or disables the entire rewrite. This is False by default until we
have done more thorough functional testing (tracking JIRA IMPALA-9539).
Examples of rewrites:
original: (a AND b) OR c
rewritten: (a OR c) AND (b OR c)
original: (a AND b) OR (c AND d)
rewritten: (a OR c) AND (a OR d) AND (b OR c) AND (b OR d)
original: NOT(a OR b)
rewritten: NOT(a) AND NOT(b)
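The rewrite rules illustrated above (distributing OR over AND, and De Morgan's law for NOT) can be sketched as follows. This is an illustrative Python sketch, not Impala's Java implementation, and it ignores the MAX_CNF_EXPRS limit:

```python
# Expressions are nested tuples: ('AND', l, r), ('OR', l, r), ('NOT', e),
# or a plain string for an atomic predicate.

def to_cnf(e):
    if isinstance(e, str):
        return e
    op = e[0]
    if op == 'NOT':
        inner = e[1]
        if not isinstance(inner, str):
            if inner[0] == 'OR':   # NOT(a OR b) -> NOT(a) AND NOT(b)
                return to_cnf(('AND', ('NOT', inner[1]), ('NOT', inner[2])))
            if inner[0] == 'AND':  # NOT(a AND b) -> NOT(a) OR NOT(b)
                return to_cnf(('OR', ('NOT', inner[1]), ('NOT', inner[2])))
            if inner[0] == 'NOT':  # double negation elimination
                return to_cnf(inner[1])
        return ('NOT', inner)
    l, r = to_cnf(e[1]), to_cnf(e[2])
    if op == 'AND':
        return ('AND', l, r)
    # op == 'OR': distribute OR over AND
    if not isinstance(l, str) and l[0] == 'AND':
        # (a AND b) OR c -> (a OR c) AND (b OR c)
        return ('AND', to_cnf(('OR', l[1], r)), to_cnf(('OR', l[2], r)))
    if not isinstance(r, str) and r[0] == 'AND':
        # a OR (b AND c) -> (a OR b) AND (a OR c)
        return ('AND', to_cnf(('OR', l, r[1])), to_cnf(('OR', l, r[2])))
    return ('OR', l, r)
```

Each distribution step doubles part of the expression, which is why the real rule caps the number of generated CNF exprs.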
Testing:
- Added new unit tests with variations of disjunctive predicates
and verified their Explain plans
- Manually tested the result correctness on impala shell by running
these queries with ENABLE_CNF_REWRITES enabled and disabled
- Added TPC-H q7, q19 and TPC-DS q13 with the CNF rewrite enabled
- Preliminary performance testing of TPC-DS q13 on a 10TB scale factor
shows almost 5x improvement:
Original baseline: 47.5 sec
With this patch and CNF rewrite enabled: 9.4 sec
Change-Id: I5a03cd7239333aaf375416ef5f2b7608fcd4a072
Reviewed-on: http://gerrit.cloudera.org:8080/15462
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This crash was caused by an empty sort tuple descriptor that was
generated as a result of union substitutions replacing all sort
fields with literals that were subsequently removed from the ordering
spec. There was no check in place to prevent the empty tuple descriptor
from being sent to impalad where it caused a divide-by-zero crash.
Fix:
This fix avoids inserting a sort node when there are no fields remaining
to sort on. Also added a precondition to the SortNode that will prevent
similar issues from crashing impalad.
Testing:
Testcases added to PlannerTest/union.test
Change-Id: If19303fbf55927c1e1b76b9b22ab354322b21c54
Reviewed-on: http://gerrit.cloudera.org:8080/15473
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is an improvement that propagates predicates on the
nullable side of an outer join into the inline view.
For example:
SELECT *
FROM functional.alltypessmall a
LEFT JOIN (
SELECT id, upper(string_col) AS upper_val,
length(string_col) AS len
FROM functional.alltypestiny
) b ON a.id = b.id
WHERE b.upper_val is NULL and b.len = 0
Before this change, the predicate b.len = 0 could not be migrated into
the inline view because it is on the nullable side of an outer join:
if the predicate were evaluated only inside the inline view, the NULL
rows produced by the outer join would not be rejected.
However, we can be more aggressive. In particular, some predicates that
must be evaluated at a join node can also be safely evaluated by the
outer-joined inline view. Such predicates are not marked as assigned:
they are propagated into the inline view and also evaluated at the
join node.
We can divide such predicates into two types: those that satisfy the
same condition as Analyzer#canEvalPredicate and can be migrated into
the inline view, and those that satisfy the three conditions below and
are safe to propagate into the nullable side of an outer join.
1) The predicate is bound by tupleIds.
2) The predicate is not an On-clause predicate.
3) The predicate evaluates to false when all its referenced tuples are NULL.
Therefore, 'b.upper_val IS NULL' cannot be propagated into the inline
view, but 'b.len = 0' can.
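Condition 3 is a null-rejection test: under SQL's three-valued logic, substitute NULL for the referenced slots and require that the predicate cannot evaluate to TRUE. A minimal Python sketch, with hypothetical names (this is not Impala's actual code):

```python
# NULL is modelled as None; SQL three-valued logic returns None for "unknown".

def eval_is_null(v):
    return v is None                 # "x IS NULL" is TRUE for a NULL input

def eval_eq(l, r):
    if l is None or r is None:
        return None                  # NULL = x evaluates to NULL, never TRUE
    return l == r

def is_null_rejecting(predicate):
    """predicate is a callable over the referenced column value; evaluate
    it with the column set to NULL and require a non-TRUE result."""
    return predicate(None) is not True

# b.len = 0 is null-rejecting: NULL = 0 -> NULL, so it is safe to propagate.
# b.upper_val IS NULL is not: it is TRUE on NULL input, so it must stay
# above the outer join.
```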
Tests:
* Add plan tests in inline-view.test
* One baseline plan in inline-view.test, one in nested-collections.test
and two in predicate-propagation.test had to be updated
* Ran the full set of verifications in Impala Public Jenkins
Change-Id: I6c23a45aeb5dd1aa06a95c9aa8628ecbe37ef2c1
Reviewed-on: http://gerrit.cloudera.org:8080/15047
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds an advanced PREAGG_BYTES_LIMIT query option that
allows limiting the memory consumption of streaming
preaggregation operators in a query.
It works by setting a maximum reservation on each grouping
aggregator in a preaggregation node. The aggregators switch
to passthrough mode automatically when hitting this limit,
the same as if they were hitting the query memory limit.
This does not override the minimum reservation computed for
the aggregation - if the limit is less than the minimum
reservation, the minimum reservation is used as the limit
instead.
The default behaviour is unchanged.
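The clamping behaviour described above amounts to the following. This is a hypothetical sketch, not the actual backend code, and treating a non-positive value as "option unset" is an assumption:

```python
def preagg_reservation_limit(preagg_bytes_limit, min_reservation):
    """Effective max reservation for a grouping aggregator: the
    PREAGG_BYTES_LIMIT option caps the reservation, but never below the
    minimum reservation computed by the planner. Returns None when the
    option imposes no extra cap."""
    if preagg_bytes_limit <= 0:      # assumed convention for "unset"
        return None
    return max(preagg_bytes_limit, min_reservation)
```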
Testing:
Add a planner test with estimates higher and lower than limit
to ensure that resource estimates correctly reflect the option.
Add an end-to-end test that verifies that the option forces
passthrough when the memory limit is hit.
Change-Id: I87f7a5c68da93d068e304ef01afbcbb0d56807d9
Reviewed-on: http://gerrit.cloudera.org:8080/15463
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change modifies the output of SHOW TABLE STATS and SHOW
PARTITIONS for Kudu tables.
- PARTITIONS: the #Rows column has been removed
- TABLE STATS: instead of showing partition information, it returns a
result set similar to HDFS table stats, with #Rows, #Partitions, Size,
Format and Location columns
Example outputs can be seen in the doc changes.
Testing:
* kudu_stats.test is modified to verify the new result set
* kudu_partition_ddl.test is modified to verify the new partitions style
* Updated unit test with the new error message
Change-Id: Ice4b8df65f0a53fe14b8fbe35d82c9887ab9a041
Reviewed-on: http://gerrit.cloudera.org:8080/15199
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch supports reading and writing DATE values
to Kudu tables. It does not add min-max runtime filter
support; that is tracked by the follow-up JIRA IMPALA-9294.
Corresponding Kudu JIRA is KUDU-2632.
Change-Id: I91656749a58ac769b54c2a63bdd4f85c89520b32
Reviewed-on: http://gerrit.cloudera.org:8080/14705
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
String values from external systems (HDFS, Hive, Kudu, etc.) are already
unescaped, just like string values in Thrift objects deserialized in
coordinators. We should mark needsUnescaping_ as false when creating
StringLiterals for these values (in LiteralExpr#create()).
When comparing StringLiterals in partition pruning, we should also use
the unescaped values if needsUnescaping_ is true.
Tests:
- Add tests for partition pruning on unescaped strings.
- Add test coverage for all existing code paths using
LiteralExpr#create().
- Run core tests
Change-Id: Iea8070f16a74f9aeade294504f2834abb8b3b38f
Reviewed-on: http://gerrit.cloudera.org:8080/15278
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When a runtime filter has a remote target, the coordinator disables the
FilterState upon arrival of the last filter update, to prevent further
updates to that filter. As a consequence, such a runtime filter is
always displayed as disabled in the runtime profile (the Enabled column
is false in the Final filter table), when in reality the runtime filter
has heard back from all pending backends and is complete. The Enabled
column should correctly distinguish a failed runtime filter from a
completed one. To do so, we add an all_updates_received_ flag to the
FilterState class and set it to true once the filter has received
enough filter updates from the pending backends to proceed. If
all_updates_received_ is true, the runtime filter is considered
enabled.
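A minimal sketch of the reporting change (a hypothetical Python stand-in for the coordinator's C++ FilterState):

```python
class FilterState:
    """Illustrative model of a runtime filter's state on the coordinator."""
    def __init__(self):
        self.disabled = False
        self.all_updates_received = False

    def apply_last_update(self):
        # Filters with remote targets are disabled after the final update
        # to reject any stragglers; completion is now recorded separately.
        self.all_updates_received = True
        self.disabled = True

    def enabled_for_profile(self):
        # Before the fix this was effectively `not self.disabled`, which
        # reported completed remote filters as disabled.
        return self.all_updates_received or not self.disabled
```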
Testing:
- Add row regex in runtime_filters.test, query 6, to verify REMOTE
runtime filter is marked as enabled in final filter table
- Run and pass test_runtime_filters.py
- Run and pass core tests
Change-Id: I82a5a776103abd0a6d73336bebc65e22b4e13fef
Reviewed-on: http://gerrit.cloudera.org:8080/15308
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
As far as I can tell, the query failed to spill because the
pre-agg was able to release reservation before the post-agg
needed it. Probably there is some variance because of buffering
in the exchange.
This change slightly reduces the reservation to minimise the
chance of this recurring.
Also remove a duplicated instance of this test.
Change-Id: Ifb8376e2e12d3f73d6c0e27c697be4fc86f9c755
Reviewed-on: http://gerrit.cloudera.org:8080/15339
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
If all primary key columns of the Kudu table are in equivalence
predicates pushed down to Kudu, Kudu will return at most one row.
In this case, we can adjust the cardinality estimation to speed
up point lookup.
This patch sets the input and output cardinality to 1 if the
number of primary key columns in equivalence predicates pushed
down to Kudu equals the total number of primary key columns of
the Kudu table, hence enabling the small query optimization.
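The cardinality adjustment can be sketched as follows (illustrative Python; function and parameter names are hypothetical, not the planner's actual API):

```python
def kudu_scan_cardinality(pk_columns, pushed_eq_columns, table_cardinality):
    """If every primary key column appears in an equality predicate pushed
    down to Kudu, the scan is a point lookup returning at most one row,
    so estimate cardinality 1; otherwise keep the table estimate."""
    if set(pk_columns) <= set(pushed_eq_columns):
        return 1
    return table_cardinality
```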
Testing:
- Added test cases in following PlannerTest: small-query-opt.test,
disable-codegen.test and kudu.test.
- Passed all FE tests, including new test cases.
Change-Id: I4631cd4d1a528a1152b5cdcb268426f2ba1a0c08
Reviewed-on: http://gerrit.cloudera.org:8080/15250
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The schema file allows specifying a commandline command in
several of the sections (LOAD, DEPENDENT_LOAD, etc). These
are executed by testdata/bin/generate-schema-statements.py
when it is creating the SQL files that are later executed
for dataload. A fair number of tables use this flexibility
to execute hdfs mkdir and copy commands via the command line.
Unfortunately, this is very inefficient. HDFS command line
commands require spinning up a JVM and can take over one
second per command. These commands are executed during a
serial part of dataload, and they can be executed multiple
times. In short, these commands are a significant slowdown
for loading the functional tables.
This converts the hdfs command line statements to equivalent
Hive LOAD DATA LOCAL statements. These are doing the copy
from an already running JVM, so they do not need JVM startup.
They also run in the parallel part of dataload, speeding up
the SQL generation part.
This speeds up generate-schema-statements.py significantly.
On the functional dataset, it saves 7 minutes.
Before:
time testdata/bin/generate-schema-statements.py -w functional-query -e exhaustive -f
real 8m8.068s
user 10m11.218s
sys 0m44.932s
After:
time testdata/bin/generate-schema-statements.py -w functional-query -e exhaustive -f
real 0m35.800s
user 0m42.536s
sys 0m5.210s
This is currently a long-pole in dataload, so it translates directly to
an overall speedup of about 7 minutes.
Testing:
- Ran debug tests
Change-Id: Icf17b85ff85618933716a80f1ccd6701b07f464c
Reviewed-on: http://gerrit.cloudera.org:8080/15228
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
ORC scanner uses TimestampValue::FromUnixTimeNanos() to convert
sec + nano representation to Impala's TimestampValue (day + nano).
FromUnixTimeNanos was affected by flag
use_local_tz_for_unix_timestamp_conversions, while that global option
should not affect ORC. By default there was no conversion, but if the
flag was set to 1, timestamps were interpreted as UTC and converted to
local time.
This could be solved by creating a UTC version of FromUnixTimeNanos,
but I decided to change the interface in the hope of making To/From
timestamp functions less confusing.
Changes:
- Fixed the bug by passing UTC as timezone in the ORC scanner.
- Changed the interface of these TimestampValue functions to expect
a timezone pointer, interpret null as UTC and skip conversion. It
would be also possible to pass the actual UTC timezone and check
for this in the functions, but I guess it is easier to optimize
the inlined functions this way.
- Moved the checking of use_local_tz_for_unix_timestamp_conversions to
RuntimeState and added property time_zone_for_unix_time_conversions()
to return the timezone to use in Unix time conversions. This made
TimestampValue's interface clearer and makes it easy to replace the
flag with a query option if we want to.
- Changed RuntimeState and the Parquet scanner to skip timezone
conversion if convert_legacy_hive_parquet_utc_timestamps=1 but the
timezone is UTC. This allows users to avoid the performance penalty
of this flag by setting query option timezone to UTC in their
session (IMPALA-7557). CCTZ is not good at this, actually
conversions are slower with fixed offset timezones (including UTC)
than with timezones that have DST/historical rule changes.
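The timezone-pointer convention from the second bullet, where a null timezone means UTC and the conversion step is skipped entirely, can be sketched in Python (names are hypothetical; the real interface is C++ in TimestampValue):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def from_unix_time_nanos(seconds, nanos, tz=None):
    """Convert a sec + nano Unix time to a naive wall-clock timestamp.
    tz=None stands in for the null timezone pointer: treat the value as
    UTC and skip conversion, which is what the ORC scanner now does."""
    utc = EPOCH + timedelta(seconds=seconds, microseconds=nanos // 1000)
    if tz is None:
        return utc.replace(tzinfo=None)          # no conversion (UTC)
    return utc.astimezone(tz).replace(tzinfo=None)
```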
Postponed changes:
- Didn't remove the UTC versions of the functions yet, as that would
require changing (and possibly rethinking) several BE tests and
benchmarks (IMPALA-9409).
Tests:
- Added regression test for Orc and other file formats to
check that they are not affected by this flag.
- Extended test_hive_parquet_timestamp_conversion.py to cover the case
when convert_legacy_hive_parquet_utc_timestamps=1 and timezone=UTC.
Also did some cleanup there to use query option timezone instead of
env var TZ.
Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Reviewed-on: http://gerrit.cloudera.org:8080/15222
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This enables parallel plans with the join build in a
separate fragment and fixes all of the ensuing fallout.
After this change, mt_dop plans with joins have separate
build fragments. There is still a 1:1 relationship between
join nodes and builders, so the builders are only accessed
by the join node's thread after it is handed off. This lets
us defer the work required to make PhjBuilder and NljBuilder
safe to be shared between nodes.
Planner changes:
* Combined the parallel and distributed planning code paths.
* Misc fixes to generate reasonable thrift structures in the
query exec requests, i.e. containing the right nodes.
* Fixes to resource calculations for the separate build plans.
** Calculate separate join/build resource consumption.
** Simplified the resource estimation by calculating resource
consumption for each fragment separately, and assuming that
all fragments hit their peak resource consumption at the
same time. IMPALA-9255 is the follow-on to make the resource
estimation more accurate.
Scheduler changes:
* Various fixes to handle multiple TPlanExecInfos correctly,
which are generated by the planner for the different cohorts.
* Add logic to colocate build fragments with parent fragments.
Runtime filter changes:
* Build sinks now produce runtime filters, which required
planner and coordinator fixes to handle.
DataSink changes:
* Close the input plan tree before calling FlushFinal() to release
resources. This depends on Send() not holding onto references
to input batches, which was true except for NljBuilder. This
invariant is documented.
Join builder changes:
* Add a common base class for PhjBuilder and NljBuilder with
functions to handle synchronisation with the join node.
* Close plan tree earlier in FragmentInstanceState::Exec()
so that peak resource requirements are lower.
* The NLJ always copies input batches, so that it can close
its input tree.
JoinNode changes:
* Join node blocks waiting for build-side to be ready,
then eventually signals that it's done, allowing the builder
to be cleaned up.
* NLJ and PHJ nodes handle both the integrated builder and
the external builder. There is a 1:1 relationship between
the node and the builder, so we don't deal with thread safety
yet.
* Buffer reservations are transferred between the builder and join
node when running with the separate builder. This is not really
necessary right now, since it is all single-threaded, but will
be important for the shared broadcast.
- The builder transfers memory for probe buffers to the join node
at the end of each build phase.
- At end of each probe phase, reservation needs to be handed back
to builder (or released).
ExecSummary changes:
* The summary logic was modified to handle connecting fragments
via join builds. The logic is an extension of what was used
for exchanges.
Testing:
* Enable --unlock_mt_dop for end-to-end tests
* Migrate some tests to run as part of end-to-end tests instead of
custom cluster.
* Add mt_dop dimension to various end-to-end tests to provide
coverage of join queries, spill-to-disk and cancellation.
* Ran a single node TPC-H and TPC-DS stress test with mt_dop=0
and mt_dop=4.
Perf:
* Ran TPC-H scale factor 30 locally with mt_dop=0. No significant
change.
Change-Id: I4403c8e62d9c13854e7830602ee613f8efc80c58
Reviewed-on: http://gerrit.cloudera.org:8080/14859
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Inserts can add a sort node that orders the rows by partitioning
and Kudu primary key columns (aka. clustered insert). The issue
occurred when the target column was a timestamp and the source
was an expression that returned a string (e.g. concat()). Impala
adds an implicit cast to convert the strings to timestamps before
sorting, but this cast was incorrectly removed later during expression
substitution.
This led to hitting a DCHECK in debug builds and a (not too
informative) error message in release mode.
Note that the cast in question is not visible in EXPLAIN outputs.
Explain output should contain implicit casts at explain_level=2 since
https://gerrit.cloudera.org/#/c/11719/ , but they are still not shown
in some expressions. I consider this to be a separate issue.
Testing:
- added an EE test that used to crash
- ran planner / sort / kudu_insert tests
Change-Id: Icca8ab1456a3b840a47833119c9d4fd31a1fff90
Reviewed-on: http://gerrit.cloudera.org:8080/15217
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Add a new flag -with_ranger in testdata/bin/run-hive-server.sh to start
Hive with Ranger integration. The relevant configuration files are
generated in bin/create-test-configuration.sh using a new variant
ranger_auth in hive-site.xml.py. Only Hive3 is supported.
Current limitation:
Can't use a different username in Beeline via the -n option. "select
current_user()" keeps returning my username, while "select
logged_in_user()" can return the username given by the -n option, but
it is not used in authorization.
Tests:
- Ran bin/create-test-configuration.sh and verified the generated
hive-site_ranger_auth.xml contains Ranger configurations.
- Ran testdata/bin/run-hive-server.sh -with_ranger. Verified column
masking and row filtering policies took effect in Beeline.
- Added test in test_ranger.py for this mode.
Change-Id: I01e3a195b00a98388244a922a1a79e65146cec42
Reviewed-on: http://gerrit.cloudera.org:8080/15189
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
A Hudi Read Optimized Table contains multiple versions of Parquet files.
In order to load the table correctly, Impala needs to recognize a Hudi
Read Optimized Table as an HdfsTable and load the latest version of each
file using HoodieROTablePathFilter.
Tests
- Unit test for Hudi in FileMetadataLoader
- Create table tests in functional_schema_template.sql
- Query tests in hudi-parquet.test
Change-Id: I65e146b347714df32fe968409ef2dde1f6a25cdf
Reviewed-on: http://gerrit.cloudera.org:8080/14711
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds validation for the paired min/max stats values of the
tinyint and smallint column data types when reading column stats
from Parquet files.
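The kind of range check being added can be sketched as follows (illustrative Python; the real validation lives in Impala's Parquet stats reading code, and the names here are invented):

```python
# Value ranges of the affected Impala integer types.
TYPE_RANGES = {
    'tinyint':  (-128, 127),
    'smallint': (-32768, 32767),
}

def stats_valid(col_type, min_val, max_val):
    """Reject min/max stats that cannot fit the column's current type,
    e.g. after the column type was narrowed from int to tinyint; such
    stats must not be used for row group filtering."""
    lo, hi = TYPE_RANGES[col_type]
    return lo <= min_val <= max_val <= hi
```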
Testing:
- Added automated test cases in parquet-stats.test for columns whose
data type was changed from int to tinyint, from smallint to tinyint,
and from int to smallint.
- Passed EE tests.
- Passed all core tests.
Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Reviewed-on: http://gerrit.cloudera.org:8080/15087
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Column masking policies on primitive columns of a table that contains
nested types (though the nested columns won't be masked) will cause
query failures. To be specific, if tableA(id int, int_array array<int>)
has a masking policy on column "id", all queries on "tableA" will fail, e.g.
select id from tableA;
select t.id, a.item from tableA t, t.int_array a;
Column masking is implemented by wrapping the underlying table/view with
a table masking view. However, as we don't support nested types in
SelectList, the table masking view can't expose nested columns of the
masked table, which causes collection refs not being resolved correctly.
This patch fixes the issue in two steps:
1) Expose nested columns of the underlying table in the output type of
the table masking view (see InlineViewRef#createTupleDescriptor()),
so nested Paths in the original query block can be resolved.
2) For such Paths, resolve them again inside the table masking view so
they point to the underlying table as intended (see
Analyzer#resolvePathWithMasking()). The TupleDescriptor of such a
table masking view won't be materialized, since the view is simple
enough that its query plan is just a ScanNode of the underlying
table. The whole query plan can then be stitched together as if the
table were not masked.
Note that one day, when we support nested columns in the SelectList,
we may no longer need these two hacks.
This patch also adds some TRACE level loggings to improve debuggability,
and enables column masking by default.
Test changes in TestRanger.test_column_masking:
- Add column masking policy on a table containing nested types.
- Add queries on the masked tables. Some queries are borrowed from
existing tests for nested types.
Tests:
- Run CORE tests.
Change-Id: I1cc5565c64c1a4a56445b8edde59b1168f387791
Reviewed-on: http://gerrit.cloudera.org:8080/15108
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This fixes a subtle memory management issue where the freeing of a
buffer is delayed longer than it should be. This means that
the full buffer pool reservation is not available for
repartitioning, which can lead to crashes or hangs for
very specific queries.
The fix is to transfer resources from output_unmatched_batch_
as soon as the last row from the batch is appended to the
output batch.
This bug would only be triggered by join modes that output
unmatched rows from the right side (RIGHT OUTER JOIN,
FULL OUTER JOIN, RIGHT ANTI JOIN) *and* have an empty
probe side (otherwise unmatched rows are output by
iterating over the hash table).
Testing:
Added DCHECKs to check that all resources are available
before repartitioning.
Added a regression test that triggered the bug.
Change-Id: Ie13b51d4d909afb0fe2e7b7dc00b085c51058fed
Reviewed-on: http://gerrit.cloudera.org:8080/15142
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We don't support reading UNION columns. Queries on tables containing
UNION types fail in planning with a metadata loading error. However,
the scanner may still need to read an ORC file with UNION types if
the table schema doesn't map to the UNION columns. Though the UNION
values won't be read, the scanner needs to resolve the file schema,
including the UNION types, correctly.
In OrcSchemaResolver::BuildSchemaPath, we create a map from ORC type ids
to Impala's SchemaPath representation for all types in the file. We
should handle UNION types there as well.
This patch also includes some refactoring to improve code readability.
Tests:
- Add tests for table schema and file schema mismatching on all complex
types.
Change-Id: I452d27b4e281eada00b62ac58af773a3479163ec
Reviewed-on: http://gerrit.cloudera.org:8080/15103
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
HMS seems to return SQLPrimaryKeys in an inconsistent order.
This makes some of the primary key tests flaky. This change sorts
the list of primary keys and stores them in a canonical order within
Impala.
Testing:
- Modified the tests that were relying on HMS to return same order
every time.
- Ran parametrized job.
Change-Id: I0f798d7a2659c6cd061002db151f3fa787eb6370
Reviewed-on: http://gerrit.cloudera.org:8080/15106
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Hive 3 changed the typical storage model for tables to split them
between two directories:
- hive.metastore.warehouse.dir stores managed tables (which is now
defined to be only transactional tables)
- hive.metastore.warehouse.external.dir stores external tables
(everything that is not a transactional table)
In more recent commits of Hive, there is now validation that the
external tables cannot be stored in the managed directory. In order
to adopt these newer versions of Hive, we need to use separate
directories for external vs managed warehouses.
Most of our test tables are not transactional, so they would reside
in the external directory. To keep the test changes small, this uses
/test-warehouse for the external directory and /test-warehouse/managed
for the managed directory. Having the managed directory be a subdirectory
of /test-warehouse means that the data snapshot code should not need to
change.
The Hive 2 configuration doesn't change as it does not have this concept.
Since this changes the dataload layout, this also sets the CDH_MAJOR_VERSION
to 7 for USE_CDP_HIVE=true. This means that dataload will use a separate
location for data as compared to USE_CDP_HIVE=false. That should reduce
conflicts between the two configurations.
Testing:
- Ran exhaustive tests with USE_CDP_HIVE=false
- Ran exhaustive tests with USE_CDP_HIVE=true (with current Hive version)
- Verified that dataload succeeds and tests are able to run with a newer
Hive version.
Change-Id: I3db69f1b8ca07ae98670429954f5f7a1a359eaec
Reviewed-on: http://gerrit.cloudera.org:8080/15026
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
in LocalCatalog Mode.
This change adds a new method 'loadConstraints()' to the MetaProvider
interface.
1. In the CatalogdMetaProvider implementation, we fetch the primary key
(PK) and foreign key (FK) information via the GetPartialCatalogObject()
RPC to the catalogd, which is modified to include PK/FK information.
On the catalog side we eagerly load PK/FK information, so it can be
sent to the local catalog in a single RPC. This information is then
stored in the TableMetaRef object for future consumers.
2. In the DirectMetaProvider implementation, we make two RPCs to HMS
to directly get PK/FK information.
Loading constraints can be extended to include other constraints later
(for example, unique constraints).
Testing:
- Added tests in LocalCatalogTest, CatalogTest and PartialCatalogInfoTest
- This change also modifies the toSqlUtil for show create table
statements. Added a test for the same.
Change-Id: I7ea7e1bacf6eb502c67caf310a847b32687e0d58
Reviewed-on: http://gerrit.cloudera.org:8080/14731
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Implements the read path for the date type in the ORC scanner. The
internal representation of a date is an int32 representing the number
of days since the Unix epoch, using the proleptic Gregorian calendar.
Similarly to the Parquet implementation (IMPALA-7370) this
representation introduces an interoperability issue between Impala
and older versions of Hive (before 3.1). For more details see the
commit message of the mentioned Parquet implementation.
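The representation can be decoded as follows; a small Python illustration (Python's date type also uses the proleptic Gregorian calendar, so the arithmetic matches):

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

def date_from_days_since_epoch(days):
    """Decode the internal DATE representation described above: an int32
    count of days since the Unix epoch, proleptic Gregorian calendar."""
    return EPOCH + timedelta(days=days)
```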
Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Reviewed-on: http://gerrit.cloudera.org:8080/14982
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Ranger provides column masking policies about how to show masked values
to specific users when reading specific columns. This patch adds support
to rewrite the query AST based on column masking policies.
We perform the column masking policies by replacing the TableRef with a
subquery doing the masking. For instance, the following query
select c_id, c_name from customer c join orders on c_id = o_cid
will be transformed into
select c_id, c_name from (
select mask1(c_id) as c_id, mask2(c_name) as c_name from customer
) c
join orders
on c_id = o_cid
The transformation is done during AST resolution. Just like view
resolution, if the table needs masking we replace it with a subquery
(InlineViewRef) containing the masking expressions.
This patch only adds support for mask types that don't require builtin
mask functions. So currently supported masking types are MASK_NULL and
CUSTOM.
Current Limitations:
- Users are required to have privileges on all columns of a masked
table(IMPALA-9223), since the table mask subquery contains all the
columns.
Tests:
- Add e2e tests for masked results
- Run core tests
Change-Id: I4cad60e0e69ea573b7ecfc011b142c46ef52ed61
Reviewed-on: http://gerrit.cloudera.org:8080/14894
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala supports creating a table using the schema of a file.
However, only Parquet is currently supported. This commit adds
support for creating tables from ORC files.
The change relies on the ORC Java API with version 1.5 or
greater, because of a bug in earlier versions. Therefore, ORC is
listed as an external dependency, instead of relying on Hive's
ORC version (from Hive3, Hive also lists it as a dependency).
Also, the commit performs a little clean-up on the ParquetHelper
class, renaming it to ParquetSchemaExtractor and removing outdated
comments.
To create a table from an ORC file, run:
CREATE TABLE tablename LIKE ORC '/path/to/file'
Tests:
* Added analysis tests for primitive and complex types.
* Added e2e tests for creating tables from ORC files.
Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
Reviewed-on: http://gerrit.cloudera.org:8080/14811
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The FE test PlannerTest.testHdfs depended on year(now()) evaluating to
2019, which is wrong after we enter 2020. Replace it with another
expression that does not depend on now().
Change-Id: I7b3df560d69e40d3f2332ff242362bd36bbf6b64
Reviewed-on: http://gerrit.cloudera.org:8080/14965
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In HMS-3 the translation layer converts a managed Kudu table into an
external Kudu table and sets the additional table property
'external.table.purge' to 'true'. This means any installation which
is using HMS-3 (or a Hive version which has HIVE-22158) will always
create Kudu tables as external tables. This is problematic since the
output of show create table will now be different and may confuse
the users.
In order to improve the user experience of such synchronized tables
(external tables with external.table.purge property set to true),
this patch adds support in Impala to create
external Kudu tables. Previous versions of Impala disallowed
creating an external Kudu table if the Kudu table did not exist.
After this patch, Impala will check if the Kudu table exists and if
it does not it will create a Kudu table based on the schema provided
in the create table statement. The command will error out if the Kudu
table already exists. However, this applies only to synchronized
tables. The previous way of creating a pure external table behaves
the same.
The following syntax for creating a synchronized table is now allowed:
CREATE EXTERNAL TABLE foo (
id int PRIMARY KEY,
name string)
PARTITION BY HASH PARTITIONS 8
STORED AS KUDU
TBLPROPERTIES ('external.table.purge'='true')
The syntax is very similar to creating a managed table, except for
the EXTERNAL keyword and additional table property. A synchronized
table will behave similar to managed Kudu tables (drops and renames
are allowed). The output of show create table on a synchronized
table will display the full column and partition spec similar to the
managed tables.
Testing:
1. After the CDP version bump, all of the existing Kudu tables are
now created as synchronized tables, so there is good coverage there.
2. Added additional tests which create synchronized tables and
compare the show create table output.
3. Ran exhaustive tests with both CDP and CDH builds.
Change-Id: I76f81d41db0cf2269ee1b365857164a43677e14d
Reviewed-on: http://gerrit.cloudera.org:8080/14750
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When the planner migrates predicates to inline views, it also creates
equivalent predicates based on the value transfer graph which is built
by transitive relationships among join conditions. These newly inferred
predicates are typically placed as 'other predicates' of an inner or
outer join.
However, for outer joins, this has the effect of adding extra predicates
in the WHERE clause which is incorrect since it may filter NULL values.
Since the original query did not have null filtering conditions in
the WHERE clause, we should not add new ones. In this fix we do the
following: during the migration of conjuncts to inline views, we
analyze each predicate of the form A <op> B; if it is an inferred
predicate and either the left or right slot references the output
tuple of an outer join, the inferred predicate is ignored.
Note that simple queries with a combination of inner and outer joins
may not reproduce the problem. Due to the nature of predicate
inference, some combination of subqueries, inner joins and outer
joins is needed. For
the query pattern, please see the example in the JIRA.
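The check described above can be sketched as follows (a Python sketch with hypothetical names; the real logic operates on Impala's analyzer, slot descriptors and tuple ids):

```python
from collections import namedtuple

# An inferred 'A <op> B' predicate is dropped when either side references
# the NULL-producing output tuple of an outer join; user-written
# predicates are never dropped.

Pred = namedtuple('Pred', ['is_inferred', 'left_tuple', 'right_tuple'])

def can_migrate(pred, outer_join_tuples):
    if not pred.is_inferred:
        return True  # predicates the user wrote are always kept
    # Migrating an inferred predicate over an outer-join output tuple
    # would wrongly filter out NULL-extended rows.
    return not (pred.left_tuple in outer_join_tuples or
                pred.right_tuple in outer_join_tuples)
```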
Tests:
- Added plan tests with left and right outer joins to inline-view.test
- One baseline plan in inline-view.test had to be updated
- Manually ran a few queries in the impala shell to verify result
correctness, checking that NULL values are produced for outer
joins.
- Ran regression tests on jenkins
Change-Id: Ie9521bd768c4b333069c34d5c1e11b10ea535827
Reviewed-on: http://gerrit.cloudera.org:8080/14813
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Hive can write timestamps that are outside Impala's valid
range (Impala: 1400-9999 Hive: 0001-9999). This change adds
validation logic to ORC reading that replaces out-of-range
timestamps with NULLs and adds a warning to the query.
The logic is very similar to the existing validation in
Parquet. Some differences:
- "time of day" is not checked separately as it doesn't make
sense with ORC's encoding
- instead of the column name, only the column id is added to the warning
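The validation can be sketched like this (Python sketch; the actual check is implemented in the C++ ORC scanner, and the 1400-9999 range is Impala's valid range of years):

```python
from datetime import datetime

# Out-of-range timestamps become None (NULL) and a warning carrying the
# column id (not the column name) is recorded for the query.

def validate_timestamps(values, col_id, warnings):
    out = []
    for ts in values:
        if ts is not None and not (1400 <= ts.year <= 9999):
            warnings.append(
                f'Out-of-range timestamp in column id {col_id}, '
                'converted to NULL')
            out.append(None)
        else:
            out.append(ts)
    return out

warnings = []
rows = validate_timestamps([datetime(50, 1, 1), datetime(2020, 1, 1)],
                           col_id=3, warnings=warnings)
# rows[0] is None (year 50 is below Impala's range), rows[1] is unchanged
```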
Testing:
- added a simple EE test that scans an existing ORC file
Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Reviewed-on: http://gerrit.cloudera.org:8080/14832
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently -0.0/+0.0 values are hashed to different values due to
their different binary representations, while -0.0 == +0.0 is true in
C++. This causes them to be distinct values in hash maps despite
being treated as equal in comparisons.
This commit fixes the hashing of -0.0/+0.0, thus changing the
behaviour of hash joins and aggregations (since aggregations
follow the behaviour of the join). That way, the canonical form for
-0/+0 is changed to +0.
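The underlying problem, and the canonicalization that fixes it, can be demonstrated outside Impala (Python sketch; the actual fix is in Impala's C++ hash functions):

```python
import struct

# -0.0 and +0.0 compare equal, but their IEEE-754 bit patterns differ,
# so hashing the raw bytes sends equal values to different buckets.
assert -0.0 == 0.0
assert struct.pack('<d', -0.0) != struct.pack('<d', 0.0)

# Canonicalizing zeros to +0.0 before hashing fixes this: in IEEE-754
# arithmetic, -0.0 + 0.0 evaluates to +0.0.
def canonicalize(x):
    return x + 0.0 if x == 0.0 else x

assert struct.pack('<d', canonicalize(-0.0)) == struct.pack('<d', 0.0)
```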
Tests:
- Added e2e tests for aggregation (group by and distinct) and
join queries with -0.0 and +0.0 present.
Change-Id: I6bb1a817c81c452d041238c19cb6c9f602a5d565
Reviewed-on: http://gerrit.cloudera.org:8080/14588
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When mt_dop > 0, the summary is reporting the number of fragment
instances, instead of the number of hosts as the header would
imply.
This commit fixes the issue so the number of hosts will be shown
under the #Hosts column. The commit also adds an #Inst column
where the number of instances is shown (the current behaviour).
Tests:
* Changed profile tests with mt_dop > 0.
* Updated benchmark tests and shell tests accordingly.
Change-Id: I3bdf9a06d9bd842b2397cd16c28294b6bec7af69
Reviewed-on: http://gerrit.cloudera.org:8080/14715
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala's DistributedPlanner may sometimes accidentally choose broadcast
distribution for inputs that are larger than the destination executor's
total memory. This could potentially happen if the cluster membership is
not accurately known and the planner's cost computation of the
broadcastCost vs partitionCost happens to favor the broadcast
distribution. This causes spilling and severely affects performance.
Although the DistributedPlanner does a mem_limit check before picking
broadcast, the mem_limit is not an accurate reflection since it is
assigned during admission control.
As a safeguard, we introduce an explicit configurable limit,
broadcast_bytes_limit, for the size of the broadcast input, with a
default of 32 GB. The default was chosen based on analysis of
existing benchmark queries and representative workloads, such that in
the vast majority of cases the parameter value does not need to be
changed.
If the estimated input size on the build side is greater than this
threshold, the DistributedPlanner will fall back to a partition
distribution. Setting this parameter to 0 causes it to be ignored.
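The new check can be sketched as follows (a Python sketch with illustrative names; Impala's DistributedPlanner additionally weighs broadcastCost against partitionCost, which is reduced to a boolean here):

```python
# Only broadcast_bytes_limit and its "0 disables the check" semantics
# come from this change; everything else is simplified.

DEFAULT_BROADCAST_BYTES_LIMIT = 32 * 1024 ** 3  # 32 GB default

def choose_distribution(build_side_bytes, broadcast_cheaper,
                        broadcast_bytes_limit=DEFAULT_BROADCAST_BYTES_LIMIT):
    # If the estimated build side exceeds a positive limit, fall back
    # to a partitioned distribution regardless of cost.
    if 0 < broadcast_bytes_limit < build_side_bytes:
        return 'PARTITIONED'
    return 'BROADCAST' if broadcast_cheaper else 'PARTITIONED'
```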
Testing:
- Ran all regression tests on Jenkins successfully
- Added a few unit tests in PlannerTest that (a) set
broadcast_bytes_limit to a small value and check whether the
distributed plan does hash partitioning on the build side instead
of broadcast, (b) pass a broadcast hint to override the config
setting, and (c) verify the standard case where the broadcast
threshold is larger than the build input size.
Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Reviewed-on: http://gerrit.cloudera.org:8080/14690
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This undoes the hack of pretending that it's not a test environment for
that single test. That had side effects, e.g. for the metadata loading
path.
Instead we have a special flag to enable the validation code in
frontend tests.
Note that the plans change to include join build sinks as an
expected result of undoing the hack.
Change-Id: I2e8823c562395e13f318d1ad6eed883d2d9d771f
Reviewed-on: http://gerrit.cloudera.org:8080/14707
Reviewed-by: Anurag Mantripragada <anurag@cloudera.com>
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
For the 'case' function, if the leftmost WHEN condition is true,
SimplifyConditionalsRule will cast the THEN result expression to the
original expression's type before returning the result. In this JIRA,
we remove that cast for two reasons:
1. SimplifyConditionalsRule only applies to analyzed expressions,
which means the expression has already been cast to a compatible type
before it reaches the expression rewrite step.
2. The cast causes an IllegalStateException when 'CASE WHEN TRUE'
appears in the WHERE conjunction. For example:
Query: select * from functional.alltypessmall where case when true then id < 50 END
ERROR: IllegalStateException: null
Testing:
- Added e2e test to exprs.test
- Added unit test to ExprRewriteRulesTest
- Added unit test to ExprRewriterTest
Change-Id: I640d577200e76121c72685e4aaba1ef312a2d8b4
Reviewed-on: http://gerrit.cloudera.org:8080/14540
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Due to HIVE-22158 all non-ACID tables are treated as external tables
instead of managed tables. The ACID tests occasionally upgrade
non-ACID tables to ACID tables, but that is not allowed for external
tables. Since all non-ACID tables are external due to HIVE-22158,
some of the ACID tests started to fail after a CDP_BUILD_NUMBER bump
that brought in a Hive version containing the mentioned change.
The fix is to set the 'EXTERNAL' table property to false in the same
step when upgrading the table to ACID. Also, in the tests this step
is now executed from Hive instead of Impala.
Tested with the original CDP_BUILD_NUMBER in bin/impala-config.sh and
also tested after bumping that number to 1579022.
Change-Id: I796403e04b3f06c99131db593473d5438446d5fd
Reviewed-on: http://gerrit.cloudera.org:8080/14633
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
* Update the distributed planner to reflect that broadcast join tables
are replicated in all fragments.
* Did a pass over the planner code looking at call sites of
getNumNodes() to confirm that they shouldn't be replaced by
getNumInstances()
Testing:
* Updated affected planner test where PARALLELPLANS had a
different join strategy.
* Added a targeted test to mem-limit-broadcast-join.test to
show that mt_dop affects join mode.
* Ran exhaustive tests.
Change-Id: I23395c2dadf6be0e8be99706ca3ab5f4964cbcf9
Reviewed-on: http://gerrit.cloudera.org:8080/14522
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change unifies mt_dop scheduling between the
union and scan cases.
Testing:
Manually checked that fragments with unions get parallelised
to the correct degree, both as a result of scans within the
fragment and input fragments.
Extended TestMtDopAdmissionSlots (renamed to TestMtDopScheduling)
to confirm that queries that were not parallelised before are
now parallelised. These tests verify the number of instances
of each operator using the ExecSummary embedded in the profile.
Change-Id: I0d2e9c86b530da3053e49d42b837dca0b1348ff2
Reviewed-on: http://gerrit.cloudera.org:8080/14384
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The bug was that filter routing table construction removed
filters from the TPlanNode structure for the join when
a finstance was not a producer of that filter. The
TPlanNode is shared between all instances of a fragment
on a backend, so this meant that the filter was removed
for all instances on that backend, often meaning that
no filters would be produced at all.
It was awkward fixing the bug within the framework of
the current data structures, where the routing table
is keyed by filter_id, so I ended up refactoring
the routing table somewhat. This also allowed
fixing a TODO about O(n^2) construction of the
routing table.
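The root cause is a classic shared-state aliasing bug, shown here in miniature (Python analogue; the real structure is the Thrift TPlanNode shared by all fragment instances on a backend):

```python
# Both "instances" hold the same plan-node object, not copies, so a
# removal intended for one instance is visible to every instance.
shared_plan_node = {'runtime_filters': ['RF000', 'RF001']}
instances = [shared_plan_node, shared_plan_node]

# "Remove the filter for instance 0 only" actually edits both:
instances[0]['runtime_filters'].remove('RF001')
assert instances[1]['runtime_filters'] == ['RF000']  # gone here too
```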
Testing:
Added a regression test that timed out without the fix.
Perf:
Ran a single node TPC-H workload with scale factor
30. No perf change.
Change-Id: I26e3628a982d5d9b8b24eb96b28aff11f8aa6669
Reviewed-on: http://gerrit.cloudera.org:8080/14511
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Add a temporary --mt_dop_auto_fallback to allow a graceful transition to
using mt_dop for workloads. When this flag is set, DML queries and joins
that would otherwise fail with an error when run with mt_dop > 0 fall
back to running with mt_dop = 0. This means that a user can set mt_dop
for their queries and it will only take effect when supported.
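The fallback behaviour can be sketched as follows (Python sketch with hypothetical names; the real check inspects the plan for unsupported joins and DML sinks):

```python
# When --mt_dop_auto_fallback is set, a query that would otherwise fail
# with mt_dop > 0 silently runs with mt_dop = 0 instead of erroring out.

def effective_mt_dop(requested, plan_supports_mt_dop, auto_fallback):
    if requested > 0 and not plan_supports_mt_dop:
        if auto_fallback:
            return 0  # graceful fallback to non-mt_dop execution
        raise ValueError('query is not supported with mt_dop > 0')
    return requested
```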
The behaviour generally does not change when this flag is not set,
with a couple of exceptions:
* I made mt_dop automatic for compute stats on all file formats
* mt_dop is allowed for single node plans with inserts. The
quirky validatePlan() logic previously disallowed this but
allowed joins in single node plans.
The checks added by this patch can be removed safely once mt_dop is
supported by default for all queries.
This includes some cleanup:
* isDmlStmt() was stale and incorrectly implemented.
* Various TreeNode methods did not return instances of subclasses of
the requested class, which was strange. This fix is required to
make 'contains(JoinNode.class)' work correctly. I checked the
callsites of the fixed functions and none of them would be affected
by this change because they specified a terminal class without
any subclasses.
I didn't actually use this fix in the end (I had to write a custom
tree traversal in hasUnsupportedMtDopJoin()), but figured I would
leave the improvement in here.
Testing:
Add some basic functional tests ensuring that the fallback takes
effect.
Run basic join and insert tests with this flag enabled.
Change-Id: Ie0d73d8744059874293697c8e104891a10dba04d
Reviewed-on: http://gerrit.cloudera.org:8080/14344
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-5036 added an optimisation for count(star) in Parquet scans
that avoids materialising dummy rows. This change provides a similar
optimization for Kudu tables.
Instead of materializing empty rows when computing count star, we use
the NumRows field from the Kudu API. The Kudu scanner tuple is
modified to have one slot into which we will write the
num rows statistic. The aggregate function is changed from count to a
special sum function that gets initialized to 0.
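The shape of the optimization can be sketched as follows (Python sketch with hypothetical names; the real change writes Kudu's NumRows statistic into a scanner tuple slot and aggregates it in the backend):

```python
# Summing the per-scan-token NumRows statistics with a sum initialized
# to 0 (rather than counting materialized rows) yields the same count,
# and correctly returns 0 instead of NULL for an empty table.

def count_star(num_rows_per_scan_token):
    total = 0  # the special sum function is initialized to 0
    for num_rows in num_rows_per_scan_token:
        total += num_rows
    return total
```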
Tests:
* Added end-to-end tests
* Added planner tests
* Ran performance tests on the tpch.lineitem Kudu table at scale
factor 25, on 1 node, with mt_dop set to 1, just to measure the
speedup gained when scanning. Counting the rows took around 400ms
before the optimization and around 170ms after.
Change-Id: Ic99e0f954d0ca65779bd531ca79ace1fcb066fb9
Reviewed-on: http://gerrit.cloudera.org:8080/14347
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
TABLE' DDL.
Atlas needs table location to establish lineage between a newly
created external table and its table location.
The table location information is not available until the createTable
catalog op succeeds. After this change, location information is sent
to the backend in the TDDLExecResponse message which adds it to the
lineage graph. This information is sent only for create external
table queries.
Testing:
Added a test to verify the tableLocation field is populated for a
create external table query lineage. Also, modified the
lineage.test file to include location information for all lineages.
Change-Id: If02b0cc16d52c1956298171628f5737cab62ce9f
Reviewed-on: http://gerrit.cloudera.org:8080/14515
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In HdfsCachingUtil we set the expiration of cache directives to never.
This works well until the cache pool has a max TTL set. Once a max
TTL is set, Impala will get an exception when it tries to add caching
for tables or partitions.
I changed HdfsCachingUtil to not set the expiration. This way the cache
directive inherits the expiration from the cache pool.
Testing:
Added e2e test that creates a table in a cache pool that has max TTL.
Change-Id: I475b92704b19e337b2e62f766e5b978585bf6583
Reviewed-on: http://gerrit.cloudera.org:8080/14485
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
DDLs like 'create table' should generate minimal lineage graphs so
that consumers like Atlas can use information like 'queryText' to
establish lineages.
This change adds a call to the computeLineageGraph() method during
analysis phase of createTable which populates the graph with basic
information like queryText. If it is a CTAS, this graph is enhanced
in the "insert" phase with dependencies.
Testing:
Add an EE test to verify lineage information and also to check it
is flushed to disk properly.
Change-Id: Ia6c7ed9fe3265fd777fe93590cf4eb2d9ba0dd1e
Reviewed-on: http://gerrit.cloudera.org:8080/14458
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>