A recent change, IMPALA-5016, added an expr rewrite rule to simplfy
coalesce(). This rule eliminates the coalesce() when its first
parameter (that isn't constant null) is a SlotRef pointing to a
SlotDescriptor that is non-nullable (for example because it is from
a non-nullable Kudu column or because it is from an HDFS partition
column with no null partitions), under the assumption that the SlotRef
could never have a null value.
This assumption is violated when the SlotRef is the output of an
outer join, leading to incorrect results being returned. The problem
is that the nullability of a SlotDescriptor (which determines whether
there is a null indicator bit in the tuple for that slot) is a
slightly different property than the nullability of a SlotRef pointing
to that SlotDescriptor (since the SlotRef can still be NULL if the
entire tuple is NULL).
This patch removes the portion of the rewrite rule that considers
the nullability of the SlotDescriptor. This means that we're missing
out on some optimizations opportunities and we should revisit this in
a way that works with outer joins (IMPALA-5753)
Testing:
- Updated FE tests.
- Added regression tests to exprs.test
Change-Id: I1ca6df949f9d416ab207016236dbcb5886295337
Reviewed-on: http://gerrit.cloudera.org:8080/7567
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
This is intended to be merged at the same time as Part 2 but is
separated out to make the change more reviewable. Part 2 assumes
that it does not need special logic to handle this mode (e.g.
because the old aggs and joins don't use reservation).
Disable the --enable_partitioned_{aggregation,hash_join} options
and remove all product and test code associated with them.
Change-Id: I5ce2236d37c0ced188a4a81f7e00d4b8ac98e7e9
Reviewed-on: http://gerrit.cloudera.org:8080/7102
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Partition pruning has two mechanisms:
1) Simple predicates (e.g. binary predicates of the form
<SlotRef> <op> <LiteralExpr>) can be used to derive lists
of matching partition ids directly from the
partition key values. This is handled directly in the FE
and is very efficient for supported simple predicates.
2) General expr evaluation of predicates using the BE (via
FeSupport). This works for all predicates, so is the
mechanism used for predicates not supported by (1).
The issue was that (1) was being used when a binary
predicate contained an implicit cast on the SlotRef. While
this is OK when being evaluated by the BE, the simple
mechanism in (1) would not be able to match the partition
key values with the predicate literal because the partition
key values cannot be cast in the FE.
The fix is to force binary predicates involving a cast to be
evaluated in the BE.
Testing: A planner test was added to demonstrate the
expected partition pruning occurs.
Some modifications were made to the functional schema table
stringpartitionkey, so it will be necessary to reload those
tables:
load-data.py -w functional-query --table_names=stringpartitionkey
Change-Id: I94f597a6589f5e34d2b74abcd29be77c4161cd99
Reviewed-on: http://gerrit.cloudera.org:8080/7521
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
In order to conform to the ISO SQL specifications, this patch allows
the 'order by' clause to be optional for first_value() and last_value()
analytical functions.
Testing:
1. Added Analyzer tests
2. Added Planner tests which checks that a sort node is not used if both
'order by' and 'partition by' clauses are not used. Also, checks that
if only 'partition by' clause is used then the sorting is done only on
the partition column.
3. Added a query test that checks that the input is not reordered by
these analytic functions if 'order by' clause is not used
Change-Id: I5a3a56833ac062839629353ea240b361bc727d96
Reviewed-on: http://gerrit.cloudera.org:8080/7502
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
When checking if a join can be inverted, we forgot to also check that
the resulting join would not be a non-equi right semi-join or a
non-equi right outer-join. We currently do not support those kinds of
joins in the backend.
Testing:
-Added a planner test
Change-Id: I91ba66fe30139fcd44d4615a142f183266800aab
Reviewed-on: http://gerrit.cloudera.org:8080/7476
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins
IMPALA-4000 added basic authorization support for Kudu
tables, but it had several limitations:
* Only the ALL privilege level can be granted to Kudu tables.
(Finer-grained levels such as only SELECT or only INSERT are
not supported.)
* Column level permissions on Kudu tables are not supported.
* Only users with ALL privileges on SERVER may create external
Kudu tables.
This patch relaxes the restrictions to allow:
* Allow column-level permissions
* Allow fine grained privileges SELECT and INSERT for those
statement types.
DELETE/UPDATE/UPSERT privileges now require ALL privileges
because Sentry will eventually get fine grained privilege
actions, and at that point Impala should support the more
specific actions (IMPALA-3840). The assumption is that the
Kudu table authorization support is currently so limited
that most users are not using this functionality yet, but
this is a behavior change that needs to be clearly stated in
the Impala release notes.
Testing: Adds FE and EE tests.
Change-Id: Ib12d2b32fa3e142e69bd8b0f24f53f9e5cbf7460
Reviewed-on: http://gerrit.cloudera.org:8080/7307
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
The fix for IMPALA-1654 has broken the compute incremental stats child
query generation logic for general partition expressions. This commit
fixes it and also adds new queries to fix the test gap. These tests
fail consistently without the patch.
Change-Id: I227fc06f580eb9174f60ad0f515a3641cec19268
Reviewed-on: http://gerrit.cloudera.org:8080/7379
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins
In a recent patch (IMPALA-5036) a bug was introduced where a count(*)
query with a group by a string partition column returned incorrect
results. Data was being written into the tuple at an incorrect offset.
Testing:
- Added an end to end test where we are selecting from a table
partitioned by string.
Change-Id: I225547574c2b2259ca81cb642d082e151f3bed6b
Reviewed-on: http://gerrit.cloudera.org:8080/7481
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Impala currently supports total sorts (the entire set of data
is sorted) and top-n sorts (only the highest/lowest n elements
are sorted). This patch adds the ability to do partial sorts,
where the data is divided up into some number of subsets, each
of which is sorted individually.
It accomplishes this by adding a new exec node, PartialSortNode.
When PartialSortNode::GetNext() is called, it retrieves input
up to the query memory limit, uses the existing Sorter class to sort
it, and outputs it. This is faster than a total sort with SortNode
as it avoids the need to spill if the input is larger than the
memory limit.
Future work will look into setting a more restrictive memory limit
on the PartialSortNode. (IMPALA-5669)
In the planner, the SortNode plan node is used, with an enum value
indicating if it is a total or partial sort.
This also adds a new counter 'RunSize' to the runtime profile which
tracks the min, max, and avg size of the generated runs, in tuples.
As a first use case, partial sort is used where a total sort was
used previously for inserts/upserts into Kudu tables only. Future
work can extend this to other table sinks. (IMPALA-5649)
Testing:
- E2E test with a large INSERT into a Kudu table with a mem limit.
Checks that no spills occurred.
- Updated planner tests.
- Existing E2E tests and stress test verify correctness of INSERT.
- Perf tests on the 10 node cluster: inserting tpch_100.lineitem
into a Kudu table with mem_limit=3gb:
Previously: 5 runs are spilled, sort took 7m33s
Now: no spills, sort takes 6m19s, for ~18% speedup
Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38
Reviewed-on: http://gerrit.cloudera.org:8080/7267
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
Creating Kudu clients is very expensive as each will fetch
metadata from the Kudu master, so we should minimize the
number of Kudu clients that get created.
This patch stores a map from Kudu master addressed to Kudu
clients in KuduUtil to be used across the FE and catalog.
Another patch has already addressed the BE.
Future work will consider providing a way to invalidate
the stored Kudu clients in case something goes wrong
(IMPALA-5685)
This relies on two changes on the Kudu side: one that clears
non-covered range entries from the client's cache on table
open (d07ecd6ded01201c912d2e336611a6a941f48d98), and one
that automatically refreshes auth tokens when they expire
(603c1578c78c0377ffafdd9c427ebfd8a206bda3).
This patch disables some tests that no longer work as
they relied on Kudu metadata loading operations timing out,
but since we're reusing clients the metadata is already
loaded when the test is run.
Testing:
- Ran a stress test on a 10 node cluster: scan of a small
Kudu table, 1000 concurrent queries, load on the Kudu
master was reduced signficantly, from ~50% cpu to ~5%.
(with the BE changes included)
- Ran the Kudu e2e tests.
- Manually ran a test with concurrent INSERTs and
'ALTER TABLE ADD PARTITION' (which is affected by the
Kudu side change mentiond above) and verified
correctness.
Change-Id: I9b0b346f37ee43f7f0eefe34a093eddbbdcf2a5e
Reviewed-on: http://gerrit.cloudera.org:8080/6898
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
Privileges granted to a role assigned to a db/table whose name
contains upper case characters can disappear after a few seconds.
A privilege is inserted into the catalogObjectCache using a key
that uses the db/table name. The key gets converted to a lower
case before inserting.
Privilege name returned by sentryProxy is always lower case,
which might not match the privilegeName built in the catalog.
This triggers an update of the catalog object followed by a
removal of the old object. Since they both use the same key
in lower case it ends up deleting the newly updated object.
This change also adds a new catalogd startup option
(sentry_catalog_polling_frequency)
to configure the frequency at which catalogd polls the sentry service
to update any policy changes. The default value is 60 seconds.
Test:
Added a test which adds select privileges to 3 tables and dbs specified
in lower case, upper case and mixed case. The test verifies that the
privileges on the 3 tables do not disappear on a sentry update.
Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901
Reviewed-on: http://gerrit.cloudera.org:8080/7332
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
Bugs:
- FunctionCallExpr's toSql() doesn't include IGNORE NULLS if present
causing view definitions to break and leading to incorrect results.
- FunctionCallExpr's clone() implementation doesn't carry forward
IGNORE NULLS option if present. One case that breaks with this is
querying views containing analytic exprs causing wrong plans.
Fixed both the bugs and added a test that can reliably reproduce this.
Change-Id: I723897886c95763c3f29d6f24c4d9e7d43898ade
Reviewed-on: http://gerrit.cloudera.org:8080/7416
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins
Kudu tables did not treat some table properties correctly.
Change-Id: I69fa661419897f2aab4632015a147b784a6e7009
Reviewed-on: http://gerrit.cloudera.org:8080/7454
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
The -use_local_tz_for_unix_timestamp_conversion flag exists
to specify if TIMESTAMPs should be interpreted as localtime
or UTC when converting to/from Unix time via builtins:
from_unixtime(bigint unixtime)
unix_timestamp(string datetime[, ...])
unix_timestamp(timestamp datetime)
However, the KuduScanner was calling into code that, when
the gflag above was set, interpreted Unix times as local
time. Unfortunately the write path (KuduTableSink) and some
FE TIMESTAMP code (see KuduUtil.java) did not have this
behavior, i.e. we were handling the gflag inconsistently.
Tests:
* Adds a custom cluster test to run Kudu test cases with
-use_local_tz_for_unix_timestamp_conversion.
* Adds tests for the new builtin
unix_micros_to_utc_timestamp() which run in a custom
cluster test (added test_local_tz_conversion.py) as well
as in the regular tests (added to test_exprs.py).
Change-Id: I423a810427353be76aa64442044133a9a22cdc9b
Reviewed-on: http://gerrit.cloudera.org:8080/7311
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
The following use of sequence file writer can lead to a crash:
> set compression_codec=gzip;
> set seq_compression_mode=record;
> set allow_unsupported_formats=1;
> create table seq_tbl like tbl stored as sequencefile;
> insert into seq_tbl select * from tbl;
This fix removes the MemPool::FreeAll() call from
HdfsSequenceTableWriter::Flush(). Freeing the memory pool in Flush()
is incorrect because a memory pool buffer is cached by the compressor
in the table writer which isn't reset across calls to Flush().
If the file that is being written is big enough,
HdfsSequenceTableWriter::AppendRows() will call Flush() multiple
times causing memory corruption.
Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec
Reviewed-on: http://gerrit.cloudera.org:8080/7394
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
This moves away from the PipelinedPlanNodeSet approach of enumerating
sets of concurrently-executing nodes because unions would force
creating many overlapping sets of nodes. The new approach computes
the peak resources during Open() and the peak resources between Open()
and Close() (i.e. while calling GetNext()) bottom-up for each plan node
in a fragment. The fragment resources are then combined to produce the
query resources.
The basic assumptions for the new resource estimates are:
* resources are acquired during or after the first call to Open()
and released in Close().
* Blocking nodes call Open() on their child before acquiring
their own resources (this required some backend changes).
* Blocking nodes call Close() on their children before returning
from Open().
* The peak resource consumption of the query is the sum of the
independent fragments (except for the parallel join build plans
where we can assume there will be synchronisation). This is
conservative but we don't synchronise fragment Open() and Close()
across exchanges so can't make stronger assumptions in general.
Also compute the sum of minimum reservations. This will be useful
in the backend to determine exactly when all of the initial
reservations have been claimed from a shared pool of initial reservations.
Testing:
* Updated planner tests to reflect behavioural changes.
* Added extra resource requirement planner tests for unions, subplans,
pipelines of blocking operators, and bushy join plans.
* Added single-node plans to resource-requirements tests. These have
more complex plan trees inside a single fragment, which is useful
for testing the peak resource requirement logic.
Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d
Reviewed-on: http://gerrit.cloudera.org:8080/7223
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
IMPALA-4120 fixed an issue where lead/lag was potentially
operating on memory that the UDA didn't own, resulting in
potentially wrong results. As part of that fix, lead and lag
started allocating 'global' UDF memory (e.g. via Allocate()
rather than AllocateLocal()) in Init() which needs to be freed
in Serialize() or Finalize(), but only lead() was updated to
free the memory. This memory is eventually freed when the
fragment is torn down, but as a result of not freeing the
memory in Serialize or Finalize, the memory may be allocated
longer than necessary.
Change-Id: Id2b69b4ccb9cac076abca19bed6f0b1dd11dfff3
Reviewed-on: http://gerrit.cloudera.org:8080/7371
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
In commit 741421de, we accidently made it so that is_null=true
StringVals became is_null=false with len=0. Fix that and add
a regression test.
Change-Id: I34d288aad66a2609484058c9a177c02200cb6a6e
Reviewed-on: http://gerrit.cloudera.org:8080/7364
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins
IMPALA-3742 introduced KuduPartitionExpr, which takes a row and passes
it to the Kudu client to determine what partitionit belongs to.
The DataStreamSender never frees the local allocations for the Kudu
partition exprs causing it to hang on to memory longer than it needs to.
This patch also fixes two other related issues:
- DataStreamSender was dropping the Status from AddRow in the Kudu
branch. Adds 'RETURN_IF_ERROR' and 'WARN_UNUSED_RESULT'
- Changes the HASH case in DataStreamSender to call FreeLocalAllocations
on a per-batch basis, instead of a per-row basis.
Testing:
- Added an e2e test that runs a large insert with a mem limit that
failed with oom previously.
Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Reviewed-on: http://gerrit.cloudera.org:8080/7346
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
Instead of materializing empty rows when computing count star, we use
the data stored in the Parquet RowGroup.num_rows field. The Parquet
scanner tuple is modified to have one slot into which we will write the
num rows statistic. The aggregate function is changed from count to a
special sum function that gets initialized to 0. We also add a rewrite
rule so that count(<literal>) is rewritten to count(*) in order to make
sure that this optimization is applied in all cases.
Testing:
- Added functional and planner tests
Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Reviewed-on: http://gerrit.cloudera.org:8080/6812
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins
Reworks the FK/PK join detection logic to:
- more accurately recognize many-to-many joins
- avoid dim/dim joins for multi-column PKs
The new detection logic maintains our existing philosophy of generally
assuming a FK/PK join, unless there is strong evidence to the
contrary, as follows.
For each set of simple equi-join conjuncts between two tables, we
compute the joint NDV of the right-hand side columns by
multiplication, and if the joint NDV is significantly smaller than
the right-hand side row count, then we are fairly confident that the
right-hand side is not a PK. Otherwise, we assume the set of conjuncts
could represent a FK/PK relationship.
Extends the explain plan to include the outcome of the FK/PK detection
at EXPLAIN_LEVEL > STANDARD.
Performance testing:
1. Full TPC-DS run on 10TB:
- Q10 improved by >100x
- Q72 improved by >25x
- Q17,Q26,Q29 improved by 2x
- Q64 regressed by 10x
- Total runtime: Improved by 2x
- Geomean: Minor improvement
The regression of Q64 is understood and we will try to address it
in follow-on changes. The previous plan was better by accident and
not because of superior logic.
2. Nightly TPC-H and TPC-DS runs:
- No perf differences
Testing:
- The existing planner test cover the changes.
- Code/hdfs run passed.
Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
Reviewed-on: http://gerrit.cloudera.org:8080/7257
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Implements HdfsScanner::GetNext() for the Avro, RC File, and
Sequence File scanners. Changes ProcessSplit() to repeatedly call
GetNext() to share the core scanning code between the legacy
ProcessSplit() interface (ProcessSplit()) and the new GetNext()
interface.
Summary of changes:
- Slightly change code flow for initial scan range that
only parses the file header. The new code sets
'only_parsing_header_' in Open() and then honors
that flag in GetNextInternal(). Before, all the logic
was inside ProcessSpit().
- Replace 'finished_' with 'eos_'.
- Add a RowBatch parameter to various functions.
- Change Close() to free all resources when a nullptr
RowBatch is passed.
Testing:
- Exhaustive tests passed on debug
- Core tests passed on asan
- TODO: Perf testing on cluster
Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Reviewed-on: http://gerrit.cloudera.org:8080/6527
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This is done to simplify the CHAR(N) logic. I believe this is overall an
improvement - any benefits of the out-of-line storage that motivated
this optimisation originally were outweighed by the added complexity.
This also avoids IMPALA-5559 (fe/be have different notions of var-len),
which will unblock IMPALA-3200.
Pros:
* Reduce the number of code paths and improve test coverage.
(e.g. avoids IMPALA-5559: fe/be have different notions of var-len)
* Reduced memory to store non-NULL data (saves 12-byte StringValue)
* Fewer branches in code -> save CPU cycles.
* If CHAR(N) performance is important, reduced complexity makes it
easier to implement codegen.
Cons:
* Requires N bytes to store a NULL value.
* May hurt cache locality (although this is speculative in my mind).
The change is mostly mechanical - I removed MAX_CHAR_INLINE_LENGTH
and then removed branches that depended on that.
Testing:
Ran exhaustive build.
Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3
Reviewed-on: http://gerrit.cloudera.org:8080/7303
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This is similar to the single-node execution optimisation, but applies
to slightly larger queries that should run in a distributed manner but
won't benefit from codegen.
This adds a new query option disable_codegen_rows_threshold that
defaults to 50,000. If fewer than this number of rows are processed
by a plan node per impalad, the cost of codegen almost certainly
outweighs the benefit.
Using rows processed as a threshold is justified by a simple
model that assumes the cost of codegen and execution per row for
the same operation are proportional. E.g. if x is the complexity
of the operation, n is the number of rows processed, C is a
constant factor giving the cost of codegen and Ec/Ei are constant
factor giving the cost of codegen'd and interpreted execution and
d, then the cost of the codegen'd operator is C * x + Ec * x * n
and the cost of the interpreted operator is Ei * x * n. Rearranging
means that interpretation is cheaper if n < C / (Ei - Ec), i.e. that
(at least with the simplified model) it makes sense to choose
interpretation or codegen based on a constant threshold. The
model also implies that it is somewhat safer to choose codegen
because the additional cost of codegen is O(1) but the additional
cost of interpretation is O(n).
I ran some experiments with TPC-H Q1, varying the input table size, to
determine what the cut-over point where codegen was beneficial was.
The cutover was around 150k rows per node for both text and parquet.
At 50k rows per node disabling codegen was very beneficial - around
0.12s versus 0.24s. To be somewhat conservative I set the default
threshold to 50k rows. On more complex queries, e.g. TPC-H Q10, the
cutover tends to be higher because there are plan nodes that process
many fewer than the max rows.
Fix a couple of minor issues in the frontend - the numNodes_
calculation could return 0 for Kudu, and the single node optimization
didn't handle the case where for a scan node with conjuncts, a limit
and missing stats correctly (it considered the estimate still valid.)
Testing:
Updated e2e tests that set disable_codegen to set
disable_codegen_rows_threshold to 0, so that those tests run both
with and without codegen still.
Added an e2e test to make sure that the optimisation is applied in
the backend.
Added planner tests for various cases where codegen should and shouldn't
be disabled.
Perf:
Added a targeted perf test for a join+agg over a small input, which
benefits from this change.
Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Reviewed-on: http://gerrit.cloudera.org:8080/7153
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This change introduces a new rule to merge disjunct equality
predicates into an IN predicate. As with every rule being applied
bottom up, the rule merges the leaf OR predicates into an in predicate
and subsequently merges the OR predicate to the existing IN predicate
It will also merge two compatible IN predicates into a single IN
predicate.
Patch also addresses review comments to
normalize the binary predicates and testcases for the same.
binary predicates of the form constant <op> non constant are normalized
to non constant <op> constant
Change-Id: If02396b752c5497de9a92828c24c8062027dc2e2
Reviewed-on: http://gerrit.cloudera.org:8080/7110
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
The parser didn't account for the possibility of negative
numeric literals.
Testing:
Added a test that sets a negative value. Query tests send the whole
"set" statement to the backend for execution so exercise the parser.
Ran core tests.
Change-Id: I5c415dbed6ba1122919be75f5811444d88ee03b4
Reviewed-on: http://gerrit.cloudera.org:8080/7316
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This change fixes the frontend tests to make them run on Java 8, by
replacing HashMap with LinkedHashMap and HashSet with LinkedHashSet
where needed.
To test this I ran the frontend tests using both Oracle Java 7 and
Oracle Java 8 and made sure they passed. I also verified that the tests
pass with OpenJDK 7.
Change-Id: Iad8e1dccec3a51293a109c420bd2b88b9d1e0625
Reviewed-on: http://gerrit.cloudera.org:8080/7073
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
Scale down the buffer size in hash joins and hash aggregations if
estimates indicate that the build side of the join is small.
This greatly reduces minimum memory requirements for joins in some
common cases, e.g. small dimension tables.
Currently this is not plumbed through to the backend and only takes
effect in planner tests.
Testing:
Added targeted planner tests for small/mid/large/unknown memory
requirements for aggregations and joins.
Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Reviewed-on: http://gerrit.cloudera.org:8080/6963
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
The bug was in the DCHECK. The DCHECK is intended to make sure that a
tuple's string data didn't get split across blocks. The logic assumed
that if the second-or-later string column was in the next-block, that
the strings were split between blocks. However, that assumption is
invalid if there are NULL strings, which do not belong in any block.
The fix for the DCHECK (which is still useful) is to count the number
of non-NULL strings and make sure that no non-NULL strings were split
between blocks.
Testing:
Added a test that reproduces the crash.
Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90
Reviewed-on: http://gerrit.cloudera.org:8080/7295
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
For queries where plan generation is terminated early due to LIMIT 0
or similar, some tuples may not have a mem layout because no PlanNode
has been generated to materialize them. The fix is to make
recomputeMemLayout() a no-op if the tuple does not have an existing
mem layout.
Testing:
- added regression test
Change-Id: I08548c6bfa7dbf4655e55636605bebb89d2a2239
Reviewed-on: http://gerrit.cloudera.org:8080/7264
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
Hash join node currently does not apply the limits correctly.
This issue gets masked most of the times since the planner sticks
an exchange node on top of most of the joins. This issue gets
exposed when NUM_NODES=1.
Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f
Reviewed-on: http://gerrit.cloudera.org:8080/6778
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Kudu recently added the ability to alter a column's default value
and storage attributes (KUDU-861). This patch adds the ability to
modify these from Impala using ALTER.
It also supports altering a column's comment for non-Kudu tables.
It does not support setting a column to be a primary key or
changing a column's nullability, because those are not supported on
the Kudu side yet.
Syntax:
ALTER TABLE <table> ALTER [COLUMN] <column>
SET <attr> <value> [<attr> <value> [<attr> <value>...]]
where <attr> is one of:
- DEFAULT, BLOCK_SIZE, ENCODING, COMPRESSION (Kudu tables)
- COMMENT (non-Kudu tables)
ALTER TABLE <table> ALTER [COLUMN] <column> DROP DEFAULT
This is similar to the existing CHANGE statement:
ALTER TABLE <table> CHANGE <column> <new_col_name> <type>
[COMMENT <comment>]
but the new syntax is more natural for setting column properties
when the column name and type are not being changed. Both ALTER
COLUMN and CHANGE COLUMN operations use AlterTableAlterColStmt and
are sent to the catalog as ALTER_COLUMN operations.
Testing:
- Added FE tests to ParserTest and AnalyzeDDLTest
- Added EE tests to test_kudu.py
Change-Id: Id2e8bd65342b79644a0fdcd925e6f17797e89ad6
Reviewed-on: http://gerrit.cloudera.org:8080/6955
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
The bug is that the join tried to bring the next spilled partition into
memory while still holding onto memory from the current partition.
The fix is to return earlier if the output batch is at capacity so
that resources are flushed.
Also reduce some of the redundancy in the loop that drives the spilling
logic and catch some dropped statuses..
Testing:
The failure was originally reproduced by my IMPALA-4703 patch. I was
able to cause a query failure with the current code by reducing the
memory limit for an existing query. Before it failed with up to 12MB of
memory. Now it succeeds with 8MB or less.
Ran exhaustive build.
Change-Id: I075388d348499c5692d044ac1bc38dd8dd0b10c7
Reviewed-on: http://gerrit.cloudera.org:8080/7180
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
The TableFlattener takes a nested dataset and creates an equivalent
unnested dataset. The unnested dataset is saved as Parquet.
When an array or map is encountered in the original table, the flattener
creates a new table and adds an id column to it which references the row
in the parent table. Joining on the id column should produce the
original dataset.
The flattened dataset should be loaded into Postgres in order to run the
query generator (in nested types mode) on it. There is a script that
automates generaration, flattening and loading random data into Postgres
and Impala:
testdata/bin/generate-load-nested.sh -f
Testing:
- ran ./testdata/bin/generate-load-nested.sh -f and random nested data
was generated and flattened as expected.
Change-Id: I7e7a8e53ada9274759a3e2128b97bec292c129c6
Reviewed-on: http://gerrit.cloudera.org:8080/5787
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This change modifies the logic of NOT IN predicate so that
the planner can calculate the correct node cardinality. Prior
to this change, both IN and NOT IN predicates shared the
same selectivity, which resulted in the same cardinality
during planning.
The selectivity is calculated by the following heuristic:
selectivity = 1 - (num of predicate children /
num of distinct values)
Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Reviewed-on: http://gerrit.cloudera.org:8080/7168
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Occasionally we'd see HBase fail to startup properly on CentOS 7
clusters. The symptom was that HBase would not open the required nodes
in zookeeper, signaling its readiness.
As a workaround, this change includes waiting for the Zookeeper nodes
into the retry logic.
Change-Id: Id8dbdff4ad02cac1322e7d580e0a6971daf6ea28
Reviewed-on: http://gerrit.cloudera.org:8080/7159
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: anujphadke <aphadke@cloudera.com>
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
This patch aligns the sorter's methods closer with the ExecNode methods
and moves the possibly-failing parts of Reset() into Open().
Testing:
Added WARN_UNUSED_RESULT to all the sorter methods that return Status to
prevent similar issues in future.
Add a test that sometimes goes down this code path. It was able to cause
a crash at least once every 5 executions.
Ran an exhaustive build to make sure there were no other regressions.
Change-Id: I7d4f9e93a44531901e663b3f1e18edc514363f74
Reviewed-on: http://gerrit.cloudera.org:8080/7134
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
A recent addition to test_create_table_like_file (IMPALA-2525)
relies on a file, enum.parquet, being preloaded into HDFS, which
is done by create-load-data.sh.
The problem is that the test creates the table as an internal
table with its location as the directory containing enum.parquet.
When the test completes and the table is dropped, enum.parquet
is deleted, so the test cannot be successfully run again, and a
snapshot generated from the contents of HDFS afterwards will
not contain the file.
The fix is to create the table as an external table.
Testing:
- Ran the test and verfied enum.parquet is still present in HDFS.
Change-Id: I6c386843e5ef5bf6fc208db1ff90be98fd8baacf
Reviewed-on: http://gerrit.cloudera.org:8080/7139
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Codegen time under ASAN can take ~10s, making the 15s timeouts for
runtime filter tests a bit small. Double those timeouts to 30s.
Change-Id: I2280e08910430e271da2173e465731bba5aef6cf
Reviewed-on: http://gerrit.cloudera.org:8080/7097
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
This change executes the tests added to subplans.test and removes
a test which incorrectly references subplannull_data.test (a file
which does not exist)
Change-Id: I02b4f47553fb8f5fe3425cde2e0bcb3245c39b91
Reviewed-on: http://gerrit.cloudera.org:8080/7038
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
When generating IR functions during codegen, we used to
always tag the functions with the "AlwaysInline" attribute.
That potentially leads to excessive inlining, causing very
long optimization / compilation time with marginal performance
benefit at runtime. One of the reasons for doing it was that
the "target-cpu" and "target-features" attributes were
missing in the generated IR functions so the LLVM inliner
considers them incompatible with the cross-compiled functions.
As a result, the inliner will not inline the generated IR
functions into cross-compiled functions and vice versa unless
the "AlwaysInline" attributes exist.
This change fixes the problem above by setting the "target-cpu"
and "target-features" attributes of all IR functions to match
that of of the host's CPUs so both generated IR functions and
cross-compiled functions will have the same values for those
attributes. With these attributes set, we now rely on the
inliner of LLVM to determine whether a function is worth being
inlined. With this change, the codegen time of a query with very
long predicate went from 15s to 4s and the overall runtime went
from 19s to 8s.
Change-Id: I2d87ae8d222b415587e7320cb9072e4a8d6615ce
Reviewed-on: http://gerrit.cloudera.org:8080/6941
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
The bug was that the constant exprs of a union were only
evaluated for the first fragment instance. However, for
a union inside a subplan, we should always evaluate the
constant exprs.
Testing:
- Added a regression test.
- Locally ran test_nested_types.py and the union tests in
test_queries.py
Change-Id: Icd2f21f0213188e2304f8e9536019c7940c07768
Reviewed-on: http://gerrit.cloudera.org:8080/7091
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
For every new iteration of a subplan there are leftover
rows from the previous iteration of a subplan. This change
transfers the ownership from the probe_batch_ to output_batch_
and resets the probe_batch_ on hitting the limit.
Change-Id: Iafd621d33a4e2fac42391504566ffd8dd0e18a67
Reviewed-on: http://gerrit.cloudera.org:8080/7014
Tested-by: Impala Public Jenkins
Reviewed-by: Lars Volker <lv@cloudera.com>
Adds a new query option DEFAULT_JOIN_DISTRIBUTION_MODE to
control which join distribution mode is chosen when the join
inputs have an unknown cardinality (e.g., missing stats) or when
the expected costs of the different strategies are equal.
Values for DEFAULT_JOIN_DISTRIBUTION_MODE: [BROADCAST, SHUFFLE]
Default: BROADCAST
Note that this change effectively undoes IMPALA-5120.
Testing:
- Added new planner tests
- Core/hdfs run passed
Change-Id: Ibd34442f422129d53bef5493fc9cbe7375a0765c
Reviewed-on: http://gerrit.cloudera.org:8080/7059
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
While support for TIMESTAMP columns in Kudu tables has been
committed (IMPALA-5137), it does not support TIMESTAMP
column default values.
This supports CREATE TABLE syntax to specify the default
values, but more importantly this fixes the loading of Kudu
tables that may have had default values set on
UNIXTIME_MICROS columns, e.g. if the table was created via
the python client. This involves fixing KuduColumn to hide
the LiteralExpr representing the default value because it
will be a BIGINT if the column type is TIMESTAMP. It is only
needed to call toSql() and toStringValue(), so helper
functions are added to KuduColumn to encapsulate special
logic for TIMESTAMP.
TODO: Add support and tests for ALTER setting the default
value (when IMPALA-4622 is committed).
Change-Id: I655910fb4805bb204a999627fa9f68e43ea8aaf2
Reviewed-on: http://gerrit.cloudera.org:8080/6936
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
This change adds a query option to disable reading Parquet statistics.
It provides a workaround when dealing with files that have corrupt
parquet statistics.
Note that Impala handles Parquet files affected by PARQUET-251 correctly
by ignoring statistics for anything but plain numeric types. This query
option is supposed to help with files affected by unknown or errors or
by errors that are yet to be made.
Change-Id: I427f7fde40d0f4b703751e40f3c2109a850643f7
Reviewed-on: http://gerrit.cloudera.org:8080/7001
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
This is a migration from an old and broken script from another
repository. Example use:
bin/single_node_perf_run.py --ninja --workloads targeted-perf \
--load --scale 4 --iterations 20 --num_impalads 3 \
--start_minicluster --query_names PERF_AGG-Q3 \
$(git rev-parse HEAD~1) $(git rev-parse HEAD)
The script can load data, run benchmarks, and compare the statistics
of those runs for significant differences in performance. It glues
together buildall.sh, bin/load-data.py, bin/run-workload.py, and
tests/benchmark/report_benchmark_results.py.
Change-Id: I70ba7f3c28f612a370915615600bf8dcebcedbc9
Reviewed-on: http://gerrit.cloudera.org:8080/6818
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
The current code only tests with the default setting
for parquet_dictionary_filtering, which is true. This
adds a test to verify that parquet_dictionary_filtering
set to false does not filter any row groups.
Change-Id: If3175ce1d01c806d822c2782d60ca10939e7179e
Reviewed-on: http://gerrit.cloudera.org:8080/7021
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This change loads the missing tables in TPC-DS. In addition,
it also fixes up the loading of the partitioned table store_sales
so all partitions will be loaded. The existing TPC-DS queries are
also updated to use the parameters for qualification runs as noted
in the TPC-DS specification. Some hard-coded partition filters were
also removed. They were there due to the lack of dynamic partitioning
in the past. Some missing TPC-DS queries are also added to this change,
including query28 which discovered the infamous IMPALA-5251.
Having all tables in TPC-DS available paves the way for us to include
all supported TPCDS queries in our functional testing. Due to the change
in the data, planner tests and the E2E tests have different results than
before. The results of E2E tests were compared against the run done with
Netezza and Vertica. The divergence were all due to the truncation behavior
of decimal types in DECIMAL_V1.
Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9
Reviewed-on: http://gerrit.cloudera.org:8080/6877
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins