Commit Graph

1576 Commits

Author SHA1 Message Date
Grant Henke
4471eb3b95 IMPALA-5369: Remove old pom parent in testdata module
Change-Id: Ie9013aeb5afd631546b3333da9201d0345dc9321
Reviewed-on: http://gerrit.cloudera.org:8080/6992
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-25 20:36:25 +00:00
Sailesh Mukil
50bd015f2d IMPALA-5333: Add support for Impala to work with ADLS
This patch leverages the AdlFileSystem in Hadoop to allow
Impala to talk to the Azure Data Lake Store. This patch has
functional changes as well as adds test infrastructure for
testing Impala over ADLS.

We do not support ACLs on ADLS since the Hadoop ADLS
connector does not integrate ADLS ACLs with Hadoop users/groups.

For testing, we use the azure-data-lake-store-python client
from Microsoft. This client seems to have some consistency
issues. For example, a drop table through Impala will delete
the files in ADLS, however, listing that directory through
the python client immediately after the drop, will still show
the files. This behavior is unexpected since ADLS claims to be
strongly consistent. Some tests have been skipped due to this
limitation with the tag SkipIfADLS.slow_client. Tracked by
IMPALA-5335.

The azure-data-lake-store-python client also only works on CentOS 6.6
and over, so the python dependencies for Azure will not be downloaded
when the TARGET_FILESYSTEM is not "adls". While running ADLS tests,
the expectation will be that it runs on a machine that is at least
running CentOS 6.6.
Note: This is only a test limitation, not a functional one. Clusters
with older OSes like CentOS 6.4 will still work with ADLS.

Added another dependency to bootstrap_build.sh for the ADLS Python
client.

Testing: Ran core tests with and without TARGET_FILESYSTEM as
'adls' to make sure that all tests pass and that nothing breaks.

Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Reviewed-on: http://gerrit.cloudera.org:8080/6910
Tested-by: Impala Public Jenkins
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
2017-05-25 19:35:24 +00:00
Tim Armstrong
b4343895d8 IMPALA-4923: reduce memory transfer for selective scans
Most of the code changes are to restructure things so that the
scratch batch's tuple buffer is stored in a separate MemPool
from auxiliary memory such as decompression buffers. This part
of the change does not change the behaviour of the scanner in
itself, but allows us to recycle the tuple buffer without holding
onto unused auxiliary memory.

The optimisation is implemented in TryCompact(): if enough rows
were filtered out during the copy from the scratch batch to the
output batch, the fixed-length portions of the surviving rows
(if any) are copied to a new, smaller, buffer, and the original,
larger, buffer is reused for the next scratch batch.

Previously the large buffer was always attached to the output batch,
so a large buffer was transferred between threads for every scratch
batch processed. In combination with the decompression buffer change
in IMPALA-5304, this means that in many cases selective scans don't
produce nearly as many empty or near-empty batches and do not attach
nearly as much memory to each batch.

Performance:
Even on an 8 core machine I see some speedup on selective scans.
Profiling with "perf top" also showed that time in TCMalloc
was reduced - it went from several % of CPU time to a minimal
amount.

Running TPC-H on the same machine showed a ~5% overall improvement
and no regressions. E.g. Q6 got 20-25% faster.

I hope to do some additional cluster benchmarking on systems
with more cores to verify that the severe performance problems
there are fixed, but in the meantime it seems like we have enough
evidence that it will at least improve things.

Testing:
Add a couple of selective scans that exercise the new code paths.

Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Reviewed-on: http://gerrit.cloudera.org:8080/6949
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-25 02:55:36 +00:00
Alex Behm
ee0fc260d1 IMPALA-5309: Adds TABLESAMPLE clause for HDFS table refs.
Syntax:
<tableref> TABLESAMPLE SYSTEM(<number>) [REPEATABLE(<number>)]
The first number specifies the percent of table bytes to sample.
The second number specifies the random seed to use.

The sampling is coarse-grained. Impala keeps randomly adding
files to the sample until at least the desired percentage of
file bytes have been reached.

Examples:
SELECT * FROM t TABLESAMPLE SYSTEM(10)
SELECT * FROM t TABLESAMPLE SYSTEM(50) REPEATABLE(1234)

Testing:
- Added parser, analyser, planner, and end-to-end tests
- Private core/hdfs run passed

Change-Id: Ief112cfb1e4983c5d94c08696dc83da9ccf43f70
Reviewed-on: http://gerrit.cloudera.org:8080/6868
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-24 02:38:08 +00:00
Lars Volker
0c8b2d3dbe IMPALA-5144: Remove sortby() hint
The sortby() hint is superseded by the SORT BY SQL clause, which has
been introduced in IMPALA-4166. This changes removes the hint.

Change-Id: I83e1cd6fa7039035973676322deefbce00d3f594
Reviewed-on: http://gerrit.cloudera.org:8080/6885
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-22 00:40:04 +00:00
Lars Volker
3610533f4b IMPALA-5339: Fix analysis with sort.columns and expr rewrites
IMPALA-4166 introduced a bug by duplicating code that adds sort
expressions. Upon re-analysis, this code would hit an
IndexOutOfBoundsException.

Change-Id: Ibebba29509ae7eaa691fe305500cda6bd41a179a
Reviewed-on: http://gerrit.cloudera.org:8080/6921
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-20 00:56:45 +00:00
Zach Amsden
d6e612f5c7 IMPALA-5180: Don't use non-deterministic exprs in partition pruning
Non-deterministic exprs which evaluate as constant should not be
used during HDFS partition pruning.  We consider Exprs which have no
SlotRefs as bound by default, and thus we end up trying to apply
them indisrciminately.  Constant propagation makes this situation
easier to run into and the behavior is rather unexpected.

The fix for now is to explicitly disallow non-deterministic Exprs
in partition pruning.

Change-Id: I91054c6bf017401242259a1eff5e859085285546
Reviewed-on: http://gerrit.cloudera.org:8080/6575
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-19 08:50:25 +00:00
Matthew Jacobs
6226e59702 IMPALA-5137: Support TIMESTAMPs in Kudu range predicate DDL
Adds support in DDL for timestamps in Kudu range partition syntax.

For convenience, strings can be specified with or without
explicit casts to TIMESTAMP.

E.g.
create table ts_ranges (ts timestamp primary key, i int)
partition by range (
  partition '2009-01-02 00:00:00' <= VALUES < '2009-01-03 00:00:00'
) stored as kudu

Range bounds are converted to Kudu UNIXTIME_MICROS during
analysis.

Testing: Adds FE and EE tests.

Change-Id: Iae409b6106c073b038940f0413ed9d5859daaeff
Reviewed-on: http://gerrit.cloudera.org:8080/6849
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-19 00:41:46 +00:00
Matthew Jacobs
24c77f194b IMPALA-5137: Support pushing TIMESTAMP predicates to Kudu
This change builds on the support for reading and writing
TIMESTAMP columns to Kudu tables (see [1]), adding support
for pushing TIMESTAMP predicates to Kudu for scans.

Binary predicates and IN list predicates are supported.

Testing: Added some planner and EE tests to validate the
behavior.

1: https://gerrit.cloudera.org/#/c/6526/

Change-Id: I08b6c8354a408e7beb94c1a135c23722977246ea
Reviewed-on: http://gerrit.cloudera.org:8080/6789
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-18 21:09:51 +00:00
Matthew Jacobs
d04f96b990 IMPALA-5301: Set Kudu minicluster memory limit
By default, Kudu assumes it has 80% of system memory which
is far too high for the minicluster. This sets a mem limit
of 2gb and lowers the limit of the block cache. These values
were tested on a gerrit-verify-dryrun job as well as an
exhaustive run.

This patch also simplifies TestKuduMemLimits which was
unnecessarily creating a large table during test execution.

Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5
Reviewed-on: http://gerrit.cloudera.org:8080/6844
Reviewed-by: Todd Lipcon <todd@apache.org>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-17 23:44:37 +00:00
Matthew Jacobs
7c368999f8 IMPALA-5319: Fix test_hdfs_scan_node_errors failures
The recent Kudu TIMESTAMP patch (IMPALA-5137) made an
inadvertent change [1] to alltypeserror_tmp and
alltypeserrornonulls_tmp, changing 'timestamp_col' from
STRING to TIMESTAMP.

This seems to cause failures on exhaustive jobs which run
test_hdfs_scan_node_errors against all file-formats.
I haven't been able to reproduce this failure myself, so
cannot test whether this fixes the jobs that are failing, but
this change to revert these tables seems warranted given
they were changed inadvertently.

1: https://gerrit.cloudera.org/#/c/6526/11/testdata/datasets/functional/functional_schema_template.sql

Change-Id: I533f1921662802ea6e076eefac973f50c014fcb5
Reviewed-on: http://gerrit.cloudera.org:8080/6891
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
2017-05-17 16:34:14 +00:00
Lars Volker
1ada9dac88 IMPALA-4166: Add SORT BY sql clause
This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

Sort columns can only be specified for Hdfs tables and effectiveness may
vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.columns' table property and will result in an additional SORT node
being added to the plan before the final table sink. Specifying sort
columns also enables clustering during inserts, so the SORT node will
contain all partitioning columns first, followed by the sort columns. We
do this because sort columns add a SORT node to the plan and adding the
clustering columns to the SORT node is cheap.

Sort columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144). Until then, it is possible to specify
sort columns using both ways at the same time and the column lists
will be concatenated.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Reviewed-on: http://gerrit.cloudera.org:8080/6495
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-12 15:43:30 +00:00
Matthew Jacobs
a16a0fa84d IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP
Adds Impala support for TIMESTAMP types stored in Kudu.

Impala stores TIMESTAMP values in 96-bits and has nanosecond
precision. Kudu's timestamp is a 64-bit microsecond delta
from the Unix epoch (called UNIXTIME_MICROS), so a conversion
is necessary.

When writing to Kudu, TIMESTAMP values in nanoseconds are
averaged to the nearest microsecond.

When reading from Kudu, the KuduScanner returns
UNIXTIME_MICROS with 8bytes of padding so Impala can convert
the value to a TimestampValue in-line and copy the entire
row.

Testing:
Updated the functional_kudu schema to use TIMESTAMPs instead
of converting to STRING, so this provides some decent
coverage. Some BE tests were added, and some EE tests as
well.

TODO: Support pushing down TIMESTAMP predicates
TODO: Support TIMESTAMPs in range partitioning expressions

Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d
Reviewed-on: http://gerrit.cloudera.org:8080/6526
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-11 20:55:51 +00:00
Thomas Tauber-Marshall
b8c8fb1b43 IMPALA-5294: Kudu INSERT partitioning fails with constants
An INSERT into a Kudu table with a constant value being inserted
into a partition column causes an IllegalStateExcaption. This is
because DistributedPlanner removes constants from the list of
partition exprs before creating the KuduPartitionExpr, but
KuduPartitionExpr expects to get one expr per partition column.

The fix is to pass the full list of partition exprs into the
KuduPartitionExpr, instead of the list that has had constants
removed. This preserves the behavior that if all of the partition
exprs are constant we fall back to UNPARTITIONED.

One complication is that if a partition expr is a NullLiteral, it
must be cast to a specific type to be passed to the BE. The
InsertStmt will cast the partition exprs to the partition column
types, but these casts may be lost from the copies of the partition
exprs stored by the KuduPartitionExpr during reset(). To fix this,
the KuduPartitionExpr can store the types of the partition cols and
recast the partition exprs to those types during analyze().

Change-Id: I12cbb319f9a5c47fdbfee347b47650186b27f8f9
Reviewed-on: http://gerrit.cloudera.org:8080/6828
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-10 22:53:02 +00:00
Lars Volker
9270346825 IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types
This change adds functionality to write and read parquet::Statistics for
Decimal, String, and Timestamp values. As an exception, we don't read
statistics for CHAR columns, since CHAR support is broken in Impala
(IMPALA-1652).

This change also switches from using the deprecated fields 'min' and
'max' to populate the new fields 'min_value' and 'max_value' in
parquet::Statistics, that were added in parquet-format pull request #46.

The HdfsParquetScanner will preferably read the new fields if they are
populated and if the column order 'TypeDefinedOrder' has been used to
compute the statistics. For columns without a column order set or with
only the deprecated fields populated, the scanner will read them only if
they are of simple numeric type, i.e. boolean, integer, or floating
point.

This change removes the validation of the Parquet Statistics we write to
Hive from the tests, since Hive does not write the new fields. Instead
it adds a parquet file written by Hive that uses the deprecated fields
for its statistics. It uses that file to exercise the fallback logic for
supported types in a test.

This change also cleans up the interface of ParquetPlainEncoder in
parquet-common.h.

Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Reviewed-on: http://gerrit.cloudera.org:8080/6563
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
2017-05-09 15:47:21 +00:00
Lars Volker
12f3ecceab IMPALA-5287: Test skip.header.line.count on gzip
This change fixed IMPALA-4873 by adding the capability to supply a dict
'test_file_vars' to run_test_case(). Keys in this dict will be replaced
with their values inside test queries before they are executed.

Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
Reviewed-on: http://gerrit.cloudera.org:8080/6817
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-09 01:36:46 +00:00
Thomas Tauber-Marshall
aca07ee816 IMPALA-5120: Default to partitioned join when stats are missing
Previously, we defaulted to broadcast join when stats were
missing, but this can lead to disastrous plans when the
right hand side is actually large.

Its always difficult to make good plans when stats are missing,
but defaulting to partitioned joins should reduce the risk of
disastrous plans.

Testing:
- Added a planner test that joins a table with no stats.

Change-Id: Ie168ecfcd5e7c5d3c60d16926c151f8f134c81e0
Reviewed-on: http://gerrit.cloudera.org:8080/6803
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-08 19:05:11 +00:00
Jim Apple
374f1121da IMPALA-3224: De-Cloudera non-docs JIRA URLs
John Russell is planning to fix the URLS in docs in a separate commit.

Fixed using:

    (git ls-files | xargs replace \
    'https://issues.cloudera.org/browse/IMPALA' 'IMPALA' --) && \
    git checkout HEAD docs

Change-Id: I28ea06e89341de234f9005fdc72a2e43f0ab8182
Reviewed-on: http://gerrit.cloudera.org:8080/6487
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
2017-05-07 04:44:57 +00:00
Joe McDonnell
aa05c6493b IMPALA-3654: Parquet stats filtering for IN predicate
This generates min/max predicates for InPredicates that
have only constant values in the IN list. It is only
used for statistics filtering on Parquet files.

Change-Id: I4a88963a7206f40a867e49eceeaf03fdd4f71997
Reviewed-on: http://gerrit.cloudera.org:8080/6810
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-06 03:40:57 +00:00
Taras Bobrovytsky
50e3abdc3d IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals()
The slot descriptor vectors are not guaranteed to be sorted on the slot
index within a tuple. As a result, TupleDescriptor::LayoutEquals()
sometimes returned a wrong result.

In this patch, we sort the vectors of slot descriptors on the slot index
within the tuple before comparing the vectors.

Testing:
- ran EE tests locally.

Change-Id: I426ad244678dbfe517262dfb7bbf4adc0247a35e
Reviewed-on: http://gerrit.cloudera.org:8080/6610
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-04 02:04:03 +00:00
Michael Brown
8b459dffec IMPALA-5162,IMPALA-5163: stress test support on secure clusters
This patch adds support for running the stress test
(concurrent_select.py) and loading nested data (load_nested.py) into a
Kerberized, SSL-enabled Impala cluster. It assumes the calling user
already has a valid Kerberos ticket. One way to do that is:

1. Get access to a keytab and krb5.config
2. Set KRB5_CONFIG and KRB5CCNAME appropriately
3. Run kinit(1)
4. Run load_nested.py and/or concurrent_select.py within this
   environment.

Because our Python clients already support Kerberos and SSL, we simply
need to make sure to use the correct options when calling the entry
points and initializing the clients:

Impala: Impyla
Hive: Impyla
HDFS: hdfs.ext.kerberos.KerberosClient

With this patch, I was able to manually do a short concurrent_select.py
run against a secure cluster without connection or auth errors, and I
was able to do the same with load_nested.py for a cluster that already
had TPC-H loaded.

Follow-ons for future cleanup work:

IMPALA-5263: support CA bundles when running stress test against SSL'd
             Impala

IMPALA-5264: fix InsecurePlatformWarning under stress test with SSL

Change-Id: I0daad57bb8ceeb5071b75125f11c1997ed7e0179
Reviewed-on: http://gerrit.cloudera.org:8080/6763
Reviewed-by: Matthew Mulder <mmulder@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-02 04:56:01 +00:00
Thomas Tauber-Marshall
801c95f39f IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu
are currently painful because we just send rows randomly,
which creates a lot of work for Kudu since it partitions
and sorts data before writing, causing writes to be slow
and leading to timeouts.

We can alleviate this by sending the rows to Kudu already
partitioned and sorted. This patch partitions and sorts
rows according to Kudu's partitioning scheme for INSERTs
and UPSERTs. A followup patch will handle UPDATE and DELETE.

It accomplishes this by inserting an exchange node and a sort
node into the plan before the operation. Both the exchange and
the sort are given a KuduPartitionExpr which takes a row and
calls into the Kudu client to return its partition number.

It also disallows INSERT hints for Kudu tables, since the
hints that we support (SHUFFLE, CLUSTER, SORTBY), so longer
make sense.

Testing:
- Updated planner tests.
- Ran the Kudu functional tests.
- Ran performance tests demonstrating that we can now handle much
  larger inserts without having timeouts.

Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
Reviewed-on: http://gerrit.cloudera.org:8080/6559
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-02 01:40:43 +00:00
Zach Amsden
77304530f1 IMPALA-5003: Constant propagation in scan conjuncts
Implements constant propagation within conjuncts and applies the
optimization to scan conjuncts and collection conjuncts within Hdfs
scan nodes.  The optimization is applied during planning.  At scan
nodes in particular, we want to optimize to enable partition pruning.
In certain cases, we might end up with a FALSE conditional, which
now will convert to an EmptySet node.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  Added Kudu, datasource, Hdfs and HBase tests to validate
we can create EmptySetNodes.

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Reviewed-on: http://gerrit.cloudera.org:8080/6389
Tested-by: Impala Public Jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2017-05-02 01:12:14 +00:00
Dan Hecht
741421de09 IMPALA-5252: Fix crash in HiveUdfCall::GetStringVal() when mem_limit exceeded
We need to check for AllocateLocal() returning NULL. CopyFrom() takes
care of that for us.  Also adjust a few other places in the code base
that didn't have the check.

The new test reproduces the crash, but in order to get this test file to
execute, I had to move the xfail to be a function decorator. Apparently
xfail as a statement causes the test to not run at all. We should run
all of these queries even if they are non-determistic to at least verify
that impalad does not crash.

Change-Id: Iafefef24479164cc4d2b99191d2de28eb8b311b6
Reviewed-on: http://gerrit.cloudera.org:8080/6761
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-29 02:23:51 +00:00
Thomas Tauber-Marshall
6cddb952ce IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
Previously, exprs used in sorts were evaluated lazily. This can
potentially be bad for performance if the exprs are expensive to
evaluate, and it can lead to crashes if the exprs are
non-deterministic, as this violates assumptions of our sorting
algorithm.

This patch addresses these issues by materializing ordering exprs.
It does so when the expr is non-deterministic (including when it
contains a UDF, which we cannot currently know if they are
non-deterministic), or when its cost exceeds a threshold (or the
cost is unknown).

Testing:
- Added e2e tests in test_sort.py.
- Updated planner tests.

Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e
Reviewed-on: http://gerrit.cloudera.org:8080/6322
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-26 22:34:04 +00:00
Michael Ho
42ca45e830 IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg
Since commit d2d3f4c (on asf-master), TAggregateExpr contains
the logical input types of the Aggregate Expr. The reason they
are included is that merging aggregate expressions will have
input tyes of the intermediate values which aren't necessarily
the same as the input types. For instance, NDV() uses a binary
blob as its intermediate value and it's passed to its merge
aggregate expressions as a StringVal but the input type of NDV()
in the query could be DecimalVal. In this case, we consider
DecimalVal as the logical input type while StringVal is the
intermediate type. The logical input types are accessed by the
BE via GetConstFnAttr() during interpretation and constant
propagation during codegen.

To handle distinct aggregate expressions (e.g. select count(distinct)),
the FE uses 2-phase aggregation by introducing an extra phase of
split/merge aggregation in which the distinct aggregate expressions'
inputs are coverted and added to the group-by expressions in the first
phase while the non-distinct aggregate expressions go through the normal
split/merge treatement.

The bug is that the existing code incorrectly propagates the intermediate
types of the non-grouping aggregate expressions as the logical input types
to the merging aggregate expressions in the second phase of aggregation.
The input aggregate expressions for the non-distinct aggregate expressions
in the second phase aggregation are already merging aggregate expressions
(from phase one) in which case we should not treat its input types as
logical input types.

This change fixes the problem above by checking if the input aggregate
expression passed to FunctionCallExpr.createMergeAggCall() is already
a merging aggregate expression. If so, it will use the logical input
types recorded in its 'mergeAggInputFn_' as references for its logical
input types instead of the aggregate expression input types themselves.

Change-Id: I158303b20d1afdff23c67f3338b9c4af2ad80691
Reviewed-on: http://gerrit.cloudera.org:8080/6724
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-26 21:40:32 +00:00
aphadke
5809317c9a IMPALA-4893: Efficiently update the rows read counter for sequence file
Update the rows read counter after processing the scan range instead of updating
it after reading every row for sequence files to save CPU cycles.

Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
Reviewed-on: http://gerrit.cloudera.org:8080/6522
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-26 01:12:01 +00:00
Attila Jeges
59b2db6ba7 IMPALA-3079: Fix sequence file writer
This change fixes the following issues in the Sequence File Writer:
1. ReadWriteUtil::VLongRequiredBytes() and ReadWriteUtil::PutVLong()
   were broken. As a result, Impala created corrupt uncompressed
   sequence files.

2. KEY_CLASS_NAME was missing from the sequence file header. As a
   result, Hive could not read back uncompressed sequence files
   created by Impala.

3. Impala created record-compressed sequence files with empty keys
   block. As a result, Hive could not read back record-compressed
   sequence files created by Impala.

4. Impala created block-compressed files with:
   - empty key-lengths block
   - empty keys block
   - empty value-lengths block
   This resulted in invalid block-compressed sequence files that Hive could
   not read back.

5. In some cases the wrong Record-compression flag was written to the
   sequence file header. As a result, Hive could not read back record-
   compressed sequence files created by Impala.

6. Impala added 'sync_marker' instead of 'neg1_sync_marker' to the
   beginning of blocks in block-compressed sequence files. Hive could
   not read these files back.

7. The calculation of block sizes in SnappyBlockCompressor class was
   incorrect for odd-length buffers.

Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
Reviewed-on: http://gerrit.cloudera.org:8080/6107
Reviewed-by: Michael Ho <kwho@cloudera.com>
Reviewed-by: Attila Jeges <attilaj@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-25 21:07:53 +00:00
Thomas Tauber-Marshall
915a16345c IMPALA-5125: SimplifyConditionalsRule incorrectly handles aggregates
This patch addresses 3 issues:
- SelectList.reset() didn't properly reset some of its members, though
  they're documented as needing to be reset. This was causing a crash
  when the Planner attempted to make an aggregation node for an agg
  function that had been eliminated by expr rewriting. While I'm here,
  I added resetting of all of SelectList's members that need to be
  reset, and fixed the documentation of one member that shouldn't be
  reset.
- SimplifyConditionalsRule was changing the meaning of queries that
  contain agg functions, e.g. because "select if(true, 0, sum(id))"
  is not equivalent to "select 0". The fix is to not return the
  simplfied expr if it removes all aggregates.
- ExprRewriteRulesTest was performing rewrites on the result exprs of
  the SelectStmt, which causes problems if the result exprs have been
  substituted. In normal query execution, we don't rewrite the result
  exprs anyway, so the fix is to match normal query execution and
  rewrite the select list exprs.

Testing:
- Added e2e test to exprs.test.
- Added unit test to ExprRewriteRulesTest.

Change-Id: Ic20b1621753980b47a612e0885804363b733f6da
Reviewed-on: http://gerrit.cloudera.org:8080/6653
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-24 21:41:11 +00:00
Taras Bobrovytsky
75553165ee IMPALA-4883: Union Codegen
For each non-passthrough child of the Union node, codegen the loop that
does per row tuple materialization.

Testing:
Ran test_queries.py test locally in exchaustive mode.

Benchmark:
Ran a local benchmark on a local 10 GB TPCDS dataset on an unpartitioned
store_sales table.

SELECT
  COUNT(c),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
) t

Before: 39s704ms
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  194.504us  194.504us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   17.284us   17.284us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s202ms    2s934ms        3           1  115.00 KB       10.00 MB
00:UNION               3   32s514ms   34s926ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS        3  158.373ms  216.085ms   28.80M      28.80M  489.71 MB        1.88 GB  tpcds_10_parquet.store_sales
|--03:SCAN HDFS        3  167.002ms  171.738ms   28.80M      28.80M  489.74 MB        1.88 GB  tpcds_10_parquet.store_sales
|--04:SCAN HDFS        3  125.331ms  145.496ms   28.80M      28.80M  489.57 MB        1.88 GB  tpcds_10_parquet.store_sales
|--05:SCAN HDFS        3  148.478ms  194.311ms   28.80M      28.80M  489.69 MB        1.88 GB  tpcds_10_parquet.store_sales
|--06:SCAN HDFS        3  143.995ms  162.781ms   28.80M      28.80M  489.57 MB        1.88 GB  tpcds_10_parquet.store_sales
|--07:SCAN HDFS        3  169.731ms  250.201ms   28.80M      28.80M  489.58 MB        1.88 GB  tpcds_10_parquet.store_sales
|--08:SCAN HDFS        3  164.110ms  254.374ms   28.80M      28.80M  489.61 MB        1.88 GB  tpcds_10_parquet.store_sales
|--09:SCAN HDFS        3  135.631ms  162.117ms   28.80M      28.80M  489.63 MB        1.88 GB  tpcds_10_parquet.store_sales
|--10:SCAN HDFS        3  138.736ms  167.778ms   28.80M      28.80M  489.67 MB        1.88 GB  tpcds_10_parquet.store_sales
01:SCAN HDFS           3  202.015ms  248.728ms   28.80M      28.80M  489.68 MB        1.88 GB  tpcds_10_parquet.store_sales

After: 20s177ms
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  174.617us  174.617us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   16.693us   16.693us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s830ms    3s615ms        3           1  115.00 KB       10.00 MB
00:UNION               3    4s296ms    5s258ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS        3    1s212ms    1s340ms   28.80M      28.80M  488.82 MB        1.88 GB  tpcds_10_parquet.store_sales
|--03:SCAN HDFS        3    1s387ms    1s570ms   28.80M      28.80M  489.37 MB        1.88 GB  tpcds_10_parquet.store_sales
|--04:SCAN HDFS        3    1s224ms    1s347ms   28.80M      28.80M  487.22 MB        1.88 GB  tpcds_10_parquet.store_sales
|--05:SCAN HDFS        3    1s245ms    1s321ms   28.80M      28.80M  489.25 MB        1.88 GB  tpcds_10_parquet.store_sales
|--06:SCAN HDFS        3    1s232ms    1s505ms   28.80M      28.80M  484.21 MB        1.88 GB  tpcds_10_parquet.store_sales
|--07:SCAN HDFS        3    1s348ms    1s518ms   28.80M      28.80M  488.20 MB        1.88 GB  tpcds_10_parquet.store_sales
|--08:SCAN HDFS        3    1s231ms    1s335ms   28.80M      28.80M  483.58 MB        1.88 GB  tpcds_10_parquet.store_sales
|--09:SCAN HDFS        3    1s179ms    1s349ms   28.80M      28.80M  482.76 MB        1.88 GB  tpcds_10_parquet.store_sales
|--10:SCAN HDFS        3    1s121ms    1s154ms   28.80M      28.80M  486.59 MB        1.88 GB  tpcds_10_parquet.store_sales
01:SCAN HDFS           3    1s284ms    1s523ms   28.80M      28.80M  486.70 MB        1.88 GB  tpcds_10_parquet.store_sales

Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439
Reviewed-on: http://gerrit.cloudera.org:8080/6459
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-21 04:53:09 +00:00
Thomas Tauber-Marshall
baba8960b3 IMPALA-5217: KuduTableSink checks null constraints incorrectly
KuduTableSink uses the referenced_columns map to translate between the
index into the output exprs 'j' and the index into columns in the Kudu
table 'col', but we incorrectly use 'j' when calling into the Kudu table
schema to check the nullability of columns.

Testing:
- Added e2e tests to kudu_insert.test

Change-Id: I8ed458278f135288a821570939de8ee294183df2
Reviewed-on: http://gerrit.cloudera.org:8080/6670
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-20 23:27:59 +00:00
aphadke
8660c404c9 IMPALA-5145: Do not constant fold null in CastExprs
Constant folding null values in CastExprs causes CTAS statements
to fail. This regresses the observed behavior before constant folding
was introduced. This change does not constant fold null in CastExprs.

Change-Id: Ia7aa1ab7f53a9dcc7560ded321a9d1e1ee2d18e3
Reviewed-on: http://gerrit.cloudera.org:8080/6663
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-20 19:39:17 +00:00
Tim Armstrong
9a29dfc91b IMPALA-3748: minimum buffer requirements in planner
Compute the minimum buffer requirement for spilling nodes and
per-host estimates for the entire plan tree.

This builds on top of the existing resource estimation code, which
computes the sets of plan nodes that can execute concurrently. This is
cleaned up so that the process of producing resource requirements is
clearer. It also removes the unused VCore estimates.

Fixes various bugs and other issues:
* computeCosts() was not called for unpartitioned fragments, so
  the per-operator memory estimate was not visible.
* Nested loop join was not treated as a blocking join.
* The TODO comment about union was misleading
* Fix the computation for mt_dop > 1 by distinguishing per-instance and
  per-host estimates.
* Always generate an estimate instead of unpredictably returning
  -1/"unavailable" in many circumstances - there was little rhyme or
  reason to when this happened.
* Remove the special "trivial plan" estimates. With the rest of the
  cleanup we generate estimates <= 10MB for those trivial plans through
  the normal code path.

I left one bug (IMPALA-4862) unfixed because it is subtle, will affect
estimates for many plans and will be easier to review once we have the
test infra in place.

Testing:
Added basic planner tests for resource requirements in both the MT and
non-MT cases.

Re-enabled the explain_level tests, which appears to be the only
coverage for many of these estimates. Removed the complex and
brittle test cases and replaced with a couple of much simpler
end-to-end tests.

Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37
Reviewed-on: http://gerrit.cloudera.org:8080/5847
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-18 20:36:08 +00:00
Tim Armstrong
96316e3b34 IMPALA-5173: crash with hash join feeding directly into nlj
The background for this bug is that we can't transfer ownership
of BufferdBlockMgr::Blocks that are attached to RowBatches.

The NestedLoopJoinNode accumulates row batches on its right
side and tries to take ownership of the memory, which doesn't
work as expected in this case.

The fix is to copy the data when we encounter one of these
(likely very rare) cases.

Testing:
Added a regression test that produces a crash before the fix and
succeeds after the fix.

Change-Id: I0c04952e591d17e5ff7e994884be4c4c899ae192
Reviewed-on: http://gerrit.cloudera.org:8080/6568
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-18 20:11:23 +00:00
Laszlo Gaal
9e7fb830fd IMPALA-4088: Assign fix values to the minicluster server ports
The minicluster setup logic assigned fixed port numbers to several
but not all listening sockets of the data nodes. This change
assigns similar port ranges to all the listening ports that were
so far allowed to pick their own port numbers, interfering with
other components, e.g. HBase.

Change-Id: Iecf312873b7026c52b0ac0e71adbecab181925a0
Reviewed-on: http://gerrit.cloudera.org:8080/6531
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-07 22:57:16 +00:00
Lars Volker
85d7f5eb2b IMPALA-4733: Change HBase ports to non-ephemeral
We've seen repeated test failures because HBase tries to bind to ports
in the ephemeral port range, which sometimes would already be occupied
by outgoing connections of other proccesses.

This change changes the ports to the new default HBase ports
(HBASE-10123):

HBase Master Port: 60000 -> 16000
HBase Master Web UI Port: 60010 -> 16010
HBase ReqionServer Port: 60020 -> 16020
HBase ReqionServer Web UI Port: 60030 -> 16030
HBase Status Multicast Port: 60100 -> 16100

This made it necessary to change the default KMS port, too
(HADOOP-12811):

KMS HTTP port: 16000 -> 9600

Change-Id: I6f8af325e34b6e352afd75ce5ddd2446ce73d857
Reviewed-on: http://gerrit.cloudera.org:8080/6524
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-04 00:28:29 +00:00
Joe McDonnell
077c07eec7 IMPALA-4859: Push down IS NULL / IS NOT NULL to Kudu
This detects IS NULL / IS NOT NULL and creates a Kudu
predicate to push this to Kudu.

For testing, there are planner tests to verify that the
predicate is pushed to Kudu. There are also end-to-end
tests for correctness.

Change-Id: I9c96fec8d41f77222879c0ffdd6940b168e47e65
Reviewed-on: http://gerrit.cloudera.org:8080/5958
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-25 04:51:36 +00:00
Matthew Jacobs
878fcf5a74 IMPALA-5111: Fix check when creating NOT NULL PK col in Kudu
The fix for IMPALA-4616 broke the ability to create a PK key
col in a Kudu table as explicitly 'NOT NULL'. While this is
the default, it should be possible to specify.

The precondition that was failing was fixed, and some tests
were added/modified.

Change-Id: I557eea7cd994d6a2ed38893d283d08107e78f789
Reviewed-on: http://gerrit.cloudera.org:8080/6465
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-24 21:22:50 +00:00
Taras Bobrovytsky
a50c344077 IMPALA-3586: Implement union passthrough
The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.

Testing:
- Added new planner and end to end tests that cover the new
  functionality.
- Updated existing tests to reflect the new behavior.

Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:

SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
) t

Before:
Total Time: 43s164ms

Summary:
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  224.721us  224.721us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   24.578us   24.578us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s402ms    3s060ms        3           1  119.00 KB       10.00 MB
00:UNION               3   35s380ms   37s846ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS        3  184.197ms  219.931ms   28.80M      28.80M  535.03 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS        3  131.956ms  153.401ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS        3  178.456ms  247.721ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS        3  189.398ms  242.251ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS        3  122.786ms  156.528ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS        3  147.467ms  183.391ms   28.80M      28.80M  535.13 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS        3  147.502ms  186.273ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS        3  130.086ms  154.682ms   28.80M      28.80M  535.04 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS        3  122.701ms  161.056ms   28.80M      28.80M  534.89 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS           3  287.863ms  330.436ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned

After:
Total Time: 19s139ms

Summary:
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  166.241us  166.241us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   71.695us   71.695us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s971ms    3s809ms        3           1    3.08 MB       10.00 MB
00:UNION               3  207.956ms  222.846ms  288.01M     288.01M          0              0
|--02:SCAN HDFS        3    1s533ms    1s535ms   28.80M      28.80M  532.28 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS        3    1s554ms    1s669ms   28.80M      28.80M  525.73 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS        3    1s568ms    1s716ms   28.80M      28.80M  525.03 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS        3    1s503ms    1s617ms   28.80M      28.80M  527.43 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS        3    1s560ms    1s634ms   28.80M      28.80M  528.52 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS        3    1s489ms    1s643ms   28.80M      28.80M  534.81 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS        3    1s534ms    1s581ms   28.80M      28.80M  528.10 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS        3    1s558ms    1s674ms   28.80M      28.80M  526.77 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS        3    1s504ms    1s692ms   28.80M      28.80M  527.83 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS           3    1s682ms    1s911ms   28.80M      28.80M  526.14 MB        1.88 GB  store_sales_unpartitioned

Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Reviewed-on: http://gerrit.cloudera.org:8080/5816
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-21 22:24:01 +00:00
Joe McDonnell
5bb988b1c5 IMPALA-4996: Single-threaded KuduScanNode
This introduces KuduScanNodeMt, the single-threaded version
of KuduScanNode that materializes the tuples in GetNext().
KuduScanNodeMt is enabled by the same condition as
HdfsScanNodeMt: mt_dop is greater than or equal to 1.

To share code between the two implementations, KuduScanNode
and KuduScanNodeMt are now subclasses of KuduScanNodeBase,
which implements the shared code. The KuduScanner is
minimally impacted, as it already had the required GetNext
interface.

Since the KuduClient is a heavy-weight object, it is now
shared at the QueryState level. We try to share the
KuduClient as much as possible, but there are times when
the KuduClient cannot be shared. Each Kudu table has
master addresses stored in the Hive Metastore. We only
share KuduClients for tables that have an identical value
for the master addresses. In the ideal case, every Kudu
table will have the same value, but there is no explicit
guarantee of this.

The testing for this is a modified version of
kudu-scan-node.test run with various mt_dop values.

Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Reviewed-on: http://gerrit.cloudera.org:8080/6312
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-17 19:33:31 +00:00
Taras Bobrovytsky
529a5f99b9 IMPALA-4787: Optimize APPX_MEDIAN() memory usage
Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 16 elements.
Once the buffer becomes full, we reallocate a new buffer with double
capacity and copy the original buffer into the new one. We continue
doubling the buffer size until the buffer has room for 20,000 elements
as before.

Testing:
Added some EE APPX_MEDIAN() tests on larger datasets that exercise the
resize code path.

Perf Benchrmark (about 35,000 elements per bucket):

SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

BEFORE: 11s067ms
Operator       #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
-------------------------------------------------------------------------------------------------------------------------
06:AGGREGATE        1  124.726us  124.726us       1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE         1   29.544us   29.544us       3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE        3   86.406us  120.372us       3           1  44.00 KB       10.00 MB
04:AGGREGATE        3    1s840ms    2s824ms   2.00K          -1   1.02 GB      128.00 MB  FINALIZE
03:EXCHANGE         3    1s163ms    1s989ms   6.00K          -1         0              0  HASH(c1)
01:AGGREGATE        3    3s356ms    3s416ms   6.00K          -1   1.95 GB      128.00 MB  STREAMING
00:SCAN HDFS        3   64.962ms   65.490ms  65.54M          -1  25.97 MB       64.00 MB  tpcds_10_parquet.benchmark

AFTER: 9s465ms
Operator       #Hosts   Avg Time  Max Time   #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------
06:AGGREGATE        1   73.961us  73.961us       1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE         1   18.101us  18.101us       3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE        3   75.795us  83.969us       3           1  44.00 KB       10.00 MB
04:AGGREGATE        3    1s608ms   2s683ms   2.00K          -1   1.02 GB      128.00 MB  FINALIZE
03:EXCHANGE         3  826.683ms   1s322ms   6.00K          -1         0              0  HASH(c1)
01:AGGREGATE        3    2s457ms   2s672ms   6.00K          -1   3.14 GB      128.00 MB  STREAMING
00:SCAN HDFS        3   81.514ms  89.056ms  65.54M          -1  25.94 MB       64.00 MB  tpcds_10_parquet.benchmark

Memory Benchmark (about 12 elements per bucket):

SELECT MAX(a) FROM (
  SELECT ss_customer_sk, APPX_MEDIAN(ss_sold_date_sk) as a
  FROM tpcds_parquet.store_sales
  GROUP BY ss_customer_sk) t

BEFORE: 7s477ms
Operator       #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------------
06:AGGREGATE        1  114.686us  114.686us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE         1   18.214us   18.214us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE        3  147.055us  165.464us        3           1  28.00 KB       10.00 MB
04:AGGREGATE        3    2s043ms    2s147ms   14.82K          -1   4.94 GB      128.00 MB  FINALIZE
03:EXCHANGE         3  840.528ms  943.254ms   15.61K          -1         0              0  HASH(ss_customer_sk)
01:AGGREGATE        3    1s769ms    1s869ms   15.61K          -1   5.32 GB      128.00 MB  STREAMING
00:SCAN HDFS        3   17.941ms   37.109ms  183.59K          -1   1.94 MB       16.00 MB  tpcds_parquet.store_sales

AFTER: 434ms
Operator       #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------------
06:AGGREGATE        1  125.915us  125.915us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE         1   72.179us   72.179us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE        3   79.054us   83.385us        3           1  28.00 KB       10.00 MB
04:AGGREGATE        3    6.559ms    7.669ms   14.82K          -1  17.32 MB      128.00 MB  FINALIZE
03:EXCHANGE         3   67.370us   85.068us   15.60K          -1         0              0  HASH(ss_customer_sk)
01:AGGREGATE        3   19.245ms   24.472ms   15.60K          -1   9.48 MB      128.00 MB  STREAMING
00:SCAN HDFS        3   53.173ms   55.844ms  183.59K          -1   1.18 MB       16.00 MB  tpcds_parquet.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Reviewed-on: http://gerrit.cloudera.org:8080/6025
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-16 05:59:40 +00:00
Joe McDonnell
6441ca65bd IMPALA-5039: Fix variability in parquet dictionary filtering test
The tests for dictionary filtering look at how many row groups are
processed and how many are filtered by matching text in the profile.
However, the number of row groups processed and filtered by any
individual fragment depends on how the work is split and how many
impalads are running. This causes variability in the test output.

To fix this, the test needs a way to aggregate the results across
fragments. This fix introduces the following syntax for specifying
these aggregates:
aggregate(function_name, field_name): expected_value
This searches the runtime profile for lines that contain
'field_name: number'. It skips the averaged fragment, as this is
derived from all the other fragments.

Currently, only SUM is implemented, and the expected_value is
required to be an integer. It should be easy to implement other
interesting functions like COUNT and MIN/MAX. It would also be
possible to extend it to floats.

Switching the dictionary filtering tests over to this new syntax
eliminates the variability in the tests.

Change-Id: I6b7b84d973b3ac678a24e82900f2637d569158bb
Reviewed-on: http://gerrit.cloudera.org:8080/6301
Tested-by: Impala Public Jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2017-03-13 17:37:15 +00:00
Lars Volker
32d45f4262 IMPALA-4615: Fix create_table.sql command order
INSERT OVERWRITE commands in Hive will only affect partitions that Hive
knows about. If an external table gets dropped and recreated, then
'MSCK REPAIR TABLE' needs to be executed to recover any preexisting
partitions. Otherwise, an INSERT OVERWRITE will not remove the data
files in those partitions and will fail to move the new data in place.

More information can be found here:
http://www.ericlin.me/hive-insert-overwrite-does-not-remove-existing-data

I tested the fix by running the following commands, making sure that the
second run of the .sql script completed without errors and validating
the number of lines was correct (10) after both runs.

export JDBC_URL="jdbc:hive2://${HS2_HOST_PORT}/default;"
export HS2_HOST_PORT=localhost:11050

beeline -n $USER -u "${JDBC_URL}" -f ${IMPALA_HOME}/testdata/avro_schema_resolution/create_table.sql
beeline -n $USER -u "${JDBC_URL}" -f ${IMPALA_HOME}/testdata/avro_schema_resolution/create_table.sql

Change-Id: I0f68eeb75ba2f43b96b8f3d82f902e291d3bd396
Reviewed-on: http://gerrit.cloudera.org:8080/6317
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-09 06:00:15 +00:00
Alex Behm
7d8acee814 IMPALA-4725: Query option to control Parquet array resolution.
Summary of changes:
Introduces a new query option PARQUET_ARRAY_RESOLUTION to
control the path-resolution behavior for Parquet files
with nested arrays. The values are:
- THREE_LEVEL
  Assumes arrays are encoded with the 3-level representation.
  Also resolves arrays encoded with a single level.
  Does not attempt a 2-level resolution.
- TWO_LEVEL
  Assumes arrays are encoded with the 2-level representation.
  Also resolves arrays encoded with a single level.
  Does not attempt a 3-level resolution.
- TWO_LEVEL_THEN_THREE_LEVEL
  First tries to resolve assuming the 2-level representation,
  and if unsuccessful, tries the 3-level representation.
  Also resolves arrays encoded with a single level.
  This is the current Impala behavior and is used as the
  default value for compatibility.

Note that 'failure' to resolve a schema path with a given
array-resolution policy does not necessarily mean a warning or
error is returned by the query. A mismatch might be treated
like a missing field which is necessary to support schema
evolution. There is no way to reliably distinguish the
'bad resolution' and 'legitimately missing field' cases.

The new query option is independent of and can be combined
with the existing PARQUET_FALLBACK_SCHEMA_RESOLUTION.

Background:
Arrays can be represented in several ways in Parquet:
- Three Level Encoding (standard)
- Two Level Encoding (legacy)
- One Level Encoding (legacy)
More details are in the "Lists" section of the spec:
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md

Unfortunately, there is no reliable metadata within Parquet files
to indicate which encoding was used. There is even the possibility
of having mixed encodings within the same file if there are multiple
arrays.

As a result, Impala currently tries to auto-detect the file encoding
when resolving a schema path in a Parquet file using the
TWO_LEVEL_THEN_THREE_LEVEL policy.

However, regardless of whether a Parquet data file uses the 2-level
or 3-level encoding, the index-based resolution may return incorrect
results if the representation in the Parquet file does not
exactly match the attempted array-resoution policy. Intuitively,
when attempting a 2-level resolution on a 3-level file, the matched
schema node may not be deep enough in the schema tree, but could still
be a scalar node with expected type. Similarly, when attempting a
3-level resolution on a 2-level file a level may be incorrectly
skipped.

The name-based policy generally does not have this problem because it
avoids traversing incorrect schema paths. However, the index-based
resoution allows a different set of schema-evolution operations,
so just using name-based resolution is not an acceptable workaround
in all cases.

Testing:
- Added new Parquet data files that show how incorrect results
  can be returned with a mismatched file encoding and resolution
  policy. Added both 2-level and 3-level versions of the data.
- Added a new test in test_nested_types.py that shows the behavior
  with the new PARQUET_ARRAY_RESOLUTION query option.
- Locally ran test_scanners.py and test_nested_types.py on core.

Change-Id: I4f32e19ec542d4d485154c9d65d0f5e3f9f0a907
Reviewed-on: http://gerrit.cloudera.org:8080/6250
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-09 05:07:44 +00:00
Alex Behm
d3cc23e569 IMPALA-5021: Fix count(*) remaining rows overflow in Parquet.
Zero-slot scans of Parquet files that have num_rows > MAX_INT32
in the footer metadata used to run forever due to an overflow when
calculating the remaining number of rows to process.

Testing:
- Added a regression test using a file with num_rows = 2*MAX_INT32.
- Locally ran test_scanners.py which succeeded.
- Private core/hdfs run succeeded

Change-Id: Ib9f8a6b83f8f621451d5977423ef81a6e4b124bd
Reviewed-on: http://gerrit.cloudera.org:8080/6286
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-08 02:00:30 +00:00
Lars Volker
768fc0ea27 IMPALA-4734: Set parquet::RowGroup::sorting_columns
This changes the HdfsParquetTableWriter to populate the
parquet::RowGroup::sorting_columns list with all columns mentioned in a
'sortby()' hint within INSERT statements. The columns are added to the
list in the order in which they appear inside the hint.

The change also adds backports.tempfile to the python requirements to
provide 'tempfile.TemporaryDirectory' on python 2.7.

The change also changes the default ordering for columns mentioned in
'sortby()' hints from descending to ascending.

To test this change, we write a table with a 'sortby()' hint and verify,
that the sorting_columns get populated correctly.

Change-Id: Ib42aab585e9e627796e9510e783652d49d74b56c
Reviewed-on: http://gerrit.cloudera.org:8080/6219
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-07 09:07:05 +00:00
Joe McDonnell
9b923a1a27 IMPALA-4624: Implement Parquet dictionary filtering
Here is a basic summary of the changes:
Frontend looks for conjuncts that operate on a single slot and pass a
map from slot id to the conjunct index through thrift to the backend.
The conjunct indices are the indices into the normal PlanNode conjuncts list.
The conjuncts need to satisfy certain conditions:
1. They are bound on a single slot
2. They are deterministic (no random functions)
3. They evaluate to FALSE on a NULL input. This is because the dictionary
does not include NULLs, so any condition that evaluates to TRUE on NULL
cannot be evaluated by looking only at the dictionary.

The backend converts the indices into ExprContexts. These are cloned in
the scanner threads.

The dictionary read codepath has been removed from ReadDataPage into its
own function, InitDictionary. This has also been turned into its own step
in row group initialization. ReadDataPage will not see any dictionary
pages unless the parquet file is invalid.

For dictionary filtering, we initialize dictionaries only as needed to evaluate
the conjuncts. The Parquet scanner evaluates the dictionary filter conjuncts on the
dictionary to see if any dictionary entry passes. If no entry passes, the row
group is eliminated. If the row group passes the dictionary filtering, then we
initialize all remaining dictionaries.

Dictionary filtering is controlled by a new query option,
parquet_dictionary_filtering, which is on by default.

Since column chunks can have a mixture of encodings, dictionary filtering
uses three tests to determine whether this is purely dictionary encoded:
1. If the encoding_stats is in the parquet file, then use it to determine if
there are only dictionary encoded pages (i.e. there are no data pages with
an encoding other than PLAIN_DICTIONARY).
-OR-
2. If the encoding stats are not present, then look at the encodings. The column
is purely dictionary encoded if:
a) PLAIN_DICTIONARY is present
AND
b) Only PLAIN_DICTIONARY, RLE, or BIT_PACKED encodings are listed
-OR-
3. If this file was written by an older version of Impala, then we know that
dictionary failover happens when the dictionary reaches 40,000 values.
Dictionary filtering can proceed as long as the dictionary is smaller than
that.

parquet-mr writes the encoding list correctly in the current version in our
environment (1.5.0). This means that check #2 works on some existing files
(potentially most existing parquet-mr files).
parquet-mr writes the encoding stats starting in 1.9.0. This is the version
where check #1 will start working.

Impala's parquet writer now implements both, so either check above will work.

Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7
Reviewed-on: http://gerrit.cloudera.org:8080/5904
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-06 23:20:34 +00:00
Nathan Salmon
34353218ce IMPALA-4675: Case-insensitive matching of Parquet fields.
The query option PARQUET_FALLBACK_SCHEMA_RESOLUTION
allows matching of Parquet fields by name instead of by
index (the default).

Parquet column names are case sensitive, but Impala treats
db/table/column/field names as case-insensitive. Today,
there is no way today to select Parquet columns with mixed
casing via SQL using the name-based field resolution policy.

This patch changes the matching of Parquet fields to be
case-insensitive.

Testing:
- Modified the data files backing complextypestbl
  to contain fields with mixed casing.
- Several existing tests run against this table,
  including the test for name-based resolution.
- I confirmed that without this fix, the existing
  name-based resolution tests fail on the modified
  data files.
- I locally ran test_scanners.py and test_nested_types.py
  on exhaustive with this fix.

Change-Id: I87395f84ba29b4c3d8e41be1ea4e89e500b8a9f4
Reviewed-on: http://gerrit.cloudera.org:8080/5891
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-03 10:20:07 +00:00
Lars Volker
996fb5eab3 IMPALA-2328: Address additional comments
- test_parquet_stats.py was missing and the tests weren't run during
GVO.

- The tests in parquet_stats.test assume that the queries were executed
in a single fragment, so they now run with 'num_nodes = 1'.

- Parquet columns are now resolved correctly.

- Parquet files with missing columns are now handled correctly.

- Predicates with implicit casts can now be evaluated against
parquet::Statistics.

- This change also cleans up some old friend declarations I came across.

Change-Id: I54c205fad7afc4a0b0a7d0f654859de76db29a02
Reviewed-on: http://gerrit.cloudera.org:8080/6147
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-03 02:34:10 +00:00
Dan Hecht
bf2e897209 IMPALA-4810: add DECIMAL test case to strict_mode tests
The string parsing code already errors if the decimal column either
overflows or underflows (i.e. loses scale). Let's just add a test
case.

Change-Id: Idd66c0fb5a4d201919d39f73dea08b87339d6469
Reviewed-on: http://gerrit.cloudera.org:8080/6150
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-03 01:43:42 +00:00