855 Commits

Author SHA1 Message Date
Thomas Tauber-Marshall
4b486b0f90 IMPALA-1861: Simplify conditionals with constant conditions
When there are conditionals with constant values of TRUE or
FALSE we can simplify them during analysis using the ExprRewriter.

This patch introduces the SimplifyConditionalsRule which covers IF,
OR, AND, CASE, and DECODE.

It also introduces NormalizeExprsRule which normalizes AND and OR
such that if either child is a BoolLiteral, then the left child is a
BoolLiteral.
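
The two rules above can be sketched as follows. This is a minimal
illustration in Python, not Impala's Java frontend code: the classes
BoolLiteral, SlotRef and Compound and the functions normalize() and
simplify() are hypothetical stand-ins, and only AND and OR are covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoolLiteral:
    value: bool


@dataclass(frozen=True)
class SlotRef:
    name: str


@dataclass(frozen=True)
class Compound:
    op: str          # "AND" or "OR"
    left: object
    right: object


def normalize(e):
    """If exactly the right child is a BoolLiteral, swap it to the left."""
    if (isinstance(e, Compound)
            and isinstance(e.right, BoolLiteral)
            and not isinstance(e.left, BoolLiteral)):
        return Compound(e.op, e.right, e.left)
    return e


def simplify(e):
    """Simplify AND/OR bottom-up once a constant child is on the left."""
    if not isinstance(e, Compound):
        return e
    e = normalize(Compound(e.op, simplify(e.left), simplify(e.right)))
    if isinstance(e.left, BoolLiteral):
        if e.op == "AND":
            # TRUE AND x -> x, FALSE AND x -> FALSE
            return e.right if e.left.value else BoolLiteral(False)
        if e.op == "OR":
            # TRUE OR x -> TRUE, FALSE OR x -> x
            return BoolLiteral(True) if e.left.value else e.right
    return e
```

Normalizing first means the simplification step only has to inspect the
left child.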

Testing:
- Added unit tests to ExprRewriteRulesTest.
- Added functional tests to expr.test
- Ran FE planner tests and BE expr-test.

Change-Id: Id70aaf9fd99f64bd98175b7e2dbba28f350e7d3b
Reviewed-on: http://gerrit.cloudera.org:8080/5585
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
2017-01-24 03:22:08 +00:00
Alex Behm
7438730052 IMPALA-4767: Workaround for HIVE-15653 to preserve table stats.
HIVE-15653 is a Hive Metastore bug that results in ALTER TABLE
commands wiping the table stats of unpartitioned tables.

Until the Hive bug is fixed, this patch adds a workaround
to Impala that forces the Metastore to preserve the table stats.

Testing: Private core/hdfs run passed.

Change-Id: Ic191c765f73624bc716badadd7215c8dca9d6b1f
Reviewed-on: http://gerrit.cloudera.org:8080/5731
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-01-20 01:18:10 +00:00
Tim Armstrong
69859bddfb IMPALA-4549: consistently treat 9999 as upper bound for timestamp year
Previously Impala was inconsistent about whether the year 10000 was
supported, as a result of inconsistency in boost, which reported the
maximum year as 9999 but sometimes allowed 10000. This meant that
Impala sometimes accepted the year 10000 and sometimes not.

Use the patched boost version and update tests accordingly.

Testing:
Ran an exhaustive build.

Change-Id: Iaf23b40833017789d879e5da7bb10384129e2d10
Reviewed-on: http://gerrit.cloudera.org:8080/5665
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-01-19 00:04:27 +00:00
Thomas Tauber-Marshall
89a3d3c1eb IMPALA-4716: Expr rewrite causes IllegalStateException
The DECODE constructor in CaseExpr uses the same decodeExpr object when
building the BinaryPredicates that compare the decodeExpr to each 'when'
of the DECODE. This causes problems when different BinaryPredicates try
to cast the same decodeExpr object to different types during analysis,
in this case leading to a Precondition check failure.

The solution is to clone the decodeExpr in the DECODE constructor in
CaseExpr for each generated BinaryPredicate.
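
A rough sketch of the aliasing problem and the fix, in Python rather
than the actual Java code. The Expr class, its cast() check and
build_predicates() are hypothetical; the RuntimeError only stands in
for the Precondition failure described above.

```python
import copy


class Expr:
    def __init__(self, name):
        self.name = name
        self.cast_to = None

    def cast(self, target_type):
        # Stand-in for the precondition: an expr may be cast to one type only.
        if self.cast_to is not None and self.cast_to != target_type:
            raise RuntimeError(f"{self.name} already cast to {self.cast_to}")
        self.cast_to = target_type


def build_predicates(decode_expr, when_types, clone):
    """Pair the decode expr with each 'when' type, optionally cloning it."""
    preds = []
    for t in when_types:
        lhs = copy.copy(decode_expr) if clone else decode_expr
        lhs.cast(t)
        preds.append((lhs, t))
    return preds
```

With clone=False the second cast hits the shared object and fails; with
clone=True each predicate owns its copy and the original is untouched.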

Testing:
- Added a regression test to exprs.test

Change-Id: I4de9ed7118c8d18ec3f02ff74c9cca211c716e51
Reviewed-on: http://gerrit.cloudera.org:8080/5631
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Impala Public Jenkins
2017-01-13 04:44:07 +00:00
Joe McDonnell
5755261954 IMPALA-4036: invalid SQL generated for partitioned table with comment
For a table that has both a table comment and a partition specified,
"show create table" incorrectly outputs the comment before the partition.
This is not the correct order, and it results in invalid SQL.

This change fixes the ordering (partition comes before comment) and
adds tests for this case.
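
The corrected clause order can be illustrated with a small Python
sketch. The function show_create_table() and its parameters are
hypothetical and do not mirror Impala's actual SQL generation code;
quoting and escaping are deliberately ignored.

```python
def show_create_table(name, columns, partition_columns=(), comment=None):
    """Build a CREATE TABLE statement with PARTITIONED BY before COMMENT."""
    cols = ", ".join(f"{c} {t}" for c, t in columns)
    parts = [f"CREATE TABLE {name} ({cols})"]
    if partition_columns:
        pcols = ", ".join(f"{c} {t}" for c, t in partition_columns)
        parts.append(f"PARTITIONED BY ({pcols})")
    if comment is not None:
        # The comment clause is emitted only after the partition clause.
        parts.append(f"COMMENT '{comment}'")
    return "\n".join(parts)
```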

Change-Id: I29a33cfd142b473997fdc3acfe3f0966bc7ed784
Reviewed-on: http://gerrit.cloudera.org:8080/5648
Tested-by: Impala Public Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2017-01-12 20:41:35 +00:00
Marcel Kornacker
70ae2e38eb IMPALA-4739: ExprRewriter fails on HAVING clauses
The bug was that expr rewrite rules such as ExtractCommonConjunctRule
analyzed their own output, which doesn't work for syntactic elements
that allow column aliases, such as the HAVING clause.
The fix was to remove the analysis step (the re-analysis happens anyway
in AnalysisCtx).

Change-Id: Ife74c61f549f620c42f74928f6474e8a5a7b7f00
Reviewed-on: http://gerrit.cloudera.org:8080/5662
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Impala Public Jenkins
2017-01-12 02:31:44 +00:00
Tim Armstrong
95ed4434f2 IMPALA-3202,IMPALA-2079: rework scratch file I/O
Refactor BufferedBlockMgr/TmpFileMgr to push more I/O logic into
TmpFileMgr, in anticipation of it being shared with BufferPool.
TmpFileMgr now handles:
* Scratch space allocation and recycling
* Read and write I/O

The interface is also greatly changed so that it is built around Write()
and Read() calls, abstracting away the details of temporary file
allocation from clients. This means the TmpFileMgr::File class can
be hidden from clients.

Write error recovery:
Also implement write error recovery in TmpFileMgr.

If an error occurs while writing to scratch and we have multiple
scratch directories, we will try one of the other directories
before cancelling the query. File-level blacklisting is used to
prevent excessive repeated attempts to resize a scratch file during
a single query. Device-level blacklisting is not implemented because
it is problematic to permanently take a scratch directory out of use.
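
The recovery policy described above might look roughly like this. It is
a simplified Python sketch, not the TmpFileMgr code: TmpFile,
ScratchWriteError and write_with_recovery() are hypothetical names, and
the real implementation handles errors asynchronously.

```python
class ScratchWriteError(Exception):
    """Raised when no scratch file can accept the write."""


class TmpFile:
    def __init__(self, directory, fail=False):
        self.directory = directory
        self.fail = fail            # simulate a broken or full device
        self.blacklisted = False
        self.data = []

    def write(self, block):
        if self.fail:
            raise OSError(f"write failed in {self.directory}")
        self.data.append(block)


def write_with_recovery(files, block):
    """Try each non-blacklisted file in turn; give up when none is left."""
    for f in files:
        if f.blacklisted:
            continue
        try:
            f.write(block)
            return f
        except OSError:
            # File-level blacklisting: skip this file for later writes.
            f.blacklisted = True
    raise ScratchWriteError("all scratch files failed; cancel the query")
```

Because the blacklist lives on the per-query file objects, the
directory itself stays available to other queries.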

To reduce the number of error paths, all I/O errors are now handled
asynchronously. Previously errors creating or extending the file were
returned synchronously from WriteUnpinnedBlock(). This required
modifying DiskIoMgr to create the file if not present when opened.

Also set the default max_errors value in the thrift definition file,
so that it is in effect for backend tests.

Future Work:
* Support for recycling variable-length scratch file ranges. I omitted
  this to avoid making the patch even larger.

Testing:
Updated BufferedBlockMgr unit test to reflect changes in behaviour:
* Scratch space is no longer permanently associated with a block, and
  is remapped every time a new block is written to disk.
* Files are now blacklisted - updated existing tests and enabled the
  disabled blacklisting test.

Added some basic testing of recycling of scratch file ranges in
the TmpFileMgr unit test.

I also manually tested the code in two ways. First by removing permissions
for /tmp/impala-scratch and ensuring that a spilling query fails cleanly.
Second, by creating a tiny ramdisk (16M) and running with two scratch
directories: one on /tmp and one on the tiny ramdisk. When spilling, an
out of space error is encountered for the tiny ramdisk and impala spills
the remaining data (72M) to /tmp.

Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17
Reviewed-on: http://gerrit.cloudera.org:8080/5141
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-01-05 02:26:24 +00:00
Amos Bird
b3636c97d4 IMPALA-4033: Treat string-partition key values as case sensitive.
This commit makes ADD PARTITION operations treat string partition-key
values as case sensitive, consistent with other related partition DDL
operations.

Change-Id: I6fbe67d99df8a50a16a18456fde85d03d622c7a1
Reviewed-on: http://gerrit.cloudera.org:8080/5535
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-12-22 10:45:39 +00:00
Lars Volker
ce9b332ee9 IMPALA-4163: Add sortby() query hint
This change introduces the sortby() query plan hint for insert
statements. When specified, sortby(a, b) will add an additional sort
step to the plan to order data by columns a, b before inserting it into
the target table.

Change-Id: I37a3ffab99aaa5d5a4fd1ac674b3e8b394a3c4c0
Reviewed-on: http://gerrit.cloudera.org:8080/5051
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
2016-12-17 05:37:43 +00:00
Lars Volker
1e683d4ee6 IMPALA-4403: Implement SHOW RANGE PARTITIONS for Kudu tables
Change-Id: Idf5b2fdd02938a42fa59ec98884e4ac915dd1f65
Reviewed-on: http://gerrit.cloudera.org:8080/5390
Reviewed-by: Lars Volker <lv@cloudera.com>
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2016-12-10 00:05:50 +00:00
Tim Armstrong
88448d1d4a IMPALA-4586: don't constant fold in backend
This patch ensures that setting the query option
enable_expr_rewrites=false will disable both constant folding in the
frontend (which it did already) and constant caching in the backend
(which is enabled in this patch). This gives a way for users to revert
to the old behaviour of non-deterministic UDFs before these
optimisations were added in Impala 2.8.

Before this patch, the backend would cache values based on IsConstant().
This meant that there was no way to override caching of values of
non-deterministic UDFs, e.g. with enable_expr_rewrites.

After this patch, we only cache literal values in the backend. This
offers the same performance as before in the common case where the
frontend will constant fold the expressions anyway.

Also rename some functions to more cleanly separate the backend concepts
of "constant" expressions and expressions that can be evaluated without
a TupleRow. In a future change (IMPALA-4617) we should remove the
IsConstant() analysis logic from the backend entirely and pass the
information from the frontend. We should also fix isConstant() in the
frontend so that it only returns true when it is safe to constant-fold
the expression (IMPALA-4606). Once that is done, we could revert back
to using IsConstant() instead of IsLiteral().

Testing:
Added targeted test to test constant folding of UDFs: we expect
different results depending on whether constant folding is enabled.

Also run TestUdfs with expr rewrites enabled and disabled, since this
can exercise different code paths. Refactored test_udfs somewhat to
avoid running uninteresting combinations of query options for
targeted tests and removed some 'drop * if not exists' statements
that aren't necessary when using unique_database.

This change revealed flakiness in test_mem_limit, which seems
to have only worked by coincidence. Updated TrackAllocation() to
actually set the query status when a memory limit is exceeded.
Looped this test for a while to make sure it isn't flaky any
more.

Also fix other test bugs where the vector argument is modified
in-place, which can leak out to other tests.

Change-Id: I0c76e3c8a8d92749256c312080ecd7aac5d99ce7
Reviewed-on: http://gerrit.cloudera.org:8080/5391
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2016-12-08 04:53:53 +00:00
Dimitris Tsirogiannis
5ea1798661 IMPALA-4619: Allow NULL as default value in Kudu tables
This commit fixes an issue where an error is thrown if the default value
for a Kudu column is set to NULL.

Change-Id: Ida27ce56f1dd7603485a69c680db3bcea6702aff
Reviewed-on: http://gerrit.cloudera.org:8080/5405
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2016-12-08 04:53:38 +00:00
Taras Bobrovytsky
1083639ff2 IMPALA-4585: Allow the $DATABASE template in the CATCH section
In a recent change (IMPALA-4363) we introduced a change where all file
paths in .test files should be replaced with '__HDFS_FILENAME__'. This
caused problems for tests on non-HDFS file systems and we also lost some
test coverage. This patch fixes the problem by allowing the $DATABASE
template in the catch section of the .test file.

Change-Id: If0f6ae8dea7ac4cdaf0c61ebd8f0c589c353a96e
Reviewed-on: http://gerrit.cloudera.org:8080/5372
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
2016-12-08 02:20:50 +00:00
Michael Ho
9337518137 IMPALA-4595: Ignore discarded functions after linking
For LLVM IR UDF, Impalad will link an external LLVM module
in which the IR UDF is defined with the main module. If it
happens that a symbol is defined in both modules, LLVM may
choose to discard the one defined in the external module.
The discarded function and its callee will not be present
in the linked module.

In IMPALA-4595, udf-sample.cc was compiled without any
optimization. Duplicated definitions such as StringVal::null()
may have different inlining levels between the external module
and the main module. When the duplicated definition in
the external module is discarded, some of its callee
functions (which are not inlined) may not be defined in the
main module, so they can no longer be located in the linked
module. This trips up some code in LlvmCodegen::LinkModule().
In particular, when parsing for functions in the external module
which are materialized during linking, certain functions may
not be present for the reason above. Impalad will hit
a DCHECK in a debug build or crash due to a null pointer access
in a release build.

This change fixes the problem above by taking into account
that certain functions may not be defined anymore after linking.
This change also fixes two incorrect status propagations in
fe-support.cc.

Change-Id: Iaa056a0c888bfcc95b412e1bc1063bb607b58ab7
Reviewed-on: http://gerrit.cloudera.org:8080/5384
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
2016-12-07 22:52:35 +00:00
Bharath Vissapragada
bb63339377 IMPALA-3314: Fix Avro schema loading for partitioned tables.
Bug: Commit 6f31c7 fixed a crash when setting Avro schemas for
tables with storage altered to Avro file format. However the
fix was incomplete for partitioned/multi file format tables since
'hasAvroData_' is not set for all code paths that load the
partitioned tables (For example: HdfsTable#loadAllPartitions()).

Fix: Moved the code for setting 'hasAvroData_' to addPartition()
which is the common logic for all code paths adding new partitions.
Also fixed the test coverage gap by adding a new test for partitioned
tables altered to Avro format.

Change-Id: I7854ff002b2277ec4a5388216218a1d5ad142de8
Reviewed-on: http://gerrit.cloudera.org:8080/5388
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-12-07 09:45:11 +00:00
Dan Burkert
f83652c1da Replace INTO N BUCKETS with PARTITIONS N in CREATE TABLE
This commit also removes the `DISTRIBUTE`, `SPLIT`, and `BUCKETS`
keywords, which were going to be newly released in Impala 2.6 but
are now unused. Additionally, a few remaining uses of the
`DISTRIBUTE BY` syntax have been switched to `PARTITION BY`.

Change-Id: I32fdd5ef26c532f7a30220db52bdfbf228165922
Reviewed-on: http://gerrit.cloudera.org:8080/5382
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2016-12-07 07:31:16 +00:00
Matthew Jacobs
5188f879a7 IMPALA-4477: Bump Kudu version to latest master (60aa54e)
Bumps the toolchain version to get a newer Kudu build.

Also fixes test failures resulting from changes in Kudu.
Notably error strings have changed (IMPALA-4590) and the
number of replicas must be odd (IMPALA-4589).

Note: The toolchain binaries starting with this build are
now using the toolchain binutils rather than the system
binutils.

Testing: private exhaustive build.

Change-Id: If1912f058c240fbe82b06f77e31add7755289be1
Reviewed-on: http://gerrit.cloudera.org:8080/5369
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2016-12-07 05:11:13 +00:00
Alex Behm
534999382d IMPALA-4574: Do not treat UUID() like a constant expr.
A recent change (IMPALA-1788) led UUID() to be constant folded,
and therefore, produce the same value for every invocation across
rows. Similar issues might also occur due to the BE optimizing
UUID() during codegen of scalar-fn-call.h/cc.

The fix is to not treat UUID() like a constant expr in both
the FE and BE.
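
The effect of excluding UUID() from constant folding can be shown with
a toy evaluator. This Python sketch is purely illustrative: the tuple
expression encoding, the NON_DETERMINISTIC set and the helper functions
are assumptions and bear no relation to Impala's Expr classes.

```python
import uuid

NON_DETERMINISTIC = {"uuid"}
FUNCTIONS = {
    "uuid": lambda: str(uuid.uuid4()),
    "upper": lambda s: s.upper(),
}


def is_constant(expr):
    """An expr is a literal (str) or a tuple (fn_name, *args)."""
    if not isinstance(expr, tuple):
        return True
    fn, *args = expr
    return fn not in NON_DETERMINISTIC and all(is_constant(a) for a in args)


def evaluate(expr):
    if not isinstance(expr, tuple):
        return expr
    fn, *args = expr
    return FUNCTIONS[fn](*(evaluate(a) for a in args))


def fold(expr):
    """Replace an expr by its value only when it is safe to do so."""
    return evaluate(expr) if is_constant(expr) else expr


def run(expr, num_rows):
    """Fold once up front, then evaluate the result for every row."""
    folded = fold(expr)
    return [evaluate(folded) for _ in range(num_rows)]
```

Here run(("uuid",), 3) yields three distinct values because the call is
never folded, while a deterministic expression is evaluated only once.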

Discussion:
The fix in this patch is rather blunt, but minimally invasive to
reduce the risk of adding new bugs. Ideally, the constness of an
Expr should be determined in one place and the FE and BE should agree
on which Exprs are constant. I considered the following alternatives
but concluded they were too risky:
1. Pass a flag from FE to BE for every Expr indicating its constness.
   This simple solution would populate a thrift field with the
   result of Expr.isConstant() for every Expr in an Expr tree.
   There are several issues. Calling isConstant() for every Expr
   in an Expr tree is rather expensive due to repeated traversals
   of the tree. That could be mitigated by populating an isConstant
   flag during Expr.analyze() to avoid re-computing the constness
   repeatedly. This requires changes to analyze(), clone(), reset(),
   and possibly other places for many Exprs. There is potential
   for missing a place and adding a new bug.
2. The above solution could be limited to only FunctionCallExpr.
   However, the BE expr type FUNCTION_CALL which maps to
   scalar-fn-call.h/cc is created from various FE Exprs, not just
   FunctionCallExpr. So adding a flag only to scalar-fn-call.h/cc
   would be confusing because it would only sometimes be set
   in a meaningful way. This seems more confusing than the current
   straightforward solution.

Testing: Added FE and EE tests.

Change-Id: If2499f5f6ecdcb098623202c8e6dc2d02727194a
Reviewed-on: http://gerrit.cloudera.org:8080/5324
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-12-06 22:03:01 +00:00
Dimitris Tsirogiannis
cba93f1ac3 IMPALA-4561: Replace DISTRIBUTE BY with PARTITION BY in CREATE TABLE
Change-Id: I0e07c41eabb4c8cb95754cf04293cbd9e03d6ab2
Reviewed-on: http://gerrit.cloudera.org:8080/5317
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2016-12-06 10:41:53 +00:00
Dimitris Tsirogiannis
5bb9959fa4 IMPALA-4584: Make alter table operations on Kudu tables synchronous
This commit changes the behavior of alter table operations on Kudu
tables from asynchronous to synchronous. With this change, alter table
operations return when either the operations complete successfully or
a timeout is reached.
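
One way to picture the synchronous behaviour is a poll-until-done loop
with a deadline. The Python sketch below is an assumption about the
general shape only: wait_for_alter() and its parameters are
hypothetical and are not the Kudu client API.

```python
import time


def wait_for_alter(is_done, timeout_s, poll_interval_s=0.01,
                   clock=time.monotonic, sleep=time.sleep):
    """Return True once is_done() holds, or False when the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if is_done():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

The caller can then turn a False result into an error for the ALTER
TABLE statement instead of returning before the change is visible.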

Change-Id: I385bce66691ae9040e72f97557e1bba31009e36b
Reviewed-on: http://gerrit.cloudera.org:8080/5364
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2016-12-06 03:53:15 +00:00
Dimitris Tsirogiannis
867b2434ca Additional functional testing for default values on Kudu tables
This commit also fixes an issue where an error is thrown if a default
value is set for a boolean column on a Kudu table.

Change-Id: I25b66275d29d1cf21df14e78ab58f625a83b0725
Reviewed-on: http://gerrit.cloudera.org:8080/5337
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
2016-12-06 00:04:43 +00:00
Tim Armstrong
1495b2007d IMPALA-4498: crash in to_utc_timestamp/from_utc_timestamp
The bug was that the functions did not check whether the conversion
pushed the value out of range. The fix is to use boost's validation
immediately to check the validity of the timestamp and catch any
exceptions thrown.

It would be preferable to avoid the exceptions, but Boost does not
provide a straightforward way to disable the exceptions or extract
potentially-invalid values from a date object.
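
The same validate-and-catch pattern can be sketched with Python's
datetime, which also signals out-of-range results with an exception.
Note the assumptions: shift_timestamp() is a hypothetical helper,
Python's valid range is years 1 to 9999 rather than Impala's, and
returning None (as a stand-in for NULL) is an illustrative choice.

```python
from datetime import datetime, timedelta


def shift_timestamp(ts, offset_hours):
    """Shift ts by a UTC offset; return None if the result is out of range."""
    try:
        return ts + timedelta(hours=offset_hours)
    except OverflowError:
        # The library rejects the value, so catch instead of crashing.
        return None
```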

Testing:
Added expression tests that exercise out-of-range cases. Also
added additional tests to confirm that date addition and subtraction
weren't affected by similar bugs.

Change-Id: Idc427b06ac33ec874a05cb98d01c00e970d3dde6
Reviewed-on: http://gerrit.cloudera.org:8080/5251
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2016-12-05 23:33:44 +00:00
Dimitris Tsirogiannis
1da57019ad IMPALA-4579: SHOW CREATE VIEW fails for view containing a subquery
This commit fixes an issue where a SHOW CREATE VIEW statement throws an
analysis error if the view contains a subquery.

Change-Id: I4a89e46a022f0ccec198b6e3e2b30230103831ce
Reviewed-on: http://gerrit.cloudera.org:8080/5333
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2016-12-04 08:35:15 +00:00
Taras Bobrovytsky
858f5c2197 IMPALA-4363: Add Parquet timestamp validation
Before this patch, we would simply read the INT96 Parquet timestamp
representation and assume that it's valid. However, not all bit
permutations represent a valid timestamp. One of the boost functions
raised an exception (that we didn't catch) when passed an invalid
boost date object, which resulted in a crash. This patch fixes the
problem by validating that the date falls into the 1400..9999 year
range as we are scanning Parquet.

Change-Id: Ieaab5d33e6f0df831d0e67e1d318e5416ffb90ac
Reviewed-on: http://gerrit.cloudera.org:8080/5343
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2016-12-03 06:41:07 +00:00
Dimitris Tsirogiannis
da34ce9780 IMPALA-4527: Columns in Kudu tables created from Impala default to "NULL"
This commit reverts the behavior introduced by IMPALA-3719 which used
the Kudu default behavior for column nullability if none was specified
in the CREATE TABLE statement. With this commit, non-key columns of Kudu
tables that are created from Impala are by default nullable unless
specified otherwise.

Change-Id: I950d9a9c64e3851e11a641573617790b340ece94
Reviewed-on: http://gerrit.cloudera.org:8080/5259
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-12-02 02:06:22 +00:00
Tim Armstrong
b374061206 IMPALA-4564,IMPALA-4565: mt_dop fixes for old aggs and joins
Fix a test bug where we need to skip nested types tests for the old aggs
and joins.

Fix a product bug where *eos is not initialised by the MT scan node.
This causes incorrect results when the calling ExecNode does not
initialise the eos variable, e.g. the sort node and the old agg and join
nodes.

Testing:
Added a test that reproduces the incorrect results with the sort node
when run under ASAN.

Tested the mt_dop tests locally with old aggs and joins to ensure they
pass.

Change-Id: I48c50c8aa0c23710eb099fba252bc3c0cb74b313
Reviewed-on: http://gerrit.cloudera.org:8080/5302
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2016-12-02 01:46:55 +00:00
Alex Behm
c97bffcce1 IMPALA-4458: Fix resource cleanup of cancelled mt scan nodes.
The bug was that HdfsScanNodeMt::Close() did not properly
clean up all in-flight resources when called through the
query cancellation path.

The main change is to clean up all resources when passing
a NULL batch into HdfsParquetScanner::Close() which also
needed similar changes in the scanner context.

Testing: Ran test_cancellation.py, test_scanners.py and
test_nested_types.py with MT_DOP=3. Added a test query
with a limit that was failing before.
A regular private hdfs/core test run succeeded.

Change-Id: Ib32f87b3289ed9e8fc2db0885675845e11207438
Reviewed-on: http://gerrit.cloudera.org:8080/5274
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-12-01 11:04:48 +00:00
Tim Armstrong
290db20dbc IMPALA-4554: fix projection of nested collections with mt_dop > 0
Change-Id: I42e72eae8dfa78f7d53708eb8f2f0da8b780692d
Reviewed-on: http://gerrit.cloudera.org:8080/5270
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2016-11-30 17:18:36 +00:00
Lars Volker
0f62bf35fd IMPALA-4550: Fix CastExpr analysis for substituted slots
During slot substitution, the type of the child of a CastExpr can
change. If the previous child type matched the CastExpr, then the cast
was flagged as noOp_. During substitution and subsequent re-analysis
the noOp_ flag was not revisited so that no cast was performed, even
after it had become necessary.

The fix is to always set noOp_ to the correct value in
CastExpr.analyze().
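
The stale-flag problem can be reproduced in a few lines. This Python
sketch uses hypothetical SlotRef and CastExpr classes that only mimic
the relevant state; it is not the frontend's Java code.

```python
class SlotRef:
    def __init__(self, type_):
        self.type = type_


class CastExpr:
    def __init__(self, child, target_type):
        self.child = child
        self.target_type = target_type
        self.no_op = False

    def analyze(self):
        # Recompute on every analysis instead of only ever setting it to True.
        self.no_op = (self.child.type == self.target_type)

    def substitute(self, new_child):
        # Substitution may change the child's type.
        self.child = new_child
```

After substitute() swaps in a child of a different type, the next
analyze() call resets no_op to False, so the cast is performed again.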

Change-Id: I7f29cdc359558fad6df455b8eec0e0eaed00e996
Reviewed-on: http://gerrit.cloudera.org:8080/5267
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-11-30 11:19:36 +00:00
Dimitris Tsirogiannis
9f497ba02f IMPALA-2890: Support ALTER TABLE statements for Kudu tables
With this commit, we add support for additional ALTER TABLE statements
against Kudu tables. The new supported ALTER TABLE operations for Kudu are:
- ADD/DROP range partitions. Syntax:
    ALTER TABLE <tbl_name> ADD [IF NOT EXISTS] RANGE <kudu_partition_spec>
    ALTER TABLE <tbl_name> DROP [IF EXISTS] RANGE <kudu_partition_spec>
- ADD/DROP/RENAME column. Syntax:
    ALTER TABLE <tbl_name> ADD COLUMNS (col_spec, [col_spec, ...])
    ALTER TABLE <tbl_name> DROP COLUMN <col_name>
    ALTER TABLE <tbl_name> CHANGE COLUMN <old> <new_name> <type>
- Rename Kudu table using the 'kudu.table_name' table property. Example:
  ALTER TABLE <tbl_name> SET TBLPROPERTIES ('kudu.table_name'='<new_name>'),
  will change the underlying Kudu table name to <new_name>.
- Renaming the HMS/Catalog table entry of a Kudu table is supported using the
  existing ALTER TABLE <tbl_name> RENAME TO <new_tbl_name> syntax.

Not supported:
- ALTER TABLE <tbl_name> REPLACE COLUMNS

Change-Id: I04bc87e04e05da5cc03edec79d13cedfd2012896
Reviewed-on: http://gerrit.cloudera.org:8080/5136
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2016-11-30 04:55:03 +00:00
Tim Armstrong
16552f6eda IMPALA-4525: fix crash when codegen mem limit exceeded
The error path in OptimizeLlvmModule() has not worked correctly for a
long time because various places in the code assume that codegen'd
function pointers will be filled in (e.g. ScalarFnCall). Since the
recent change "IMPALA-4397,IMPALA-3259: reduce codegen time and memory"
it is more likely to go down this path.

The cases when errors occur on this path: memory limit exceeded, internal
codegen bugs, and corrupt IR UDFs, are all cases when it is not correct
or safe to continue executing the query, so we should just fail the
query.

Testing:
Add a test where codegen reliably fails with memory limit exceeded.

Change-Id: Ib38d0a44b54c47617cad1b971244f477d344d505
Reviewed-on: http://gerrit.cloudera.org:8080/5211
Reviewed-by: Michael Ho <kwho@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-11-24 08:03:39 +00:00
Michael Ho
b82eed5ee0 IMPALA-4518: CopyStringVal() doesn't copy null string
Previously, CopyStringVal() mistakenly copied a null
StringVal as an empty string (i.e. a non-null string
with zero length). This change fixes the problem by
distinguishing between these two cases in CopyStringVal()
and handles them properly. Also added a test case for it.
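
The distinction between the two cases, as a minimal Python analogy in
which None plays the role of a null StringVal and b"" the role of an
empty one. copy_string_val() is a hypothetical stand-in, not the UDF
helper itself.

```python
def copy_string_val(src):
    """Copy a byte string, keeping null (None) distinct from empty (b"")."""
    if src is None:
        return None               # null stays null
    return bytes(bytearray(src))  # empty or non-empty: copy the bytes
```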

This problem only started showing up recently due to
commit 51268c053f which
calls CopyStringVal() in OffsetFnInit(). All other
pre-existing callers of CopyStringVal() before that
commit checked if 'src' is null before calling it, so
the problem never showed up. In that sense, this is
a latent bug exposed by the aforementioned commit.

Change-Id: I3a5b9349dd08556eba5cfedc8c0063cc59f5be03
Reviewed-on: http://gerrit.cloudera.org:8080/5198
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
2016-11-24 00:23:58 +00:00
Alex Behm
bbf5255d0e IMPALA-1788: Fold constant expressions.
Adds a new ExprRewriteRule for replacing constant expressions
with their literal equivalent via BE evaluation. Applies the
new rule together with the existing ones on the parse tree,
after analysis.

Limitations
- Constant folding is applied on the unresolved expressions.
  As a result, it only works for expressions that are constant
  within a single query block, as opposed to expressions that
  may become constant after fully substituting inline-view exprs.
- Exprs are not normalized, so some opportunities for constant
  folding are missed for certain expr-tree shapes.

This patch includes the following interesting changes:
- Introduces a timestamp literal that can only be produced
  by constant folding (not expressible directly via SQL).
- To make sure that rewrites have no user-visible effect,
  the original result types and column labels of the top-level
  statement are restored after the rewrites are performed.
- Does not fold exprs if their evaluation resulted in a
  warning or error, or if the resulting value is not
  representable by the corresponding FE LiteralExpr.
- Fixes an existing issue with converting strings between
  the FE/BE. Strings produced in the BE that have characters
  with a value > 127 are not correctly deserialized into a
  Java String via thrift. We detect this case during constant
  folding and abandon folding of such exprs.
- Fixes several issues with detecting/reporting errors in
  NativeEvalConstExprs().
- Cleans up ExprContext::GetValue() into
  ExprContext::GetConstantValue() which clarifies its only use
  of evaluating exprs from the FE.

Testing:
- Modifies expr-test.cc to run all tests through the constant
  folding path.
- Adds basic planner and rewrite rule tests.
- Exhaustive test run passed

Change-Id: If672b703db1ba0bfc26e5b9130161798b40a69e9
Reviewed-on: http://gerrit.cloudera.org:8080/5109
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-11-23 21:11:30 +00:00
Tim Armstrong
4db330e69a IMPALA-4397,IMPALA-3259: reduce codegen time and memory
A handful of fixes to codegen memory usage:
* Delete the IR module when we're done with it (it can be fairly large)
* Track the compiled code size (typically not that large, but it can add
  up if there are many fragments).
* Estimate optimisation memory requirements and track it in the memory
  tracker. This is very crude but much better than not tracking it.

A handful of fixes to improve codegen time/cost, particularly targeted
at compute stats workloads:
* Avoid over-inlining when there are many aggregate functions,
  conjuncts, etc by adding "NoInline" attributes.
* Don't codegen non-grouping merge aggregations. They will only process
  one row per Impala daemon, so codegen is not worth it.
* Make the Hll algorithm more efficient by specialising the hash function
  based on decimal width.

Limitations:
* This doesn't tackle over-inlining of large expr trees, but a similar
  approach will be used there in a follow-on patch.

Perf:
Compute stats on functional_parquet.widetable_1000_cols goes from 1min+
of codegen to ~ 5s codegen on my machine. Local perf runs of tpc-h
and targeted perf showed no regressions and some moderate improvements
(1-2%).

Also did an experiment to understand the perf consequences of disabling
inlining. I manually set CODEGEN_INLINE_EXPRS_THRESHOLD to 0, and ran:

  drop stats tpch_20_parquet.lineitem;
  compute stats tpch_20_parquet.lineitem;

There was no difference in time spent in the agg node: 30.7s with
inlining, 30.5s without.

Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Reviewed-on: http://gerrit.cloudera.org:8080/4956
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
2016-11-23 08:18:17 +00:00
Alex Behm
8f2bb2f72f IMPALA-3809: Show Kudu-specific column metadata in DESCRIBE.
TODO:
- Corresponding changes to DESCRIBE EXTENDED/FORMATTED.

Testing:
A private core/hdfs run passed.

Change-Id: I83c91b540bc6d27cb4f21535fe12f3f8658c233e
Reviewed-on: http://gerrit.cloudera.org:8080/5125
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-11-22 23:06:05 +00:00
Lars Volker
8ea21d099f IMPALA-2523: Make HdfsTableSink aware of clustered input
IMPALA-2521 introduced clustering for insert statements. This change
makes the HdfsTableSink aware of clustered inputs, so that partitions
are opened, written, and closed one by one.
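
When rows arrive clustered by partition key, the sink only ever needs
one open partition. The Python sketch below shows the idea with
itertools.groupby; write_clustered() and the event tuples are
hypothetical and do not mirror the HdfsTableSink interface.

```python
from itertools import groupby


def write_clustered(rows, partition_key):
    """Open, write and close each partition in turn for clustered input."""
    events = []
    for key, group in groupby(rows, key=partition_key):
        events.append(("open", key))
        for row in group:
            events.append(("write", row))
        # All rows of this partition have been seen, so it can be closed.
        events.append(("close", key))
    return events
```

This relies on the input being clustered: groupby only merges adjacent
rows, so unclustered input would reopen a partition more than once.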

This change also adds/modifies tests in several ways:

- clustered insert tests switch from selecting all rows from
  alltypessmall to alltypes. Together with varying settings for
  batch_size, this results in a larger number of row batches being
  written.
- clustered insert tests select from alltypes instead of
  functional.alltypes to make sure we also select from various input
  formats.
- clustered insert tests have been added to select from alltypestiny to
  create inserts with 1 and 2 rows per partition respectively.
- exhaustive insert tests now use different values for batch_size: 1,
  16, 0 (meaning default, 1024). This is limited to uncompressed parquet
  files, to maintain a reasonable runtime. On my machine execution of
  test.insert took 1778 seconds, compared to 1002 seconds with just the
  default row batch size.
- There is additional testing in test_insert_behaviour.py to make sure
  that insertion over several row batches only creates one file per
  partition.
- It renames the test_insert method to make it unique in the file and
  allow for effective filtering with -k.
- It adds tests to the Analyzer test suite.

Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Reviewed-on: http://gerrit.cloudera.org:8080/4863
Reviewed-by: Lars Volker <lv@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-11-22 02:51:20 +00:00
Sailesh Mukil
178fd59142 IMPALA-4502: test_partition_ddl_predicates breaks on non-HDFS filesystems
This is because that test uses 'set cached' and 'set uncached' which
are not supported on non-HDFS filesystems. This patch creates a
separate test file for non-HDFS filesystems with only supported
queries and invokes the right file based on the filesystem.

Change-Id: I8606aa427cb6e50be3395cdde246abb53db5172c
Reviewed-on: http://gerrit.cloudera.org:8080/5164
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2016-11-22 00:42:57 +00:00
Dimitris Tsirogiannis
3db5ced4ce IMPALA-3726: Add support for Kudu-specific column options
This commit adds support for Kudu-specific column options in CREATE
TABLE statements. The syntax is:
CREATE TABLE tbl_name ([col_name type [PRIMARY KEY] [option [...]]] [, ....])
where option is:
| NULL
| NOT NULL
| ENCODING encoding_val
| COMPRESSION compression_algorithm
| DEFAULT expr
| BLOCK_SIZE num

The output of the SHOW CREATE TABLE statement was altered to include all the specified
column options for Kudu tables.

Change-Id: I727b9ae1b7b2387db752b58081398dd3f3449c02
Reviewed-on: http://gerrit.cloudera.org:8080/5026
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2016-11-18 11:41:01 +00:00
Taras Bobrovytsky
eb8120d218 IMPALA-3812: Fix error message for unsupported types
Before this patch an unclear error message was returned if DATE or
DATETIME appeared in the select list after a star expansion. This was
because the DATE and DATETIME PrimitiveTypes were serialized as
INVALID_TYPE. The fix is to serialize them correctly.

Change-Id: I9019b4bfd219f94e554c795befd3ff5e39706ea9
Reviewed-on: http://gerrit.cloudera.org:8080/4859
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-11-17 05:31:34 +00:00
Thomas Tauber-Marshall
3833707dbd IMPALA-4466: Improve Kudu CRUD test coverage
The results in the test files were verified by hand.

This patch also introduces a new test section 'DML_RESULTS', which
takes the name of a table as a comment and the contents of the
table as its body and then verifies that the body matches the
actual contents of the table. This makes it easy to check that a
DML operation has the desired effect on the contents of a table,
rather than always having to add another test case that runs a
select on the table. For now, this section cannot be used in a
test along with the RESULTS or ERRORS sections.

TODO: Refactor the DML test case handling (IMPALA-4471)

Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6
Reviewed-on: http://gerrit.cloudera.org:8080/4953
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2016-11-17 02:54:30 +00:00
Thomas Tauber-Marshall
e6e2baea33 IMPALA-4372: 'Describe formatted' returns types in upper case
A recent change caused 'describe formatted' to display the types
in all upper case, but we want 'describe formatted' to match Hive's
'describe' output, which displays the types in lower case.

This patch also fixes several problems with test_describe_formatted,
which was encountering an error but reporting success.

Change-Id: I274b97d4d1247244247fb38a5ca7f4c10bba8d22
Reviewed-on: http://gerrit.cloudera.org:8080/4861
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2016-11-15 05:38:12 +00:00
Amos Bird
628685ae74 IMPALA-1654: General partition exprs in DDL operations.
This commit handles partition related DDL in a more general way. We can
now use compound predicates to specify a list of partitions in
statements like ALTER TABLE DROP PARTITION and COMPUTE INCREMENTAL
STATS, etc. It will also make sure some statements only accept one
partition at a time, such as PARTITION SET LOCATION and LOAD DATA. ALTER
TABLE ADD PARTITION remains using the old PartitionKeyValue's logic.

The changed partition related DDLs are as follows,

Table: p (i int) partitioned by (j int, k string)
Partitions:
+-------+---+-------+--------+------+--------------+-------------------+
| j     | k | #Rows | #Files | Size | Bytes Cached | Cache Replication |
+-------+---+-------+--------+------+--------------+-------------------+
| 1     | a | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        |
| 1     | b | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        |
| 1     | c | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        |
| 2     | d | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        |
| 2     | e | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        |
| 2     | f | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        |
| Total |   | -1    | 0      | 0B   | 0B           |                   |
+-------+---+-------+--------+------+--------------+-------------------+

1. show files in p partition (j<2, k='a');
2. alter table p partition (j<2, k in ("b","c")) set cached in 'testPool';

// j can appear more than once,
3.1. alter table p partition (j<2, j>0, k<>"d") set uncached;
// it is the same as
3.2. alter table p partition (j<2 and j>0, not k="e") set uncached;
// we can also do 'or'
3.3. alter table p partition (j<2 or j>0, k like "%") set uncached;

// missing 'k' matches all values of k
4. alter table p partition (j<2) set fileformat textfile;
5. alter table p partition (k rlike ".*") set serdeproperties ("k"="v");
6. alter table p partition (j is not null) set tblproperties ("k"="v");
7. alter table p drop partition (j<2);
8. compute incremental stats p partition(j<2);

The remaining old partition related DDLs are as follows,

1. load data inpath '/path/from' into table p partition (j=2, k="d");
2. alter table p add partition (j=2, k="g");
3. alter table p partition (j=2, k="g") set location '/path/to';
4. insert into p partition (j=2, k="g") values (1), (2), (3);

General partition expressions and partially specified partition specs
allow a partition predicate to match an empty set of partitions,
regardless of whether 'IF EXISTS' is specified.

Examples:

[localhost.localdomain:21000] >
alter table p drop partition (j=2, k="f");
Query: alter table p drop partition (j=2, k="f")
+-------------------------+
| summary                 |
+-------------------------+
| Dropped 1 partition(s). |
+-------------------------+
Fetched 1 row(s) in 0.78s
[localhost.localdomain:21000] >
alter table p drop partition (j=2, k<"f");
Query: alter table p drop partition (j=2, k<"f")
+-------------------------+
| summary                 |
+-------------------------+
| Dropped 2 partition(s). |
+-------------------------+
Fetched 1 row(s) in 0.41s
[localhost.localdomain:21000] >
alter table p drop partition (k="a");
Query: alter table p drop partition (k="a")
+-------------------------+
| summary                 |
+-------------------------+
| Dropped 1 partition(s). |
+-------------------------+
Fetched 1 row(s) in 0.25s
[localhost.localdomain:21000] > show partitions p;
Query: show partitions p
+-------+---+-------+--------+------+--------------+-------------------+
| j     | k | #Rows | #Files | Size | Bytes Cached | Cache Replication |
+-------+---+-------+--------+------+--------------+-------------------+
| 1     | b | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        |
| 1     | c | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        |
| Total |   | -1    | 0      | 0B   | 0B           |                   |
+-------+---+-------+--------+------+--------------+-------------------+
Fetched 3 row(s) in 0.01s
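
The matching semantics in the session above can be modelled with a short
Python sketch. This is only a model of the behaviour shown, not Impala's
implementation; drop_partitions is a hypothetical helper. It assumes that
comma-separated predicates are ANDed together and that a partition column
with no predicate matches every value.

```python
# Partitions of the example table p, as (j, k) pairs.
partitions = [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (2, "e"), (2, "f")]


def drop_partitions(parts, *predicates):
    """Drop every partition for which all predicates hold.

    Returns (remaining partitions, number dropped).
    """
    kept = [p for p in parts if not all(pred(*p) for pred in predicates)]
    return kept, len(parts) - len(kept)


# alter table p drop partition (j=2, k="f");
partitions, n1 = drop_partitions(
    partitions, lambda j, k: j == 2, lambda j, k: k == "f")
# alter table p drop partition (j=2, k<"f");
partitions, n2 = drop_partitions(
    partitions, lambda j, k: j == 2, lambda j, k: k < "f")
# alter table p drop partition (k="a");  no predicate on j, so any j matches
partitions, n3 = drop_partitions(partitions, lambda j, k: k == "a")

print(n1, n2, n3, partitions)
```

The three counts correspond to the "Dropped N partition(s)" summaries
above, and the remaining list corresponds to the final SHOW PARTITIONS
output.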

Change-Id: I2c9162fcf9d227b8daf4c2e761d57bab4e26408f
Reviewed-on: http://gerrit.cloudera.org:8080/3942
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-11-15 03:27:36 +00:00
Matthew Jacobs
cfac09de10 IMPALA-3710: Kudu DML should ignore conflicts, pt2
Second part of IMPALA-3710, which removed the IGNORE DML
option and changed the following errors on Kudu DML
operations to be ignored:
1) INSERT where the PK already exists
2) UPDATE/DELETE where the PK doesn't exist

This changes other data-related errors to be ignored as
well:
3) NULLs in non-nullable columns, i.e. null constraint
  violations.
4) Rows with PKs that are in an 'uncovered range'.

It became clear that we can't differentiate between (3) and
(4) because both return a Kudu 'NotFound' error code. The
Impala error codes have been simplified as well: we just
report a generic KUDU_NOT_FOUND error in these cases.

This also adds some metadata to the thrift report sent to
the coordinator from sinks so the total number of rows with
errors can be added to the profile. Note that this does not
include a breakdown of error counts by type/code because we
cannot differentiate between all of these cases yet.

An upcoming change will add this new info to the beeswax
interface and show it in the shell output (IMPALA-3713).

Testing: Updated kudu_crud tests to check the number of rows
with errors.

Change-Id: I4eb1ad91dc355ea51de261c3a14df0f9d28c879c
Reviewed-on: http://gerrit.cloudera.org:8080/4985
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-11-09 06:43:41 +00:00
Tim Armstrong
d7246d64c7 IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
This change enables codegen for all builtin aggregate functions,
e.g. timestamp functions and group_concat.

There are several parts to the change:
* Adding support for generic UDAs. Previously the codegen code did not
  handle multiple input arguments or NULL return values.
* Defaulting to using the UDA interface when there is not a special
  codegen path (we have implementations of all builtin aggregate
  functions for the interpreted path).
* Remove all the logic to disable codegen for the special cases that now
  are supported.

Also fix the generation of code to get/set NULL bits since I needed
to add functionality there anyway.

Testing:
Add tests that check that codegen was enabled for builtin aggregate
functions. Also fix some gaps in the preexisting tests.

Also add tests for UDAs that check input/output nulls are handled
correctly, in anticipation of enabling codegen for arbitrary UDAs.
The tests are run with both codegen enabled and disabled. To avoid
flaky tests, we switch the UDF tests to use "unique_database".

Perf:
Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1,
since my original approach regressed it ~5%. In the end the problem was
to do with the ordering of loads/stores to the slot and null bit in the
generated code: the previous version of the code exploited some
properties of the particular aggregate function. I ended up replicating
this behaviour to avoid regressing perf.

Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260
Reviewed-on: http://gerrit.cloudera.org:8080/4655
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-11-09 03:27:12 +00:00
Matthew Jacobs
08d89a5cc3 IMPALA-3710: Kudu DML should ignore conflicts by default
Removes the non-standard IGNORE syntax that was allowed for
DML into Kudu tables to indicate that certain errors should
be ignored, i.e. not fail the query and continue. However,
because there is no way to 'roll back' mutations that
occurred before an error occurs, tables are left in an
inconsistent state and it's difficult to know what rows were
successfully modified vs which rows were not. Instead, this
change makes it so that we always 'ignore' these conflicts,
i.e. a 'best effort'. In the future, when Kudu provides
the mechanisms Impala needs to support a notion of isolation
levels, Impala will be able to offer options for more
traditional semantics.

After this change, the following errors are ignored:
* INSERT where the PK already exists
* UPDATE/DELETE where the PK doesn't exist

Another follow-up patch will change other violations to be
handled in this way as well, e.g. nulls inserted in
non-nullable cols.

Reporting:
The number of rows inserted is reported to the coordinator,
which makes the aggregate available to the shell and via the
profile.
TODO: Return rows modified for INSERT via HS2 (IMPALA-1789).
TODO: Return rows modified for other CRUD (beeswax+hs2) (IMPALA-3713).
TODO: Return error counts for specific warnings (IMPALA-4416).
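
A rough Python model of the 'best effort' semantics described above, for
illustration only. The table is a plain dict keyed by primary key, and the
helper names insert_ignoring_conflicts and update_ignoring_missing are
hypothetical; nothing here reflects the real Kudu client or sink code.

```python
def insert_ignoring_conflicts(table, rows):
    """Insert (pk, value) rows; skip rows whose PK already exists.

    Returns (rows_inserted, rows_with_errors).
    """
    inserted = errors = 0
    for pk, value in rows:
        if pk in table:
            errors += 1        # conflict: ignored, statement continues
            continue
        table[pk] = value
        inserted += 1
    return inserted, errors


def update_ignoring_missing(table, rows):
    """Update (pk, value) rows; skip rows whose PK does not exist.

    Returns (rows_updated, rows_with_errors).
    """
    updated = errors = 0
    for pk, value in rows:
        if pk not in table:
            errors += 1        # missing PK: ignored, statement continues
            continue
        table[pk] = value
        updated += 1
    return updated, errors


table = {1: "a"}
ins = insert_ignoring_conflicts(table, [(1, "x"), (2, "b"), (3, "c")])
upd = update_ignoring_missing(table, [(2, "B"), (9, "z")])
print(ins, upd, table)
```

The existing row with PK 1 is left untouched by the conflicting INSERT,
and both statements run to completion while counting the skipped rows.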

Testing:
Updated tests. Ran all functional tests. More tests will be
needed when other conflicts are handled in the same way.

Change-Id: I83b5beaa982d006da4997a2af061ef7c22cad3f1
Reviewed-on: http://gerrit.cloudera.org:8080/4911
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-11-08 20:34:00 +00:00
Tim Armstrong
381e719065 IMPALA-4266: Java udf returning string can give incorrect results
The memory management of string results was wrong: strings returned from
Exprs must live until the next time FreeLocalAllocations() is called.
Otherwise the buffer holding the string is freed or reused by the next
UDF call. The fix is to copy string values into a buffer with the
right lifetime.

Testing:
Added a regression test based on Bharath's example that reproduced the
bug reliably.

Change-Id: I705d271814cb1143f67d8a12f4fd87bab7a8e161
Reviewed-on: http://gerrit.cloudera.org:8080/4941
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-11-08 02:47:11 +00:00
Tim Armstrong
10fa472fa6 IMPALA-4302,IMPALA-2379: constant expr arg fixes
This patch fixes two issues around handling of constant expr args.
The patches are combined because they touch some of the same code
and depend on some of the same memory management cleanup.

First, it fixes IMPALA-2379, where constant expr args were not visible
to UDAFs. The issue is that the input exprs need to be opened before
calling the UDAF Init() function.

Second, it avoids overhead from repeated evaluation of constant
arguments for ScalarFnCall expressions on both the codegen'd and
interpreted paths. A common example is an IN predicate with a
long list of constant values.

The interpreted path was inefficient because it always evaluated all
children expressions. Instead in this patch constant args are
evaluated once and cached. The memory management of the AnyVal*
objects was somewhat nebulous - adjusted it so that they're allocated
from ExprContext::mem_pool_, which has the correct lifetime.
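
The evaluate-once-and-cache idea for the interpreted path can be sketched
in Python. The classes Const, ColumnRef and InPredicate below are
hypothetical stand-ins, not the real Expr classes, and the sketch ignores
memory management entirely.

```python
class Const:
    """A constant expression that counts how often it is evaluated."""

    def __init__(self, value):
        self.value = value
        self.evals = 0

    def is_constant(self):
        return True

    def evaluate(self, row):
        self.evals += 1
        return self.value


class ColumnRef:
    """A non-constant expression that reads one column of the row."""

    def __init__(self, idx):
        self.idx = idx

    def is_constant(self):
        return False

    def evaluate(self, row):
        return row[self.idx]


class InPredicate:
    def __init__(self, child, candidates):
        self.child = child
        self.candidates = candidates
        self.cached = {}

    def open(self):
        # Evaluate constant arguments once, before any rows are processed.
        self.cached = {i: c.evaluate(None)
                       for i, c in enumerate(self.candidates)
                       if c.is_constant()}

    def evaluate(self, row):
        needle = self.child.evaluate(row)
        for i, c in enumerate(self.candidates):
            # Use the cached value when there is one; otherwise evaluate.
            value = self.cached[i] if i in self.cached else c.evaluate(row)
            if value == needle:
                return True
        return False


consts = [Const(v) for v in (3, 5, 7)]
pred = InPredicate(ColumnRef(0), consts)
pred.open()
results = [pred.evaluate(r) for r in [(1,), (5,), (7,), (8,)]]
print(results, [c.evals for c in consts])
```

Each constant is evaluated exactly once in open(), however many rows are
later passed through evaluate().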

The codegen'd path was inefficient only with varargs - with fixed
arguments the LLVM optimiser is able to infer after inlining that
the expressions are constant and remove all evaluation. However,
for varargs it stores the vararg values into a heap-allocated buffer.
The LLVM optimiser is unable to remove these stores because they
have a side-effect that is visible to code outside the function.

The codegen'd path is improved by evaluating varargs into an automatic
buffer that can be optimised out. We also make a small related change
to bake the string constants into the codegen'd code.

Testing:
Ran exhaustive build.

Added regression test for IMPALA-2379 and MemPool test for aligned
allocation. Added a test for in predicates with constant strings.

Perf:
Added a targeted query that demonstrates the improvement. Also manually
validated the non-codegen'd perf. Also ran TPC-H and targeted perf
queries locally - didn't see any significant changes.

+--------------------+-------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| Workload           | Query                         | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
+--------------------+-------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| TARGETED-PERF(_20) | primitive_filter_in_predicate | parquet / none / none | 1.19   | 9.82        | I -87.85%  |   3.82%   |   0.71%        | 1           | 10    |
+--------------------+-------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+

(I) Improvement: TARGETED-PERF(_20) primitive_filter_in_predicate [parquet / none / none] (9.82s -> 1.19s [-87.85%])
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+--------+-----------+
| Operator     | % of Query | Avg      | Base Avg | Delta(Avg) | StdDev(%) | Max      | Base Max | Delta(Max) | #Hosts | #Rows  | Est #Rows |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+--------+-----------+
| 01:AGGREGATE | 14.39%     | 155.88ms | 214.61ms | -27.37%    |   2.68%   | 163.38ms | 227.53ms | -28.19%    | 1      | 1      | 1         |
| 00:SCAN HDFS | 85.60%     | 927.46ms | 9.43s    | -90.16%    |   4.49%   | 1.01s    | 9.50s    | -89.42%    | 1      | 13.77K | 14.05K    |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+--------+-----------+

Change-Id: I45c3ed8c9d7a61e94a9b9d6c316e8a53d9ff6c24
Reviewed-on: http://gerrit.cloudera.org:8080/4838
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-11-08 02:44:51 +00:00
Thomas Tauber-Marshall
5cc133947f IMPALA-4260: Alter table add column drops all the column stats
Hive expects types for column stats to be specified as all lower
case. For some reason, it doesn't check this when the stats are
first written, but it does check when performing an 'alter table'.
This causes it to drop stats that Impala wrote because we specify
type names in upper case.

This patch converts the types that Impala sends to Hive for the
column stats to all lower case and adds a regression test.

I also filed HIVE-15061 to track the issue from the Hive end.

Change-Id: Ia373ec917efa7ab9f2a59b8a870b7ebc30175dda
Reviewed-on: http://gerrit.cloudera.org:8080/4845
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2016-11-07 22:29:19 +00:00
Matthew Jacobs
50f7753d2b IMPALA-3771: Expose kudu client timeout and set default
The Kudu client timeout was too low for Impala usage. This
sets the default timeout to 3 minutes and exposes it as a
gflag.

New timeout tests were added.

Change-Id: Iad95e8e38aad4f76d21bac6879db6c02b3c3e045
Reviewed-on: http://gerrit.cloudera.org:8080/4849
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2016-11-05 06:43:45 +00:00
Thomas Tauber-Marshall
832fb53763 IMPALA-3725 Support Kudu UPSERT in Impala
This patch introduces a new query statement, UPSERT, for Kudu
tables, which operates like an INSERT and uses the same analysis,
planning, and execution machinery as INSERT, except that if
there's a primary key collision, an update is performed instead
of returning an error.

New syntax:
[with_clause] UPSERT INTO [TABLE] table_name [(column list)]
{
  query_stmt
 | VALUES (value [, value...]) [, (value [, value...]) ...]
}

where column list must contain all of the key columns in
table_name, if specified, and table_name must be a Kudu table.

This patch also improves the behavior of INSERTing into Kudu
tables without specifying all of the key columns - this now
results in an analysis exception, rather than attempting the
INSERT and receiving an error back from Kudu.
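
For illustration, the UPSERT semantics and the key-column check can be
modelled in Python. The upsert function, its parameters and the dict-based
table are hypothetical; ValueError merely stands in for the analysis
exception mentioned above.

```python
def upsert(table, key_cols, columns, rows):
    """Insert rows, or update them in place on a primary key collision.

    table maps a primary-key tuple to a dict of column values.
    Raises ValueError if the column list omits a key column.
    """
    missing = [k for k in key_cols if k not in columns]
    if missing:
        raise ValueError("missing key columns: %s" % ", ".join(missing))
    for row in rows:
        record = dict(zip(columns, row))
        pk = tuple(record[k] for k in key_cols)
        # New key: creates the row. Existing key: overwrites the
        # listed columns and leaves the others as they were.
        table.setdefault(pk, {}).update(record)


table = {}
upsert(table, ["id"], ["id", "name"], [(1, "a"), (2, "b")])
upsert(table, ["id"], ["id", "name"], [(2, "B"), (3, "c")])
print(table)

try:
    upsert(table, ["id"], ["name"], [("x",)])
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

The second call updates the row with id 2 rather than failing, and the
last call is rejected up front because the key column is not listed.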

Change-Id: I8df5cea36b642e267f85ff6b163f3dd96b8386e9
Reviewed-on: http://gerrit.cloudera.org:8080/4047
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2016-11-05 04:16:54 +00:00