6534 Commits

Author SHA1 Message Date
John Russell
4afabd4e31 IMPALA-5310: [DOCS] Reserve 'repeatable' keyword from TABLESAMPLE clause
Overlooked the new keyword when the clause was
originally introduced.

Change-Id: Ie8e6713fb97ced279f0aedfe8f42c09a7e6edae9
Reviewed-on: http://gerrit.cloudera.org:8080/9066
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-19 21:26:12 +00:00
Vuk Ercegovac
db98dc6504 IMPALA-4993: extend dictionary filtering to collections
Currently, top-level scalar columns in parquet files can
be used at runtime to prune row-groups by evaluating certain
conjuncts over the column's dictionary (if available).

This change extends such pruning to scalar values that are
stored in collection type columns. Currently, dictionary
pruning works by finding eligible conjuncts for top-level
slots. Since only top-level slots are supported, the slots
are implicitly part of the scan node's tuple descriptor.
With this change, we track eligible conjuncts by slot as well
as the tuple that contains the slot (either top-level or
nested collection). Since collection conjuncts are already
managed by a map that associates tuple descriptors to a list
of their conjuncts, this extension follows the existing
representation.

The frontend builds the mapping of SlotId to conjuncts that
are dictionary filterable. This mapping now includes SlotId's
that reference nested tuples. The backend is adjusted to
use the same representation. In addition, collection
readers are decomposed into scalar filterable columns and
other, non-dictionary filterable readers. When filtering
a row group using a conjunct associated to a (possibly)
nested collection type, an additional tuple buffer is
allocated per tuple descriptor.
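
A minimal Python sketch of the pruning idea described above, not Impala's actual implementation. It assumes conjuncts and dictionaries are both keyed by a hypothetical (tuple id, slot id) pair, so that top-level and nested-collection slots are handled the same way. The function name and data layout are illustrative only, and NULL handling and the "column is fully dictionary encoded" check are omitted.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical key: (tuple descriptor id, slot id). A nested collection has
# its own tuple id, so its slots do not collide with top-level slots.
SlotKey = Tuple[int, int]
Conjunct = Callable[[object], bool]


def row_group_can_be_skipped(
    dictionaries: Dict[SlotKey, Sequence[object]],
    conjuncts: Dict[SlotKey, List[Conjunct]],
) -> bool:
    """Return True if some conjunct rejects every value in its dictionary."""
    for key, predicates in conjuncts.items():
        values = dictionaries.get(key)
        if not values:
            # No dictionary (or an empty one): be conservative, do not prune.
            continue
        for predicate in predicates:
            if not any(predicate(v) for v in values):
                # No dictionary value can satisfy this conjunct, so no row
                # in the row group can satisfy it either.
                return True
    return False
```

For example, with a nested slot whose dictionary is [10, 20], a conjunct "value > 100" lets the row group be skipped, while "value > 15" does not.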

Testing:
- e2e test extended to illustrate row-groups that are pruned
  by nested collection dictionary filters.

Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Reviewed-on: http://gerrit.cloudera.org:8080/8775
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-19 20:37:25 +00:00
Tim Armstrong
579e33207b IMPALA-6368: make test_chars parallel
Previously it had to be executed serially because it modified tables in
the functional database.

This change separates out tests that use temporary tables and runs those
in a unique_database.

Testing:
Ran locally in a loop with parallelism of 4 for a while.

Change-Id: I2f62ede90f619b8cebbb1276bab903e7555d9744
Reviewed-on: http://gerrit.cloudera.org:8080/9022
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-19 09:55:52 +00:00
Dimitris Tsirogiannis
3f00d10e1b IMPALA-4886: Expose table metrics in the catalog web UI.
The following changes are included in this commit:
* Adds a lightweight framework for registering metrics in the JVM.
* Adds table-level metrics and enables these metrics to be exposed
through the catalog web UI.
* Adds a CatalogUsageMonitor class that monitors and reports the catalog
usage in terms of the tables with the highest memory requirements and
the tables with the highest number of metadata operations. The catalog
usage information is exposed in the /catalog page of the catalog web UI.

Change-Id: I37d407979e6d3b1a444b6b6265900b148facde9e
Reviewed-on: http://gerrit.cloudera.org:8080/8529
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-19 09:25:01 +00:00
Sailesh Mukil
d8ae8801ae IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing
On systems that have Kerberos 1.11 or earlier, service principals with
IP addresses are not supported due to a bug:

http://krbdev.mit.edu/rt/Ticket/Display.html?id=7603

Since our BE tests use such principals, they fail on older platforms with the
above-mentioned Kerberos versions.

Kudu fixed this by adding a workaround which overrides krb5_realm_override.

ba2ae3de4a

However, when we moved Kudu's security library into Impala, we did not
add the appropriate build flags that allow it to be used. This patch fixes
that.

Testing: Verified that the failing test runs successfully on CentOS 6.4
with Kerberos 1.10.3

Change-Id: I60e291e8aa1b59b645b856d33c658471f314c221
Reviewed-on: http://gerrit.cloudera.org:8080/9006
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-19 01:21:45 +00:00
Michael Ho
e714f2b33c IMPALA-2397: Use atomics for IntGauge and IntCounter
This change removes the spinlock in IntGauge and IntCounter
and uses AtomicInt64 instead. As shown in IMPALA-2397, multiple
threads can be contending for the spinlocks of some global metrics
under concurrent queries.

In addition, SimpleMetric is renamed to ScalarMetric and broken
into two subclasses:
- LockedMetric:
  - a value store for any primitive type (int, float, string, etc.).
  - atomic read and write via GetValue() and SetValue() respectively.

- AtomicMetric:
  - the basis of IntGauge and IntCounter. Supports atomic increment
    of the metric value via the Increment() interface.
  - atomic read and write via GetValue() and SetValue() respectively.
  - only supports the int64_t type.

Change-Id: I48dfa5443cd771916b53541a0ffeaf1bcc7e7606
Reviewed-on: http://gerrit.cloudera.org:8080/9012
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-18 23:31:52 +00:00
Michael Ho
b3d38b5c86 Revert "IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC"
This reverts commit df3a440fff.

Apparently, linking Impalad against GPerfTools 2.6.3 caused Impalad to fail
on certain platforms (OLE6). The symptom is a SIGSEGV when trying to
exec the Impalad binary. It's unclear which commit in GPerfTools caused
it, so this change is backed out to unbreak those platforms for now.

Change-Id: I97cccca74fb199d6ff0a42fe818f8789a0d66e83
Reviewed-on: http://gerrit.cloudera.org:8080/9057
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-18 23:25:09 +00:00
Bikramjeet Vig
028a83e654 IMPALA-6382: Cap spillable buffer size and max row size query options
Currently the default and min spillable buffer size and max row size
query options accept any valid int64 value. Since the planner depends
on these values for memory estimations, if a very large value close to
the limits of int64 is set, the variables representing or relying on
these estimates can overflow during different phases of query execution.

This patch puts a reasonable upper limit of 1TB on these query options
to prevent such a situation.
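
A minimal Python sketch of the capping idea, assuming a simple "reject values above the cap" policy. The function name, the error text, and the choice to raise rather than clamp are assumptions for illustration and are not taken from the patch. Lower-bound checks are omitted.

```python
# 1TB upper limit named in the commit message.
MAX_OPTION_BYTES = 1 << 40


def validate_buffer_option(name: str, value: int) -> int:
    """Return value unchanged if it is within the cap, else raise ValueError.

    'name' is only used for the error message; it is a hypothetical
    parameter, e.g. "max_row_size" or "default_spillable_buffer_size".
    """
    if value > MAX_OPTION_BYTES:
        raise ValueError(
            f"{name}={value} exceeds the maximum of {MAX_OPTION_BYTES} bytes"
        )
    return value
```

With the cap in place, an estimate such as value * number_of_buffers stays far below the int64 limit for any realistic buffer count, which is the overflow the commit message is guarding against.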

Testing:
Added backend query option tests.

Change-Id: I36d3915f7019b13c3eb06f08bfdb38c71ec864f1
Reviewed-on: http://gerrit.cloudera.org:8080/9023
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-18 23:08:26 +00:00
John Russell
ca7d03cfe9 [DOCS] Minor editorial change
Turn "royal we" into imperative statement.

Change-Id: Ib78e851761796a1751e6adaaffa049b1fbb58b88
Reviewed-on: http://gerrit.cloudera.org:8080/9064
Reviewed-by: Alex Rodoni <arodoni@cloudera.com>
Reviewed-by: John Russell <jrussell@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-18 21:53:40 +00:00
Tim Armstrong
f5d73f5e76 IMPALA-6419: Revert "IMPALA-6383: free memory after skipping parquet row groups"
This reverts commit 10fb24afb9.

Change-Id: I4dd62380d02b61ca46f856b4eb40670b71e28140
Reviewed-on: http://gerrit.cloudera.org:8080/9054
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-18 21:25:28 +00:00
Taras Bobrovytsky
35a3e186d6 IMPALA-5478: Run TPCDS queries with decimal_v2 enabled
We add new TPCDS .test files that are expected to be run with decimal_v2
enabled. The new expected results were generated using Impala and I
inspected them manually.

Change-Id: Ib867c51a521ec4a087bc127d99aee4b95ba97733
Reviewed-on: http://gerrit.cloudera.org:8080/8985
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-18 03:28:51 +00:00
Joe McDonnell
d9b6fd0730 IMPALA-6386: Invalidate metadata at table level for dataload
Dataload currently executes bin/load-data.py for TPC-H,
TPC-DS, and functional-query concurrently. One of the final
steps for bin/load-data.py is to run a global "invalidate
metadata". Global "invalidate metadata" commands are known
to cause problems on concurrent systems. See IMPALA-5087.
For dataload, if TPC-H executes "invalidate metadata" while
TPC-DS is still creating tables and adding partitions,
the TPC-DS executor might erroneously believe that a table
does not exist.

This changes dataload to invalidate metadata at an
individual table level rather than globally. This
prevents the concurrency issue.
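
A minimal Python sketch of the difference, assuming the loader knows which tables it created. The helper name and the example table names are hypothetical and do not reflect the actual code in bin/load-data.py.

```python
from typing import List


def table_level_invalidates(db: str, tables: List[str]) -> List[str]:
    """Build one 'invalidate metadata db.table' statement per table.

    This replaces a single global 'invalidate metadata', which would also
    touch tables that another concurrent loader is still creating.
    """
    return [f"invalidate metadata {db}.{table}" for table in tables]


# Illustrative usage with made-up table names.
statements = table_level_invalidates("tpch", ["lineitem", "orders"])
```

Each loader then only invalidates the tables it owns, so the TPC-H loader no longer interferes with tables the TPC-DS loader is still building.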

This also changes the names of some of the intermediate
SQL files generated by generate-schema-statements.py
and consumed by load-data.py to make them less confusing.

Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f
Reviewed-on: http://gerrit.cloudera.org:8080/9009
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-17 22:52:58 +00:00
Csaba Ringhofer
dcc7be0ed4 IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges
USE and SHOW TABLES should be allowed if there is at least one
table in a database where the user has table or column
privileges. Impala incorrectly checked only for table privileges.
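
A minimal Python sketch of the corrected check, assuming a simplified privilege model. The TablePrivileges class and its fields are hypothetical and only illustrate the rule; they are not Impala's authorization classes.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class TablePrivileges:
    # True if the user holds any table-level privilege on the table.
    table_level: bool = False
    # Names of columns on which the user holds a column-level privilege.
    columns: Set[str] = field(default_factory=set)


def can_use_database(tables: Dict[str, TablePrivileges]) -> bool:
    """Allow USE / SHOW TABLES if any table has table or column privileges."""
    return any(p.table_level or bool(p.columns) for p in tables.values())
```

The earlier behaviour corresponds to testing only p.table_level, which wrongly rejects a user who holds nothing but column-level privileges.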

To test this issue in AuthorizationTest.java, 'functional_avro'
is added as a test database with only column level permissions.

Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8
Reviewed-on: http://gerrit.cloudera.org:8080/8973
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-17 22:40:13 +00:00
Lars Volker
b6e43133e6 IMPALA-6399: Increase timeout in test_observability to reduce flakiness
Change-Id: I58f7e7b367e73675be42e85f55fd7698d51f92af
Reviewed-on: http://gerrit.cloudera.org:8080/9034
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
2018-01-17 22:31:33 +00:00
Tianyi Wang
6cc76d7201 IMPALA-6353: Fix crash in snappy decompressor
SnappyDecompressor::MaxOutputLen assumes the input pointer to be
non-null. It's not true when the parquet file is corrupted and the
compressed_page_size field in a page header is 0. This patch handles
this error instead of failing a DCHECK.

Testing: Added a bad parquet file with a compressed_page_size of 0. It
crashes Impala without this patch.

Change-Id: I0d42937aab92a74f8e104d2f7fcd64dc24f6a500
Reviewed-on: http://gerrit.cloudera.org:8080/8977
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-17 04:18:24 +00:00
Taras Bobrovytsky
f8b406222d IMPALA-6388: Fix the Union node number of hosts estimation
Before this patch, we would estimate the number of hosts for the union
node by looking only at the first union operand. This is obviously
incorrect and led us to underestimate the value.

We fix the problem by setting the estimate to be the maximum of its
children.
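
A minimal Python sketch of the old and new estimates, for comparison. The function names are hypothetical, and falling back to 1 host when there are no children is an assumption made here, not something stated in the patch.

```python
from typing import List


def union_num_hosts_old(child_num_hosts: List[int]) -> int:
    """Old behaviour: only the first union operand is considered."""
    return child_num_hosts[0] if child_num_hosts else 1


def union_num_hosts_new(child_num_hosts: List[int]) -> int:
    """New behaviour: the maximum over all children."""
    # default=1 is an assumption for a union with no child operands.
    return max(child_num_hosts, default=1)
```

For children running on 1, 3 and 10 hosts, the old estimate is 1 while the new estimate is 10.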

Testing:
- Added a planner test that reproduces the issue

Change-Id: I51e1ecca8dbc84b2b5a72708667b2799d00279f0
Reviewed-on: http://gerrit.cloudera.org:8080/9017
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-17 01:41:16 +00:00
Adam Holley
4c43cace87 IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" command
Examples of new command:
ALTER TABLE t1 SET ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002';
ALTER TABLE t1 SET ROW FORMAT DELIMITED LINES TERMINATED BY '\001';

Testing:
Added parser tests and unit tests for alter statements including
partition options.

Change-Id: I96e347463504915a6f33932552e4d1f61e9b1154
Reviewed-on: http://gerrit.cloudera.org:8080/8928
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-16 23:58:24 +00:00
Dimitris Tsirogiannis
3fc42ded02 IMPALA-5058: Improve the concurrency of DDL/DML operations
Problem: A long running table metadata operation (e.g. refresh) could
prevent any other metadata operation from making progress if it
coincided with the catalog topic creation operations. The problem was due
to the conservative locking scheme used when catalog topics were
created. In particular, in order to collect a consistent snapshot of
metadata changes, the global catalog lock was held for the entire
duration of that operation.

Solution: To improve the concurrency of catalog operations the following
changes are performed:
* A range of catalog versions determines the catalog changes to be
  included in a catalog update. Any catalog changes that do not fall in
  the specified range are ignored (to be processed in subsequent catalog
  topic updates).
* The catalog allows metadata operations to make progress while
  collecting catalog updates.
* To prevent starvation of catalog updates (i.e. frequently updated
  catalog objects skipping catalog updates indefinitely), we keep track
  of the number of times a catalog object has skipped an update and if
  that number exceeds a threshold it is included in the next catalog
  topic update even if its version is not in the specified topic update
  version range. Hence, the same catalog object may be sent in two
  consecutive catalog topic updates.
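
A minimal Python sketch of the version-range selection and the starvation guard described in the list above. The class, the threshold value, and the exact range boundaries (exclusive lower bound, inclusive upper bound) are assumptions for illustration, not the catalog's real code.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold; the commit message does not give the real value.
MAX_SKIPPED_UPDATES = 2


@dataclass
class CatalogObject:
    name: str
    version: int
    skipped_updates: int = 0


def collect_topic_update(
    objects: List[CatalogObject], from_version: int, to_version: int
) -> List[CatalogObject]:
    """Select objects whose version is in (from_version, to_version].

    Objects newer than to_version are skipped and counted. Once an object
    has been skipped more than MAX_SKIPPED_UPDATES times it is included
    anyway, so a frequently updated object cannot starve.
    """
    update = []
    for obj in objects:
        if from_version < obj.version <= to_version:
            obj.skipped_updates = 0
            update.append(obj)
        elif obj.version > to_version:
            obj.skipped_updates += 1
            if obj.skipped_updates > MAX_SKIPPED_UPDATES:
                obj.skipped_updates = 0
                update.append(obj)
        # Versions <= from_version were sent in an earlier update.
    return update
```

Because an object forced into an update may still have a version above the range, it can be picked up again by the next range, which is why the same object may appear in two consecutive topic updates.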

This commit also changes the way deletions are handled in the catalog and
disseminated to the impalad nodes through the statestore. In particular:
* Deletions in the catalog are explicitly recorded in a log with
the catalog version in which they occurred. As before, added and deleted
catalog objects are sent to the statestore.
* Topic entries associated with deleted catalog objects have non-empty
values (besides keys) that contain minimal object metadata including the
catalog version.
* Statestore is no longer using the existence or not of
topic entry values in order to identify deleted topic entries. Deleted
topic entries should be explicitly marked as such by the statestore
subscribers that produce them.
* Statestore subscribers now use the 'deleted' flag to determine if a
topic entry corresponds to a deleted item.
* Impalads use the deleted objects' catalog versions when updating the
local catalog cache from a catalog update and not the update's maximum
catalog version.

Testing:
- No new tests were added as these paths are already exercised by
existing tests.
- Ran all core and exhaustive tests.

Change-Id: If12467a83acaeca6a127491d89291dedba91a35a
Reviewed-on: http://gerrit.cloudera.org:8080/7731
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
Reviewed-on: http://gerrit.cloudera.org:8080/8752
2018-01-16 23:01:32 +00:00
Lars Volker
888a16cad5 KUDU-2256: Add GetTransferSize() to RpcContext
This change adds GetTransferSize() to RpcContext to retrieve the
payload size of the inbound call. This makes it easier to track the
memory of incoming RPCs in the handler methods.

To test this I added a CHECK to one of the handler methods in
CalculatorService.

Change-Id: Iab2519bad1815aeccaa119f1605638bfd3604382
Reviewed-on: http://gerrit.cloudera.org:8080/8998
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <todd@apache.org>
Reviewed-on: http://gerrit.cloudera.org:8080/9019
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-16 21:43:46 +00:00
Zoram Thanga
91d109d6e9 IMPALA-6307: CTAS statement fails with duplicate column exception.
A CTAS statement with a 'partition by' clause causes the statement
to fail with a duplicate column name exception. This is happening
because on expression rewrite, the partition defs state is not reset.

IMPALA-5796 added TableDef::reset(). This patch expands the method by
adding calls to reset ColumnDefs and PartitionColumnDefs.

Testing:
  * Regression test added to AnalyzeDDLTest.
  * Exhaustive Jenkins build and test.

Change-Id: Iee053abecd4384e15eec8db10cb06f5ace159da2
Reviewed-on: http://gerrit.cloudera.org:8080/8930
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-13 03:44:43 +00:00
Tim Armstrong
6bff0bd766 IMPALA-6363: avoid cscope build races
Use the -ignore_readdir_race flag for find so that find doesn't fail if
a directory disappears under it. From what I could tell the flag has
been in GNU find for a long time and is also available in other OS
flavours like BSD and OS X.

Make the step depend on gen-deps so that it can index thrift, protobuf,
etc, output.

Change-Id: I22bdb7c64036cb88a8a10907af35c5e3a55a9195
Reviewed-on: http://gerrit.cloudera.org:8080/9007
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-13 03:24:06 +00:00
Tim Armstrong
10fb24afb9 IMPALA-6383: free memory after skipping parquet row groups
Before this patch, resources were only flushed after breaking out of
NextRowGroup(). This is a problem because resources can be allocated
for skipped row groups (e.g. for reading dictionaries).

Testing:
Tested in conjunction with a prototype buffer pool patch that was
DCHECKing before the change.

Added DCHECKs to the current version to ensure the streams are cleared
up as expected.

Change-Id: Ibc2f8f27c9b238be60261539f8d4be2facb57a2b
Reviewed-on: http://gerrit.cloudera.org:8080/9002
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-13 02:48:08 +00:00
Michael Ho
df3a440fff IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC
KRPC in general tends to put more pressure on the thread
caches due to allocations of more small objects (i.e. <1MB).
While some of them are being addressed in KUDU-1865, it's shown
that the following TCMalloc workarounds will provide reasonable
performance with KRPC:

- TCMALLOC_TRANSFER_NUM_OBJ:
   - maximum number of objects per class type to transfer between
     thread and central caches.
   - the default value of 512 in 2.5.2 seems to cause the spin lock
     in the central cache to be held for too long with KRPC. 2.6.0
     and later revert this value to 32 by default.

- TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
  - total amount of memory allocated to all thread caches in bytes
  - the default value is 32MB. We need to bump it to 1GB which is the
    internal cap in TCMalloc.

This change upgrades GPerfTools/TCMalloc to 2.6.3 to pick up the
change of the default value of TCMALLOC_TRANSFER_NUM_OBJ.

In addition, when KRPC is enabled and FLAGS_TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
has the default value of 0, we will automatically bump the thread cache sizes
to 1GB. Without these workarounds, stress test with KRPC will grind to a halt
due to contention for the spinlock in TCMalloc's central cache. With these
workarounds, the stress test completes within the same ballpark as thrift.
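
A minimal Python sketch of the defaulting rule described above, assuming a flag value of 0 means "not set by the user". The function name and the use of None for "leave TCMalloc's own default in place" are assumptions made for illustration.

```python
from typing import Optional

# 1GB, described in the commit message as TCMalloc's internal cap.
ONE_GB = 1 << 30


def effective_thread_cache_bytes(
    flag_value: int, krpc_enabled: bool
) -> Optional[int]:
    """Pick the total thread cache size to configure, or None for no override.

    flag_value models FLAGS_TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES.
    """
    if flag_value != 0:
        # An explicit user setting always wins.
        return flag_value
    if krpc_enabled:
        # Default of 0 with KRPC enabled: bump the thread caches to 1GB.
        return ONE_GB
    # Default of 0 without KRPC: keep TCMalloc's built-in default.
    return None
```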

Also did a perf run with Thrift. The regression in TPCH-Q2 is mostly due to sensitivity
in runtime filter timing and the avg can be dragged up due to a bad run when filters
arrive late. No regression as measured in targeted-perf.

+------------+-----------------------+---------+------------+------------+----------------+
| Workload   | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+------------+-----------------------+---------+------------+------------+----------------+
| TPCH(_300) | parquet / none / none | 18.93   | -0.84%     | 10.08      | +1.45%         |
+------------+-----------------------+---------+------------+------------+----------------+

+------------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
| Workload   | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
+------------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
| TPCH(_300) | TPCH-Q2  | parquet / none / none | 6.28   | 3.25        | R +93.41%  | * 49.77% * | * 12.47% *     | 1           | 3     |
| TPCH(_300) | TPCH-Q4  | parquet / none / none | 5.00   | 4.77        |   +4.83%   |   0.41%    |   0.03%        | 1           | 3     |
| TPCH(_300) | TPCH-Q13 | parquet / none / none | 21.29  | 20.69       |   +2.90%   |   0.55%    |   0.37%        | 1           | 3     |
| TPCH(_300) | TPCH-Q11 | parquet / none / none | 1.73   | 1.71        |   +0.94%   |   1.69%    |   2.85%        | 1           | 3     |
| TPCH(_300) | TPCH-Q14 | parquet / none / none | 6.03   | 5.99        |   +0.76%   |   0.00%    |   0.95%        | 1           | 3     |
| TPCH(_300) | TPCH-Q16 | parquet / none / none | 6.97   | 6.93        |   +0.58%   |   0.74%    |   0.73%        | 1           | 3     |
| TPCH(_300) | TPCH-Q3  | parquet / none / none | 29.15  | 29.03       |   +0.40%   |   1.63%    |   1.39%        | 1           | 3     |
| TPCH(_300) | TPCH-Q1  | parquet / none / none | 14.01  | 13.96       |   +0.34%   |   1.28%    |   0.51%        | 1           | 3     |
| TPCH(_300) | TPCH-Q6  | parquet / none / none | 1.27   | 1.27        |   -0.03%   |   3.69%    |   0.07%        | 1           | 3     |
| TPCH(_300) | TPCH-Q9  | parquet / none / none | 30.99  | 31.13       |   -0.45%   |   0.54%    |   0.19%        | 1           | 3     |
| TPCH(_300) | TPCH-Q5  | parquet / none / none | 48.03  | 48.33       |   -0.63%   |   4.72%    |   0.11%        | 1           | 3     |
| TPCH(_300) | TPCH-Q7  | parquet / none / none | 46.85  | 47.41       |   -1.18%   |   1.59%    |   0.46%        | 1           | 3     |
| TPCH(_300) | TPCH-Q8  | parquet / none / none | 7.92   | 8.03        |   -1.39%   |   3.67%    |   5.63%        | 1           | 3     |
| TPCH(_300) | TPCH-Q19 | parquet / none / none | 30.98  | 31.51       |   -1.67%   |   1.33%    |   0.82%        | 1           | 3     |
| TPCH(_300) | TPCH-Q18 | parquet / none / none | 33.55  | 34.13       |   -1.71%   |   1.15%    |   1.46%        | 1           | 3     |
| TPCH(_300) | TPCH-Q10 | parquet / none / none | 9.46   | 9.64        |   -1.82%   |   0.63%    |   0.75%        | 1           | 3     |
| TPCH(_300) | TPCH-Q22 | parquet / none / none | 6.00   | 6.16        |   -2.58%   |   0.08%    |   5.12%        | 1           | 3     |
| TPCH(_300) | TPCH-Q15 | parquet / none / none | 3.41   | 3.50        |   -2.60%   |   1.40%    |   0.46%        | 1           | 3     |
| TPCH(_300) | TPCH-Q12 | parquet / none / none | 3.24   | 3.33        |   -2.86%   |   1.36%    |   1.55%        | 1           | 3     |
| TPCH(_300) | TPCH-Q17 | parquet / none / none | 4.65   | 4.83        |   -3.58%   |   1.17%    |   0.42%        | 1           | 3     |
| TPCH(_300) | TPCH-Q21 | parquet / none / none | 96.15  | 100.63      |   -4.45%   |   0.29%    |   3.18%        | 1           | 3     |
| TPCH(_300) | TPCH-Q20 | parquet / none / none | 3.40   | 3.64        |   -6.63%   |   4.82%    | * 12.70% *     | 1           | 3     |
+------------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+

+---------------------+-----------------------+---------+------------+------------+----------------+
| Workload            | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+---------------------+-----------------------+---------+------------+------------+----------------+
| TARGETED-PERF(_300) | parquet / none / none | 59.31   | -1.40%     | 8.80       | -2.24%         |
+---------------------+-----------------------+---------+------------+------------+----------------+

+---------------------+--------------------------------------------------------+-----------------------+---------+-------------+------------+------------+----------------+-------------+-------+
| Workload            | Query                                                  | File Format           | Avg(s)  | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
+---------------------+--------------------------------------------------------+-----------------------+---------+-------------+------------+------------+----------------+-------------+-------+
| TARGETED-PERF(_300) | primitive_conjunct_ordering_2                          | parquet / none / none | 36.27   | 30.52       |   +18.87%  | * 17.02% * |   2.42%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_broadcast_join_1                             | parquet / none / none | 1.17    | 1.02        |   +14.59%  | * 12.82% * |   0.02%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_filter_bigint_in_list                        | parquet / none / none | 1.03    | 0.92        |   +11.54%  |   2.51%    |   2.53%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_filter_bigint_selective                      | parquet / none / none | 0.37    | 0.34        |   +6.93%   |   7.94%    |   1.32%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_filter_string_selective                      | parquet / none / none | 0.47    | 0.44        |   +5.97%   |   6.11%    |   1.12%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_groupby_bigint_highndv                       | parquet / none / none | 24.35   | 23.87       |   +1.99%   |   0.82%    |   0.63%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_many_fragments                               | parquet / none / none | 61.64   | 60.93       |   +1.17%   |   3.25%    |   1.07%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_top-n_all                                    | parquet / none / none | 36.63   | 36.31       |   +0.87%   |   0.35%    |   4.62%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_conjunct_ordering_4                          | parquet / none / none | 0.87    | 0.86        |   +0.66%   |   0.47%    |   0.03%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_long_predicate                               | parquet / none / none | 28.83   | 28.69       |   +0.49%   |   0.24%    |   0.10%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_topn_bigint                                  | parquet / none / none | 5.53    | 5.51        |   +0.34%   |   2.23%    |   0.07%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_broadcast_join_3                             | parquet / none / none | 58.41   | 58.27       |   +0.24%   |   2.51%    |   0.02%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_decimal_arithmetic                           | parquet / none / none | 96.67   | 96.59       |   +0.09%   |   0.41%    |   0.08%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_count_star                                   | parquet / none / none | 0.09    | 0.09        |   +0.03%   |   0.72%    |   0.73%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_groupby_decimal_lowndv.test                  | parquet / none / none | 3.26    | 3.26        |   -0.00%   |   0.09%    |   1.57%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_broadcast_join_2                             | parquet / none / none | 4.40    | 4.41        |   -0.10%   |   0.01%    |   1.24%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_shuffle_join_union_all_with_groupby          | parquet / none / none | 67.43   | 67.58       |   -0.21%   |   0.31%    |   0.38%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_intrinsic_appx_median                        | parquet / none / none | 35.14   | 35.27       |   -0.34%   |   0.39%    |   0.47%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_groupby_decimal_highndv                      | parquet / none / none | 25.94   | 26.07       |   -0.51%   |   5.33%    |   2.30%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_exchange_broadcast                           | parquet / none / none | 76.54   | 76.96       |   -0.54%   |   4.50%    |   4.70%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_filter_string_like                           | parquet / none / none | 5.52    | 5.56        |   -0.68%   |   0.92%    |   3.17%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_many_independent_fragments                   | parquet / none / none | 254.76  | 256.50      |   -0.68%   |   5.95%    |   2.19%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_shuffle_join_one_to_many_string_with_groupby | parquet / none / none | 228.43  | 230.08      |   -0.72%   |   0.62%    |   1.34%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_empty_build_join_1                           | parquet / none / none | 1.90    | 1.92        |   -1.26%   |   1.19%    |   2.74%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_exchange_shuffle                             | parquet / none / none | 78.99   | 80.26       |   -1.59%   |   0.75%    |   1.61%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_shuffle_1mb_rows                             | parquet / none / none | 1008.91 | 1027.39     |   -1.80%   |   2.33%    |   0.72%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_groupby_bigint_pk                            | parquet / none / none | 96.58   | 98.62       |   -2.07%   |   1.08%    |   1.98%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_groupby_bigint_lowndv                        | parquet / none / none | 3.26    | 3.33        |   -2.10%   |   3.00%    |   0.06%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_small_join_1                                 | parquet / none / none | 0.42    | 0.43        |   -2.54%   |   0.23%    |   1.54%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_filter_string_non_selective                  | parquet / none / none | 0.90    | 0.93        |   -2.54%   |   0.18%    |   2.45%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_intrinsic_to_date                            | parquet / none / none | 77.56   | 79.81       |   -2.82%   |   0.39%    |   2.79%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_filter_decimal_non_selective                 | parquet / none / none | 0.80    | 0.83        |   -3.56%   |   0.12%    |   2.68%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_filter_bigint_non_selective                  | parquet / none / none | 1.00    | 1.05        |   -4.60%   |   0.31%    |   5.18%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_conjunct_ordering_1                          | parquet / none / none | 4.91    | 5.16        |   -4.89%   |   0.44%    |   0.41%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_orderby_all                                  | parquet / none / none | 54.67   | 58.30       |   -6.23%   |   0.45%    |   0.81%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_conjunct_ordering_5                          | parquet / none / none | 11.91   | 12.70       |   -6.24%   |   1.04%    |   0.53%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_filter_decimal_selective                     | parquet / none / none | 0.86    | 0.94        |   -8.58%   | * 24.14% * | * 36.04% *     | 1           | 3     |
| TARGETED-PERF(_300) | primitive_orderby_bigint                               | parquet / none / none | 15.06   | 16.57       |   -9.14%   |   3.50%    |   0.10%        | 1           | 3     |
| TARGETED-PERF(_300) | primitive_filter_in_predicate                          | parquet / none / none | 1.11    | 1.24        |   -10.09%  | * 11.28% * | * 12.71% *     | 1           | 3     |
| TARGETED-PERF(_300) | primitive_orderby_bigint_expression                    | parquet / none / none | 18.16   | 24.86       | I -26.97%  |   0.96%    | * 20.73% *     | 1           | 3     |
| TARGETED-PERF(_300) | primitive_conjunct_ordering_3                          | parquet / none / none | 0.94    | 1.71        | I -44.74%  |   2.68%    | * 42.73% *     | 1           | 3     |
+---------------------+--------------------------------------------------------+-----------------------+---------+-------------+------------+------------+----------------+-------------+-------+

Change-Id: I5be574435af51fb7a875b16888cca260b341190e
Reviewed-on: http://gerrit.cloudera.org:8080/8991
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-12 22:35:13 +00:00
John Russell
ceeb130c5d IMPALA-2172, IMPALA-6391: [DOCS] Distinguish char_length() from length()
Modify both char_length() and length() usage notes to say when they
return the same or different results.

Include the same example, showing both STRING and CHAR types,
under both functions.

Change-Id: I18cabfce66351bb890bfbfc26b93466204a82625
Reviewed-on: http://gerrit.cloudera.org:8080/9014
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-12 22:24:34 +00:00
Jim Apple
717cb9d172 Update copyright date to 2018.
Change-Id: I8b55f6cd8a94197f48affad2b623af021e66d1df
Reviewed-on: http://gerrit.cloudera.org:8080/8925
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
2018-01-12 21:38:38 +00:00
John Russell
b5e2f338ab IMPALA-5736: [DOCS] Document --query_option for impala-shell
Change-Id: I5fa4fc27d6566e87fdabe57edc176133d586a84b
Reviewed-on: http://gerrit.cloudera.org:8080/8771
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-12 20:39:52 +00:00
John Russell
e0c9930037 IMPALA-2181: [DOCS] Document changes to SET output
Change-Id: Iade7cb326715ebbb8518230d518d05601d615f61
Reviewed-on: http://gerrit.cloudera.org:8080/8865
Reviewed-by: John Russell <jrussell@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-12 20:11:20 +00:00
Lars Volker
6dc7237fc1 IMPALA-6387: Increase wait for Breakpad crash handling
It seems that a recent slowdown of our test infrastructure might have
caused Breakpad to take a longer time to write Minidumps. There could
also be a more fundamental issue leading to hangs. To rule this out,
this change increases the default timeout to something larger to allow
the tests to complete.

Change-Id: I84742be9af9444607fde4baf8ea1c0092ff181fe
Reviewed-on: http://gerrit.cloudera.org:8080/9018
Tested-by: Lars Volker <lv@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2018-01-12 17:22:56 +00:00
John Russell
5842c8406d IMPALA-1767: [DOCS] Document new Boolean operators
In a new subtopic:

IS [NOT] TRUE
IS [NOT] FALSE

Folded into IS [NOT] NULL:

IS [NOT] UNKNOWN
Change-Id: Iefebf210418ec2d47b154bd37166b76720f085bb
Reviewed-on: http://gerrit.cloudera.org:8080/8942
Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-12 09:04:48 +00:00
Tim Armstrong
fd5c3a7e18 IMPALA-6290: limit ScannerContext to 1 buffer at a time
This is a prerequisite for constraining the number of buffers per scan
range. Before this patch, calling ReadBytes(), SkipBytes(), etc could
cause an arbitrary number of I/O buffers to accumulate in
'completed_io_buffers_'. E.g. if we allocated 3 * 8MB I/O buffers for
a range and then called ReadBytes(30MB), we would hit resource
exhaustion as soon as 3 buffers were accumulated in
'completed_io_buffers_'.

The fix is to avoid accumulating any buffers in 'completed_io_buffers_'.
Instead of adding them to 'completed_io_buffers_', completed buffers
are just returned to the I/O manager. It turned out that this did not
weaken the ScannerContext's guarantees about memory lifetime, because
ScannerContext::GetBytesInternal() cleared 'boundary_buffer_' each
time it was called regardless. I checked that this behaviour wasn't
a bug by inspecting the scanner code. I could not find any cases
where scanners depended on returned memory remaining valid beyond
the next Read*()/Get*()/Skip*() call on the stream.

This change makes that lifetime explicit in the comments. A
side effect of this fix is that scanners do not need to call
ReleaseCompletedResources() in CommitRows(), and the
ScannerContext only ever needs to hold one I/O buffer at a time.

This change also reimplements SkipBytes() to avoid it accumulating
memory in the boundary buffer for large skip sizes.
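
As a rough illustration of skipping without accumulating buffers, the sketch below models a stream that holds a single buffer at a time and drops each buffer before fetching the next. The class and method names (OneBufferStream, skip_bytes, read_one) are hypothetical stand-ins and not the actual ScannerContext API.

```python
from typing import Iterable


class OneBufferStream:
    """Hypothetical stream that never holds more than one buffer at a time."""

    def __init__(self, buffers: Iterable[bytes]) -> None:
        self._source = iter(buffers)
        self._current = b""
        self._offset = 0

    def _next_buffer(self) -> bool:
        # Replacing _current drops the previous buffer; in a real system this
        # is where the buffer would be handed back to the I/O layer.
        nxt = next(self._source, None)
        self._current = b"" if nxt is None else nxt
        self._offset = 0
        return nxt is not None

    def skip_bytes(self, length: int) -> bool:
        """Skip 'length' bytes without copying them anywhere.

        Returns False if the stream ends before all bytes are skipped.
        """
        remaining = length
        while remaining > 0:
            available = len(self._current) - self._offset
            if available == 0:
                if not self._next_buffer():
                    return False
                continue
            step = min(available, remaining)
            self._offset += step
            remaining -= step
        return True

    def read_one(self) -> int:
        """Return the next byte, fetching a new buffer if needed."""
        while self._offset == len(self._current):
            if not self._next_buffer():
                raise EOFError("stream exhausted")
        value = self._current[self._offset]
        self._offset += 1
        return value
```

Because skipped bytes are only counted and never copied, memory use stays bounded by one buffer regardless of the skip size.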

Also clarifies some of the invariants in ScannerContext. E.g. some
places assumed io_buffer_ != NULL, but that is no longer needed.

Testing:
Ran core tests with ASAN and exhaustive tests with DEBUG.

Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0
Reviewed-on: http://gerrit.cloudera.org:8080/8814
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-12 02:06:33 +00:00
Bharath Vissapragada
20daa4d516 IMPALA-6384: RequestPoolService should honor custom group mapping config
Due to the way in which we instantiate the fair scheduler allocation
loader, we do not read the config overrides from the HDFS config
files.

This is an unexpected behavior from users' POV since we typically
support overrides like custom user -> group mapping via HDFS
config (for ex: LDAPGroupsMapping) that eventually affects the
query -> pool assignment.

Fix: This patch loads the hadoop default configuration so that the
underlying QueuePlacementPolicy is based on user specified overrides.

Testing (manual): Changed the core-site.xml to use LDAPGroupsMapping
instead of the default ShellBasedUnixGroupsMapping and confirmed that
the correct group mapping plugin is loaded, by adding additional logging.

Also, modified TestRequestPoolService to assert that the core-site xml
overrides are loaded.

Change-Id: Ibb93870c0cc37e2432a643a274931f1d3d13fb96
Reviewed-on: http://gerrit.cloudera.org:8080/9000
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-11 22:52:29 +00:00
John Russell
b27537a15b IMPALA-4252: [DOCS] Document min/max filters for Kudu tables
Change-Id: I15d8c952ab5b90e89fdd57640dfb4da882f7ecb2
Reviewed-on: http://gerrit.cloudera.org:8080/8986
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-11 21:49:23 +00:00
Philip Zeyliger
ab81c48d7a Fix typo in test_observability.
Should fix "NameError: global name 'dgb_str' is not defined".

Change-Id: Ida3f355c6c6be5ed52e4d445f8f80665cdc8e2b8
Reviewed-on: http://gerrit.cloudera.org:8080/9003
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Zoram Thanga <zoram@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-11 20:10:01 +00:00
Philip Zeyliger
604e48d2f3 IMPALA-6330, IMPALA-5702: Avoid boost's trim() to work around crash after dynamic linking.
Replaces boost::algorithm::trim() with std::string methods when parsing
/proc/self/smaps and adds a trivial unit test for MemInfo::ParseSmaps().

I did *not* replace other uses of trim() with equivalents from
be/src/gutil/strings/strip.h at this moment.

The backstory here is that
TestAdmissionControllerStress::test_admission_controller_with_flags
fails occasionally on dynamically linked builds of Impala. I was able
to reproduce the failure reliably (within 3 tries) with the following:

  $ ./buildall.sh -notests -so -noclean
  $ bin/start-impala-cluster.py  --impalad_args="--memory_maintenance_sleep_time_ms=1"
  $ impala-shell.sh --query 'select max(t.c1), avg(t.c2), min(t.c3), avg(c4), avg(c5), avg(c6) from (select max(tinyint_col) over (order by int_col) c1, avg(tinyint_col) over (order by smallint_col) c2, min(tinyint_col) over (order by smallint_col desc) c3, rank() over (order by int_col desc) c4, dense_rank() over (order by bigint_col) c5, first_value(tinyint_col) over (order by bigint_col desc) c6 from functional.alltypes) t;'

The stack trace looks like:

  (gdb) bt
  #0  0x00007fe230df2428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
  #1  0x00007fe230df402a in __GI_abort () at abort.c:89
  #2  0x00007fe23312026d in __gnu_cxx::__verbose_terminate_handler() () at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/vterminate.cc:95
  #3  0x00007fe2330d8b66 in __cxxabiv1::__terminate(void (*)()) (handler=<optimized out>) at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_terminate.cc:47
  #4  0x00007fe2330d8bb1 in std::terminate() () at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_terminate.cc:57
  #5  0x00007fe2330d8cb8 in __cxxabiv1::__cxa_throw(void*, std::type_info*, void (*)(void*)) (obj=0x8e54080, tinfo=0x7fe233356210 <typeinfo for std::bad_cast>, dest=0x7fe23311ea70 <std::bad_cast::~bad_cast()>) at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_throw.cc:87
  #6  0x00007fe233110332 in std::__throw_bad_cast() () at ../../../../../gcc-4.9.2/libstdc++-v3/src/c++11/functexcept.cc:63
  #7  0x00007fe2330e8ad7 in std::use_facet<std::ctype<char> >(std::locale const&) (__loc=...) at /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-ubuntu-16-04/toolchain/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/locale_classes.tcc:137
  #8  0x00000000008d2cdf in void boost::algorithm::trim<std::string>(std::string&, std::locale const&) ()
  #9  0x00007fe2396d5057 in impala::MemInfo::ParseSmaps() () at /home/philip/src/Impala/be/src/util/mem-info.cc:132
  ...

My best theory is that there's a race/bug, wherein the std::locale* static initialization
work is getting somehow 'reset' by the dynamic linker, when more libraries are linked
in as a result of the query. My evidence to support this theory is scant, but
I do notice that LD_DEBUG=all prints the following when the query is executed
(but not right at startup):

  binding file /home/philip/src/Impala/toolchain/gcc-4.9.2/lib64/libstdc++.so.6 [0] to
  /home/philip/src/Impala/toolchain/gflags-2.2.0-p1/lib/libgflags.so.2.2 [0]:
  normal symbol `std::locale::facet::_S_destroy_c_locale(__locale_struct*&)'

Note that there are BSS segments for some of std::locale::facet::* inside
of libgflags.so.

  $ nm toolchain/gflags-2.2.0-p1/lib/libgflags.so | c++filt | grep facet | grep ' B '
  00000000002e2d10 B std::locale::facet::_S_c_locale
  00000000002e2d0c B std::locale::facet::_S_once

I'm not the first to run into variants of these issues, though the results
are fairly unhelpful:

  http://www.boost.org/doc/libs/1_58_0/libs/locale/doc/html/faq.html
  https://stackoverflow.com/questions/26990412/c-boost-crashes-while-using-locale
  https://svn.boost.org/trac10/ticket/4671
  http://clang-developers.42468.n3.nabble.com/std-use-facet-lt-std-ctype-lt-char-gt-gt-crashes-on-linux-td4033967.html
  https://unix.stackexchange.com/questions/719/can-we-get-compiler-information-from-an-elf-binary
  https://stackoverflow.com/questions/42376100/linking-with-library-causes-collate-facet-to-be-missing-from-char
  http://lists.llvm.org/pipermail/cfe-dev/2012-July/023289.html
  https://gcc.gnu.org/ml/libstdc++/2014-11/msg00122.html

Change-Id: I8dd807f869a9359d991ba515177fb2298054520e
Reviewed-on: http://gerrit.cloudera.org:8080/8888
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-11 08:05:30 +00:00
Tim Armstrong
c0c1202dbf IMPALA-6381: increase test_exchange_delays timeout for isilon
Change-Id: Ie82030403fa238b673b0a3ccdc7731b0d78b63af
Reviewed-on: http://gerrit.cloudera.org:8080/8993
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-11 00:42:02 +00:00
Adam Holley
a34df684f7 IMPALA-6371: Additional check for delimiters
The check validates the codepoint of the Java char.

Testing:
- Added tests for valid/invalid unicode in HdfsStorageDescriptorTest.

Change-Id: If8dc335d39dd02f602cf93682bccf84b2c099dde
Reviewed-on: http://gerrit.cloudera.org:8080/8959
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-10 22:31:32 +00:00
John Russell
3394eb1be7 [DOCS] Add phony doc build targets 'html' and 'pdf'
The better to do a quick verification using one format
or the other, by issuing 'make html' or 'make pdf'.
'make all' still builds both.

Change-Id: Ic096259a773966871b09a023bf12eb6c362167af
Reviewed-on: http://gerrit.cloudera.org:8080/8994
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
2018-01-10 21:13:56 +00:00
John Russell
409b58150a IMPALA-6278: [DOCS] Add release note subtopics
Primarily placeholders that link to the 2.11
CHANGELOG file on the web.

Change-Id: I968f53c6652197774cdec364c47bc10277e6877a
Reviewed-on: http://gerrit.cloudera.org:8080/8992
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-10 20:38:23 +00:00
Jinchul
6041865031 IMPALA-3651: Adds murmur_hash() built-in function
murmur_hash relies on HashUtil::MurmurHash2_64, which is the MurmurHash2
64-bit version.
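
The 64-bit MurmurHash2 variant is commonly published as MurmurHash64A. A minimal Python sketch of that public algorithm follows. The seed value, and whether HashUtil::MurmurHash2_64 matches this byte for byte, are assumptions and are not taken from the patch.

```python
MASK64 = (1 << 64) - 1
M = 0xC6A4A7935BD1E995  # multiplication constant of MurmurHash64A
R = 47                  # shift constant of MurmurHash64A


def murmur_hash64a(data: bytes, seed: int) -> int:
    """Return the unsigned 64-bit MurmurHash64A of 'data'."""
    h = (seed ^ (len(data) * M)) & MASK64
    full_blocks = len(data) // 8
    for i in range(full_blocks):
        # Each 8-byte block is read as a little-endian integer and mixed in.
        k = int.from_bytes(data[8 * i:8 * i + 8], "little")
        k = (k * M) & MASK64
        k ^= k >> R
        k = (k * M) & MASK64
        h ^= k
        h = (h * M) & MASK64
    tail = data[8 * full_blocks:]
    if tail:
        # The remaining 1 to 7 bytes are folded in as one little-endian value.
        h ^= int.from_bytes(tail, "little")
        h = (h * M) & MASK64
    h ^= h >> R
    h = (h * M) & MASK64
    h ^= h >> R
    return h
```

The function returns an unsigned value; a SQL BIGINT result would need it reinterpreted as a signed 64-bit integer.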

Testing:
Add unit tests for primitive types: ExprTest.MurmurHashFunction
Add E2E tests into exprs.test

Change-Id: I14d56ffb8fab256f3f66a2669271fd4b3c50cc29
Reviewed-on: http://gerrit.cloudera.org:8080/8893
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-10 20:17:26 +00:00
John Russell
31c6a1719a [DOCS] Recommend using Kudu Java API for rapid DMLs
Change-Id: I0098f0c3d5d07c89e6bb589c4c04edce300c1ad3
Reviewed-on: http://gerrit.cloudera.org:8080/8976
Reviewed-by: Jean-Daniel Cryans <jdcryans@apache.org>
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-10 18:42:04 +00:00
John Russell
1f4d687a9b IMPALA-5317: [DOCS] Doc for DATE_TRUNC() function
Change-Id: Ifcf38903bb10db12cbb8d73a2dc875aef29cd359
Reviewed-on: http://gerrit.cloudera.org:8080/8768
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-10 18:41:31 +00:00
Taras Bobrovytsky
c86b0a9736 IMPALA-5014: Part 2: Round when casting decimal to timestamp
When there are too many digits to the right of the dot in a decimal, we
would always truncate when casting to timestamp. In this patch we change
the behavior to round instead of truncating when decimal_v2 is enabled.
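
The difference between the two behaviors can be sketched with Python's decimal module. This assumes nanosecond precision (9 fractional digits) for the timestamp and half-up rounding. Both are assumptions made for illustration, and the function name is hypothetical.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

# Assumption: the target timestamp keeps 9 fractional digits (nanoseconds).
NANOS = Decimal("0.000000001")


def decimal_to_timestamp_seconds(value: Decimal, decimal_v2: bool) -> Decimal:
    """Reduce 'value' to nanosecond precision.

    With decimal_v2 the extra digits are rounded (half-up is assumed here);
    without it they are truncated.
    """
    mode = ROUND_HALF_UP if decimal_v2 else ROUND_DOWN
    return value.quantize(NANOS, rounding=mode)
```

For an input such as 1.9999999996, truncation keeps 1.999999999 while rounding carries over to 2.000000000.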

Testing:
- Added some EE tests, ran BE tests on my machine.

Change-Id: I8fb3a7d976ab980b8572d7e9524850572bad57da
Reviewed-on: http://gerrit.cloudera.org:8080/8969
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-10 05:47:23 +00:00
Xianda Ke
514dfaf9fd IMPALA-6128: Add support for AES-CTR encryption when spilling to disk
CFB mode is a stream cipher and is secure when used with a different nonce/IV
for every message. However, it can be a performance bottleneck.
CTR mode is also a stream cipher and is secure, and it is 4~6x faster than CFB
mode in OpenSSL. AES-CTR+SHA256 is about 40~70% faster than AES-CFB+SHA256.

CTR mode is used if the OpenSSL version is >= 1.0.1 at runtime; otherwise we
fall back to using CFB mode.
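
The runtime selection amounts to a version comparison. A minimal sketch, using a hypothetical helper name and a version tuple in the shape of Python's ssl.OPENSSL_VERSION_INFO:

```python
from typing import Sequence

# Minimum OpenSSL version for CTR mode, as stated in the commit message.
MIN_CTR_VERSION = (1, 0, 1)


def choose_spill_cipher_mode(openssl_version: Sequence[int]) -> str:
    """Return "CTR" when the version is at least 1.0.1, else "CFB".

    Only the first three components (major, minor, fix) are compared.
    """
    if tuple(openssl_version[:3]) >= MIN_CTR_VERSION:
        return "CTR"
    return "CFB"
```

With the standard library, the locally linked version could be passed as choose_spill_cipher_mode(ssl.OPENSSL_VERSION_INFO) after importing ssl.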

Testing:
Ran the runtime tests tmp-file-mgr-test, openssl-util-test, buffer-pool-test
and buffered-tuple-stream-test.
The unit test case openssl-util-test.EncryptInPlace tests encryption in both modes.

Change-Id: I9debc240615dd8cdbf00ec8730cff62ffef52aff
Reviewed-on: http://gerrit.cloudera.org:8080/8861
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-10 05:39:09 +00:00
Taras Bobrovytsky
f810458ca4 IMPALA-6231: Implement decimal_v2 fuzz test
Implement a test that generates random decimal numbers in the pytest
framework, performs a random mathematical operation in Impala and
verifies that the result is correct by doing the same operation using
the Python decimal module. We try to generate not only completely random
decimal numbers, but also numbers that have interesting properties, such
as the number being a power of two.
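
A minimal sketch of the idea, not the actual test: generate operands (sometimes powers of two), compute the expected value with the decimal module, and compare against an engine. The 'engine' callable is a hypothetical stand-in for running the expression in Impala, and the 38-digit maximum precision is an assumption.

```python
import operator
import random
from decimal import Decimal, localcontext

MAX_PRECISION = 38  # assumption: maximum number of decimal digits
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def random_decimal(rng: random.Random) -> Decimal:
    """Return a random decimal; about a quarter of them are powers of two."""
    if rng.random() < 0.25:
        return Decimal(2 ** rng.randint(0, 100))
    precision = rng.randint(1, MAX_PRECISION)
    scale = rng.randint(0, precision)
    unscaled = rng.randrange(10 ** precision)
    digits = tuple(int(d) for d in str(unscaled))
    # The tuple constructor is exact and ignores the context precision.
    return Decimal((rng.randint(0, 1), digits, -scale))


def expected_result(op: str, left: Decimal, right: Decimal) -> Decimal:
    """Compute the reference result with enough precision to be exact."""
    with localcontext() as ctx:
        ctx.prec = 2 * MAX_PRECISION + 1
        return OPS[op](left, right)


def fuzz_once(rng: random.Random, engine) -> bool:
    """Run one random operation through 'engine' and check the result."""
    op = rng.choice(sorted(OPS))
    left, right = random_decimal(rng), random_decimal(rng)
    return engine(op, left, right) == expected_result(op, left, right)
```

Seeding the random.Random instance makes a failing case reproducible.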

Change-Id: I4328125de5c583ec8ead1f78d9a08703b18b2d85
Reviewed-on: http://gerrit.cloudera.org:8080/8898
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Zach Amsden <zamsden@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-10 03:03:52 +00:00
Jinchul
99962d2e81 IMPALA-4168: Adds Oracle-style hint placement for INSERT/UPSERT
Allow specifying Oracle-style hints on INSERT/UPSERT statements. For example,
- insert /* +noshuffle */ into table functional.alltypes partition(year,
month) select * from functional.alltypes;
- upsert /* +noshuffle */ into functional_kudu.alltypes select * from
functional.alltypes;

Testing:
Add unit tests to ParserTest#TestPlanHints
Add plan check tests to PlannerTest#testInsert, PlannerTest#testKuduUpsert
Add tests to ToSqlTest#planHintsTest

Change-Id: Ied7629d70197a0270cdc0853e00cc021fdb4dc20
Reviewed-on: http://gerrit.cloudera.org:8080/8676
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-10 03:03:49 +00:00
aphadke
38461c524f IMPALA-5052: Read and write signed integer logical types in Parquet
This patch maps a signed integer logical type in parquet to a supported
Impala column type. This change introduces the following mapping -

  INT_8  -> TINYINT
  INT_16 -> SMALLINT
  INT_32 -> INT
  INT_64 -> BIGINT
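
The mapping above can be written as a simple lookup table. This is an illustrative sketch only; the names are hypothetical and do not come from the patch.

```python
from typing import Optional

# Signed integer logical types in Parquet and the Impala column types
# they map to, as listed in the commit message.
SIGNED_INT_LOGICAL_TO_IMPALA = {
    "INT_8": "TINYINT",
    "INT_16": "SMALLINT",
    "INT_32": "INT",
    "INT_64": "BIGINT",
}


def impala_type_for(logical_type: str) -> Optional[str]:
    """Return the Impala column type, or None for an unmapped logical type."""
    return SIGNED_INT_LOGICAL_TO_IMPALA.get(logical_type)
```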

Also, added a parquet file with the following schema for testing -

  schema {
    optional int32 id;
    optional int32 tinyint_col (INT_8);
    optional int32 smallint_col (INT_16);
    optional int32 int_col;
    optional int64 bigint_col;
  }

Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Reviewed-on: http://gerrit.cloudera.org:8080/8548
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Tianyi Wang <twang@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-09 04:55:59 +00:00
Tianyi Wang
c4d950b9e9 IMPALA-3887: Wait for HDFS replication in data loading
When the data loading finishes, it is possible for some HDFS blocks to
be under-replicated. If Impala gets the metadata before the replication
is done, some tests may fail. This patch adds a replication waiting step
in the data loading script.
Resubmitted with filesystem type check.

Change-Id: I64d9a8ea1d0a32b40047321b50a7139a8f48eac8
Reviewed-on: http://gerrit.cloudera.org:8080/8916
Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-09 03:24:36 +00:00
Bharath Vissapragada
6a87eb20a5 IMPALA-6348: Redact only sensitive fields in runtime profiles
Without this patch, redaction is applied to every field in the
runtime profile. This approach has an undesired side effect when
Kerberos auth + email redaction is in place.

Since the redaction applies to every field, even principals
(from Connected/Delegated User fields) are redacted, as the Kerberos
principal format generally pattern matches with an email redactor
template.

This is particularly problematic for monitoring tools that consume
runtime profiles and use these fields to group the queries by user.

This patch fixes the problem by redacting only the following sensitive
fields.

- Query Statement
- Error logs (since they can contain column references etc.)
- Query Status
- Query Plan

Other fields in the runtime profile are left unredacted.
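
The effect of restricting redaction to a fixed set of fields can be sketched as follows. The email pattern, the placeholder text, and the dict-based profile are assumptions made for illustration and do not reflect the actual redaction rules or profile format.

```python
import re
from typing import Dict

# Hypothetical email-style redaction rule.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Only these fields are redacted, following the list in the commit message.
SENSITIVE_FIELDS = {"Query Statement", "Error logs", "Query Status", "Query Plan"}


def redact_profile(profile: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of 'profile' with only the sensitive fields redacted."""
    return {
        name: EMAIL_PATTERN.sub("<redacted>", value)
        if name in SENSITIVE_FIELDS
        else value
        for name, value in profile.items()
    }
```

A Kerberos principal in a user field looks like an email address to the pattern, but it is left intact because that field is not in the sensitive set.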

Change-Id: Iae3b6726009bf458a7ec73131e5d659b12ab73cf
Reviewed-on: http://gerrit.cloudera.org:8080/8934
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-06 22:54:17 +00:00
Zoltan Borok-Nagy
ce65b43d47 IMPALA-2248: Make idle_session_timeout a query option
This commit makes idle_session_timeout a query option.

idle_session_timeout currently can be set as a command line
option, which will be the default timeout for sessions.
HS2 sessions can override it with a smaller value by setting
it in the configuration overlay of HS2 OpenSession().

However, we can't override idle_session_timeout for JDBC/ODBC
connections, because we cannot put this in the connection string.

This commit is a workaround for this problem: it allows JDBC/ODBC
connections to set the session timeout as a query option
with the SET statement.

After this commit, the session timeout can be overridden to
any value, i.e. the command line flag idle_session_timeout
doesn't limit this option anymore.

I created an automated test case in JdbcTest.java based on
test_hs2.py::test_concurrent_session_mixed_idle_timeout. I also
extended the test_session_expiration and test_set_and_unset
test suites.

Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e
Reviewed-on: http://gerrit.cloudera.org:8080/8490
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-06 01:47:47 +00:00
Pranay
302ec25b2e IMPALA-5522: Use tracked memory for DictDecoder and DictEncoder
Currently the DictDecoder and DictEncoder classes use std::vector
to store the tables mapping codeword to value and vice versa. It is
hard to detect the memory usage of these tables when they become
very large, since this memory is not accounted for by Impala's memory
management infrastructure.

This patch uses the memory tracker of HdfsScanner to track the memory used
by the dictionary in the DictDecoder class. Similarly, it uses the memory tracker
of HdfsTableSink to track the memory used by the dictionary in the DictEncoder class.

Memory for the dictionary, stored as a std::vector, is still allocated
from std::allocator, but the amount allocated is accounted for by
introducing a counter which is incremented and decremented as the
memory is consumed and released by the vector.
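
The accounting scheme can be sketched as a container that reports its growth and shrinkage to a tracker while the storage itself is allocated as usual. All names below (SimpleMemTracker, TrackedDictionary) are hypothetical and are not the classes touched by the patch.

```python
from typing import List


class SimpleMemTracker:
    """Hypothetical tracker that only keeps a running byte count."""

    def __init__(self) -> None:
        self.consumption = 0

    def consume(self, num_bytes: int) -> None:
        self.consumption += num_bytes

    def release(self, num_bytes: int) -> None:
        self.consumption -= num_bytes


class TrackedDictionary:
    """Dictionary table whose size is reported to a tracker.

    The list is allocated normally; only the accounting is added.
    """

    def __init__(self, tracker: SimpleMemTracker, value_size: int) -> None:
        self._tracker = tracker
        self._value_size = value_size
        self._values: List[int] = []

    def add(self, value: int) -> int:
        """Append a value, account for it, and return its codeword (index)."""
        self._values.append(value)
        self._tracker.consume(self._value_size)
        return len(self._values) - 1

    def clear(self) -> None:
        """Drop all values and release the bytes that were accounted."""
        self._tracker.release(len(self._values) * self._value_size)
        self._values.clear()
```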

Testing
-------
Ran all the backend and end-end tests with no failures.

Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Reviewed-on: http://gerrit.cloudera.org:8080/8034
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-06 01:30:36 +00:00