impala

mirror of https://github.com/apache/impala.git synced 2025-12-30 03:01:44 -05:00

Author	SHA1	Message	Date
John Russell	4afabd4e31	IMPALA-5310: [DOCS] Reserve 'repeatable' keyword from TABLESAMPLE clause Overlooked the new keyword when the clause was originally introduced. Change-Id: Ie8e6713fb97ced279f0aedfe8f42c09a7e6edae9 Reviewed-on: http://gerrit.cloudera.org:8080/9066 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-19 21:26:12 +00:00
Vuk Ercegovac	db98dc6504	IMPALA-4993: extend dictionary filtering to collections Currently, top-level scalar columns in parquet files can be used at runtime to prune row-groups by evaluating certain conjuncts over the column's dictionary (if available). This change extends such pruning to scalar values that are stored in collection type columns. Currently, dictionary pruning works by finding eligible conjuncts for top-level slots. Since only top-level slots are supported, the slots are implicitly part of the scan node's tuple descriptor. With this change, we track eligible conjuncts by slot as well as the tuple that contains the slot (either top-level or nested collection). Since collection conjuncts are already managed by a map that associates tuple descriptors to a list of their conjuncts, this extension follows the existing representation. The frontend builds the mapping of SlotId to conjuncts that are dictionary filterable. This mapping now includes SlotId's that reference nested tuples. The backend is adjusted to use the same representation. In addition, collection readers are decomposed into scalar filterable columns and other, non-dictionary filterable readers. When filtering a row group using a conjunct associated to a (possibly) nested collection type, an additional tuple buffer is allocated per tuple descriptor. Testing: - e2e test extended to illustrate row-groups that are pruned by nested collection dictionary filters. Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd Reviewed-on: http://gerrit.cloudera.org:8080/8775 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-19 20:37:25 +00:00
Tim Armstrong	579e33207b	IMPALA-6368: make test_chars parallel Previously it had to be executed serially because it modified tables in the functional database. This change separates out tests that use temporary tables and runs those in a unique_database. Testing: Ran locally in a loop with parallelism of 4 for a while. Change-Id: I2f62ede90f619b8cebbb1276bab903e7555d9744 Reviewed-on: http://gerrit.cloudera.org:8080/9022 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-19 09:55:52 +00:00
Dimitris Tsirogiannis	3f00d10e1b	IMPALA-4886: Expose table metrics in the catalog web UI. The following changes are included in this commit: * Adds a lightweight framework for registering metrics in the JVM. * Adds table-level metrics and enables these metrics to be exposed through the catalog web UI. * Adds a CatalogUsageMonitor class that monitors and reports the catalog usage in terms of the tables with the highest memory requirements and the tables with the highest number of metadata operations. The catalog usage information is exposed in the /catalog page of the catalog web UI. Change-Id: I37d407979e6d3b1a444b6b6265900b148facde9e Reviewed-on: http://gerrit.cloudera.org:8080/8529 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-19 09:25:01 +00:00
Sailesh Mukil	d8ae8801ae	IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing On systems that have Kerberos 1.11 or earlier, service principals with IP addresses are not supported due to a bug: http://krbdev.mit.edu/rt/Ticket/Display.html?id=7603 Since our BE tests use such principals, they fail on older platforms with the above-mentioned Kerberos versions. Kudu fixed this by adding a workaround which overrides krb5_realm_override. `ba2ae3de4a` However, when we moved Kudu's security library into Impala, we did not add the appropriate build flags that allow it to be used. This patch fixes that. Testing: Verified that the failing test runs successfully on CentOS 6.4 with Kerberos 1.10.3 Change-Id: I60e291e8aa1b59b645b856d33c658471f314c221 Reviewed-on: http://gerrit.cloudera.org:8080/9006 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-19 01:21:45 +00:00
Michael Ho	e714f2b33c	IMPALA-2397: Use atomics for IntGauge and IntCounter This change removes the spinlock in IntGauge and IntCounter and uses AtomicInt64 instead. As shown in IMPALA-2397, multiple threads can be contending for the spinlocks of some global metrics under concurrent queries. In this change, SimpleMetric is also renamed to ScalarMetric and broken into two subclasses: - LockedMetric: - a value store for any primitive type (int, float, string, etc.). - atomic read and write via GetValue() and SetValue() respectively. - AtomicMetric: - the basis of IntGauge and IntCounter. Supports atomic increment of the metric value via the Increment() interface. - atomic read and write via GetValue() and SetValue() respectively. - only supports the int64_t type. Change-Id: I48dfa5443cd771916b53541a0ffeaf1bcc7e7606 Reviewed-on: http://gerrit.cloudera.org:8080/9012 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-18 23:31:52 +00:00
Michael Ho	b3d38b5c86	Revert "IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC" This reverts commit `df3a440fff`. Apparently, linking Impalad against GPerfTools 2.6.3 caused Impalad to fail on certain platforms (OLE6). The failure's symptom is SIGSEGV when trying to exec the Impalad binary. It's unclear which commit in GPerfTools could have caused it, so backing out this change to allow Impala to unbreak some platforms for now. Change-Id: I97cccca74fb199d6ff0a42fe818f8789a0d66e83 Reviewed-on: http://gerrit.cloudera.org:8080/9057 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-18 23:25:09 +00:00
Bikramjeet Vig	028a83e654	IMPALA-6382: Cap spillable buffer size and max row size query options Currently the default and min spillable buffer size and max row size query options accept any valid int64 value. Since the planner depends on these values for memory estimations, if a very large value close to the limits of int64 is set, the variables representing or relying on these estimates can overflow during different phases of query execution. This patch puts a reasonable upper limit of 1TB to these query options to prevent such a situation. Testing: Added backend query option tests. Change-Id: I36d3915f7019b13c3eb06f08bfdb38c71ec864f1 Reviewed-on: http://gerrit.cloudera.org:8080/9023 Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-18 23:08:26 +00:00
John Russell	ca7d03cfe9	[DOCS] Minor editorial change Turn "royal we" into imperative statement. Change-Id: Ib78e851761796a1751e6adaaffa049b1fbb58b88 Reviewed-on: http://gerrit.cloudera.org:8080/9064 Reviewed-by: Alex Rodoni <arodoni@cloudera.com> Reviewed-by: John Russell <jrussell@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-18 21:53:40 +00:00
Tim Armstrong	f5d73f5e76	IMPALA-6419: Revert "IMPALA-6383: free memory after skipping parquet row groups" This reverts commit `10fb24afb9`. Change-Id: I4dd62380d02b61ca46f856b4eb40670b71e28140 Reviewed-on: http://gerrit.cloudera.org:8080/9054 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-18 21:25:28 +00:00
Taras Bobrovytsky	35a3e186d6	IMPALA-5478: Run TPCDS queries with decimal_v2 enabled We add new TPCDS .test files that are expected to be run with decimal_v2 enabled. The new expected results were generated using Impala and I inspected them manually. Change-Id: Ib867c51a521ec4a087bc127d99aee4b95ba97733 Reviewed-on: http://gerrit.cloudera.org:8080/8985 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-18 03:28:51 +00:00
Joe McDonnell	d9b6fd0730	IMPALA-6386: Invalidate metadata at table level for dataload Dataload currently executes bin/load-data.py for TPC-H, TPC-DS, and functional-query concurrently. One of the final steps for bin/load-data.py is to run a global "invalidate metadata". Global "invalidate metadata" commands are known to cause problems on concurrent systems. See IMPALA-5087. For dataload, if TPC-H executes "invalidate metadata" while TPC-DS is still creating tables and adding partitions, the TPC-DS executor might erroneously believe that a table does not exist. This changes dataload to invalidate metadata at an individual table level rather than globally. This prevents the concurrency issue. This also changes the names of some of the intermediate SQL files generated by generate-schema-statements.py and consumed by load-data.py to make them less confusing. Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f Reviewed-on: http://gerrit.cloudera.org:8080/9009 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-17 22:52:58 +00:00
Csaba Ringhofer	dcc7be0ed4	IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges USE and SHOW TABLES should be allowed if there is at least one table in a database where the user has table or column privileges. Impala incorrectly checked only for table privileges. To test this issue in AuthorizationTest.java, 'functional_avro' is added as a test database with only column level permissions. Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8 Reviewed-on: http://gerrit.cloudera.org:8080/8973 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-17 22:40:13 +00:00
Lars Volker	b6e43133e6	IMPALA-6399: Increase timeout in test_observability to reduce flakiness Change-Id: I58f7e7b367e73675be42e85f55fd7698d51f92af Reviewed-on: http://gerrit.cloudera.org:8080/9034 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Lars Volker <lv@cloudera.com>	2018-01-17 22:31:33 +00:00
Tianyi Wang	6cc76d7201	IMPALA-6353: Fix crash in snappy decompressor SnappyDecompressor::MaxOutputLen assumes the input pointer to be non-null. It's not true when the parquet file is corrupted and the compressed_page_size field in a page header is 0. This patch handles this error instead of failing a DCHECK. Testing: A bad parquet file with 0 compressed_page_size is added. It crashes impala without this patch. Change-Id: I0d42937aab92a74f8e104d2f7fcd64dc24f6a500 Reviewed-on: http://gerrit.cloudera.org:8080/8977 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-17 04:18:24 +00:00
Taras Bobrovytsky	f8b406222d	IMPALA-6388: Fix the Union node number of hosts estimation Before this patch, we would estimate the number of hosts for the union node by looking only at the first union operand. This is obviously incorrect and led us to underestimate the value. We fix the problem by setting the estimate to be the maximum of its children. Testing: - Added a planner test that reproduces the issue Change-Id: I51e1ecca8dbc84b2b5a72708667b2799d00279f0 Reviewed-on: http://gerrit.cloudera.org:8080/9017 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-17 01:41:16 +00:00
Adam Holley	4c43cace87	IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" command Examples of new command: ALTER TABLE t1 SET ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002'; ALTER TABLE t1 SET ROW FORMAT DELIMITED LINES TERMINATED BY '\001'; Testing: Added parser tests and unit tests for alter statements including partition options. Change-Id: I96e347463504915a6f33932552e4d1f61e9b1154 Reviewed-on: http://gerrit.cloudera.org:8080/8928 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-16 23:58:24 +00:00
Dimitris Tsirogiannis	3fc42ded02	IMPALA-5058: Improve the concurrency of DDL/DML operations Problem: A long running table metadata operation (e.g. refresh) could prevent any other metadata operation from making progress if it coincided with the catalog topic creation operations. The problem was due to the conservative locking scheme used when catalog topics were created. In particular, in order to collect a consistent snapshot of metadata changes, the global catalog lock was held for the entire duration of that operation. Solution: To improve the concurrency of catalog operations the following changes are performed: * A range of catalog versions determines the catalog changes to be included in a catalog update. Any catalog changes that do not fall in the specified range are ignored (to be processed in subsequent catalog topic updates). * The catalog allows metadata operations to make progress while collecting catalog updates. * To prevent starvation of catalog updates (i.e. frequently updated catalog objects skipping catalog updates indefinitely), we keep track of the number of times a catalog object has skipped an update and if that number exceeds a threshold it is included in the next catalog topic update even if its version is not in the specified topic update version range. Hence, the same catalog object may be sent in two consecutive catalog topic updates. This commit also changes the way deletions are handled in the catalog and disseminated to the impalad nodes through the statestore. In particular: * Deletions in the catalog are explicitly recorded in a log with the catalog version in which they occurred. As before, added and deleted catalog objects are sent to the statestore. * Topic entries associated with deleted catalog objects have non-empty values (besides keys) that contain minimal object metadata including the catalog version. * Statestore no longer uses the presence or absence of topic entry values to identify deleted topic entries. 
Deleted topic entries should be explicitly marked as such by the statestore subscribers that produce them. * Statestore subscribers now use the 'deleted' flag to determine if a topic entry corresponds to a deleted item. * Impalads use the deleted objects' catalog versions when updating the local catalog cache from a catalog update and not the update's maximum catalog version. Testing: - No new tests were added as these paths are already exercised by existing tests. - Run all core and exhaustive tests. Change-Id: If12467a83acaeca6a127491d89291dedba91a35a Reviewed-on: http://gerrit.cloudera.org:8080/7731 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Impala Public Jenkins Reviewed-on: http://gerrit.cloudera.org:8080/8752	2018-01-16 23:01:32 +00:00
Lars Volker	888a16cad5	KUDU-2256: Add GetTransferSize() to RpcContext This change adds GetTransferSize() to RpcContext to retrieve the payload size of the inbound call. This makes it easier to track the memory of incoming RPCs in the handler methods. To test this I added a CHECK to one of the handler methods in CalculatorService. Change-Id: Iab2519bad1815aeccaa119f1605638bfd3604382 Reviewed-on: http://gerrit.cloudera.org:8080/8998 Tested-by: Kudu Jenkins Reviewed-by: Todd Lipcon <todd@apache.org> Reviewed-on: http://gerrit.cloudera.org:8080/9019 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-16 21:43:46 +00:00
Zoram Thanga	91d109d6e9	IMPALA-6307: CTAS statement fails with duplicate column exception. A CTAS statement with a 'partition by' clause causes the statement to fail with a duplicate column name exception. This is happening because on expression rewrite, the partition defs state is not reset. IMPALA-5796 added TableDef::reset(). This patch expands the method by adding calls to reset ColumnDefs and PartitionColumnDefs. Testing: * Regression test added to AnalyzeDDLTest. * Exhaustive Jenkins build and test. Change-Id: Iee053abecd4384e15eec8db10cb06f5ace159da2 Reviewed-on: http://gerrit.cloudera.org:8080/8930 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-13 03:44:43 +00:00
Tim Armstrong	6bff0bd766	IMPALA-6363: avoid cscope build races Use the -ignore_readdir_race flag for find so that find doesn't fail if a directory disappears under it. From what I could tell the flag has been in GNU find for a long time and is also available in other OS flavours like BSD and OS X. Make the step depend on gen-deps so that it can index thrift, protobuf, etc, output. Change-Id: I22bdb7c64036cb88a8a10907af35c5e3a55a9195 Reviewed-on: http://gerrit.cloudera.org:8080/9007 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-13 03:24:06 +00:00
Tim Armstrong	10fb24afb9	IMPALA-6383: free memory after skipping parquet row groups Before this patch, resources were only flushed after breaking out of NextRowGroup(). This is a problem because resources can be allocated for skipped row groups (e.g. for reading dictionaries). Testing: Tested in conjunction with a prototype buffer pool patch that was DCHECKing before the change. Added DCHECKs to the current version to ensure the streams are cleared up as expected. Change-Id: Ibc2f8f27c9b238be60261539f8d4be2facb57a2b Reviewed-on: http://gerrit.cloudera.org:8080/9002 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-13 02:48:08 +00:00
Michael Ho	df3a440fff	IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC KRPC in general tends to put more pressure on the thread caches due to allocations of more small objects (i.e. <1MB). While some of them are being addressed in KUDU-1865, it's shown that the following TCMalloc workarounds will provide reasonable performance with KRPC: - TCMALLOC_TRANSFER_NUM_OBJ: - maximum number of objects per class type to transfer between thread and central caches. - the default value of 512 in 2.5.2 seems to cause the spin lock in the central cache to be held for too long with KRPC. 2.6.0 and later revert this value to 32 by default. - TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES - total amount of memory allocated to all thread caches in bytes - the default value is 32MB. We need to bump it to 1GB which is the internal cap in TCMalloc. This change upgrades GPerfTools/TCMalloc to 2.6.3 to pick up the change of the default value of TCMALLOC_TRANSFER_NUM_OBJ. In addition, when KRPC is enabled and FLAGS_TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES has the default value of 0, we will automatically bump the thread cache sizes to 1GB. Without these workarounds, stress test with KRPC will grind to a halt due to contention for the spinlock in TCMalloc's central cache. With these workarounds, the stress test completes within the same ballpark as thrift. Also did a perf run with Thrift. The regression in TPCH-Q2 is mostly due to sensitivity in runtime filter timing and the avg can be dragged up due to a bad run when filters arrive late. No regression as measured in targeted-perf. 
+------------+-----------------------+---------+------------+------------+----------------+
| Workload   | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+------------+-----------------------+---------+------------+------------+----------------+
| TPCH(_300) | parquet / none / none | 18.93   | -0.84%     | 10.08      | +1.45%         |
+------------+-----------------------+---------+------------+------------+----------------+

+------------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
| Workload   | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
+------------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
| TPCH(_300) | TPCH-Q2  | parquet / none / none | 6.28   | 3.25        | R +93.41%  | * 49.77% * | * 12.47% *     | 1           | 3     |
| TPCH(_300) | TPCH-Q4  | parquet / none / none | 5.00   | 4.77        | +4.83%     | 0.41%      | 0.03%          | 1           | 3     |
| TPCH(_300) | TPCH-Q13 | parquet / none / none | 21.29  | 20.69       | +2.90%     | 0.55%      | 0.37%          | 1           | 3     |
| TPCH(_300) | TPCH-Q11 | parquet / none / none | 1.73   | 1.71        | +0.94%     | 1.69%      | 2.85%          | 1           | 3     |
| TPCH(_300) | TPCH-Q14 | parquet / none / none | 6.03   | 5.99        | +0.76%     | 0.00%      | 0.95%          | 1           | 3     |
| TPCH(_300) | TPCH-Q16 | parquet / none / none | 6.97   | 6.93        | +0.58%     | 0.74%      | 0.73%          | 1           | 3     |
| TPCH(_300) | TPCH-Q3  | parquet / none / none | 29.15  | 29.03       | +0.40%     | 1.63%      | 1.39%          | 1           | 3     |
| TPCH(_300) | TPCH-Q1  | parquet / none / none | 14.01  | 13.96       | +0.34%     | 1.28%      | 0.51%          | 1           | 3     |
| TPCH(_300) | TPCH-Q6  | parquet / none / none | 1.27   | 1.27        | -0.03%     | 3.69%      | 0.07%          | 1           | 3     |
| TPCH(_300) | TPCH-Q9  | parquet / none / none | 30.99  | 31.13       | -0.45%     | 0.54%      | 0.19%          | 1           | 3     |
| TPCH(_300) | TPCH-Q5  | parquet / none / none | 48.03  | 48.33       | -0.63%     | 4.72%      | 0.11%          | 1           | 3     |
| TPCH(_300) | TPCH-Q7  | parquet / none / none | 46.85  | 47.41       | -1.18%     | 1.59%      | 0.46%          | 1           | 3     |
| TPCH(_300) | TPCH-Q8  | parquet / none / none | 7.92   | 8.03        | -1.39%     | 3.67%      | 5.63%          | 1           | 3     |
| TPCH(_300) | TPCH-Q19 | parquet / none / none | 30.98  | 31.51       | -1.67%     | 1.33%      | 0.82%          | 1           | 3     |
| TPCH(_300) | TPCH-Q18 | parquet / none / none | 33.55  | 34.13       | -1.71%     | 1.15%      | 1.46%          | 1           | 3     |
| TPCH(_300) | TPCH-Q10 | parquet / none / none | 9.46   | 9.64        | -1.82%     | 0.63%      | 0.75%          | 1           | 3     |
| TPCH(_300) | TPCH-Q22 | parquet / none / none | 6.00   | 6.16        | -2.58%     | 0.08%      | 5.12%          | 1           | 3     |
| TPCH(_300) | TPCH-Q15 | parquet / none / none | 3.41   | 3.50        | -2.60%     | 1.40%      | 0.46%          | 1           | 3     |
| TPCH(_300) | TPCH-Q12 | parquet / none / none | 3.24   | 3.33        | -2.86%     | 1.36%      | 1.55%          | 1           | 3     |
| TPCH(_300) | TPCH-Q17 | parquet / none / none | 4.65   | 4.83        | -3.58%     | 1.17%      | 0.42%          | 1           | 3     |
| TPCH(_300) | TPCH-Q21 | parquet / none / none | 96.15  | 100.63      | -4.45%     | 0.29%      | 3.18%          | 1           | 3     |
| TPCH(_300) | TPCH-Q20 | parquet / none / none | 3.40   | 3.64        | -6.63%     | 4.82%      | * 12.70% *     | 1           | 3     |
+------------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+

+---------------------+-----------------------+---------+------------+------------+----------------+
| Workload            | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+---------------------+-----------------------+---------+------------+------------+----------------+
| TARGETED-PERF(_300) | parquet / none / none | 59.31   | -1.40%     | 8.80       | -2.24%         |
+---------------------+-----------------------+---------+------------+------------+----------------+

+---------------------+--------------------------------------------------------+-----------------------+---------+-------------+------------+------------+----------------+-------------+-------+
| Workload            | Query                                                  | File Format           | Avg(s)  | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
+---------------------+--------------------------------------------------------+-----------------------+---------+-------------+------------+------------+----------------+-------------+-------+
| TARGETED-PERF(_300) | primitive_conjunct_ordering_2                          | parquet / none / none | 36.27   | 30.52       | +18.87%    | * 17.02% * | 2.42%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_broadcast_join_1                             | parquet / none / none | 1.17    | 1.02        | +14.59%    | * 12.82% * | 0.02%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_filter_bigint_in_list                        | parquet / none / none | 1.03    | 0.92        | +11.54%    | 2.51%      | 2.53%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_filter_bigint_selective                      | parquet / none / none | 0.37    | 0.34        | +6.93%     | 7.94%      | 1.32%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_filter_string_selective                      | parquet / none / none | 0.47    | 0.44        | +5.97%     | 6.11%      | 1.12%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_groupby_bigint_highndv                       | parquet / none / none | 24.35   | 23.87       | +1.99%     | 0.82%      | 0.63%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_many_fragments                               | parquet / none / none | 61.64   | 60.93       | +1.17%     | 3.25%      | 1.07%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_top-n_all                                    | parquet / none / none | 36.63   | 36.31       | +0.87%     | 0.35%      | 4.62%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_conjunct_ordering_4                          | parquet / none / none | 0.87    | 0.86        | +0.66%     | 0.47%      | 0.03%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_long_predicate                               | parquet / none / none | 28.83   | 28.69       | +0.49%     | 0.24%      | 0.10%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_topn_bigint                                  | parquet / none / none | 5.53    | 5.51        | +0.34%     | 2.23%      | 0.07%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_broadcast_join_3                             | parquet / none / none | 58.41   | 58.27       | +0.24%     | 2.51%      | 0.02%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_decimal_arithmetic                           | parquet / none / none | 96.67   | 96.59       | +0.09%     | 0.41%      | 0.08%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_count_star                                   | parquet / none / none | 0.09    | 0.09        | +0.03%     | 0.72%      | 0.73%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_groupby_decimal_lowndv.test                  | parquet / none / none | 3.26    | 3.26        | -0.00%     | 0.09%      | 1.57%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_broadcast_join_2                             | parquet / none / none | 4.40    | 4.41        | -0.10%     | 0.01%      | 1.24%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_shuffle_join_union_all_with_groupby          | parquet / none / none | 67.43   | 67.58       | -0.21%     | 0.31%      | 0.38%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_intrinsic_appx_median                        | parquet / none / none | 35.14   | 35.27       | -0.34%     | 0.39%      | 0.47%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_groupby_decimal_highndv                      | parquet / none / none | 25.94   | 26.07       | -0.51%     | 5.33%      | 2.30%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_exchange_broadcast                           | parquet / none / none | 76.54   | 76.96       | -0.54%     | 4.50%      | 4.70%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_filter_string_like                           | parquet / none / none | 5.52    | 5.56        | -0.68%     | 0.92%      | 3.17%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_many_independent_fragments                   | parquet / none / none | 254.76  | 256.50      | -0.68%     | 5.95%      | 2.19%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_shuffle_join_one_to_many_string_with_groupby | parquet / none / none | 228.43  | 230.08      | -0.72%     | 0.62%      | 1.34%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_empty_build_join_1                           | parquet / none / none | 1.90    | 1.92        | -1.26%     | 1.19%      | 2.74%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_exchange_shuffle                             | parquet / none / none | 78.99   | 80.26       | -1.59%     | 0.75%      | 1.61%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_shuffle_1mb_rows                             | parquet / none / none | 1008.91 | 1027.39     | -1.80%     | 2.33%      | 0.72%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_groupby_bigint_pk                            | parquet / none / none | 96.58   | 98.62       | -2.07%     | 1.08%      | 1.98%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_groupby_bigint_lowndv                        | parquet / none / none | 3.26    | 3.33        | -2.10%     | 3.00%      | 0.06%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_small_join_1                                 | parquet / none / none | 0.42    | 0.43        | -2.54%     | 0.23%      | 1.54%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_filter_string_non_selective                  | parquet / none / none | 0.90    | 0.93        | -2.54%     | 0.18%      | 2.45%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_intrinsic_to_date                            | parquet / none / none | 77.56   | 79.81       | -2.82%     | 0.39%      | 2.79%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_filter_decimal_non_selective                 | parquet / none / none | 0.80    | 0.83        | -3.56%     | 0.12%      | 2.68%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_filter_bigint_non_selective                  | parquet / none / none | 1.00    | 1.05        | -4.60%     | 0.31%      | 5.18%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_conjunct_ordering_1                          | parquet / none / none | 4.91    | 5.16        | -4.89%     | 0.44%      | 0.41%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_orderby_all                                  | parquet / none / none | 54.67   | 58.30       | -6.23%     | 0.45%      | 0.81%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_conjunct_ordering_5                          | parquet / none / none | 11.91   | 12.70       | -6.24%     | 1.04%      | 0.53%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_filter_decimal_selective                     | parquet / none / none | 0.86    | 0.94        | -8.58%     | * 24.14% * | * 36.04% *     | 1           | 3     |
| TARGETED-PERF(_300) | primitive_orderby_bigint                               | parquet / none / none | 15.06   | 16.57       | -9.14%     | 3.50%      | 0.10%          | 1           | 3     |
| TARGETED-PERF(_300) | primitive_filter_in_predicate                          | parquet / none / none | 1.11    | 1.24        | -10.09%    | * 11.28% * | * 12.71% *     | 1           | 3     |
| TARGETED-PERF(_300) | primitive_orderby_bigint_expression                    | parquet / none / none | 18.16   | 24.86       | I -26.97%  | 0.96%      | * 20.73% *     | 1           | 3     |
| TARGETED-PERF(_300) | primitive_conjunct_ordering_3                          | parquet / none / none | 0.94    | 1.71        | I -44.74%  | 2.68%      | * 42.73% *     | 1           | 3     |
+---------------------+--------------------------------------------------------+-----------------------+---------+-------------+------------+------------+----------------+-------------+-------+

Change-Id: I5be574435af51fb7a875b16888cca260b341190e Reviewed-on: http://gerrit.cloudera.org:8080/8991 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-12 22:35:13 +00:00
John Russell	ceeb130c5d	IMPALA-2172, IMPALA-6391: [DOCS] Distinguish char_length() from length() Modify both char_length() and length() usage notes to say when they return the same or different results. Include the same example, showing both STRING and CHAR types, under both functions. Change-Id: I18cabfce66351bb890bfbfc26b93466204a82625 Reviewed-on: http://gerrit.cloudera.org:8080/9014 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-12 22:24:34 +00:00
Jim Apple	717cb9d172	Update copyright date to 2018. Change-Id: I8b55f6cd8a94197f48affad2b623af021e66d1df Reviewed-on: http://gerrit.cloudera.org:8080/8925 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2018-01-12 21:38:38 +00:00
John Russell	b5e2f338ab	IMPALA-5736: [DOCS] Document --query_option for impala-shell Change-Id: I5fa4fc27d6566e87fdabe57edc176133d586a84b Reviewed-on: http://gerrit.cloudera.org:8080/8771 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-12 20:39:52 +00:00
John Russell	e0c9930037	IMPALA-2181: [DOCS] Document changes to SET output Change-Id: Iade7cb326715ebbb8518230d518d05601d615f61 Reviewed-on: http://gerrit.cloudera.org:8080/8865 Reviewed-by: John Russell <jrussell@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-12 20:11:20 +00:00
Lars Volker	6dc7237fc1	IMPALA-6387: Increase wait for Breakpad crash handling It seems that a recent slowdown of our test infrastructure might have caused Breakpad to take a longer time to write Minidumps. There could also be a more fundamental issue leading to hangs. To rule this out, this change increases the default timeout to something larger to allow the tests to complete. Change-Id: I84742be9af9444607fde4baf8ea1c0092ff181fe Reviewed-on: http://gerrit.cloudera.org:8080/9018 Tested-by: Lars Volker <lv@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2018-01-12 17:22:56 +00:00
John Russell	5842c8406d	IMPALA-1767: [DOCS] Document new Boolean operators In a new subtopic: IS [NOT] TRUE IS [NOT] FALSE Folded into IS [NOT] NULL: IS [NOT] UNKNOWN Change-Id: Iefebf210418ec2d47b154bd37166b76720f085bb Reviewed-on: http://gerrit.cloudera.org:8080/8942 Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-12 09:04:48 +00:00
Tim Armstrong	fd5c3a7e18	IMPALA-6290: limit ScannerContext to 1 buffer at a time This is a prerequisite for constraining the number of buffers per scan range. Before this patch, calling ReadBytes(), SkipBytes(), etc could cause an arbitrary number of I/O buffers to accumulate in 'completed_io_buffers_'. E.g. if we allocated 3 * 8MB I/O buffers for a range and then called ReadBytes(30MB), we would hit resource exhaustion as soon as 3 buffers were accumulated in 'completed_io_buffers_'. The fix is to avoid accumulating any buffers in 'completed_io_buffers_'. Instead of adding them to 'completed_io_buffers_', completed buffers are just returned to the I/O manager. It turned out that this did not weaken the ScannerContext's guarantees about memory lifetime, because ScannerContext::GetBytesInternal() cleared 'boundary_buffer_' each time it was called regardless. I checked that this behaviour wasn't a bug by inspecting the scanner code. I could not find any cases where scanners depended on returned memory remaining valid beyond the next Read()/Get()/Skip*() call on the stream. This change makes that lifetime explicit in the comments. A side-effect of this fix is that scanners do not need to call ReleaseCompletedResources() in CommitRows() and means that the ScannerContext only ever needs to hold one I/O buffer at a time. This change also reimplements SkipBytes() to avoid it accumulating memory in the boundary buffer for large skip sizes. Also clarifies some of the invariants in ScannerContext. E.g. some places assumed io_buffer_ != NULL, but that is no longer needed. Testing: Ran core tests with ASAN and exhaustive tests with DEBUG. Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Reviewed-on: http://gerrit.cloudera.org:8080/8814 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-12 02:06:33 +00:00
Bharath Vissapragada	20daa4d516	IMPALA-6384: RequestPoolService should honor custom group mapping config Due to the way in which we instantiate the fair scheduler allocation loader, we do not read the config overrides from the HDFS config files. This is unexpected behavior from the users' POV, since we typically support overrides like a custom user -> group mapping via HDFS config (for example, LDAPGroupsMapping) that eventually affects the query -> pool assignment. Fix: This patch loads the hadoop default configuration so that the underlying QueuePlacementPolicy is based on user-specified overrides. Testing (manual): Changed the core-site.xml to use LDAPGroupsMapping instead of the default ShellBasedUnixGroupsMapping and confirmed that the correct group mapping plugin is loaded, by adding additional logging. Also, modified TestRequestPoolService to assert that the core-site xml overrides are loaded. Change-Id: Ibb93870c0cc37e2432a643a274931f1d3d13fb96 Reviewed-on: http://gerrit.cloudera.org:8080/9000 Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-11 22:52:29 +00:00
John Russell	b27537a15b	IMPALA-4252: [DOCS] Document min/max filters for Kudu tables Change-Id: I15d8c952ab5b90e89fdd57640dfb4da882f7ecb2 Reviewed-on: http://gerrit.cloudera.org:8080/8986 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-11 21:49:23 +00:00
Philip Zeyliger	ab81c48d7a	Fix typo in test_observability. Should fix "NameError: global name 'dgb_str' is not defined". Change-Id: Ida3f355c6c6be5ed52e4d445f8f80665cdc8e2b8 Reviewed-on: http://gerrit.cloudera.org:8080/9003 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Reviewed-by: Zoram Thanga <zoram@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-11 20:10:01 +00:00
Philip Zeyliger	604e48d2f3	IMPALA-6330, IMPALA-5702: Avoid boost's trim() to workaround crash after dynamic linking. Replaces boost::algorithm::trim() with std::string methods when parsing /proc/self/smaps and adds a trivial unit test for MemInfo::ParseSmaps(). I did not replace other uses of trim() with equivalents from be/src/gutil/strings/strip.h at this moment. The backstory here is that TestAdmissionControllerStress::test_admission_controller_with_flags fails occasionally on dynamically linked builds of Impala. I was able to reproduce the failure reliably (within 3 tries) with the following: $ ./buildall.sh -notests -so -noclean $ bin/start-impala-cluster.py --impalad_args="--memory_maintenance_sleep_time_ms=1" $ impala-shell.sh --query 'select max(t.c1), avg(t.c2), min(t.c3), avg(c4), avg(c5), avg(c6) from (select max(tinyint_col) over (order by int_col) c1, avg(tinyint_col) over (order by smallint_col) c2, min(tinyint_col) over (order by smallint_col desc) c3, rank() over (order by int_col desc) c4, dense_rank() over (order by bigint_col) c5, first_value(tinyint_col) over (order by bigint_col desc) c6 from functional.alltypes) t;' The stack trace looks like: (gdb) bt #0 0x00007fe230df2428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54 #1 0x00007fe230df402a in __GI_abort () at abort.c:89 #2 0x00007fe23312026d in __gnu_cxx::__verbose_terminate_handler() () at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/vterminate.cc:95 #3 0x00007fe2330d8b66 in __cxxabiv1::__terminate(void ()()) (handler=<optimized out>) at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_terminate.cc:47 #4 0x00007fe2330d8bb1 in std::terminate() () at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_terminate.cc:57 #5 0x00007fe2330d8cb8 in __cxxabiv1::__cxa_throw(void, std::type_info, void ()(void)) (obj=0x8e54080, tinfo=0x7fe233356210 <typeinfo for std::bad_cast>, dest=0x7fe23311ea70 <std::bad_cast::~bad_cast()>) at 
../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_throw.cc:87 #6 0x00007fe233110332 in std::__throw_bad_cast() () at ../../../../../gcc-4.9.2/libstdc++-v3/src/c++11/functexcept.cc:63 #7 0x00007fe2330e8ad7 in std::use_facet<std::ctype<char> >(std::locale const&) (__loc=...) at /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-ubuntu-16-04/toolchain/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/locale_classes.tcc:137 #8 0x00000000008d2cdf in void boost::algorithm::trim<std::string>(std::string&, std::locale const&) () #9 0x00007fe2396d5057 in impala::MemInfo::ParseSmaps() () at /home/philip/src/Impala/be/src/util/mem-info.cc:132 ... My best theory is that there's a race/bug, wherein the std::locale static initialization work is getting somehow 'reset' by the dynamic linker, when more libraries are linked in as a result of the query. My evidence to support this theory is scant, but I do notice that LD_DEBUG=all prints the following when the query is executed (but not right at startup): binding file /home/philip/src/Impala/toolchain/gcc-4.9.2/lib64/libstdc++.so.6 [0] to /home/philip/src/Impala/toolchain/gflags-2.2.0-p1/lib/libgflags.so.2.2 [0]: normal symbol `std::locale::facet::_S_destroy_c_locale(__locale_struct&)' Note that there are BSS segments for some of std::locale::facet:: inside of libgflags.so. 
$ nm toolchain/gflags-2.2.0-p1/lib/libgflags.so | c++filt | grep facet | grep ' B ' 00000000002e2d10 B std::locale::facet::_S_c_locale 00000000002e2d0c B std::locale::facet::_S_once I'm not the first to run into variants of these issues, though the results are fairly unhelpful: http://www.boost.org/doc/libs/1_58_0/libs/locale/doc/html/faq.html https://stackoverflow.com/questions/26990412/c-boost-crashes-while-using-locale https://svn.boost.org/trac10/ticket/4671 http://clang-developers.42468.n3.nabble.com/std-use-facet-lt-std-ctype-lt-char-gt-gt-crashes-on-linux-td4033967.html https://unix.stackexchange.com/questions/719/can-we-get-compiler-information-from-an-elf-binary https://stackoverflow.com/questions/42376100/linking-with-library-causes-collate-facet-to-be-missing-from-char http://lists.llvm.org/pipermail/cfe-dev/2012-July/023289.html https://gcc.gnu.org/ml/libstdc++/2014-11/msg00122.html Change-Id: I8dd807f869a9359d991ba515177fb2298054520e Reviewed-on: http://gerrit.cloudera.org:8080/8888 Reviewed-by: Philip Zeyliger <philip@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-11 08:05:30 +00:00
Tim Armstrong	c0c1202dbf	IMPALA-6381: increase test_exchange_delays timeout for isilon Change-Id: Ie82030403fa238b673b0a3ccdc7731b0d78b63af Reviewed-on: http://gerrit.cloudera.org:8080/8993 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-11 00:42:02 +00:00
Adam Holley	a34df684f7	IMPALA-6371: Additional check for delimiters The check validates the codepoint of the Java char. Testing: - Added tests for valid/invalid unicode in HdfsStorageDescriptorTest. Change-Id: If8dc335d39dd02f602cf93682bccf84b2c099dde Reviewed-on: http://gerrit.cloudera.org:8080/8959 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-10 22:31:32 +00:00
John Russell	3394eb1be7	[DOCS] Add phony doc build targets 'html' and 'pdf' The better to do a quick verification using one format or the other, by issuing 'make html' or 'make pdf'. 'make all' still builds both. Change-Id: Ic096259a773966871b09a023bf12eb6c362167af Reviewed-on: http://gerrit.cloudera.org:8080/8994 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2018-01-10 21:13:56 +00:00
John Russell	409b58150a	IMPALA-6278: [DOCS] Add release note subtopics Primarily placeholders that link to the 2.11 CHANGELOG file on the web. Change-Id: I968f53c6652197774cdec364c47bc10277e6877a Reviewed-on: http://gerrit.cloudera.org:8080/8992 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-10 20:38:23 +00:00
Jinchul	6041865031	IMPALA-3651: Adds murmur_hash() built-in function murmur_hash relies on HashUtil::MurmurHash2_64, which is the 64-bit version of MurmurHash2. Testing: Add unit tests for primitive types: ExprTest.MurmurHashFunction Add E2E tests into exprs.test Change-Id: I14d56ffb8fab256f3f66a2669271fd4b3c50cc29 Reviewed-on: http://gerrit.cloudera.org:8080/8893 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-10 20:17:26 +00:00
John Russell	31c6a1719a	[DOCS] Recommend using Kudu Java API for rapid DMLs Change-Id: I0098f0c3d5d07c89e6bb589c4c04edce300c1ad3 Reviewed-on: http://gerrit.cloudera.org:8080/8976 Reviewed-by: Jean-Daniel Cryans <jdcryans@apache.org> Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-10 18:42:04 +00:00
John Russell	1f4d687a9b	IMPALA-5317: [DOCS] Doc for DATE_TRUNC() function Change-Id: Ifcf38903bb10db12cbb8d73a2dc875aef29cd359 Reviewed-on: http://gerrit.cloudera.org:8080/8768 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-10 18:41:31 +00:00
Taras Bobrovytsky	c86b0a9736	IMPALA-5014: Part 2: Round when casting decimal to timestamp When there are too many digits to the right of the dot in a decimal, we would always truncate when casting to timestamp. In this patch we change the behavior to round instead of truncating when decimal_v2 is enabled. Testing: - Added some EE tests, ran BE tests on my machine. Change-Id: I8fb3a7d976ab980b8572d7e9524850572bad57da Reviewed-on: http://gerrit.cloudera.org:8080/8969 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-10 05:47:23 +00:00
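The difference between truncating and rounding described in IMPALA-5014 can be illustrated with Python's decimal module. This is only a sketch, not Impala's implementation: it assumes a timestamp keeps 9 fractional digits (nanoseconds), and both the helper name `to_timestamp_seconds` and the half-up rounding mode are choices made for illustration.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

# Assumed timestamp resolution: 9 fractional digits (nanoseconds).
NANOS = Decimal("0.000000001")


def to_timestamp_seconds(value: Decimal, round_result: bool) -> Decimal:
    """Reduce a decimal number of seconds to nanosecond resolution.

    round_result=False mimics the old truncating behavior;
    round_result=True mimics rounding (half-up is an assumption here).
    """
    mode = ROUND_HALF_UP if round_result else ROUND_DOWN
    return value.quantize(NANOS, rounding=mode)


seconds = Decimal("1.2345678996")  # 10 fractional digits, one too many
truncated = to_timestamp_seconds(seconds, round_result=False)
rounded = to_timestamp_seconds(seconds, round_result=True)
print(truncated, rounded)
```

The sample value is deliberately not a tie, so the contrast between the two results does not depend on which rounding mode is chosen.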
Xianda Ke	514dfaf9fd	IMPALA-6128: Add support for AES-CTR encryption when spilling to disk CFB mode is a stream cipher and is secure when used with a different nonce/IV for every message. However, it can be a performance bottleneck. CTR mode is also a stream cipher and is secure, and is 4~6x faster than CFB mode in OpenSSL. AES-CTR+SHA256 is about 40~70% faster than AES-CFB+SHA256. CTR mode is used if the OpenSSL version is >=1.0.1 at runtime; otherwise we fall back to using CFB mode. Testing: run runtime tmp-file-mgr-test, openssl-util-test, buffer-pool-test and buffered-tuple-stream-test The ut case openssl-util-test.EncryptInPlace tests encryption in both modes. Change-Id: I9debc240615dd8cdbf00ec8730cff62ffef52aff Reviewed-on: http://gerrit.cloudera.org:8080/8861 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-10 05:39:09 +00:00
Taras Bobrovytsky	f810458ca4	IMPALA-6231: Implement decimal_v2 fuzz test Implement a test that generates random decimal numbers in the pytest framework, performs a random mathematical operation in Impala and verifies that the result is correct by performing the same operation using the Python decimal module. We try to generate not only completely random decimal numbers, but also numbers that have interesting properties, such as the number being a power of two. Change-Id: I4328125de5c583ec8ead1f78d9a08703b18b2d85 Reviewed-on: http://gerrit.cloudera.org:8080/8898 Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Zach Amsden <zamsden@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-10 03:03:52 +00:00
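The approach in IMPALA-6231 (random operands, some with interesting properties, checked against the Python decimal module) might look roughly like the sketch below. It is not the actual test: `random_decimal` and `expected_result` are hypothetical names, the 20% share of powers of two is arbitrary, and the step that sends the same expression to Impala and compares the results is omitted.

```python
import random
from decimal import Decimal, localcontext

# Working precision for the reference computation; large enough that
# sums and products of the operands generated below are exact.
REFERENCE_PRECISION = 76


def random_decimal(rng: random.Random, max_digits: int = 18) -> Decimal:
    """Return a random Decimal, sometimes an 'interesting' power of two."""
    if rng.random() < 0.2:
        return Decimal(2) ** rng.randint(0, 60)
    digits = rng.randint(1, max_digits)
    scale = rng.randint(0, digits)
    unscaled = rng.randint(-(10 ** digits - 1), 10 ** digits - 1)
    # Shift the decimal point left by 'scale' places.
    return Decimal(unscaled).scaleb(-scale)


def expected_result(op: str, a: Decimal, b: Decimal) -> Decimal:
    """Compute the reference answer with the Python decimal module."""
    with localcontext() as ctx:
        ctx.prec = REFERENCE_PRECISION
        if op == "+":
            return a + b
        if op == "-":
            return a - b
        if op == "*":
            return a * b
        raise ValueError("unsupported operator: " + op)


rng = random.Random(0)  # fixed seed so a failing case can be replayed
samples = [random_decimal(rng) for _ in range(100)]
```

A real test would format each operand pair into a query, run it, and compare the returned value with `expected_result`, taking the result type's precision and scale into account.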
Jinchul	99962d2e81	IMPALA-4168: Adds Oracle-style hint placement for INSERT/UPSERT Allow specifying Oracle-style hints on INSERT/UPSERT statements. For example, - insert /* +noshuffle */ into table functional.alltypes partition(year, month) select * from functional.alltypes; - upsert /* +noshuffle */ into functional_kudu.alltypes select * from functional.alltypes; Testing: Add unit tests to ParserTest#TestPlanHints Add plan check tests to PlannerTest#testInsert, PlannerTest#testKuduUpsert Add tests to ToSqlTest#planHintsTest Change-Id: Ied7629d70197a0270cdc0853e00cc021fdb4dc20 Reviewed-on: http://gerrit.cloudera.org:8080/8676 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-10 03:03:49 +00:00
aphadke	38461c524f	IMPALA-5052: Read and write signed integer logical types in Parquet This patch maps a signed integer logical type in parquet to a supported Impala column type. This change introduces the following mapping: INT_8 -> TINYINT, INT_16 -> SMALLINT, INT_32 -> INT, INT_64 -> BIGINT. Also, added a parquet file with the following schema for testing: schema { optional int32 id; optional int32 tinyint_col (INT_8); optional int32 smallint_col (INT_16); optional int32 int_col; optional int64 bigint_col; } Change-Id: I47a8371858c9597c6a440808cf6f933532468927 Reviewed-on: http://gerrit.cloudera.org:8080/8548 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Reviewed-by: Tianyi Wang <twang@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-09 04:55:59 +00:00
Tianyi Wang	c4d950b9e9	IMPALA-3887: Wait for HDFS replication in data loading When the data loading finishes, it is possible for some HDFS blocks to be under-replicated. If Impala gets the metadata before the replication is done, some tests may fail. This patch adds a replication waiting step in the data loading script. Resubmitted with filesystem type check. Change-Id: I64d9a8ea1d0a32b40047321b50a7139a8f48eac8 Reviewed-on: http://gerrit.cloudera.org:8080/8916 Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-09 03:24:36 +00:00
Bharath Vissapragada	6a87eb20a5	IMPALA-6348: Redact only sensitive fields in runtime profiles Without this patch, redaction is applied to every field in the runtime profile. This approach has an undesired side effect when Kerberos auth + email redaction is in place. Since the redaction applies to every field, even principals (from Connected/Delegated User fields) are redacted, as the Kerberos principal format generally pattern matches with an email redactor template. This is particularly problematic for monitoring tools that consume runtime profiles and use these fields to group the queries by user. This patch fixes the problem by redacting only the following sensitive fields. - Query Statement - Error logs (since they can contain column references etc.) - Query Status - Query Plan Other fields in the runtime profile are left unredacted. Change-Id: Iae3b6726009bf458a7ec73131e5d659b12ab73cf Reviewed-on: http://gerrit.cloudera.org:8080/8934 Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-06 22:54:17 +00:00
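The selective redaction described in IMPALA-6348 can be sketched as follows. This is a simplified illustration, not Impala's redaction code: the flat dict representation of a profile, the `redact_profile` helper, the exact field-name strings, and the email pattern are all assumptions.

```python
import re

# Simplified email-like pattern (assumption); a Kerberos principal such as
# user@REALM.COM matches it too, which is the problem the commit describes.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")

# Only the fields the commit message names as sensitive are redacted.
SENSITIVE_FIELDS = {"Query Statement", "Error Logs", "Query Status", "Query Plan"}


def redact_profile(profile: dict) -> dict:
    """Return a copy of the profile with only sensitive fields redacted."""
    return {
        field: EMAIL.sub("<redacted>", value) if field in SENSITIVE_FIELDS else value
        for field, value in profile.items()
    }


profile = {
    "Connected User": "alice@EXAMPLE.COM",
    "Query Statement": "select * from t where email = 'bob@example.com'",
}
redacted = redact_profile(profile)
print(redacted)
```

Because "Connected User" is not in the sensitive set, the principal survives intact and monitoring tools can still group queries by user.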
Zoltan Borok-Nagy	ce65b43d47	IMPALA-2248: Make idle_session_timeout a query option This commit makes idle_session_timeout a query option. idle_session_timeout currently can be set as a command line option, which will be the default timeout for sessions. HS2 sessions can override it with a smaller value by setting it in the configuration overlay of HS2 OpenSession(). However, we can't override idle_session_timeout for JDBC/ODBC connections, because we cannot put this in the connection string. This commit is a workaround for this problem, it allows JDBC/ODBC connections to set the session timeout as a query option with the SET statement. After this commit, the session timeout can be overridden to any value, i.e. the command line flag idle_session_timeout doesn't limit this option anymore. I created an automated test case in JdbcTest.java based on test_hs2.py::test_concurrent_session_mixed_idle_timeout. I also extended the test_session_expiration and test_set_and_unset test suites. Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e Reviewed-on: http://gerrit.cloudera.org:8080/8490 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-06 01:47:47 +00:00
Pranay	302ec25b2e	IMPALA-5522: Use tracked memory for DictDecoder and DictEncoder Currently the DictDecoder and DictEncoder classes use std::vector to store the tables mapping codeword to value and vice versa. It is hard to detect the memory usage of these tables when they become very large, since this memory is not accounted for by Impala's memory management infrastructure. This patch uses the memory tracker of HdfsScanner to track the memory used by the dictionary in the DictDecoder class. Similarly, it uses the memory tracker of HdfsTableSink to track the memory used by the dictionary in the DictEncoder class. Memory for the dictionary, stored as std::vector, is still allocated from std::allocator, but the amount allocated is accounted for by introducing a counter which is incremented and decremented as the memory is consumed and released by the vector. Testing: Ran all the backend and end-to-end tests with no failures. Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Reviewed-on: http://gerrit.cloudera.org:8080/8034 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2018-01-06 01:30:36 +00:00


6534 Commits