impala

mirror of https://github.com/apache/impala.git synced 2026-01-06 06:01:03 -05:00

Author	SHA1	Message	Date
Dimitris Tsirogiannis	3db5ced4ce	IMPALA-3726: Add support for Kudu-specific column options This commit adds support for Kudu-specific column options in CREATE TABLE statements. The syntax is: CREATE TABLE tbl_name ([col_name type [PRIMARY KEY] [option [...]]] [, ....]) where option is: \| NULL \| NOT NULL \| ENCODING encoding_val \| COMPRESSION compression_algorithm \| DEFAULT expr \| BLOCK_SIZE num The output of the SHOW CREATE TABLE statement was altered to include all the specified column options for Kudu tables. Change-Id: I727b9ae1b7b2387db752b58081398dd3f3449c02 Reviewed-on: http://gerrit.cloudera.org:8080/5026 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2016-11-18 11:41:01 +00:00
Attila Jeges	60414f0633	IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present This change introduces a new catalogd startup option (init_first_metastore_client_timeout_seconds) that specifies the time in seconds catalogd should spend on retrying to establish a connection to HMS the first time on startup before giving up and exiting fatally. Setting this startup option to a value that is greater than the HMS startup time will allow CM to start Impala at the same time or even before HMS. The default value of init_first_metastore_client_timeout_seconds is 120 seconds. Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b Reviewed-on: http://gerrit.cloudera.org:8080/5095 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Internal Jenkins	2016-11-18 03:12:12 +00:00
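The startup behaviour described in the IMPALA-4278 entry above amounts to retrying a connection until a deadline passes. A minimal Python sketch of that idea follows; `connect_with_timeout`, its parameters, and the injected `connect` callable are hypothetical names for illustration, not catalogd's actual code, which the commit message does not show.

```python
import time


def connect_with_timeout(connect, timeout_seconds, retry_interval_seconds=1.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Call connect() until it succeeds or timeout_seconds have elapsed.

    connect, clock and sleep are injected so the sketch can be exercised
    without a real metastore; all names here are hypothetical.
    """
    deadline = clock() + timeout_seconds
    while True:
        try:
            return connect()
        except ConnectionError:
            # Give up once the deadline has passed; otherwise wait and retry.
            if clock() >= deadline:
                raise
            sleep(retry_interval_seconds)
```

With a timeout longer than the dependency's startup time, the caller can be started first and will simply keep retrying until the dependency is up.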
Jim Apple	76719de5d1	Don't overwrite user's .ssh/config file when bootstrapping From bash's manual page on redirecting with '>' Redirection of output causes the file whose name results from the expansion of word to be opened for writing on file descriptor n, or the standard output (file descriptor 1) if n is not specified. If the file does not exist it is created; if it does exist it is truncated to zero size. Change-Id: I0d1a56441fcb5a2a2aed043fc1ece866c5d8287a Reviewed-on: http://gerrit.cloudera.org:8080/4967 Reviewed-by: Jim Apple <jbapple@cloudera.com> Tested-by: Impala Public Jenkins	2016-11-18 03:06:52 +00:00
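The hazard quoted from the bash manual in the entry above (an existing file is truncated when opened for writing) has a direct Python analogue, sketched here under the assumption that mode "w" stands in for '>' and mode "a" for '>>'. The helper names are hypothetical, and this is not the bootstrap script itself.

```python
import os
import tempfile


def write_config(path, text, mode):
    """Write text to path; mode "w" truncates first, mode "a" appends."""
    with open(path, mode) as f:
        f.write(text)


def read_config(path):
    with open(path) as f:
        return f.read()


def demo():
    """Return the file contents after an append and after a truncating write."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config")
        write_config(path, "Host existing\n", "w")
        write_config(path, "Host added\n", "a")  # keeps the existing entry
        appended = read_config(path)
        write_config(path, "Host added\n", "w")  # discards the existing entry
        truncated = read_config(path)
    return appended, truncated
```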
Alex Behm	263f222557	IMPALA-4490: Only generate runtime filters for hash join nodes. Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95 Reviewed-on: http://gerrit.cloudera.org:8080/5117 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-18 00:26:35 +00:00
Jim Apple	3be0f122a5	IMPALA-3398: Add docs to main Impala branch. These are refugees from doc_prototype. They can be rendered with the DITA Open Toolkit version 2.3.3 by: /tmp/dita-ot-2.3.3/bin/dita \ -i impala.ditamap \ -f html5 \ -o $(mktemp -d) \ -filter impala_html.ditaval Change-Id: I8861e99adc446f659a04463ca78c79200669484f Reviewed-on: http://gerrit.cloudera.org:8080/5014 Reviewed-by: John Russell <jrussell@cloudera.com> Tested-by: John Russell <jrussell@cloudera.com>	2016-11-17 22:38:44 +00:00
Tim Armstrong	46f5ad48e3	IMPALA-3202: refactor scratch file management into TmpFileMgr This is a pure refactoring patch that moves all of the logic for allocating scratch file ranges into TmpFileMgr in anticipation of this logic being used by the new BufferPool. There should be no behavioural changes. Also remove a bunch of TODOs that we're not going to fix. Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62 Reviewed-on: http://gerrit.cloudera.org:8080/4898 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-11-17 21:56:14 +00:00
Jim Apple	45ac10aa40	IMPALA-4476: Use unique_database to stop races in test_udfs.py These tests have been failing nondeterministically on larger machines with 16 cores. This change should stop races in hadoop fs -put and drop/create function. Change-Id: I520a8b817ad7e32dba299c2535033f55f1bd1c84 Reviewed-on: http://gerrit.cloudera.org:8080/5124 Reviewed-by: Jim Apple <jbapple@cloudera.com> Tested-by: Impala Public Jenkins	2016-11-17 21:35:56 +00:00
Henry Robinson	2648bfbd90	Improve message output from run-step.sh run-step prints a message to tell the reader what it's doing. However, that message wasn't flushed so that run-step could print OK or FAILED on the same line. The result was that long-running steps wouldn't print anything to the log until they were done, at least in Jenkins contexts. This patch changes it so that the message is flushed, and then the result is printed on a separate line (including the time it took to run the step). $ run-step "Hello world!" helloworld.out sleep 5 Hello world! (logging to /tmp/helloworld.out)... OK (Took: 0 min 5 sec) Change-Id: Iaced729f0ef6aa93174cd90b1516d3c34fe41a22 Reviewed-on: http://gerrit.cloudera.org:8080/5116 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-17 09:35:14 +00:00
Taras Bobrovytsky	eb8120d218	IMPALA-3812: Fix error message for unsupported types Before this patch an unclear error message was returned if DATE or DATETIME appeared in the select list after a star expansion. This was because DATE and DATETIME PrimitiveType was serialized as INVALID_TYPE. This is fixed by serializing correctly. Change-Id: I9019b4bfd219f94e554c795befd3ff5e39706ea9 Reviewed-on: http://gerrit.cloudera.org:8080/4859 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-17 05:31:34 +00:00
Tim Armstrong	0ab3d7691e	IMPALA-4392: restore PeakMemoryUsage to DataSink profiles The join build sink patches refactored the DataSink interface and inadvertently removed this counter from the profile. The problem was that the sink MemTracker was not initialized with the sink's profile. The fix is for the sink to create the MemTracker itself. Testing: Ran core tests. Manually checked profile to make sure the counter appeared in HdfsTableSink, DataStreamSender, etc. Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9 Reviewed-on: http://gerrit.cloudera.org:8080/4969 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-11-17 04:37:56 +00:00
Matthew Jacobs	77a2941a42	IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting Adds support in the shell to report the number of modified rows for all DML operations, as well as the number of rows with errors. Testing: Added shell tests. Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1 Reviewed-on: http://gerrit.cloudera.org:8080/5103 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-17 04:13:25 +00:00
Thomas Tauber-Marshall	3833707dbd	IMPALA-4466: Improve Kudu CRUD test coverage The results in the test files were verified by hand. This patch also introduces a new test section 'DML_RESULTS', which takes the name of a table as a comment and the contents of the table as its body and then verifies that the body matches the actual contents of the table. This makes it easy to check that a DML operation has the desired effect on the contents of a table, rather than always having to add another test case that runs a select on the table. For now, this section cannot be used in a test along with the RESULTS or ERRORS sections. TODO: Refactor the DML test case handling (IMPALA-4471) Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6 Reviewed-on: http://gerrit.cloudera.org:8080/4953 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-11-17 02:54:30 +00:00
Dan Hecht	ab0d21ab79	IMPALA-4493: fix string-compare-test when using clang Only the 0 value or sign bit is specified in the return value for strncmp(), so fix up the test accordingly. Testing: - verified the new test still reproduces IMPALA-4436 - verify the new test passes under ASAN build Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7 Reviewed-on: http://gerrit.cloudera.org:8080/5110 Reviewed-by: Jim Apple <jbapple@cloudera.com> Tested-by: Internal Jenkins	2016-11-17 01:46:23 +00:00
Alex Behm	f5e660dd6e	IMPALA-4470: Avoid creating a NumericLiteral from NaN/infinity/-0. Our NumericLiteral is backed by a BigDecimal which cannot represent the special float values NaN, infinity or negative zero. As a result, when evaluating constant expressions from the FE we hit an exception when trying to create a NumericLiteral from a NaN or infinity value. Before, negative zero would silently get converted to zero which is dangerous. The fix is to treat the expr evaluation as a failure and not replace the constant Expr with a LiteralExpr. Change-Id: I8243b2ee9fa9c470d078b385583f2f48b606a230 Reviewed-on: http://gerrit.cloudera.org:8080/5050 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-16 23:55:42 +00:00
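The limitation described in the IMPALA-4470 entry above can be illustrated with Python's `fractions.Fraction`, which, like an exact decimal type, has no NaN, no infinity, and no negative zero. This is an analogy only, not the frontend's Java code; `to_exact_literal` is a hypothetical helper in which returning `None` plays the role of "do not replace the expression with a literal".

```python
import math
from fractions import Fraction


def to_exact_literal(value):
    """Return an exact rational for value, or None if none is faithful."""
    if math.isnan(value) or math.isinf(value):
        # Fraction() raises for these inputs, so refuse up front.
        return None
    if value == 0 and math.copysign(1.0, value) < 0:
        # Fraction(-0.0) would silently become 0 and lose the sign.
        return None
    return Fraction(value)
```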
Matthew Jacobs	107fc4e9f9	IMPALA-4477: Upgrade Kudu version to latest master Change the toolchain build and Kudu version to use the latest master, using Kudu commit e836ac. Change-Id: I49f8582cc3c0f776167fe3decf4236345ba78bd3 Reviewed-on: http://gerrit.cloudera.org:8080/5106 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-11-16 21:57:37 +00:00
Alex Behm	0a654b3186	Run MT_DOP tests on all file formats. Change-Id: I28d5bcc48bbe32fb970b41daa919096061a05beb Reviewed-on: http://gerrit.cloudera.org:8080/5025 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-15 23:47:40 +00:00
Jim Apple	9434e38abb	clang-tidy should tidy tests; fix alignas error in clang builds. run_clang_tidy.sh was mistakenly using -notests, which doesn't even compile the tests, rather than -skiptests, which compiles (but does not run) the backend tests. When I discovered this, I also found that all clang builds (including tidy and asan) had been broken by my previous alignas commit (`10a4c5a2e4`). This patch fixes that as well. Change-Id: Ib7066039f78d7ee039db619b96e25665b4d63503 Reviewed-on: http://gerrit.cloudera.org:8080/5094 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2016-11-15 23:33:45 +00:00
Michael Ho	38ee3b6942	IMPALA-4444: Transfer row group resources to row batch on scan failure Previously, if any column reader fails in HdfsParquetScanner::AssembleRows(), the memory pools associated with the ScratchTupleBatch will be freed. This is problematic as ScratchTupleBatch may contain memory pools which are still referenced by row batches shipped upstream. This is possible because memory pools used by parquet column readers (e.g. decompressor_pool_) won't be transferred to a ScratchTupleBatch until the data page is exhausted. So, the memory pools of the previous data page are always attached to the ScratchTupleBatch of the current data page. On a scan failure, it's not necessarily safe to free the memory pool attached to the current ScratchTupleBatch. This patch fixes the problem above by transferring the memory pool and other resources associated with a row group to the current row batch in the parquet scanner on scan failure so it can eventually be freed by upstream operators as the row batch is consumed. Change-Id: Id70df470e98dd96284fd176bfbb946e9637ad126 Reviewed-on: http://gerrit.cloudera.org:8080/5052 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Internal Jenkins	2016-11-15 23:02:50 +00:00
Dan Hecht	6937fa9a4c	IMPALA-4436: StringValue::StringCompare() should match strncmp() According to the C standard, strncmp() interprets characters as unsigned, whereas StringCompare() uses char (which happens to be signed). This means that for values greater than 127, they don't give the same result (which is especially bad considering StringCompare() falls back to strncmp(), and so the answer depends on the mismatched position). Fix StringCompare() to interpret as unsigned char. Change-Id: Ic0750f98d8c5ef7d0c0ea279cd1f80b4acbad1be Reviewed-on: http://gerrit.cloudera.org:8080/5083 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-11-15 20:47:18 +00:00
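The mismatch described in the IMPALA-4436 entry above can be reproduced in a few lines. The sketch below compares byte strings once with bytes read as 0..255 (the unsigned interpretation the entry attributes to strncmp()) and once as -128..127. It models only the sign of the result and ignores NUL handling; the function names are illustrative and are not Impala's.

```python
def compare_unsigned(a: bytes, b: bytes) -> int:
    """Sign of the comparison with bytes interpreted as 0..255."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    return (len(a) > len(b)) - (len(a) < len(b))


def to_signed(byte: int) -> int:
    """Reinterpret a 0..255 value as a two's-complement signed 8-bit value."""
    return byte - 256 if byte > 127 else byte


def compare_signed(a: bytes, b: bytes) -> int:
    """Same walk, but bytes interpreted as -128..127 (the behaviour fixed)."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if to_signed(x) < to_signed(y) else 1
    return (len(a) > len(b)) - (len(a) < len(b))
```

The two functions agree on pure ASCII input and disagree as soon as the first differing byte pair straddles 127, for example `b"\x80"` against `b"\x7f"`.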
Jim Apple	4b774880c9	Increase wait times for startup of Hive and its Metastore On Ubuntu 14.04 on AWS EC2 m4.4x, instances, these components frequently take more than 30 seconds to start. I have seen the HMS take more than 90 seconds; this patch sets a more conservative timeout default. Change-Id: I43eb8646cca495578c8f9730faa04812957d2917 Reviewed-on: http://gerrit.cloudera.org:8080/5068 Reviewed-by: Jim Apple <jbapple@cloudera.com> Tested-by: Internal Jenkins	2016-11-15 20:35:01 +00:00
Jim Apple	b3cbc960a7	IMPALA-4434: In Python, ''.split('\n') is [''], which has length 1 This test simply may have never been run in GMT or UTC - it appears to have an easy-to-make off-by-one error. Change-Id: Iac4943085b0693deb380499cd0e141eb672bead8 Reviewed-on: http://gerrit.cloudera.org:8080/5061 Reviewed-by: Jim Apple <jbapple@cloudera.com> Tested-by: Internal Jenkins	2016-11-15 15:29:26 +00:00
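The off-by-one named in the IMPALA-4434 subject line above is easy to check directly. The helper `count_lines` below is a hypothetical illustration of one way to avoid it; the commit message does not show the actual change to the test.

```python
def count_lines(text):
    """Number of newline-separated lines, treating empty text as zero lines."""
    return len(text.split("\n")) if text else 0


# Splitting the empty string yields one empty element, not an empty list.
empty_split = "".split("\n")
```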
Alex Behm	91b5264e52	IMPALA-4479: Use correct isSet() thrift function when evaluating constant bool exprs. Change-Id: Ie3ba195a5241ca630bd0cf71b83d423733b06546 Reviewed-on: http://gerrit.cloudera.org:8080/5088 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-15 11:17:43 +00:00
Thomas Tauber-Marshall	e6e2baea33	IMPALA-4372: 'Describe formatted' returns types in upper case A recent change caused 'describe formatted' to display the types in all upper case, but we want 'describe formatted' to match Hive's 'describe' output, which displays the types in lower case. This patch also fixes several problems with test_describe_formatted, which was encountering an error but reporting success. Change-Id: I274b97d4d1247244247fb38a5ca7f4c10bba8d22 Reviewed-on: http://gerrit.cloudera.org:8080/4861 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2016-11-15 05:38:12 +00:00
Jim Apple	0ea4a666dc	IMPALA-4433: Always generate testdata using the same time zone setting Before this change, testdata was generated using the java.util.TimeZone.getDefault() TimeZone of the machine it was running on. This patch standardizes on "America/Los_Angeles", which matches the existing expected results in the end-to-end tests. Change-Id: Iaf7cc796e44e9ff64880f9ae852f40961592f279 Reviewed-on: http://gerrit.cloudera.org:8080/5058 Reviewed-by: Jim Apple <jbapple@cloudera.com> Tested-by: Internal Jenkins	2016-11-15 04:18:33 +00:00
Sailesh	f4a5d863c3	IMPALA-4465: Don't hold process wide lock while serializing Runtime Profile in GetRuntimeProfileStr() This patch changes the code so that the query_exec_state_map_lock_ is not held while serializing the RuntimeProfile, since that is a pretty expensive operation and happens at least once per query. This change makes a lot of client calls have less lock contention including Webserver calls and query registration/unregistration. Change-Id: I3ad8a1d6644259f177dfb3b29b3ba1ad6a76210a Reviewed-on: http://gerrit.cloudera.org:8080/5035 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2016-11-15 03:33:14 +00:00
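The locking change described in the IMPALA-4465 entry above follows a common pattern: hold the shared map lock only for the lookup, then do the expensive work outside it. A minimal Python sketch follows, with a hypothetical `QueryRegistry` class and JSON standing in for the real profile serialization; a real implementation would also need the looked-up object to remain valid and safely readable after the lock is released.

```python
import json
import threading


class QueryRegistry:
    """Hypothetical stand-in for a lock-protected map of query states."""

    def __init__(self):
        self._lock = threading.Lock()
        self._profiles = {}

    def register(self, query_id, profile):
        with self._lock:
            self._profiles[query_id] = profile

    def profile_string(self, query_id):
        # Hold the map lock only long enough to look up the entry.
        with self._lock:
            profile = self._profiles.get(query_id)
        if profile is None:
            return None
        # The expensive serialization runs without the map lock held.
        return json.dumps(profile, sort_keys=True)
```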
Amos Bird	628685ae74	IMPALA-1654: General partition exprs in DDL operations. This commit handles partition related DDL in a more general way. We can now use compound predicates to specify a list of partitions in statements like ALTER TABLE DROP PARTITION and COMPUTE INCREMENTAL STATS, etc. It will also make sure some statements only accept one partition at a time, such as PARTITION SET LOCATION and LOAD DATA. ALTER TABLE ADD PARTITION continues to use the old PartitionKeyValue's logic. The changed partition related DDLs are as follows, Table: p (i int) partitioned by (j int, k string) Partitions: +-------+---+-------+--------+------+--------------+-------------------+ \| j \| k \| #Rows \| #Files \| Size \| Bytes Cached \| Cache Replication \| +-------+---+-------+--------+------+--------------+-------------------+ \| 1 \| a \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| 1 \| b \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| 1 \| c \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| 2 \| d \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| 2 \| e \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| 2 \| f \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| Total \| \| -1 \| 0 \| 0B \| 0B \| \| +-------+---+-------+--------+------+--------------+-------------------+ 1. show files in p partition (j<2, k='a'); 2. alter table p partition (j<2, k in ("b","c")) set cached in 'testPool'; // j can appear more than once, 3.1. alter table p partition (j<2, j>0, k<>"d") set uncached; // it is the same as 3.2. alter table p partition (j<2 and j>0, not k="e") set uncached; // we can also do 'or' 3.3. alter table p partition (j<2 or j>0, k like "%") set uncached; // missing 'k' matches all values of k 4. alter table p partition (j<2) set fileformat textfile; 5. alter table p partition (k rlike ".*") set serdeproperties ("k"="v"); 6. alter table p partition (j is not null) set tblproperties ("k"="v"); 7. alter table p drop partition (j<2); 8. compute incremental stats p partition(j<2); The remaining old partition related DDLs are as follows, 1. load data inpath '/path/from' into table p partition (j=2, k="d"); 2. alter table p add partition (j=2, k="g"); 3. alter table p partition (j=2, k="g") set location '/path/to'; 4. insert into p partition (j=2, k="g") values (1), (2), (3); General partition expressions or partially specified partition specs allow partition predicates to return an empty partition set whether or not 'IF EXISTS' is specified. Examples: [localhost.localdomain:21000] > alter table p drop partition (j=2, k="f"); Query: alter table p drop partition (j=2, k="f") +-------------------------+ \| summary \| +-------------------------+ \| Dropped 1 partition(s). \| +-------------------------+ Fetched 1 row(s) in 0.78s [localhost.localdomain:21000] > alter table p drop partition (j=2, k<"f"); Query: alter table p drop partition (j=2, k<"f") +-------------------------+ \| summary \| +-------------------------+ \| Dropped 2 partition(s). \| +-------------------------+ Fetched 1 row(s) in 0.41s [localhost.localdomain:21000] > alter table p drop partition (k="a"); Query: alter table p drop partition (k="a") +-------------------------+ \| summary \| +-------------------------+ \| Dropped 1 partition(s). \| +-------------------------+ Fetched 1 row(s) in 0.25s [localhost.localdomain:21000] > show partitions p; Query: show partitions p +-------+---+-------+--------+------+--------------+-------------------+ \| j \| k \| #Rows \| #Files \| Size \| Bytes Cached \| Cache Replication \| +-------+---+-------+--------+------+--------------+-------------------+ \| 1 \| b \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| 1 \| c \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| Total \| \| -1 \| 0 \| 0B \| 0B \| \| +-------+---+-------+--------+------+--------------+-------------------+ Fetched 3 row(s) in 0.01s Change-Id: I2c9162fcf9d227b8daf4c2e761d57bab4e26408f Reviewed-on: http://gerrit.cloudera.org:8080/3942 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-15 03:27:36 +00:00
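The partition selection semantics in the IMPALA-1654 entry above (a comma-separated list of predicates behaves like AND, and a column that is not mentioned matches every value) can be modelled in a few lines of Python. The `(j, k)` tuples mirror the six partitions of the example table `p`; `select_partitions` is a hypothetical helper, not Impala's planner code.

```python
# The six partitions of the example table, as (j, k) pairs.
PARTITIONS = [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (2, "e"), (2, "f")]


def select_partitions(partitions, *predicates):
    """Return the partitions for which every predicate holds (comma = AND)."""
    return [p for p in partitions if all(pred(*p) for pred in predicates)]


# partition (j<2): k is not mentioned, so all values of k match.
j_lt_2 = select_partitions(PARTITIONS, lambda j, k: j < 2)

# partition (j=2, k<"f"): two predicates combined with AND.
j_eq_2_k_lt_f = select_partitions(
    PARTITIONS, lambda j, k: j == 2, lambda j, k: k < "f")

# A predicate may match nothing, giving an empty partition set.
none_matched = select_partitions(PARTITIONS, lambda j, k: j > 5)
```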
Bharath Vissapragada	3f2f008ac4	IMPALA-3552: Make incremental stats max serialized size configurable The fix "IMPALA-2648/IMPALA-2664" introduced a conservative limitation on the maximum serialized size of incremental stats. As a side effect, some users with very large tables are experiencing regressions especially when they upgrade impala and the serialized size goes beyond 200MB. To mitigate the issue, the change introduces a new gflag, 'inc_stats_size_limit_bytes' to make the max serialized size configurable, which allows impala users to specify their own maximum serialized size. Default value for inc_stats_size_limit_bytes is 200MB. The change introduces a TBackendGflags class to pass the gflags from backend to the Frontend and the Catalog via thrift. This also revamps existing query options to use the TBackendConfig. Change-Id: I33684725a61eabc67237503e61178305d37d3cb5 Reviewed-on: http://gerrit.cloudera.org:8080/4867 Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com> Tested-by: Internal Jenkins	2016-11-15 03:22:11 +00:00
Michael Ho	cac02d6b76	IMPALA-4452: Always call AggFnEvaluator::Open() before AggFnEvaluator::Init() As part of the fix for IMPALA-2379, the expression contexts of aggregation function evaluators are expected to be opened before their initFn() are called so constant arguments can be accessed in initFn(). However, the legacy aggregation node wasn't updated to follow this order for singleton result tuple (i.e. no group-by). This patch fixes the problem by deferring the creation of the singleton tuple to a point in AggregationNode::Open() after the expression contexts of all aggregate function evaluators have been opened. PartitionedAggregationNode() was already updated to follow this order. This patch also fixes a minor bug in which uninitialized entries of agg_fn_ctxs_[] may be accessed in AggregationNode::Close() if AggregationNode::Prepare() fails. Change-Id: I2f261dee47821c517d8dbe1babf4112462d85807 Reviewed-on: http://gerrit.cloudera.org:8080/5049 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Internal Jenkins	2016-11-14 22:38:09 +00:00
Jim Apple	10a4c5a2e4	IMPALA-4480: zero_length_region_ must be as aligned as max_align_t MemPool::TryAllocateAligned returns memory that might be that aligned, and it returns &MemPool::zero_length_region_ when called to allocate a block of size 0. While I'm here, do some things to make diagnosing test failures from terminal output easier. Change-Id: Ia31b27e38897f357478c4eedaab0c787e731b2d4 Reviewed-on: http://gerrit.cloudera.org:8080/5062 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-11-14 21:25:42 +00:00
Matthew Jacobs	4258b9f09e	IMPALA-4477: Upgrade Kudu version to latest master Change the toolchain build and Kudu version to use the latest master, using Kudu commit 88b023. Change-Id: I21c5bc0d28df83cd2e57cd30b6ab416e0d430775 Reviewed-on: http://gerrit.cloudera.org:8080/5054 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-13 02:52:33 +00:00
Thomas Tauber-Marshall	d15f86cb6f	IMPALA-4454: test_kudu.TestShowCreateTable flaky The cause of the flakiness is Kudu CREATE TABLE operations that are sometimes taking a long time, leading to timeouts in the hiveserver2 connection. This patch adds the ability for tests using the 'conn' pytest fixture to specify a timeout to connect(), and sets a timeout of 5 minutes for this test. Change-Id: I2727c27ff66140ac4043bcad332cd4e1d72b255f Reviewed-on: http://gerrit.cloudera.org:8080/5040 Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-11 20:04:01 +00:00
Jim Apple	c68734ad65	IMPALA-4455: MemPoolTest.TryAllocateAligned failure: sizeof v. alignof This was testing all memory alignments up to and including sizeof(max_align_t), but the standard says nothing about that. It does say things about alignof(max_align_t), including that malloc() returns memory at least that aligned. In both gcc and clang on our currently supported platforms, max_align_t has sizeof == 32 and alignof == 16, so this test expected an alignment that malloc was not guaranteed to provide. Change-Id: Ic2dbabcb9af2874d8ed354738243dfca9c492b08 Reviewed-on: http://gerrit.cloudera.org:8080/5022 Reviewed-by: Henry Robinson <henry@cloudera.com> Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Jim Apple <jbapple@cloudera.com>	2016-11-11 03:44:07 +00:00
David Knupp	b14f319708	IMPALA-4461: Make sure data gets loaded for wide hbase tables. This patch reverts a change that broke the exhaustive suite of Impala tests. The change was introduced here: `ce4c5f6743` The original problem was that data load was failing when run against a remote cluster, due to a 4000 byte max for SERDEPROPERTIES.PARAM_VALUE, a limitation that is well described in HIVE-1364. Locally, when we load data, we work around the issue here: https://github.com/apache/incubator-impala/blob/master/bin/create-test-configuration.sh#L99 When testing on CDH remote cluster however, this "fix" never gets applied. (It also assumes the database will always be postgres.) I made this change without realizing its full effect, or appreciating exactly how exhaustive our exhaustive test suite really is. Another solution will need to be found for the case of remote cluster testing, but this should unblock the local build for now. As far as testing, I ran the full suite of tests in query_test/ test_scanners.py, and they all pass after removing these lines. Change-Id: If2148d6546789c6c53c8e045717081b24ce76689 Reviewed-on: http://gerrit.cloudera.org:8080/5033 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-11-11 00:37:59 +00:00
Alex Behm	0aeb68050b	IMPALA-1286: Extract common conjuncts from disjunctions. Adds a new ExprRewriteRule to extract common conjuncts from disjunctions. Examples: (a AND b AND c) OR (b AND d) ==> b AND ((a AND c) OR (d)) (a AND b) OR (a AND b) ==> a AND b (a AND b AND c) OR (c) ==> c Adds a new query option ENABLE_EXPR_REWRITES to enable/disable non-essential expr rewrites in the FE. Note that some rewrites are required, e.g., BetweenToCompoundRule. Disabling the rewrites is useful for testing, in particular, to make sure that the exprs specified in expr-test.cc are executed as written. Testing: Added a new unit test in ExprRewriteRulesTest. Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Reviewed-on: http://gerrit.cloudera.org:8080/4877 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-09 09:44:59 +00:00
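The three rewrite examples in the IMPALA-1286 entry above can be reproduced with plain set operations. This is a minimal sketch that treats each disjunct as a set of opaque conjunct names. `extract_common_conjuncts` is a hypothetical helper and not the frontend's ExprRewriteRule, which operates on expression trees.

```python
def extract_common_conjuncts(disjuncts):
    """Split an OR of AND-lists into (common conjuncts, remaining disjuncts).

    disjuncts: non-empty list of sets of conjunct names,
    e.g. [{"a", "b"}, {"b", "d"}].
    The result stands for: AND(common) AND OR(AND(r) for r in remaining).
    An empty remaining list means the OR part is trivially true.
    """
    common = set.intersection(*disjuncts)
    remaining = []
    for d in disjuncts:
        rest = d - common
        if not rest:
            # One disjunct is fully covered by the common part, so the
            # remaining OR is always true and can be dropped.
            return common, []
        if rest not in remaining:
            remaining.append(rest)
    return common, remaining
```

For `(a AND b AND c) OR (b AND d)` the helper returns `b` as the common part and `(a AND c)`, `(d)` as the remaining disjuncts. For the other two examples the remaining list is empty, which leaves only the common conjuncts.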
Matthew Jacobs	cfac09de10	IMPALA-3710: Kudu DML should ignore conflicts, pt2 Second part of IMPALA-3710, which removed the IGNORE DML option and changed the following errors on Kudu DML operations to be ignored: 1) INSERT where the PK already exists 2) UPDATE/DELETE where the PK doesn't exist This changes other data-related errors to be ignored as well: 3) NULLs in non-nullable columns, i.e. null constraint violations. 4) Rows with PKs that are in an 'uncovered range'. It became clear that we can't differentiate between (3) and (4) because both return a Kudu 'NotFound' error code. The Impala error codes have been simplified as well: we just report a generic KUDU_NOT_FOUND error in these cases. This also adds some metadata to the thrift report sent to the coordinator from sinks so the total number of rows with errors can be added to the profile. Note that this does not include a breakdown of error counts by type/code because we cannot differentiate between all of these cases yet. An upcoming change will add this new info to the beeswax interface and show it in the shell output (IMPALA-3713). Testing: Updated kudu_crud tests to check the number of rows with errors. Change-Id: I4eb1ad91dc355ea51de261c3a14df0f9d28c879c Reviewed-on: http://gerrit.cloudera.org:8080/4985 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-11-09 06:43:41 +00:00
Tim Armstrong	d7246d64c7	IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions This change enables codegen for all builtin aggregate functions, e.g. timestamp functions and group_concat. There are several parts to the change: * Adding support for generic UDAs. Previously the codegen code did not handle multiple input arguments or NULL return values. * Defaulting to using the UDA interface when there is not a special codegen path (we have implementations of all builtin aggregate functions for the interpreted path). * Remove all the logic to disable codegen for the special cases that now are supported. Also fix the generation of code to get/set NULL bits since I needed to add functionality there anyway. Testing: Add tests that check that codegen was enabled for builtin aggregate functions. Also fix some gaps in the preexisting tests. Also add tests for UDAs that check input/output nulls are handled correctly, in anticipation of enabling codegen for arbitrary UDAs. The tests are run with both codegen enabled and disabled. To avoid flaky tests, we switch the UDF tests to use "unique_database". Perf: Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1, since my original approach regressed it ~5%. In the end the problem was to do with the ordering of loads/stores to the slot and null bit in the generated code: the previous version of the code exploited some properties of the particular aggregate function. I ended up replicating this behaviour to avoid regressing perf. Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Reviewed-on: http://gerrit.cloudera.org:8080/4655 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-11-09 03:27:12 +00:00
Jim Apple	6775893894	IMPALA-4447: Rein in overly broad sed that dirties the tree This patch fixes a sed expression to make sure it only alters the code it is meant to alter, not the comment describing the code. Tested with tests/run-tests.py query_test/test_udfs.py Change-Id: I51a0498d24b7fccc05b6183123501766cb36f85e Reviewed-on: http://gerrit.cloudera.org:8080/5008 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-11-09 02:44:36 +00:00
Tim Armstrong	27e9f0aea1	IMPALA-4446: expr-test fails under ASAN Various places in the LikePredicate code assumed StringVal is null-terminated. There is no such guarantee. By coincidence string literals were sometimes backed by std::string storage that was null-terminated, so this bug was latent until recently. Testing: Was able to reproduce the failure locally under ASAN, now the test passes. Running the full ASAN tests to verify, but putting this up for review first to unbreak the build sooner. Change-Id: I0ac10d34dd6463ab52e41de1002ef065cfe63a20 Reviewed-on: http://gerrit.cloudera.org:8080/5000 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-11-09 02:03:24 +00:00
aphadke	f3d23be478	IMPALA-4258: Remove duplicated and unused test macros Macros defined in test-macros.h are either duplicated in gtest-util.h or are unused anywhere in the code. This change deletes test-macros.h Change-Id: I08539d7e46b89d7e0a4338510b65f9867814c275 Reviewed-on: http://gerrit.cloudera.org:8080/4917 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-11-09 01:23:41 +00:00
Jim Apple	4af2ea4fe4	IMPALA-4438: Serialize test_failpoints.py to reduce memory pressure On EC2 c3.4xlarge instances, with 8 cores and 30GB RAM, this test could trigger the Linux OOM killer by running tests in parallel. This patch switches to serial execution, which makes the test take four minutes, rather than one to two minutes. Change-Id: Iea4a588e1228d38f90387a077cbe530257636b7d Reviewed-on: http://gerrit.cloudera.org:8080/4999 Reviewed-by: Jim Apple <jbapple@cloudera.com> Tested-by: Internal Jenkins	2016-11-09 01:06:33 +00:00
Jim Apple	ae24bf2850	Add -build_shared_libs for default build for speed. This is already recommended by the wiki: https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala Change-Id: Ic83db07e59ff339dcce7362bd296ebcfd60b71d6 Reviewed-on: http://gerrit.cloudera.org:8080/4970 Reviewed-by: Jim Apple <jbapple@cloudera.com> Tested-by: Internal Jenkins	2016-11-09 00:36:28 +00:00
Matthew Jacobs	08d89a5cc3	IMPALA-3710: Kudu DML should ignore conflicts by default Removes the non-standard IGNORE syntax that was allowed for DML into Kudu tables to indicate that certain errors should be ignored, i.e. not fail the query and continue. However, because there is no way to 'roll back' mutations that occurred before an error occurs, tables are left in an inconsistent state and it's difficult to know what rows were successfully modified vs which rows were not. Instead, this change makes it so that we always 'ignore' these conflicts, i.e. a 'best effort'. In the future, when Kudu will provide the mechanisms Impala needs to provide a notion of isolation levels, then Impala will be able to provide options for more traditional semantics. After this change, the following errors are ignored: * INSERT where the PK already exists * UPDATE/DELETE where the PK doesn't exist Another follow-up patch will change other violations to be handled in this way as well, e.g. nulls inserted in non-nullable cols. Reporting: The number of rows inserted is reported to the coordinator, which makes the aggregate available to the shell and via the profile. TODO: Return rows modified for INSERT via HS2 (IMPALA-1789). TODO: Return rows modified for other CRUD (beeswax+hs2) (IMPALA-3713). TODO: Return error counts for specific warnings (IMPALA-4416). Testing: Updated tests. Ran all functional tests. More tests will be needed when other conflicts are handled in the same way. Change-Id: I83b5beaa982d006da4997a2af061ef7c22cad3f1 Reviewed-on: http://gerrit.cloudera.org:8080/4911 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-08 20:34:00 +00:00
Martin Grund	ce4c5f6743	IMPALA-4365: Enabling end-to-end tests on a remote cluster This patch lays the groundwork for loading data and running end-to-end tests on a remote CDH cluster. The requirements for the cluster to run the tests are:
- Managed by Cloudera Manager (CM)
- GPL Extras need to be installed
- KMS and KeyTrustee installed and available as a service
- SERDEPROPERTIES in the Hive DB modified to accept wide tables
- Hive warehouse dir points to /test-warehouse
The actual data loading is done via a new script, remote_data_load.py, which takes the CM host as an argument. It can be run from a client machine that is not a node of the cluster, but it needs to have the Impala repo checked out and Impala built. This ensures that all of the necessary data load scripts are available and that the environment is set up properly (client binaries like beeline and the hbase shell are available, python libraries like cm_api are installed, necessary environment variables are defined, etc.) Note that running remote_data_load.py will overwrite any local XML config files with the configurations downloaded from the remote cluster.
Usage: remote_data_load.py [options] <cm_host address>
Options:
  -h, --help            show this help message and exit
  --snapshot-file=SNAPSHOT_FILE
                        Path to the test-warehouse archive
  --cm-user=CM_USER     Cloudera Manager admin user
  --cm-pass=CM_PASS     Cloudera Manager admin user password
  --gateway=GATEWAY     Gateway host to upload the data from. If not set, uses the CM host as gateway.
  --ssh-user=SSH_USER   System user on the remote machine with passwordless SSH configured.
  --no-load             Do not try to load the snapshot
  --exploration-strategy=EXPLORATION_STRATEGY
  --test                Run end-to-end tests against cluster
Testing: This patch is being submitted with the understanding that there are still cleanup issues that need to be addressed in the remote data load script, for which JIRAs have been filed.
However, since many of the existing build scripts also had to be modified, it is more important to make sure that no regressions were inadvertently introduced into the existing data load process. Loading data to a local mini-cluster was checked repeatedly while this patch was being developed, as well as running it against the Jenkins job that provides the test-warehouse snapshot used by the many other Impala CI builds that run daily. Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9 Reviewed-on: http://gerrit.cloudera.org:8080/4769 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2016-11-08 10:16:55 +00:00
Tim Armstrong	ef689edf36	IMPALA-4437: fix crash in disk-io-mgr This fixes another issue where the 'buffer_' field was not set to NULL on an error, triggering a DCHECK. Testing: Added a unit test that triggers the bug on the two different codepaths that I fixed. Change-Id: Ib76cf5ba8d368b2b37bdc1d2133b8ddcb39f9e00 Reviewed-on: http://gerrit.cloudera.org:8080/4979 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-11-08 08:29:58 +00:00
Tim Armstrong	51b1310681	IMPALA-3872: allow providing PyPi mirror for python packages We still rely on the python.org json API, which doesn't seem to be mirrored (instead there's a html-based index format implemented by the mirrors). The mirror can be provided by setting the PYPI_MIRROR environment variable. The default is "https://pypi.python.org". Change-Id: Ibc11f010332c0225121c86c9930e35c7ac01409c Reviewed-on: http://gerrit.cloudera.org:8080/4770 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-11-08 05:34:50 +00:00
Tim Armstrong	381e719065	IMPALA-4266: Java udf returning string can give incorrect results The memory management of string results was wrong: strings returned from Exprs must live until the next time FreeLocalAllocations() is called. Otherwise the buffer holding the string is freed or reused by the next UDF call. The fix is to copy string values into a buffer with the right lifetime. Testing: Added a regression test based on Bharath's example that reproduced the bug reliably. Change-Id: I705d271814cb1143f67d8a12f4fd87bab7a8e161 Reviewed-on: http://gerrit.cloudera.org:8080/4941 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-11-08 02:47:11 +00:00
Tim Armstrong	10fa472fa6	IMPALA-4302,IMPALA-2379: constant expr arg fixes This patch fixes two issues around handling of constant expr args. The patches are combined because they touch some of the same code and depend on some of the same memory management cleanup. First, it fixes IMPALA-2379, where constant expr args were not visible to UDAFs. The issue is that the input exprs need to be opened before calling the UDAF Init() function. Second, it avoids overhead from repeated evaluation of constant arguments for ScalarFnCall expressions on both the codegen'd and interpreted paths. A common example is an IN predicate with a long list of constant values. The interpreted path was inefficient because it always evaluated all children expressions. Instead in this patch constant args are evaluated once and cached. The memory management of the AnyVal* objects was somewhat nebulous - adjusted it so that they're allocated from ExprContext::mem_pool_, which has the correct lifetime. The codegen'd path was inefficient only with varargs - with fixed arguments the LLVM optimiser is able to infer after inlining that the expressions are constant and remove all evaluation. However, for varargs it stores the vararg values into a heap-allocated buffer. The LLVM optimiser is unable to remove these stores because they have a side-effect that is visible to code outside the function. The codegen'd path is improved by evaluating varargs into an automatic buffer that can be optimised out. We also make a small related change to bake the string constants into the codegen'd code. Testing: Ran exhaustive build. Added regression test for IMPALA-2379 and MemPool test for aligned allocation. Added a test for in predicates with constant strings. Perf: Added a targeted query that demonstrates the improvement. Also manually validated the non-codegend perf. Also ran TPC-H and targeted perf queries locally - didn't see any significant changes. 
+--------------------+-------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| Workload           | Query                         | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
+--------------------+-------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| TARGETED-PERF(_20) | primitive_filter_in_predicate | parquet / none / none | 1.19   | 9.82        | I -87.85%  | 3.82%     | 0.71%          | 1           | 10    |
+--------------------+-------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
(I) Improvement: TARGETED-PERF(_20) primitive_filter_in_predicate [parquet / none / none] (9.82s -> 1.19s [-87.85%])
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+--------+-----------+
| Operator     | % of Query | Avg      | Base Avg | Delta(Avg) | StdDev(%) | Max      | Base Max | Delta(Max) | #Hosts | #Rows  | Est #Rows |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+--------+-----------+
| 01:AGGREGATE | 14.39%     | 155.88ms | 214.61ms | -27.37%    | 2.68%     | 163.38ms | 227.53ms | -28.19%    | 1      | 1      | 1         |
| 00:SCAN HDFS | 85.60%     | 927.46ms | 9.43s    | -90.16%    | 4.49%     | 1.01s    | 9.50s    | -89.42%    | 1      | 13.77K | 14.05K    |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+--------+-----------+
Change-Id: I45c3ed8c9d7a61e94a9b9d6c316e8a53d9ff6c24 Reviewed-on: http://gerrit.cloudera.org:8080/4838 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-11-08 02:44:51 +00:00
Sailesh	e3483c44a3	IMPALA-4441: Divide-by-zero in RuntimeProfile::SummaryStatsCounter::SetStats This patch anticipates the case where total_num_values_ can be 0 and makes sure a divide-by-zero is not possible. Change-Id: I33f1e6fb45505dce7d79497d1632c5f63a409151 Reviewed-on: http://gerrit.cloudera.org:8080/4975 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Internal Jenkins	2016-11-08 00:55:47 +00:00
Matthew Jacobs	1a99b78227	IMPALA-4442: Fix FE ParserTests UnsatisfiedLinkError In some development environments, the ParserTests may always fail with a
java.lang.UnsatisfiedLinkError: org.apache.impala.service.FeSupport.NativeGetStartupOptions()[B
  at o.a.i.service.FeSupport.NativeGetStartupOptions(Native Method)
  at o.a.i.service.FeSupport.GetStartupOptions(FeSupport.java:268)
  at o.a.i.common.RuntimeEnv.<init>(RuntimeEnv.java:47)
  at o.a.i.common.RuntimeEnv.<clinit>(RuntimeEnv.java:34)
  at o.a.i.testutil.TestUtils.assumeKuduIsSupported(TestUtils.java:288)
  at o.a.i.analysis.ParserTest.TestKuduUpdate(ParserTest.java:1697)
I believe the issue is related to some static loading of classes and/or libraries in Java because changing the ParserTest to initialize the Frontend makes the error go away. I haven't been able to pinpoint the exact issue with loading, but it makes sense that the ParserTest should initialize the Frontend static state if it will be called by libfesupport later, since the issue seems to affect some environments and not others, i.e. it is subject to environmental factors. This fixes the issue by changing ParserTest to extend FrontendTestBase, which initializes the Frontend class statically. Change-Id: I1828504f79c51679f9ca07176bffbe248d450e87 Reviewed-on: http://gerrit.cloudera.org:8080/4976 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-08 00:31:35 +00:00
Thomas Tauber-Marshall	3be4b3efd0	IMPALA-1169: Admission control info on the queries debug webpage This patch adds a new event, 'Queued', to the query event log to indicate when a query is queued by the admission controller. This means that queries on the '/queries' page that are currently queued will display this as their 'Last Event', making it possible to see which queries are currently queued. It also adds a column to show the resource pool associated with the queries, and it updates the wording of the first event that gets marked for each query from 'Start execution' to 'Query submitted', since this is before planning and admission control and therefore execution hasn't actually started yet. Change-Id: I504e3c829a14318721e3a42de6281bcc578f7283 Reviewed-on: http://gerrit.cloudera.org:8080/4756 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-11-07 23:26:02 +00:00
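
The IMPALA-4441 entry above describes guarding a summary-statistics computation against a zero value count. The following is only a minimal Python sketch of that kind of guard, not the actual C++ change in RuntimeProfile::SummaryStatsCounter::SetStats; the names SummaryStats and summarize are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SummaryStats:
    """Hypothetical container for the aggregated values."""
    total_num_values: int
    min_value: int
    max_value: int
    mean: float


def summarize(values: Iterable[int]) -> SummaryStats:
    """Aggregate values, returning zeroed stats when there is nothing to average."""
    vals = list(values)
    if not vals:
        # Guard: with zero values the mean would divide by zero,
        # so return an all-zero summary instead.
        return SummaryStats(0, 0, 0, 0.0)
    return SummaryStats(len(vals), min(vals), max(vals), sum(vals) / len(vals))
```

Calling summarize([]) returns a zeroed summary rather than raising ZeroDivisionError, which is the behavior the guard is meant to provide.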


5231 Commits