impala

mirror of https://github.com/apache/impala.git synced 2025-12-21 10:58:31 -05:00

Author	SHA1	Message	Date
Matthew Jacobs	99ed6dc67a	IMPALA-4134,IMPALA-3704: Kudu INSERT improvements 1.) IMPALA-4134: Use Kudu AUTO FLUSH Improves performance of writes to Kudu up to 4.2x in bulk data loading tests (load 200 million rows from lineitem). 2.) IMPALA-3704: Improve errors on PK conflicts The Kudu client reports an error for every PK conflict, and all errors were being returned in the error status. As a result, inserts/updates/deletes could return errors with thousands errors reported. This changes the error handling to log all reported errors as warnings and return only the first error in the query error status. 3.) Improve the DataSink reporting of the insert stats. The per-partition stats returned by the data sink weren't useful for Kudu sinks. Firstly, the number of appended rows was not being displayed in the profile. Secondly, the 'stats' field isn't populated for Kudu tables and thus was confusing in the profile, so it is no longer printed if it is not set in the thrift struct. Testing: Ran local tests, including new tests to verify the query profile insert stats. Manual cluster testing was conducted of the AUTO FLUSH functionality, and that testing informed the default mutation buffer value of 100MB which was found to provide good results. Change-Id: I5542b9a061b01c543a139e8722560b1365f06595 Reviewed-on: http://gerrit.cloudera.org:8080/4728 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-10-25 02:06:10 +00:00
Lars Volker	2fa1633e40	IMPALA-4329: Prevent crash in scheduler when no backends are registered The scheduler crashed with a segmentation fault when there were no backends registered: After not being able to find a local backend (none are configured at all) in ComputeScanRangeAssignment(), the previous code would eventually try to return the top of assignment_ctx.assignment_heap in SelectRemoteBackendHost(), but that heap would be empty. Subsequently, when using the IP address of that heap node, a segmentation fault would occur. This change adds a check and aborts scheduling with an error. It also contains a test. Change-Id: I6d93158f34841ea66dc3682290266262c87ea7ff Reviewed-on: http://gerrit.cloudera.org:8080/4776 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-10-21 03:16:30 +00:00
Alex Behm	2a04b0e21a	IMPALA-3943: Address post-merge comments. Adds code comments and issues a warning for Parquet files with num_rows=0 but at least one non-empty row group. Change-Id: I72ccf00191afddb8583ac961f1eaf11e5eb28791 Reviewed-on: http://gerrit.cloudera.org:8080/4696 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-10-14 05:41:22 +00:00
Thomas Tauber-Marshall	b2c2fe7813	IMPALA-3786: Replace "cloudera" with "apache" (part 2) As part of the ASF transition, we need to replace references to Cloudera in Impala with references to Apache. This primarily means changing Java package names from com.cloudera.impala.* to org.apache.impala.* A prior patch renamed all the files as necessary, and this patch performs the actual code changes. Most of the changes in this patch were generated with some commands of the form: find . \| grep "\.java\\|\.py\\|\.h\\|\.cc" \| \ xargs sed -i s/'com$.$cloudera$\.$impala/org\1apache\2impala/g along with some manual fixes. After this patch, the remaining references to Cloudera in the repo mostly fall into the categories: - External components that have cloudera in their own package names, eg. com.cloudera.kudu/llama - URLs, eg. https://repository.cloudera.com/ Change-Id: I0d35fa6602a7fc0c212b2ef5e2b3322b77dde7e2 Reviewed-on: http://gerrit.cloudera.org:8080/3937 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Reviewed-by: Jim Apple <jbapple@cloudera.com> Tested-by: Internal Jenkins	2016-09-29 21:14:13 +00:00
Tim Armstrong	241c7e0197	IMPALA-3201: in-memory buffer pool implementation This patch implements basic in-memory buffer management, with reservations managed by ReservationTrackers. Locks are fine-grained so that the buffer pool can scale to many concurrent queries. Includes basic tests for buffer pool setup, allocation and reservations. Change-Id: I4bda61c31cc02d26bc83c3d458c835b0984b86a0 Reviewed-on: http://gerrit.cloudera.org:8080/4070 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-09-28 23:38:20 +00:00
Bikramjeet Vig	9313dcdb83	IMPALA-3671: Add query option to limit scratch space usage Currently we can only disable spilling via a startup option which means we need to restart the cluster for this. This patch adds a new query option 'SCRATCH_LIMIT' that limits the amount of scratch directory space that can be used. This would be useful to prevent runaway queries or to prevent queries from spilling when that is not desired. This also adds a 'ScratchSpace' counter to the runtime profile of the BlockMgr that keeps track of the scratch space allocated. Valid values for the SCRATCH_LIMIT query option are: - unspecified or a limit of -1 means no limit - a limit of 0 (zero) means spilling is disabled - an int (= number of bytes) - a float followed by "M" (MB) or "G" (GB) Testing: A new test file "test_scratch_limit.py" was added for testing functionality. Change-Id: Ibf8842626ded1345b632a0ccdb9a580e6a0ad470 Reviewed-on: http://gerrit.cloudera.org:8080/4497 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-09-24 02:48:46 +00:00
Dan Hecht	ffa7829b70	IMPALA-3918: Remove Cloudera copyrights and add ASF license header For files that have a Cloudera copyright (and no other copyright notice), make changes to follow the ASF source file header policy here: http://www.apache.org/legal/src-headers.html#headers Specifically: 1) Remove the Cloudera copyright. 2) Modify NOTICE.txt according to http://www.apache.org/legal/src-headers.html#notice to follow that format and add a line for Cloudera. 3) Replace or add the existing ASF license text with the one given on the website. Much of this change was automatically generated via: git grep -li 'Copyright.Cloudera' > modified_files.txt cat modified_files.txt \| xargs perl -n -i -e 'print unless m#Copyright.Cloudera#i;' cat modified_files_txt \| xargs fix_apache_license.py [1] Some manual fixups were performed following those steps, especially when license text was completely missing from the file. [1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor modification to ORIG_LICENSE to match Impala's license text. Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86 Reviewed-on: http://gerrit.cloudera.org:8080/3779 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-08-09 08:19:41 +00:00
Juan Yu	2ab130aa0a	IMPALA-3575: Add retry to backend connection request and rpc timeout This patch adds a configurable timeout for all backend client RPC to avoid query hang issue. Prior to this change, Impala doesn't set socket send/recv timeout for backend client. RPC will wait forever for data. In extreme cases of bad network or destination host has kernel panic, sender will not get response and RPC will hang. Query hang is hard to detect. If hang happens at ExecRemoteFragment() or CancelPlanFragments(), query cannot be canelled unless you restart coordinator. Added send/recv timeout to all RPCs to avoid query hang. For catalog client, keep default timeout to 0 (no timeout) because ExecDdl() could take very long time if table has many partitons, mainly waiting for HMS API call. Added a wrapper RetryRpcRecv() to wait for receiver response for longer time. This is needed by certain RPCs. For example, TransmitData() by DataStreamSender, receiver could hold response to add back pressure. If an RPC fails, the connection is left in an unrecoverable state. we don't put the underlying connection back to cache but close it. This is to make sure broken connection won't cause more RPC failure. Added retry for CancelPlanFragment RPC. This reduces the chance that cancel request gets lost due to unstable network, but this can cause cancellation takes longer time. and make test_lifecycle.py more flaky. The metric num-fragments-in-flight might not be 0 yet due to previous tests. Modified the test to check the metric delta instead of comparing to 0 to reduce flakyness. However, this might not capture some failures. Besides the new EE test, I used the following iptables rule to inject network failure to verify RPCs never hang. 1. Block network traffic on a port completely iptables -A INPUT -p tcp -m tcp --dport 22002 -j DROP 2. Randomly drop 5% of TCP packets to slowdown network iptables -A INPUT -p tcp -m tcp --dport 22000 -m statistic --mode random --probability 0.05 -j DROP Change-Id: Id6723cfe58df6217f4a9cdd12facd320cbc24964 Reviewed-on: http://gerrit.cloudera.org:8080/3343 Reviewed-by: Juan Yu <jyu@cloudera.com> Tested-by: Internal Jenkins	2016-07-18 13:29:24 -07:00
Michael Ho	ed5ec6772f	IMPALA-1619: Support 64-bit allocations. This change extends MemPool, FreePool and StringBuffer to support 64-bit allocations, fixes a bug in decompressor and extends various places in the code to support 64-bit allocation sizes. With this change, the text scanner can now decompress compressed files larger than 1GB. Note that the UDF interfaces FunctionContext::Allocate() and FunctionContext::Reallocate() still use 32-bit for the input argument to avoid breaking compatibility. In addition, the byte size of a tuple is still assumed to be within 32-bit. If it needs to be upgraded to 64-bit, it will be done in a separate change. A new test has been added to test the decompression of a 2GB snappy block compressed text file. Change-Id: Ic1af1564953ac02aca2728646973199381c86e5f Reviewed-on: http://gerrit.cloudera.org:8080/3575 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Internal Jenkins	2016-07-08 15:42:09 -07:00
Michael Ho	a07fc367ee	Revert "IMPALA-1619: Support 64-bit allocations." This reverts commit 1ffb2bd5a2a2faaa759ebdbaf49bf00aa8f86b5e. Unbreak the packaging builds for now. Change-Id: Id079acb83d35b51ba4dfe1c8042e1c5ec891d807 Reviewed-on: http://gerrit.cloudera.org:8080/3543 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Michael Ho <kwho@cloudera.com>	2016-07-05 13:37:26 -07:00
Michael Ho	5f3dfdf6c7	IMPALA-1619: Support 64-bit allocations. This change extends MemPool, FreePool and StringBuffer to support 64-bit allocations, fixes a bug in decompressor and extends various places in the code to support 64-bit allocation sizes. With this change, the text scanner can now decompress compressed files larger than 1GB. Note that the UDF interfaces FunctionContext::Allocate() and FunctionContext::Reallocate() still use 32-bit for the input argument to avoid breaking compatibility. In addition, the byte size of a tuple is still assumed to be within 32-bit. If it needs to be upgraded to 64-bit, it will be done in a separate change. Change-Id: I7ed28083d809a86d801a9c063a0aa32c50d32b20 Reviewed-on: http://gerrit.cloudera.org:8080/2781 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-07-05 13:37:25 -07:00
Tim Armstrong	547be27e77	IMPALA-3745: parquet invalid data handling Added checks/error handling: * Negative string lengths while decoding dictionary or data page. * Buffer overruns while decoding dictionary or data page. * Some metadata FILECHECKs were converted to statuses. Testing: Unit tests for: * decoding of strings with negative lengths * truncation of all parquet types * dictionary creation correctly handling error returns from Decode(). End-to-end tests for handling of negative string lengths in dictionary- and plain-encoded data in corrupt files, and for handling of buffer overruns for string data. The corrupted parquet files were generated by hacking Impala's parquet writer to write invalid lengths, and by hacking it to write plain-encoded data instead of dictionary-encoded data by default. Performance: set num_nodes=1; set num_scanner_threads=1; select * from biglineitem where l_orderkey = -1; I inspected MaterializeTupleTime. Before the average was 8.24s and after was 8.36s (a 1.4% slowdown, within the standard deviation of 1.8%). Change-Id: Id565a2ccb7b82f9f92cc3b07f05642a3a835bece Reviewed-on: http://gerrit.cloudera.org:8080/3387 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-06-15 21:33:39 -07:00
Tim Armstrong	2dad444c8c	IMPALA-3732: handle string length overflow in avro files Avro string lengths are encoded as 64-bit integers. Impala can only handle up to 32-bit integers, so we need to be careful about handling out-of-range integers. Negative integers were already handled by a previous patch, but if a positive 64-bit integer is truncated to a 32-bit integer, the result can be a negative length. This patch fixes CHAR/VARCHAR behaviour, where we can just truncate the string, and STRING, where we can't truncate the string, so must return an error. Testing: Added unit tests for STRING, CHAR, and VARCHAR that exercise the string overflow handling. Change-Id: If6541e7c68255bf599b26386a55057c93e62af51 Reviewed-on: http://gerrit.cloudera.org:8080/3383 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-06-15 02:33:17 -07:00
Henry Robinson	3dff390e48	IMPALA-3682: Don't retry unrecoverable socket creation errors If a thrift client can't create a socket, all subsequent calls to Open() should fail fast since socket creation errors are treated as unrecoverable. Testing: manual testing with a bad SSL configuration. Impalad startup fails fast, rather than retrying 10 times as previously. Change-Id: I394be287143eefc79cf22865898b71ca24c41328 Reviewed-on: http://gerrit.cloudera.org:8080/3317 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Henry Robinson <henry@cloudera.com>	2016-06-14 10:32:46 -07:00
Skye Wanderman-Milne	01287a3ba9	IMPALA-3441, IMPALA-3659: check for malformed Avro data This patch adds error checking to the Avro scanner (both the codegen'd and interepted paths), including out-of-bounds checks and data validity checks. I ran a local benchmark using the following queries: set num_scanner_threads=1; select count(i) from default.avro_bigints_big; # file contains only longs select max(l_orderkey) from biglineitem_avro; # file has tpch.lineitem schema Both benchmark queries see negligable or no performance impact. This patch adds a new Avro scanner unit test and an end-to-end test that queries several corrupted files, as well as updates the zig-zag varlen int unit test. Change-Id: I801a11c496a128e02c564c2a9c44baa5a97be132 Reviewed-on: http://gerrit.cloudera.org:8080/3072 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-06-13 18:32:32 -07:00
Michael Ho	0243a21da8	IMPALA-3242: Remove most usages of RuntimeState::SetMemLimitExceeded() There are multiple places in the code which call RuntimeState::SetMemLimitExceeded(). Most of them are unnecessary as the error status constructed will eventually be propagated up the tree of exec nodes. There is no obvious reason to treat query memory limit exceeded differently. In some cases such as scan-node, calling SetMemLimitExceeded() is actually confusing as all scanner threads may pick up error status when any thread exceeds query memory limit, causing a lot of noise in the log. This change replaces most calls to RuntimeState::SetMemLimitExceeded() with MemTracker::MemLimitExceeded(). The remaining places are: the old hash table code, the UDF framework and QueryMaintenance() which checks for memory limit periodically. The query maintenance case will be removed eventually once IMPALA-2399 is fixed. Change-Id: Ic0ca128c768d1e73713866e8c513a1b75e6b4b59 Reviewed-on: http://gerrit.cloudera.org:8080/3140 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Internal Jenkins	2016-05-23 08:40:19 -07:00
Casey Ching	39a28185e8	Re-enable Kudu in build using client stubs when needed The stubs in Impala broke during the merge commit. This commit removes the stubs in hopes of improving robustness of the build. The original problem (Kudu clients are only available for some OSs) is now addressed by moving the stubbing into a dummy Kudu client. The dummy client only allows linking to succeed, if any client method is called, Impala will crash. Before calling any such method, Kudu availability must be checked. Change-Id: I4bf1c964faf21722137adc4f7ba7f78654f0f712 Reviewed-on: http://gerrit.cloudera.org:8080/2585 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2016-03-29 23:57:54 +00:00
David Alves	7381304a23	Merge branch 'feature/kudu' into cdh5-trunk This is the final merge commit that merges the 'feature/kudu' branch into cdh5-trunk. Change-Id: Ib3dfb4fc7a69c5cb1c5789422ee52fa192ed677a	2016-03-13 19:28:43 -07:00
Michael Ho	9ed3b685a1	IMPALA-2399: Check for mem limit in allocations in parquet scanner and decompressor This change replaces all calls to MemPool::Allocate() with MemPool::TryAllocate() in the parquet scanner and the decompressor. Also streamline CheckQueryState() to avoid unnecessary spinlock acquisition for the common case when there is no error. Also removes some dead code in the text converter. MemPool::Allocate() is also updated to return a valid pointer instead of NULL when the allocation size is zero. NULL is only returned during allocation failure. This change also updates CollectionValueBuilder::GetFreeMemory() to return Status in case it exceeds memory limit. As part of the change, the max allocation limit (2 GB) is also removed from it as 64-bit allocations are supported in MemPool with this change. Change-Id: Ic70400407b7662999332448f4d1bce2cc344ca89 Reviewed-on: http://gerrit.cloudera.org:8080/2203 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-03-12 02:05:45 +00:00
casey	804cfbdd64	Get and use Kudu from the toolchain by default This is for review purposes only. This patch will be merged with David's big merge patch. Changes: 1) Make Kudu compilation dependent on the OS since not all OSs support Kudu. 2) Only run Kudu related tests when Kudu is supported (see #1). 3) Look for Kudu locally, but in a different location. To use a local build of Kudu, set KUDU_BUILD_DIR to the path Kudu was built in and set KUDU_CLIENT_DIR to the path KUDU was installed in. Example: git clone https://github.com/cloudera/kudu.git ...build 3rd party etc... mkdir -p $KUDU_BUILD_DIR cd $KUDU_BUILD_DIR cmake <path to Kudu source dir> make DESTDIR=$KUDU_CLIENT_DIR make install 4) Look for Kudu in the toolchain if not using a local Kudu build. 5) Add Kudu service startup scripts. The Kudu in the toolchain is actually a parcel that has been renamed (the contents were not modified in any way), that mean the Kudu service binaries are there. Those binaries are now used to run the Kudu service. Change-Id: I3db88cbd27f2ea2394f011bc8d1face37411ed58	2016-03-11 11:38:05 -08:00
David Alves	82222abaf5	Merge branch 'feature/kudu' into cdh5-trunk This merges the 'feature/kudu' branch with cdh5-trunk as of commit: 055500cc753f87f6d1c70627321fcc825044e183 This patch is not a pure merge patch in the sense that goes beyond conflict resolution to also address reviews to the 'feature/kudu' branch as a whole. The review items and their resolution can be inspected at: http://gerrit.cloudera.org:8080/#/c/1403/ Change-Id: I6dd4270cd17a4f5c02811c343726db3504275a92	2016-03-11 11:37:58 -08:00
Henry Robinson	9a2ac6cd7f	IMPALA-2987: Distinguish between already-closed and never-seen data stream receivers This patch adds an output parameter 'already_unregistered' to FindRecvrOrWait() to signal to the caller in which of two cases it may have returned NULL. If 'already_unregistered' is true, the receiver has already been setup and closed (possibly by cancellation, possibly by the fragment deliberately closing its inputs in the case of a limit). This is not an error - cancellation will be signalled to the sender from the coordinator, and deliberate closure means the coordinator will tear down the query shortly. If 'already_unregistered' is set to false by FindRecvrOrWait(), the DataStreamMgr has never seen the intended receiver. This means the sender has waited for a full timeout period without the upstream receiver being established; this signals a likely query setup problem (as long as datastream_sender_timeout_ms is set sufficiently large) and so we return an error. We need to tweak the two timeout parameters here: * datastream_sender_timeout_ms needs to be large enough to avoid false negatives for problems during query setup (otherwise queries will unexpectedly cancel that would otherwise have succeeded, if slowly). * STREAM_EXPIRATION_TIME_MS needs to be set high enough that a query will not continue executing for longer than STREAM_EXPIRATION_TIME_MS after it closes its input (otherwise the sender will get already_unregistered=false, and cancel). This case will only trigger when a sender tries to call TransmitData() after the receiver has been closed for STREAM_EXPIRATION_TIME_MS; this should not happen in non-error cases as receivers are not closed before consuming their entire input. In this patch the former has been set to 2 minutes, and the latter to 5 minutes. Change-Id: Ib1734992c7199b9dd4b03afca5372022051b6fbd Reviewed-on: http://gerrit.cloudera.org:8080/2305 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Henry Robinson <henry@cloudera.com>	2016-03-01 22:15:40 -08:00
Juan Yu	c9b33ddf63	IMPALA-1886/IMPALA-2154: Add support for multi-stream bz2/gzip compressed files. Fix a bug in which Impala only reads the first stream of a multi-stream bz2/gzip file. Changes the bz2 decoder to read the file in a streaming fashion rather than reading the entire file into memory before it can be decompressed. Change-Id: Icbe617d03a69953f0bf3aa0f7c30d34bc612f9f8 (cherry picked from commit b6d0b4e059329633dc50f1f73ebe35b7ac317a8e) Reviewed-on: http://gerrit.cloudera.org:8080/2219 Reviewed-by: Juan Yu <jyu@cloudera.com> Tested-by: Internal Jenkins	2016-02-28 21:31:37 -08:00
Michael Ho	a57be09fc7	IMPALA-2816: Include null indicators when checking if a row fits in a block This change includes the potential space needed for null indicators when determining whether a row can fit in a new I/O write block or small buffers in buffered tuple stream. For rows with size close to the I/O block size, there may not be enough space to hold the entire row in a block after reserving the header space for null indicators. This change also updates the buffered tuple stream BE test to test for this corner case. Change-Id: I256974281d555f9a015c17ea23a1b4d5e9055c97 Reviewed-on: http://gerrit.cloudera.org:8080/1973 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Internal Jenkins	2016-02-09 08:42:58 +00:00
Tim Armstrong	fbcc2433ea	Enforce enum values in generate_error_codes.py Fixes a bug in generate_error_codes where the enum value specified in generate_error_codes.py was not actually used. Change-Id: If7e3269d12a839106c595d44da09c573a8a177f2 Reviewed-on: http://gerrit.cloudera.org:8080/1894 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-02-05 22:20:46 +00:00
Sailesh Mukil	f6309c4bd1	Fix error code number gap in thrift/generate_error_codes.py One of the error codes were removed leaveing a gap in the error code numbers. This change just closes the gap. Change-Id: I2e424e55439459d4c7a84dd393f55d72400dabf0 Reviewed-on: http://gerrit.cloudera.org:8080/1891 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2016-02-05 11:11:34 +00:00
Sailesh Mukil	6982a49b58	IMPALA-2598: Re-enable SSL and Kerberos on server-server This patch removes the workaround that disallows SSL and Kerberos being enabled at the same time. It was previously disallowed because SSL and Kerberos wouldn't work together for server-server (or daemon-daemon) communication by causing a hang. The issue has been addressed in the following patches: http://gerrit.cloudera.org:8080/#/c/1594/ http://gerrit.cloudera.org:8080/#/c/1599/ This patch should be merged only after the above 2 are merged. Change-Id: I63d492d1733204edd1249aff2cb3b168ec82ea92 Reviewed-on: http://gerrit.cloudera.org:8080/1772 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2016-01-20 04:10:02 +00:00
Michael Ho	e01ab4f1b2	IMPALA-2620: FunctionContext::Allocate() and friends should check for memory limits. FunctionContext::Allocate(), FunctionContextImpl::AllocateLocal() and FunctionContext::Reallocate() allocate memory without taking memory limits into account. The problem is that these functions invoke FreePool::Allocate() which may call MemPool::Allocate() that doesn't check against the memory limits. This patch fixes the problem by making these FunctionContext functions check for memory limits and set an error in the FunctionContext object if memory limits are exceeded. An alternative would be for these functions to call MemPool::TryAllocate() instead and return NULL if memory limits are exceeded. However, this may break some existing external UDAs which don't check for allocation failures, leading to unexpected crashes of Impala. Therefore, we stick with this ad hoc approach until the UDF/UDA interfaces are updated in the future releases. Callers of these FunctionContext functions are also updated to handle potential failed allocations instead of operating on NULL pointers. The query status will be polled at various locations and terminate the query. This patch also fixes MemPool to handle the case in which malloc may return NULL. It propagates the failure to the callers instead of continuing to run with NULL pointers. In addition, errors during aggregate functions' initialization are now properly propagated. Change-Id: Icefda795cd685e5d0d8a518cbadd37f02ea5e733 Reviewed-on: http://gerrit.cloudera.org:8080/1445 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Internal Jenkins	2015-12-19 04:45:55 +00:00
Henry Robinson	2d39f1cc59	IMPALA-2598: (Workaround) Disallow Kerberos and SSL on server<->server connections This patch does the following: * Prevents Impala from starting if 'internal' Kerberos and SSL are enabled at the same time. * Changes the required configuration to enable 'internal' SSL to include --ssl_client_ca_certificate. This allows 'external' SSL to be configured without enabling 'internal' SSL. Test are included for the first item. For the second, the appropriate test is to try to connect with an internal SSL client to a non-SSL server. However, this causes the connection to hang, which is not an easy condition to detect in a test case. Change-Id: I7fa545045fed57e161fb37898d5782937c710a0c Reviewed-on: http://gerrit.cloudera.org:8080/1318 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins Reviewed-on: http://gerrit.cloudera.org:8080/1323	2015-10-25 04:26:27 +00:00
Skye Wanderman-Milne	e4d12931dd	Update the Avro scanner bad version header error message to include the file. This makes cases like IMPALA-2402 much easier to diagnose, since this message will mostly occur if Impala tries to read a file that's not an Avro data file. Change-Id: I6504e668905ecc6964b77a6fe0cfc9c7511fd5c0 Reviewed-on: http://gerrit.cloudera.org:8080/1202 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Internal Jenkins	2015-10-12 14:41:03 -07:00
Juan Yu	41509ce3c1	IMPALA-2477: Parquet metadata randomly 'appears stale' Stream::ReadBytes() could fail by other reasons than 'stale metadata'. Adding Errorcode Check to make sure Impala return proper error message. It also fixes IMPALA-2488 metadata.test_stale_metadata fails on non-hdfs filesystem. Change-Id: I9a25df3fb49f721bf68d1b07f42a96ce170abbaa Reviewed-on: http://gerrit.cloudera.org:8080/1166 Reviewed-by: Juan Yu <jyu@cloudera.com> Tested-by: Internal Jenkins	2015-10-07 14:47:41 -07:00
Skye Wanderman-Milne	68fef6a5bf	IMPALA-2213: make Parquet scanner fail query if the file size metadata is stale This patch changes the Parquet scanner to check if it can't read the full footer scan range, indicating that file has been overwritten by a shorter file without refreshing the table metadata. Before it would DCHECK. This patch adds a test for this case, as well as the case where the new file is longer than the metadata states (which fails with an existing error). Change-Id: Ie2031ac2dc90e4f2573bd3ca8a3709db60424f07 Reviewed-on: http://gerrit.cloudera.org:8080/1084 Tested-by: Internal Jenkins Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2015-10-01 13:58:39 -07:00
Skye Wanderman-Milne	0c5e6a804f	IMPALA-2443: add support for more Parquet array encodings This patch adds full support for the various Parquet array encodings, as well as tests that use files from https://github.com/apache/hive/tree/master/data/files. This should allow us to read any existing array data. Change-Id: I3d22ae237b1dc82ee75a83c1d4890d76316fadee Reviewed-on: http://gerrit.cloudera.org:8080/826 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Internal Jenkins	2015-10-01 13:58:37 -07:00
Matthew Jacobs	70b9954593	IMPALA-2189: [RM] Retry logic for Llama RPC may throw exception The code in resource-broker.cc that makes RPCs to Llama will attempt to retry the RPC some number of times (which is configurable) if the RPC returns a failure. If the RPC throws (which thrift may do), we try to reset the connection and then make the RPC again, but this time not guarded by a try/catch block. If this RPC throws, the process will crash. This fixes the issue by removing the try/catch and instead using the ClientCache DoRpc function which handles this already. Some additional Llama RPC calling wrappers were removed as well. Change-Id: Iba5add47a77fe9257e73eea5711ef4b948abe76a Reviewed-on: http://gerrit.cloudera.org:8080/881 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2015-09-30 17:17:52 -07:00
Tim Armstrong	5b1157b44b	IMPALA-2079: Part 2: handling tmp device errors Tmp devices are blacklisted when a write error is encountered for that device. No more scratch space will be allocated on the blacklisted device, based on the assumption that the device is likely to be misconfigured or failing. This patch does not attempt to recover the query that experienced the write error. It also does not attempt to remap any existing blocks away from the temporary device. This behaviour is unit tested for several failure scenarios. This patch adds additional test infrastructure required for testing BufferedBlockMgr behavior in the presence of faults and in configurations with multiple tmp directories. Adds metrics tmp-file-mgr.active-scratch-dirs and tmp-file-mgr.active-scratch-dirs.list that track the number and set of active scratch dirs and expose it in the Impala web UI. Change-Id: I9d80ed3a7afad6ff8e5d739b6ea2bc0949f16746 Reviewed-on: http://gerrit.cloudera.org:8080/579 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2015-09-03 01:54:32 +00:00
Skye Wanderman-Milne	bcc73a36da	Nested types: read and materialize nested types in Parquet scanner This patch modifies the Parquet scanner to resolve nested schemas, and read and materialize collection types. The high-level modification is to create a CollectionColumnReader that recursively materializes map- and array-type slots. This patch also adds many tests, most of which query a new table called complextypestbl. This table contains hand-generated data that is meant to expose edge cases in the scanner. The tests mostly test the scanner, with a few tests of other functionality (e.g. array serialization). I ran a local benchmark comparing this scanner code to the original scanner code on an expanded version of tpch_parquet.lineitem with 48009720 rows. My benchmark involved selecting different numbers of columns with a single scanner thread, and I looked at the HDFS scan node time in the query profiles. This code introduces a 10%-20% regression in single-threaded scan time. Change-Id: Id27fb728934e8346444f61752c9278d8010e5f3a Reviewed-on: http://gerrit.cloudera.org:8080/576 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-09-02 19:23:54 +00:00
Tim Armstrong	b402003532	Nested Types: Dedup tuples when de/serializing This patch extends the deduplication of tuples in row batches to work on non-adjacent tuples. This deduplication requires an additional data structure (a hash table) and adds additional performance overhead (up to 3x serialization time), so it is only enabled for row batches with compositions that are likely to blow up due to non-adjacent duplication of large tuples. This avoids performance regression in typical cases, while preventing size blow-ups in problematic cases, such as joining three streams of tuples some of which contain may contain large collections. A test is included that ensures that adjacent deduplication is enabled. The row batch serialize benchmark shows that deduplication does not regress performance of serialization or deserialization. Change-Id: I3c71ad567d1c972a0f417d19919c2b28891fb407 Reviewed-on: http://gerrit.cloudera.org:8080/573 Tested-by: Internal Jenkins Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2015-08-31 18:45:57 +00:00
Henry Robinson	b8cd78823a	IMPALA-1795: Add support for passwords for SSL private key files This patch allows administrators to configure all Impala daemons with a password for the private key file used to negotiate connections with clients which present the corresponding public key. This private key is obtained by running a user-supplied shell command and using the result. The command is supplied by setting --ssl_private_key_password_cmd. The output of the command is truncated to a maximum of 1024 bytes (this is a limitation of RunShellProcess(), but should not be significant for this use case), and then all trailing whitespace is trimmed (this is to avoid unexpected trailing newlines etc. from shell output). If the password is incorrect clients will be unable to connect to the server, whether or not they have the correct public key. If the command exits with an error, the server will not start. Change-Id: Icc13933fdf50a6170c859989626da5772fe5040d Reviewed-on: http://gerrit.cloudera.org:8080/623 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-08-26 03:03:32 +00:00
Skye Wanderman-Milne	ec671a406c	Simplify HdfsParquetScanner::AssembleRows() logic. This patch updates AssembleRows() to have fewer exit and error paths, as well as to explicitly distinguish between the row group being finished and an error occurring. It functionally changes the behavior in only two minor ways: - The entire row group will be read regardless of how many values the file metadata says there are. Previously it would only read up to the number stated in the metadata, and then had extra logic for checking if there were any values remaining. - If abort_on_error is false and there is an error reading a row group, subsequent row groups will still be read (except if OOM). Before this would sometimes happen and sometimes not. Change-Id: Id1836cfe2a507e46cb030be32b4c1553f478f639 Reviewed-on: http://gerrit.cloudera.org:8080/624 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Internal Jenkins	2015-08-22 01:34:32 +00:00
Skye Wanderman-Milne	8b66b11bb8	Nested types: BE changes for Avro struct support Most of this patch is rewriting the schema resolution logic to handle recursive schemas. The other changes are for reading and codegening recursive schemas. Change-Id: I257db05e02ed99c62c8dcfd0136b9e8f392d5933 Reviewed-on: http://gerrit.cloudera.org:8080/86 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Internal Jenkins	2015-07-25 04:08:04 +00:00
David Alves	7fb89b61b5	Add util methods to map between Impala and Kudu types This adds helper methods to map between Impala and Kudu types to util/kudu-util.h. Change-Id: Ib48d327034c5d40d67eab9e27e2fc381184536bb Reviewed-on: http://gerrit.sjc.cloudera.com:8080/6623 Reviewed-by: Martin Grund <mgrund@cloudera.com> Tested-by: jenkins	2015-06-01 16:01:04 -07:00
Skye Wanderman-Milne	09018a3756	Nested types: initialize repetition level decoders in HdfsParquetScanner The decoders aren't used yet, but will be when we materialize arrays. In addition, setting up the repetition level decoder makes sure the definition level encoder is initialized to the right place in the data buffer. Change-Id: Ic85ae812b10c747b36d884794d8dcf5976dfe74f Reviewed-on: http://gerrit.cloudera.org:8080/405 Reviewed-by: Casey Ching <casey@cloudera.com> Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2015-05-22 05:19:25 +00:00
Skye Wanderman-Milne	7801aa499f	Use codegen to inject runtime constants in exprs This patch introduces the function GetConstant(), which is used by expr compute function and UDFs to access query constants. There is a corresponding GetIrConstant() function that returns the IR versions of the same constants. Currently the only implemented constants are the expr's return type and argument types, but other constants can be easily be added to these functions. Interpreted expr functions run normally, but cross-compiled functions can be passed to InlineConstants(), which looks for calls to GetConstant() and replaces them with the result of calling GetIrConstant(). I used this technique in the decimal functions that previously were not switching on the type at all. The performance of LeastGreatest() after this patch is the same as it was before it switched on the type. Change-Id: I8b55744551830d894318a7bab6b6f045fb8bed41 Reviewed-on: http://gerrit.cloudera.org:8080/352 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Internal Jenkins	2015-05-15 02:24:04 +00:00
Henry Robinson	8b4a748d5e	IMPALA-1726: Statestore to timeout hung RPCs If a Heartbeat() RPC appears hung, the statestore should abort that RPC so as not to hold on to a sender thread, and to trigger the failure detector to evict the hung node. We could just add a TCP timeout to the client cache used by the statestore, but doing so would mean that all RPCs were subject to the timeout, and UpdateState() typically takes much longer than Heartbeat() by design, so setting a reasonable timeout would be impossible. Instead, this patch adds a second client cache designed only for Heartbeat() RPCs, with an aggressive timeout of 3s by default. (Heartbeat() usually takes ~1-2ms). A timeout for UpdateState() is also set to avoid thread starvation, but this is much less aggressive at 300s. This patch also adds ClientConnection::DoRpc(), which calls an RPC and handles various failure modes, including timeout. If DoRpc() returns an error, the statestore handles it in the usual way, including updating the failure detector if the failed RPC is Heartbeat(). Change-Id: I2f2462278e59581937c9c10910625d2724a11efa Reviewed-on: http://gerrit.cloudera.org:8080/206 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Internal Jenkins	2015-03-20 14:37:14 -07:00
ishaan	e02a38fa32	Use try/finally instead of the with context manager in generate_error_codes.py This will make the code compatible with python 2.4 Change-Id: I39c23256907520183f5f7797097f2fb1ad0e5cfc	2015-03-01 09:21:14 -08:00
Martin Grund	b582cdc22b	IMPALA-1598: Adding Error Codes to Log Messages This patch introduces the concept of error codes for errors that are recorded in Impala and are going to be presented to the client. These error codes are used to aggregate and group incoming error / warning messages to reduce the spill on the shell and increase the usefulness of the messages. By splitting the message string from the implementation, it becomes possible to edit the string independently of the code and pave the way for internationalization. Error messages are defined as a combination of an enum value and a string. Both are defined in the Error.thrift file that is automatically generated using the script in common/thrift/generate_error_codes.py. The goal of the script is to have a central understandable repository of error messages. Adding new messages to this file will require rebuilding the thrift part. The proxy class ErrorMessage is responsible to represent an error and capture the parameters that are used to format the error message string. When error messages are recorded they are recorded based on the following algorithm: - If an error message is of type GENERAL, do not aggregate this message and simply add it to the total number of messages - If an error messages is of specific type, record the first error message as a sample and for all other occurrences increment the count. - The coordinator will merge all error messages except the ones of type GENERAL and display a count. For example, in the case of the parquet file spanning multiple blocks the output will look like: Parquet files should not be split into multiple hdfs-blocks. file=hdfs://localhost:20500/fid.parq (1 of 321 similar) All messages are always logged to VLOG. In the coordinator error messages are merged across all backends to retain readability in the case of large clusters. The current version of this patch adds these new error codes to some of the most important error messages as a reference implementation. Change-Id: I1f1811631836d2dd6048035ad33f7194fb71d6b8 Reviewed-on: http://gerrit.cloudera.org:8080/39 Reviewed-by: Martin Grund <mgrund@cloudera.com> Tested-by: Internal Jenkins	2015-03-01 03:37:32 +00:00

46 Commits