Move parquet classes into exec/parquet.
Move CollectionColumnReader and ParquetLevelDecoder into separate files.
Remove unnecessary 'encoding_' field from ParquetLevelDecoder.
Switch BOOLEAN decoding to use composition instead of inheritance. This
lets the boolean decoding use the faster batched implementations in
ScalarColumnReader and avoids some confusing aspects of the class
hierarchy, like the ReadValueBatch() implementation on the base class
that was shared between BoolColumnReader and CollectionColumnReader.
Improve compile times by instantiating BitPacking templates in a
separate file (this appears to give a 30s+ speedup for
compiling parquet-column-readers.cc).
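As a rough sketch of the explicit-instantiation pattern this relies on
(the file and function names below are illustrative, not the actual
Impala sources):

  // bit-packing.h: declare the template and suppress implicit
  // instantiation in every translation unit that includes this header.
  #include <cstdint>

  template <typename T>
  int64_t UnpackValues(int bit_width, const uint8_t* in, int64_t in_bytes,
                       int64_t num_values, T* out);

  extern template int64_t UnpackValues<uint32_t>(int, const uint8_t*,
                                                 int64_t, int64_t, uint32_t*);

  // bit-packing-instantiations.cc: the one translation unit that pays the
  // compile-time cost of instantiating the heavily inlined template.
  #include "bit-packing.inline.h"  // hypothetical header with the definition

  template int64_t UnpackValues<uint32_t>(int, const uint8_t*,
                                          int64_t, int64_t, uint32_t*);

Other files (such as parquet-column-readers.cc) then only see the
declaration and link against the pre-instantiated symbols instead of
re-instantiating them.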
Testing:
Ran exhaustive tests.
Change-Id: I0efd5c50b781fe9e3c022b33c66c06cfb529c0b8
Reviewed-on: http://gerrit.cloudera.org:8080/11949
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this fix, Impala did not check whether a timestamp's time part
is out of the valid [0, 24 hour) range when reading Parquet files,
so these timestamps were memcpy'd as-is into slots, leading to
results like:
1970-01-01 -00:00:00.000000001
1970-01-01 24:00:00
Different parts of Impala treat these timestamps differently:
- string conversion leads to an invalid representation that cannot be
converted back to a timestamp
- timezone conversions handle the overflowing time part and give
a valid timestamp result (at least since CCTZ; I did not check
older versions of Impala)
- Parquet writing inserts these timestamps as they are, so the
resulting Parquet file will also contain corrupt timestamps
The fix adds a check that converts these corrupt timestamps to NULL,
similarly to the handling of timestamps outside the [1400..10000)
range. A new error code is added for this case. If both the date
and the time part are corrupt, then the error about the corrupt
time part is returned.
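A minimal standalone sketch of the kind of range check involved (this
is not the actual TimestampValue code; names are illustrative):

  #include <cstdint>
  #include <optional>

  // A valid time-of-day is in [0, 24 hours) expressed in nanoseconds.
  constexpr int64_t NANOS_PER_DAY = 24LL * 60 * 60 * 1000 * 1000 * 1000;

  // Returns the time-of-day if it is in range; otherwise nullopt, so the
  // caller can write NULL to the slot and raise the new error code.
  std::optional<int64_t> ValidateTimeOfDayNanos(int64_t nanos) {
    if (nanos < 0 || nanos >= NANOS_PER_DAY) return std::nullopt;
    return nanos;
  }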
Testing:
- added a new scanner test that reads a corrupted Parquet file
with edge values
Change-Id: Ibc0ae651b6a0a028c61a15fd069ef9e904231058
Reviewed-on: http://gerrit.cloudera.org:8080/11521
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is the same patch except with fixes for the test failures
on EC and S3 noted in the JIRA.
This allows graceful shutdown of executors and partially graceful
shutdown of coordinators (new operations fail, old operations can
continue).
Details:
* In order to allow future admin commands, this is implemented with
function-like syntax and does not add any reserved words.
* ALL privilege is required on the server
* The coordinator impalad that the client is connected to can be shut
down directly with ":shutdown()".
* Remote shutdown of another impalad is supported, e.g. with
":shutdown('hostname')", so that non-coordinators can be shut down
and for the convenience of the client, which does not have to
connect to the specific impalad. There is no assumption that the
other impalad is registered in the statestore; just that the
coordinator can connect to the other daemon's thrift endpoint.
This simplifies things and allows shutdown in various important
cases, e.g. statestore down.
* The shutdown time limit can be overridden to force a quicker or
slower shutdown by specifying a deadline in seconds after the
statement is executed.
* If shutting down, a banner is shown on the root debug page.
Workflow:
1. (if a coordinator) clients are prevented from submitting
queries to this coordinator via some out-of-band mechanism,
e.g. load balancer
2. the shutdown process is started via ":shutdown()"
3. a bit is set in the statestore and propagated to coordinators,
which stop scheduling fragment instances on this daemon
(if an executor).
4. the query startup grace period (which is ideally set to the AC
queueing delay plus some additional leeway) expires
5. once the daemon is quiesced (i.e. no fragments, no registered
queries), it shuts itself down.
6. If the daemon does not successfully quiesce (e.g. rogue clients,
long-running queries), after a longer timeout (counted from the start
of the shutdown process) it will shut down anyway (a minimal sketch of
this logic follows the list).
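A minimal sketch of the grace-period / deadline behaviour in steps 4-6,
assuming hypothetical IsQuiesced()/ExitCleanly() helpers (this is not
the actual Impala shutdown code):

  #include <chrono>
  #include <thread>

  void ShutdownLoop(std::chrono::seconds grace, std::chrono::seconds deadline,
                    bool (*IsQuiesced)(), void (*ExitCleanly)()) {
    const auto start = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(grace);  // let already-scheduled work arrive
    while (std::chrono::steady_clock::now() - start < deadline) {
      if (IsQuiesced()) break;  // no fragments and no registered queries
      std::this_thread::sleep_for(std::chrono::seconds(1));
    }
    ExitCleanly();  // shut down even if not quiesced once the deadline passes
  }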
What this does:
* Executors can be shut down without causing a service-wide outage
* Shutting down an executor will not disrupt any short-running queries
and will wait for long-running queries up to a threshold.
* Coordinators can be shut down without query failures only if
there is an out-of-band mechanism to prevent submission of more
queries to the shut down coordinator. If queries are submitted to
a coordinator after shutdown has started, they will fail.
* Long running queries or other issues (e.g. stuck fragments) will
slow down but not prevent eventual shutdown.
Limitations:
* The startup grace period needs to be configured to be greater than
the latency of statestore updates + scheduling + admission +
coordinator startup. Otherwise a coordinator may send a
fragment instance to the shutting down impalad. (We could
automate this configuration as a follow-on)
* The startup grace period means a minimum latency for shutdown,
even if the cluster is idle.
* We depend on the statestore detecting the process going down
if queries are still running on that backend when the timeout
expires. This may still be subject to existing problems,
e.g. IMPALA-2990.
Tests:
* Added parser, analysis and authorization tests.
* End-to-end test of shutting down impalads.
* End-to-end test of shutting down then restarting an executor while
queries are running.
* End-to-end test of shutting down a coordinator
- New queries cannot be started on coord, existing queries continue to run
- Exercises various Beeswax and HS2 operations.
Change-Id: I8f3679ef442745a60a0ab97c4e9eac437aef9463
Reviewed-on: http://gerrit.cloudera.org:8080/11484
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
I started by converting scan and spill-to-disk because the
cancellation there is always meant to be internal to the scan and
spill-to-disk subsystems.
I updated all places that checked for TErrorCode::CANCELLED to treat
CANCELLED_INTERNALLY the same.
This is to aid triage and debugging of bugs like IMPALA-7418,
where an "internal" cancellation leaks out into the query state:
with a separate error code it is easy to tell when that has happened.
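As a trivial sketch of what "treat them the same" means at call sites
(the real TErrorCode values come from Impala's generated thrift code;
this enum is just illustrative):

  enum class ErrorCode { OK, CANCELLED, CANCELLED_INTERNALLY };

  // Call sites that previously checked only CANCELLED now accept both codes.
  bool IsCancelled(ErrorCode code) {
    return code == ErrorCode::CANCELLED ||
           code == ErrorCode::CANCELLED_INTERNALLY;
  }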
Testing:
Ran exhaustive tests.
Change-Id: If25d5b539d68981359e4d881cae7b08728ba2999
Reviewed-on: http://gerrit.cloudera.org:8080/11464
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This allows graceful shutdown of executors and partially graceful
shutdown of coordinators (new operations fail, old operations can
continue).
Details:
* In order to allow future admin commands, this is implemented with
function-like syntax and does not add any reserved words.
* ALL privilege is required on the server
* The coordinator impalad that the client is connected to can be shut
down directly with ":shutdown()".
* Remote shutdown of another impalad is supported, e.g. with
":shutdown('hostname')", so that non-coordinators can be shut down
and for the convenience of the client, which does not have to
connect to the specific impalad. There is no assumption that the
other impalad is registered in the statestore; just that the
coordinator can connect to the other daemon's thrift endpoint.
This simplifies things and allows shutdown in various important
cases, e.g. statestore down.
* The shutdown time limit can be overridden to force a quicker or
slower shutdown by specifying a deadline in seconds after the
statement is executed.
* If shutting down, a banner is shown on the root debug page.
Workflow:
1. (if a coordinator) clients are prevented from submitting
queries to this coordinator via some out-of-band mechanism,
e.g. load balancer
2. the shutdown process is started via ":shutdown()"
3. a bit is set in the statestore and propagated to coordinators,
which stop scheduling fragment instances on this daemon
(if an executor).
4. the query startup grace period (which is ideally set to the AC
queueing delay plus some additional leeway) expires
5. once the daemon is quiesced (i.e. no fragments, no registered
queries), it shuts itself down.
6. If the daemon does not successfully quiesce (e.g. rogue clients,
long-running queries), after a longer timeout (counted from the start
of the shutdown process) it will shut down anyway.
What this does:
* Executors can be shut down without causing a service-wide outage
* Shutting down an executor will not disrupt any short-running queries
and will wait for long-running queries up to a threshold.
* Coordinators can be shut down without query failures only if
there is an out-of-band mechanism to prevent submission of more
queries to the shut down coordinator. If queries are submitted to
a coordinator after shutdown has started, they will fail.
* Long running queries or other issues (e.g. stuck fragments) will
slow down but not prevent eventual shutdown.
Limitations:
* The startup grace period needs to be configured to be greater than
the latency of statestore updates + scheduling + admission +
coordinator startup. Otherwise a coordinator may send a
fragment instance to the shutting down impalad. (We could
automate this configuration as a follow-on)
* The startup grace period means a minimum latency for shutdown,
even if the cluster is idle.
* We depend on the statestore detecting the process going down
if queries are still running on that backend when the timeout
expires. This may still be subject to existing problems,
e.g. IMPALA-2990.
Tests:
* Added parser, analysis and authorization tests.
* End-to-end test of shutting down impalads.
* End-to-end test of shutting down then restarting an executor while
queries are running.
* End-to-end test of shutting down a coordinator
- New queries cannot be started on coord, existing queries continue to run
- Exercises various Beeswax and HS2 operations.
Change-Id: I4d5606ccfec84db4482c1e7f0f198103aad141a0
Reviewed-on: http://gerrit.cloudera.org:8080/10744
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The error text with AES-GCM enabled looks like:
Error reading 44 bytes from scratch file
'/tmp/impala-scratch/0:0_d43635d0-8f55-485e-8899-907af289ac86' on
backend tarmstrong-box:22000 at offset 0: verification of read data
failed.
OpenSSL error in EVP_DecryptFinal:
139634997483216:error:0607C083:digital envelope
routines:EVP_CIPHER_CTX_ctrl:no cipher set:evp_enc.c:610:
139634997483216:error:0607C083:digital envelope
routines:EVP_CIPHER_CTX_ctrl:no cipher set:evp_enc.c:610:
139634997483216:error:0607C083:digital envelope
routines:EVP_CIPHER_CTX_ctrl:no cipher set:evp_enc.c:610:
139634997483216:error:0607C083:digital envelope
routines:EVP_CIPHER_CTX_ctrl:no cipher set:evp_enc.c:610:
Testing:
Added a backend test to exercise the code path and verify the error
code.
Change-Id: I0652d6cdfbb4e543dd0ca46b7cc65edc4e41a2d8
Reviewed-on: http://gerrit.cloudera.org:8080/10204
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When an impalad is in executor-only mode, it receives no
catalog updates. As a result, lib-cache entries are never
refreshed. A consequence is that udf queries can return
incorrect results or may not run due to resolution issues.
Both cases are caused by the executor using a stale copy
of the lib file. For incorrect results, an old version of
the method may be used. Resolution issues can come up if
a method is added to a lib file.
The solution in this change is to capture the coordinator's
view of the lib file's last modified time when planning.
This last modified time is then shipped with the plan to
executors. Executors must then use both the lib file path
and the last modified time as a key for the lib-cache.
If the coordinator's last modified time is more recent than
the executor's lib-cache entry, then the entry is refreshed.
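A rough sketch of keying the cache on (path, last modified time), with
all names made up rather than taken from the real LibCache:

  #include <cstdint>
  #include <map>
  #include <string>
  #include <utility>

  struct LibEntry { std::string local_path; };

  class LibCacheSketch {
   public:
    // Refresh the entry if the coordinator saw a newer mtime than we cached.
    LibEntry* Get(const std::string& hdfs_path, int64_t coord_mtime) {
      auto it = cache_.find(hdfs_path);
      if (it == cache_.end() || coord_mtime > it->second.first) {
        cache_[hdfs_path] = {coord_mtime, Download(hdfs_path)};
        it = cache_.find(hdfs_path);
      }
      return &it->second.second;
    }
   private:
    LibEntry Download(const std::string& path) { return {path}; }
    std::map<std::string, std::pair<int64_t, LibEntry>> cache_;
  };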
Brief discussion of alternatives:
- lib-cache always checks last modified time
+ easy/local change to lib-cache
- adds an fs lookup always. rejected for this reason
- keep the last modified time in the catalog
- the bound on staleness is too loose. Consider the case where
functions f1, f2, f3 are created with last modified times of
t1, t2, t3, and the function's last modified time is treated as a
low-watermark: if the cache entry has a more recent time, use it.
Such a scheme would allow the version at t2 to persist, and an old
function may keep the state from converging to the latest. This
could end up with strange cases where different versions of the lib
are used across executors for a single query.
In contrast, the change in this patch relies on the statestore to
push versions forward at all coordinators, and so will push all
versions at all caches forward as well.
Testing:
- added an e2e custom cluster test
Change-Id: Icf740ea8c6a47e671427d30b4d139cb8507b7ff6
Reviewed-on: http://gerrit.cloudera.org:8080/9697
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
The serialization format of a row batch relies on
tuple offsets. In its current form, the tuple offsets
are int32s. This means that it is impossible to generate
a valid serialization of a row batch that is larger
than INT_MAX.
This changes RowBatch::SerializeInternal() to return an
error if trying to serialize a row batch larger than INT_MAX.
This prevents a DCHECK on debug builds when creating a row
larger than 2GB.
This also changes the compression logic in RowBatch::Serialize()
to avoid a DCHECK if LZ4 will not be able to compress the
row batch. Instead, it returns an error.
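A sketch of the two size checks described above (standalone, not the
actual RowBatch code; LZ4_MAX_INPUT_SIZE comes from lz4.h):

  #include <climits>
  #include <cstdint>
  #include <string>
  #include <lz4.h>

  // Returns an empty string on success, otherwise an error message.
  std::string CheckSerializedSize(int64_t uncompressed_size) {
    if (uncompressed_size > INT_MAX) {
      return "Row batch too large: tuple offsets are 32-bit";
    }
    if (uncompressed_size > LZ4_MAX_INPUT_SIZE) {
      return "Row batch too large to be compressed with LZ4";
    }
    return "";
  }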
This modifies row-batch-serialize-test to verify behavior at
each of the limits. Specifically:
RowBatches up to size LZ4_MAX_INPUT_SIZE succeed.
RowBatches with size range [LZ4_MAX_INPUT_SIZE+1, INT_MAX]
fail on LZ4 compression.
RowBatches with size > INT_MAX fail with RowBatch too large.
Change-Id: I3b022acdf3bc93912d6d98829b30e44b65890d91
Reviewed-on: http://gerrit.cloudera.org:8080/9367
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins
The encoding was added in an early version of the Parquet
spec and deprecated even in the Parquet 1.0 spec.
Parquet-MR switched to generating RLE at the same time as
the spec changed in mid-2013. Impala always wrote RLE:
see commit 6e293090e6.
The Impala implementation of BIT_PACKED was never correct
because it implemented little-endian bit unpacking instead of
the big-endian unpacking required by the spec for levels.
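For reference, a standalone sketch of what the spec requires for
BIT_PACKED levels, i.e. MSB-first (big-endian) unpacking (assumes
bit_width <= 8; not the actual Impala decoder):

  #include <cstdint>
  #include <vector>

  std::vector<uint8_t> UnpackLevelsBigEndian(const uint8_t* in, int bit_width,
                                             int num_levels) {
    std::vector<uint8_t> out(num_levels);
    int64_t bit_pos = 0;  // bit offset from the start of the buffer
    for (int i = 0; i < num_levels; ++i) {
      uint32_t v = 0;
      for (int b = 0; b < bit_width; ++b, ++bit_pos) {
        int bit = (in[bit_pos / 8] >> (7 - bit_pos % 8)) & 1;  // MSB first
        v = (v << 1) | bit;
      }
      out[i] = static_cast<uint8_t>(v);
    }
    return out;
  }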
Testing:
Updated tests to reflect expected behaviour for supported
and unsupported def level encodings.
Cherry-picks: not for 2.x.
Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Reviewed-on: http://gerrit.cloudera.org:8080/9241
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This patch adds the following details to the error message encountered
on failure to get minimum memory reservation:
- which ReservationTracker hit its limit
- top 5 admitted queries that are consuming the most memory under the
ReservationTracker that hit its limit
Testing:
- added tests to reservation-tracker-test.cc that verify the error
message returned for different cases.
- tested "initial reservation failed" condition manually to verify
the error message returned.
Change-Id: Ic4675fe923b33fdc4ddefd1872e6d6b803993d74
Reviewed-on: http://gerrit.cloudera.org:8080/8781
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins
There is not much benefit in printing the stack trace when
Thrift RPC hits an error. As long as we print enough info about
the error and identify the caller, that should be sufficient.
In fact, stack crawls have been observed to cause unnecessary
CPU spikes in the past. This change replaces Status() with
Status::Expected() in DoRpc(), RetryRpc(), RetryRpcRecv() and
Coordinator::BackendState::Exec() to avoid unnecessary stack crawls.
Testing done: private core build. Verified error strings with
test_rpc_timeout.py and test_rpc_exception.py
Change-Id: Ia83294494442ef21f7934f92ba9112e80d81fa58
Reviewed-on: http://gerrit.cloudera.org:8080/8788
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
Previously, we implicitly created a local string object from the
char* in argv[0] when calling InitAuth(). This string object goes
out of scope once InitAuth() returns, but the pointer to this local
string's buffer is passed to the Sasl library, which may reference
it after the local string has been deleted, leading to a use-after-free.
This bug is exposed by the recent change to enable Kerberos with KRPC,
as we now always initialize Sasl even if Kerberos is not enabled.
This change fixes the problem above by making a copy of 'appname'
passed to InitAuth(). Also, the new code enforces that multiple
calls to InitAuth() must use the same 'appname' or it will fail.
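A minimal sketch of the ownership fix, with a made-up function name
(the real change is inside Impala's InitAuth()):

  #include <string>

  // Keep a process-lifetime copy so the pointer handed to Sasl stays valid.
  static std::string* sasl_appname = nullptr;

  bool InitAuthSketch(const std::string& appname) {
    if (sasl_appname == nullptr) {
      sasl_appname = new std::string(appname);  // intentionally never freed
    } else if (*sasl_appname != appname) {
      return false;  // repeated calls must pass the same appname
    }
    // sasl_appname->c_str() can now safely be handed to the Sasl library.
    return true;
  }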
Testing done: Verified rpc-mgr-test and thrift-server-test no longer
fail in ASAN build.
Change-Id: I1f29c2396df114264dfc23726b8ba778f50e12e9
Reviewed-on: http://gerrit.cloudera.org:8080/8777
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
This change augments the message of TErrorCode::DATASTREAM_SENDER_TIMEOUT
to include the source address when KRPC is enabled. The source address is
not readily available in Thrift. The new message includes the destination
plan node id in case there are multiple exchange nodes in a fragment instance.
Testing done: Confirmed the error message by testing with following options:
"--stress_datastream_recvr_delay_ms=90000 datastream_sender_timeout_ms=1000"
Change-Id: Ie3e83773fe6feda057296e7d5544690aa9271fa0
Reviewed-on: http://gerrit.cloudera.org:8080/8751
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
When Lz4Compressor::MaxOutputLen() returns 0, it
means that the input is too large to compress.
When Lz4Compressor::ProcessBlock() was invoked with
an input that was too large, it silently produced a bogus
result. This bogus result even decompresses
successfully, but not to the data that was
originally compressed.
After this commit, Lz4Compressor::ProcessBlock()
will return an error if it cannot compress the
input.
I also added a comment on Codec::MaxOutputLen()
stating that a return value of 0 means that the
input is too large.
I added some checks after the invocations of
MaxOutputLen() where the compressor can be
an Lz4Compressor.
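A sketch of the caller-side check using the raw LZ4 API
(LZ4_compressBound() returns 0 when the input is too large, which is
what backs the MaxOutputLen() convention; this is not the actual Codec
code):

  #include <vector>
  #include <lz4.h>

  bool CompressWithLz4(const std::vector<char>& in, std::vector<char>* out) {
    if (in.size() > LZ4_MAX_INPUT_SIZE) return false;  // too large to compress
    const int max_out = LZ4_compressBound(static_cast<int>(in.size()));
    if (max_out == 0) return false;  // defensive: also means "too large"
    out->resize(max_out);
    const int written = LZ4_compress_default(
        in.data(), out->data(), static_cast<int>(in.size()), max_out);
    if (written <= 0) return false;
    out->resize(written);
    return true;
  }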
I added an automated test case to
be/src/util/decompress-test.cc.
Change-Id: Ifb0bc4ed98c5d7b628b791aa90ead36347b9fbb8
Reviewed-on: http://gerrit.cloudera.org:8080/8748
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
KuduRPC has support for Kerberos. However, since Impala's client transport
still uses the Thrift transport stack, we need to make sure that a single
security configuration applies to both internal communication (KuduRPC)
and external communication (Thrift's TSaslTransport).
This patch changes InitAuth() to start Sasl regardless of security
configuration, since KRPC uses plain SASL for negotiation on insecure
clusters.
It also moves some utility code out of authentication.cc into
auth-util.cc for reuse by the RpcMgr while enabling Kerberos.
The MiniKDC related code is moved out of thrift-server-test.cc into a
new file called mini-kdc-wrapper.h/cc. This file exposes a new class
MiniKdcWrapper which can be easily used by the tests to configure the
Kerberos environment, create the keytab, start the KDC and also
initialize the Impala security library.
Tests are added to rpc-mgr-test for Kerberos tests over KRPC.
thrift-server-test also has a mechanical change to use MiniKdcWrapper.
Also tested on a live cluster configured to use Kerberos.
Change-Id: I8cec5cca5fdb4b1d46bab19e86cb1a8a3ad718fd
Reviewed-on: http://gerrit.cloudera.org:8080/8270
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Impala Public Jenkins
This patch implements a new data stream service which utilizes KRPC.
Similar to the thrift RPC implementation, there are 3 major components
to the data stream services: KrpcDataStreamSender serializes and sends
row batches materialized by a fragment instance to a KrpcDataStreamRecvr.
KrpcDataStreamMgr is responsible for routing an incoming row batch to
the appropriate receiver. The data stream service runs on the port
FLAGS_krpc_port which is 29000 by default.
Unlike the implementation with thrift RPC, KRPC provides an asynchronous
interface for invoking remote methods. As a result, KrpcDataStreamSender
doesn't need to create a thread per connection. There is one connection
between two Impalad nodes for each direction (i.e. client and server).
Multiple queries can multiplex on the same connection for transmitting
row batches between two Impalad nodes. The asynchronous interface also
avoids the possibility that a thread is stuck in the RPC code
for an extended amount of time without checking for cancellation. A TransmitData()
call with KRPC is in essence a trio of RpcController, a serialized protobuf
request buffer and a protobuf response buffer. The call is invoked via a
DataStreamService proxy object. The serialized tuple offsets and row batches
are sent via "sidecars" in KRPC to avoid extra copy into the serialized
request buffer.
Each impalad node creates a singleton DataStreamService object at start-up
time. All incoming calls are served by a service thread pool created as part
of DataStreamService. By default, the number of service threads equals the
number of logical cores. The service threads are shared across all queries so
the RPC handler should avoid blocking as much as possible. In the thrift RPC
implementation, a thrift thread handling a TransmitData() RPC may block
for an extended period of time if the receiver has not yet been created when
the call arrives. In the KRPC implementation, we store TransmitData() or
EndDataStream() requests which arrive before the receiver is ready in a
per-receiver early sender list stored in KrpcDataStreamMgr. These RPC calls
will be processed and responded to when the receiver is created or when a
timeout occurs.
Similarly, there is limited space in the sender queues in KrpcDataStreamRecvr.
If adding a row batch to a queue in KrpcDataStreamRecvr would cause the buffer
limit to be exceeded, the request will be stashed in a queue for deferred
processing. The stashed RPC requests will not be responded to until they are
processed, so as to exert back pressure on the senders. An alternative would
be to reply with an error, requiring the request / row batches to be sent
again. That may end up consuming more network bandwidth than the thrift RPC
implementation. This change adopts the behavior of allowing one stashed
request per sender.
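A generic sketch of the deferred-RPC idea (not the KrpcDataStreamMgr
code): responses for stashed requests are withheld until the receiver
can accept the batch, which is what exerts back pressure on the sender.

  #include <deque>
  #include <functional>
  #include <mutex>

  class DeferredRpcQueue {
   public:
    void Stash(std::function<void()> respond) {
      std::lock_guard<std::mutex> l(lock_);
      pending_.push_back(std::move(respond));  // do not respond yet
    }
    // Called when the receiver is created or buffer space frees up.
    void ProcessOne() {
      std::function<void()> respond;
      {
        std::lock_guard<std::mutex> l(lock_);
        if (pending_.empty()) return;
        respond = std::move(pending_.front());
        pending_.pop_front();
      }
      respond();  // unblocks the sender so it can transmit its next batch
    }
   private:
    std::mutex lock_;
    std::deque<std::function<void()>> pending_;
  };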
All rpc requests and responses are serialized using protobuf. The equivalent of
TRowBatch would be ProtoRowBatch which contains a serialized header about the
meta-data of the row batch and two Kudu Slice objects which contain pointers to
the actual data (i.e. tuple offsets and tuple data).
This patch is based on an abandoned patch by Henry Robinson.
TESTING
-------
* Builds {exhaustive/debug, core/release, asan} passed with FLAGS_use_krpc=true.
TO DO
-----
* Port some BE tests to KRPC services.
Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1
Reviewed-on: http://gerrit.cloudera.org:8080/8023
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
Prior to this fix, an error in ScannerContext::Stream::GetNextBuffer()
could leave the stream in an inconsistent state:
- The DiskIoMgr hits EOF unexpectedly, cancels the scan range and enqueues
a buffer with eosr set.
- The ScannerContext::Stream tries to read more bytes, but since it has
hit eosr, it tries to read beyond the end of the scan range using
DiskIoMgr::Read().
- The previous read error resulted in a new file handle being opened.
The now truncated, smaller file causes the seek to fail.
- Then during error handling, the BaseSequenceScanner calls SkipToSync()
and trips over the NULL pointer in the IO buffer.
In my reproduction this only happens with the file handle cache enabled,
which causes Impala to see two different sized handles: the one from the
cache when the query starts, and the one after reopening the file.
To fix this, we change the I/O manager to always return DISK_IO_ERROR
for errors and we abort a query if we receive such an error in the
scanner.
This change also fixes GetBytesInternal() to maintain the invariant that
the output buffer points to the boundary buffer whenever the latter
contains some data.
I tested this by running the repro from the JIRA and impalad did not
crash but aborted the queries. I also ran the repro with
abort_on_error=1, and with the file handle cache disabled.
Text files are not affected by this problem, since the
text scanner doesn't try to recover from errors during ProcessRange()
but wraps it in RETURN_IF_ERROR instead. With this change queries abort
with the same error.
Parquet files are also not affected since they have the metadata at the
end. Truncated files immediately fail with this error:
WARNINGS: File 'hdfs://localhost:20500/test-warehouse/tpch.partsupp_parquet/foo.0.parq'
has an invalid version number: <UTF8 Garbage>
Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Reviewed-on: http://gerrit.cloudera.org:8080/8011
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Adds GetBackendAddress() (which is host:port) to the error messages for
SCRATCH_LIMIT_EXCEEDED, SCRATCH_READ_TRUNCATED, and
SCRATCH_ALLOCATION_FAILED.
Testing:
* Unit tests assert the string is updated for SCRATCH_LIMIT_EXCEEDED
and SCRATCH_ALLOCATION_FAILED. SCRATCH_READ_TRUNCATED doesn't
have an existing test, and I didn't add a new one.
* Manually testing a query that spills after "chmod 000 /tmp/impala-scratch":
$ chmod 000 /tmp/impala-scratch
$ impala-shell
[dev:21000] > set mem_limit=100m;
MEM_LIMIT set to 100m
[dev:21000] > select count(*) from tpch_parquet.lineitem join tpch_parquet.orders on l_orderkey = o_orderkey;
Query: select count(*) from tpch_parquet.lineitem join tpch_parquet.orders on l_orderkey = o_orderkey
Query submitted at: 2017-09-11 11:07:06 (Coordinator: http://dev:25000)
Query progress can be monitored at: http://dev:25000/query_plan?query_id=5c48ff8f4103c194:1b40a6c00000000
WARNINGS: Could not create files in any configured scratch directories (--scratch_dirs=/tmp/impala-scratch) on backend 'dev:22002'. See logs for previous errors that may have prevented creating or writing scratch files.
Opening '/tmp/impala-scratch/5c48ff8f4103c194:1b40a6c00000000_08e8d63b-169d-4571-a0fe-c48fa08d73e6' for write failed with errno=13 description=Error(13): Permission denied
Opening '/tmp/impala-scratch/5c48ff8f4103c194:1b40a6c00000000_08e8d63b-169d-4571-a0fe-c48fa08d73e6' for write failed with errno=13 description=Error(13): Permission denied
Opening '/tmp/impala-scratch/5c48ff8f4103c194:1b40a6c00000000_08e8d63b-169d-4571-a0fe-c48fa08d73e6' for write failed with errno=13 description=Error(13): Permission denied
Change-Id: If31a50fdf6031312d0348d48aeb8f9688274cac2
Reviewed-on: http://gerrit.cloudera.org:8080/7816
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
The boost thread constructor will throw boost::thread_resource_error
if it is unable to spawn a thread on the system
(e.g. due to a ulimit). This uncaught exception crashes
Impala. Systems with a large number of nodes and threads
are hitting this limit.
This change catches the exception from the thread
constructor and converts it to a Status. This requires
several changes:
1. util/thread.h's Thread constructor is now private
and all Threads are constructed via a new Create()
static factory method (see the sketch after this list).
2. util/thread-pool.h's ThreadPool requires that Init()
be called after the ThreadPool is constructed.
3. To propagate the Status, Threads cannot be created in
constructors, so this is moved to initialization methods
that can return Status.
4. Threads now use unique_ptr's for management in all
cases. Threads cannot be used as stack-allocated local
variables or direct declarations in classes.
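A sketch of the factory pattern from item 1 (std::thread and
std::system_error are used here for brevity; the real code wraps
boost::thread and catches boost::thread_resource_error):

  #include <memory>
  #include <string>
  #include <system_error>
  #include <thread>

  class ThreadSketch {
   public:
    template <typename F>
    static bool Create(const std::string& name, F f,
                       std::unique_ptr<ThreadSketch>* out, std::string* err) {
      try {
        out->reset(new ThreadSketch(std::thread(std::move(f))));
        return true;
      } catch (const std::system_error& e) {
        *err = "Could not create thread " + name + ": " + e.what();
        return false;  // the caller converts this into a Status
      }
    }
    ~ThreadSketch() { if (t_.joinable()) t_.join(); }
   private:
    explicit ThreadSketch(std::thread t) : t_(std::move(t)) {}
    std::thread t_;
  };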
Query execution code paths will now handle the error:
1. If the scan node fails to spawn any scanner thread,
it will abort the query.
2. Failing to spawn a fragment instance from the query
state in StartFInstances() will correctly report the error
to the coordinator and tear down the query.
Testing:
This introduces the parameter thread_creation_fault_injection,
which will cause Thread::Create() calls in eligible
locations to fail randomly roughly 1% of the time.
Quite a few locations of Thread::Create() and
ThreadPool::Init() are necessary for startup and cannot
be eligible. However, all the locations used for query
execution are marked as eligible and governed by this
parameter. The code was tested by setting this parameter
to true and running queries to verify that queries either
run to completion with the correct result or fail with
appropriate status.
Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0
Reviewed-on: http://gerrit.cloudera.org:8080/7730
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Augment the error message to mention that oversubscription is likely the
problem and hint at solutions.
Change-Id: I8e367e1b0cb08e11fdd0546880df23b785e3b7c9
Reviewed-on: http://gerrit.cloudera.org:8080/7861
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Sometimes the client is not open when the debug action fires at the
start of Open() or Prepare(). In that case we should set the
probability when the client is opened later.
This caused one of the large row tests to start failing with a "failed
to repartition" error in the aggregation. The error is a false positive
caused by two distinct keys hashing to the same partition. Removing the
check allows the query to succeed because the keys hash to different
partitions in the next round of repartitioning.
If we repeatedly get unlucky and have collisions, the query will still
fail when it reaches MAX_PARTITION_DEPTH.
Testing:
Ran TestSpilling in a loop for a couple of hours, including the
exhaustive-only tests.
Change-Id: Ib26b697544d6c2312a8e1fe91b0cf8c0917e5603
Reviewed-on: http://gerrit.cloudera.org:8080/7771
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Adds support for a "max_row_size" query option that instructs Impala
to reserve enough memory to process rows of the specified size. For
spilling operators, the planner reserves enough memory to process
rows of this size. The advantage of this compared to simply
specifying larger values for min_spillable_buffer_size and
default_spillable_buffer_size is that operators may be able to
handle larger rows without increasing the size of all their
buffers.
The default value is 512KB. I picked that number because it doesn't
increase minimum reservations *too* much even with smaller buffers
like 64kb but should be large enough for almost all reasonable
workloads.
This is implemented in the aggs and joins using the variable page size
support added to BufferedTupleStream in an earlier commit. The synopsis
is that each stream requires reservation for one default-sized page
per read and write iterator, and temporarily requires reservation
for a max-sized page when reading or writing larger pages. The
max-sized write reservation is released immediately after the row
is appended and the max-size read reservation is released after
advancing to the next row.
The sorter and analytic simply use max-sized buffers for all pages
in the stream.
Testing:
Updated existing planner tests to reflect default max_row_size. Added
new planner tests to test the effect of the query option.
Added "set" test to check validation of query option.
Added end-to-end tests exercising spilling operators with large rows
with and without spilling induced by SET_DENY_RESERVATION_PROBABILITY.
Change-Id: Ic70f6dddbcef124bb4b329ffa2e42a74a1826570
Reviewed-on: http://gerrit.cloudera.org:8080/7629
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Rejects queries during admission control if:
* the largest (across all backends) min buffer reservation is
greater than the query mem_limit or buffer_pool_limit
* the sum of the min buffer reservations across the cluster
is larger than the pool max mem resources
There are some other interesting cases to consider later:
* every per-backend min buffer reservation is less than the
associated backend's process mem_limit; the current
admission control code doesn't know about other backend's
proc mem_limits.
Also reduces minimum non-reservation memory (IMPALA-5810).
See the JIRA for experimental results that show this
slightly improves min memory requirements for small queries.
One reason to tweak this is to compensate for the fact that
BufferedBlockMgr didn't count small buffers against the
BlockMgr limit, but BufferPool counts all buffers against
it.
Testing:
* Adds new test cases in test_admission_controller.py
* Adds BE tests in reservation-tracker-test for the
reservation-util code.
Change-Id: Iabe87ce8f460356cfe4d1be4d7092c5900f9d79b
Reviewed-on: http://gerrit.cloudera.org:8080/7678
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
Remove the BTS_BLOCK_OVERFLOW error code, which is no longer used and
which referenced --read_size.
Improve the flag description. The output is now:
-read_size ((Advanced) The preferred I/O request size in bytes to issue to
HDFS or the local filesystem. Increasing the read size will increase
memory requirements. Decreasing the read size may decrease I/O
throughput.) type: int32 default: 8388608
Testing:
Tested that Impala built and basic queries could run.
Change-Id: I3c20a9d55f89170b11f569c90b7f2949ddbe4211
Reviewed-on: http://gerrit.cloudera.org:8080/7623
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Always create global BufferPool at startup using 80% of memory and
limit reservations to 80% of query memory (same as BufferedBlockMgr).
The query's initial reservation is computed in the planner, claimed
centrally (managed by the InitialReservations class) and distributed
to query operators from there.
min_spillable_buffer_size and default_spillable_buffer_size query
options control the buffer size that the planner selects for
spilling operators.
Port ExecNodes to use BufferPool:
* Each ExecNode has to claim its reservation during Open()
* Port Sorter to use BufferPool.
* Switch from BufferedTupleStream to BufferedTupleStreamV2
* Port HashTable to use BufferPool via a Suballocator.
This also makes PAGG memory consumption more efficient (avoids wasting
buffers) and improves the spilling algorithm:
* Allow preaggs to execute with 0 reservation - if streams and hash tables
cannot be allocated, it will pass through rows.
* Halve the buffer requirement for spilling aggs - avoid allocating
buffers for aggregated and unaggregated streams simultaneously.
* Rebuild spilled partitions instead of repartitioning (IMPALA-2708)
TODO in follow-up patches:
* Rename BufferedTupleStreamV2 to BufferedTupleStream
* Implement max_row_size query option.
Testing:
* Updated tests to reflect new memory requirements
Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
Reviewed-on: http://gerrit.cloudera.org:8080/5801
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.
This change includes the following:
1. Include aggregate functions (AggFn) as Expr. This separates
AggFn from its evaluator. AggFn is similar to existing Expr
as both are represented as a tree of Expr nodes but it doesn't
really make sense to call Get*Val() on AggFn. This change
restructures the class hierarchy: much of the existing Expr
class is now renamed to ScalarExpr. Expr is the parent class
of both AggFn and ScalarExpr. Expr is defined to be a tree
with root of either AggFn or ScalarExpr and all descendants
being ScalarExpr.
2. ExprContext is renamed to ScalarExprEvaluator which is the
interface for evaluating ScalarExpr; AggFnEvaluator is the
interface for evaluating AggFn. Multiple evaluators can be
instantiated per Expr. Expr contains static states of an
expression while evaluator contains runtime states needed
for execution (i.e. evaluating the expression).
3. Update all exec nodes to instantiate Expr and their evaluators
separately. ExecNode::Init() will be responsible for creating
all the Exprs in an ExecNode while their evaluators are created
in ExecNode::Prepare(). Certain evaluators are also moved into
the data structures which actually utilize them. For instance,
HashTableCtx now owns the build and probe expression evaluators.
Similarly, TupleRowComparator and Sorter also own the evaluators.
ExecNode which utilizes these data structures are only responsible
for creating the expressions used by these data structures.
4. All codegen functions take Exprs instead of evaluators. Also, codegen
functions will not return an error status should the IR function fail the
LLVM verification step.
5. The assignment of index into the FunctionContext vector is now done
during the construction of ScalarExpr. Evaluators are only responsible
for allocating and initializing the FunctionContexts.
6. Open(), Prepare() are now removed from Expr classes. The interface
for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
which will convert a thrift Expr into an initialized Expr object.
Similarly, a Create() interface is used for creating evaluators from an
initialized Expr object.
This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.
Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Reviewed-on: http://gerrit.cloudera.org:8080/5483
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
The stream defaults to pages of default_page_len_. If a row doesn't
fit in that page, it will allocate another page up to max_page_len_
bytes and append a single row to that page, then immediately unpin
the page. This means that when writing a stream, the large
page only needs to be kept in memory temporarily, which helps with
memory requirements. E.g. consider a hash join that is repartitioning
1 unpinned stream into 16 unpinned streams. We will need
default_page_len_ * 15 + max_page_len_ * 2 bytes of reservation because
when processing a large row we only need one large write buffer at a
time.
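The reservation arithmetic in that example, written out as a small
helper (purely illustrative):

  #include <cstdint>

  // 16 output streams each need a default-sized write page, except that at
  // any moment one of them may instead hold a max-sized page for a large row,
  // and the input stream needs a max-sized read buffer for that same row.
  int64_t RepartitionReservation(int64_t default_page_len, int64_t max_page_len,
                                 int num_output_streams) {
    return default_page_len * (num_output_streams - 1) + max_page_len * 2;
  }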
Also switches the stream to lazily allocating write pages, so that
we don't need to allocate a page until we know the size of the row
to go in it. This required a mechanism to "save" reservation in
PrepareForRead()/PrepareForWrite(). A SubReservation API is added
to BufferPool for this purpose and the stream now saves read and
write reservation for lazy page allocation. It also saves reservation
instead of double-pinning pages in the read/write case.
The large row cases are not as optimised for memory consumption or
performance - queries processing very large numbers of large rows
are an extreme edge case that is likely to hit other performance
bottlenecks first. Pages with large rows can have up to 50%
internal fragmentation.
To avoid duplicating more logic between AddRow() and AllocateRow()
I restructured things so that AddRowSlow() is implemented in terms
of AllocateRowSlow(). AllocateRow() now takes a function as an
argument to populate the row.
Testing:
* Added tests for the case where 0 rows are added to the stream
* Extend BigRow to exercise the new code.
* Also test large strings and read/write streams.
Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5
Reviewed-on: http://gerrit.cloudera.org:8080/6638
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Adds Impala support for TIMESTAMP types stored in Kudu.
Impala stores TIMESTAMP values in 96-bits and has nanosecond
precision. Kudu's timestamp is a 64-bit microsecond delta
from the Unix epoch (called UNIXTIME_MICROS), so a conversion
is necessary.
When writing to Kudu, TIMESTAMP values in nanoseconds are
rounded to the nearest microsecond.
When reading from Kudu, the KuduScanner returns
UNIXTIME_MICROS with 8 bytes of padding so Impala can convert
the value to a TimestampValue in-line and copy the entire
row.
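A sketch of the precision conversion (standalone, ignoring pre-epoch
values, and not the actual KuduTableSink/KuduScanner code):

  #include <cstdint>

  // Writing to Kudu: round nanoseconds to the nearest microsecond.
  int64_t NanosToKuduMicros(int64_t nanos_since_epoch) {
    return (nanos_since_epoch + 500) / 1000;
  }

  // Reading from Kudu: exact, but only microsecond precision is available.
  int64_t KuduMicrosToNanos(int64_t unixtime_micros) {
    return unixtime_micros * 1000;
  }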
Testing:
Updated the functional_kudu schema to use TIMESTAMPs instead
of converting to STRING, so this provides some decent
coverage. Some BE tests were added, and some EE tests as
well.
TODO: Support pushing down TIMESTAMP predicates
TODO: Support TIMESTAMPs in range partitioning expressions
Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d
Reviewed-on: http://gerrit.cloudera.org:8080/6526
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.
After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.
New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE
LIKE <file> will set the table property to UTC if the global flag
--set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true.
HDFS tables created by Impala using CREATE TABLE LIKE <other table>
will copy the property of the table that is copied.
This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if
parquet.mr.int96.write.zone table property is not set and ignored
otherwise.
Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Reviewed-on: http://gerrit.cloudera.org:8080/5939
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Support allocating with mmap instead of TCMalloc to give more control
over memory usage. Also tell Linux to back larger buffers with huge
pages when possible to reduce TLB pressure. The main complication is
that memory returned by mmap() is not necessarily aligned to a huge
page boundary, so we need to "fix up" the mapping ourselves.
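A Linux-specific sketch of the fix-up (over-allocate, trim to a 2MB
boundary, then advise huge pages); it assumes 'len' is a multiple of
the 2MB huge page size, omits error handling, and is not the actual
buffer pool allocator:

  #include <sys/mman.h>
  #include <cstddef>
  #include <cstdint>

  void* MmapHugeAligned(size_t len) {
    const size_t HUGE = 2 * 1024 * 1024;
    uint8_t* p = static_cast<uint8_t*>(mmap(nullptr, len + HUGE,
        PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
    if (p == MAP_FAILED) return nullptr;
    uintptr_t addr = reinterpret_cast<uintptr_t>(p);
    size_t head = (HUGE - addr % HUGE) % HUGE;  // bytes before the 2MB boundary
    uint8_t* aligned = p + head;
    if (head > 0) munmap(p, head);         // trim the unaligned head
    munmap(aligned + len, HUGE - head);    // trim the leftover tail
    madvise(aligned, len, MADV_HUGEPAGE);  // request transparent huge pages
    return aligned;
  }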
Adds additional memory metrics, since we previously relied on the
assumption that all memory was allocated through TCMalloc.
memory.total-used tracks the total across the buffer pool and
TCMalloc. When the buffer pool is not present, they just report
the TCMalloc values.
This can be enabled with the --mmap_buffers flag. The transparent
huge pages support can be disabled with the --madvise_huge_pages
startup flag.
At some point this should become the default, but it requires
more work to validate performance and resource usage (virtual address
space, etc).
Testing:
Added some unit tests to test edge cases and the different supported
flags. Many pre-existing tests also exercise the modified code.
Change-Id: Ifbc748f74adcbbdcfa45f3ec7df98284925acbd6
Reviewed-on: http://gerrit.cloudera.org:8080/6474
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Adds tests for read errors from permissions (i.e. open() fails),
corrupt data (integrity check fails) and truncated files (read() fails).
Fixes a couple of bugs:
* Truncated reads were not detected in TmpFileMgr
* IoMgr buffers weren't returned on error paths (this isn't a true leak
but results in DCHECKs being hit).
Change-Id: I3f2b93588dd47f70a4863ecad3b5556c3634ccb4
Reviewed-on: http://gerrit.cloudera.org:8080/6562
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
The test should allow Unpin() to fail with a scratch allocation error to
handle the case where the first write fails and blacklists the scratch
disk around the same time that the second write starts. Usually either
the second write succeeds because it started before the first write
failed or it fails with CANCELLED because the
BufferedBlockMgr::is_cancelled_ flag is set. There is a small
window for a race after the disk is blacklisted in TmpFileMgr but
before BufferedBlockMgr::WriteComplete() is called.
Testing:
I was able to reproduce the problem locally by adding some delays
to the test. I added a variant of the WriteError test that more reliably
reproduces the bug. Ran both WriteError tests in a loop locally to try
to flush out flakiness.
Change-Id: I9878d7000b03a64ee06c2088a8c30e318fe1d2a3
Reviewed-on: http://gerrit.cloudera.org:8080/5940
Tested-by: Impala Public Jenkins
Reviewed-by: Michael Ho <kwho@cloudera.com>
- Removes the runtime unknown disk ID reporting and instead moves
it to the explain plan as a counter that prints the number of
scan ranges missing disk IDs in the corresponding HDFS scan nodes.
- Adds a warning to the header of query profile/explain plan with a
list of tables missing disk ids.
- Removes reference to enabling dfs block metadata configuration,
since it doesn't apply anymore.
- Removes VolumeId terminology from the runtime profile.
Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2
Reviewed-on: http://gerrit.cloudera.org:8080/5828
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins
Before this patch, we would simply read the INT96 Parquet timestamp
representation and assume that it's valid. However, not all bit
permutations represent a valid timestamp. One of the boost functions
raised an exception (that we didn't catch) when passed an invalid
boost date object, which resulted in a crash. This patch fixes the
problem by validating that the date falls into the 1400..9999 year
range as we are scanning Parquet.
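Sketch of the shape of the guard (the exact Julian-day bounds for
1400-01-01 and 9999-12-31 are not reproduced here; this is not the
actual scanner code):

  #include <cstdint>

  // Runs before the value is handed to boost, which throws on invalid dates.
  bool IsValidParquetJulianDay(int32_t julian_day, int32_t min_julian_day,
                               int32_t max_julian_day) {
    return julian_day >= min_julian_day && julian_day <= max_julian_day;
  }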
Change-Id: Ieaab5d33e6f0df831d0e67e1d318e5416ffb90ac
Reviewed-on: http://gerrit.cloudera.org:8080/5343
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
Second part of IMPALA-3710, which removed the IGNORE DML
option and changed the following errors on Kudu DML
operations to be ignored:
1) INSERT where the PK already exists
2) UPDATE/DELETE where the PK doesn't exist
This changes other data-related errors to be ignored as
well:
3) NULLs in non-nullable columns, i.e. null constraint
violations.
4) Rows with PKs that are in an 'uncovered range'.
It became clear that we can't differentiate between (3) and
(4) because both return a Kudu 'NotFound' error code. The
Impala error codes have been simplified as well: we just
report a generic KUDU_NOT_FOUND error in these cases.
This also adds some metadata to the thrift report sent to
the coordinator from sinks so the total number of rows with
errors can be added to the profile. Note that this does not
include a breakdown of error counts by type/code because we
cannot differentiate between all of these cases yet.
An upcoming change will add this new info to the beeswax
interface and show it in the shell output (IMPALA-3713).
Testing: Updated kudu_crud tests to check the number of rows
with errors.
Change-Id: I4eb1ad91dc355ea51de261c3a14df0f9d28c879c
Reviewed-on: http://gerrit.cloudera.org:8080/4985
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This patch prevents an invalid decimal type in an Avro file schema from
crashing Impala. Most invalid Avro schemas are caught by the frontend,
but file schemas still need to be validated by the backend.
After this patch files with bad schemas are skipped.
Testing:
This was hit very rarely by the scanner fuzzing. Added a regression test that
scans a file with a bad schema.
Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc
Reviewed-on: http://gerrit.cloudera.org:8080/4876
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
1.) IMPALA-4134: Use Kudu AUTO FLUSH
Improves performance of writes to Kudu up to 4.2x in
bulk data loading tests (load 200 million rows from
lineitem).
2.) IMPALA-3704: Improve errors on PK conflicts
The Kudu client reports an error for every PK conflict,
and all errors were being returned in the error status.
As a result, an insert/update/delete could return an error status
with thousands of errors reported. This changes the error
handling to log all reported errors as warnings and
return only the first error in the query error status.
3.) Improve the DataSink reporting of the insert stats.
The per-partition stats returned by the data sink weren't
useful for Kudu sinks. Firstly, the number of appended rows
was not being displayed in the profile. Secondly, the
'stats' field isn't populated for Kudu tables and thus was
confusing in the profile, so it is no longer printed if it
is not set in the thrift struct.
Testing: Ran local tests, including new tests to verify
the query profile insert stats. Manual cluster testing was
conducted of the AUTO FLUSH functionality, and that testing
informed the default mutation buffer value of 100MB which
was found to provide good results.
Change-Id: I5542b9a061b01c543a139e8722560b1365f06595
Reviewed-on: http://gerrit.cloudera.org:8080/4728
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
The scheduler crashed with a segmentation fault when there were no
backends registered: After not being able to find a local backend (none
are configured at all) in ComputeScanRangeAssignment(), the previous
code would eventually try to return the top of
assignment_ctx.assignment_heap in SelectRemoteBackendHost(), but that
heap would be empty. Subsequently, when using the IP address of that
heap node, a segmentation fault would occur.
This change adds a check and aborts scheduling with an error. It also
contains a test.
Change-Id: I6d93158f34841ea66dc3682290266262c87ea7ff
Reviewed-on: http://gerrit.cloudera.org:8080/4776
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Adds code comments and issues a warning for Parquet files
with num_rows=0 but at least one non-empty row group.
Change-Id: I72ccf00191afddb8583ac961f1eaf11e5eb28791
Reviewed-on: http://gerrit.cloudera.org:8080/4696
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
As part of the ASF transition, we need to replace references to
Cloudera in Impala with references to Apache. This primarily means
changing Java package names from com.cloudera.impala.* to
org.apache.impala.*
A prior patch renamed all the files as necessary, and this patch
performs the actual code changes. Most of the changes in this patch
were generated with some commands of the form:
find . | grep "\.java\|\.py\|\.h\|\.cc" | \
xargs sed -i s/'com\(.\)cloudera\(\.\)impala/org\1apache\2impala/g
along with some manual fixes.
After this patch, the remaining references to Cloudera in the repo
mostly fall into the categories:
- External components that have cloudera in their own package names,
eg. com.cloudera.kudu/llama
- URLs, eg. https://repository.cloudera.com/
Change-Id: I0d35fa6602a7fc0c212b2ef5e2b3322b77dde7e2
Reviewed-on: http://gerrit.cloudera.org:8080/3937
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
This patch implements basic in-memory buffer management, with
reservations managed by ReservationTrackers.
Locks are fine-grained so that the buffer pool can scale to many
concurrent queries.
Includes basic tests for buffer pool setup, allocation and reservations.
Change-Id: I4bda61c31cc02d26bc83c3d458c835b0984b86a0
Reviewed-on: http://gerrit.cloudera.org:8080/4070
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Currently we can only disable spilling via a startup option which means
we need to restart the cluster for this.
This patch adds a new query option 'SCRATCH_LIMIT' that limits the amount of
scratch directory space that can be used. This would be useful to prevent
runaway queries or to prevent queries from spilling when that is not desired.
This also adds a 'ScratchSpace' counter to the runtime profile of the
BlockMgr that keeps track of the scratch space allocated.
Valid values for the SCRATCH_LIMIT query option are:
- unspecified or a limit of -1 means no limit
- a limit of 0 (zero) means spilling is disabled
- an int (= number of bytes)
- a float followed by "M" (MB) or "G" (GB)
Testing:
A new test file "test_scratch_limit.py" was added for testing functionality.
Change-Id: Ibf8842626ded1345b632a0ccdb9a580e6a0ad470
Reviewed-on: http://gerrit.cloudera.org:8080/4497
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
on the website.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files_txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This patch adds a configurable timeout for all backend client
RPC to avoid query hang issue.
Prior to this change, Impala did not set a socket send/recv timeout
for the backend client, so an RPC would wait forever for data. In
extreme cases of a bad network or a destination host with a kernel
panic, the sender will not get a response and the RPC will hang.
A query hang is hard to detect. If the hang happens in
ExecRemoteFragment() or CancelPlanFragments(), the query cannot be
cancelled unless you restart the coordinator.
Added a send/recv timeout to all RPCs to avoid query hangs. For the
catalog client, the default timeout is kept at 0 (no timeout) because
ExecDdl() could take a very long time if a table has many partitions,
mainly waiting for the HMS API call.
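A sketch of setting the timeouts on a Thrift client socket (the real
code plumbs these values through Impala's client cache; this standalone
helper is illustrative only):

  #include <memory>
  #include <string>
  #include <thrift/transport/TSocket.h>

  using apache::thrift::transport::TSocket;

  std::shared_ptr<TSocket> MakeBackendSocket(
      const std::string& host, int port,
      int send_timeout_ms, int recv_timeout_ms) {
    auto socket = std::make_shared<TSocket>(host, port);
    socket->setSendTimeout(send_timeout_ms);  // 0 means wait forever
    socket->setRecvTimeout(recv_timeout_ms);
    return socket;
  }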
Added a wrapper RetryRpcRecv() to wait longer for the receiver's
response. This is needed by certain RPCs: for example, for
TransmitData() sent by a DataStreamSender, the receiver could hold
the response to apply back pressure.
If an RPC fails, the connection is left in an unrecoverable state, so
we don't put the underlying connection back into the cache but close
it. This makes sure a broken connection won't cause more RPC failures.
Added a retry for the CancelPlanFragment RPC. This reduces the chance
that the cancel request gets lost due to an unstable network, but it
can cause cancellation to take longer and makes test_lifecycle.py more
flaky. The metric num-fragments-in-flight might not be 0 yet due to
previous tests, so the test was modified to check the metric delta
instead of comparing to 0 to reduce flakiness. However, this might not
capture some failures.
Besides the new EE test, I used the following iptables rule to
inject network failure to verify RPCs never hang.
1. Block network traffic on a port completely
iptables -A INPUT -p tcp -m tcp --dport 22002 -j DROP
2. Randomly drop 5% of TCP packets to slowdown network
iptables -A INPUT -p tcp -m tcp --dport 22000 -m statistic --mode random --probability 0.05 -j DROP
Change-Id: Id6723cfe58df6217f4a9cdd12facd320cbc24964
Reviewed-on: http://gerrit.cloudera.org:8080/3343
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
This change extends MemPool, FreePool and StringBuffer to support
64-bit allocations, fixes a bug in decompressor and extends various
places in the code to support 64-bit allocation sizes. With this
change, the text scanner can now decompress compressed files larger
than 1GB.
Note that the UDF interfaces FunctionContext::Allocate() and
FunctionContext::Reallocate() still use 32-bit for the input
argument to avoid breaking compatibility.
In addition, the byte size of a tuple is still assumed to be
within 32-bit. If it needs to be upgraded to 64-bit, it will be
done in a separate change.
A new test has been added to test the decompression of a 2GB
snappy block compressed text file.
Change-Id: Ic1af1564953ac02aca2728646973199381c86e5f
Reviewed-on: http://gerrit.cloudera.org:8080/3575
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
This reverts commit 1ffb2bd5a2a2faaa759ebdbaf49bf00aa8f86b5e.
Unbreak the packaging builds for now.
Change-Id: Id079acb83d35b51ba4dfe1c8042e1c5ec891d807
Reviewed-on: http://gerrit.cloudera.org:8080/3543
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Michael Ho <kwho@cloudera.com>
This change extends MemPool, FreePool and StringBuffer to support
64-bit allocations, fixes a bug in decompressor and extends various
places in the code to support 64-bit allocation sizes. With this
change, the text scanner can now decompress compressed files larger
than 1GB.
Note that the UDF interfaces FunctionContext::Allocate() and
FunctionContext::Reallocate() still use 32-bit for the input
argument to avoid breaking compatibility.
In addition, the byte size of a tuple is still assumed to be
within 32-bit. If it needs to be upgraded to 64-bit, it will be
done in a separate change.
Change-Id: I7ed28083d809a86d801a9c063a0aa32c50d32b20
Reviewed-on: http://gerrit.cloudera.org:8080/2781
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins