Augment the error message to mention that oversubscription is likely the
problem and hint at solutions.
Change-Id: I8e367e1b0cb08e11fdd0546880df23b785e3b7c9
Reviewed-on: http://gerrit.cloudera.org:8080/7861
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Sometimes the client is not open when the debug action fires at the
start of Open() or Prepare(). In that case we should set the
probability when the client is opened later.
This caused one of the large row tests to start failing with a "failed
to repartition" error in the aggregation. The error is a false positive
caused by two distinct keys hashing to the same partition. Removing the
check allows the query to succeed because the keys hash to different
partitions in the next round of repartitioning.
If we repeatedly get unlucky and have collisions, the query will still
fail when it reaches MAX_PARTITION_DEPTH.
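For illustration, a sketch of level-based partition hashing (constants
and names are assumptions, not Impala's actual code):

#include <cstdint>
#include <functional>

// Each round of repartitioning hashes with a different seed, so two distinct
// keys that collide at one level usually separate at the next. Only if they
// collide at every level does the query fail at MAX_PARTITION_DEPTH.
constexpr int kNumPartitions = 16;                     // assumed fanout
constexpr uint64_t kSeedMixer = 0x9e3779b97f4a7c15ULL;

int PartitionIndex(uint64_t key_hash, int level) {
  uint64_t seeded = std::hash<uint64_t>{}(key_hash ^ (kSeedMixer * (level + 1)));
  return static_cast<int>(seeded % kNumPartitions);
}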
Testing:
Ran TestSpilling in a loop for a couple of hours, including the
exhaustive-only tests.
Change-Id: Ib26b697544d6c2312a8e1fe91b0cf8c0917e5603
Reviewed-on: http://gerrit.cloudera.org:8080/7771
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Adds support for a "max_row_size" query option that instructs Impala
to reserve enough memory to process rows of the specified size. For
spilling operators, the planner reserves enough memory to process
rows of this size. The advantage of this compared to simply
specifying larger values for min_spillable_buffer_size and
default_spillable_buffer_size is that operators may be able to
handle larger rows without increasing the size of all their
buffers.
The default value is 512KB. I picked that number because it doesn't
increase minimum reservations *too* much even with smaller buffers
like 64KB but should be large enough for almost all reasonable
workloads.
This is implemented in the aggs and joins using the variable page size
support added to BufferedTupleStream in an earlier commit. The synopsis
is that each stream requires reservation for one default-sized page
per read and write iterator, and temporarily requires reservation
for a max-sized page when reading or writing larger pages. The
max-sized write reservation is released immediately after the row
is appended and the max-size read reservation is released after
advancing to the next row.
The sorter and analytic simply use max-sized buffers for all pages
in the stream.
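A minimal sketch of that reservation rule with simplified names (not
the planner's actual code):

#include <cstdint>

// One default-sized page per active read/write iterator, plus a
// temporary top-up to a max-sized page while a larger-than-default row
// is in flight; the top-up is released once the row is appended or
// advanced past.
int64_t StreamReservation(int64_t default_page_len, int64_t max_page_len,
                          int num_read_iterators, int num_write_iterators,
                          bool large_row_in_flight) {
  int64_t reservation =
      (num_read_iterators + num_write_iterators) * default_page_len;
  if (large_row_in_flight) reservation += max_page_len - default_page_len;
  return reservation;
}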
Testing:
Updated existing planner tests to reflect default max_row_size. Added
new planner tests to test the effect of the query option.
Added "set" test to check validation of query option.
Added end-to-end tests exercising spilling operators with large rows
with and without spilling induced by SET_DENY_RESERVATION_PROBABILITY.
Change-Id: Ic70f6dddbcef124bb4b329ffa2e42a74a1826570
Reviewed-on: http://gerrit.cloudera.org:8080/7629
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Rejects queries during admission control if:
* the largest (across all backends) min buffer reservation is
greater than the query mem_limit or buffer_pool_limit
* the sum of the min buffer reservations across the cluster
is larger than the pool max mem resources
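A sketch of the two checks, with assumed names and std::optional in
place of Impala's Status (this is not the actual admission control
code):

#include <cstdint>
#include <optional>
#include <string>

// Returns a rejection reason, or nullopt to admit the query.
std::optional<std::string> CheckRejection(int64_t largest_min_reservation,
                                          int64_t sum_min_reservations,
                                          int64_t query_mem_limit,
                                          int64_t pool_max_mem_resources) {
  if (query_mem_limit > 0 && largest_min_reservation > query_mem_limit) {
    return "largest min buffer reservation exceeds query mem_limit";
  }
  if (pool_max_mem_resources > 0 &&
      sum_min_reservations > pool_max_mem_resources) {
    return "sum of min buffer reservations exceeds pool max mem resources";
  }
  return std::nullopt;
}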
There are some other interesting cases to consider later:
* every per-backend min buffer reservation is less than the
associated backend's process mem_limit; the current
admission control code doesn't know about other backends'
proc mem_limits.
Also reduces minimum non-reservation memory (IMPALA-5810).
See the JIRA for experimental results that show this
slightly improves min memory requirements for small queries.
One reason to tweak this is to compensate for the fact that
BufferedBlockMgr didn't count small buffers against the
BlockMgr limit, but BufferPool counts all buffers against
it.
Testing:
* Adds new test cases in test_admission_controller.py
* Adds BE tests in reservation-tracker-test for the
reservation-util code.
Change-Id: Iabe87ce8f460356cfe4d1be4d7092c5900f9d79b
Reviewed-on: http://gerrit.cloudera.org:8080/7678
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
Remove the BTS_BLOCK_OVERFLOW error code, which is no longer used and
which referenced --read_size.
Improve the flag description. The output is now:
-read_size ((Advanced) The preferred I/O request size in bytes to issue to
HDFS or the local filesystem. Increasing the read size will increase
memory requirements. Decreasing the read size may decrease I/O
throughput.) type: int32 default: 8388608
Testing:
Tested that Impala built and basic queries could run.
Change-Id: I3c20a9d55f89170b11f569c90b7f2949ddbe4211
Reviewed-on: http://gerrit.cloudera.org:8080/7623
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Always create global BufferPool at startup using 80% of memory and
limit reservations to 80% of query memory (same as BufferedBlockMgr).
The query's initial reservation is computed in the planner, claimed
centrally (managed by the InitialReservations class) and distributed
to query operators from there.
min_spillable_buffer_size and default_spillable_buffer_size query
options control the buffer size that the planner selects for
spilling operators.
Port ExecNodes to use BufferPool:
* Each ExecNode has to claim its reservation during Open()
* Port Sorter to use BufferPool.
* Switch from BufferedTupleStream to BufferedTupleStreamV2
* Port HashTable to use BufferPool via a Suballocator.
This also makes PAGG memory consumption more efficient (avoiding wasted
buffers) and improves the spilling algorithm:
* Allow preaggs to execute with 0 reservation - if streams and hash tables
cannot be allocated, it will pass through rows.
* Halve the buffer requirement for spilling aggs - avoid allocating
buffers for aggregated and unaggregated streams simultaneously.
* Rebuild spilled partitions instead of repartitioning (IMPALA-2708)
TODO in follow-up patches:
* Rename BufferedTupleStreamV2 to BufferedTupleStream
* Implement max_row_size query option.
Testing:
* Updated tests to reflect new memory requirements
Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
Reviewed-on: http://gerrit.cloudera.org:8080/5801
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.
This change includes the following:
1. Include aggregate functions (AggFn) as Expr. This separates
AggFn from its evaluator. AggFn is similar to existing Expr
as both are represented as a tree of Expr nodes but it doesn't
really make sense to call Get*Val() on AggFn. This change
restructures the class hierarchy: much of the existing Expr
class is now renamed to ScalarExpr. Expr is the parent class
of both AggFn and ScalarExpr. Expr is defined to be a tree
with root of either AggFn or ScalarExpr and all descendants
being ScalarExpr (see the class sketch below).
2. ExprContext is renamed to ScalarExprEvaluator which is the
interface for evaluating ScalarExpr; AggFnEvaluator is the
interface for evaluating AggFn. Multiple evaluators can be
instantiated per Expr. Expr contains static states of an
expression while evaluator contains runtime states needed
for execution (i.e. evaluating the expression).
3. Update all exec nodes to instantiate Expr and their evaluators
separately. ExecNode::Init() will be responsible for creating
all the Exprs in an ExecNode while their evaluators are created
in ExecNode::Prepare(). Certain evaluators are also moved into
the data structures which actually utilize them. For instance,
HashTableCtx now owns the build and probe expression evaluators.
Similarly, TupleRowComparator and Sorter also own the evaluators.
ExecNodes which utilize these data structures are only responsible
for creating the expressions used by these data structures.
4. All codegen functions take Exprs instead of evaluators. Also, codegen
functions will not return an error status should the IR function fail the
LLVM verification step.
5. The assignment of index into the FunctionContext vector is now done
during the construction of ScalarExpr. Evaluators are only responsible
for allocating and initializing the FunctionContexts.
6. Open(), Prepare() are now removed from Expr classes. The interface
for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
which will convert a thrift Expr into an initialized Expr object.
Similarly, the Create() interface is used for creating evaluators from an
initialized Expr object.
This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.
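A skeleton of the restructured hierarchy from items 1 and 2 above
(simplified sketch; member details elided):

// Expr is a tree whose root is either AggFn or ScalarExpr and whose
// descendants are all ScalarExpr. Evaluators hold per-instance runtime
// state and are created separately; multiple evaluators may share one Expr.
class Expr { /* static state: type, children, fn descriptor */ };
class ScalarExpr : public Expr { /* Get*Val() makes sense here */ };
class AggFn : public Expr { /* update/merge/finalize functions */ };
class ScalarExprEvaluator { /* runtime state to evaluate a ScalarExpr */ };
class AggFnEvaluator { /* runtime state to evaluate an AggFn */ };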
Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Reviewed-on: http://gerrit.cloudera.org:8080/5483
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
The stream defaults to pages of default_page_len_. If a row doesn't
fit in that page, it will allocate another page up to max_page_len_
bytes and append a single row to that page, then immediately unpin
the page. This means that when writing a stream, the large
page only needs to be kept in memory temporarily, which helps with
memory requirements. E.g. consider a hash join that is repartitioning
1 unpinned stream into 16 unpinned streams. We will need
default_page_len_ * 15 + max_page_len_ * 2 bytes of reservation because
when processing a large row we only need one large write buffer at a
time.
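A worked version of that arithmetic (the concrete 64KB default and
512KB maximum page sizes are illustrative, borrowed from the buffer
sizes discussed elsewhere in this log):

#include <cstdint>

// Repartitioning 1 unpinned stream into 16 unpinned streams: only one
// large write buffer and one large read buffer are needed at any
// moment, so the other 15 write buffers stay default-sized.
int64_t RepartitionReservation(int64_t default_page_len, int64_t max_page_len) {
  return default_page_len * 15 + max_page_len * 2;
}
// e.g. 64KB * 15 + 512KB * 2 = 1984KB, versus 512KB * 17 = 8704KB if
// every buffer had to be max-sized.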
Also switches the stream to lazily allocating write pages, so that
we don't need to allocate a page until we know the size of the row
to go in it. This required a mechanism to "save" reservation in
PrepareForRead()/PrepareForWrite(). A SubReservation API is added
to BufferPool for this purpose and the stream now saves read and
write reservation for lazy page allocation. It also saves reservation
instead of double-pinning pages in the read/write case.
The large row cases are not as optimised for memory consumption or
performance - queries processing very large numbers of large rows
are an extreme edge case that is likely to hit other performance
bottlenecks first. Pages with large rows can have up to 50%
internal fragmentation.
To avoid duplicating more logic between AddRow() and AllocateRow()
I restructured things so that AddRowSlow() is implemented in terms
of AllocateRowSlow(). AllocateRow() now takes a function as an
argument to populate the row.
Testing:
* Added tests for the case where 0 rows are added to the stream
* Extend BigRow to exercise the new code.
* Also test large strings and read/write streams.
Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5
Reviewed-on: http://gerrit.cloudera.org:8080/6638
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Adds Impala support for TIMESTAMP types stored in Kudu.
Impala stores TIMESTAMP values in 96 bits and has nanosecond
precision. Kudu's timestamp is a 64-bit microsecond delta
from the Unix epoch (called UNIXTIME_MICROS), so a conversion
is necessary.
When writing to Kudu, TIMESTAMP values in nanoseconds are
rounded to the nearest microsecond.
When reading from Kudu, the KuduScanner returns
UNIXTIME_MICROS with 8 bytes of padding so Impala can convert
the value to a TimestampValue in-line and copy the entire
row.
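A sketch of the write-path conversion (the helper name is an
assumption, and the exact rounding of ties and negative values may
differ from Impala's):

#include <cstdint>

// Impala TIMESTAMP: nanosecond precision. Kudu UNIXTIME_MICROS: 64-bit
// microseconds since the Unix epoch. Round to the nearest microsecond.
int64_t NanosToUnixtimeMicros(int64_t nanos_since_epoch) {
  return nanos_since_epoch >= 0 ? (nanos_since_epoch + 500) / 1000
                                : (nanos_since_epoch - 500) / 1000;
}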
Testing:
Updated the functional_kudu schema to use TIMESTAMPs instead
of converting to STRING, so this provides some decent
coverage. Some BE tests were added, and some EE tests as
well.
TODO: Support pushing down TIMESTAMP predicates
TODO: Support TIMESTAMPs in range partitioning expressions
Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d
Reviewed-on: http://gerrit.cloudera.org:8080/6526
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.
After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.
New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE
LIKE <file> will set the table property to UTC if the global flag
--set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true.
HDFS tables created by Impala using CREATE TABLE LIKE <other table>
will copy the property of the table that is copied.
This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if the
parquet.mr.int96.write.zone table property is not set, and ignored
otherwise.
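The resulting precedence, sketched with assumed names (not the actual
Impala code):

#include <string>

enum class TsAdjustment { kNone, kFromTableProperty, kLocalToUtc };

// Data written by Impala is never adjusted; otherwise the
// parquet.mr.int96.write.zone table property wins, and the legacy
// --convert_legacy_hive_parquet_utc_timestamps flag applies only when
// the property is unset.
TsAdjustment ChooseAdjustment(bool written_by_impala,
                              const std::string& table_property_zone,
                              bool convert_legacy_flag) {
  if (written_by_impala) return TsAdjustment::kNone;
  if (!table_property_zone.empty()) return TsAdjustment::kFromTableProperty;
  if (convert_legacy_flag) return TsAdjustment::kLocalToUtc;
  return TsAdjustment::kNone;
}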
Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Reviewed-on: http://gerrit.cloudera.org:8080/5939
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Support allocating with mmap instead of TCMalloc to give more control
over memory usage. Also tell Linux to back larger buffers with huge
pages when possible to reduce TLB pressure. The main complication is
that memory returned by mmap() is not necessarily aligned to a huge
page boundary, so we need to "fix up" the mapping ourselves.
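A sketch of that fix-up, simplified from the idea described above
(Impala's actual code differs): over-allocate by one huge page, then
trim the unaligned head and tail.

#include <sys/mman.h>
#include <cstddef>
#include <cstdint>

constexpr size_t kHugePageSize = 2 * 1024 * 1024;  // x86-64 huge page

// 'len' is assumed to be a multiple of the huge page size.
void* MmapHugeAligned(size_t len) {
  size_t padded_len = len + kHugePageSize;
  void* raw = mmap(nullptr, padded_len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (raw == MAP_FAILED) return nullptr;
  uintptr_t addr = reinterpret_cast<uintptr_t>(raw);
  uintptr_t aligned = (addr + kHugePageSize - 1) & ~(kHugePageSize - 1);
  size_t head = aligned - addr;
  if (head > 0) munmap(raw, head);  // trim unaligned prefix
  uint8_t* start = reinterpret_cast<uint8_t*>(aligned);
  size_t tail = padded_len - head - len;
  if (tail > 0) munmap(start + len, tail);  // trim the leftover suffix
  madvise(start, len, MADV_HUGEPAGE);  // ask Linux for transparent huge pages
  return start;
}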
Adds additional memory metrics, since we previously relied on the
assumption that all memory was allocated through TCMalloc.
memory.total-used tracks the total across the buffer pool and
TCMalloc. When the buffer pool is not present, they just report
the TCMalloc values.
This can be enabled with the --mmap_buffers flag. The transparent
huge pages support can be disabled with the --madvise_huge_pages
startup flag.
At some point this should become the default, but it requires
more work to validate perf and resource usage (virtual address
space, etc).
Testing:
Added some unit tests to test edge cases and the different supported
flags. Many pre-existing tests also exercise the modified code.
Change-Id: Ifbc748f74adcbbdcfa45f3ec7df98284925acbd6
Reviewed-on: http://gerrit.cloudera.org:8080/6474
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Adds tests for read errors from permissions (i.e. open() fails),
corrupt data (integrity check fails) and truncated files (read() fails).
Fixes a couple of bugs:
* Truncated reads were not detected in TmpFileMgr
* IoMgr buffers weren't returned on error paths (this isn't a true leak
but results in DCHECKs being hit).
Change-Id: I3f2b93588dd47f70a4863ecad3b5556c3634ccb4
Reviewed-on: http://gerrit.cloudera.org:8080/6562
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
The test should allow Unpin() to fail with a scratch allocation error to
handle the case where the first write fails and blacklists the scratch
disk around the same time that the second write starts. Usually either
the second write succeeds because it started before the first write
failed or it fails with CANCELLED because the
BufferedBlockMgr::is_cancelled_ flag is set. There is a small
window for a race after the disk is blacklisted in TmpFileMgr but
before BufferedBlockMgr::WriteComplete() is called.
Testing:
I was able to reproduce the problem locally by adding some delays
to the test. I added a variant of the WriteError test that more reliably
reproduces the bug. Ran both WriteError tests in a loop locally to try
to flush out flakiness.
Change-Id: I9878d7000b03a64ee06c2088a8c30e318fe1d2a3
Reviewed-on: http://gerrit.cloudera.org:8080/5940
Tested-by: Impala Public Jenkins
Reviewed-by: Michael Ho <kwho@cloudera.com>
- Removes the runtime unknown disk ID reporting and instead moves
it to the explain plan as a counter that prints the number of
scan ranges missing disk IDs in the corresponding HDFS scan nodes.
- Adds a warning to the header of query profile/explain plan with a
list of tables missing disk ids.
- Removes reference to enabling dfs block metadata configuration,
since it doesn't apply anymore.
- Removes VolumeId terminology from the runtime profile.
Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2
Reviewed-on: http://gerrit.cloudera.org:8080/5828
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins
Before this patch, we would simply read the INT96 Parquet timestamp
representation and assume that it's valid. However, not all bit
permutations represent a valid timestamp. One of the boost functions
raised an exception (that we didn't catch) when passed an invalid
boost date object, which resulted in a crash. This patch fixes the
problem by validating that the date falls into the 1400..9999 year
range as we are scanning Parquet.
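A sketch of the range check (the Julian-day constants below are my own
approximations, not necessarily the values Impala uses):

#include <cstdint>

// The Parquet INT96 timestamp carries a 4-byte Julian day. Validate that
// it maps into the supported 1400..9999 year range before constructing a
// boost date object.
constexpr int32_t kMinSupportedJulianDay = 2232400;  // ~ 1400-01-01
constexpr int32_t kMaxSupportedJulianDay = 5373484;  // ~ 9999-12-31

bool IsValidJulianDay(int32_t julian_day) {
  return julian_day >= kMinSupportedJulianDay &&
         julian_day <= kMaxSupportedJulianDay;
}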
Change-Id: Ieaab5d33e6f0df831d0e67e1d318e5416ffb90ac
Reviewed-on: http://gerrit.cloudera.org:8080/5343
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
Second part of IMPALA-3710, which removed the IGNORE DML
option and changed the following errors on Kudu DML
operations to be ignored:
1) INSERT where the PK already exists
2) UPDATE/DELETE where the PK doesn't exist
This changes other data-related errors to be ignored as
well:
3) NULLs in non-nullable columns, i.e. null constraint
violations.
4) Rows with PKs that are in an 'uncovered range'.
It became clear that we can't differentiate between (3) and
(4) because both return a Kudu 'NotFound' error code. The
Impala error codes have been simplified as well: we just
report a generic KUDU_NOT_FOUND error in these cases.
This also adds some metadata to the thrift report sent to
the coordinator from sinks so the total number of rows with
errors can be added to the profile. Note that this does not
include a breakdown of error counts by type/code because we
cannot differentiate between all of these cases yet.
An upcoming change will add this new info to the beeswax
interface and show it in the shell output (IMPALA-3713).
Testing: Updated kudu_crud tests to check the number of rows
with errors.
Change-Id: I4eb1ad91dc355ea51de261c3a14df0f9d28c879c
Reviewed-on: http://gerrit.cloudera.org:8080/4985
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This patch prevents an invalid decimal type in an Avro file schema from
crashing Impala. Most invalid Avro schemas are caught by the frontend,
but file schemas still need to be validated by the backend.
After this patch, files with bad schemas are skipped.
Testing:
This was hit very rarely by the scanner fuzzing. Added a regression test that
scans a file with a bad schema.
Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc
Reviewed-on: http://gerrit.cloudera.org:8080/4876
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
1.) IMPALA-4134: Use Kudu AUTO FLUSH
Improves performance of writes to Kudu by up to 4.2x in
bulk data loading tests (loading 200 million rows from
lineitem).
2.) IMPALA-3704: Improve errors on PK conflicts
The Kudu client reports an error for every PK conflict,
and all errors were being returned in the error status.
As a result, inserts/updates/deletes could return an error status
with thousands of errors reported. This changes the error
handling to log all reported errors as warnings and
return only the first error in the query error status.
3.) Improve the DataSink reporting of the insert stats.
The per-partition stats returned by the data sink weren't
useful for Kudu sinks. Firstly, the number of appended rows
was not being displayed in the profile. Secondly, the
'stats' field isn't populated for Kudu tables and thus was
confusing in the profile, so it is no longer printed if it
is not set in the thrift struct.
Testing: Ran local tests, including new tests to verify
the query profile insert stats. Manual cluster testing was
conducted of the AUTO FLUSH functionality, and that testing
informed the default mutation buffer value of 100MB which
was found to provide good results.
Change-Id: I5542b9a061b01c543a139e8722560b1365f06595
Reviewed-on: http://gerrit.cloudera.org:8080/4728
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
The scheduler crashed with a segmentation fault when there were no
backends registered: After not being able to find a local backend (none
are configured at all) in ComputeScanRangeAssignment(), the previous
code would eventually try to return the top of
assignment_ctx.assignment_heap in SelectRemoteBackendHost(), but that
heap would be empty. Subsequently, when using the IP address of that
heap node, a segmentation fault would occur.
This change adds a check and aborts scheduling with an error. It also
contains a test.
Change-Id: I6d93158f34841ea66dc3682290266262c87ea7ff
Reviewed-on: http://gerrit.cloudera.org:8080/4776
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Adds code comments and issues a warning for Parquet files
with num_rows=0 but at least one non-empty row group.
Change-Id: I72ccf00191afddb8583ac961f1eaf11e5eb28791
Reviewed-on: http://gerrit.cloudera.org:8080/4696
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
As part of the ASF transition, we need to replace references to
Cloudera in Impala with references to Apache. This primarily means
changing Java package names from com.cloudera.impala.* to
org.apache.impala.*
A prior patch renamed all the files as necessary, and this patch
performs the actual code changes. Most of the changes in this patch
were generated with some commands of the form:
find . | grep "\.java\|\.py\|\.h\|\.cc" | \
xargs sed -i s/'com\(.\)cloudera\(\.\)impala/org\1apache\2impala/g
along with some manual fixes.
After this patch, the remaining references to Cloudera in the repo
mostly fall into the categories:
- External components that have cloudera in their own package names,
eg. com.cloudera.kudu/llama
- URLs, eg. https://repository.cloudera.com/
Change-Id: I0d35fa6602a7fc0c212b2ef5e2b3322b77dde7e2
Reviewed-on: http://gerrit.cloudera.org:8080/3937
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
This patch implements basic in-memory buffer management, with
reservations managed by ReservationTrackers.
Locks are fine-grained so that the buffer pool can scale to many
concurrent queries.
Includes basic tests for buffer pool setup, allocation and reservations.
Change-Id: I4bda61c31cc02d26bc83c3d458c835b0984b86a0
Reviewed-on: http://gerrit.cloudera.org:8080/4070
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Currently we can only disable spilling via a startup option, which means
we need to restart the cluster for this.
This patch adds a new query option 'SCRATCH_LIMIT' that limits the amount of
scratch directory space that can be used. This would be useful to prevent
runaway queries or to prevent queries from spilling when that is not desired.
This also adds a 'ScratchSpace' counter to the runtime profile of the
BlockMgr that keeps track of the scratch space allocated.
Valid values for the SCRATCH_LIMIT query option are:
- unspecified or a limit of -1 means no limit
- a limit of 0 (zero) means spilling is disabled
- an int (= number of bytes)
- a float followed by "M" (MB) or "G" (GB)
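A minimal parsing sketch for those values (an assumed helper, not the
actual option parser; error handling omitted):

#include <cstdint>
#include <string>

// Returns the limit in bytes; 0 disables spilling, -1 means no limit.
int64_t ParseScratchLimit(const std::string& value) {
  if (value.empty()) return -1;  // unspecified: no limit
  char suffix = value.back();
  if (suffix == 'M' || suffix == 'G') {
    double num = std::stod(value.substr(0, value.size() - 1));
    int64_t scale = suffix == 'M' ? (1LL << 20) : (1LL << 30);
    return static_cast<int64_t>(num * scale);
  }
  return std::stoll(value);  // plain byte count, 0, or -1
}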
Testing:
A new test file "test_scratch_limit.py" was added to test this functionality.
Change-Id: Ibf8842626ded1345b632a0ccdb9a580e6a0ad470
Reviewed-on: http://gerrit.cloudera.org:8080/4497
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
on the website.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This patch adds a configurable timeout for all backend client
RPCs to avoid query hang issues.
Prior to this change, Impala didn't set a socket send/recv timeout for
the backend client, so an RPC would wait forever for data. In extreme
cases of a bad network or a kernel panic on the destination host, the
sender would get no response and the RPC would hang. Query hangs are
hard to detect. If the hang happens in ExecRemoteFragment() or
CancelPlanFragments(), the query cannot be cancelled unless you
restart the coordinator.
Added send/recv timeouts to all RPCs to avoid query hangs. For the
catalog client, the default timeout is kept at 0 (no timeout) because
ExecDdl() can take a very long time if a table has many partitions,
mainly waiting for HMS API calls.
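With Thrift's C++ TSocket, setting those timeouts is a one-liner per
direction. A minimal sketch (the function name and flag plumbing are
assumptions; the TSocket setters are Thrift's API):

#include <thrift/transport/TSocket.h>

// 0 means no timeout; that default is kept for the catalog client
// because ExecDdl() can legitimately take a very long time.
void SetBackendClientTimeouts(apache::thrift::transport::TSocket* socket,
                              int send_timeout_ms, int recv_timeout_ms) {
  socket->setSendTimeout(send_timeout_ms);
  socket->setRecvTimeout(recv_timeout_ms);
}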
Added a wrapper, RetryRpcRecv(), to wait longer for the receiver's
response. This is needed by certain RPCs, e.g. TransmitData() from
DataStreamSender, where the receiver can hold its response to apply
back pressure.
If an RPC fails, the connection is left in an unrecoverable state. We
close the underlying connection instead of putting it back in the
cache, to make sure a broken connection won't cause more RPC failures.
Added a retry for the CancelPlanFragment RPC. This reduces the chance
that a cancel request gets lost due to an unstable network, but it can
make cancellation take longer and makes test_lifecycle.py more flaky.
The metric num-fragments-in-flight might not be 0 yet due to previous tests.
Modified the test to check the metric delta instead of comparing to 0
to reduce flakiness. However, this might not capture some failures.
Besides the new EE test, I used the following iptables rule to
inject network failure to verify RPCs never hang.
1. Block network traffic on a port completely
iptables -A INPUT -p tcp -m tcp --dport 22002 -j DROP
2. Randomly drop 5% of TCP packets to slowdown network
iptables -A INPUT -p tcp -m tcp --dport 22000 -m statistic --mode random --probability 0.05 -j DROP
Change-Id: Id6723cfe58df6217f4a9cdd12facd320cbc24964
Reviewed-on: http://gerrit.cloudera.org:8080/3343
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
This change extends MemPool, FreePool and StringBuffer to support
64-bit allocations, fixes a bug in decompressor and extends various
places in the code to support 64-bit allocation sizes. With this
change, the text scanner can now decompress compressed files larger
than 1GB.
Note that the UDF interfaces FunctionContext::Allocate() and
FunctionContext::Reallocate() still use 32-bit for the input
argument to avoid breaking compatibility.
In addition, the byte size of a tuple is still assumed to be
within 32-bit. If it needs to be upgraded to 64-bit, it will be
done in a separate change.
A new test has been added to test the decompression of a 2GB
snappy block compressed text file.
Change-Id: Ic1af1564953ac02aca2728646973199381c86e5f
Reviewed-on: http://gerrit.cloudera.org:8080/3575
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
This reverts commit 1ffb2bd5a2a2faaa759ebdbaf49bf00aa8f86b5e.
Unbreak the packaging builds for now.
Change-Id: Id079acb83d35b51ba4dfe1c8042e1c5ec891d807
Reviewed-on: http://gerrit.cloudera.org:8080/3543
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Michael Ho <kwho@cloudera.com>
This change extends MemPool, FreePool and StringBuffer to support
64-bit allocations, fixes a bug in decompressor and extends various
places in the code to support 64-bit allocation sizes. With this
change, the text scanner can now decompress compressed files larger
than 1GB.
Note that the UDF interfaces FunctionContext::Allocate() and
FunctionContext::Reallocate() still use 32-bit for the input
argument to avoid breaking compatibility.
In addition, the byte size of a tuple is still assumed to be
within 32-bit. If it needs to be upgraded to 64-bit, it will be
done in a separate change.
Change-Id: I7ed28083d809a86d801a9c063a0aa32c50d32b20
Reviewed-on: http://gerrit.cloudera.org:8080/2781
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Added checks/error handling:
* Negative string lengths while decoding dictionary or data page.
* Buffer overruns while decoding dictionary or data page.
* Some metadata FILECHECKs were converted to statuses.
Testing:
Unit tests for:
* decoding of strings with negative lengths
* truncation of all parquet types
* dictionary creation correctly handling error returns from Decode().
End-to-end tests for handling of negative string lengths in
dictionary- and plain-encoded data in corrupt files, and for
handling of buffer overruns for string data. The corrupted
parquet files were generated by hacking Impala's parquet
writer to write invalid lengths, and by hacking it to
write plain-encoded data instead of dictionary-encoded
data by default.
Performance:
set num_nodes=1;
set num_scanner_threads=1;
select * from biglineitem where l_orderkey = -1;
I inspected MaterializeTupleTime. Before the average was 8.24s and after
was 8.36s (a 1.4% slowdown, within the standard deviation of 1.8%).
Change-Id: Id565a2ccb7b82f9f92cc3b07f05642a3a835bece
Reviewed-on: http://gerrit.cloudera.org:8080/3387
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Avro string lengths are encoded as 64-bit integers. Impala can only
handle up to 32-bit integers, so we need to be careful about handling
out-of-range integers. Negative integers were already handled by a
previous patch, but if a positive 64-bit integer is truncated to a
32-bit integer, the result can be a negative length.
This patch fixes CHAR/VARCHAR behaviour, where we can just truncate
the string, and STRING behaviour, where we can't truncate the string
and so must return an error.
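A sketch of the length handling (names are assumptions; std::optional
stands in for an error status):

#include <cstdint>
#include <limits>
#include <optional>

// Avro lengths are 64-bit; Impala string lengths are 32-bit. CHAR and
// VARCHAR can be truncated to their declared length; STRING cannot, so
// an out-of-range length is an error (nullopt here).
std::optional<int32_t> CheckAvroStringLength(int64_t len, bool truncatable,
                                             int32_t declared_len) {
  if (len < 0) return std::nullopt;  // corrupt data
  if (len > std::numeric_limits<int32_t>::max()) {
    if (!truncatable) return std::nullopt;  // STRING: must return an error
    return declared_len;                    // CHAR/VARCHAR: truncate
  }
  return static_cast<int32_t>(len);
}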
Testing:
Added unit tests for STRING, CHAR, and VARCHAR that exercise the string
overflow handling.
Change-Id: If6541e7c68255bf599b26386a55057c93e62af51
Reviewed-on: http://gerrit.cloudera.org:8080/3383
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
If a thrift client can't create a socket, all subsequent calls to Open()
should fail fast since socket creation errors are treated as
unrecoverable.
Testing: manual testing with a bad SSL configuration. Impalad startup
fails fast, rather than retrying 10 times as previously.
Change-Id: I394be287143eefc79cf22865898b71ca24c41328
Reviewed-on: http://gerrit.cloudera.org:8080/3317
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
This patch adds error checking to the Avro scanner (both the codegen'd
and interpreted paths), including out-of-bounds checks and data
validity checks.
I ran a local benchmark using the following queries:
set num_scanner_threads=1;
select count(i) from default.avro_bigints_big; # file contains only longs
select max(l_orderkey) from biglineitem_avro; # file has tpch.lineitem schema
Both benchmark queries see negligible or no performance impact.
This patch adds a new Avro scanner unit test and an end-to-end test
that queries several corrupted files, as well as updates the zig-zag
varlen int unit test.
Change-Id: I801a11c496a128e02c564c2a9c44baa5a97be132
Reviewed-on: http://gerrit.cloudera.org:8080/3072
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
There are multiple places in the code which call
RuntimeState::SetMemLimitExceeded(). Most of them are
unnecessary as the error status constructed will eventually
be propagated up the tree of exec nodes. There is no obvious
reason to treat query memory limit exceeded differently.
In some cases such as scan-node, calling SetMemLimitExceeded()
is actually confusing as all scanner threads may pick up error
status when any thread exceeds query memory limit, causing a
lot of noise in the log.
This change replaces most calls to RuntimeState::SetMemLimitExceeded()
with MemTracker::MemLimitExceeded(). The remaining places are:
the old hash table code, the UDF framework and QueryMaintenance()
which checks for memory limit periodically. The query maintenance
case will be removed eventually once IMPALA-2399 is fixed.
Change-Id: Ic0ca128c768d1e73713866e8c513a1b75e6b4b59
Reviewed-on: http://gerrit.cloudera.org:8080/3140
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
The stubs in Impala broke during the merge commit. This commit removes
the stubs in hopes of improving robustness of the build. The original
problem (Kudu clients are only available for some OSs) is now addressed
by moving the stubbing into a dummy Kudu client. The dummy client only
allows linking to succeed, if any client method is called, Impala will
crash. Before calling any such method, Kudu availability must be
checked.
Change-Id: I4bf1c964faf21722137adc4f7ba7f78654f0f712
Reviewed-on: http://gerrit.cloudera.org:8080/2585
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This change replaces all calls to MemPool::Allocate() with
MemPool::TryAllocate() in the parquet scanner and the decompressor.
Also streamlines CheckQueryState() to avoid unnecessary spinlock
acquisition in the common case where there is no error, and
removes some dead code in the text converter.
MemPool::Allocate() is also updated to return a valid pointer
instead of NULL when the allocation size is zero. NULL is only
returned during allocation failure.
This change also updates CollectionValueBuilder::GetFreeMemory()
to return Status in case it exceeds memory limit. As part of the
change, the max allocation limit (2 GB) is also removed from it
as 64-bit allocations are supported in MemPool with this change.
Change-Id: Ic70400407b7662999332448f4d1bce2cc344ca89
Reviewed-on: http://gerrit.cloudera.org:8080/2203
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This is for review purposes only. This patch will be merged with David's
big merge patch.
Changes:
1) Make Kudu compilation dependent on the OS since not all OSs support
Kudu.
2) Only run Kudu related tests when Kudu is supported (see #1).
3) Look for Kudu locally, but in a different location. To use a local
build of Kudu, set KUDU_BUILD_DIR to the path Kudu was built in and
set KUDU_CLIENT_DIR to the path Kudu was installed in.
Example:
git clone https://github.com/cloudera/kudu.git
...build 3rd party etc...
mkdir -p $KUDU_BUILD_DIR
cd $KUDU_BUILD_DIR
cmake <path to Kudu source dir>
make
DESTDIR=$KUDU_CLIENT_DIR make install
4) Look for Kudu in the toolchain if not using a local Kudu build.
5) Add Kudu service startup scripts. The Kudu in the toolchain is
actually a parcel that has been renamed (the contents were not
modified in any way), that mean the Kudu service binaries are there.
Those binaries are now used to run the Kudu service.
Change-Id: I3db88cbd27f2ea2394f011bc8d1face37411ed58
This merges the 'feature/kudu' branch with cdh5-trunk as of commit:
055500cc753f87f6d1c70627321fcc825044e183
This patch is not a pure merge patch in the sense that goes beyond conflict
resolution to also address reviews to the 'feature/kudu' branch as a whole.
The review items and their resolution can be inspected at:
http://gerrit.cloudera.org:8080/#/c/1403/
Change-Id: I6dd4270cd17a4f5c02811c343726db3504275a92
This patch adds an output parameter 'already_unregistered' to
FindRecvrOrWait() to signal to the caller in which of two cases it may
have returned NULL. If 'already_unregistered' is true, the receiver has
already been setup and closed (possibly by cancellation, possibly by
the fragment deliberately closing its inputs in the case of a
limit). This is not an error - cancellation will be signalled to the
sender from the coordinator, and deliberate closure means the
coordinator will tear down the query shortly.
If 'already_unregistered' is set to false by FindRecvrOrWait(), the
DataStreamMgr has never seen the intended receiver. This means the
sender has waited for a full timeout period without the upstream
receiver being established; this signals a likely query setup
problem (as long as datastream_sender_timeout_ms is set sufficiently
large) and so we return an error.
We need to tweak the two timeout parameters here:
* datastream_sender_timeout_ms needs to be large enough to avoid false
negatives for problems during query setup (otherwise queries that would
have succeeded, if slowly, will be unexpectedly cancelled).
* STREAM_EXPIRATION_TIME_MS needs to be set high enough that a query
will not continue executing for longer than STREAM_EXPIRATION_TIME_MS
after it closes its input (otherwise the sender will get
already_unregistered=false, and cancel). This case will only trigger
when a sender tries to call TransmitData() after the receiver has been
closed for STREAM_EXPIRATION_TIME_MS; this should not happen in
non-error cases as receivers are not closed before consuming their
entire input.
In this patch the former has been set to 2 minutes, and the latter to 5
minutes.
Change-Id: Ib1734992c7199b9dd4b03afca5372022051b6fbd
Reviewed-on: http://gerrit.cloudera.org:8080/2305
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
Fix a bug in which Impala only reads the first stream
of a multi-stream bz2/gzip file.
Changes the bz2 decoder to read the file in a streaming
fashion rather than reading the entire file into memory
before it can be decompressed.
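For illustration, here is the equivalent multi-stream loop written
against zlib's gzip support (this is not Impala's decompressor; the
point is that inflateReset() on Z_STREAM_END keeps decoding past the
first stream instead of stopping there):

#include <zlib.h>
#include <cstddef>
#include <vector>

std::vector<unsigned char> GunzipAllStreams(const unsigned char* in,
                                            size_t in_len) {
  std::vector<unsigned char> out;
  unsigned char buf[64 * 1024];
  z_stream strm = {};
  if (inflateInit2(&strm, 16 + MAX_WBITS) != Z_OK) return out;  // 16+: gzip
  strm.next_in = const_cast<unsigned char*>(in);
  strm.avail_in = static_cast<uInt>(in_len);
  while (strm.avail_in > 0) {
    strm.next_out = buf;
    strm.avail_out = sizeof(buf);
    int ret = inflate(&strm, Z_NO_FLUSH);
    out.insert(out.end(), buf, buf + (sizeof(buf) - strm.avail_out));
    if (ret == Z_STREAM_END) {
      inflateReset(&strm);  // another concatenated stream may follow
    } else if (ret != Z_OK) {
      break;  // corrupt input; real code would surface an error status
    }
  }
  inflateEnd(&strm);
  return out;
}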
Change-Id: Icbe617d03a69953f0bf3aa0f7c30d34bc612f9f8
(cherry picked from commit b6d0b4e059329633dc50f1f73ebe35b7ac317a8e)
Reviewed-on: http://gerrit.cloudera.org:8080/2219
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
This change includes the potential space needed for null indicators
when determining whether a row can fit in a new I/O write block or
small buffers in the buffered tuple stream. For rows with sizes close to
the I/O block size, there may not be enough space to hold the entire
row in a block after reserving the header space for null indicators.
This change also updates the buffered tuple stream BE test to test
for this corner case.
Change-Id: I256974281d555f9a015c17ea23a1b4d5e9055c97
Reviewed-on: http://gerrit.cloudera.org:8080/1973
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
Fixes a bug in generate_error_codes where the enum value specified
in generate_error_codes.py was not actually used.
Change-Id: If7e3269d12a839106c595d44da09c573a8a177f2
Reviewed-on: http://gerrit.cloudera.org:8080/1894
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
One of the error codes was removed, leaving a gap in the error code
numbers. This change just closes the gap.
Change-Id: I2e424e55439459d4c7a84dd393f55d72400dabf0
Reviewed-on: http://gerrit.cloudera.org:8080/1891
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
This patch removes the workaround that disallows SSL and Kerberos
being enabled at the same time. It was previously disallowed because
SSL and Kerberos wouldn't work together for server-server
(or daemon-daemon) communication, causing a hang.
The issue has been addressed in the following patches:
http://gerrit.cloudera.org:8080/#/c/1594/
http://gerrit.cloudera.org:8080/#/c/1599/
This patch should be merged only after the above 2 are merged.
Change-Id: I63d492d1733204edd1249aff2cb3b168ec82ea92
Reviewed-on: http://gerrit.cloudera.org:8080/1772
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
FunctionContext::Allocate(), FunctionContextImpl::AllocateLocal()
and FunctionContext::Reallocate() allocate memory without taking
memory limits into account. The problem is that these functions
invoke FreePool::Allocate() which may call MemPool::Allocate()
that doesn't check against the memory limits. This patch fixes
the problem by making these FunctionContext functions check for
memory limits and set an error in the FunctionContext object if
memory limits are exceeded.
An alternative would be for these functions to call
MemPool::TryAllocate() instead and return NULL if memory limits
are exceeded. However, this may break some existing external
UDAs which don't check for allocation failures, leading to
unexpected crashes of Impala. Therefore, we stick with this
ad hoc approach until the UDF/UDA interfaces are updated in
the future releases.
Callers of these FunctionContext functions are also updated to
handle potential failed allocations instead of operating on
NULL pointers. The query status will be polled at various
locations, terminating the query.
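A self-contained sketch of that approach (all names are assumptions,
not Impala's UDF API):

#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Stand-in for the FunctionContext state relevant to this change.
struct UdfContext {
  int64_t consumed_bytes = 0;
  int64_t mem_limit = 0;  // 0 means unlimited
  std::string error_msg;
};

// Check the limit before allocating and record an error on the context
// instead of changing Allocate()'s signature, which would break
// existing UDAs that don't check for allocation failures.
uint8_t* Allocate(UdfContext* ctx, int64_t byte_size) {
  if (ctx->mem_limit > 0 &&
      ctx->consumed_bytes + byte_size > ctx->mem_limit) {
    ctx->error_msg = "Memory limit exceeded";
    return nullptr;  // the polled query status will terminate the query
  }
  ctx->consumed_bytes += byte_size;
  return static_cast<uint8_t*>(std::malloc(static_cast<std::size_t>(byte_size)));
}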
This patch also fixes MemPool to handle the case in which malloc
may return NULL. It propagates the failure to the callers instead
of continuing to run with NULL pointers. In addition, errors during
aggregate functions' initialization are now properly propagated.
Change-Id: Icefda795cd685e5d0d8a518cbadd37f02ea5e733
Reviewed-on: http://gerrit.cloudera.org:8080/1445
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
This patch does the following:
* Prevents Impala from starting if 'internal' Kerberos and SSL are
enabled at the same time.
* Changes the required configuration to enable 'internal' SSL to include
--ssl_client_ca_certificate. This allows 'external' SSL to be
configured without enabling 'internal' SSL.
Tests are included for the first item. For the second, the appropriate
test is to try to connect with an internal SSL client to a non-SSL
server. However, this causes the connection to hang, which is not an
easy condition to detect in a test case.
Change-Id: I7fa545045fed57e161fb37898d5782937c710a0c
Reviewed-on: http://gerrit.cloudera.org:8080/1318
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Reviewed-on: http://gerrit.cloudera.org:8080/1323
This makes cases like IMPALA-2402 much easier to diagnose, since this
message will mostly occur if Impala tries to read a file that's not an
Avro data file.
Change-Id: I6504e668905ecc6964b77a6fe0cfc9c7511fd5c0
Reviewed-on: http://gerrit.cloudera.org:8080/1202
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
Stream::ReadBytes() could fail for reasons other than
'stale metadata'. Add an error code check to make sure
Impala returns the proper error message.
It also fixes IMPALA-2488: metadata.test_stale_metadata
fails on non-HDFS filesystems.
Change-Id: I9a25df3fb49f721bf68d1b07f42a96ce170abbaa
Reviewed-on: http://gerrit.cloudera.org:8080/1166
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins