impala

mirror of https://github.com/apache/impala.git synced 2026-01-26 12:02:21 -05:00

Author	SHA1	Message	Date
Skye Wanderman-Milne	68fef6a5bf	IMPALA-2213: make Parquet scanner fail query if the file size metadata is stale This patch changes the Parquet scanner to check if it can't read the full footer scan range, indicating that file has been overwritten by a shorter file without refreshing the table metadata. Before it would DCHECK. This patch adds a test for this case, as well as the case where the new file is longer than the metadata states (which fails with an existing error). Change-Id: Ie2031ac2dc90e4f2573bd3ca8a3709db60424f07 Reviewed-on: http://gerrit.cloudera.org:8080/1084 Tested-by: Internal Jenkins Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2015-10-01 13:58:39 -07:00
Juan Yu	6bac14a283	IMPALA-2005: Cleanup the newly created table if CTAS fails. If CTAS query fails during the DML part Impala should drop the newly created table. Change-Id: I39e04a6923a36afa48f3252addd50ddda83d1706 (cherry picked from commit e03ce43585f68590a95038341e74db458f34bf32) Reviewed-on: http://gerrit.cloudera.org:8080/870 Reviewed-by: Juan Yu <jyu@cloudera.com> Tested-by: Internal Jenkins	2015-10-01 13:58:38 -07:00
Skye Wanderman-Milne	0c5e6a804f	IMPALA-2443: add support for more Parquet array encodings This patch adds full support for the various Parquet array encodings, as well as tests that use files from https://github.com/apache/hive/tree/master/data/files. This should allow us to read any existing array data. Change-Id: I3d22ae237b1dc82ee75a83c1d4890d76316fadee Reviewed-on: http://gerrit.cloudera.org:8080/826 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Internal Jenkins	2015-10-01 13:58:37 -07:00
Matthew Jacobs	70b9954593	IMPALA-2189: [RM] Retry logic for Llama RPC may throw exception The code in resource-broker.cc that makes RPCs to Llama will attempt to retry the RPC some number of times (which is configurable) if the RPC returns a failure. If the RPC throws (which thrift may do), we try to reset the connection and then make the RPC again, but this time not guarded by a try/catch block. If this RPC throws, the process will crash. This fixes the issue by removing the try/catch and instead using the ClientCache DoRpc function which handles this already. Some additional Llama RPC calling wrappers were removed as well. Change-Id: Iba5add47a77fe9257e73eea5711ef4b948abe76a Reviewed-on: http://gerrit.cloudera.org:8080/881 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2015-09-30 17:17:52 -07:00
Casey Ching	fd400927a9	Make TDdlType compatible with older clients A disaster recovery application that uses thrift to directly execute DDL (instead of using SQL) stopped working. A client call to drop a function ended up down the code path to drop a view. Apparently commit 47d061 messed up the enum ordering for old clients by adding TRUNCATE_TABLE in the middle of the enum list. The fix is to move TRUNCATE_TABLE to the end. Commit 47d061 was never released so there shouldn't be concern about breaking newer clients. Change-Id: I79ebec65497077471a37e5712061c418403a336a Reviewed-on: http://gerrit.cloudera.org:8080/899 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-09-27 15:13:26 -07:00
Tim Armstrong	d37bf390a8	IMPALA-2406: avoid rows with no tuples In some cases the planner generated plans with rows with no materialized tuples. Recent changes to the backend caused these to hit a DCHECK. This patch addresses one case in the planner where it was possible to create such plans: when the planner generated an empty node from a select subquery with no from clause. The fix is to create a materialized tuple based on the select list expressions, in the same way as we handle these selects when the planner cannot statically determine they have no result rows. An example query is included as a test. It also adds additional checks to the frontend and backend to catch these invalid rows earlier. Change-Id: I851f2fb5d389471d0bb764cb85f3c49031a075e4 Reviewed-on: http://gerrit.cloudera.org:8080/911 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2015-09-27 15:13:25 -07:00
Tim Armstrong	212901a352	Nested types: describe for nested paths Describe should work if given a path that references to a complex-typed column of a table. It should produce output that lists the names and types of all valid subpaths of the column (e.g. struct fields, or key/val for a map). This changes some error messages in resolving paths, since we can no longer definitively determine based on the path length whether the first path element is meant to be the db or the table. Change-Id: I8a54e83df67141011ff5396c98f9eb0bde0fb04c Reviewed-on: http://gerrit.cloudera.org:8080/863 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2015-09-22 10:58:34 -07:00
aacalfa	5e733e8d62	IMPALA-2190: Complete conversion functions between timestamp, unixtime, and string dates Change-Id: I48a446f19c7634477f175d0defa8779dd70a392f Reviewed-on: http://gerrit.cloudera.org:8080/654 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-09-07 07:07:20 +00:00
Tim Armstrong	5b1157b44b	IMPALA-2079: Part 2: handling tmp device errors Tmp devices are blacklisted when a write error is encountered for that device. No more scratch space will be allocated on the blacklisted device, based on the assumption that the device is likely to be misconfigured or failing. This patch does not attempt to recover the query that experienced the write error. It also does not attempt to remap any existing blocks away from the temporary device. This behaviour is unit tested for several failure scenarios. This patch adds additional test infrastructure required for testing BufferedBlockMgr behavior in the presence of faults and in configurations with multiple tmp directories. Adds metrics tmp-file-mgr.active-scratch-dirs and tmp-file-mgr.active-scratch-dirs.list that track the number and set of active scratch dirs and expose it in the Impala web UI. Change-Id: I9d80ed3a7afad6ff8e5d739b6ea2bc0949f16746 Reviewed-on: http://gerrit.cloudera.org:8080/579 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2015-09-03 01:54:32 +00:00
Skye Wanderman-Milne	bcc73a36da	Nested types: read and materialize nested types in Parquet scanner This patch modifies the Parquet scanner to resolve nested schemas, and read and materialize collection types. The high-level modification is to create a CollectionColumnReader that recursively materializes map- and array-type slots. This patch also adds many tests, most of which query a new table called complextypestbl. This table contains hand-generated data that is meant to expose edge cases in the scanner. The tests mostly test the scanner, with a few tests of other functionality (e.g. array serialization). I ran a local benchmark comparing this scanner code to the original scanner code on an expanded version of tpch_parquet.lineitem with 48009720 rows. My benchmark involved selecting different numbers of columns with a single scanner thread, and I looked at the HDFS scan node time in the query profiles. This code introduces a 10%-20% regression in single-threaded scan time. Change-Id: Id27fb728934e8346444f61752c9278d8010e5f3a Reviewed-on: http://gerrit.cloudera.org:8080/576 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-09-02 19:23:54 +00:00
Skye Wanderman-Milne	62b31bdb52	Nested types: send materialized path to BE See comment in Descriptors.thrift for what the materialized path is. Change-Id: I64d00cf1bc2edcbbed3b6cdd5e934c55fff70a49 Reviewed-on: http://gerrit.cloudera.org:8080/650 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Internal Jenkins	2015-09-02 10:30:24 +00:00
Tim Armstrong	d7e52e336a	IMPALA-1660: addendum: add more aliases Add in missing dfloor alias. This should have been added as part of IMPALA-1660 as an alias for floor(double) but was overlooked. Also add in aliases for decimal versions of functions where they exist. Change-Id: Icb790745714882248d365274e95d45eaaf0ba133 Reviewed-on: http://gerrit.cloudera.org:8080/697 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-09-01 10:46:16 +00:00
Tim Armstrong	b402003532	Nested Types: Dedup tuples when de/serializing This patch extends the deduplication of tuples in row batches to work on non-adjacent tuples. This deduplication requires an additional data structure (a hash table) and adds additional performance overhead (up to 3x serialization time), so it is only enabled for row batches with compositions that are likely to blow up due to non-adjacent duplication of large tuples. This avoids performance regression in typical cases, while preventing size blow-ups in problematic cases, such as joining three streams of tuples some of which may contain large collections. A test is included that ensures that adjacent deduplication is enabled. The row batch serialize benchmark shows that deduplication does not regress performance of serialization or deserialization. Change-Id: I3c71ad567d1c972a0f417d19919c2b28891fb407 Reviewed-on: http://gerrit.cloudera.org:8080/573 Tested-by: Internal Jenkins Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2015-08-31 18:45:57 +00:00
Dimitris Tsirogiannis	fdb90ed753	CDH-23206: Impala support for column-level authorization (part 1) This commit adds partial support for column-level authorization in Impala using the Sentry Service. The following changes are included: * Added support for parsing and analyzing GRANT/REVOKE statements with column-level privileges. The supporting syntax is: - GRANT SELECT (<col_names>) ON TABLE <table_name> TO [ROLE] <role_name> [WITH GRANT OPTION] - REVOKE [GRANT OPTION FROM] SELECT (<col_names>) ON TABLE <table_name> FROM [ROLE] <role_name> * Added support for storing column-level privileges in the Catalog Service and updating the Sentry Service when GRANT/REVOKE statements are executed. * Modified the SHOW GRANT ROLE statement to include information about column-level privileges. Subsequent patches will add support for enforcing column-level privileges in SQL queries and other statements. Change-Id: I0fd9daa92cc5147cb6f4b25eb9651aab8bf3049f Reviewed-on: http://gerrit.cloudera.org:8080/607 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2015-08-28 23:58:36 +00:00
Feni Chawla	2db0371a26	IMPALA-2033: Netezza compatibility date/time related functions. Added INT_MONTHS_BETWEEN, TIMEOFDAY, TIMESTAMP_CMP, MONTHS_BETWEEN functions Change-Id: I44834c84e21856568613938418947c532e7fbd2e Reviewed-on: http://gerrit.cloudera.org:8080/642 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Internal Jenkins	2015-08-27 04:17:05 +00:00
Martin Grund	60c5140ea7	IMPALA-1983: Warn if table stats are potentially corrupt. When the `numRows` parameter stored in the table properties is erroneously set to 0 and a number of non-empty files are present the table statistics are considered to be corrupt. To hint that there might be a problem, the explain statement will emit an additional warning if it detects potentially corrupt table stats like in the following example: Estimated Per-Host Requirements: Memory=42.00MB VCores=1 WARNING: The following tables have potentially corrupt table and/or column statistics. compute_stats_db.corrupted 03:AGGREGATE [FINALIZE] \| output: count:merge() \| 02:EXCHANGE [UNPARTITIONED] \| 01:AGGREGATE \| output: count() \| 00:SCAN HDFS [compute_stats_db.corrupted] partitions=1/2 files=1 size=24B In addition, the small query optimization is disabled for such queries. Change-Id: I0fa911f5132aa62195b854248663a94dcd8b14de Reviewed-on: http://gerrit.cloudera.org:8080/689 Reviewed-by: Martin Grund <mgrund@cloudera.com> Tested-by: Internal Jenkins	2015-08-26 22:19:33 +00:00
Henry Robinson	b8cd78823a	IMPALA-1795: Add support for passwords for SSL private key files This patch allows administrators to configure all Impala daemons with a password for the private key file used to negotiate connections with clients which present the corresponding public key. This private key is obtained by running a user-supplied shell command and using the result. The command is supplied by setting --ssl_private_key_password_cmd. The output of the command is truncated to a maximum of 1024 bytes (this is a limitation of RunShellProcess(), but should not be significant for this use case), and then all trailing whitespace is trimmed (this is to avoid unexpected trailing newlines etc. from shell output). If the password is incorrect clients will be unable to connect to the server, whether or not they have the correct public key. If the command exits with an error, the server will not start. Change-Id: Icc13933fdf50a6170c859989626da5772fe5040d Reviewed-on: http://gerrit.cloudera.org:8080/623 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-08-26 03:03:32 +00:00
Skye Wanderman-Milne	ec671a406c	Simplify HdfsParquetScanner::AssembleRows() logic. This patch updates AssembleRows() to have fewer exit and error paths, as well as to explicitly distinguish between the row group being finished and an error occurring. It functionally changes the behavior in only two minor ways: - The entire row group will be read regardless of how many values the file metadata says there are. Previously it would only read up to the number stated in the metadata, and then had extra logic for checking if there were any values remaining. - If abort_on_error is false and there is an error reading a row group, subsequent row groups will still be read (except if OOM). Before this would sometimes happen and sometimes not. Change-Id: Id1836cfe2a507e46cb030be32b4c1553f478f639 Reviewed-on: http://gerrit.cloudera.org:8080/624 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Internal Jenkins	2015-08-22 01:34:32 +00:00
Alex Behm	de4410af15	Nested Types: Assign conjuncts bound by collection-item tuples in Hdfs scans. This patch changes HdfsScanNode.init() to collect conjuncts that can be evaluated while materializing the items (tuples) of collection-typed slots, and assign these conjuncts to the scan node. Limitation: Conjuncts that must first be migrated into inline views and that cannot be captured by slot binding will not be assigned here, but in an UnnestNode. This limitation applies to conjuncts bound by inline-view slots that are backed by non-SlotRef exprs in the inline-view's select list. We only capture value transfers between slots, and not between arbitrary exprs. Change-Id: I20f2522070b257411c5e5d4ba9430e74b215308f Reviewed-on: http://gerrit.cloudera.org:8080/665 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-08-21 09:08:54 +00:00
Vlad Berindei	e4c42fa8bf	IMPALA-595: Add CASCADE to DROP DATABASE and use it in cleanup_db Change-Id: Idfa5b6943bc797e10d542487c31b8f1b527d8c97 Reviewed-on: http://gerrit.cloudera.org:8080/635 Reviewed-by: Vlad Berindei <vlad.berindei@cloudera.com> Tested-by: Internal Jenkins	2015-08-20 03:34:31 +00:00
Skye Wanderman-Milne	7906ed44ac	IMPALA-2015: Add support for nested loop join Implement nested-loop join in Impala with support for multiple join modes, including inner, outer, semi and anti joins. Null-aware left anti-join is not currently supported. Summary of changes: Introduced the NestedLoopJoinNode class in the FE that represents the nested loop join. Common functionality between NestedLoopJoinNode and HashJoinNode (e.g. cardinality estimation) was moved to the JoinNode class. In the BE, introduced the NestedLoopJoinNode class that implements the nested-loop join execution strategy. Change-Id: I238ec7dc0080f661847e5e1b84e30d61c3b0bb5c Reviewed-on: http://gerrit.cloudera.org:8080/652 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2015-08-19 08:40:14 +00:00
Sailesh Mukil	1c46cab5c6	IMPALA-2084: SPLIT_PART and REGEXP_LIKE functions for Tableau pushdown Added the SPLIT_PART and the REGEXP_LIKE builtin functions and tests for both. The REGEXP_LIKE has an optional third parameter which if used, uses a different 'prepare' function (RegexpLikePrepare in like-predicate.cc) so that the appropriate options can be set in the RE2 library. Added a patch for the RE2 library so that the 'dot matches all' option is exposed via the RE2 class. Fixed a bug in the case when the function to be evaluated for the WHERE clause operates on constants, proper cleanup isn't guaranteed on certain edge cases. Change-Id: Ia2a8de9eeb2854100a2d949f612cfaba317c5a7b Reviewed-on: http://gerrit.cloudera.org:8080/501 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2015-08-18 09:07:34 +00:00
Casey Ching	cf60967b7e	IMPALA-1675: Avoid overflow when adding large intervals to TIMESTAMPs It turns out there is a variety of cases where boost incorrectly adds intervals if the interval is at (or beyond) an edge case value. This change defines a max interval and returns NULL if the user supplies an interval beyond the max. Change-Id: I4fb6869be22ab06089b66eeffaea04b0c0880080 Reviewed-on: http://gerrit.cloudera.org:8080/492 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-08-16 12:09:24 +00:00
Martin Grund	ec3697656b	Fix comment about runtime profile serialization. Fixes the comment about how the runtime profile tree is flattened using pre-order traversal instead of in-order traversal. The implementation in RuntimeProfile::ToThrift() shows exactly that. Change-Id: Ib6c3dc7506a14d6b1d467177669b6d701ffedd45 Reviewed-on: http://gerrit.cloudera.org:8080/615 Tested-by: Internal Jenkins Reviewed-by: Martin Grund <mgrund@cloudera.com>	2015-08-15 18:38:54 +00:00
Dimitris Tsirogiannis	47c5ae405a	Revert "IMPALA-2015: Add support for nested loop join" This reverts commit 6837cdec7f6a7e1c7e8157e323f3ab68277689aa. Change-Id: I2fd6424c553a701fcbfd425b4486af7280820b23 Reviewed-on: http://gerrit.cloudera.org:8080/636 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-08-13 02:20:07 +00:00
Skye Wanderman-Milne	f000758ca8	IMPALA-2015: Add support for nested loop join Implement nested-loop join in Impala with support for multiple join modes, including inner, outer, semi and anti joins. Null-aware left anti-join is not currently supported. Summary of changes: Introduced the NestedLoopJoinNode class in the FE that represents the nested loop join. Common functionality between NestedLoopJoinNode and HashJoinNode (e.g. cardinality estimation) was moved to the JoinNode class. In the BE, introduced the NestedLoopJoinNode class that implements the nested-loop join execution strategy. Change-Id: Id65a1aae84335bba53f06339bdfa64a1b0be079e Reviewed-on: http://gerrit.cloudera.org:8080/457 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2015-08-07 02:47:32 +00:00
Feni Chawla	9428448146	IMPALA-2034: Netezza compatibility char functions for ASCII and UTF-8 strings: CHR and BTRIM Change-Id: I76bf9ba76172b9f1a192ee0936d73718808c0fbd Reviewed-on: http://gerrit.cloudera.org:8080/529 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Internal Jenkins	2015-08-06 02:24:24 +00:00
Casey Ching	074e5b4349	Remove hashbang from non-script python files Many python files had a hashbang and the executable bit set though they were not intended to be run as a standalone script. That makes determining which python files are actually scripts very difficult. A future patch will update the hashbang in real python scripts so they use $IMPALA_HOME/bin/impala-python. Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba Reviewed-on: http://gerrit.cloudera.org:8080/599 Reviewed-by: Casey Ching <casey@cloudera.com> Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2015-08-04 05:26:07 +00:00
Matthew Jacobs	b7417e0ab4	[RM] Init Llama client cache metrics The ClientCache has a set of metrics that are registered by calling the Init() function. This adds the missing call to the ResourceBroker's ClientCache Init() and adds the metric definitions. Change-Id: I879e8a176021589d28d2276fd7b3e5edc08fefb7 Reviewed-on: http://gerrit.cloudera.org:8080/569 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2015-07-31 21:43:17 +00:00
Tim Armstrong	e151ebaa71	IMPALA-1001: Bit and byte manipulation functions Bit and byte functions for compatibility with Teradata: bitand, bitor, bitxor, bitnot, countset, getbit, setbit, shiftleft, shiftright, rotateleft, rotateright. Interfaces and behavior follow Teradata documentation. All bit* functions are compatible with DB2. bitand only is compatible with Oracle. Change-Id: Idba3fb7beb029de493b602e6279aa68e32688df3	2015-07-28 08:11:01 -07:00
Tim Armstrong	822cb8f5e2	IMPALA-1660: Netezza compatibility - factorial Implements suffix n! operator for factorial and factorial function. Slightly refactor operators in fe to share code between unary operators. Based partially on work by Arthur Peng <arthur.peng@intel.com>. Change-Id: I71b6c824c59fc5305f16b8c4457805126a1da93b Reviewed-on: http://gerrit.cloudera.org:8080/531 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2015-07-27 19:03:48 +00:00
Alex Behm	39874514d9	Nested Types: Exec nodes for Subplan evaluation. This patch adds three new exec nodes for evaluating Subplans: 1. SingularRowSrcNode 2. SubplanNode 3. UnnestNode Change-Id: I2af059a4ab7e7d98a65ae24b234e5d7e5f39ece8 Reviewed-on: http://gerrit.cloudera.org:8080/403 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-07-25 23:57:58 +00:00
Skye Wanderman-Milne	8b66b11bb8	Nested types: BE changes for Avro struct support Most of this patch is rewriting the schema resolution logic to handle recursive schemas. The other changes are for reading and codegening recursive schemas. Change-Id: I257db05e02ed99c62c8dcfd0136b9e8f392d5933 Reviewed-on: http://gerrit.cloudera.org:8080/86 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Internal Jenkins	2015-07-25 04:08:04 +00:00
Tim Armstrong	e5cc539d3f	IMPALA-1660: Netezza math function aliases Add aliases for existing functions for Netezza compatibility: dceil->ceil, dtrunc->truncate, dexp->exp, dlog1->ln, log10->dlog10, dpow->pow, fpow->pow, dsqrt->sqrt, random->rand. Change-Id: I97da27b676d4e07e55735540f494bdb873f7ed61 Reviewed-on: http://gerrit.cloudera.org:8080/559 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2015-07-23 21:56:33 +00:00
Casey Ching	a6d534682b	IMPALA-2086, IMPALA-2090: Avoid boost year/month interval logic Boost handles a couple of edge cases differently than other databases such as Postgres and MySQL when adding year/month intervals to timestamps. This change makes Impala consistent with the other databases. The performance difference was not noticeable (<5% if any). Change-Id: Icb02a06281b53753938cab88e0d28f20709fee06 Reviewed-on: http://gerrit.cloudera.org:8080/489 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-07-20 10:16:54 +00:00
Martin Grund	ed18dd4a8b	IMPALA-80: Dynamic progress reporting for the shell This patch adds a way to allow for dynamic progress reporting in the shell. There are two new command line flags for the shell --live_progress - will print the completed vs total # of scan ranges --live_summary - prints an updated exec summary In addition to the command line flags, these options can be set from within the shell using: set LIVE_SUMMARY=True set LIVE_PROGRESS=True The new options will be listed under shell options. Both reports will be updated at most every second, for longer running queries it will be adjusted to the time between two RPC calls to get the query status. To provide this information in the ExecSummary, the Thrift structure for the ExecSummary was extended to contain a progress indicator. The output is printed to stderr and only available in interactive mode. An example video is available here: https://asciinema.org/a/5wi7ypckx4ol4ha1hlg3e3q1k Change-Id: I70b2ab5fa74dc2ba5bc3b338ef13ddc6ccf367d2 Reviewed-on: http://gerrit.cloudera.org:8080/508 Tested-by: Internal Jenkins Reviewed-by: Martin Grund <mgrund@cloudera.com>	2015-07-17 17:59:29 +00:00
Arthur Peng	a275f2a751	IMPALA-1660: Netezza math functions support 1.Add cot function 2.Add double type support in truncate Change-Id: Id48c58b7778a31edfbda8982f7a8c3d05a1ad14e	2015-07-16 22:29:26 -07:00
Casey Ching	f351119730	Add section in builtin function registry for invisible functions An upcoming patch will add a function that will not be user visible. This patch allows a non-visible function to be added in the same way that visible functions are added (using impala_functions.py). Change-Id: I70971ced0d595a7aaa975985e589d2676423e221 Reviewed-on: http://gerrit.cloudera.org:8080/528 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-07-15 23:43:55 +00:00
Skye Wanderman-Milne	cc68c1fcaf	Add collection_item_desc_ to SlotDescriptor The item tuple descriptor is already set in SlotDescriptor.java, this patch just plumbs it through to the backend. Change-Id: I4b67ef50ccfde422829d4d2698b04b32666746be Reviewed-on: http://gerrit.cloudera.org:8080/483 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2015-07-08 23:26:48 +00:00
Matthew Jacobs	b338034376	Lower-case units in generated metrics mdl file The mdl file will be consumed by CM. They have asked for the units to be lower-case. Change-Id: Iacc583ff2c1680ec02a41feab558fbb2890d95be Reviewed-on: http://gerrit.cloudera.org:8080/499 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2015-07-01 20:18:37 +00:00
Skye Wanderman-Milne	dcb1b53749	Nested types: expose table column types through TableDescriptor Before we only had type information for each SlotDescriptor, rather than for the entire table. However, in order to do schema resolution for nested fields, we need to be able to traverse the table schema starting from the table-level columns. We could theoretically expose only the paths needed to resolve each slot, but it's simpler to have the whole table. Change-Id: I026c1f1f552d1ac5d1b267f876e1c39a258714b5 Reviewed-on: http://gerrit.cloudera.org:8080/404 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Internal Jenkins	2015-06-24 08:04:30 +00:00
Henry Robinson	79913b01e6	IMPALA-2064: Add effective_user() builtin The user() builtin always returns the connected user. However, if the client wants to see which user its queries are actually delegated to, there was no easy way to do that. This patch adds effective_user(), which returns the proxy delegated user for authorization purposes. If no delegated user is set, the effective user is the same as that returned from user(). The only way to test this is via a new custom cluster test, which sets impala.doas.user so that the effective user might be different from the connected one. Change-Id: I7048c27c6808a6986dbe1246929816176dca9f76 Reviewed-on: http://gerrit.cloudera.org:8080/458 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Internal Jenkins	2015-06-16 23:42:40 +00:00
Martin Grund	ec8b815faf	CMake 2.6 Dependency Management In CMake 2.6 dependencies for custom targets could not be expressed inline. This patch fixes an issue when generating the thrift files with CMake versions prior to 2.8. Change-Id: Ie04fbcb45b3efb45a6bbaa806a1630c26357185f Reviewed-on: http://gerrit.cloudera.org:8080/461 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-06-16 04:42:47 +00:00
Martin Grund	384ae3ab08	Fixes for Toolchain Issues If a static version of zlib and bzip2 is picked up we assumed that it would be compiled with -fPIC. However, this is not always the case. Thus in the non-toolchain case we specifically dynamic link with zlib and bzip2 for the dynamic targets. In addition, this patch removes static linking of libgcc in the toolchain case as LLVM is not able to find the exception handling symbols even if they are present in the binary. Static linking of libgcc is postponed. Next, if Impala is built with -notests the external data source thrift files would not be generated. This patch makes sure the dependencies are expressed correctly. Finally, if a user would have google perftools installed on the system we would accidentally pick up the system libraries and the thirdparty headers which will end in linker errors. This patch fixes the path issues. Change-Id: Ic000101c33da26d75a0cd733f7ef02f1bd694937 Reviewed-on: http://gerrit.cloudera.org:8080/460 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-06-15 23:14:32 +00:00
Martin Grund	81f247b171	Optional Impala Toolchain This patch allows to optionally enable the new Impala binary toolchain. For now there are no major version differences in the toolchain dependencies and what is currently kept in thirdparty. To enable the toolchain, export the variable IMPALA_TOOLCHAIN to the folder where the binaries are available. In addition this patch moves gutil from the thirdparty directory into the source tree of be/src to allow easy propagation of compiler and linker flags. Furthermore, the thrift-cpp target was added as a dependency to all targets that require the generated thrift sources to be available before the build is started. What is the new toolchain: The goal of the toolchain is to homogenize the build environment and to make sure that Impala is built nearly identically on every platform. To achieve this, we limit the flexibility of using the systems host libraries and rather rely on a set of custom produced binaries including the necessary compiler. Change-Id: If2dac920520e4a18be2a9a75b3184a5bd97a065b Reviewed-on: http://gerrit.cloudera.org:8080/427 Reviewed-by: Adar Dembo <adar@cloudera.com> Tested-by: Internal Jenkins Reviewed-by: Martin Grund <mgrund@cloudera.com>	2015-06-13 03:11:44 +00:00
Matthew Jacobs	cbedd03d9f	Add option in generate_metrics.py to output CM mdl Adds support in the script generate_metrics.py to produce a CM compatible metric definition (MDL) file. Fixes some metrics missing descriptions and changing some metrics created as gauges that are really counters. TODO: Support histograms, stats, and metric defs with args Change-Id: I3ebb45145035facab5d4408118150f8c8eb8786a Reviewed-on: http://gerrit.cloudera.org:8080/423 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2015-06-11 02:58:33 +00:00
Dan Hecht	d46de9bba1	IMPALA-1968: Part 1: Improve planner numNodes estimate for remote scans This commit will be backported to 5.4.x to improve plans when using Isilon and S3. The planner currently estimates the number of backends that an hdfs scan node will execute on as the number of datanodes holding block replica for the corresponding table. This can be a bad estimate for various reasons: 1) It's completely wrong when the scan is remote (e.g. S3 or Isilon). 2) It doesn't account for partition pruning. 3) The size of the set of hosts holding block replica may be larger than the number of scan ranges. Improve the estimate by examining the scan ranges and taking locality into account. While this new estimate will eventually be used in all cases, this change uses the new estimate only when there is a remote scan range as to not change plans produced for local ranges (since this commit will be backported to 5.4.x). So, this commit purposely addresses only case 1. A follow on commit will enable the new logic for all cases. Also set up the S3PlannerTest so that we can enable it in the nightly jenkins S3 run. It was inadvertently never enabled there. Change-Id: I3fd3f7c5431a535fb044c98c326338c21b8a1898 Reviewed-on: http://gerrit.cloudera.org:8080/425 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-06-03 20:04:03 +00:00
Arthur Peng	8734cd642e	IMPALA-1568: Enable alter table recover partitions Scan HDFS path of the table and find the partitions which are missing in metastore. Add these partitions into metastore. Change-Id: I150f114db576bc18d39f3791be7af581ab49dfab Reviewed-on: http://gerrit.cloudera.org:8080/24 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Internal Jenkins	2015-06-02 08:38:07 +00:00
Martin Grund	0f85e0e700	IMPALA-1588: Enabling caching of HDFS file handles (Part III) This patch enables caching of HDFS file handles to avoid opening the file over and over again. When a file is opened for the first time, a HdfsCachedFileHandle object is created that is a small wrapper around the hdfsFile instance allowing to associate the last modified timestamp with this instance. When the file handle is no longer needed, it is returned to the DiskIoMgr where it is cached under the given path. When the file is opened again, first a lookup is performed to see if an existing handle can be reused. If there is an existing handle, the last modified time of the cached handle is compared with the last modified time of the file to be opened. If they are equal the handle can be reused, otherwise it is closed and the file is opened regularly. The new flag `-max_cached_file_handles` controls the overall size of the cache by defining an upper bound of cached file handles. Furthermore, five new metrics were added to report the number of currently cached file handles in the DiskIoMgr and the hit ratio of the cache (including hit and miss count). impala-server.io.mgr.num-cached-file-handles impala-server.io.mgr.cached-file-handles-hit-ratio impala-server.io.mgr.cached-file-handles-hit-count impala-server.io.mgr.cached-file-handles-miss-count impala-server.io.mgr.num-file-handles-outstanding Due to the way how Impala performs the scan operations the cache may contain multiple entries for the same file. If the limit of open files in the context of the process is smaller than `max_cached_file_handles`, the lower limit is used as the cache capacity. Performance and Memory Evaluation: The patch was evaluated in three tests 1) Throughput, parallel scans on a small table with 200 small files. TP increased from ~50 QPS to ~150 QPS with FD caching. 2) Latency: single table with 300k files. Running select count(*) on the table was executed in 2792.30s with FD caching and in 2764.81s without FD caching (based HEAD~1 commit). No overhead. 3) Memory consumption. For the above table the delta in RSS memory consumption after running the query is 30MB which equals roughly the expected 2-3kB per FD for 10k cached descriptors. Change-Id: Ifa6560d141188c329d7bc73c2dabcc1352d69cd7 Reviewed-on: http://gerrit.cloudera.org:8080/366 Tested-by: Internal Jenkins Reviewed-by: Martin Grund <mgrund@cloudera.com>	2015-05-30 17:16:22 +00:00
Feni Chawla	d4817b697f	IMPALA-1771: Add support for hyperbolic trigonometric functions sinh(), cosh(), tanh() and atan2() Change-Id: Iedd89629b36ec4f5ef270e5eff48371e075ad3ff Reviewed-on: http://gerrit.cloudera.org:8080/409 Tested-by: Internal Jenkins Reviewed-by: Henry Robinson <henry@cloudera.com>	2015-05-27 15:54:55 +00:00
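
The file handle caching described in commit 0f85e0e700 (IMPALA-1588) can be illustrated with a rough sketch. This is not Impala's C++ implementation: the names FileHandleCache, open_fn and mtime_fn are hypothetical, and the behavior when the cache is full (close the handle instead of caching it) is an assumption, since the commit message does not describe the eviction policy. The sketch only shows the reuse rule from the message: a cached handle is reused when its stored last-modified time equals the file's current one, otherwise it is closed and the file is opened again.

```python
class FileHandleCache:
    """Hypothetical cache of open handles, keyed by path (illustration only)."""

    def __init__(self, max_cached, open_fn, mtime_fn):
        self.max_cached = max_cached  # upper bound, like -max_cached_file_handles
        self.open_fn = open_fn        # assumed callable: path -> handle with close()
        self.mtime_fn = mtime_fn      # assumed callable: path -> last-modified time
        self.cache = {}               # path -> list of (handle, mtime) entries
        self.size = 0
        self.hits = 0
        self.misses = 0

    def get(self, path):
        """Return (handle, mtime), reusing a cached handle if it is still fresh."""
        current = self.mtime_fn(path)
        entries = self.cache.get(path)
        if entries:
            handle, cached_mtime = entries.pop()
            self.size -= 1
            if not entries:
                del self.cache[path]
            if cached_mtime == current:
                self.hits += 1
                return handle, current
            handle.close()  # stale: the file changed since the handle was cached
        self.misses += 1
        return self.open_fn(path), current

    def put(self, path, handle, mtime):
        """Give a handle back; several entries per path are allowed."""
        if self.size >= self.max_cached:
            handle.close()  # assumption: a full cache simply drops the handle
            return
        self.cache.setdefault(path, []).append((handle, mtime))
        self.size += 1
```

A caller would pair each get() with a put() once the scan range is done, which is why the same path can end up with more than one cached entry.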

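The password handling described in commit b8cd78823a (IMPALA-1795) follows a simple sequence: run the user-supplied command, truncate its output to 1024 bytes, then trim trailing whitespace, and refuse to continue if the command fails. A minimal Python sketch of that sequence is shown here. The function names normalize_password and read_password are hypothetical, and the real logic lives in Impala's C++ backend (the message mentions RunShellProcess()), so this is only an illustration of the stated rules.

```python
import subprocess

MAX_PASSWORD_BYTES = 1024  # limit stated in the commit message


def normalize_password(raw: bytes) -> bytes:
    """Truncate to 1024 bytes first, then strip trailing whitespace."""
    return raw[:MAX_PASSWORD_BYTES].rstrip()


def read_password(cmd: str) -> bytes:
    """Run a shell command and return its normalized output.

    Raises RuntimeError if the command exits with a non-zero status,
    mirroring the rule that the server will not start in that case.
    """
    result = subprocess.run(cmd, shell=True, capture_output=True)
    if result.returncode != 0:
        raise RuntimeError(f"password command failed with status {result.returncode}")
    return normalize_password(result.stdout)
```

The result is kept as bytes because truncating at a fixed byte count could split a multi-byte character, which would make decoding fail.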

459 Commits