impala

mirror of https://github.com/apache/impala.git synced 2026-01-09 15:00:11 -05:00

Author	SHA1	Message	Date
Nong Li	895d69c09f	IMPALA-1026: Fix decimal partition cols. Change-Id: I956b69a86528f1969febf356181dc3182f309909 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2841 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-06-06 09:26:56 -07:00
Matthew Jacobs	2f9b2ae785	Fix SHOW DATA SOURCE test; must execute setup/cleanup serially The SHOW DATA SOURCE tests were run as part of the other SHOW * tests in test_show(), but the setup/cleanup for data sources can't be run in parallel. This change moves the SHOW DATA SOURCE tests into a separate test method and the setup/cleanup code is only run for this test (i.e. not using setup_method() and teardown_method()). The test is then only executed serially. Change-Id: I221145f49cfe7290e132c6a87a5295b747c1fcc7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2864 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 5bcd769eae3a694d7f6f42d093f9197e8a4e8b77) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2870	2014-06-05 20:07:57 -07:00
Nong Li	b5c5c05bcb	Fix bad test. Needs to be overwrite to allow loading from snapshot. Change-Id: I7abe2a105d72662c874debfb2b9ae98647b03a1e Reviewed-on: http://gerrit.ent.cloudera.com:8080/2853 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-06-05 08:36:46 -07:00
Dimitris Tsirogiannis	0348a36b49	IMPALA-887: Improve partition pruning time (final) This commit contains the final set of changes for improving the performance of partition pruning. For each HdfsTable, we materialize a set of partition value metadata that allows the efficient evaluation of simple predicates on partition attributes without invoking the BE. These changes result in three orders of magnitude performance improvement during partition pruning. Change-Id: I5b405f0f45a470f2ba7b2191e0d46632c354d5ae Reviewed-on: http://gerrit.ent.cloudera.com:8080/2700 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2823	2014-06-03 23:17:44 -07:00
Nong Li	e6b7565eff	Fix decimal literal casting and cast expr reanalyze(). BigDecimal doesn't think about scale the way we need it to. Change-Id: I09612c31e30e80ce4806080f1d24c6615090785e Reviewed-on: http://gerrit.ent.cloudera.com:8080/2794 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-06-02 23:34:20 -07:00
Ippokratis Pandis	e34ede292c	IMPALA-1016: Return correct number of NULL values when projecting newly added column This patch handles the case where when a query was projecting a newly added column, the parquet scanner was returning infinite values. Change-Id: Ie5f4d4a88d5868e8d9e5c39fa9440821776dde3c Reviewed-on: http://gerrit.ent.cloudera.com:8080/2725 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2761 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>	2014-06-01 01:28:25 -07:00
Nong Li	8f4dc0f2f0	IMPALA-974: Switch from FloatLiteral to DecimalLiteral. Float/Doubles are lossy so using those as the default literal type is problematic. Change-Id: I5a619dd931d576e2e6cd7774139e9bafb9452db9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2758 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-05-31 22:19:06 -07:00
Nong Li	5d80942d42	[CDH5] IMPALA-1019: Fix cancellation path in io mgr for cached reads. Change-Id: I11efd65d1efa900f79afe88b781262a44ac5006a Reviewed-on: http://gerrit.ent.cloudera.com:8080/2703 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-05-30 19:14:39 -07:00
Nong Li	6e691f9500	IMPALA-1010: Remove Close() of build side in blocking join node. This optimization is generally not safe since the probe side is still streaming. The join node could acquire all of the data from the child into its own pool but then there's no real point in doing this (doesn't lead to lower memory footprint and just makes the mem accounting harder to reason about). This is exposed in busy plans. Change-Id: I37b0f6507dc67c79e5ebe8b9242ec86f28ddad41 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2747 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-05-30 11:50:50 -07:00
Skye Wanderman-Milne	c8b2017093	Add decimal UDF/UDA support. Change-Id: Ie48c1cb8e978c7282593b7f602dd68added6d3fd Reviewed-on: http://gerrit.ent.cloudera.com:8080/2625 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit 5048f04b332c13b1bff32fb257272b0fea4b8584) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2739	2014-05-29 20:49:53 -07:00
Matthew Jacobs	12b72c4330	IMPALA-1011: Handle SHOW DATA SOURCES when no sources configured Change-Id: I367b90c7603aea973d442f9186a6b32598a66a28 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2716 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins (cherry picked from commit 4df5c6d741237e9c91e84e39fd6ea760ccb40cf5) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2723 Reviewed-by: Matthew Jacobs <mj@cloudera.com>	2014-05-28 20:38:41 -07:00
Lenni Kuff	745c091fcc	[CDH5] Update SHOW TABLE STATS to include per-partition HDFS caching stats Change-Id: I71b01f84bbd308108d775e78c644e867b48e05be Reviewed-on: http://gerrit.ent.cloudera.com:8080/2621 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-05-28 08:54:54 -07:00
Lenni Kuff	c45e9a70d9	[CDH5] Add DDL support for HDFS caching This change adds DDL support for HDFS caching. The DDL allows the user to indicate a table or partition should be cached and which pool to cache the data into: * Create a cached table: CREATE TABLE ... CACHED IN 'poolName' * Cache a table/partition: ALTER TABLE ... [partitionSpec] SET CACHED IN 'poolName' * Uncache a table/partition: ALTER TABLE ... [partitionSpec] SET UNCACHED When a table/partition is marked as cached, a new HDFS caching request is submitted to cache the location (HDFS path) of the table/partition and the ID of that request is stored with in the table metadata (in the table properties). This is stored as: 'cache_directive_id'='<requestId>'. The cache requests and IDs are managed by HDFS and persisted across HDFS restarts. When a cached table or partition is dropped it is important to uncache the cached data (drop the associated cache request). For partitioned tables, this means dropping all cache requests from all cached partitions in the table. Likewise, if a partitioned table is created as cached, new partitions should be marked as cached by default. It is desirable to know which cache pools exists early on (in analysis) so the query will fail without hitting HDFS/CatalogServer if a non-existent pool is specified. To support this, a new cache pool catalog object type was introduced. The catalog server caches the known pools (periodically refreshing the cache) and sends the known pools out in catalog updates. This allows impalads to perform analysis checks on cache pool existence going to HDFS. It would be easy to use this to add basic cache pool management in the future (ADD/DROP/SHOW CACHE POOL). Waiting for the table/partition to become cached may take a long time. Instead of blocking the user from access the time during this period we will wait for the cache requests to complete in the background and once they have finished the table metadata will be automatically refreshed. Change-Id: I1de9c6e25b2a3bdc09edebda5510206eda3dd89b Reviewed-on: http://gerrit.ent.cloudera.com:8080/2310 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-05-27 16:47:15 -07:00
ishaan	10952da6e0	Change the slf4j version to harmonize with the rest of CDH. All other CDH components use slf4j version 1.7.5; Impala's use of an earlier version causes a lot of benign warnings. This patch changes Impala's version to be the same as the rest of the stack. Change-Id: I297903d146c6b7642de5b6fa4eefa28a6a08fafe Reviewed-on: http://gerrit.ent.cloudera.com:8080/2541 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-05-27 13:46:17 -07:00
Dimitris Tsirogiannis	ca86e470de	IMPALA-887: Improve partition pruning time This commit is the first step in improving the performance of partition pruning. Currently, Impala can prune approximately 10K partitions per sec, thereby introducing significant overhead for huge table with a large number of partitions. With this commit we reduce that overhead by 3X by batching the partition pruning calls to the backend. Change-Id: I3303bfc7fb6fe014790f58a5263adeea94d0fe7d Reviewed-on: http://gerrit.ent.cloudera.com:8080/2608 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2687	2014-05-26 13:10:12 -07:00
Victor Bittorf	c13a1d080e	IMPALA-938: Fix implicit casting in timestamp arithmetic exprs. Change-Id: I7e875ec2251e9782c98b60195ecbc92258b63b5c Reviewed-on: http://gerrit.ent.cloudera.com:8080/2657 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins (cherry picked from commit 8822401dbb65d9b4d996d5bb78ac3aca1aa2dbac) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2671	2014-05-23 14:11:35 -07:00
Skye Wanderman-Milne	1dff1686aa	Add option to build UDF test libs in copy-udfs-udas.sh The option is off by default, but useful for running this script without building the world. Change-Id: I82d8251cf9bb2763ce69094da1995a4d6ceff167 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2647 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins (cherry picked from commit a7f77643820dcbfbab231a9260c94450564bd2df) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2659 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-05-22 18:01:55 -07:00
Nong Li	5729024fe9	IMPALA-984: Fix missing reanalyze in InlineViewRef and NULL handling. Change-Id: Ia80035c5456630aeef7a24288a998fe08546a282 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2652 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-05-21 18:18:29 -07:00
Henry Robinson	e87c0eb22a	[CDH5] Detect pseudo-distributed Llama cluster Since we're no longer using the MiniLlama, we need to explicitly set whether or not the cluster is pseudo-distributed. Impala needs this information to correctly translate datanode addresses to a format that Llama understands. This change (adapted from one made by Casey) adds a method to the frontend (callable via JNI) to get a configuration value from the Hadoop configuration. We'll set that configuration value for local RM testing. Change-Id: Ifd51db98a993ac0270dac2b832babbc394483c1a Reviewed-on: http://gerrit.ent.cloudera.com:8080/2549 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-05-20 21:24:33 -07:00
Alex Behm	1b9a8020bf	IMPALA-996: Exclude non-materialized slots from a tuple's avgSerializedSize. Change-Id: Ic7936c6b5c5e6d4c162d91105128cda2b1b7284c Reviewed-on: http://gerrit.ent.cloudera.com:8080/2617 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2626	2014-05-20 16:21:59 -07:00
Alex Behm	b252921363	IMPALA-994: Handle incorrect column metadata in views created by Hive. Change-Id: I3fba08d191c479f37371ce50fd07b8476a73eba2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2613 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2618 Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-05-19 20:17:23 -07:00
Matthew Jacobs	f9c9a7ca13	Add SHOW DATA SOURCES Change-Id: Ieeb0df107f45a58b8a99f717e96453da93ee7270 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2529 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit b2392c5bfe9fc928ad19af6ff6737e6dc6324e63) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2614	2014-05-19 17:52:27 -07:00
Matthew Jacobs	6ccd56bc1f	Enforce slot equivalences at data source scan nodes Change-Id: I2ed606ba398990ab05afa3301b6356c6a636e2bb Reviewed-on: http://gerrit.ent.cloudera.com:8080/2521 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 55061f6953956f45d433fe227ded539a648e3f9c) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2536	2014-05-19 14:37:44 -07:00
Dimitris Tsirogiannis	a7a9cde86f	CDH-18969: Incorrect query result in Impala This commit fixes issue CDH-18969 where Impala returns wrong results when querying an HBase table. This issue is triggered when a column family sorts lexicographically before ":key", which is the column family of the row key, thereby causing the wrong column to be used as a row key by the backend. The following changes are included: 1. Modified the load function in HBaseTable.java to make sure the catalog object of an HBase table always stores the row key column first. Change-Id: Icd7ebc973d81672c04d5c7c8bbabd813338d5eac Reviewed-on: http://gerrit.ent.cloudera.com:8080/2513 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2602	2014-05-18 16:29:11 -07:00
Skye Wanderman-Milne	edbbe6035e	Decimal: read from Avro Allows reading decimal columns with or without codegen. Includes tests based on a data file posted on HIVE-5823. Change-Id: Ie541c6b98bd24543691850cb45a434af60b5a5a6 (cherry picked from commit 6983dcefdf70cce14724e17d03bc061ffb8f671c) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2596 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-05-16 22:26:11 -07:00
Alex Behm	fcf4e43a3c	IMPALA-962: Fully qualify table and view names in toSql(). Change-Id: I6bf757c4ffbaf82c136af7b59d2d415234545a86 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2373 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2589	2014-05-16 01:26:38 -07:00
Lenni Kuff	61cbdd4f49	[CDH5] Add Sentry Service to local test environment Adds the ability to start/stop the Sentry Service to our local test environment and load the sentry-site.xml configs. Since the existing Sentry startup scripts don't work I wrote a simple wrapper to handle service startup. Change-Id: I1b77a2e50e51e6e6eae58cfed4d5d7c403dbc0b4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2540 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-05-14 12:02:02 -07:00
Dimitris Tsirogiannis	2d7a8b7c70	IMPALA-964: Full outer join on values() followed by group by hits a preconditions check This commit fixes IMPALA-964 where full outer join between two inline views followed by a group by (e.g. select 1 FROM (VALUES(1 x, 1 y)) a FULL OUTER JOIN (VALUES(1 x, 1 y)) b ON (a.x = b.y) GROUP BY a.x;) hits a preconditions check. This check evaluates if the numNodes (number of nodes for the purpose of resource estimation) variable is greater or equal to zero and is triggered when we try to compute the resource estimates (number of distinct values) of a plan fragment. The following changes are included in this commit: 1. Modified the getNumDistinctValues function in PlanFragment class to consider the special case where the numNodes of a plan fragment is -1. 2. Added a test case in QueryTest/joins.test. Change-Id: I2962ed5079e174d0e76ad990ab84e1fb1a4607ef Reviewed-on: http://gerrit.ent.cloudera.com:8080/2466 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2514 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>	2014-05-11 19:30:38 -07:00
Victor Bittorf	0bb66ef327	Adding aliases ADD_MONTHS and SUB_MONTHS This is a request for consistency with oracle. Change-Id: I463a66694a068cd773532d8f6f853a4b089b918a Reviewed-on: http://gerrit.ent.cloudera.com:8080/2400 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins (cherry picked from commit 1f0b643789596f96c54580b8c5262fada4dfc958) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2502	2014-05-09 17:35:29 -07:00
Matthew Jacobs	0c533bb152	External Data Source: Backend changes Change-Id: Ifa62b4ea231da47facb31c3f8d43e5e3ac73591f Reviewed-on: http://gerrit.ent.cloudera.com:8080/2284 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins (cherry picked from commit f1e5db2853135c4346788192e2dbc632d4fe1dfb) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2497 Reviewed-by: Matthew Jacobs <mj@cloudera.com>	2014-05-09 02:24:41 -07:00
Matthew Jacobs	ebc6c5894e	External Data Source: Frontend and catalog changes Initial frontend and catalog changes for external data sources. Change-Id: Ia0e61ef97cfd7a4e138ef555c17f2e45bbf08c18 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2224 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit dfa14c828957f751db9c89bae0bdc040ce6f648c) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2485	2014-05-08 14:56:19 -07:00
Dimitris Tsirogiannis	1a21bb9b9e	IMPALA-642: Conjunctive predicates on HBase table not working... This commit fixes IMPALA-642 issue where conjunctive predicates are returning incorrect results from HBase in the presence of NULL values. The following changes are included: 1. Modified the HBaseScanNode to re-apply the "pushed-down" predicates. 2. Added tests in QueryTest/hbase-filters.test 3. Added tests in PlannerTest/hbase.test Change-Id: I598b325ad63b043b325fba74448698ed71a3cd78 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2414 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2489 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>	2014-05-08 13:59:00 -07:00
Henry Robinson	38befd2126	IMPALA-724: Support infinite / nan values in text files This patch allows the text scanner to read 'inf' or 'Infinity' from a row and correctly translate it into floating-point infinity. It also adds is_inf() and is_nan() builtins. Finally, we change the text table writer to write Infinity and NaN for compatibility with Hive. In the future, we might consider adding nan / inf literals to our grammar (postgres has this, see: http://www.postgresql.org/docs/9.3/static/datatype-numeric.html). Change-Id: I796f2852b3c6c3b72e9aae9dd5ad228d188a6ea3 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2393 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins (cherry picked from commit 58091355142cadd2b74874d9aa7c8ab6bf3efe2f) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2483	2014-05-08 12:28:53 -07:00
ishaan	50caed17d7	[CDH5] Fix the format option in run-all Previously, the -format option was a no-op. Moreoever, run-all would not work without the option. This patch fixes both problems. Change-Id: I4726c03452409322fd0cd864cdb6dd395c4e651a Reviewed-on: http://gerrit.ent.cloudera.com:8080/2449 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-05-07 22:56:38 -07:00
Lenni Kuff	13c794db91	[CDH5] Update dependency versions to CDH5.1.0 This just updates the versions, it doesn't touch anything in /thirdparty. Change parquet version to append SNAPSHOT Added hadoop-hbase-compat jar in AUX_CLASSPATH and mapreduce/*.jar to HDFS Change-Id: I4471ef4476997371cf49a9d54cfa63f2fda126e4	2014-05-07 15:10:40 -07:00
Victor Bittorf	6f31dc7f8a	Adding STDDEV builtin. Change-Id: I79e5aee1e9e879aa2d09078ab45bc149675e1d4a Reviewed-on: http://gerrit.ent.cloudera.com:8080/2341 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins (cherry picked from commit a42c375d933c0b7ffe7c9b6702777679492d7ad6) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2464	2014-05-06 13:06:26 -07:00
Henry Robinson	f968bb6087	IMPALA-923: Boolean slotrefs not marked as assigned in inline views A boolean slotref predicate that could be pushed into an inline view would not be correctly marked as assigned, leading to an extra select node being introduced to evaluate it. This was because the id of the expression after substitution would change (see createInlineViewPlan()), but only the post-substitution conjunct IDs were marked as assigned. This bug only affected standalone slotrefs; other exprs (like casts, or explicit predicates referencing a slotref) would not change their ID under substitution. Change-Id: I4127528b4aec25c966a4d186ddc98a68502b90c1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2430 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins (cherry picked from commit b49bfdf57769615d43d86fcfce2269531640788a) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2435	2014-05-02 18:45:21 -07:00
Nong Li	03e5665e56	Decimal: Read/Write to parquet. This adds support for the FIXED_LENGTH_BYTE_ARRAY parquet type and encoding for decimals. Change-Id: I9d5780feb4530989b568ec8d168cbdc32b7039bd Reviewed-on: http://gerrit.ent.cloudera.com:8080/1727 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2432	2014-05-02 16:38:35 -07:00
ishaan	0fa87cba54	Reduce mini dfs logging verbosity. Currently, the default log level is set to DEBUG. This produces approximately 10-20 GB of logs per build, which is unacceptable. Change-Id: Ibbb48876fc72faa23d76f32166f31f0257a7a3a0 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2386 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2387	2014-04-28 23:42:48 -07:00
Skye Wanderman-Milne	60db4d4d82	CDH-18416: Don't inline ReadWriteUtil::ReadZLong() For wide Avro tables, ReadZLong() would get inlined many times into a single function body, causing LLVM to crash. Not inlining doesn't seem to have a performance impact on narrow tables, and helps with wide tables. This change also adds tests over wide (i.e. many-column) tables. The test tables are produced by specifying shell commands to generate test tables in functional_schema_template.sql, which are executed in generate-schema-statements.py. In the SQL templates, sections starting with a ` are treated as shell commands. The output of the shell command is then used as the section text. This is only a starting point; it isn't currently implemented for all sections, and may have to be tweaked if we use this mechanism for all tables. Change-Id: Ife0d857d19b21534167a34c8bc06bc70bef34910 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2206 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com> (cherry picked from commit 1c5951e3cce25a048208ab9bb3a3aed95e41cf67) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2353 Tested-by: jenkins	2014-04-28 15:58:15 -07:00
Victor Bittorf	808f9a661a	IMPALA-939: Regex should match anywhere in string. Change-Id: I8dcd337c3b06b632017270670a4f199ec7ada648 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2296 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins (cherry picked from commit c97f82eaaf0efe9bd4c3da3d005464f425696a62) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2371	2014-04-25 16:16:15 -07:00
Victor Bittorf	46151dc7dd	Adding EXTRACT builtin. Change-Id: I6de20f336ecdfa3acd8d3a9166cff4a062baaacc Reviewed-on: http://gerrit.ent.cloudera.com:8080/2247 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins (cherry picked from commit f233955020ffbd1023f2d6adbbfb22e267986305) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2370	2014-04-25 15:38:51 -07:00
Alex Behm	91e1eb0789	CDH-18563: Speed up the computation of transitive value transfers. The issue: Computing the full transitive closure for all slots can be very expensive (10s of seconds for >2k slots, minutes for >4k slots). Queries with many views and/or unions were affected most because each union/view adds a new tuple with slots, increasing the total number of slots. The fix: The new algorithm exploits the sparse structure of the value transfer graph for a significant speedup (>100x). The high-level steps are: 1. Identify complete subgraps based on bi-directional value transfers, and coalesce the slots of each complete subgraph into a single slot. 2. Map the remaining uni-directional value transfers into the new slot domain. 3. Identify the connected components of the uni-directional value transfers. This step partitions the value transfers into disjoint sets. 4. Compute the transitive closure of each partition from (3) in the new slot domain separately. Hopefully, the partitions are small enough to afford the O(N^3) complexity of the brute-force transitive closure computation. Change-Id: I35b57295d8f04b92f00ac48c04d1ef1be4daf41b Reviewed-on: http://gerrit.ent.cloudera.com:8080/2360 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-04-24 23:53:28 -07:00
ishaan	405a6fbba3	[CDH5] Change the hdfs-site template to work for CDH5 The hdfs-site template in CDH5 is different from the one we fine in CDH5. Specifically: - It has entries that enable hdfs caching. - It uses the correct parameter name for hdfs block locations timeout. Change-Id: I0ca6bd84b074ccbb8f42243d37c5082b305f9bcf Reviewed-on: http://gerrit.ent.cloudera.com:8080/2338 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-04-24 11:36:56 -07:00
Alex Behm	121fab8fdf	IMPALA-888: Drop union operands with constant conjuncts evaluating to false. This patch simplifies the complex slot materialization logic for unions by making the materialization independent of conjuncts assigned to MergeNodes. When 'pushing down' predicates into union operands, we drop union operands with constant predicates evaluating to false. Constant predicates that evaluate to true are simply ignored. Change-Id: I0e7ccfb206bed29db2b5d667e2bb61310980e80a Reviewed-on: http://gerrit.ent.cloudera.com:8080/2327 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-04-23 18:25:14 -07:00
casey	2351266d0e	Replace single process mini-dfs with multiple processes This should allow individual service components, such as a single nodemanager, to be shutdown for failure testing. The mini-cluster bundled with hadoop is a single process that does not expose the ability to control individual roles. Now each role can be controlled and configured independently of the others. Change-Id: Ic1d42e024226c6867e79916464d184fce886d783 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1432 Tested-by: Casey Ching <casey@cloudera.com> Reviewed-by: Casey Ching <casey@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/2297 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-04-23 18:24:05 -07:00
Alex Behm	c8e928119d	IMPALA-912: Enforce slot equivalences at the lowest possible plan node. The reported issue is that we can have redundant hash expressions in exchanges. The underlying cause is that we fail to remove redundant join predicates. This patch enforces slot equivalences based on our computed equivalence classes at the lowest possible plan node by generating new equality predicates. Each plan subtree now has a minimal set of equality predicates that express all known equivalences between slots belonging to tuples materialized at that plan node. As a result, eliminating redundant join predicates becomes trivial: It is sufficient to pick a single representative predicate of each relevant equivalence class. All predicates beyond that are redundant. Change-Id: I7998fe8d7bdf84cc8eb129d32c86269bedeab68e Reviewed-on: http://gerrit.ent.cloudera.com:8080/2177 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2278	2014-04-18 13:28:49 -07:00
Lenni Kuff	15327e8136	Migrate DataErrors tests to Python test framework, re-enable subset of tests This re-enables a subset of the stable data errors tests and updates them to work in our test framework. This includes support for updating results via --update_results. This also lets us remove a lot of old code that was there only to support these disabled tests. Change-Id: I4c40c3976d00dfc710d59f3f96c99c1ed33e7e9b Reviewed-on: http://gerrit.ent.cloudera.com:8080/1952 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2277	2014-04-18 02:25:11 -07:00
Henry Robinson	2a69019525	IMPALA-945: Fix column reordering with SELECT expressions Previously, to produce the correct output expressions for the root plan fragment before a table sink, InsertStmt would reorder the result expressions for the query statement at the plan root. This had stopped working for SelectStmts (and test coverage didn't catch that). Now InsertStmt produces its own output expressions that can substitute for the originals from the query statement, and the planner uses those instead. All query tests for column reordering have been duplicated to use SELECT expressions. Change-Id: Ib909fe35d27416b33ba2e5ac797aa931e1fe43f9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2204 Tested-by: jenkins Reviewed-by: Henry Robinson <henry@cloudera.com> (cherry picked from commit d526db7ac6274f35b6affcb7428327100026e14e) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2275	2014-04-18 00:12:12 -07:00
Nong Li	1cab95066d	Add the return type as a column for SHOW FUNCTIONS. Also includes some misc pattern matching cleanup. Change-Id: I6c9ec78b094a73864b4d669afbd75a48c9bf9585 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2199 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/2271	2014-04-17 17:58:13 -07:00

1 2 3 4 5 ...

559 Commits