impala

mirror of https://github.com/apache/impala.git synced 2026-01-07 09:02:19 -05:00

Author	SHA1	Message	Date
Victor Bittorf	0bb66ef327	Adding aliases ADD_MONTHS and SUB_MONTHS This is a request for consistency with oracle. Change-Id: I463a66694a068cd773532d8f6f853a4b089b918a Reviewed-on: http://gerrit.ent.cloudera.com:8080/2400 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins (cherry picked from commit 1f0b643789596f96c54580b8c5262fada4dfc958) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2502	2014-05-09 17:35:29 -07:00
Matthew Jacobs	0c533bb152	External Data Source: Backend changes Change-Id: Ifa62b4ea231da47facb31c3f8d43e5e3ac73591f Reviewed-on: http://gerrit.ent.cloudera.com:8080/2284 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins (cherry picked from commit f1e5db2853135c4346788192e2dbc632d4fe1dfb) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2497 Reviewed-by: Matthew Jacobs <mj@cloudera.com>	2014-05-09 02:24:41 -07:00
Matthew Jacobs	ebc6c5894e	External Data Source: Frontend and catalog changes Initial frontend and catalog changes for external data sources. Change-Id: Ia0e61ef97cfd7a4e138ef555c17f2e45bbf08c18 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2224 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit dfa14c828957f751db9c89bae0bdc040ce6f648c) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2485	2014-05-08 14:56:19 -07:00
Dimitris Tsirogiannis	1a21bb9b9e	IMPALA-642: Conjunctive predicates on HBase table not working... This commit fixes IMPALA-642 issue where conjunctive predicates are returning incorrect results from HBase in the presence of NULL values. The following changes are included: 1. Modified the HBaseScanNode to re-apply the "pushed-down" predicates. 2. Added tests in QueryTest/hbase-filters.test 3. Added tests in PlannerTest/hbase.test Change-Id: I598b325ad63b043b325fba74448698ed71a3cd78 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2414 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2489 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>	2014-05-08 13:59:00 -07:00
Henry Robinson	38befd2126	IMPALA-724: Support infinite / nan values in text files This patch allows the text scanner to read 'inf' or 'Infinity' from a row and correctly translate it into floating-point infinity. It also adds is_inf() and is_nan() builtins. Finally, we change the text table writer to write Infinity and NaN for compatibility with Hive. In the future, we might consider adding nan / inf literals to our grammar (postgres has this, see: http://www.postgresql.org/docs/9.3/static/datatype-numeric.html). Change-Id: I796f2852b3c6c3b72e9aae9dd5ad228d188a6ea3 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2393 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins (cherry picked from commit 58091355142cadd2b74874d9aa7c8ab6bf3efe2f) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2483	2014-05-08 12:28:53 -07:00
ishaan	50caed17d7	[CDH5] Fix the format option in run-all Previously, the -format option was a no-op. Moreover, run-all would not work without the option. This patch fixes both problems. Change-Id: I4726c03452409322fd0cd864cdb6dd395c4e651a Reviewed-on: http://gerrit.ent.cloudera.com:8080/2449 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-05-07 22:56:38 -07:00
Lenni Kuff	13c794db91	[CDH5] Update dependency versions to CDH5.1.0 This just updates the versions, it doesn't touch anything in /thirdparty. Change parquet version to append SNAPSHOT Added hadoop-hbase-compat jar in AUX_CLASSPATH and mapreduce/*.jar to HDFS Change-Id: I4471ef4476997371cf49a9d54cfa63f2fda126e4	2014-05-07 15:10:40 -07:00
Victor Bittorf	6f31dc7f8a	Adding STDDEV builtin. Change-Id: I79e5aee1e9e879aa2d09078ab45bc149675e1d4a Reviewed-on: http://gerrit.ent.cloudera.com:8080/2341 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins (cherry picked from commit a42c375d933c0b7ffe7c9b6702777679492d7ad6) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2464	2014-05-06 13:06:26 -07:00
Henry Robinson	f968bb6087	IMPALA-923: Boolean slotrefs not marked as assigned in inline views A boolean slotref predicate that could be pushed into an inline view would not be correctly marked as assigned, leading to an extra select node being introduced to evaluate it. This was because the id of the expression after substitution would change (see createInlineViewPlan()), but only the post-substitution conjunct IDs were marked as assigned. This bug only affected standalone slotrefs; other exprs (like casts, or explicit predicates referencing a slotref) would not change their ID under substitution. Change-Id: I4127528b4aec25c966a4d186ddc98a68502b90c1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2430 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins (cherry picked from commit b49bfdf57769615d43d86fcfce2269531640788a) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2435	2014-05-02 18:45:21 -07:00
Nong Li	03e5665e56	Decimal: Read/Write to parquet. This adds support for the FIXED_LENGTH_BYTE_ARRAY parquet type and encoding for decimals. Change-Id: I9d5780feb4530989b568ec8d168cbdc32b7039bd Reviewed-on: http://gerrit.ent.cloudera.com:8080/1727 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2432	2014-05-02 16:38:35 -07:00
ishaan	0fa87cba54	Reduce mini dfs logging verbosity. Currently, the default log level is set to DEBUG. This produces approximately 10-20 GB of logs per build, which is unacceptable. Change-Id: Ibbb48876fc72faa23d76f32166f31f0257a7a3a0 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2386 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2387	2014-04-28 23:42:48 -07:00
Skye Wanderman-Milne	60db4d4d82	CDH-18416: Don't inline ReadWriteUtil::ReadZLong() For wide Avro tables, ReadZLong() would get inlined many times into a single function body, causing LLVM to crash. Not inlining doesn't seem to have a performance impact on narrow tables, and helps with wide tables. This change also adds tests over wide (i.e. many-column) tables. The test tables are produced by specifying shell commands to generate test tables in functional_schema_template.sql, which are executed in generate-schema-statements.py. In the SQL templates, sections starting with a ` are treated as shell commands. The output of the shell command is then used as the section text. This is only a starting point; it isn't currently implemented for all sections, and may have to be tweaked if we use this mechanism for all tables. Change-Id: Ife0d857d19b21534167a34c8bc06bc70bef34910 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2206 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com> (cherry picked from commit 1c5951e3cce25a048208ab9bb3a3aed95e41cf67) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2353 Tested-by: jenkins	2014-04-28 15:58:15 -07:00
Victor Bittorf	808f9a661a	IMPALA-939: Regex should match anywhere in string. Change-Id: I8dcd337c3b06b632017270670a4f199ec7ada648 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2296 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins (cherry picked from commit c97f82eaaf0efe9bd4c3da3d005464f425696a62) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2371	2014-04-25 16:16:15 -07:00
Victor Bittorf	46151dc7dd	Adding EXTRACT builtin. Change-Id: I6de20f336ecdfa3acd8d3a9166cff4a062baaacc Reviewed-on: http://gerrit.ent.cloudera.com:8080/2247 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins (cherry picked from commit f233955020ffbd1023f2d6adbbfb22e267986305) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2370	2014-04-25 15:38:51 -07:00
Alex Behm	91e1eb0789	CDH-18563: Speed up the computation of transitive value transfers. The issue: Computing the full transitive closure for all slots can be very expensive (10s of seconds for >2k slots, minutes for >4k slots). Queries with many views and/or unions were affected most because each union/view adds a new tuple with slots, increasing the total number of slots. The fix: The new algorithm exploits the sparse structure of the value transfer graph for a significant speedup (>100x). The high-level steps are: 1. Identify complete subgraphs based on bi-directional value transfers, and coalesce the slots of each complete subgraph into a single slot. 2. Map the remaining uni-directional value transfers into the new slot domain. 3. Identify the connected components of the uni-directional value transfers. This step partitions the value transfers into disjoint sets. 4. Compute the transitive closure of each partition from (3) in the new slot domain separately. Hopefully, the partitions are small enough to afford the O(N^3) complexity of the brute-force transitive closure computation. Change-Id: I35b57295d8f04b92f00ac48c04d1ef1be4daf41b Reviewed-on: http://gerrit.ent.cloudera.com:8080/2360 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-04-24 23:53:28 -07:00
ishaan	405a6fbba3	[CDH5] Change the hdfs-site template to work for CDH5 The hdfs-site template in CDH5 is different from the one we fine in CDH5. Specifically: - It has entries that enable hdfs caching. - It uses the correct parameter name for hdfs block locations timeout. Change-Id: I0ca6bd84b074ccbb8f42243d37c5082b305f9bcf Reviewed-on: http://gerrit.ent.cloudera.com:8080/2338 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-04-24 11:36:56 -07:00
Alex Behm	121fab8fdf	IMPALA-888: Drop union operands with constant conjuncts evaluating to false. This patch simplifies the complex slot materialization logic for unions by making the materialization independent of conjuncts assigned to MergeNodes. When 'pushing down' predicates into union operands, we drop union operands with constant predicates evaluating to false. Constant predicates that evaluate to true are simply ignored. Change-Id: I0e7ccfb206bed29db2b5d667e2bb61310980e80a Reviewed-on: http://gerrit.ent.cloudera.com:8080/2327 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-04-23 18:25:14 -07:00
casey	2351266d0e	Replace single process mini-dfs with multiple processes This should allow individual service components, such as a single nodemanager, to be shutdown for failure testing. The mini-cluster bundled with hadoop is a single process that does not expose the ability to control individual roles. Now each role can be controlled and configured independently of the others. Change-Id: Ic1d42e024226c6867e79916464d184fce886d783 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1432 Tested-by: Casey Ching <casey@cloudera.com> Reviewed-by: Casey Ching <casey@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/2297 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-04-23 18:24:05 -07:00
Alex Behm	c8e928119d	IMPALA-912: Enforce slot equivalences at the lowest possible plan node. The reported issue is that we can have redundant hash expressions in exchanges. The underlying cause is that we fail to remove redundant join predicates. This patch enforces slot equivalences based on our computed equivalence classes at the lowest possible plan node by generating new equality predicates. Each plan subtree now has a minimal set of equality predicates that express all known equivalences between slots belonging to tuples materialized at that plan node. As a result, eliminating redundant join predicates becomes trivial: It is sufficient to pick a single representative predicate of each relevant equivalence class. All predicates beyond that are redundant. Change-Id: I7998fe8d7bdf84cc8eb129d32c86269bedeab68e Reviewed-on: http://gerrit.ent.cloudera.com:8080/2177 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2278	2014-04-18 13:28:49 -07:00
Lenni Kuff	15327e8136	Migrate DataErrors tests to Python test framework, re-enable subset of tests This re-enables a subset of the stable data errors tests and updates them to work in our test framework. This includes support for updating results via --update_results. This also lets us remove a lot of old code that was there only to support these disabled tests. Change-Id: I4c40c3976d00dfc710d59f3f96c99c1ed33e7e9b Reviewed-on: http://gerrit.ent.cloudera.com:8080/1952 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2277	2014-04-18 02:25:11 -07:00
Henry Robinson	2a69019525	IMPALA-945: Fix column reordering with SELECT expressions Previously, to produce the correct output expressions for the root plan fragment before a table sink, InsertStmt would reorder the result expressions for the query statement at the plan root. This had stopped working for SelectStmts (and test coverage didn't catch that). Now InsertStmt produces its own output expressions that can substitute for the originals from the query statement, and the planner uses those instead. All query tests for column reordering have been duplicated to use SELECT expressions. Change-Id: Ib909fe35d27416b33ba2e5ac797aa931e1fe43f9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2204 Tested-by: jenkins Reviewed-by: Henry Robinson <henry@cloudera.com> (cherry picked from commit d526db7ac6274f35b6affcb7428327100026e14e) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2275	2014-04-18 00:12:12 -07:00
Nong Li	1cab95066d	Add the return type as a column for SHOW FUNCTIONS. Also includes some misc pattern matching cleanup. Change-Id: I6c9ec78b094a73864b4d669afbd75a48c9bf9585 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2199 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/2271	2014-04-17 17:58:13 -07:00
Nong Li	87295a4e06	Decimal implementation. This patch implements decimal support for text based formats. Change-Id: I8e2c9e512ed149fe965216a72cb21fffd4f18e75 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1669 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/2238 Tested-by: jenkins	2014-04-14 21:07:32 -07:00
Skye Wanderman-Milne	e60bf29a96	IMPALA-13: Use SSE string functions that take an explicit length This patch modifies DelimitedTextParser and StringValue to work with data containing null characters by using SSE instructions that take a length, rather than expecting null-terminated strings. It also adds some other minor changes to correctly handle data with nulls and to facilitate testing. I checked the execution time of a count() and a select() limit 1 query locally, and saw no difference for either text or sequence files. Change-Id: Ia920b35bea7048aa286f39ec83e313c2a39251d1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2110 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/2181	2014-04-11 11:16:24 -07:00
Henry Robinson	37236845b1	Mark test_non_codegen_tinyint_grouping as execute_serially The test contains an INSERT and some DDL, which is racy if performed in parallel. Change-Id: I2b88533f45756fcf6372d6ee4eb7edd474087048 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2167 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com> Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Henry Robinson <henry@cloudera.com> (cherry picked from commit 8b103c029cc341bacea4746c369bb58e6af5ed29) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2182 Tested-by: jenkins	2014-04-10 15:17:25 -07:00
Lenni Kuff	9e2dd7e049	Add support for SHOW PARTITIONS <table name> This statement returns info on all partitions for the given table. It is implemented as an alias for SHOW TABLE STATS, with some extended analysis checks (such as throwing if the statement targets an unpartitioned table). Change-Id: I19154a9d90314de18f86ba355aa5dbed808f147f Reviewed-on: http://gerrit.ent.cloudera.com:8080/2145 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/2179 Tested-by: jenkins	2014-04-10 12:15:39 -07:00
Alex Behm	0585dfb546	IMPALA-888: Materialize union slots referenced by constant predicates. To keep the predicate assignment/propagation logic simple, we assign conjuncts whose underlying base table exprs are constant in at least one union operand to the evaluating MergeNode, and not in the operand(s) whose corresponding base table exprs are constant. The JIRA describes two different bugs: The first bug was that the slots required for evaluating such predicates in the MergeNode were not marked as materialized. The second bug was that predicates 'pushed' into union operands did not get re-analyzed after substituting the predicate's exprs with the result exprs of that union operand. Missing casts lead to a crash. The new test covers both bugs. Change-Id: I0f5b8a366b32f7d4b2587e13793b6103cdf7e8b3 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2162 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-04-07 18:32:29 -07:00
Henry Robinson	415540d789	IMPALA-901: Fix grouping with NULLs when codegen is disabled The standard implementation of HashTable::Equals() did not correctly check the NULL bit when the argument row did not evaluate to NULL for a given probe expr. In the rare circumstance that this gave rise to a false positive (more on that below), two rows with different grouping values would be considered equal, and one would be excluded from the final aggregation output. HashTable::EvalRow() fills an expression value buffer with the values of either probe or build exprs evaluated for the argument row. These cached values are used to determine row equality in Equals(). In order to avoid a lot of false collisions, an 'unlikely' value is written to that buffer for NULL values, chosen to be HashUtil::FNV_SEED. So without correct NULL-bit checking in Equals(), two single-slot rows are considered to be equal if one of them has NULL for its slot, and the other has a value equal to HashUtil::FNV_SEED truncated to the size of the slot. For tinyint columns, this value is -59. As it happens, our random generator happened to create a table with one tinyint column and which contained NULL and -59 as values. In order to trigger this bug, the rows must also have been written to disk in order such that the scanners returned -59 first, and then NULL to the aggregation node; the bug is not symmetric and works in the opposite case. Change-Id: I17d43eaeee62b2ac01b67dd599bc4346b012a074 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2130 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins (cherry picked from commit 6e8098254280a9d5ead0b607263ca6728a3222a7) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2161 Reviewed-by: Henry Robinson <henry@cloudera.com>	2014-04-07 17:30:52 -07:00
Alex Behm	8b319f8959	IMPALA-935: Make PlanFragment.getDestFragment() return null if no destination is set. Change-Id: I269a7f552d7ff67ff4d65e86e8c6df9c41d0fca1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2159 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-04-07 16:21:24 -07:00
Alex Behm	a85dacafe8	IMPALA-904: Make TupleIsNullPredicate work on non-nullable tuples. We wrap certain exprs substituted from outer-joined inline view in an expr that evaluates to NULL if the underlying tuple(s) are NULL. We do this for exprs that evaluate to non-NULL values if their slots are NULL, i.e., we must then distinguish tuples that are NULL from slots that are NULL (otherwise evaluating an expr against a tuple that is NULL due to the outer join may incorrectly return a non-NULL value.) The bug: Exprs referring to an outer-joined inline view may appear in various places in the outer query block. For example, they could appear in an On-clause or be placed into scans/aggregates due to predicate propagation. In such cases, the underlying tuples may not be nullable yet because they only become nullable after the outer join. We had a DCHECK in tuple-is-null-predicate.cc requiring the tuples to be nullable. The fix: Remove the DCHECK. The fix is not elegant but practical. It would be rather difficult to fix the inline view expr substitution such that a TupleIsNullPredicate never references a non-nullable tuple, esp. due to predicate propagation. Change-Id: I180f75f14173f356abfeec751e6b2d419378a9a7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2157 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-04-07 14:18:49 -07:00
Nong Li	c27bd34075	Revert "Disable decimal in analysis." This reverts commit 695017410adf6d4f8426c4117798c93f823a4b4b. Change-Id: I919d965e8e711d588e6c56dcdbd3c8e0d9ec7a05 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2104 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-03-27 12:45:55 -07:00
ishaan	734e720297	Fix the tpcds count queries test. Because of a malformed .test file, TPCDS-COUNT-PROMOTION was never run because of a missing section delimiter. This patch fixes the .test file and adds the delimiter. Change-Id: Ifd0fa5db1c2bb84815fc66e981e6a989e6c217e4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2017 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2080	2014-03-25 22:26:42 -07:00
Nong Li	b0de4bbe40	IMPALA-812: Fix select node to properly transfer memory ownership. Change-Id: I83b6d085362726aa080077845d3bef71b184621c Reviewed-on: http://gerrit.ent.cloudera.com:8080/2076 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-03-25 18:38:55 -07:00
Skye Wanderman-Milne	3e728f3180	Symbol mangling for UDF prepare/close functions Change-Id: If8f1386073f467e66ada74e606fc98f3344f0733 (cherry picked from commit 32df8b3f963a2b46ec33aad86a151d4c7ecda39c) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1993 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-03-19 02:15:07 -07:00
Nong Li	457055f8f4	IMPALA-892: Fix subexpr for IR generated from compound predicate. Change-Id: I638533827e97f3486eb75a571b18f9e8d1cd4aed Reviewed-on: http://gerrit.ent.cloudera.com:8080/1973 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-03-18 16:49:34 -07:00
Skye Wanderman-Milne	44125729dc	UDF/UDA memory management improvements * AggFnEvaluator now uses the UDF mem pool (I'm planning to change this to per-exec node pools in the expr refactoring) * FunctionContext::TrackAllocation()/Free() actually use the UDF's mem tracker * Added FunctionContextImpl::Close() which sets warnings for leaked allocations Change-Id: I792ffd49102a92b57e34df18d8ff5f5d0fd27370 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1792 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com> (cherry picked from commit 41a5f7cfa718789fa3b2de3a31f085411fb5000c) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1954 Tested-by: jenkins	2014-03-17 20:38:25 -07:00
Henry Robinson	635dd7d289	IMPALA-875: Respect isAnalyzed_ in IntLiteral expressions Partition column expressions are analysed twice for INSERT statements - once to infer the type and so to add a possible cast, and once to compute stats on the resulting expr. However, this process resulted in a partition column expr that was an IntLiteral getting the smallest type that would contain its value, rather than retaining the column-compatible type that had been assigned to it. This patch does the minimum thing, which is make IntLiteral.analyze() idempotent. Doing the same thing to Expr and LiteralExpr unearths some other bugs, which we will have to fix in a follow-on patch (see IMPALA-884). Change-Id: Ie22fc5d3f4832c735a1ebc0ef78f50d736f597fd Reviewed-on: http://gerrit.ent.cloudera.com:8080/1931 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins (cherry picked from commit 1912d65ea21a5025d385948642f0d4aadad91abf) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1947	2014-03-17 17:35:12 -07:00
Lenni Kuff	dd20958e5d	Minor test cleanup * Prefer 'refresh <table name>' over 'invalidate metadata' * Remove the 'RELOAD' test setup option that was used by only 1 test. * Delete a .py test file that seems to be a duplicate Change-Id: I890546635840bb8f4d55789a89f8c8f33e40d001 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1933 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1946 Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-03-17 17:30:15 -07:00
Alex Behm	f4cfc75544	IMPALA-860: Place outer and semi joins at a fixed position in the plan. The bug: It is generally incorrect to re-order joins across outer/semi joins. For example, an inner join following an outer join may reduce the cardinality, so placing the inner join before the outer join during join re-ordering would be incorrect because the outer join is cardinality preserving (on one or both sides). The same argument holds for semi joins. The fix: Place outer and semi joins at a fixed position in the plan based on where they appeared in the original query. Inner joins to the left/right of outer/semi joins are still re-ordered properly. Change-Id: Idae837097b9376473d7f8124eef69b51f612b210 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1909 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1922	2014-03-15 05:18:11 -07:00
Srinath Shankar	74a975c45b	IMPALA-862: count(x) may return null when a similar count(distinct x) is also used count(x) with no distinct and no group-by expressions returns NULL on empty input if other distinct aggs (e.g. COUNT(distinct x)) are present. This happens because the COUNT is transformed to SUM(COUNT()), with the inner COUNT being evaluated WITH a group-by expression (e.g. x). SUM over empty input returns NULL, but COUNT should return 0. This patch fixes this by replacing COUNT with zeroifnull(COUNT) before AggregateInfo is generated if there are distinct aggs and no group-bys. The logic in AggregateInfo itself has not been modified. Change-Id: I902e3fdd95767135b2f3fe423e8802ef57366af1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1921 Reviewed-by: Srinath Shankar <sshankar@cloudera.com> Tested-by: jenkins	2014-03-14 23:35:55 -07:00
Alex Behm	ce40134ad0	IMPALA-867: Fail COMPUTE STATS in analysis for Avro tables affected by HIVE-6308. Avro tables that were not created with a column-definition list do not have their columns properly populated in the Metastore backend DB (HIVE-6308). For such tables COMPUTE STATS and Hive's ANALYZE TABLE cannot succeed. This patch fails COMPUTE STATS in analysis for such broken Avro tables and adds tests for Avro tables with a mismatched column-definition list and Avro schema. Change-Id: I561ecea944ae2f83d69950b7a1ab9edaa89bdcea Reviewed-on: http://gerrit.ent.cloudera.com:8080/1892 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1920	2014-03-14 23:24:55 -07:00
Lenni Kuff	aa0b7a35f5	IMPALA-880: COMPUTE STATS should update partitions in batches When updating partition metadata as part of COMPUTE STATS we would previously attempt to update all partitions at once. This could lead to HMS socket timeouts and also could run into issues if there were > 32K partitions. In this change we now update the partitions in batches, with a max size of 500 partitions per batch. We also compare whether the row count has changed and only update partitions that have been modified. Change-Id: If7bfcc30f86fc2fdd79855b981067ac29a47b5e1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1913 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1918	2014-03-14 19:20:12 -07:00
Alex Behm	15e05082c0	IMPALA-831: Distributed aggregation and top-n over unions. Change-Id: I056e8271421008378db93e8b2393861cc9dd4b90 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1840 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1886	2014-03-13 15:42:31 -07:00
Lenni Kuff	cc1c0c61fd	IMP-1291: Support "extended" ASCII characters as delimiters in text files This fixes how we validate delimiters to be in line with Hive. A delimiter must fit in a single byte and can be specified in the following formats, as far as I can tell (there isn't documentation): - A single ASCII or unicode character (ex. '\|') - An escape character in octal format (ex. \001. Stored in the metastore as a unicode character: \u0001). - A signed decimal integer in the range [-128:127]. Used to support delimiters for ASCII character values between 128-255 (-2 maps to ASCII 254). Previously, we were not handling the "signed integer" case so there was no way to specify a delimiter in the "extended" ASCII range of 128-255. To support result validation, the test infrastructure had to be updated to support reading/writing different character encodings. Change-Id: Ie3c4d444dc9c6e60192093ed0c0f6f151eab16bc Reviewed-on: http://gerrit.ent.cloudera.com:8080/1848 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1888	2014-03-13 13:00:15 -07:00
Alex Behm	d640273a3f	IMPALA-861: Proper slot materialization on distinct aggs inside an inline view. The bug: Slot materialization on distinct aggs inside an inline view did not work if the only reference to the 2nd-phase agg-tuple slots was in a predicate from an outer query block (e.g., Where-clause of the block with the inline view ref). The reason was that bound predicates were fetched from the wrong tuple (from the 1st phase agg). The fix: Assign predicates to the top-most agg in the single-node plan that can evaluate them, as follows: For non-distinct aggs place them in the 1st phase agg node. For distinct aggs place them in the 2nd phase agg node. Change-Id: I0f6ab53cf7bb0c6aed9524ad2e24a849d2dc0ec4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1843 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1881	2014-03-13 04:39:48 -07:00
Alex Behm	7fcd7cd64e	Add list of tables missing stats to explain header and mem-limit exceeded error. Change-Id: Ibe8f329d5513ae84a8134b9ddb3645fa174d8a66 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1501 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1880	2014-03-12 21:15:22 -07:00
Nong Li	b1aeea3f0b	Disable decimal in analysis. Change-Id: I4cff1fc74ef0afeba15bee5b9eb6851abbfddbdf Reviewed-on: http://gerrit.ent.cloudera.com:8080/1874 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-03-12 18:55:06 -07:00
Alex Behm	58950a52a3	IMPALA-798: Distributed execution of CTAS and explain CTAS. Change-Id: I32004a4b31c54cf5c185169fece143a61213d12d Reviewed-on: http://gerrit.ent.cloudera.com:8080/1850 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1867	2014-03-12 16:51:50 -07:00
Matthew Jacobs	8fa8a0f828	IMPALA-843: Do not close reader contexts until plan fragment close Fixes a crash that occurs in some cases when io buffers are still used and child nodes are closed early. We close child nodes early when all rows have been consumed and resources are transferred, but in some cases io buffers are still in use when a scan node is closed. We avoid this problem by only closing reader contexts when the entire fragment is closed. Change-Id: Ie62cdecdcd530bdc61dd4e83cd9ecfc7d2c93ef6 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1806 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 66f14a47b953b7b7153c73f4e018d03461dcd5ef) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1859	2014-03-12 14:44:18 -07:00
Alex Behm	748ea3f38b	Fix test_partitioning.py and expected results. Change-Id: I21148f3a10abbda4f9e587f83cbabdd2a79c6147 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1861 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1866 Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-03-12 11:25:17 -07:00
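
Note on commit 415540d789 (IMPALA-901): the message says that NULL is cached as HashUtil::FNV_SEED and that, truncated to a tinyint slot, this value is -59. The Python sketch below reproduces that arithmetic and the false positive. It is an illustration only, not the Impala code: it assumes the seed is the standard 32-bit FNV offset basis 0x811C9DC5 (which is consistent with the -59 quoted in the commit), and the helper names are hypothetical.

```python
# Assumption: the seed equals the standard 32-bit FNV offset basis.
FNV_SEED = 0x811C9DC5


def truncate_to_signed(value, num_bytes):
    """Keep the low num_bytes bytes of value and reinterpret them as a signed int."""
    low = value & ((1 << (8 * num_bytes)) - 1)
    return int.from_bytes(low.to_bytes(num_bytes, "little"), "little", signed=True)


def cached_value(is_null, value, num_bytes=1):
    """Value written to the expression buffer: the truncated seed stands in for NULL."""
    return truncate_to_signed(FNV_SEED, num_bytes) if is_null else value


def equals_without_null_bit(row_a, row_b):
    """Models the buggy comparison: only the cached values are compared."""
    return cached_value(*row_a) == cached_value(*row_b)


def equals_with_null_bit(row_a, row_b):
    """Models the corrected comparison: NULL bits must agree before values are compared."""
    if row_a[0] != row_b[0]:
        return False
    return row_a[0] or row_a[1] == row_b[1]


# Rows are (is_null, value) pairs for a single tinyint slot.
null_row = (True, 0)
minus_59_row = (False, -59)
```

With these definitions, `equals_without_null_bit(null_row, minus_59_row)` treats the two rows as the same group, while `equals_with_null_bit` keeps them apart.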

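
Note on commit cc1c0c61fd (IMP-1291): the message describes delimiters given as a signed decimal integer in the range [-128:127], with -2 mapping to ASCII 254. A minimal Python sketch of just that signed-integer case follows. The function name is hypothetical, and the single-character and octal-escape formats from the same commit are not handled here.

```python
def delimiter_byte_from_signed(text):
    """Map a signed decimal string in [-128, 127] to an unsigned byte value (0-255).

    Illustrative helper, not taken from the Impala source.
    """
    value = int(text)
    if not -128 <= value <= 127:
        raise ValueError("delimiter must fit in a single signed byte: %r" % text)
    # Two's-complement reinterpretation: negative values land in 128-255.
    return value & 0xFF
```

For example, `delimiter_byte_from_signed("-2")` yields 254, matching the mapping stated in the commit message.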

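
Note on commit 91e1eb0789 (CDH-18563): the four high-level steps listed in that message can be sketched in Python as below. This is a rough illustration under stated assumptions, not the planner's implementation: slots are modeled as integers 0..num_slots-1, value transfers as directed (src, dst) pairs, and all function names are hypothetical. The final expansion back to individual slots is quadratic and is there only to make the result easy to inspect.

```python
from collections import defaultdict
from itertools import product


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def transitive_value_transfers(num_slots, edges):
    """Return every (src, dst) pair of distinct slots connected by a chain of transfers."""
    edge_set = set(edges)

    # Step 1: coalesce slots linked by bi-directional transfers into one slot.
    coalesced = list(range(num_slots))
    for a, b in edge_set:
        if (b, a) in edge_set:
            union(coalesced, a, b)
    rep = [find(coalesced, s) for s in range(num_slots)]

    # Step 2: map the remaining uni-directional transfers into the new slot domain.
    uni = {(rep[a], rep[b]) for a, b in edge_set if rep[a] != rep[b]}

    # Step 3: partition the coalesced slots into connected components.
    part = list(range(num_slots))
    for a, b in uni:
        union(part, a, b)
    partitions = defaultdict(list)
    for r in set(rep):
        partitions[find(part, r)].append(r)

    # Step 4: brute-force O(N^3) closure, run separately on each partition.
    reach = set(uni)
    for nodes in partitions.values():
        for k, i, j in product(nodes, repeat=3):  # k is the outermost loop
            if (i, k) in reach and (k, j) in reach:
                reach.add((i, j))

    # Expand the coalesced result back to the original slots.
    return {
        (a, b)
        for a in range(num_slots)
        for b in range(num_slots)
        if a != b and (rep[a] == rep[b] or (rep[a], rep[b]) in reach)
    }
```

For instance, with transfers 0<->1, 1->2 and 3->4 over five slots, slots 0 and 1 are coalesced in step 1, and the two partitions {0/1, 2} and {3, 4} are closed independently in step 4.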
531 Commits