impala

mirror of https://github.com/apache/impala.git synced 2026-01-06 06:01:03 -05:00

Author	SHA1	Message	Date
Lenni Kuff	70c05d4caa	IMPALA-897: shell does not close queries after completion when running from a script The problem was that we were setting a flag marking the last_query_handle as closed, but were not resetting the flag before the next query. This caused the first query to be closed properly, but subsequent queries would not be closed. The fix is to change where the flag is reset to the same place as where we assign last_query_handle. Added a test case. Change-Id: I870a96789489bfe4f388910b808409cd0584af8a (cherry picked from commit 1439151af5b63112b0dd631fac9c7ab4d43bba37) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1976 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-03-18 18:46:54 -07:00
Lenni Kuff	9c3b318112	Fix test_compressed_formats to properly pull in tbl created in Hive Change-Id: I4e143826e5900ebfa6f77023ae4cf0d2c71db190 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1960 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1967 Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-03-18 13:24:10 -07:00
Lenni Kuff	b7432cd68a	Constrain test_explain to run only on text/none table format The tests expect to be run against text/none tables which causes failures on exhaustive test runs. I don't think it adds any extra coverage to run these tests against lzo format so added a constraint. Change-Id: Ib0878e2ba84107c9df4499def304fe45ba4fe4b4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1884 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/1964 Tested-by: jenkins	2014-03-18 11:51:19 -07:00
Skye Wanderman-Milne	44125729dc	UDF/UDA memory management improvements * AggFnEvaluator now uses the UDF mem pool (I'm planning to change this to per-exec node pools in the expr refactoring) * FunctionContext::TrackAllocation()/Free() actually use the UDF's mem tracker * Added FunctionContextImpl::Close() which sets warnings for leaked allocations Change-Id: I792ffd49102a92b57e34df18d8ff5f5d0fd27370 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1792 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com> (cherry picked from commit 41a5f7cfa718789fa3b2de3a31f085411fb5000c) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1954 Tested-by: jenkins	2014-03-17 20:38:25 -07:00
Lenni Kuff	d7c06486e1	Disable flaky explain tests due to inconsistent per-host mem requirements Change-Id: Ie372696c4986dc7f7c8f7fc074c41b89bd65f456 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1939 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins (cherry picked from commit ed4cb660b7a60d9b9248df525c477bab4d218c4b) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1953 Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-03-17 17:42:21 -07:00
Henry Robinson	635dd7d289	IMPALA-875: Respect isAnalyzed_ in IntLiteral expressions Partition column expressions are analysed twice for INSERT statements - once to infer the type and so to add a possible cast, and once to compute stats on the resulting expr. However, this process resulted in a partition column expr that was an IntLiteral getting the smallest type that would contain its value, rather than retaining the column-compatible type that had been assigned to it. This patch does the minimum thing, which is make IntLiteral.analyze() idempotent. Doing the same thing to Expr and LiteralExpr unearths some other bugs, which we will have to fix in a follow-on patch (see IMPALA-884). Change-Id: Ie22fc5d3f4832c735a1ebc0ef78f50d736f597fd Reviewed-on: http://gerrit.ent.cloudera.com:8080/1931 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins (cherry picked from commit 1912d65ea21a5025d385948642f0d4aadad91abf) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1947	2014-03-17 17:35:12 -07:00
Lenni Kuff	dd20958e5d	Minor test cleanup * Prefer 'refresh <table name>' over 'invalidate metadata' * Remove the 'RELOAD' test setup option that was used by only 1 test. * Delete a .py test file that seems to be a duplicate Change-Id: I890546635840bb8f4d55789a89f8c8f33e40d001 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1933 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1946 Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-03-17 17:30:15 -07:00
Skye Wanderman-Milne	be18bd8f76	IMPALA-752: Improve INSERT error message for unsupported file formats Change-Id: Ib16817d6e49d3df30643563eb9ec5573a920bba7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1911 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit 9e93c237fde1877eb0d140e73b090f2b891f3474) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1941	2014-03-17 14:54:46 -07:00
Nong Li	88a54dc532	Restrict parquet many cols test to one test dimension. Change-Id: Ib7e1a63a9981fc646899b627748b523119d9a5d4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1928 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-03-16 15:37:37 -07:00
Henry Robinson	7fa41471f6	IMPALA-838: Fix premature timeout of sessions A crucial comparison was between time values with different units. Tests didn't catch this because they only confirmed that sessions were timed out within the correct time, not that they were not timed out early. Change-Id: Ia8c57d3d70e4702996d0225b167142b7bf88d236 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1926 Tested-by: jenkins Reviewed-by: Henry Robinson <henry@cloudera.com>	2014-03-16 11:41:26 -07:00
ishaan	2caf687d8d	Temporarily disable explain tests for explain levels 2 and 3 The explain tests that verify we detect missing stats properly are failing for avro. This change disables the test to unblock the full data load build. Change-Id: I0a7f54dbf1e8a3ebb557250287e7e0491aaa27f2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1925 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-03-15 20:13:20 -07:00
Lenni Kuff	aa0b7a35f5	IMPALA-880: COMPUTE STATS should update partitions in batches When updating partition metadata as part of COMPUTE STATS we would previously attempt to update all partitions at once. This could lead to HMS socket timeouts and also could run into issues if there were > 32K partitions. In this change we now update the partitions in batches, with a max size of 500 partitions per batch. We also compare whether the row count has changed and only update partitions that have been modified. Change-Id: If7bfcc30f86fc2fdd79855b981067ac29a47b5e1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1913 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1918	2014-03-14 19:20:12 -07:00
Nong Li	6629c32d6a	IMPALA-742: Fix unhandled metadata reading code path in parquet scanner. Change-Id: I1dd6364d148fed881020c045ece635b1601f86bb Reviewed-on: http://gerrit.ent.cloudera.com:8080/1836 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-03-14 16:08:45 -07:00
Lenni Kuff	cc1c0c61fd	IMP-1291: Support "extended" ASCII characters as delimiters in text files This fixes how we validate delimiters to be in line with Hive. A delimiter must fit in a single byte and can be specified in the following formats, as far as I can tell (there isn't documentation): - A single ASCII or unicode character (ex. '\|') - An escape character in octal format (ex. \001. Stored in the metastore as a unicode character: \u0001). - A signed decimal integer in the range [-128:127]. Used to support delimiters for ASCII character values between 128-255 (-2 maps to ASCII 254). Previously, we were not handling the "signed integer" case so there was no way to specify a delimiter in the "extended" ASCII range of 128-255. To support result validation, the test infrastructure had to be updated to support reading/writing different character encodings. Change-Id: Ie3c4d444dc9c6e60192093ed0c0f6f151eab16bc Reviewed-on: http://gerrit.ent.cloudera.com:8080/1848 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1888	2014-03-13 13:00:15 -07:00
Matthew Jacobs	e817c3742c	Admission controller: fix a number of TODOs * Remove requirement that fair scheduler and Llama conf files be on the classpath if specified as relative paths. Now they can be specified as any relative or absolute path. * Add flags to disable all per-pool max requests limits or mem limits. * Rename RequestPoolUtils to RequestPoolService * Make it more clear RequestPoolService is a singleton by putting it in ExecEnv * FileWatchService: use Executors.newScheduledThreadPool instead of a thread * Moved MEGABYTE (and related constants) to new Constants class (frontend) * Test RequestPoolService: Removed AllocationFileLoaderServiceHelper, replaced with reflection Change-Id: Iadf79cf77a7894a469c3587d0019a6d0bee7e58f Reviewed-on: http://gerrit.ent.cloudera.com:8080/1787 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit b9a167f6fdb4ab2595aca6035e1f9d926b909d94) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1858	2014-03-12 14:23:54 -07:00
Alex Behm	748ea3f38b	Fix test_partitioning.py and expected results. Change-Id: I21148f3a10abbda4f9e587f83cbabdd2a79c6147 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1861 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1866 Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-03-12 11:25:17 -07:00
Lenni Kuff	08417c875f	IMPALA-849: Impala does not work with boolean partition key columns This is because in HdfsTable we call "expr.castTo(colType)", but BooleanLiteral (incorrectly) didn't implement "uncheckedCastTo()". This meant that instead of a BooleanLiteral being returned we got back a CastExpr, which cannot be cast to LiteralExpr. As part of this change it turns out Boolean partition columns are also broken in Hive. I filed HIVE-6590 for these issues and we decided to disable INSERT into a boolean partition column for Impala due to this bug. Change-Id: I3e295bb96aadc08d64faf551f6393a7128a7ef27 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1755 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-03-11 18:42:08 -07:00
Henry Robinson	05c8e4da93	IMPALA-624: Inserts should respect changes in partition location Impala would ignore changes in a partition's location (by ALTER TABLE ... SET LOCATION ...). Change-Id: I9fdc1f09f9d848aa1a4ade3d4f35f8de9cbd18a5 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1647 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1824	2014-03-08 13:21:06 -08:00
Lenni Kuff	23c619f794	Limit test_udfs to always run with a single exec_option test vector Change-Id: If3ff1f5f17a95cce88282f9dc165fe5ce85200b9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1781 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1811 Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-03-07 18:44:11 -08:00
Matthew Jacobs	d64c516fa8	Admission controller: Add mem limit to tests Change-Id: Ieae5c25e0d034317113f97ed66b8971cd80e0bae Reviewed-on: http://gerrit.ent.cloudera.com:8080/1705 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 0d8d1fa370264acd94d62399863ab751e6cbff06) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1804	2014-03-07 15:46:08 -08:00
Matthew Jacobs	d0386083fb	Admission control tests: Increase thread join timeout and remove unnecessary locking In some rare cases on overloaded machines, the thread join timeout of 10 seconds isn't long enough. Also, taking the lock at that time isn't necessary because the main thread will not attempt to cancel a thread unless it is already in the list of running threads. Threads are added to that list only after they submit their query. Change-Id: I23a67d726bc25221f0e9331ca1a3e9f5363f821d Reviewed-on: http://gerrit.ent.cloudera.com:8080/1744 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 27cf239592fafdb36a5680c480914f38a16037da) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1760	2014-03-07 14:49:57 -08:00
Matthew Jacobs	989830186f	Remove RM pool configuration and yarn_pool query option/profile property Admission control adds support for configuring pools via a fair scheduler allocation configuration, so the pool configuration mechanism is no longer needed. This also renames the "yarn_pool" query option to the more general "request_pool" as it can also be used to configure the admission controller when RM/Yarn is not used. Similarly, the query profile shows the pool as "Request Pool" rather than "Yarn Pool". Change-Id: Id2cefb77ccec000e8df954532399d27eb18a2309 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1668 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 8d59416fb519ec357f23b5267949fd9682c9d62f) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1759	2014-03-06 14:46:09 -08:00
Matthew Jacobs	41d90312fa	Admission controller: user to pool resolution, authorization, and pool configs Adds RequestPoolUtils which exposes user to pool resolution, authorization, and relevant pool configurations by wrapping Yarn classes that provide that functionality. (To support CDH4, those Yarn classes will come from thirdparty/cdh4-extras.) RequestPoolUtils is created once by the backend and the instance lives for the duration of the process. Change-Id: I53db075555578614356d33f9d939c5378b9ec797 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1566 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 8e385bdb54ed97e567c672a76723936c24cfe45f) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1758	2014-03-06 14:21:31 -08:00
Skye Wanderman-Milne	6ceed1e632	UDF API additions This patch introduces the ability to specify a prepare and close function for a UDF, as well as FunctionContext methods for maintaining state across UDF invocations within a query. Many of the changes are related to adding an Expr::Open() function which calls the UDF's prepare function, if specified (it has to be called in Open() since the LLVM module must be compiled first). Change-Id: I581d90d03dff71f7ff5d4a6bef839ba6bc46b443 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1693 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit 8e2ed7fb9051d98f89327715fdebd6f5ed22d6ee) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1757	2014-03-05 07:32:34 -08:00
Alex Behm	69a840d965	Consistent memory estimates for explain tests. Our new build machines (e.g., beefy) have more cores than our other machines, so scan nodes may have a different memory estimate causing the explain tests to fail. This patch fixes the num_scanner_threads to 1 for explain tests to ensure consistent estimates. Change-Id: Ie6194f3c3b17d04aa141d04fcddb7ac948e92fcf Reviewed-on: http://gerrit.ent.cloudera.com:8080/1735 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1753 Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-03-05 05:38:30 -08:00
Skye Wanderman-Milne	68f6d57809	More robust initialization of ScannerContext This commit is in conjunction with the "Fix missing status check in lzo scanner." commit in the LZO repo. It provides a test case for the LZO fix, and changes the ScannerContext initialization so it will fail more gracefully instead of crashing. Change-Id: Idcafeb3679a8fa54322d1ec31c6f1aba860e4e4f Reviewed-on: http://gerrit.ent.cloudera.com:8080/1680 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit 9b84e3514c618bb3e171b5b3bb2ff862af4d35cc) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1752	2014-03-04 21:39:47 -08:00
Skye Wanderman-Milne	203fc66456	Add GetTypeDesc() method to FunctionContext. This is currently only implemented for NativeUdfExpr. Change-Id: I81b442c5668dff43d0486d1cfc445bca2af66606 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1664 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit e1087c3a78e6e12938b583c302907bd32c59f524) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1720	2014-03-01 20:24:30 -08:00
Lenni Kuff	bf16b5cd0d	IMPALA-749: Fetch partitions in batches, rather than all at once. This updates how Impala fetches partition metadata from the Hive Metastore to fetch partitions in batches, rather than all at once. This helps reduce the load on the HMS and also lets Impala scale to above 32K partitions. The downside is that it may require additional RPCs to get all the partitions. This is done by first querying the metastore to get all the partition names that exist, then splitting the list of names into separate batches to get the actual partition metadata. Impala uses a default size of 1000 partitions per batch, but it can be configured by setting the 'hive.metastore.batch.retrieve.table.partition.max' parameter in the hive-site.xml config file. Change-Id: Ide0ec30ef8a9e00f79c26551aa8e5e7814c73034 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1662 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1698	2014-02-28 22:30:45 -08:00
ishaan	098ad99b82	Skip the invalidate metadata stress test until the race in the catalog server is resolved. Change-Id: I71911078d274f894f5a28c0e7123e5e5ac8dc940 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1507 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1702 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>	2014-02-28 10:42:48 -08:00
Matthew Jacobs	8ac929f095	Admission controller: use request memory estimate in admission Change-Id: I86ba26df434e9297b11abe349ea237fea9b04b87 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1622 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit fc0b3f8c289f39fa07816e1cd9e7b0484b845470) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1689	2014-02-27 13:34:29 -08:00
Alex Behm	9cabee4a71	Wait for the Metastore to come up before starting HiveServer2. Change-Id: Ic8e29efe63f6745e1ff44248657cbd7882bb16d9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1626 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1670 Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-02-25 21:05:33 -08:00
Lenni Kuff	d6cbd3dc44	Ensure db/table names are always case insensitive in catalog topic entry keys This fixes a bug that can happen with 'invalidate metadata <table name>' if the following sequence of events happens: 1) Table is created in Impala (table names are always treated as lower case) 2) Table is dropped and re-created in Hive, using the same name but different casing 3) invalidate metadata <table name> is run in Impala, which will update the existing table with the version from the Hive metastore. When building the next statestore update, the catalog server will send an update out thinking that the table from 1) was dropped and the table from 3) was added because the topic entry key is case sensitive. This may incorrectly remove the table from an impalad's catalog. The fix is to always treat db/table names as case insensitive. Change-Id: Ib59edc403989781bf12e0405c0ccd37b8e41ee41 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1634 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/1637	2014-02-23 00:20:16 -08:00
Lenni Kuff	83414c56d9	IMPALA-823: Catalog Server does not handle Hive Metastore connection failures When a HMS connection failed to open, an unchecked exception was being thrown. This wasn't getting handled properly and was causing the loading threads to die. This fixes the problem by ensuring the loading threads catch all types of exceptions and also fixes the TableLoader to return an IncompleteTable should a HMS connection failure occur. Change-Id: I3b696fd8ef12aa6749b602324dcdfe4d27c935ee Reviewed-on: http://gerrit.ent.cloudera.com:8080/1609 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1627	2014-02-21 11:59:21 -08:00
Matthew Jacobs	9156cb94ca	Admission controller functional tests The test works by submitting a number of queries (parameterized) with some delay between submissions (parameterized) and the ability to submit to one impalad or many. The queries are set with the WAIT debug action so that we have more control over the state that the admission controller uses to make decisions. Each query is submitted on a separate thread. Depending on the test parameters a varying number of queries will be admitted, queued, and rejected. Once queries are admitted, the query execution blocks and we can cancel the query in order to allow another queued query to be admitted. The test tracks the state of the admission controller using metric counters on each impalad. Change-Id: I455484a7f899032890b22c38592fcea1875f5399 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1413 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins (cherry picked from commit bc2a74d6da622de877422f926ff1892bed867bb1) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1624 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-02-20 14:48:30 -08:00
Nong Li	904ae86e82	IMPALA-626: Allow dropping functions while it is running. Change-Id: Ia9d6fa1daadddbd05961696d13b9ff43fef2da61 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1621 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-02-20 13:12:10 -08:00
ishaan	e3a0bedfe4	Add a time constraint while evaluating regressions in performance reports. For very short queries, using percentage as a strict bound for detecting regression does not cut it. This introduces a flag through which the user can set a value (in seconds), which will only attempt to detect a regression if the time difference is equal to or greater than the value. Change-Id: I95bde65afb8fd564666223fe069537e2483ec98e Reviewed-on: http://gerrit.ent.cloudera.com:8080/1498 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1579	2014-02-18 10:12:36 -08:00
Lenni Kuff	5f027f61c5	IMPALA-800 / IMPALA-795: Check catalog version before removing entries from the lib cache There was an issue with the lib cache cleanup code where if a function were dropped then re-created we might incorrectly remove the new function's library from the cache. Consider these statements executed in quick succession: 1) create function fn() 2) drop function fn() 3) create function fn() 4) select fn() ... Since we perform direct-DDL and immediately apply the result of a DDL operation to the local impalad catalog, steps 1-4 may complete before a statestore catalog update with the drop from step 2) is received. When the statestore heartbeat with the drop is received, we incorrectly removed the new function's lib cache entry while the select statement was executing, causing the crash. The fix for this problem is to verify the catalog versions to ensure we only drop items that have a catalog version <= the catalog version the drop corresponds to. Change-Id: I7dd1886bf24740cb41f1315ecbb540e38d9ad363 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1552 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1576	2014-02-17 17:56:49 -08:00
Skye Wanderman-Milne	3598395290	Set sync_ddl=true for tests that drop functions. This is a temporary "fix" for IMPALA-795 to unblock the build. The actual fix should prevent a dropped and re-created function from being re-dropped by an old catalog update. Change-Id: Id9dc36a8ecd5e7d1a1146ad0ac092ae12cb33529 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1547 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit 80439d638a4ac02cedfe1490556b176cd818429f) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1559 Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-02-14 10:44:54 -08:00
Nong Li	3722711a06	IMPALA-800 workaround. Mark test_libs_with_same_filename as serial. This test will drop functions in a binary used by the other UDF tests. That triggers IMPALA-800. Change-Id: I8e6f1ad5b4a7ece2d891559751142f0c12e07c3c Reviewed-on: http://gerrit.ent.cloudera.com:8080/1556 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com> (cherry picked from commit 95100e0bdfd9472183fcc7cd8636666d5b654a37) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1558 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-02-14 00:22:08 -08:00
Lenni Kuff	21652c02e8	IMPALA-736: invalidate metadata <table name> should add/remove table from catalog if it does/does not exist in the metastore The invalidation logic is: - If table exists in the metastore, add it to the catalog as an IncompleteTable. If the table's parent database does not exist in the catalog, it will also be added. - If the table does not exist in the metastore, remove it from the catalog cache. - If we are unable to determine whether the table exists in the metastore (there was an exception thrown), invalidate any existing entry by replacing it with an IncompleteTable. Change-Id: If64f07950324a1bec186f9c9ce829197cad87044 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1301 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1522	2014-02-11 22:28:57 -08:00
Nong Li	80d4fd958e	IMPALA-786: Drop function should clear library cache. We were previously only clearing the cache in the catalog service update loop so the impalad the drop was issued to was not doing the right thing. Change-Id: I6bee228e8c0d565cea4ea61cbf64240d83a45a7d Reviewed-on: http://gerrit.ent.cloudera.com:8080/1511 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-02-10 18:51:39 -08:00
Lenni Kuff	7a6892dcbe	Fix race when invalidating catalog metadata and loading a new table There was a race when the catalog was invalidated at the same time a table was being loaded. This is because an uninitialized Table was being returned unexpectedly to the impalad due to the concurrent invalidate. This fixes the problem by updating the CatalogObjectCache to load when a catalog object is uninitialized, rather than load when null. New items can now be added in an initialized or uninitialized state; uninitialized objects are loaded on access. Also adds a stress test for invalidate metadata/invalidate metadata <table>/refresh. In addition, it cleans up the locking in the Catalog to make it more straightforward. The top-level catalogLock_ is now only in CatalogServiceCatalog and this lock is used to protect the catalogVersion_. Operations that need to perform an atomic bulk catalog operation can use this lock (such as when the CatalogServer needs to take a snapshot of the catalog to calculate what delta to send to the statestore). Otherwise, the lock is not needed and objects are protected by the synchronization at each level in the object hierarchy (Db->[Function/Table]). That is, Dbs are synchronized by the Db cache, each Db has a Table Cache which is synchronized independently. Change-Id: I9e542cd39cdbef26ddf05499470c0d96bb888765 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1355 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1418	2014-01-31 16:16:32 -08:00
Alex Behm	6b769d011d	Adds limited support for the FETCH_FIRST fetch orientation in HS2 client requests. Adds a bounded query-result cache that clients can enable by setting an 'impala.resultset.cache.size' option in the HS2 confOverlay map of the HS2 exec request. Impala permits FETCH_FIRST for a particular stmt iff result caching is enabled. FETCH_FIRST will succeed as long as all previously fetched rows fit into the bounded result cache. Regardless of whether a FETCH_FIRST succeeds or not, clients may always resume fetching with FETCH_NEXT. The FETCH_FIRST feature is intended to allow HUE users to export an entire result set (to Excel, CSV, etc.) after browsing through a few pages of results, without having to re-run the query from scratch. Change-Id: I71ab4794ddef30842594c5e1f7bc94724d6ce89f Reviewed-on: http://gerrit.ent.cloudera.com:8080/1356 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1406	2014-01-30 14:58:46 -08:00
Lenni Kuff	b4f5c1edcf	Enable lazy loading of table metadata for the CatalogService/Impalad This change adds support for lazy loading of table metadata to the CatalogService/Impalad. The way this works is that the CatalogService initially sends out an update with only the databases and table names (wrapped as IncompleteTables). When an Impalad encounters one of these tables, it will contact the catalog service to get the metadata, possibly triggering a metadata load if the catalog server has not yet loaded this table. With these changes the catalog server starts up in just seconds, even for large metastores since it only needs to call into the metastore to get the list of tables and databases. The performance of "invalidate metadata" also improves for the same reason. I also picked up the catalog cleanup patch I had to make the APIs a bit more consistent and remove the need for using a LoadingCache for databases. This also fixes up the FE tests to run in a more realistic fashion. The FE tests now run against catalog object received from the catalog server. This actually turned up some bugs in our previous test configuration where we were not running with the correct column stats (we were always running with avgSerializedSize = slotSize). This changed some plans so the planner tests needed to be updated. Still TODO: This does not include the changes to perform background metadata loading. I will send that out as a separate patch on top of this. Change-Id: Ied16f8a7f3a3393e89d6bfea78f0ba708d0ddd0e Saving changes Change-Id: I48c34408826b7396004177f5fc61a9523e664acc Reviewed-on: http://gerrit.ent.cloudera.com:8080/1328 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/1338 Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-21 21:43:29 -08:00
Alex Behm	5ae53c2f80	Compilation fixes after rebasing. Change-Id: I87348336b2489069d65f34821c1a3df3c5ca9512	2014-01-15 15:12:12 -08:00
Alex Behm	f4b809dd11	Re-registering resource brokers with Llama if Llama restarts. All in-flight queries will be blocked until re-registration succeeds or until a timeout has been reached. Change-Id: I9c22c9d3a2deff92b227065974109715a1b18595	2014-01-15 15:12:08 -08:00
Alex Behm	c295b5eda8	[CDH5] Fixed JDBC connectivity to Impala and Hive and related Impala tests. Hive now uses the simple SASL transport because its NOSASL transport is broken (HIVE-4232). Impala still uses the NOSASL transport. The changes also include more careful dependency management. Change-Id: I16633dcef912dce20c8de8cf2f43c45a49460d20	2014-01-15 15:11:47 -08:00
Lenni Kuff	51f003a785	IMP-1156: Add CatalogServer API for listing all UDFs and UDAs in a database Adds a new client API for retrieving all user defined functions (aggregate and scalar) in a database. This is a requirement from CM Backup Disaster and Recovery. Change-Id: I4e33d714795fe808370262f36218ea112f67ec30 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1271 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-01-14 00:01:25 -08:00
Lenni Kuff	0ce83818a6	IMPALA-675: Add function to get current default database This uses the same syntax as postgres: current_database() Change-Id: Ic6ca8ce1fe8c10a496800c45c58b0df4d214b51c Reviewed-on: http://gerrit.ent.cloudera.com:8080/1274 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-13 22:27:22 -08:00
Alex Behm	6799c93922	Simplified/enhanced explain plans with a total of four explain levels. There are now 4 explain levels summarized as follows: - Level 0: MINIMAL Non-fragmented parallel plan only showing plan nodes with minimal attributes - Level 1: STANDARD Non-fragmented parallel plan with some details in plan nodes - Level 2: EXTENDED Non-fragmented parallel plan with full details in plan nodes including the table/column stats, row size, #hosts, cardinality, and estimated per-host memory requirement - Level 3: VERBOSE Fragmented parallel plan with full details (like level 2) This patch also includes several bugfixes related to plan costing and/or testing of explain plans. Change-Id: I622310f01d1b3d53ea1031adaf3b3ffdd94eba30 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1211 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-10 19:17:59 -08:00
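
The batched partition fetch described in commit bf16b5cd0d (IMPALA-749) could be sketched roughly as follows. This is an illustration in Python, not Impala's actual code: `list_partition_names` and `get_partitions_by_names` are hypothetical callables standing in for the metastore calls, and only the default batch size of 1000 comes from the commit message.

```python
from typing import Callable, Iterator, List, Sequence


def split_into_batches(names: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `names`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(names), batch_size):
        yield list(names[start:start + batch_size])


def fetch_partitions_in_batches(
    list_partition_names: Callable[[], List[str]],               # hypothetical metastore call
    get_partitions_by_names: Callable[[List[str]], List[dict]],  # hypothetical metastore call
    batch_size: int = 1000,  # default quoted in the commit message
) -> List[dict]:
    """Fetch all partition names first, then fetch the metadata batch by batch."""
    partitions: List[dict] = []
    for batch in split_into_batches(list_partition_names(), batch_size):
        # One request per batch: more round trips, but each one stays small.
        partitions.extend(get_partitions_by_names(batch))
    return partitions
```

The same splitting helper would also fit the COMPUTE STATS change in aa0b7a35f5 (IMPALA-880), which the commit message says updates partitions in batches of at most 500.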
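
The catalog-version check described in commit 5f027f61c5 (IMPALA-800 / IMPALA-795) can be illustrated with a small sketch. The `LibCache` class and its methods are hypothetical names chosen for this example; only the rule "drop an entry only if its catalog version is <= the version the drop corresponds to" comes from the commit message.

```python
from typing import Dict


class LibCache:
    """Toy cache mapping a function name to the catalog version that created it."""

    def __init__(self) -> None:
        self._versions: Dict[str, int] = {}

    def add(self, fn_name: str, catalog_version: int) -> None:
        self._versions[fn_name] = catalog_version

    def remove_local(self, fn_name: str) -> None:
        # Direct DDL applied immediately on the local impalad.
        self._versions.pop(fn_name, None)

    def apply_remote_drop(self, fn_name: str, drop_version: int) -> bool:
        """Apply a drop that arrived in a (possibly stale) catalog update.

        Returns True if the entry was removed, False if it was kept.
        """
        current = self._versions.get(fn_name)
        if current is None:
            return False
        if current <= drop_version:
            del self._versions[fn_name]
            return True
        # The entry is newer than the drop: the function was re-created.
        return False

    def __contains__(self, fn_name: str) -> bool:
        return fn_name in self._versions
```

With this check, the sequence from the commit message (create at version 1, drop at version 2, re-create at version 3, then a late update carrying the version 2 drop) leaves the re-created function's entry in place.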
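
The "signed decimal integer" delimiter form described in commit cc1c0c61fd (IMP-1291) maps values in [-128:127] onto single bytes, with negative values covering the 128-255 range. A minimal sketch of that mapping, assuming plain two's-complement wraparound (the function name is hypothetical, and the other delimiter formats from the commit message are not handled here):

```python
def signed_delimiter_to_byte(text: str) -> int:
    """Convert a signed decimal delimiter such as '-2' to its byte value (0-255)."""
    value = int(text)
    if not -128 <= value <= 127:
        raise ValueError("delimiter must fit in a single signed byte: %r" % text)
    # Two's-complement wraparound: -1 -> 255, -2 -> 254, ..., -128 -> 128.
    return value % 256
```

This reproduces the example given in the commit message, where -2 maps to 254.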


414 Commits