impala

mirror of https://github.com/apache/impala.git synced 2025-12-30 03:01:44 -05:00

Author	SHA1	Message	Date
Michael Brown	fe2be25245	IMPALA-4775: minor adjustments to python test infra logging - Set up log handler to append, not truncate. This was the cause of IMPALA-4775. Other improvements: - Log a thread name, not thread ID. Thread names are more useful. - Use ISO 8601-like timestamps. I tested that running discrepancy_searcher.py doesn't overwrite its logs anymore. One such command that could reproduce it is: tests/comparison/discrepancy_searcher.py \ --use-postgresql \ --query-count 1 \ --db-name tpch_kudu I also ensured the stress test (concurrent_select.py) still logged to its file. Change-Id: I2b7af5b2be20f3c6f38d25612f6888433c62d693 Reviewed-on: http://gerrit.cloudera.org:8080/5746 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-20 02:24:47 +00:00
Tim Armstrong	69859bddfb	IMPALA-4549: consistently treat 9999 as upper bound for timestamp year Previously Impala was inconsistent about whether the year 10000 was supported, as a result of inconsistency in boost, which reported the maximum year as 9999 but sometimes allowed 10000. This meant that Impala sometimes accepted the year 10000 and sometimes not. Use the patched boost version and update tests accordingly. Testing: Ran an exhaustive build. Change-Id: Iaf23b40833017789d879e5da7bb10384129e2d10 Reviewed-on: http://gerrit.cloudera.org:8080/5665 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-19 00:04:27 +00:00
Michael Brown	db7facdee0	IMPALA-4351,IMPALA-4353: [qgen] randomly generate INSERT statements - Generate INSERT statements that are either INSERT ... VALUES or INSERT ... SELECT - On both types of INSERTs, we either insert into all columns, or into some column list. If the column list exists, all primary keys will be present, and 0 or more additional columns will also be in the list. The ordering of the column list is random. - For INSERT ... SELECT, occasionally generate a WITH clause - For INSERT ... VALUES, generate non-null constants for the primary keys, but for the non-primary keys, randomly generate a value expression. The type system in the random statement/query generator isn't sophisticated enough to know the implicit type of a SELECT item or a value expression. It knows it will be some INT-based type, but not if it's going to be a SMALLINT or a BIGINT. To get around this, the easiest thing seems to be to explicitly cast the SELECT items or value expressions to the columns' so-called exact_type attribute. Much of the testing here involved running discrepancy_searcher.py --explain-only on both tpch_kudu and a random HDFS table, using both the default profile and DML-only profile. This was done to quickly find bugs in the statement generation, as they tend to bubble up as analysis errors. I expect to make other changes as follow-on patches and more random statements find small test issues. For actual use against Kudu data, you need to migrate data from Kudu into PostgreSQL 5 (instructions in tests/comparison/POSTGRES.txt) and run something like: tests/comparison/discrepancy_searcher.py \ --use-postgresql \ --postgresql-port 5433 \ --profile dmlonly \ --timeout 300 \ --db-name tpch_kudu \ --query-count 10 Change-Id: I842b41f0eed07ab30ec76d8fc3cdd5affb525af6 Reviewed-on: http://gerrit.cloudera.org:8080/5486 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2017-01-13 01:31:47 +00:00
Michael Brown	54665120cb	IMPALA-4355: random query generator: modify statement execution flow to support DML - Rework the discrepancy searcher to run DML statements. We do this by using the query profile to choose a table, copy that table, and generate a statement that will INSERT into that copy. We chose a slow copy over other methods because INSERTing into a copy is a more reliable test that prevents table sizes from getting out of hand or time-consuming replay to reproduce a particular statement. - Introduce a statement generator stub. The real generator work is tracked in IMPALA-4351 and IMPALA-4353. Here we simply generate a basic INSERT INTO ... VALUES statement to make sure our general query execution flow is working. - Add query profile stub for DML statements (INSERT-only at this time). Since we'll want INSERT INTO ... SELECT very soon, this inherits from DefaultProfile. Also add building blocks for choosing random statements in the DefaultProfile. - Improve the concept of an "execution mode" and add new modes. Before, we had "RAW", "CREATE_TABLE_AS", and "CREATE_VIEW_AS". The idea here is that some random SELECT queries could be generated as "CREATE TABLE\|VIEW AS" at execution time, based on weights in the query profile. First, we remove the use of raw string literals for this, since raw string literals can be error-prone, and introduce a StatementExecutionMode class to contain a namespace for the enumerated statement execution modes. Second, we introduce a couple new execution modes. The first is DML_SETUP: this is a DML statement that needs to be run in both the test and reference databases concurrently. For our purposes, it's the INSERT ... SELECT that copies data from the chosen random table into the table copy. The second is DML_TEST: this is a randomly-generated DML statement. - Switch to using absolute imports in many places. 
There was a mix of absolute and relative imports happening here, and they were causing problems, especially when comparing data types. In Python, <class 'db_types.Int'> != <class 'tests.comparison.db_types.Int'>. Using from __future__ import absolute_import didn't seem to catch the relative import usage anyway, so I haven't employed that. - Rename some, but not nearly all, names from "query" to "statement". Doing this is a rather large undertaking leading to much larger diffs and testing (IMPALA-4602). - Fix a handful of flake8 warnings. There are a bunch that went unfixed for over- and under-indentation. - Testing o ./discrepancy_searcher.py runs with and without --explain-only, and with --profile default and --profile dmlonly. For tpch_kudu data, it seems sufficient to use a --timeout of about 300. o Leopard run to make sure standard SELECT-only generation still works o Generated random stress queries locally o Generated random data locally Change-Id: Ia4c63a2223185d0e056cc5713796772e5d1b8414 Reviewed-on: http://gerrit.cloudera.org:8080/5387 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2017-01-12 21:40:39 +00:00
Lars Volker	8b7f876649	IMPALA-4722: Disable log caching in test_scratch_disk test_scratch_disk fails sporadically when trying to assert the presence of log messages. This is probably caused by log caching, since after such failures the log files do contain the lines in question. I manually tested this by running the tests repeatedly for 2 days (10k runs). To make future diagnosis of similar problems easier, this change also adds more output to assert_impalad_log_contains(). Change-Id: I9f21284338ee7b4374aca249b6556282b0148389 Reviewed-on: http://gerrit.cloudera.org:8080/5669 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-12 18:58:48 +00:00
Tim Armstrong	75027c913b	IMPALA-4745: fix TestScratchLimit failure on S3 The commit "IMPALA-3202,IMPALA-2079: rework scratch file I/O" improved efficiency of scratch file use in some scenarios. TestScratchLimit::test_with_low_scratch_limit started failing on S3, because it expects to use more than 50MB of scratch space. Testing: Ran the test in a loop locally for 50+ iterations - didn't see any failures. Change-Id: I607b4c6ad10eba0e6c7bc8d6e640d42da26ee6c8 Reviewed-on: http://gerrit.cloudera.org:8080/5654 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-11 03:47:29 +00:00
Lars Volker	ac59489df9	IMPALA-4751: Remove blank line from raw_text template The additional blank line can break tooling which uses the /query_profile_encoded endpoint and has been erroneously introduced in the fix for IMPALA-3918. Change-Id: I9b688aa9e2423b0271c8891a983e5b22707d8dbc Reviewed-on: http://gerrit.cloudera.org:8080/5664 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-10 21:16:57 +00:00
Jim Apple	0ee6d19d59	IMPALA-4742: Change "{}".format() to "{0}".format() for Py 2.6 From the Python docs: "Changed in version 2.7: The positional argument specifiers can be omitted, so '{} {}' is equivalent to '{0} {1}'." http://gerrit.cloudera.org:8080/5401 used the newer form, "{}".format(). This change uses the older backwards-compatible form. Change-Id: If78b9b4061ca191932ac5b0b14e0ee8951a9d4e8 Reviewed-on: http://gerrit.cloudera.org:8080/5641 Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-08 23:52:01 +00:00
Jim Apple	9fa2ff7138	IMPALA-2605: Omit the sort and mini stress tests These stress tests were sometimes causing the end-to-end tests to hang indefinitely, including in the pre-merge testing (sometimes called "GVO" or "GVM"). This patch also prints to stdout some connections metrics that may prove useful for debugging stress test hangs in the future. The metrics are printed before and after stress tests are run when run-tests.py is used. Change-Id: Ibd30abf8215415e0f2830b725e43b005daa2bb2d Reviewed-on: http://gerrit.cloudera.org:8080/5401 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2017-01-06 21:32:15 +00:00
Tim Armstrong	95ed4434f2	IMPALA-3202,IMPALA-2079: rework scratch file I/O Refactor BufferedBlockMgr/TmpFileMgr to push more I/O logic into TmpFileMgr, in anticipation of it being shared with BufferPool. TmpFileMgr now handles: * Scratch space allocation and recycling * Read and write I/O The interface is also greatly changed so that it is built around Write() and Read() calls, abstracting away the details of temporary file allocation from clients. This means the TmpFileMgr::File class can be hidden from clients. Write error recovery: Also implement write error recovery in TmpFileMgr. If an error occurs while writing to scratch and we have multiple scratch directories, we will try one of the other directories before cancelling the query. File-level blacklisting is used to prevent excessive repeated attempts to resize a scratch file during a single query. Device-level blacklisting is not implemented because it is problematic to permanently take a scratch directory out of use. To reduce the number of error paths, all I/O errors are now handled asynchronously. Previously errors creating or extending the file were returned synchronously from WriteUnpinnedBlock(). This required modifying DiskIoMgr to create the file if not present when opened. Also set the default max_errors value in the thrift definition file, so that it is in effect for backend tests. Future Work: * Support for recycling variable-length scratch file ranges. I omitted this to avoid making the patch even larger. Testing: Updated BufferedBlockMgr unit test to reflect changes in behaviour: * Scratch space is no longer permanently associated with a block, and is remapped every time a new block is written to disk. * Files are now blacklisted - updated existing tests and enable the disable blacklisting test. Added some basic testing of recycling of scratch file ranges in the TmpFileMgr unit test. I also manually tested the code in two ways. 
First by removing permissions for /tmp/impala-scratch and ensuring that a spilling query fails cleanly. Second, by creating a tiny ramdisk (16M) and running with two scratch directories: one on /tmp and one on the tiny ramdisk. When spilling, an out of space error is encountered for the tiny ramdisk and impala spills the remaining data (72M) to /tmp. Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17 Reviewed-on: http://gerrit.cloudera.org:8080/5141 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-05 02:26:24 +00:00
Lars Volker	25ebf586e0	IMPALA-4689: Fix computation of last active time The last active time in impala-server.cc#L1806 is in milliseconds, but the TimestampValue c'tor expects seconds. This change also renames some variables to make their meaning more explicit, aiming to prevent similar bugs in the future. This change also fixes a bug that occurred when during startup of the local minicluster the operating system PIDs would wrap around. This way the first impalad would not be the one with the smallest PID and ImpalaCluster.get_first_impalad() would return the wrong one. I ran git-clang-format on the change. Change-Id: I283564c8d8e145d44d9493f4201555d3a1087edf Reviewed-on: http://gerrit.cloudera.org:8080/5546 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2017-01-04 12:12:04 +00:00
Taras Bobrovytsky	2159beee89	IMPALA-4467: Add support for DML statements in stress test - Add support for insert, upsert, update and delete statements. - Add support for compute stats with mt_dop query options. - Update impyla version in order to be able to have access to query error text for DML queries. - Made flake8 fixes. flake8 on this file is clean. For every Kudu table in the databases, we make a copy and add a '_original' suffix to the table name. The DML queries will only make modifications to the non-original table, the original table will never be modified. The original tables could be used to bring the non-original table to the initial state. Two flags were added for doing this: --reset-databases-before-binary-search and --reset-databases-after-binary-search. The DML queries are generated based on the mod values passed in with the following flag: --dml-mod-values 11 13 17. For each mod value 4 DML queries are generated. The DML operations will touch table rows where primary_key % mod_value = 0. So, the larger the mod value, the fewer rows would be affected. The DML queries are generated in such a way that the data for the insert, upsert, and update queries is taken from the table with the _original suffix. The stress test generates DML queries for only Kudu databases. For example, --tpch-kudu-db=tpch_100_kudu --tpch-db=tpch_100 --generate-dml-queries would only generate queries for the tpch_100_kudu database. 
Here's an example of a full call with the new options that runs the stress test on the local mini cluster: ./concurrent_select.py \ --tpch-kudu-db=tpch_kudu \ --generate-dml-queries \ --dml-mod-values 11 13 17 \ --generate-compute-stats-queries \ --select-probability=0.5 \ --mem-limit-padding-pct=25 \ --mem-limit-padding-abs=50 \ --reset-databases-before-binary-search \ --reset-databases-after-binary-search Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Reviewed-on: http://gerrit.cloudera.org:8080/5093 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Impala Public Jenkins	2016-12-20 01:33:01 +00:00
David Knupp	6c5f8e3f5e	IMPALA-4639: Add pytest option and xfail markers for tests that only run locally. As we're beginning to run Impala end-to-end tests on remote clusters, we're finding some tests that do not pass for infrastructure-related reasons (as opposed to product issues.) It would be useful to be able to xfail any tests that we know to be problematic within a given module, yet still run the others. This way, we can get passing test runs as we're ironing out those infrastructure issues. Change-Id: Id4d6e46dc1e64ad20c727ccb19af7a9f3daf917f Reviewed-on: http://gerrit.cloudera.org:8080/5446 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-12-15 02:45:50 +00:00
Tim Armstrong	246acba0b3	IMPALA-4659: fuzz test fixes * Apply a 512m mem_limit to all fuzz tests. This limits aggregate memory consumption to ~5GB per daemon(assuming 10 concurrent tests). * Refactor the exec option handling to use the exec_option dimension. This avoids executing the test multiple times redundantly * Remove the xfails to reduce noise, since there is no immediate plan to fix the product bugs. Instead just pass the tests. Testing: Ran in a loop for ~1h to flush out flakiness. Change-Id: Ie1942ceef252ec3e6171a0a54722b66a7d9abbd7 Reviewed-on: http://gerrit.cloudera.org:8080/5502 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2016-12-15 01:31:28 +00:00
Matthew Jacobs	652e7d56d9	IMPALA-4654: KuduScanner must return when ReachedLimit() Fixes a bug in the KuduScanner where the scan node's limit was not respected and thus the scanner thread would continue executing until the scan range was fully consumed. This could result in completed queries leaving fragments running and those threads could be using significant CPU and memory. For example, the query 'select * from tpch_kudu.lineitem limit 90' when running in the minicluster and lineitem is partitioned into 3 hash partitions would end up leaving a scanner thread running for ~60 seconds. In real world scenarios this can cause unexpected resource consumption. This could build up over time leading to query failures if these queries are submitted frequently. The fix is to ensure KuduScanner::GetNext() returns with eos=true when it finds ReachedLimit=true. An unnecessary and somewhat confusing flag 'batch_done' was being returned by a helper function DecodeRowsIntoRowBatch, which isn't necessary and was removed in order to make it more clear how the code in GetNext() should behave. Change-Id: Iaddd51111a1b2647995d68e6d37d0500b3a322de Reviewed-on: http://gerrit.cloudera.org:8080/5493 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-12-14 23:24:47 +00:00
Matthew Jacobs	73e41cea19	IMPALA-4642: Fix TestFragmentLifecycle failures; kudu test must wait Fixes test failures in TestFragmentLifecycle when it runs after TestKuduMemLimits which takes some time for all fragments to finish closing, even though the query is finished. TestFragmentLifecycle checks that there are no fragments in flight. For now, this fixes the tests by forcing TestKuduMemLimits to wait for all 'in flight' fragments to complete before continuing. We still need to understand why the KuduScanNode/KuduScanner is taking so long to Close() (see IMPALA-4654). Change-Id: Ia655a37ff06e92cc55ba05f01d5e94fe39447c65 Reviewed-on: http://gerrit.cloudera.org:8080/5481 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Internal Jenkins	2016-12-13 03:12:42 +00:00
Lars Volker	1e683d4ee6	IMPALA-4403: Implement SHOW RANGE PARTITIONS for Kudu tables Change-Id: Idf5b2fdd02938a42fa59ec98884e4ac915dd1f65 Reviewed-on: http://gerrit.cloudera.org:8080/5390 Reviewed-by: Lars Volker <lv@cloudera.com> Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-12-10 00:05:50 +00:00
Lars Volker	e1a6db7609	Bump Kudu server version to latest master (a70c905006) This also re-enabled kudu_alter.test, which was disabled in IMPALA-4628. Change-Id: Ie5acdeffea7ed9a68ce0f48d1f68c6c922044704 Reviewed-on: http://gerrit.cloudera.org:8080/5427 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-12-09 19:24:50 +00:00
Lars Volker	02b5cce846	IMPALA-4628: Disable broken kudu test to unblock GVOs Change-Id: I30d45acb26eb3e709a1994a89e8444ca9530d8cc Reviewed-on: http://gerrit.cloudera.org:8080/5428 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2016-12-08 21:58:46 +00:00
Tim Armstrong	88448d1d4a	IMPALA-4586: don't constant fold in backend This patch ensures that setting the query option enable_expr_rewrites=false will disable both constant folding in the frontend (which it did already) and constant caching in the backend (which is enabled in this patch). This gives a way for users to revert to the old behaviour of non-deterministic UDFs before these optimisations were added in Impala 2.8. Before this patch, the backend would cache values based on IsConstant(). This meant that there was no way to override caching of values of non-deterministic UDFs, e.g. with enable_expr_rewrites. After this patch, we only cache literal values in the backend. This offers the same performance as before in the common case where the frontend will constant fold the expressions anyway. Also rename some functions to more cleanly separate the backend concepts of "constant" expressions and expressions that can be evaluated without a TupleRow. In a future change (IMPALA-4617) we should remove the IsConstant() analysis logic from the backend entirely and pass the information from the frontend. We should also fix isConstant() in the frontend so that it only returns true when it is safe to constant-fold the expression (IMPALA-4606). Once that is done, we could revert back to using IsConstant() instead of IsLiteral(). Testing: Added targeted test to test constant folding of UDFs: we expect different results depending on whether constant folding is enabled. Also run TestUdfs with expr rewrites enabled and disabled, since this can exercise different code paths. Refactored test_udfs somewhat to avoid running uninteresting combinations of query options for targeted tests and removed some 'drop * if not exists' statements that aren't necessary when using unique_database. This change revealed flakiness in test_mem_limit, which seems to have only worked by coincidence. Updated TrackAllocation() to actually set the query status when a memory limit is exceeded. 
Looped this test for a while to make sure it isn't flaky any more. Also fix other test bugs where the vector argument is modified in-place, which can leak out to other tests. Change-Id: I0c76e3c8a8d92749256c312080ecd7aac5d99ce7 Reviewed-on: http://gerrit.cloudera.org:8080/5391 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2016-12-08 04:53:53 +00:00
Taras Bobrovytsky	1083639ff2	IMPALA-4585: Allow the $DATABASE template in the CATCH section In a recent change (IMPALA-4363) we introduced a change where all file paths in .test files should be replaced with '__HDFS_FILENAME__'. This caused problems for tests on non-HDFS file systems and we also lost some test coverage. This patch fixes the problem by allowing the $DATABASE template in the catch section of the .test file. Change-Id: If0f6ae8dea7ac4cdaf0c61ebd8f0c589c353a96e Reviewed-on: http://gerrit.cloudera.org:8080/5372 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2016-12-08 02:20:50 +00:00
Michael Ho	9337518137	IMPALA-4595: Ignore discarded functions after linking For LLVM IR UDF, Impalad will link an external LLVM module in which the IR UDF is defined with the main module. If it happens that a symbol is defined in both modules, LLVM may choose to discard the one defined in the external module. The discarded function and its callee will not be present in the linked module. In IMPALA-4595, udf-sample.cc was compiled without any optimization. Duplicated definition such as StringVal::null() may have different inlining level between the external module and the main module. When the duplicated definition in the external module is discarded, some of its callee functions (which are not inlined) may not be defined in the main module so they can no longer be located in the linked module. This trips up some code in the LlvmCodegen::LinkModule(). In particular, when parsing for functions in external module which are materialized during linking, certain functions may not be present due to the reason above. Impalad will hit a DCHECK in debug build or crash due to null pointer access in release build. This change fixes the problem above by taking into account that certain functions may not be defined anymore after linking. This change also fixes two incorrect status propagation in fe-support.cc. Change-Id: Iaa056a0c888bfcc95b412e1bc1063bb607b58ab7 Reviewed-on: http://gerrit.cloudera.org:8080/5384 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins	2016-12-07 22:52:35 +00:00
Dan Burkert	f83652c1da	Replace INTO N BUCKETS with PARTITIONS N in CREATE TABLE This commit also removes the now unused `DISTRIBUTE`, `SPLIT`, and `BUCKETS` keywords that were going to be newly released in Impala 2.6, but are now unused. Additionally, a few remaining uses of the `DISTRIBUTE BY` syntax have been switched to `PARTITION BY`. Change-Id: I32fdd5ef26c532f7a30220db52bdfbf228165922 Reviewed-on: http://gerrit.cloudera.org:8080/5382 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-12-07 07:31:16 +00:00
Michael Ho	9b80224f9f	IMPALA-2925: Mark test_alloc_update as xfail. test_alloc_update.py is flaky and the expected failure sometimes doesn't occur. Mark this test as xfail for now to unblock the build. Change-Id: If4e86e7b9c064bc78b672814cd3569453ecc268d Reviewed-on: http://gerrit.cloudera.org:8080/5366 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-12-06 12:37:17 +00:00
Dimitris Tsirogiannis	cba93f1ac3	IMPALA-4561: Replace DISTRIBUTE BY with PARTITION BY in CREATE TABLE Change-Id: I0e07c41eabb4c8cb95754cf04293cbd9e03d6ab2 Reviewed-on: http://gerrit.cloudera.org:8080/5317 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2016-12-06 10:41:53 +00:00
Alex Behm	7efa08316e	IMPALA-4572: Run COMPUTE STATS on Parquet tables with MT_DOP=4. COMPUTE STATS on Parquet tables is run with MT_DOP=4 by default. COMPUTE STATS on non-Parquet tables will run without MT_DOP. Users can always override the behavior by setting MT_DOP manually. Setting MT_DOP to 0 means a statement will be run in the conventional execution mode (without intra-node parallelism based on multiple fragment instances). Users can set a higher MT_DOP even for Parquet tables. Testing: Added a new test that checks the effective MT_DOP. Locally ran test_mt_dop.py, test_scanners.py, test_nested_types.py, test_compute_stats.py, and test_cancellation.py. Change-Id: I2be3c7c9f3004e9a759224a2e5756eb6e4efa359 Reviewed-on: http://gerrit.cloudera.org:8080/5315 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-12-03 22:28:53 +00:00
Henry Robinson	b9034ea0d5	IMPALA-4580: Fix crash with FETCH_FIRST when #rows < result cache size The following sequence can lead to a crash: 1. Client sets result cache size to N 2. Client issues query with #results < N 3. Client fetches all results, triggering eos and tearing down Coordinator::root_sink_. 4. Client restarts query with FETCH_FIRST. 5. Client reads all results again. After cache is exhausted, Coordinator::GetNext() is called to detect eos condition again. 6. GetNext() hits DCHECK(root_sink_ != nullptr). This patch makes GetNext() a no-op if called after it sets *eos, avoiding the crash. Testing: Regression test that triggered the bug before this fix. Change-Id: I454cd8a6cf438bdd0c49fd27c2725d8f6c43bb1d Reviewed-on: http://gerrit.cloudera.org:8080/5335 Reviewed-by: Henry Robinson <henry@cloudera.com> Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-12-03 11:07:04 +00:00
Taras Bobrovytsky	858f5c2197	IMPALA-4363: Add Parquet timestamp validation Before this patch, we would simply read the INT96 Parquet timestamp representation and assume that it's valid. However, not all bit permutations represent a valid timestamp. One of the boost functions raised an exception (that we didn't catch) when passed an invalid boost date object, which resulted in a crash. This patch fixes the problem by validating that the date falls into 1400..9999 year range as we are scanning Parquet. Change-Id: Ieaab5d33e6f0df831d0e67e1d318e5416ffb90ac Reviewed-on: http://gerrit.cloudera.org:8080/5343 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2016-12-03 06:41:07 +00:00
Thomas Tauber-Marshall	7bcb51b152	IMPALA-4357: Fix DROP TABLE to pass analysis if the table fails to load If a table fails to load, eg. because it was deleted externally from Kudu, we should still allow 'DROP TABLE' to pass analysis. Otherwise, you may be unable to drop tables that are in a bad state. Testing: - Updates existing Kudu tests to reflect the new behavior, and fixes a couple of problems with those tests that were causing them to pass spuriously (as well as fixing the same problem with another test in the file while I'm here). Change-Id: I6b41fc3c0e95508ab67f1d420b033b02ec75a5da Reviewed-on: http://gerrit.cloudera.org:8080/5144 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-12-02 21:58:03 +00:00
Michael Brown	8d4f8d8d93	IMPALA-4343,IMPALA-4354: qgen: model INSERTs; write INSERTs from query model This patch adds support to the random query generator infrastructure to model and write SQL INSERTs. It does not actually randomly generate INSERTs at this time (tracked in IMPALA-4353 and umbrella task IMPALA-3740) but does provide necessary building blocks to do so. First, it's necessary to model the INSERTs as part of our data model. This was done by taking the current notion of a Query and making it a SelectQuery. We also then create an abstract Query containing some of the more common methods and attributes. We then model an INSERT query, INSERT clause, and VALUES clause (IMPALA-4343). Second, it's necessary to test the basics of this data model. It made sense to go ahead and implement the necessary SqlWriter methods to write the SQL for these clauses (IMPALA-4354). I could then use this writer with some existing and new tests that take a query written into our data model and write the SQL, verifying they're correct. For INSERT into Kudu tables, the equivalent PostgreSQL queries need to use "ON CONFLICT DO NOTHING", so all existing and new query tests verify they can be written as PostgreSQL as well. Testing: - all the query generator tests pass - I can run Leopard front_end.py and load older query generator reports, browse them, and re-run failed queries - I can run Leopard controller.py to actually do a query generator run - discrepancy_searcher.py --explain-only ran for hundreds of queries. There were no problems writing the SELECT queries Change-Id: I38e24da78c49e908449b35f0a6276ebe4236ddba Reviewed-on: http://gerrit.cloudera.org:8080/5162 Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>	2016-12-02 20:49:43 +00:00
Matthew Jacobs	48983b3893	IMPALA-4567: Fix test_kudu_alter_table exhaustive failures The issue is that we set the Kudu table name explicitly via tblproperty so it doesn't have the unique db name in the underlying Kudu name. Meanwhile, the tests are run concurrently in exhaustive so this test may end up running multiple times (w/ different parameters, e.g. disable_codegen) concurrently. This test needs to be run serially. Change-Id: Ibca64d5567c24240606e454b052d130fcd0c3968 Reviewed-on: http://gerrit.cloudera.org:8080/5312 Reviewed-by: David Knupp <dknupp@cloudera.com> Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2016-12-02 04:01:19 +00:00
Tim Armstrong	b374061206	IMPALA-4564,IMPALA-4565: mt_dop fixes for old aggs and joins Fix a test bug where we need to skip nested types tests for the old aggs and joins. Fix a product bug where *eos is not initialised by the MT scan node. This causes incorrect results when the calling ExecNode does not initialise the eos variable, e.g. the sort node and the old agg and join nodes. Testing: Added a test that reproduces the incorrect results with the sort node when run under ASAN Tested the mt_dop tests locally with old aggs and joins to ensure they pass. Change-Id: I48c50c8aa0c23710eb099fba252bc3c0cb74b313 Reviewed-on: http://gerrit.cloudera.org:8080/5302 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2016-12-02 01:46:55 +00:00
Michael Ho	a41918d443	Fix E2E test infrastructure to handle missing exceptions correctly This change fixes a bug in the E2E infrastructure that handles the case when an expected exception wasn't thrown. The code was expecting that test_section['CATCH'] to be a string but in reality it's a list of strings. It also clarifies the error message about the missing exception. This change also enforces that the CATCH subsection in tests cannot be empty. Change-Id: I7d83c5db59e8a239e4e70694a1e625af6f21419c Reviewed-on: http://gerrit.cloudera.org:8080/5260 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Internal Jenkins	2016-12-01 23:43:03 +00:00
Jim Apple	5a158dbcd1	IMPALA-4543: Properly escape ignored tests subdirectories. In the shell, double-quoted strings are not very close to "raw" strings; double quotes end the string, but parameter expansion is also performed for strings like "${FOO}". To pass strings from Python to the shell, I have replaced double quotes with single quotes and escaped the single quote characters in the strings. While I am here, add better logging in TestExecutor.run_tests to make errors like this easier to diagnose. Change-Id: I006eb559ec5f5b5b0379997fab945116dfc7e8d7 Reviewed-on: http://gerrit.cloudera.org:8080/5242 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2016-11-30 22:30:39 +00:00
Michael Brown	585ed5aaae	IMPALA-4450: qgen: use string concatenation operator for postgres queries The random query generator writes a logical query Python object into Impala or PostgreSQL dialects. When the CONCAT() function is chosen, Impala's and PostgreSQL's CONCAT() implementations behave differently. However, PostgreSQL has a || operator that functions like Impala's CONCAT(). The method added here overrides the default behavior for the PostgresqlSqlWriter. It prevents CONCAT(arg1, arg2, ..., argN) from being written and instead causes the SQL to be written as 'arg1 || arg2 || ... || argN'. Testing: I made sure that we generate syntactically valid queries still on the PostgreSQL side. This includes queries that made use of string concatenation. I also re-ran some failed queries that previously produced different results. They now produce the same results. This is a very straightforward change, so unit or functional tests for this seem overkill. The full effects of using || instead of CONCAT() are hard to test. It's not clear if in my manual testing of || vs. CONCAT() that I missed some edge behavior, especially in some complicated query, nested expressions, GROUPing BY, and so on. Change-Id: I149b695889addfd7df4ca5f40dc991456da51687 Reviewed-on: http://gerrit.cloudera.org:8080/5034 Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Sailesh Mukil <sailesh@cloudera.com>	2016-11-30 16:22:56 +00:00
Dimitris Tsirogiannis	9f497ba02f	IMPALA-2890: Support ALTER TABLE statements for Kudu tables With this commit, we add support for additional ALTER TABLE statements against Kudu tables. The newly supported ALTER TABLE operations for Kudu are: - ADD/DROP range partitions. Syntax: ALTER TABLE <tbl_name> ADD [IF NOT EXISTS] RANGE <kudu_partition_spec> ALTER TABLE <tbl_name> DROP [IF EXISTS] RANGE <kudu_partition_spec> - ADD/DROP/RENAME column. Syntax: ALTER TABLE <tbl_name> ADD COLUMNS (col_spec, [col_spec, ...]) ALTER TABLE <tbl_name> DROP COLUMN <col_name> ALTER TABLE <tbl_name> CHANGE COLUMN <old> <new_name> <type> - Rename Kudu table using the 'kudu.table_name' table property. Example: ALTER TABLE <tbl_name> SET TBLPROPERTIES ('kudu.table_name'='<new_name>'), will change the underlying Kudu table name to <new_name>. - Renaming the HMS/Catalog table entry of a Kudu table is supported using the existing ALTER TABLE <tbl_name> RENAME TO <new_tbl_name> syntax. Not supported: - ALTER TABLE <tbl_name> REPLACE COLUMNS Change-Id: I04bc87e04e05da5cc03edec79d13cedfd2012896 Reviewed-on: http://gerrit.cloudera.org:8080/5136 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2016-11-30 04:55:03 +00:00
Tim Armstrong	e2cde13a2b	IMPALA-4519: increase timeout in TestFragmentLifecycle Increase the timeout to over 120s to match datastream_sender_timeout_ms. This should avoid spurious test failures if we are unlucky and a sender gets stuck waiting for a receiver fragment that will never start. Testing: Ran the test in a loop for a while to flush out any flakiness. Change-Id: I9fe6e6c74538d0747e3eeb578cf0518494ff10c8 Reviewed-on: http://gerrit.cloudera.org:8080/5244 Tested-by: Impala Public Jenkins Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2016-11-29 01:46:05 +00:00
Tim Armstrong	fe8d994f0f	IMPALA-4541: fix test dimensions for test_codegen_mem_limit The test should only be run with codegen enabled. Change-Id: Iac460d2a1b69de638c557d7c8aa318a73ad0507b Reviewed-on: http://gerrit.cloudera.org:8080/5221 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-25 23:32:24 +00:00
Tim Armstrong	16552f6eda	IMPALA-4525: fix crash when codegen mem limit exceeded The error path in OptimizeLlvmModule() has not worked correctly for a long time because various places in the code assume that codegen'd function pointers will be filled in (e.g. ScalarFnCall). Since the recent change "IMPALA-4397,IMPALA-3259: reduce codegen time and memory" it is more likely to go down this path. The cases when errors occur on this path (memory limit exceeded, internal codegen bugs, and corrupt IR UDFs) are all cases when it is not correct or safe to continue executing the query, so we should just fail the query. Testing: Add a test where codegen reliably fails with memory limit exceeded. Change-Id: Ib38d0a44b54c47617cad1b971244f477d344d505 Reviewed-on: http://gerrit.cloudera.org:8080/5211 Reviewed-by: Michael Ho <kwho@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-24 08:03:39 +00:00
Henry Robinson	e4fc5bd5c5	IMPALA-4488: HS2 GetOperationStatus() should keep session alive GetOperationStatus() is used by DBC drivers to check if a query is ready for its results to be fetched. However, it did not keep the associated session alive, so queries would time out if they took longer than the timeout to materialize their first rows to be fetched. * Add withSession() to GetOperationStatus() * Add a test that failed before this patch, and succeeds after. Change-Id: Ibb3f66188209563b4b74b2ca96480f16ace0f190 Reviewed-on: http://gerrit.cloudera.org:8080/5213 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-24 05:39:45 +00:00
Alex Behm	bbf5255d0e	IMPALA-1788: Fold constant expressions. Adds a new ExprRewriteRule for replacing constant expressions with their literal equivalent via BE evaluation. Applies the new rule together with the existing ones on the parse tree, after analysis. Limitations - Constant folding is applied on the unresolved expressions. As a result, it only works for expressions that are constant within a single query block, as opposed to expressions that may become constant after fully substituting inline-view exprs. - Exprs are not normalized, so some opportunities for constant folding are missed for certain expr-tree shapes. This patch includes the following interesting changes: - Introduces a timestamp literal that can only be produced by constant folding (not expressible directly via SQL). - To make sure that rewrites have no user-visible effect, the original result types and column labels of the top-level statement are restored after the rewrites are performed. - Does not fold exprs if their evaluation resulted in a warning or error, or if the resulting value is not representable by the corresponding FE LiteralExpr. - Fixes an existing issue with converting strings between the FE/BE. Strings produced in the BE that have characters with a value > 127 are not correctly deserialized into a Java String via thrift. We detect this case during constant folding and abandon folding of such exprs. - Fixes several issues with detecting/reporting errors in NativeEvalConstExprs(). - Cleans up ExprContext::GetValue() into ExprContext::GetConstantValue() which clarifies its only use of evaluating exprs from the FE. Testing: - Modifies expr-test.cc to run all tests through the constant folding path. - Adds basic planner and rewrite rule tests. - Exhaustive test run passed Change-Id: If672b703db1ba0bfc26e5b9130161798b40a69e9 Reviewed-on: http://gerrit.cloudera.org:8080/5109 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-23 21:11:30 +00:00
Michael Ho	1e306211d0	IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and fixes filter stats updates This change codegens HdfsParquetScanner::EvalRuntimeFilters() by unrolling its loop, codegen'ing the expression evaluation of the runtime filter and replacing some type information with constants in the hashing function of runtime filter to avoid branching at runtime. This change also fixes IMPALA-4495 by not counting a row as 'considered' in the filter stats before the filter arrives. This avoids unnecessarily marking a runtime filter as ineffective before it's even used. With this change, TPCDS-Q88 improves by 13-14%. primitive_broadcast_join_1 improves by 24%. Change-Id: I27114869840e268d17e91d6e587ef811628e3837 Reviewed-on: http://gerrit.cloudera.org:8080/4833 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Internal Jenkins	2016-11-23 12:48:47 +00:00
Tim Armstrong	4db330e69a	IMPALA-4397,IMPALA-3259: reduce codegen time and memory A handful of fixes to codegen memory usage: * Delete the IR module when we're done with it (it can be fairly large) * Track the compiled code size (typically not that large, but it can add up if there are many fragments). * Estimate optimisation memory requirements and track it in the memory tracker. This is very crude but much better than not tracking it. A handful of fixes to improve codegen time/cost, particularly targeted at compute stats workloads: * Avoid over-inlining when there are many aggregate functions, conjuncts, etc by adding "NoInline" attributes. * Don't codegen non-grouping merge aggregations. They will only process one row per Impala daemon, so codegen is not worth it. * Make the Hll algorithm more efficient by specialising the hash function based on decimal width. Limitations: * This doesn't tackle over-inlining of large expr trees, but a similar approach will be used there in a follow-on patch. Perf: Compute stats on functional_parquet.widetable_1000_cols goes from 1min+ of codegen to ~ 5s codegen on my machine. Local perf runs of tpc-h and targeted perf showed no regressions and some moderate improvements (1-2%). Also did an experiment to understand the perf consequences of disabling inlining. I manually set CODEGEN_INLINE_EXPRS_THRESHOLD to 0, and ran: drop stats tpch_20_parquet.lineitem compute stats tpch_20_parquet.lineitem; There was no difference in time spent in the agg node: 30.7s with inlining, 30.5s without. Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091 Reviewed-on: http://gerrit.cloudera.org:8080/4956 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2016-11-23 08:18:17 +00:00
David Knupp	696fb68e58	IMPALA-4510: Selectively filter args for metric verification tests run-tests.py is a wrapper around impala-py.test. It abstracts away the need to invoke separate runs for serial tests, parallel tests, and metric verification tests. Because it's possible for a user to specify certain test suites, or even specific tests, on the command line when calling run-tests.py, it had been necessary to override the command line args when it came time to run the metric verification tests -- otherwise those other tests/suites would be rerun. Before this patch, we had simply been stripping away all command line args. However, that blanket approach causes problems when running tests against a remote cluster, because we need to retain those command line args that pertain to the remote cluster. This patch selectively prunes unwanted command line args for the last metric verification test stage, keeping the ones that we need, and also adds extensive documentation for explaining why we have to go through this fairly odd and elaborate step. This patch was tested by running a sample test suite locally, and against a remote cluster. Previously, the metric verification stage had been failing for remote cluster tests (since they were defaulting to localhost for services that were only available remotely.) With the patch, the remote verification tests were passing. Also, while I'm here, add a small change that exits immediately if the user calls for --help. Before this, we actually still ran the tests. Change-Id: I069172f44c1307d55f85779cdb01fecc0ba1799e Reviewed-on: http://gerrit.cloudera.org:8080/5135 Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Internal Jenkins	2016-11-23 05:46:18 +00:00
Alex Behm	8f2bb2f72f	IMPALA-3809: Show Kudu-specific column metadata in DESCRIBE. TODO: - Corresponding changes to DESCRIBE EXTENDED/FORMATTED. Testing: A private core/hdfs run passed. Change-Id: I83c91b540bc6d27cb4f21535fe12f3f8658c233e Reviewed-on: http://gerrit.cloudera.org:8080/5125 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-22 23:06:05 +00:00
Michael Ho	b7eeb8bf85	IMPALA-4432: Handle internal codegen disabling properly There are some conditions in which codegen is disabled internally even if it's enabled in the query option. For instance, the single node optimization or the expression evaluation requests sent from the FE to the BE. These internal disabling of codegen are advisory as their purposes are to reduce the latency for tables with no or very few rows. The internal disabling of codegen doesn't interact well with UDFs which cannot be interpreted (e.g. IR UDF) as it conflates the 'disable_codegen' query option set by the user. As a result, it's hard to differentiate between when codegen is disabled explicitly by users and when it is disabled internally. This change fixes the problem above by adding an explicit flag in TQueryCtx to indicate that codegen is disabled internally. This flag is only advisory. For cases in which codegen is needed to function, this internal flag is ignored and if codegen is disabled via query option, an error is thrown. For this new flag to work with ScalarFnCall, codegen needs to happen after ScalarFnCall::Prepare() because it's hard to tell if a fragment contains any UDF that cannot be interpreted until after ScalarFnCall::Prepare() is called. However, Prepare() needs the codegen object to codegen so it needs to be created before Prepare(). We can either always create the codegen module or defer codegen to a point after ScalarFnCall::Prepare(). The former has the downside of introducing unnecessary latency for say single-node optimization so the latter is implemented. It is needed as part of IMPALA-4192 any way. After this change, ScalarFnCall expressions which need to be codegen'd are inserted into a vector in RuntimeState in ScalarFnCall::Prepare(). Later in the codegen phase, these expressions' GetCodegendComputeFn() will be called after codegen for operators is done. If any of these expressions are already codegen'd indirectly by the operators, GetCodegendComputeFn() will be a no-op. This preserves the behavior that ScalarFnCall will always be codegen'd even if the fragment doesn't contain any codegen enabled operators. Change-Id: I0b6a9ed723c64ba21b861608583cc9b6607d3397 Reviewed-on: http://gerrit.cloudera.org:8080/5105 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Internal Jenkins	2016-11-22 14:56:03 +00:00
Lars Volker	8ea21d099f	IMPALA-2523: Make HdfsTableSink aware of clustered input IMPALA-2521 introduced clustering for insert statements. This change makes the HdfsTableSink aware of clustered inputs, so that partitions are opened, written, and closed one by one. This change also adds/modifies tests in several ways: - clustered insert tests switch from selecting all rows from alltypessmall to alltypes. Together with varying settings for batch_size, this results in a larger number of row batches being written. - clustered insert tests select from alltypes instead of functional.alltypes to make sure we also select from various input formats. - clustered insert tests have been added to select from alltypestiny to create inserts with 1 and 2 rows per partition respectively. - exhaustive insert tests now use different values for batch_size: 1, 16, 0 (meaning default, 1024). This is limited to uncompressed parquet files, to maintain a reasonable runtime. On my machine execution of test.insert took 1778 seconds, compared to 1002 seconds with just the default row batch size. - There is additional testing in test_insert_behaviour.py to make sure that insertion over several row batches only creates one file per partition. - It renames the test_insert method to make it unique in the file and allow for effective filtering with -k. - It adds tests to the Analyzer test suite. Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0 Reviewed-on: http://gerrit.cloudera.org:8080/4863 Reviewed-by: Lars Volker <lv@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-11-22 02:51:20 +00:00
Sailesh Mukil	178fd59142	IMPALA-4502: test_partition_ddl_predicates breaks on non-HDFS filesystems This is because that test uses 'set cached' and 'set uncached' which are not supported on non-HDFS filesystems. This patch creates a separate test file for non-HDFS filesystems with only supported queries and invokes the right file based on the filesystem. Change-Id: I8606aa427cb6e50be3395cdde246abb53db5172c Reviewed-on: http://gerrit.cloudera.org:8080/5164 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2016-11-22 00:42:57 +00:00
Dan Hecht	035b775a6d	IMPALA-4440: lineage timestamps can go backwards across daylight savings transitions Using TimestampValue (or equivalent string representation) for timestamps that require a point in time doesn't work because the same time can represent multiple points in time. For example, the timestamp: '2016-11-13 01:01 AM' occurred twice last weekend. Instead, we should use unix time directly rather than trying to derive unix time from a (timezone-less) timestamp. Note that there are other questionable uses of TimestampValue for internal Impala service stuff, but I want to fix them separately as they are not as important and fixing does add some risk. While I'm here, remove a template TimestampValue constructor that was unused and is confusing. We don't have any end-to-end tests that exercise column lineage, so add a simple custom cluster test that enables lineage and verifies the start and end unix times are within appropriate bounds. The other column lineage graph fields are at least tested via planner tests. Automated regression testing for the specific daylight savings issue is difficult as we'd have to cross the daylight savings boundary at just the right time during query execution in order to reproduce reliably. But open to ideas. Testing: - loop the new test overnight without any failures. - exhaustive run. Change-Id: I34e435fc3511e65bc62906205cb558f2c116a8a9 Reviewed-on: http://gerrit.cloudera.org:8080/5129 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-11-21 22:18:37 +00:00
Dimitris Tsirogiannis	3db5ced4ce	IMPALA-3726: Add support for Kudu-specific column options This commit adds support for Kudu-specific column options in CREATE TABLE statements. The syntax is: CREATE TABLE tbl_name ([col_name type [PRIMARY KEY] [option [...]]] [, ....]) where option is: | NULL | NOT NULL | ENCODING encoding_val | COMPRESSION compression_algorithm | DEFAULT expr | BLOCK_SIZE num The output of the SHOW CREATE TABLE statement was altered to include all the specified column options for Kudu tables. Change-Id: I727b9ae1b7b2387db752b58081398dd3f3449c02 Reviewed-on: http://gerrit.cloudera.org:8080/5026 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2016-11-18 11:41:01 +00:00
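
The quoting approach described in the IMPALA-4543 entry above (wrap strings in single quotes and escape embedded single quotes before handing them to the shell) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from that commit: the helper name `shell_single_quote` and the sample path are hypothetical, and `shlex.split` is used only to check that the quoted string round-trips.

```python
import shlex


def shell_single_quote(s):
    # Hypothetical helper, not taken from the Impala source.
    # Inside single quotes a POSIX shell performs no parameter expansion,
    # so text such as ${FOO} is passed through literally.
    # An embedded single quote is handled by closing the quoted string,
    # emitting a backslash-escaped quote, and reopening the quoted string.
    return "'" + s.replace("'", "'\\''") + "'"


if __name__ == "__main__":
    # Made-up example path containing both an expansion and a single quote.
    quoted = shell_single_quote("dir/${FOO}/it's")
    print(quoted)
    # shlex.split follows POSIX quoting rules and does no expansion,
    # so it should recover the original string as one token.
    assert shlex.split(quoted) == ["dir/${FOO}/it's"]
```

In Python 3, the standard library's `shlex.quote` serves the same purpose, although it writes an embedded single quote differently.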

1104 Commits