The TableFlattener takes a nested dataset and creates an equivalent
unnested dataset. The unnested dataset is saved as Parquet.
When an array or map is encountered in the original table, the flattener
creates a new table and adds an id column to it which references the row
in the parent table. Joining on the id column should produce the
original dataset.
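As a rough sketch of the idea (hypothetical names, not the flattener's
actual schema), an array column becomes a child table with an id column
pointing back at the parent row:

  nested = [{"name": "a", "tags": ["x", "y"]}]
  parent_table, tags_table = [], []
  for row_id, row in enumerate(nested):
      parent_table.append({"id": row_id, "name": row["name"]})
      for tag in row["tags"]:
          # child rows reference the parent row via parent_id
          tags_table.append({"parent_id": row_id, "item": tag})
  # Joining tags_table.parent_id onto parent_table.id reproduces the
  # original nested dataset.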
The flattened dataset should be loaded into Postgres in order to run the
query generator (in nested types mode) on it. There is a script that
automates generating, flattening, and loading random data into Postgres
and Impala:
testdata/bin/generate-load-nested.sh -f
Testing:
- ran ./testdata/bin/generate-load-nested.sh -f and random nested data
was generated and flattened as expected.
Change-Id: I7e7a8e53ada9274759a3e2128b97bec292c129c6
Reviewed-on: http://gerrit.cloudera.org:8080/5787
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This change modifies the logic of the NOT IN predicate so that
the planner can calculate the correct node cardinality. Prior
to this change, both IN and NOT IN predicates shared the
same selectivity, which resulted in the same cardinality
during planning.
The selectivity is calculated by the following heuristic:
selectivity = 1 - (number of predicate children / number of distinct values)
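A sketch of that heuristic (illustrative Python; the planner implements
this in the frontend):

  def not_in_selectivity(num_predicate_children, num_distinct_values):
      # selectivity = 1 - (num of predicate children / num of distinct values)
      return 1.0 - float(num_predicate_children) / num_distinct_values

  # e.g. a NOT IN predicate contributing 3 children against a column
  # with 100 distinct values: 1 - 3/100 = 0.97
  print(not_in_selectivity(3, 100))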
Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Reviewed-on: http://gerrit.cloudera.org:8080/7168
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Occasionally we'd see HBase fail to start up properly on CentOS 7
clusters. The symptom was that HBase would not open the required
nodes in Zookeeper that signal its readiness.
As a workaround, this change folds waiting for the Zookeeper nodes
into the retry logic.
Change-Id: Id8dbdff4ad02cac1322e7d580e0a6971daf6ea28
Reviewed-on: http://gerrit.cloudera.org:8080/7159
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: anujphadke <aphadke@cloudera.com>
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
This patch aligns the sorter's methods more closely with the ExecNode
methods and moves the possibly-failing parts of Reset() into Open().
Testing:
Added WARN_UNUSED_RESULT to all the sorter methods that return Status to
prevent similar issues in future.
Added a test that sometimes goes down this code path. It was able to cause
a crash at least once every 5 executions.
Ran an exhaustive build to make sure there were no other regressions.
Change-Id: I7d4f9e93a44531901e663b3f1e18edc514363f74
Reviewed-on: http://gerrit.cloudera.org:8080/7134
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
A recent addition to test_create_table_like_file (IMPALA-2525)
relies on a file, enum.parquet, being preloaded into HDFS, which
is done by create-load-data.sh.
The problem is that the test creates the table as an internal
table with its location as the directory containing enum.parquet.
When the test completes and the table is dropped, enum.parquet
is deleted, so the test cannot be successfully run again, and a
snapshot generated from the contents of HDFS afterwards will
not contain the file.
The fix is to create the table as an external table.
Testing:
- Ran the test and verified enum.parquet is still present in HDFS.
Change-Id: I6c386843e5ef5bf6fc208db1ff90be98fd8baacf
Reviewed-on: http://gerrit.cloudera.org:8080/7139
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Codegen time under ASAN can take ~10s, making the 15s timeouts for
runtime filter tests a bit small. Double those timeouts to 30s.
Change-Id: I2280e08910430e271da2173e465731bba5aef6cf
Reviewed-on: http://gerrit.cloudera.org:8080/7097
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
This change executes the tests added to subplans.test and removes
a test which incorrectly references subplannull_data.test (a file
which does not exist).
Change-Id: I02b4f47553fb8f5fe3425cde2e0bcb3245c39b91
Reviewed-on: http://gerrit.cloudera.org:8080/7038
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
When generating IR functions during codegen, we used to
always tag the functions with the "AlwaysInline" attribute.
That potentially led to excessive inlining, causing very
long optimization / compilation time with marginal performance
benefit at runtime. One of the reasons for doing it was that
the "target-cpu" and "target-features" attributes were
missing in the generated IR functions, so the LLVM inliner
considered them incompatible with the cross-compiled functions.
As a result, the inliner would not inline the generated IR
functions into cross-compiled functions and vice versa unless
the "AlwaysInline" attribute was present.
This change fixes the problem above by setting the "target-cpu"
and "target-features" attributes of all IR functions to match
those of the host's CPU, so both generated IR functions and
cross-compiled functions have the same values for those
attributes. With these attributes set, we now rely on LLVM's
inliner to determine whether a function is worth inlining.
With this change, the codegen time of a query with a very
long predicate went from 15s to 4s and the overall runtime went
from 19s to 8s.
Change-Id: I2d87ae8d222b415587e7320cb9072e4a8d6615ce
Reviewed-on: http://gerrit.cloudera.org:8080/6941
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
The bug was that the constant exprs of a union were only
evaluated for the first fragment instance. However, for
a union inside a subplan, we should always evaluate the
constant exprs.
Testing:
- Added a regression test.
- Locally ran test_nested_types.py and the union tests in
test_queries.py
Change-Id: Icd2f21f0213188e2304f8e9536019c7940c07768
Reviewed-on: http://gerrit.cloudera.org:8080/7091
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
For every new iteration of a subplan there can be leftover
rows from the previous iteration. This change transfers
ownership from the probe_batch_ to the output_batch_ and
resets the probe_batch_ on hitting the limit.
Change-Id: Iafd621d33a4e2fac42391504566ffd8dd0e18a67
Reviewed-on: http://gerrit.cloudera.org:8080/7014
Tested-by: Impala Public Jenkins
Reviewed-by: Lars Volker <lv@cloudera.com>
Adds a new query option DEFAULT_JOIN_DISTRIBUTION_MODE to
control which join distribution mode is chosen when the join
inputs have an unknown cardinality (e.g., missing stats) or when
the expected costs of the different strategies are equal.
Values for DEFAULT_JOIN_DISTRIBUTION_MODE: [BROADCAST, SHUFFLE]
Default: BROADCAST
Note that this change effectively undoes IMPALA-5120.
Testing:
- Added new planner tests
- Core/hdfs run passed
Change-Id: Ibd34442f422129d53bef5493fc9cbe7375a0765c
Reviewed-on: http://gerrit.cloudera.org:8080/7059
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
While support for TIMESTAMP columns in Kudu tables has been
committed (IMPALA-5137), it does not support TIMESTAMP
column default values.
This change adds CREATE TABLE syntax to specify such default
values, but more importantly it fixes the loading of Kudu
tables that may have had default values set on
UNIXTIME_MICROS columns, e.g. if the table was created via
the Python client. This involves fixing KuduColumn to hide
the LiteralExpr representing the default value, because it
will be a BIGINT if the column type is TIMESTAMP. The default
value is only needed for toSql() and toStringValue(), so helper
functions are added to KuduColumn to encapsulate the special
logic for TIMESTAMP.
TODO: Add support and tests for ALTER setting the default
value (when IMPALA-4622 is committed).
Change-Id: I655910fb4805bb204a999627fa9f68e43ea8aaf2
Reviewed-on: http://gerrit.cloudera.org:8080/6936
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
This change adds a query option to disable reading Parquet statistics.
It provides a workaround when dealing with files that have corrupt
parquet statistics.
Note that Impala handles Parquet files affected by PARQUET-251 correctly
by ignoring statistics for anything but plain numeric types. This query
option is meant to help with files affected by unknown errors or by
errors yet to be introduced.
Change-Id: I427f7fde40d0f4b703751e40f3c2109a850643f7
Reviewed-on: http://gerrit.cloudera.org:8080/7001
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
This is a migration of an old and broken script from another
repository. Example use:
bin/single_node_perf_run.py --ninja --workloads targeted-perf \
--load --scale 4 --iterations 20 --num_impalads 3 \
--start_minicluster --query_names PERF_AGG-Q3 \
$(git rev-parse HEAD~1) $(git rev-parse HEAD)
The script can load data, run benchmarks, and compare the statistics
of those runs for significant differences in performance. It glues
together buildall.sh, bin/load-data.py, bin/run-workload.py, and
tests/benchmark/report_benchmark_results.py.
Change-Id: I70ba7f3c28f612a370915615600bf8dcebcedbc9
Reviewed-on: http://gerrit.cloudera.org:8080/6818
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
The current code only tests with the default setting
for parquet_dictionary_filtering, which is true. This
adds a test to verify that parquet_dictionary_filtering
set to false does not filter any row groups.
Change-Id: If3175ce1d01c806d822c2782d60ca10939e7179e
Reviewed-on: http://gerrit.cloudera.org:8080/7021
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This change loads the missing tables in TPC-DS. In addition,
it also fixes up the loading of the partitioned table store_sales
so all partitions will be loaded. The existing TPC-DS queries are
also updated to use the parameters for qualification runs as noted
in the TPC-DS specification. Some hard-coded partition filters were
also removed. They were there due to the lack of dynamic partitioning
in the past. Some missing TPC-DS queries are also added to this change,
including query28 which discovered the infamous IMPALA-5251.
Having all tables in TPC-DS available paves the way for us to include
all supported TPC-DS queries in our functional testing. Due to the change
in the data, planner tests and the E2E tests have different results than
before. The results of the E2E tests were compared against runs done with
Netezza and Vertica. The divergences were all due to the truncation
behavior of decimal types in DECIMAL_V1.
Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9
Reviewed-on: http://gerrit.cloudera.org:8080/6877
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
The main idea of this patch is to use table stats to
extrapolate the row counts for new/modified partitions.
Existing behavior:
- Partitions that lack the row count stat are ignored
when estimating the cardinality of HDFS scans. Such
partitions effectively have an estimated row count
of zero.
- We always use the row count stats for partitions that
have one. The row count may be inaccurate if data in
such partitions has changed significantly.
Summary of changes:
- Enhance COMPUTE STATS to also store the total number
of file bytes in the table.
- Use the table-level row count and file bytes stats
to estimate the number of rows in a scan (sketched below).
- A new impalad startup flag is added to enable/disable
the extrapolation behavior. The feature is disabled by
default. Note that even with the feature disabled,
COMPUTE STATS stores the file bytes so you can enable
the feature without having to run COMPUTE STATS again.
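The extrapolation amounts to scaling the stats row count by the growth
in file bytes since COMPUTE STATS ran; a minimal sketch (not the exact
planner code):

  def extrapolate_row_count(stats_rows, stats_bytes, current_bytes):
      if stats_bytes <= 0:
          return -1  # unknown; fall back to the existing behavior
      # Assume roughly constant bytes per row since COMPUTE STATS.
      return int(round(float(stats_rows) / stats_bytes * current_bytes))

  # 1M rows over 1 GiB at COMPUTE STATS time; table since grew by 50%:
  print(extrapolate_row_count(10**6, 2**30, 2**30 + 2**29))  # ~1.5M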
Testing:
- Added new FE unit test
- Added new EE test
Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4
Reviewed-on: http://gerrit.cloudera.org:8080/6840
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
A previous change, IMPALA-3742, added an exchange node and
sort node to plans for inserts into Kudu tables to partition
and sort the input to match the target table.
This patch enables INSERT hints for Kudu tables - 'noshuffle'
which removes the exchange node from the plan and
'noclustered' which removes the sort node.
Insert hints have no effect for inserts that are small enough
to result in single-node execution.
Testing:
- Updated FE planner and analysis tests.
- Ran Kudu EE tests.
Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05
Reviewed-on: http://gerrit.cloudera.org:8080/6980
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This patch leverages the AdlFileSystem in Hadoop to allow
Impala to talk to the Azure Data Lake Store. This patch
contains functional changes and also adds test infrastructure
for testing Impala over ADLS.
We do not support ACLs on ADLS since the Hadoop ADLS
connector does not integrate ADLS ACLs with Hadoop users/groups.
For testing, we use the azure-data-lake-store-python client
from Microsoft. This client seems to have some consistency
issues. For example, a drop table through Impala will delete
the files in ADLS; however, listing that directory through
the Python client immediately after the drop will still show
the files. This behavior is unexpected since ADLS claims to be
strongly consistent. Some tests have been skipped due to this
limitation with the tag SkipIfADLS.slow_client. Tracked by
IMPALA-5335.
The azure-data-lake-store-python client also only works on CentOS 6.6
and later, so the Python dependencies for Azure will not be downloaded
when the TARGET_FILESYSTEM is not "adls". While running ADLS tests,
the expectation will be that it runs on a machine that is at least
running CentOS 6.6.
Note: This is only a test limitation, not a functional one. Clusters
with older OSes like CentOS 6.4 will still work with ADLS.
Added another dependency to bootstrap_build.sh for the ADLS Python
client.
Testing: Ran core tests with and without TARGET_FILESYSTEM as
'adls' to make sure that all tests pass and that nothing breaks.
Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Reviewed-on: http://gerrit.cloudera.org:8080/6910
Tested-by: Impala Public Jenkins
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Most of the code changes are to restructure things so that the
scratch batch's tuple buffer is stored in a separate MemPool
from auxiliary memory such as decompression buffers. This part
of the change does not change the behaviour of the scanner in
itself, but allows us to recycle the tuple buffer without holding
onto unused auxiliary memory.
The optimisation is implemented in TryCompact(): if enough rows
were filtered out during the copy from the scratch batch to the
output batch, the fixed-length portions of the surviving rows
(if any) are copied to a new, smaller buffer, and the original,
larger buffer is reused for the next scratch batch.
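In pseudocode the compaction decision looks roughly like this (a
simplified sketch with a hypothetical threshold, not the actual C++):

  def try_compact(scratch_buffer, surviving_offsets, row_size):
      used = len(surviving_offsets) * row_size
      if used > 0.5 * len(scratch_buffer):  # threshold is hypothetical
          return None  # not selective enough; hand off the big buffer
      small = bytearray(used)
      for i, off in enumerate(surviving_offsets):
          # copy only the fixed-length portion of each surviving row
          small[i * row_size:(i + 1) * row_size] = \
              scratch_buffer[off:off + row_size]
      return small  # the big scratch_buffer is reused for the next batch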
Previously the large buffer was always attached to the output batch,
so a large buffer was transferred between threads for every scratch
batch processed. In combination with the decompression buffer change
in IMPALA-5304, this means that in many cases selective scans don't
produce nearly as many empty or near-empty batches and do not attach
nearly as much memory to each batch.
Performance:
Even on an 8-core machine I see some speedup on selective scans.
Profiling with "perf top" also showed that time in TCMalloc
was reduced - it went from several % of CPU time to a minimal
amount.
Running TPC-H on the same machine showed a ~5% overall improvement
and no regressions. E.g. Q6 got 20-25% faster.
I hope to do some additional cluster benchmarking on systems
with more cores to verify that the severe performance problems
there are fixed, but in the meantime it seems like we have enough
evidence that it will at least improve things.
Testing:
Added a couple of selective scans that exercise the new code paths.
Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Reviewed-on: http://gerrit.cloudera.org:8080/6949
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Syntax:
<tableref> TABLESAMPLE SYSTEM(<number>) [REPEATABLE(<number>)]
The first number specifies the percent of table bytes to sample.
The second number specifies the random seed to use.
The sampling is coarse-grained. Impala keeps randomly adding
files to the sample until at least the desired percentage of
file bytes have been reached.
Examples:
SELECT * FROM t TABLESAMPLE SYSTEM(10)
SELECT * FROM t TABLESAMPLE SYSTEM(50) REPEATABLE(1234)
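The file-level sampling loop can be sketched as follows (illustrative
Python; the file list is hypothetical):

  import random

  def sample_files(files, percent, seed=None):
      # files: list of (path, size_in_bytes) pairs
      rng = random.Random(seed)  # REPEATABLE(<seed>)
      target = sum(size for _, size in files) * percent / 100.0
      shuffled = list(files)
      rng.shuffle(shuffled)
      sample, sampled_bytes = [], 0
      for path, size in shuffled:
          if sampled_bytes >= target:
              break
          sample.append(path)
          sampled_bytes += size
      return sample

  print(sample_files([("f1", 100), ("f2", 300), ("f3", 600)], 50, 1234))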
Testing:
- Added parser, analyser, planner, and end-to-end tests
- Private core/hdfs run passed
Change-Id: Ief112cfb1e4983c5d94c08696dc83da9ccf43f70
Reviewed-on: http://gerrit.cloudera.org:8080/6868
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
The sortby() hint is superseded by the SORT BY SQL clause, which
was introduced in IMPALA-4166. This change removes the hint.
Change-Id: I83e1cd6fa7039035973676322deefbce00d3f594
Reviewed-on: http://gerrit.cloudera.org:8080/6885
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
IMPALA-4166 introduced a bug by duplicating code that adds sort
expressions. Upon re-analysis, this code would hit an
IndexOutOfBoundsException.
Change-Id: Ibebba29509ae7eaa691fe305500cda6bd41a179a
Reviewed-on: http://gerrit.cloudera.org:8080/6921
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
Non-deterministic exprs which evaluate as constant should not be
used during HDFS partition pruning. We consider Exprs which have no
SlotRefs as bound by default, and thus we end up trying to apply
them indiscriminately. Constant propagation makes this situation
easier to run into, and the behavior is rather unexpected.
The fix for now is to explicitly disallow non-deterministic Exprs
in partition pruning.
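The resulting check boils down to something like this (a simplified
Python sketch, not the actual FE code):

  from collections import namedtuple

  Expr = namedtuple("Expr", ["has_slot_refs", "is_deterministic"])

  def usable_for_partition_pruning(expr):
      # Having no SlotRefs makes an expr look constant, but that alone
      # is not enough: rand() has no SlotRefs yet changes per evaluation.
      return not expr.has_slot_refs and expr.is_deterministic

  print(usable_for_partition_pruning(Expr(False, False)))  # rand(): False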
Change-Id: I91054c6bf017401242259a1eff5e859085285546
Reviewed-on: http://gerrit.cloudera.org:8080/6575
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Adds support in DDL for timestamps in Kudu range partition syntax.
For convenience, strings can be specified with or without
explicit casts to TIMESTAMP.
E.g.
create table ts_ranges (ts timestamp primary key, i int)
partition by range (
partition '2009-01-02 00:00:00' <= VALUES < '2009-01-03 00:00:00'
) stored as kudu
Range bounds are converted to Kudu UNIXTIME_MICROS during
analysis.
Testing: Adds FE and EE tests.
Change-Id: Iae409b6106c073b038940f0413ed9d5859daaeff
Reviewed-on: http://gerrit.cloudera.org:8080/6849
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
This change builds on the support for reading and writing
TIMESTAMP columns to Kudu tables (see [1]), adding support
for pushing TIMESTAMP predicates to Kudu for scans.
Binary predicates and IN list predicates are supported.
Testing: Added some planner and EE tests to validate the
behavior.
1: https://gerrit.cloudera.org/#/c/6526/
Change-Id: I08b6c8354a408e7beb94c1a135c23722977246ea
Reviewed-on: http://gerrit.cloudera.org:8080/6789
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
By default, Kudu assumes it has 80% of system memory which
is far too high for the minicluster. This sets a memory limit
of 2 GB and lowers the block cache limit. These values
were tested on a gerrit-verify-dryrun job as well as an
exhaustive run.
This patch also simplifies TestKuduMemLimits which was
unnecessarily creating a large table during test execution.
Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5
Reviewed-on: http://gerrit.cloudera.org:8080/6844
Reviewed-by: Todd Lipcon <todd@apache.org>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
The recent Kudu TIMESTAMP patch (IMPALA-5137) made an
inadvertent change [1] to alltypeserror_tmp and
alltypeserrornonulls_tmp, changing 'timestamp_col' from
STRING to TIMESTAMP.
This seems to cause failures on exhaustive jobs which run
test_hdfs_scan_node_errors against all file-formats.
I haven't been able to reproduce this failure myself, so
cannot test whether this fixes the jobs that are failing, but
this change to revert these tables seems warranted given
they were changed inadvertently.
1: https://gerrit.cloudera.org/#/c/6526/11/testdata/datasets/functional/functional_schema_template.sql
Change-Id: I533f1921662802ea6e076eefac973f50c014fcb5
Reviewed-on: http://gerrit.cloudera.org:8080/6891
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:
CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);
ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();
Sort columns can only be specified for Hdfs tables and effectiveness may
vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.columns' table property and will result in an additional SORT node
being added to the plan before the final table sink. Specifying sort
columns also enables clustering during inserts, so the SORT node will
contain all partitioning columns first, followed by the sort columns. We
do this because sort columns add a SORT node to the plan and adding the
clustering columns to the SORT node is cheap.
Sort columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144). Until then, it is possible to specify
sort columns using both ways at the same time and the column lists
will be concatenated.
Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Reviewed-on: http://gerrit.cloudera.org:8080/6495
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
Adds Impala support for TIMESTAMP types stored in Kudu.
Impala stores TIMESTAMP values in 96-bits and has nanosecond
precision. Kudu's timestamp is a 64-bit microsecond delta
from the Unix epoch (called UNIXTIME_MICROS), so a conversion
is necessary.
When writing to Kudu, TIMESTAMP values in nanoseconds are
rounded to the nearest microsecond.
When reading from Kudu, the KuduScanner returns
UNIXTIME_MICROS with 8 bytes of padding so Impala can convert
the value to a TimestampValue in-line and copy the entire
row.
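The conversion is essentially a change of unit and width; roughly (a
Python sketch of the arithmetic, glossing over edge cases):

  def ns_to_unixtime_micros(unix_time_ns):
      # writing: round nanoseconds to the nearest microsecond
      return (unix_time_ns + 500) // 1000

  def unixtime_micros_to_ns(unixtime_micros):
      # reading: widen back to nanosecond precision
      return unixtime_micros * 1000

  print(ns_to_unixtime_micros(1499))  # 1
  print(ns_to_unixtime_micros(1500))  # 2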
Testing:
Updated the functional_kudu schema to use TIMESTAMPs instead
of converting to STRING, so this provides some decent
coverage. Some BE tests were added, and some EE tests as
well.
TODO: Support pushing down TIMESTAMP predicates
TODO: Support TIMESTAMPs in range partitioning expressions
Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d
Reviewed-on: http://gerrit.cloudera.org:8080/6526
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
An INSERT into a Kudu table with a constant value being inserted
into a partition column causes an IllegalStateException. This is
because DistributedPlanner removes constants from the list of
partition exprs before creating the KuduPartitionExpr, but
KuduPartitionExpr expects to get one expr per partition column.
The fix is to pass the full list of partition exprs into the
KuduPartitionExpr, instead of the list that has had constants
removed. This preserves the behavior that if all of the partition
exprs are constant we fall back to UNPARTITIONED.
One complication is that if a partition expr is a NullLiteral, it
must be cast to a specific type to be passed to the BE. The
InsertStmt will cast the partition exprs to the partition column
types, but these casts may be lost from the copies of the partition
exprs stored by the KuduPartitionExpr during reset(). To fix this,
the KuduPartitionExpr can store the types of the partition cols and
recast the partition exprs to those types during analyze().
Change-Id: I12cbb319f9a5c47fdbfee347b47650186b27f8f9
Reviewed-on: http://gerrit.cloudera.org:8080/6828
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
This change adds functionality to write and read parquet::Statistics for
Decimal, String, and Timestamp values. As an exception, we don't read
statistics for CHAR columns, since CHAR support is broken in Impala
(IMPALA-1652).
This change also switches from using the deprecated fields 'min' and
'max' to populate the new fields 'min_value' and 'max_value' in
parquet::Statistics, which were added in parquet-format pull request #46.
The HdfsParquetScanner will preferably read the new fields if they are
populated and if the column order 'TypeDefinedOrder' has been used to
compute the statistics. For columns without a column order set or with
only the deprecated fields populated, the scanner will read them only if
they are of simple numeric type, i.e. boolean, integer, or floating
point.
This change removes the validation of the Parquet Statistics we write to
Hive from the tests, since Hive does not write the new fields. Instead
it adds a parquet file written by Hive that uses the deprecated fields
for its statistics. It uses that file to exercise the fallback logic for
supported types in a test.
This change also cleans up the interface of ParquetPlainEncoder in
parquet-common.h.
Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Reviewed-on: http://gerrit.cloudera.org:8080/6563
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
This change fixes IMPALA-4873 by adding the capability to supply a dict
'test_file_vars' to run_test_case(). Keys in this dict will be replaced
with their values inside test queries before they are executed.
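A call then looks along these lines (hypothetical variable and file
names, inside the usual ImpalaTestSuite fixtures):

  # Occurrences of $UNIQUE_DB in the .test file's queries are replaced
  # with the value of unique_database before execution.
  self.run_test_case('QueryTest/my-test', vector,
                     test_file_vars={'$UNIQUE_DB': unique_database})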
Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
Reviewed-on: http://gerrit.cloudera.org:8080/6817
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
Previously, we defaulted to broadcast join when stats were
missing, but this can lead to disastrous plans when the
right hand side is actually large.
It's always difficult to make good plans when stats are missing,
but defaulting to partitioned joins should reduce the risk of
disastrous plans.
Testing:
- Added a planner test that joins a table with no stats.
Change-Id: Ie168ecfcd5e7c5d3c60d16926c151f8f134c81e0
Reviewed-on: http://gerrit.cloudera.org:8080/6803
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
This generates min/max predicates for InPredicates that
have only constant values in the IN list. It is only
used for statistics filtering on Parquet files.
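For example, c IN (7, 2, 5) implies c >= 2 AND c <= 7, which can be
checked against a row group's column statistics; a minimal sketch:

  def in_list_bounds(values):
      # only IN lists made up entirely of constants qualify
      return min(values), max(values)

  lo, hi = in_list_bounds([7, 2, 5])
  # a Parquet row group with stats max < lo or min > hi can be skipped
  print(lo, hi)  # 2 7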
Change-Id: I4a88963a7206f40a867e49eceeaf03fdd4f71997
Reviewed-on: http://gerrit.cloudera.org:8080/6810
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
The slot descriptor vectors are not guaranteed to be sorted on the slot
index within a tuple. As a result, TupleDescriptor::LayoutEquals()
sometimes returned a wrong result.
In this patch, we sort the vectors of slot descriptors on the slot index
within the tuple before comparing the vectors.
Testing:
- ran EE tests locally.
Change-Id: I426ad244678dbfe517262dfb7bbf4adc0247a35e
Reviewed-on: http://gerrit.cloudera.org:8080/6610
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This patch adds support for running the stress test
(concurrent_select.py) and loading nested data (load_nested.py) into a
Kerberized, SSL-enabled Impala cluster. It assumes the calling user
already has a valid Kerberos ticket. One way to do that is:
1. Get access to a keytab and krb5.config
2. Set KRB5_CONFIG and KRB5CCNAME appropriately
3. Run kinit(1)
4. Run load_nested.py and/or concurrent_select.py within this
environment.
Because our Python clients already support Kerberos and SSL, we simply
need to make sure to use the correct options when calling the entry
points and initializing the clients:
Impala: Impyla
Hive: Impyla
HDFS: hdfs.ext.kerberos.KerberosClient
With this patch, I was able to manually do a short concurrent_select.py
run against a secure cluster without connection or auth errors, and I
was able to do the same with load_nested.py for a cluster that already
had TPC-H loaded.
Follow-ons for future cleanup work:
IMPALA-5263: support CA bundles when running stress test against SSL'd
Impala
IMPALA-5264: fix InsecurePlatformWarning under stress test with SSL
Change-Id: I0daad57bb8ceeb5071b75125f11c1997ed7e0179
Reviewed-on: http://gerrit.cloudera.org:8080/6763
Reviewed-by: Matthew Mulder <mmulder@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu
are currently painful because we just send rows randomly,
which creates a lot of work for Kudu since it partitions
and sorts data before writing, causing writes to be slow
and leading to timeouts.
We can alleviate this by sending the rows to Kudu already
partitioned and sorted. This patch partitions and sorts
rows according to Kudu's partitioning scheme for INSERTs
and UPSERTs. A followup patch will handle UPDATE and DELETE.
It accomplishes this by inserting an exchange node and a sort
node into the plan before the operation. Both the exchange and
the sort are given a KuduPartitionExpr which takes a row and
calls into the Kudu client to return its partition number.
It also disallows INSERT hints for Kudu tables, since the
hints that we support (SHUFFLE, CLUSTER, SORTBY) no longer
make sense.
Testing:
- Updated planner tests.
- Ran the Kudu functional tests.
- Ran performance tests demonstrating that we can now handle much
larger inserts without having timeouts.
Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
Reviewed-on: http://gerrit.cloudera.org:8080/6559
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
Implements constant propagation within conjuncts and applies the
optimization to scan conjuncts and collection conjuncts within Hdfs
scan nodes. The optimization is applied during planning. At scan
nodes in particular, we want to optimize to enable partition pruning.
In certain cases, we might end up with a FALSE conditional, which
will now be converted into an EmptySet node.
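A toy version of the propagation over equality conjuncts (illustrative
only, far simpler than the real rewriter):

  def propagate_constants(conjuncts):
      # conjuncts: (slot, value) pairs; strings are slots, ints constants
      known = {l: r for l, r in conjuncts if isinstance(r, int)}
      out = []
      for l, r in conjuncts:
          r = known.get(r, r)  # substitute a known slot value
          if isinstance(r, int) and known.get(l, r) != r:
              return False  # contradiction -> EmptySet node
          out.append((l, r))
      return out

  # a = 1 AND b = a  becomes  a = 1 AND b = 1
  print(propagate_constants([("a", 1), ("b", "a")]))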
Testing: Expanded the test cases for the planner to achieve constant
propagation. Added Kudu, datasource, Hdfs and HBase tests to validate
we can create EmptySetNodes.
Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Reviewed-on: http://gerrit.cloudera.org:8080/6389
Tested-by: Impala Public Jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
We need to check for AllocateLocal() returning NULL. CopyFrom() takes
care of that for us. Also adjust a few other places in the code base
that didn't have the check.
The new test reproduces the crash, but in order to get this test file to
execute, I had to move the xfail to be a function decorator. Apparently
xfail as a statement causes the test to not run at all. We should run
all of these queries even if they are non-deterministic, to at least verify
that impalad does not crash.
Change-Id: Iafefef24479164cc4d2b99191d2de28eb8b311b6
Reviewed-on: http://gerrit.cloudera.org:8080/6761
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Previously, exprs used in sorts were evaluated lazily. This can
potentially be bad for performance if the exprs are expensive to
evaluate, and it can lead to crashes if the exprs are
non-deterministic, as this violates assumptions of our sorting
algorithm.
This patch addresses these issues by materializing ordering exprs.
It does so when the expr is non-deterministic (including when it
contains a UDF, since we currently cannot know whether UDFs are
deterministic), or when its cost exceeds a threshold (or the
cost is unknown).
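The decision rule amounts to something like this (a sketch; the
threshold value is hypothetical):

  from dataclasses import dataclass
  from typing import Optional

  @dataclass
  class OrderingExpr:
      deterministic: bool
      contains_udf: bool
      cost: Optional[float]

  COST_THRESHOLD = 50.0  # hypothetical

  def should_materialize(e: OrderingExpr) -> bool:
      # UDFs are treated as non-deterministic since we cannot tell.
      if not e.deterministic or e.contains_udf:
          return True
      return e.cost is None or e.cost > COST_THRESHOLD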
Testing:
- Added e2e tests in test_sort.py.
- Updated planner tests.
Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e
Reviewed-on: http://gerrit.cloudera.org:8080/6322
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
Since commit d2d3f4c (on asf-master), TAggregateExpr contains
the logical input types of the Aggregate Expr. The reason they
are included is that merging aggregate expressions will have
input types of the intermediate values, which aren't necessarily
the same as the input types. For instance, NDV() uses a binary
blob as its intermediate value and it's passed to its merge
aggregate expressions as a StringVal but the input type of NDV()
in the query could be DecimalVal. In this case, we consider
DecimalVal as the logical input type while StringVal is the
intermediate type. The logical input types are accessed by the
BE via GetConstFnAttr() during interpretation and constant
propagation during codegen.
To handle distinct aggregate expressions (e.g. select count(distinct)),
the FE uses 2-phase aggregation by introducing an extra phase of
split/merge aggregation in which the distinct aggregate expressions'
inputs are converted and added to the group-by expressions in the first
phase while the non-distinct aggregate expressions go through the normal
split/merge treatment.
The bug is that the existing code incorrectly propagates the intermediate
types of the non-grouping aggregate expressions as the logical input types
to the merging aggregate expressions in the second phase of aggregation.
The input aggregate expressions for the non-distinct aggregate expressions
in the second-phase aggregation are already merging aggregate expressions
(from phase one), in which case we should not treat their input types as
logical input types.
This change fixes the problem above by checking if the input aggregate
expression passed to FunctionCallExpr.createMergeAggCall() is already
a merging aggregate expression. If so, it will use the logical input
types recorded in its 'mergeAggInputFn_' as references for its logical
input types instead of the aggregate expression input types themselves.
Change-Id: I158303b20d1afdff23c67f3338b9c4af2ad80691
Reviewed-on: http://gerrit.cloudera.org:8080/6724
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
For sequence files, update the rows-read counter after processing the
scan range instead of after reading every row, to save CPU cycles.
Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
Reviewed-on: http://gerrit.cloudera.org:8080/6522
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins
This change fixes the following issues in the Sequence File Writer:
1. ReadWriteUtil::VLongRequiredBytes() and ReadWriteUtil::PutVLong()
were broken. As a result, Impala created corrupt uncompressed
sequence files. (See the VLong sketch after this list.)
2. KEY_CLASS_NAME was missing from the sequence file header. As a
result, Hive could not read back uncompressed sequence files
created by Impala.
3. Impala created record-compressed sequence files with empty keys
block. As a result, Hive could not read back record-compressed
sequence files created by Impala.
4. Impala created block-compressed files with:
- empty key-lengths block
- empty keys block
- empty value-lengths block
This resulted in invalid block-compressed sequence files that Hive could
not read back.
5. In some cases the wrong Record-compression flag was written to the
sequence file header. As a result, Hive could not read back record-
compressed sequence files created by Impala.
6. Impala added 'sync_marker' instead of 'neg1_sync_marker' to the
beginning of blocks in block-compressed sequence files. Hive could
not read these files back.
7. The calculation of block sizes in the SnappyBlockCompressor class
was incorrect for odd-length buffers.
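For reference, the encoding that VLongRequiredBytes() and PutVLong()
must match is Hadoop's variable-length long; a Python rendering of the
scheme (indicative, transcribed from Hadoop's WritableUtils):

  def write_vlong(i):
      # values in [-112, 127] fit in a single byte
      if -112 <= i <= 127:
          return bytes([i & 0xFF])
      length = -112
      if i < 0:
          i = ~i  # store the ones' complement of negative values
          length = -120
      tmp = i
      while tmp != 0:
          tmp >>= 8
          length -= 1
      out = bytearray([length & 0xFF])  # length-prefix byte
      n = -(length + 120) if length < -120 else -(length + 112)
      for idx in range(n, 0, -1):  # significant bytes, big-endian
          out.append((i >> ((idx - 1) * 8)) & 0xFF)
      return bytes(out)

  print(write_vlong(128).hex())  # 8f80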
Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
Reviewed-on: http://gerrit.cloudera.org:8080/6107
Reviewed-by: Michael Ho <kwho@cloudera.com>
Reviewed-by: Attila Jeges <attilaj@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
This patch addresses 3 issues:
- SelectList.reset() didn't properly reset some of its members, though
they're documented as needing to be reset. This was causing a crash
when the Planner attempted to make an aggregation node for an agg
function that had been eliminated by expr rewriting. While I'm here,
I added resetting of all of SelectList's members that need to be
reset, and fixed the documentation of one member that shouldn't be
reset.
- SimplifyConditionalsRule was changing the meaning of queries that
contain agg functions, e.g. because "select if(true, 0, sum(id))"
is not equivalent to "select 0". The fix is to not return the
simplified expr if it removes all aggregates.
- ExprRewriteRulesTest was performing rewrites on the result exprs of
the SelectStmt, which causes problems if the result exprs have been
substituted. In normal query execution, we don't rewrite the result
exprs anyway, so the fix is to match normal query execution and
rewrite the select list exprs.
Testing:
- Added e2e test to exprs.test.
- Added unit test to ExprRewriteRulesTest.
Change-Id: Ic20b1621753980b47a612e0885804363b733f6da
Reviewed-on: http://gerrit.cloudera.org:8080/6653
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
KuduTableSink uses the referenced_columns map to translate between the
index into the output exprs 'j' and the index into columns in the Kudu
table 'col', but we incorrectly used 'j' when calling into the Kudu table
schema to check the nullability of columns.
Testing:
- Added e2e tests to kudu_insert.test
Change-Id: I8ed458278f135288a821570939de8ee294183df2
Reviewed-on: http://gerrit.cloudera.org:8080/6670
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins