Allow Oracle-style hints to be specified on INSERT/UPSERT statements. For example:
- insert /* +noshuffle */ into table functional.alltypes partition(year,
  month) select * from functional.alltypes;
- upsert /* +noshuffle */ into functional_kudu.alltypes select * from
  functional.alltypes;
Testing:
Add unit tests to ParserTest#TestPlanHints
Add plan check tests to PlannerTest#testInsert, PlannerTest#testKuduUpsert
Add tests to ToSqlTest#planHintsTest
Change-Id: Ied7629d70197a0270cdc0853e00cc021fdb4dc20
Reviewed-on: http://gerrit.cloudera.org:8080/8676
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
This patch maps signed integer logical types in Parquet to supported
Impala column types. This change introduces the following mapping:
INT_8 -> TINYINT
INT_16 -> SMALLINT
INT_32 -> INT
INT_64 -> BIGINT
Also added a Parquet file with the following schema for testing:
schema {
  optional int32 id;
  optional int32 tinyint_col (INT_8);
  optional int32 smallint_col (INT_16);
  optional int32 int_col;
  optional int64 bigint_col;
}
Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Reviewed-on: http://gerrit.cloudera.org:8080/8548
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Tianyi Wang <twang@cloudera.com>
Tested-by: Impala Public Jenkins
This commit makes idle_session_timeout a query option.
idle_session_timeout currently can be set as a command line
option, which will be the default timeout for sessions.
HS2 sessions can override it with a smaller value by setting
it in the configuration overlay of HS2 OpenSession().
However, we can't override idle_session_timeout for JDBC/ODBC
connections, because we cannot put this in the connection string.
This commit works around the problem by allowing JDBC/ODBC
connections to set the session timeout as a query option
with the SET statement.
After this commit, the session timeout can be overridden to
any value, i.e. the command line flag idle_session_timeout
doesn't limit this option anymore.
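For example, a JDBC/ODBC client can now issue a statement like the
following (value in seconds, chosen arbitrarily here):
  SET IDLE_SESSION_TIMEOUT=60;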
I created an automated test case in JdbcTest.java based on
test_hs2.py::test_concurrent_session_mixed_idle_timeout. I also
extended the test_session_expiration and test_set_and_unset
test suites.
Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e
Reviewed-on: http://gerrit.cloudera.org:8080/8490
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This patch adds a query rewriter to remove redundant explicit casts to
a string type (string, char, varchar) from binary predicates of the form
"cast(<non-const expr> to <string type>) <eq/ne op> <string constant>".
The cast is redundant if the predicate evaluation is the same even if
the cast is removed and the constant is converted to the original type
of the expression. For example:
cast(int_col as string) = '123456' -> int_col = 123456
Performance:
For the following query on a table with 6,001,215 rows:
select * from tpch.lineitem where cast(l_linenumber as string) = '0'
+-----------------+------------+---------+
|                 |      Scan Time       |
|                 +------------+---------+
|                 | Avg        | St dev  |
+-----------------+------------+---------+
| Without rewrite | 1s406ms    | 44ms    |
| With rewrite    | 1s099ms    | 28ms    |
+-----------------+------------+---------+
Testing:
- Added unit tests to ExprRewriteRulesTest
- Added functional test to expr.test
- Current FE planner tests and BE expr-test run successfully with this
change.
Change-Id: I91b7c6452d0693115f9b9ed9ba09f3ffe0f36b2b
Reviewed-on: http://gerrit.cloudera.org:8080/8660
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
As a follow-on to centralizing into one parent pom, we can now manage
thirdparty dependency versions in Java a little bit more clearly.
Upgrades SLF4J, commons.io:
slf4j: 1.7.5 -> 1.7.25
commons.io: 2.4 -> 2.6
The SLF4J upgrade makes it possible to run under Java 9. The release
notes at https://www.slf4j.org/news.html are uneventful.
Commons IO 2.6 supports Java 9 and is source and binary compatible,
per https://commons.apache.org/proper/commons-io/upgradeto2_6.html and
https://commons.apache.org/proper/commons-io/upgradeto2_5.html.
Removes the following dependencies:
htrace-core
hadoop-mapreduce-client-core
hive-shims
com.stumbleupon:async
commons-dbcp
jdo-api
I ran "mvn dependency:analyze" and these were some (but not all)
of the "Unused declared dependencies found." Spelunking in git logs,
these dependencies are from 2013 and possibly from an effort
to run with dependencies from the filesystem. They don't seem
to be required anymore.
Stops pulling in an old version of hadoop-client and kite-data-core in
testdata/TableFlattener by using the same versions as the Hadoop we use.
Doing so was unnecessarily causing us to download extra, old Hadoop
jars, and the new Hadoop jars seem to work just as well. This is the
kind of divergence that centralizing the versions into variables will
help with.
Creates variables for:
junit.version
slf4j.version
hadoop.version
commons-io.version
httpcomponents.core.version
thrift.version
kite.version (controlled via $IMPALA_KITE_VERSION in impala-config.sh)
Cleans up unused IMPALA_PARQUET_URL variables in impala-config.sh. We
only download Parquet via Maven, rather than downloading it in the
toolchain, so these variables weren't doing anything.
I ran the core tests with this change.
Change-Id: I717e0625dfe0fdbf7e9161312e9e80f405a359c5
Reviewed-on: http://gerrit.cloudera.org:8080/8853
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins
Modifies COMPUTE STATS TABLESAMPLE to use the new SAMPLED_NDV()
function.
Testing:
- modified/improved existing functional tests
- core/hdfs run passed
Change-Id: I6ec0831f77698695975e45ec0bc0364c765d819b
Reviewed-on: http://gerrit.cloudera.org:8080/8840
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
We saw some failures on the exhaustive release build because the
compiler assumed that the pointer to the intermediate struct that is
used for computing decimal average was aligned.
To fix the problem, we mark the struct with a "packed" attribute so
that the compiler does not expect it to be aligned.
Testing:
- Ran the failing test locally on a release build and it passed.
Change-Id: Id25ec6e20dde3f50fb37a22135b355ad251809e0
Reviewed-on: http://gerrit.cloudera.org:8080/8836
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins
Impala partitions and sorts rows according to the target table's
partitioning scheme before inserting them into Kudu in order to
improve the performance of large inserts.
A recent change added the ability to create unpartitioned Kudu tables,
but Impala still does the partitioning/sorting for them even though
it's wasted work.
This patch modifies the planner to not add the partition/sort for Kudu
inserts if the table is unpartitioned, unless the clustered/shuffle
hints are used.
It also removes the exchange in the case where the partition exprs are
all constant.
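For illustration (table names are hypothetical), the first insert below
no longer gets a partition/sort step, while the hinted variant keeps it:
  insert into unpartitioned_kudu_tbl select * from src_tbl;
  insert /* +clustered */ into unpartitioned_kudu_tbl select * from src_tbl;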
Testing:
- Added planner tests for inserting into an unpartitioned Kudu table,
with and without hints, and for when the partition exprs are
constant.
- Ran the existing correctness tests for inserts into unpartitioned
Kudu tables in kudu_create.test
Change-Id: I3e01a7dd5284767a25df3218656746a5d0ee4632
Reviewed-on: http://gerrit.cloudera.org:8080/8810
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This patch fixes a regression introduced as part of IMPALA-1788, where
an expression like 'CAST(0 AS DECIMAL(14))' is rewritten as a
NumericLiteral expression of type DECIMAL(14,0). The query had
another NumericLiteral of type TINYINT. While analyzing the DISTINCT
aggregation clause of the SELECT query, AggregateInfo::create() removes
duplicate expressions from groupingExprs.
NumericLiteral::localEquals() is used to check for equality. Since
the method does not consider expression types, a TINYINT literal
is considered a duplicate of a DECIMAL literal. This causes a
query like the following to fail:
SELECT DISTINCT CAST(0 AS DECIMAL(14)), 0 FROM functional.alltypes
We propose to fix the issue by accounting for types as well
when comparing analyzed numeric literals.
A test case has been added to AnalyzeStmtsTest.
Change-Id: Ia88d54088dfd128b103759dc01103b6c35bf6257
Reviewed-on: http://gerrit.cloudera.org:8080/8448
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Adds a new SAMPLED_NDV() aggregate function that is
intended to be used in COMPUTE STATS TABLESAMPLE.
This patch only adds the function itself. Integration
with COMPUTE STATS will come in a separate patch.
SAMPLED_NDV() estimates the number of distinct values (NDV)
based on a sample of data and the corresponding sampling rate.
The main idea is to collect several x/y data points where x is
the number of rows and y is the corresponding NDV estimate.
These data points are used to fit an objective function to the
data such that the true NDV can be extrapolated.
The aggregate function maintains a fixed number of HyperLogLog
intermediates to compute the x/y points.
Several objective functions are fit and the best-fit one is
used for extrapolation.
Adds the MPFIT C library to perform curve fitting:
https://www.physics.wisc.edu/~craigm/idl/cmpfit.html
The library is a C port from Fortran. Scipy uses the
Fortran version of the library for curve fitting.
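A usage sketch (assuming the function takes the sampled expression and
the corresponding sampling rate; the exact signature is defined by this
patch):
  select sampled_ndv(l_orderkey, 0.1) from tpch.lineitem tablesample system(10);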
Testing:
- added functional tests
- core/hdfs run passed
Change-Id: Ia51d56ee67ec6073e92f90bebb4005484138b820
Reviewed-on: http://gerrit.cloudera.org:8080/8569
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This commit links together all the individual pom.xml files to have a
new "impala-parent" pom as the parent. This enables de-duplicating all
the repository configuration.
I ran the build to test this.
Change-Id: Id744e4357ee4d8e4be4e5490b2159bb76a2192f0
Reviewed-on: http://gerrit.cloudera.org:8080/8753
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
For some time Impala in a production environment has been able
to access data stored in Amazon S3 buckets using credentials specified
in a number of ways:
- storing Amazon access keys in environment variables or
in core-site.xml.
- using proprietary management tools to store Amazon access keys
securely
- using Amazon IAM roles bound to VMs running in EC2.
The development minicluster environment used the first approach,
which risked leaking these keys.
This change enables Impala builds to use IAM
roles to access S3 buckets when running on an Amazon EC2 virtual
machine. The changes mainly ensure that environment variables carrying
the traditional AWS credentials do not conflict with credentials supplied
by the IAM role attached to the VM instance.
IAM role based credentials are accessible through the EC2
instance-property mechanism; for further details see Amazon's docs at
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#instance-metadata-security-credentials
The change also removes the remaining references to the s3n: provider.
In the FE tests all URIs referring to s3n: are replaced with their
s3a: equivalents, except for a single negative test in
AnalyzeStmtsTest.java, which is removed.
In addition to the code changes, the s3n: and s3a: credential properties
are also removed from core-site.xml.tmpl. The s3a: provider can pick up
AWS S3 credentials from environment variables or IAM properties bound
to the VM instance, which is a more flexible approach.
As environment variables have precedence over IAM roles, care must be
taken when managing the canonical environment variables carrying
AWS credentials. There are two requirements to be reconciled:
1. The FE tests have code that examines s3a: URIs; this code needs
existing, but not necessarily valid AWS credentials.
2. When the Impala test suite is executed on an EC2 VM, AWS credentials
can be supplied via IAM roles. These credentials can be used only
if the AWS_* environment variables are unset (do not exist).
The tradeoff is managed following these rules:
1. When AWS_* environment variables are set before invoking the
Impala configuration scripts, their value is preserved and
the config scripts ensure that the variables are exported.
2. If the AWS_* variables are missing or empty, they will be unset
to ensure that credentials supplied by Amazon's IAM roles can be
accessed,
3. except if the scripts are running outside of EC2 (so there can be
no IAM roles) and TARGET_FILESYSTEM is not set to "s3". This combination
is most often the case on a developer's local workstation.
In this case the AWS_* credential variables are forcibly set to
dummy values to allow the FE tests to succeed.
The removal of the S3 credential parameters from core-site.xml[.tmpl]
also allows users to set up their own credentials there; the config
scripts will not change those settings.
Environment variables carrying AWS security credentials will be set
up according to the following table:
              Instance: |  Running outside EC2  ||    Running in EC2     |
  ----------------------+-----------+-----------++-----------+-----------+
      TARGET_FILESYSTEM |    S3     |  not S3   ||    S3     |  not S3   |
  ----------------------+-----------+-----------++-----------+-----------+
  AWS_*      empty      |   unset   |   dummy   ||   unset   |   unset   |
  env      -------------+-----------+-----------++-----------+-----------+
  var        not empty  |  export   |  export   ||  export   |  export   |
  ----------------------+-----------+-----------++-----------+-----------+
Legend: unset:  the variable is unset
        export: the variable is exported with its current value
        dummy:  the variable is set to a preset dummy value and exported
Running on an EC2 VM is indicated by setting RUNNING_IN_EC2 to "true" and
exporting it before impala-config.sh is invoked.
The change also moves the logic performing the S3 access checks into a separate
script file: bin/check-s3-access.sh. This file now contains all the S3-specific
logic and the network access needed to check whether the requested S3 bucket
can be accessed.
Testing:
Performed local builds for HDFS as well as automated builds against
HDFS and S3, using both IAM roles and explicit AWS_* credentials for
authentication.
Verified that FE tests that parse s3a: URLs are still successful in
all these combinations (when they are run).
Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Reviewed-on: http://gerrit.cloudera.org:8080/8294
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Reviewed-by: Zach Amsden <zamsden@cloudera.com>
Tested-by: Impala Public Jenkins
When a sort is inserted into a plan for an INSERT due to either the
target table being a Kudu table or the use of the 'clustered' hint,
and a TupleIsNullPredicate is present in the output of the sort, the
TupleIsNullPredicate may reference an incorrect tuple (i.e. not the
materialized sort tuple), leading to errors.
The solution is to materialize the TupleIsNullPredicate into the sort
tuple and then perform the appropriate expr substitutions, as is
already done for the case of analytic sorts.
Testing:
- Added an e2e test with a query that would previously fail.
Change-Id: I6c0ca717aa4321a5cc84edd1d5857912f8c85583
Reviewed-on: http://gerrit.cloudera.org:8080/8791
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
If the target expression of a runtime filter evaluates to a
non-NULL value for outer-join non-matches, then assigning
the filter below the nullable side of an outer join may
lead to incorrect query results.
See IMPALA-6286 for an example and explanation.
This patch adds a conservative check that prevents the
creation of runtime filters that could potentially
have such incorrect targets. Some safe opportunities
are deliberately missed to keep the code simple.
See RuntimeFilterGenerator#getTargetSlots().
Testing:
- added planner tests which passed locally
Change-Id: I88153eea9f4b5117df60366fad2bd91776b95298
Reviewed-on: http://gerrit.cloudera.org:8080/8783
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Some tools use lineage graph logging to collect query metrics. Currently
only the query hash is present in this log; adding the query id makes
such accounting easier.
Testing: The equality of query id in the query profile and lineage log
is checked in test_lineage.py. A test for TUniqueIdUtil is added to the
FE tests.
Change-Id: I4adbd02df37a234dbb79f58b7c46ca11a914229f
Reviewed-on: http://gerrit.cloudera.org:8080/8589
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Currently, constant expressions for the LHS of the IN predicate
are not supported. This patch adds this support as a rewrite in
StmtRewriter (where subqueries are rewritten to joins). Since
there is a nested-loop variant of the left semi join, IN is
supported simply by not erring out. NOT IN is handled by rewriting
it to the corresponding NOT EXISTS predicate. Support for NOT IN
with a correlated subquery is not included in this change.
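For example, a predicate with a constant LHS such as the following is
now rewritten and executed (illustrative query against the functional
test schema):
  select count(*) from functional.alltypes
  where 1 in (select int_col from functional.alltypestiny);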
Re-organized the frontend subquery analysis tests to expand coverage.
Testing:
- added frontend subquery analysis tests
- added e2e tests
Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb
Reviewed-on: http://gerrit.cloudera.org:8080/8322
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Adds the TABLESAMPLE clause for COMPUTE STATS.
Syntax:
COMPUTE STATS <table> TABLESAMPLE SYSTEM(<number>) [REPEATABLE(<number>)]
Computes and replaces the table-level row count and total file size,
as well as all table-level column statistics. Existing partition-level
row counts are not modified.
The TABLESAMPLE clause can be used to limit the scanned data volume to
a desired percentage. When sampling, the unmodified results of the
COMPUTE STATS queries are sent to the CatalogServer. There, the stats
are extrapolated before storing them into the HMS so as not to confuse
other engines like Hive/SparkSQL which may rely on the shared HMS
fields being accurate.
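For example (the sampling percentage and seed are arbitrary; see the
limitations below):
  COMPUTE STATS functional.alltypes TABLESAMPLE SYSTEM(10) REPEATABLE(1234);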
Limitations
- Only works for HDFS tables
- TABLESAMPLE is not supported for COMPUTE INCREMENTAL STATS
- TABLESAMPLE requires --enable_stats_extrapolation=true
Changes to EXPLAIN
The stored statistics from the HMS are more clearly displayed under
a 'stored statistics' section. Example:
00:SCAN HDFS [functional.alltypes, RANDOM]
   partitions=24/24 files=24 size=478.45KB
   stored statistics:
     table: rows=7300 size=478.45KB
     partitions: 24/24 rows=7300
     columns: all
Testing:
- added new functional tests
- core/hdfs run passed
Change-Id: I7f3e72471ac563adada4a4156033a85852b7c8b7
Reviewed-on: http://gerrit.cloudera.org:8080/8136
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This reverts commit dd340b8810.
This commit caused a number of issues tracked in IMPALA-6001. The
issues were due to the lack of atomicity between the catalog version
change and the addition of a catalog object to the delete log.
Conflicts:
be/src/service/impala-server.cc
Change-Id: I3a2cddee5d565384e9de0e61b3b7d0d9075e0dce
Reviewed-on: http://gerrit.cloudera.org:8080/8667
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
This commit fixes an issue where an IllegalStateException is thrown if
there is a mismatch between the number of storageIDs and the number of
host locations of a file block, causing the metadata load of a table to
abort. With this fix, the storageIDs are ignored if they don't match the
number of hosts of a block, allowing table loading to proceed. This
change will also cause remote reads during table scans for the
blocks for which the mismatch was detected.
Testing:
No additional tests were added as this error was triggered on an EMC
Isilon system v8.0.
Change-Id: Ia3d685208dce7a1cbe94a33b8ac9aeb7c8a3f391
Reviewed-on: http://gerrit.cloudera.org:8080/8668
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins
Four display levels are introduced for each query option: REGULAR, ADVANCED,
DEVELOPMENT and DEPRECATED. When the query options are displayed in the Impala
shell using SET, only the REGULAR and ADVANCED options are shown. A new
command called SET ALL shows all the options grouped by their option levels.
When the query options are displayed through the SET SQL statement, the
result set contains an extra column indicating the level of each option.
As in the Impala shell, the SET statement only displays the REGULAR and
ADVANCED options, while SET ALL shows them all.
If the Impala shell connects to an Impala daemon that predates this change,
all the options are displayed in the REGULAR group.
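For example (illustrative):
  SET;      -- lists the REGULAR and ADVANCED options
  SET ALL;  -- lists all options, grouped by option level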
Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1
Reviewed-on: http://gerrit.cloudera.org:8080/8447
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
The e2e unit tests for UDFs can interact via the backend
lib_cache, causing test flakes. IMPALA-6215 explains a
race between the lib_cache and UdfExecutor in the frontend
which is likely the root cause.
Two e2e tests use the same jar (test_java_udfs and
test_udf_invalid_symbol); test_udf_invalid_symbol drops a
function from that jar, which causes the use of that jar to
fail in the test_java_udfs test. Since the state of the
lib_cache is per process, these interactions carry across
unit tests.
This change avoids the interactions by using separate jars for
the separate tests.
Change-Id: Ica3538788b1d2ab5e361261e2ade62780b838e65
Reviewed-on: http://gerrit.cloudera.org:8080/8593
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
The problem was that, during the initial admission decision phase, some
queries were initially queued and then dequeued once memory became available.
All of the accounting in the test implicitly relies on queries not being
dequeued until queries are later explicitly ended, so if this happened,
the test broke in multiple subtle ways.
This happened because the query only scanned a small number of
rows, which could be all buffered on the receiver side of the
exchange even before the client fetched any rows from the coordinator.
This means that the reserved memory on some backends could increase
then decrease during the initial admission phase, resulting in a
query being queued then dequeued.
The fix is to increase the number of rows returned by the query so that
all fragments remain active during the initial admission phase.
This increased test execution time somewhat, so I also had to bump the
queue wait timeout for the admission stress tests (they assume that
queries don't time out in the queue).
Testing:
Ran the test under debug, release and ASAN builds, i.e.
impala-py.test tests/custom_cluster/test_admission_controller.py \
--workload_exploration_strategy="functional-query:exhaustive"
I looped the mem_limit test for a while to confirm it didn't reproduce
(it reproduced reliably every 2-3 iterations before this fix).
It still reproduces every 5-10 runs with exhaustive+release, so I
need to do further work to make it more robust.
Change-Id: Iafb3af0ce68f96e5d713dbb3b37dd0b50ea66bb4
Reviewed-on: http://gerrit.cloudera.org:8080/8631
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Impala requests a list of roles from Sentry and then asks for privileges
for each role. If Sentry returns a non-existent role in the first step,
there will be a Java exception in Impala in the second step and the
communication with Sentry is aborted.
The issue is fixed by handling the exception if an invalid role is
found and continuing to get permissions for the rest of the roles.
Testing:
-------
Since an invalid role cannot be created through the impala-shell/Hue
interface, the code was instrumented to have an invalid role " ",
and a SHOW ROLES statement was executed from the impala shell to see
how the condition is handled.
Change-Id: I781411018d580854d80a9cad81a1ded7ca16af8b
Reviewed-on: http://gerrit.cloudera.org:8080/8588
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Zach Amsden <zamsden@cloudera.com>
Tested-by: Impala Public Jenkins
Currently, Parquet row groups can be pruned at runtime using
min/max stats when predicates (IN, binary) are specified for
columns of scalar types. This patch extends pruning to nested types
for the same class of predicates. A nested value is an instance
of a nested type (struct, array, map). A nested value consists of
other nested and scalar values (as declared by its type).
Predicates that can be used for row-group pruning must be applied to
nested scalar values. In addition, the parent of the nested scalar
must also be required, that is, not empty. The latter requirement
is conservative: some filters that could be used for pruning are
not used for correctness reasons.
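For example, a query like the following can now prune row groups using
the min/max stats of the nested scalar o_orderkey (a sketch against the
nested TPC-H test schema; the parent collection c_orders is required
here because of the join):
  select count(*)
  from tpch_nested_parquet.customer c, c.c_orders o
  where o.o_orderkey < 1000;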
Testing:
- extended nested-types-parquet-stats e2e test cases.
Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
Reviewed-on: http://gerrit.cloudera.org:8080/8480
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
In this patch, we implement the new decimal return type rules for
addition expressions. These rules become active when the query option
DECIMAL_V2 is enabled. The algorithm for determining the type of the
result is described in the JIRA.
DECIMAL V1:
+----------------------------------------------------------------+
| typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) |
+----------------------------------------------------------------+
| DECIMAL(38,38)                                                 |
+----------------------------------------------------------------+
DECIMAL V2:
+----------------------------------------------------------------+
| typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) |
+----------------------------------------------------------------+
| DECIMAL(38,6)                                                  |
+----------------------------------------------------------------+
This patch required backend changes. We implement an algorithm where
we handle the whole and fractional parts separately, and then combine
them to get the final result. This is more complex and slower. We try
to avoid this by first checking if the result would fit into int128.
Testing:
- Added expr tests.
- Tested locally on my machine with a script that generates random
decimal numbers and checks that Impala adds them correctly.
Performance:
For the common case, performance remains the same.
select cast(2.2 as decimal(18, 1)) + cast(2.2 as decimal(18, 1))
BEFORE: 4.74s
AFTER: 4.73s
In this case, we check if it is necessary to do the complex addition,
and it turns out not to be. In the next case, we see a slowdown because
the result needs to be scaled down by dividing.
select cast(2.2 as decimal(38, 19)) + cast(2.2 as decimal(38, 19))
BEFORE: 1.63s
AFTER: 13.57s
In the following case, we take the most complex path and see the most
significant performance hit.
select cast(7.5 as decimal(38,37)) + cast(2.2 as decimal(38,37))
BEFORE: 1.63s
AFTER: 20.57
Change-Id: I401049c56d910eb1546a178c909c923b01239336
Reviewed-on: http://gerrit.cloudera.org:8080/8309
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Equivalence classes are used to track the equivalences between slots. They
are ill-defined and the current implementation is inefficient. This patch
removes them and directly uses the information from the value transfer
graph instead.
The value transfer graph is reimplemented using Tarjan's strongly connected
component algorithm and BFS over adjacency lists to speed it up on both
condensed and sparse graphs.
Testing: It passes the existing tests. In planner tests the equivalence
between SCC-condensed graph and uncondensed graph is checked. A test
case is added for a helper class IntArrayList. An outer-join edge case
is added in planner test. On a query with 1800 union operations, the
equivalence class computation time is reduced from 7m57s to 65ms and the
planning time is reduced from 8m5s to 13s.
Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Reviewed-on: http://gerrit.cloudera.org:8080/8317
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This patch implements min-max filters for runtime filters. Each
runtime filter generates a bloom filter or a min-max filter,
depending on whether it has HDFS or Kudu targets, respectively.
In RuntimeFilterGenerator in the planner, each hash join node
generates a bloom and min-max filter for each equi-join predicate, but
only those filters that end up being assigned to a target make it into
the final plan.
Min-max filters are only assigned to Kudu scans if the target expr is
a column, as Kudu doesn't support bounds on general exprs, and only if
the join op is '=' and not 'is distinct from', as Kudu doesn't support
returning NULLs if a bound is set.
Min-max filters are populated by the PartitionedHashJoinBuilder.
Codegen is used to eliminate branching on the type of filter. String
min-max filters truncate their bounds at 1024 chars, so that the max
amount of memory used by min-max filters is negligible.
For now, min-max filters are only applied at the KuduScanner, which
passes them into the Kudu client.
Future work will address applying min-max filters at HDFS scan nodes
and applying bloom filters at Kudu scan nodes.
Functional Testing:
- Added new planner tests and updated the old ones. (in old tests, a
lot of runtime filters are renumbered as we always generate min-max
filters even if they don't end up getting assigned and they take up
some of the RF ids).
- Updated existing runtime filter tests to work with Kudu.
- Added e2e tests for min-max filter specific functionality.
Perf Testing:
- All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu,
timings are averages of 3 runs.
- Ran a contrived query with a filter that does not eliminate any rows
(full self join of lineitem). The difference in running time was
negligible - 24.46s with filters on, 24.15s with filters off for
a ~1% slowdown.
- Ran a contrived query with a filter that eliminates all rows (self
join on lineitem with a join condition that never matches). The
filters resulted in a significant speedup - 0.26s with filters on,
1.46s with filters off for a ~5.6x speedup. This query is added to
targeted-perf.
Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Reviewed-on: http://gerrit.cloudera.org:8080/7793
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
* Add DescriptorTbl::CreateHdfsTableDescriptor to avoid having to
create an entire DescriptorTbl during INSERT finalization (when only
a descriptor for the output table is needed)
* Remove TQueryExecRequest.desc_tbl; there's already a home for it in
TQueryContext.desc_tbl
This required fixing a problem in the planner test infrastructure
where the TQueryCtx was reused for planning multiple times despite
being modified during planning.
This is based on Marcel Kornacker's coordinator cleanup
patch.
Testing:
Ran core tests.
Change-Id: Id427dab0c196b556bd8b2d64ec618403d5cbd4d6
Reviewed-on: http://gerrit.cloudera.org:8080/8330
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Removes three Maven repositories. davidtrott and codehaus both don't
exist any more, so they're not doing anyone any good. (We had previously
cleaned up Codehaus in IMPALA-5224, but a reference was resurrected.)
The libphonenumber repo was simply misconfigured: the library exists in
Maven central in the "normal" place, and a subdirectory repo is
unnecessary.
To test this, I ran "buildall" after removing ~/.m2/ on my machine.
Change-Id: I79eb6c483561726c7cbaf86874001f1979128720
Reviewed-on: http://gerrit.cloudera.org:8080/8497
Tested-by: Impala Public Jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Currently, impalad starts beeswax and hs2 servers even if the
catalog has not yet been initialized. As a result, client
connections see an error message stating that the impalad
is not yet ready.
This patch changes the impalad startup sequence to wait
until the catalog is received before opening beeswax and hs2 ports
and starting their servers.
Testing:
- python e2e tests that start a cluster without a catalog
and check that client connections are rejected as expected.
Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Reviewed-on: http://gerrit.cloudera.org:8080/8202
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
IMPALA-5546 added the ability to create unpartitioned Kudu tables, but
when SHOW CREATE TABLE is run on such a table, it still prints 'PARTITION
BY', just without a partition clause. This patch removes the 'PARTITION
BY' from the output.
Testing:
- Added test that runs SHOW CREATE on an unpartitioned Kudu table.
Change-Id: Icc327266cfb8b5c05efec97348528cea6904bb20
Reviewed-on: http://gerrit.cloudera.org:8080/8506
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Impala would previously update the ddl time of a table when dropping a
partition but not when adding one. This change removes updates to the
ddl time when partitions are added or removed to be consistent with
Hive.
Additionally the check in the ddl update test would fail if some
operations took longer than 20 seconds. Instead, this change makes sure
that the ddl time increases as intended.
To test this change I ran test_last_ddl_time_update in exhaustive mode
and also ran a private S3 build.
Change-Id: I3126252e7709304d3e1fa4bb06a0b847180bd6cf
Reviewed-on: http://gerrit.cloudera.org:8080/8411
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins
Implements multi-threaded block metadata loading on the Catalog
server where we fetch block metadata for multiple partitions of a
single table in parallel. The number of threads used to load the
metadata is controlled by the following two parameters (set at
Catalog server startup and applied to each table load):
-max_hdfs_partitions_parallel_load(default=5)
-max_nonhdfs_partitions_parallel_load(default=20)
We use different thread pool sizes for HDFS and non-HDFS tables since
non-HDFS supports much higher throughput of RPC calls for listStatus
/listFiles. Based on our experiments, S3 showed a linear speed up
(up to ~113x) with an increasing number of loading threads, whereas the
HDFS throughput was limited to ~5x in unsecured clusters and up to
~3.7x in secure clusters. We narrowed it down to scalability
bottlenecks in the HDFS RPC implementation (HADOOP-14558) on both the
server and the client side.
One thing to note here is that the thread pool based metadata fetching
is implemented only for loading HDFS block metadata and not for loading
HMS partition information. Our experiments showed that while loading
large partitioned tables, ~90% of the time is spent connecting to the NN
and loading the HDFS block information; optimizing the remaining ~10% would
make the code unnecessarily complex without much gain.
Additional notes:
- The multithreading approach is implemented for
* INVALIDATE (loading from scratch),
* REFRESH (reusing existing md) code paths,
* ALTER TABLE ADD/RECOVER PARTITIONS.
- This patch makes the implementation of ListMap thread-safe since
we use that data structure as shared state between multiple partition
metadata loading threads.
Testing and Results:
- This patch doesn't add any new tests since there is enough test
coverage already. Passed core/exhaustive runs with HDFS/S3.
- We noticed up to ~113x speedup on S3 tables (thread_pool_size=160)
and up to ~5x speedup in unsecured HDFS clusters and ~3.7x in secure
HDFS clusters.
- Synthesized the following two large tables on HDFS and S3 and noticed
significant reductions in the runtimes of my test DDL queries.
(1) 100K partitions + 1 million files
(2) 80 partitions + 250K files
100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION I -16.4%
100K-PARTITIONS-1M-FILES-CUSTOM-08-ADD-PARTITION I -17.25%
80-PARTITIONS-250K-FILES-11-REFRESH-PARTITION I -23.57%
80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION I -23.87%
80-PARTITIONS-250K-FILES-09-INVALIDATE I -24.88%
80-PARTITIONS-250K-FILES-03-RECOVER I -35.90%
80-PARTITIONS-250K-FILES-07-REFRESH I -43.03%
100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS I -43.93%
100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV I -46.59%
80-PARTITIONS-250K-FILES-10-REFRESH-AFTER-ADD-PARTITION I -48.71%
100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH I -49.02%
80-PARTITIONS-250K-FILES-05-QUERY-AFTER-INV I -49.05%
100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTI -51.87%
80-PARTITIONS-250K-FILES-S3-03-RECOVER I -67.17%
80-PARTITIONS-250K-FILES-S3-05-QUERY-AFTER-INV I -76.45%
80-PARTITIONS-250K-FILES-S3-07-REFRESH I -87.04%
80-PARTITIONS-250K-FILES-S3-10-REFRESH-AFTER-ADD-PART I -88.57%
Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Reviewed-on: http://gerrit.cloudera.org:8080/8235
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins
Currently, the FE generates a number of runtime filters and assigns
them to the single node plan without taking the value of
RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options
into account.
The backend then removes filters from exec nodes, based on the
following rules:
1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from
the exec nodes that are marked as targets not bound by partition
columns.
2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from
the exec nodes that are marked as remote targets.
This may cause some confusion to users because they may see runtime
filters in the output of explain that are not applied when the query
is executed.
This change moves the logic of runtime filter pruning to the planner
in the FE. The runtime filter assignment is done on the distributed
plan and the above constraints are enforced there directly.
Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Reviewed-on: http://gerrit.cloudera.org:8080/7564
Tested-by: Impala Public Jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
When JVM runs out of memory and throws an error to JNI, the error
handling code uses JNI to get the exception message, resulting in a
null pointer and crashing the process. This patch adds error handling
code to JniUtil::GetJniExceptionMsg().
Change-Id: Ie3ed88bf8739c56a066f2402727c8204e96aa116
Reviewed-on: http://gerrit.cloudera.org:8080/8334
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This commit allows users to add more than 500
(=MAX_PARTITION_UPDATES_PER_RPC) partitions in a single ALTER TABLE
command. We batch the operations against Hive into groups of 500.
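For illustration (table and partition column names are hypothetical), a
single statement of the following shape now succeeds even with more than
500 PARTITION clauses:
  alter table part_tbl add
    partition (p=1)
    partition (p=2)
    -- ... hundreds more partition clauses ...
    partition (p=502);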
I tested this manually, creating 1002 partitions and observing the
expected 3 API calls against the Hive Metastore in the log. I can
confirm that there is coverage of this in some existing tests. A new,
simple test has been added that confirms that creating 502 partitions
works.
Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Reviewed-on: http://gerrit.cloudera.org:8080/8238
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Fill the 'comments'/'remarks' field during HS2 column metadata requests.
To test:
- create a JDBC connection to Impala with HS2 driver
- call getMetaData().getColumns() for a table with column comments
- the returned ResultSet should include column comments in field "REMARKS"
Change-Id: I1d33dfd031b5344d7136695b623cec76143ada5c
Reviewed-on: http://gerrit.cloudera.org:8080/8315
Reviewed-by: Lars Volker <lv@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This patch moves the logging of "loads in progress"
to the place where the current load is accounted for. The
reason for moving the logging is that the current load
is not reflected in loadingTables_ until loadAsync()
is called.
Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
Reviewed-on: http://gerrit.cloudera.org:8080/8212
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins
When one runs a query like
'select * from t order by count(a)'
we are incorrectly throwing an IllegalStateException, with a
Preconditions check which asserts that the column is not "*". Now,
this query is invalid, and we are correctly handling it if "*" is
replaced by a specific column:
'select a from t order by count(b)'
which produces the error message
"ERROR: AnalysisException: select list expression not produced by
aggregation output (missing from GROUP BY clause?): a"
This patch fixes the handling of "*" in this context, by removing
the Preconditions check, so that the error becomes
"ERROR: AnalysisException: select list expression not produced by
aggregation output (missing from GROUP BY clause?): *"
Note that the second changed line is required because
selectListItem.Expr_ is null when SelectListItem.isStar_ is true.
A new FE unit test has been added for this use case.
Change-Id: I57c20aeed401275d45913fedfd61c206c38641b7
Reviewed-on: http://gerrit.cloudera.org:8080/8143
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Added code to read the table type from the metastore table, defaulting to
"TABLE" if the metastore table is not loaded.
After the change, GetTableTypes also returns "VIEW" in addition to "TABLE".
Changed unit and JDBC test cases for GetTableTypes.
Added a new frontend test for reading views.
Change-Id: I90616388e6181cf342b3de389af940214ed46428
Reviewed-on: http://gerrit.cloudera.org:8080/7353
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Implement the new DECIMAL return type rules for multiply expressions,
active when query option DECIMAL_V2=1. The algorithm for determining
the type of the result of multiplication is described in the JIRA.
DECIMAL V1:
+-----------------------------------------------------------------------+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+-----------------------------------------------------------------------+
| DECIMAL(38,38)                                                        |
+-----------------------------------------------------------------------+
+-----------------------------------------------------------------------+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+-----------------------------------------------------------------------+
| DECIMAL(38,30)                                                        |
+-----------------------------------------------------------------------+
DECIMAL V2:
+-----------------------------------------------------------------------+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+-----------------------------------------------------------------------+
| DECIMAL(38,37)                                                        |
+-----------------------------------------------------------------------+
+-----------------------------------------------------------------------+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+-----------------------------------------------------------------------+
| DECIMAL(38,6)                                                         |
+-----------------------------------------------------------------------+
In this patch, we also fix the early multiplication overflow. We compute
a 256 bit integer intermediate value, which we then attempt to scale down
and round.
Performance:
I ran TPCH 300 and TPCDS 1000 workloads and the performance is almost
identical. For TPCH Q1, there was an improvement from 21 seconds to 16
seconds. I did not see any regressions.
The performance improvement is due to the way we check for overflows
after this patch (by counting the leading zeros instead of dividing).
It can be clearly seen in this query:
select cast(2.2 as decimal(38, 1)) * cast(2.2 as decimal(38, 1))
before: 7.85s
after: 2.03s
I noticed performance regressions in the following cases:
- When we need to convert to a 256 bit integer before multiplying,
which was introduced in this patch. Whether this happens depends on
the resulting precision and the value of the inputs. In the following
extreme case, the intermediate value is converted to a 256 bit integer
every time.
select cast(1.1 as decimal(38, 37)) * cast(1.1 as decimal(38, 37))
before: 14.56s (returns null)
after: 126.17s
- When we need to scale down the intermediate value. In the following
query the result is decimal(38,6) after the patch, so the
intermediate needs to be scaled down.
select cast(2.2 as decimal(38,1)) * cast(2.2 as decimal(38,19))
before: 7.25s
after: 13.06s
These regressions are possible only when the resulting precision is 38
which is not common in typical workloads.
Note: The actual queries that I ran for the benchmark are not exactly as
above. I constructed tables with millions of rows containing those values. I
ran the queries with the DECIMAL_V2=1 option before and after the patch.
Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Reviewed-on: http://gerrit.cloudera.org:8080/7438
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins
Currently a database is not visible to a user that only has column
level privileges for tables in that database. This patch will make
the database visible, which is the expected behavior in this case.
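For example (the role and object names are hypothetical):
  GRANT SELECT(string_col) ON TABLE functional.alltypes TO ROLE some_role;
  -- a user who only has some_role now sees 'functional' in SHOW DATABASES
  SHOW DATABASES;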
Testing: added a test case to verify the same.
Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a
Reviewed-on: http://gerrit.cloudera.org:8080/8168
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins
Builds are breaking because the latest hive-exec
snapshot jar includes Guava 14.0.1 classes which
conflict with the version Impala depends on (11.0.2).
Between versions 11.0.2 and 14.0.1 Guava changed the
API of some Hasher methods that Impala uses.
As a workaround this patch upgrades Impala's Guava
dependency to version 14.0.1 to be consistent with
the classes in hive-exec.
Testing:
- mvn compile succeeded
Change-Id: Iddc5da8849d5aa7317d3dc572884d05dee859bdd
Reviewed-on: http://gerrit.cloudera.org:8080/8198
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
Impala tries to always store column names in lower case. As part of a
cleanup of issues related to upper case Kudu column names, a check was
added in Analyzer to enforce this.
The check fails when doing star expansion on a struct to select all
fields in the case where a table was created in Hive with upper case
letters in a struct field name. This happens because Hive does not
convert struct field names to all lower case in the HMS.
The solution is to force StructField names to lower case.
Testing:
- Added a test in test_nested_types.py
- Fixed FE test that expected struct field to be output in upper case.
Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e
Reviewed-on: http://gerrit.cloudera.org:8080/8169
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
This commit changes the way deletions are handled in the catalog and
disseminated to the impalad nodes through the statestore. Previously,
deletions of catalog objects were not explicitly annotated with the
catalog version in which the deletion occurred, and the impalads were
using the max catalog version in a catalog update in order to decide
whether a deletion should be applied to the local catalog cache or not.
This works correctly under the assumption that
all the changes that occurred in the catalog between an update's min
and max catalog version are included in that update, i.e. no gaps or
missing updates. With the upcoming fix for IMPALA-5058, that constraint
will be relaxed, thus allowing for gaps in the catalog updates.
To avoid breaking the existing behavior, this patch introduces the
following changes:
* Deletions in the catalog are explicitly recorded in a log with
the catalog version in which they occurred. As before, added and deleted
catalog objects are sent to the statestore.
* Topic entries associated with deleted catalog objects have non-empty
values (besides keys) that contain minimal object metadata including the
catalog version.
* The statestore no longer uses the existence (or not) of
topic entry values to identify deleted topic entries. Deleted
topic entries should be explicitly marked as such by the statestore
subscribers that produce them.
* Statestore subscribers now use the 'deleted' flag to determine if a
topic entry corresponds to a deleted item.
* Impalads use the deleted objects' catalog versions when updating the
local catalog cache from a catalog update and not the update's maximum
catalog version.
Testing:
- No new tests were added as these paths are already exercised by
existing tests.
- Run all core tests.
Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Reviewed-on: http://gerrit.cloudera.org:8080/7731
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
Adds a new expression to represent the following boolean predicate:
<expr> IS [NOT] (TRUE | FALSE | UNKNOWN)
The expression is expanded in the parser to istrue/isfalse for the checks
against TRUE and FALSE respectively, and to isnull for the check against
UNKNOWN. Compared to the other approaches (rewrites, extended backend expr),
this change is the simplest. The main downside is that error messages are
in terms of the lowered expression.
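For example (illustrative):
  select count(*) from functional.alltypes where (int_col < 5) is not true;
The predicate is lowered by the parser, so any error message references
the expanded form rather than the IS NOT TRUE syntax.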
Testing:
- fe: parser, tosql, analyze exprs
- e2e: query exprs
Change-Id: I9d5fba65ef6c87dfc55a25d2c45246f74eb48c40
Reviewed-on: http://gerrit.cloudera.org:8080/8122
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins