This patch bumps the GBN to 14842939. This build
includes HIVE-23995 and HIVE-24175, and some of the tests
were modified to account for them.
It also fixes a minor bug in environ.py.
Testing done:
1. Core tests.
Change-Id: I78f167c1c0d8e90808e387aba0e86b697067ed8f
Reviewed-on: http://gerrit.cloudera.org:8080/17628
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
This change bumps up the CDP_BUILD_NUMBER to 6912987. The new
CDP build includes Iceberg artifacts.
The new Hive version has a few bugs that cause existing tests
to fail. Unfortunately we can't expect them to be fixed soon
in CDP Hive, so I adjusted the tests and added some TODO comments.
Change-Id: Ide03d6b86043e72753485ff3d4056e0a1bb5c36f
Reviewed-on: http://gerrit.cloudera.org:8080/16701
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change lets the user set the managed location path in new
databases, e.g.
CREATE DATABASE db MANAGEDLOCATION 'some url';
This property sets the location where the database's tables with
table property 'transactional'='true' will be placed.
The change also adds managedlocation to DESCRIBE DATABASE's output.
Example:
DESCRIBE DATABASE db;
+------------------+-----------------------------------------+---------+
| name             | location                                | comment |
+------------------+-----------------------------------------+---------+
| db               | hdfs://localhost:20500/test-warehouse/a |         |
| managedlocation: | hdfs://localhost:20500/test-warehouse/b |         |
+------------------+-----------------------------------------+---------+
DESCRIBE DATABASE EXTENDED db;
+------------------+-----------------------------------------+---------+
| name             | location                                | comment |
+------------------+-----------------------------------------+---------+
| db               | hdfs://localhost:20500/test-warehouse/a |         |
| managedlocation: | hdfs://localhost:20500/test-warehouse/b |         |
| Owner:           |                                         |         |
|                  | csringhofer                             | USER    |
+------------------+-----------------------------------------+---------+
Note that Impala's output for DESCRIBE DATABASE (EXTENDED) differs
from Hive's: Hive adds a new column for each extra piece of
information, while Impala adds a new row to keep the 3-column
format. Changing to Hive's format would be preferable in my opinion,
but is a potentially breaking change.
See IMPALA-6686 for further discussion.
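To illustrate the row-based format, here is a minimal sketch of how a test
helper might fold these rows into a dictionary. The function name
parse_describe_database and the tuple layout are assumptions made for this
example and are not part of the change; owner rows are deliberately ignored.

```python
def parse_describe_database(rows):
    """Fold Impala-style DESCRIBE DATABASE rows into a dict.

    Assumptions (hypothetical helper): each row is a (name, location, comment)
    tuple, the first row describes the database itself, and later rows whose
    first cell ends with ':' carry an extra property in the second cell.
    """
    if not rows:
        raise ValueError("expected at least one row")
    name, location, comment = rows[0]
    info = {"name": name, "location": location, "comment": comment}
    for key, value, _ in rows[1:]:
        # Rows such as ('Owner:', '', '') have no value in the second cell
        # and are skipped, as are their continuation rows with an empty key.
        if key.endswith(":") and value:
            info[key.rstrip(":").lower()] = value
    return info


rows = [
    ("db", "hdfs://localhost:20500/test-warehouse/a", ""),
    ("managedlocation:", "hdfs://localhost:20500/test-warehouse/b", ""),
    ("Owner:", "", ""),
    ("", "csringhofer", "USER"),
]
info = parse_describe_database(rows)
```

With Hive's column-based format no such folding would be needed, which is
one argument for the format change discussed above.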
Testing:
- added FE and EE tests
- ran relevant tests
Change-Id: I925632a43ff224f762031e89981896722e453399
Reviewed-on: http://gerrit.cloudera.org:8080/16529
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Updated the CDP build to 7.2.1.0-57 to include new Hive features such as
HIVE-22995.
In the minicluster, hive.create.as.acid and hive.create.as.insert.only
both default to false, so by default Hive creates external tables
located in the external warehouse directory.
Due to HIVE-22995, desc db returns the external warehouse directory.
For these reasons, some tests need to use the external warehouse dir.
Also adds a new test for "CREATE DATABASE ... LOCATION".
Tested:
Re-ran the failed test in the minicluster.
Ran exhaustive tests.
Change-Id: I57926babf4caebfd365e6be65a399f12ea68687f
Reviewed-on: http://gerrit.cloudera.org:8080/15990
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Hive 3 changed the typical storage model for tables to split them
between two directories:
- hive.metastore.warehouse.dir stores managed tables (which is now
defined to be only transactional tables)
- hive.metastore.warehouse.external.dir stores external tables
(everything that is not a transactional table)
In more recent commits of Hive, there is now validation that the
external tables cannot be stored in the managed directory. In order
to adopt these newer versions of Hive, we need to use separate
directories for external vs managed warehouses.
Most of our test tables are not transactional, so they would reside
in the external directory. To keep the test changes small, this uses
/test-warehouse for the external directory and /test-warehouse/managed
for the managed directory. Having the managed directory be a subdirectory
of /test-warehouse means that the data snapshot code should not need to
change.
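As a rough illustration of the layout described above, the sketch below picks
a warehouse root from the 'transactional' table property. The helper name
default_table_dir and the flat root/table path are simplifying assumptions for
this example (real paths also include a database directory).

```python
# Directory values taken from the description above.
EXTERNAL_WAREHOUSE_DIR = "/test-warehouse"
MANAGED_WAREHOUSE_DIR = "/test-warehouse/managed"


def default_table_dir(table_name, table_properties):
    """Return the assumed default directory for a table (hypothetical helper).

    Transactional tables go under the managed root; everything else goes
    under the external root.
    """
    value = table_properties.get("transactional", "false")
    is_transactional = value.lower() == "true"
    root = MANAGED_WAREHOUSE_DIR if is_transactional else EXTERNAL_WAREHOUSE_DIR
    return "{0}/{1}".format(root, table_name)


external_dir = default_table_dir("alltypes", {})
managed_dir = default_table_dir("acid_tbl", {"transactional": "true"})
```

Because the managed root is nested under /test-warehouse, both results share
the same top-level directory.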
The Hive 2 configuration doesn't change as it does not have this concept.
Since this changes the dataload layout, this also sets the CDH_MAJOR_VERSION
to 7 for USE_CDP_HIVE=true. This means that dataload will use a separate
location for data as compared to USE_CDP_HIVE=false. That should reduce
conflicts between the two configurations.
Testing:
- Ran exhaustive tests with USE_CDP_HIVE=false
- Ran exhaustive tests with USE_CDP_HIVE=true (with current Hive version)
- Verified that dataload succeeds and tests are able to run with a newer
Hive version.
Change-Id: I3db69f1b8ca07ae98670429954f5f7a1a359eaec
Reviewed-on: http://gerrit.cloudera.org:8080/15026
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Note that this is the continuation of work in
https://github.com/vihangk1/impala/commits/IMPALA-8851
This patch's goal is to change Impala's behavior in the following case:
- the query is a DROP TABLE/VIEW/DATABASE/FUNCTION IF EXISTS statement
- the given object does not exist
- the user has some kind of privilege on the object, which implies the
privilege to know whether the object exists, but does not have DROP
privilege on the object
Until now this led to an authorization exception, while it will be
allowed with this change.
An example where this is useful is a user who has CREATE privilege on
a database, and creates table t_owned, and gets ownership of the
table. In this case DROP TABLE IF EXISTS was not idempotent:
DROP TABLE IF EXISTS t_owned;
-> success
DROP TABLE IF EXISTS t_owned;
-> authorization error, as the privileges for the table were
deleted when the table was successfully dropped
After this change the second statement will also succeed.
The authorization logic has to avoid leaking information that the
user has no right to know. For this reason DROP IF EXISTS has to
return the same error message regardless of whether the object exists
or not if the user has no right to know about its existence. This is
achieved with the following pattern:
- in the IF EXISTS case first an ANY privilege is registered, then
the existence of the object is checked and if it doesn't exist,
the analysis returns successfully
- if the object exists, the DROP privilege is registered (if there is
no IF EXISTS in the query, this always happens)
- as the authorization logic checks privileges in the order of
registration, first the ANY will be checked, and DROP will be only
checked if the user has ANY privileges
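The pattern above can be sketched in a few lines. This is a simplified model
and not Impala's actual analyzer: the names analyze_drop, authorize and
AuthorizationError, and the dict of granted privileges, are all hypothetical.

```python
class AuthorizationError(Exception):
    """Raised when a registered privilege request is not satisfied."""


def analyze_drop(obj, if_exists, existing_objects):
    """Return privilege requests in registration order (hypothetical model)."""
    requests = []
    if if_exists:
        # Register ANY first so it is also checked first.
        requests.append(("ANY", obj))
        if obj not in existing_objects:
            # Nothing to drop: analysis ends without registering DROP.
            return requests
    requests.append(("DROP", obj))
    return requests


def authorize(requests, granted):
    """Check requests in registration order; stop at the first failure."""
    for privilege, obj in requests:
        held = granted.get(obj, set())
        ok = bool(held) if privilege == "ANY" else privilege in held
        if not ok:
            raise AuthorizationError(
                "no %s privilege on %s" % (privilege, obj))


existing = {"db.t"}
# A user holding only SELECT may run DROP ... IF EXISTS on a missing object.
authorize(analyze_drop("db.gone", True, existing), {"db.gone": {"SELECT"}})
```

In this model a user without any privilege fails on the ANY check whether or
not the object exists, so the error message reveals nothing about existence.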
Testing:
- Added a new test case in the sentry tests which confirms that the
authorization exception is not thrown when a DROP IF EXISTS query is
issued on an object which does not exist.
- Changed several tests affected by the new behavior.
- Ran core tests.
Change-Id: Iba068935e5da92d71e16e2321afdb8e7b781086a
Reviewed-on: http://gerrit.cloudera.org:8080/14121
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Previously, when creating a new database, the CatalogOpExecutor would
create an HMS Database object, issue the HMS createDatabase call, and
then create a Catalog entry from that same Database object. The
resulting Catalog entry would be missing certain fields that are
auto-created by the HMS itself, most importantly the location field.
The code for CTAS seems to have contained a workaround for this issue
ever since catalogd was first introduced: rather than using the location
stored in the Db object, it would re-fetch the Database from HMS.
Now that this is fixed, that workaround could be removed and some code
simplified.
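The problem can be illustrated with a small model. The message does not say
how the fix is implemented, so the sketch below only shows one possible
approach (re-reading the database after the create call). FakeMetastore, its
methods, and the ".db" directory naming are hypothetical stand-ins, not the
HMS client API.

```python
import copy


class FakeMetastore:
    """Hypothetical stand-in for HMS that fills in a default location."""

    def __init__(self, warehouse_dir):
        self._warehouse_dir = warehouse_dir
        self._dbs = {}

    def create_database(self, db):
        stored = copy.copy(db)
        if not stored.get("location"):
            # The server side auto-creates the location; the caller's
            # request object is left untouched.
            stored["location"] = "%s/%s.db" % (
                self._warehouse_dir, stored["name"])
        self._dbs[stored["name"]] = stored

    def get_database(self, name):
        return copy.copy(self._dbs[name])


def create_catalog_entry(hms, name):
    """Build the catalog entry from what the metastore stored."""
    request = {"name": name, "location": None}
    hms.create_database(request)
    # Using `request` here would yield an entry without a location.
    return hms.get_database(name)


hms = FakeMetastore("/test-warehouse")
entry = create_catalog_entry(hms, "new_db")
```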
A new test verifies that a newly-created database has the appropriate
location, and existing CTAS tests verify that functionality didn't
regress.
Change-Id: I13df31cee1e5768b073e0e35c4c16ebf1892be23
Reviewed-on: http://gerrit.cloudera.org:8080/11229
Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this commit it was quite random which DDL operations
returned a result set and which didn't.
With this commit, every DDL operation returns a summary of
its execution. Operations declare their result set schema in
Frontend.java, and provide the summary in CatalogOpExecutor.java.
Updated the tests according to the new behavior.
Change-Id: Ic542fb8e49e850052416ac663ee329ee3974e3b9
Reviewed-on: http://gerrit.cloudera.org:8080/9090
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds new parametrization to the unique database fixture:
- num_dbs: allows creating multiple unique databases at once;
the 2nd, 3rd, etc. database name is generated by appending
"2", "3", etc., to the first database name
- sync_ddl: allows creating the database(s) with sync_ddl
which is needed by most tests in test_ddl.py
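The naming rule for num_dbs can be sketched as follows. The function
unique_database_names is a hypothetical helper written for this example and
is not the fixture's actual code.

```python
def unique_database_names(first_name, num_dbs=1):
    """Return num_dbs names: the first as given, then suffixed "2", "3", ...

    Hypothetical helper that mirrors the naming rule described above.
    """
    if num_dbs < 1:
        raise ValueError("num_dbs must be at least 1")
    return [first_name] + [
        first_name + str(i) for i in range(2, num_dbs + 1)]


names = unique_database_names("test_ddl_db", num_dbs=3)
```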
Testing: I ran debug/core and debug/exhaustive on HDFS and
core/debug on S3. Also ran the test locally in a loop on
exhaustive.
Change-Id: Idf667dd5e960768879c019e2037cf48ad4e4241b
Reviewed-on: http://gerrit.cloudera.org:8080/4155
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This is the first in a series of patches to clean up test_ddl.py
Summary of changes:
- Break up test_create() and corresponding .test files into:
* test_create_database()
* test_create_table()
* test_create_table_like_table()
* test_create_table_like_file()
* test_create_table_as_select()
- Merge test_nested() into the tests above
- Move a test into test_hms_integration.py
- Add a new test_ddl_base.py as base class for DDL tests.
The plan is to split up test_ddl.py into several smaller
.py files in subsequent patches.
Testing: I tested test_ddl.py and test_hms_integration.py on
exhaustive locally as well as in private builds on all filesystems.
Change-Id: I5f4c044d39e165c2535961b8d0a765c8dbbd051c
Reviewed-on: http://gerrit.cloudera.org:8080/3044
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>