59 Commits

Author SHA1 Message Date
Sai Hemanth Gantasala
b67a9cecb3 IMPALA-13593: Enable event processor to consume ALTER_PARTITIONS events
from metastore

HIVE-27746 introduced the ALTER_PARTITIONS event type, an optimization
that collapses bulk ALTER_PARTITION events into a single event. The
components version is updated to pick up this change. Consuming this
event in Impala significantly reduces the number of events the event
processor handles and helps it catch up with events quickly.

This patch enables consuming the ALTER_PARTITIONS event. The downside
is that there is no before_partitions object in the event message, which
can cause partitions to be refreshed even on trivial changes to them.
HIVE-29141 will address this concern.

Testing:
- Added an end-to-end test to verify consuming the ALTER_PARTITIONS
event. Larger timeouts were also added, as flakiness was observed while
looping this test several times.

Change-Id: I009a87ef5e2c331272f9e2d7a6342cc860e64737
Reviewed-on: http://gerrit.cloudera.org:8080/22554
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2025-08-28 06:53:32 +00:00
Zoltan Borok-Nagy
438461db9e IMPALA-14138: Manually disable block location loading via Hadoop config
For storage systems that expose block location information (HDFS,
Ozone) we always retrieve it on the assumption that it can be used for
scheduling local reads. But it is also common for Impala not to be
co-located with the storage system, even in on-prem deployments, e.g.
when Impala runs in containers; and even when they are co-located, we
don't try to figure out which container runs on which machine.

In such cases we should not reach out to the storage system to collect
file information, because doing so can be very expensive for large
tables and we won't benefit from it at all. Since there is currently no
easy way to tell whether Impala is co-located with the storage system,
this patch adds configuration options to disable block location
retrieval during table loading.

It can be disabled globally via Hadoop Configuration:

'impala.preload-block-locations-for-scheduling': 'false'

It can also be restricted to specific filesystem schemes, e.g.:

'impala.preload-block-locations-for-scheduling.scheme.hdfs': 'false'

When multiple storage systems are configured with the same scheme, we
can still control block location loading based on authority, e.g.:

'impala.preload-block-locations-for-scheduling.authority.mycluster': 'false'

The latter only disables block location loading for URIs like
'hdfs://mycluster/warehouse/tablespace/...'

If block location loading is disabled by any of the switches, it cannot
be re-enabled by another, i.e. the most restrictive setting prevails.
E.g.:
  disable scheme 'hdfs', enable authority 'mycluster'
     ==> hdfs://mycluster/ is still disabled

  disable globally, enable scheme 'hdfs', enable authority 'mycluster'
     ==> hdfs://mycluster/ is still disabled, as everything else is.
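The "most restrictive setting prevails" rule above can be sketched as a
small shell function. This is illustrative only: the real implementation
lives in Impala's Java code, and `get_conf`/`preload_enabled` are
hypothetical helper names standing in for a Hadoop Configuration lookup.

```shell
# Sketch of the precedence rule: block locations are preloaded for a URI only
# if the global, per-scheme, and per-authority switches ALL allow it.
# get_conf stands in for a Hadoop Configuration lookup; unset keys default
# to 'true' (enabled). Here only the hdfs scheme is disabled.
get_conf() {
  case "$1" in
    impala.preload-block-locations-for-scheduling.scheme.hdfs) echo false ;;
    *) echo true ;;
  esac
}

preload_enabled() {  # $1 = scheme, $2 = authority
  base=impala.preload-block-locations-for-scheduling
  [ "$(get_conf "$base")" = true ] &&
    [ "$(get_conf "$base.scheme.$1")" = true ] &&
    [ "$(get_conf "$base.authority.$2")" = true ]
}

preload_enabled hdfs mycluster && echo enabled || echo disabled   # disabled
preload_enabled ofs  mycluster && echo enabled || echo disabled   # enabled
```

Because every switch must agree, enabling the authority cannot undo the
disabled scheme, which matches the first example above.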

Testing:
 * added unit tests for FileSystemUtil
 * added unit tests for the file metadata loaders
 * custom cluster tests with custom Hadoop configuration

Change-Id: I1c7a6a91f657c99792db885991b7677d2c240867
Reviewed-on: http://gerrit.cloudera.org:8080/23175
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-07-17 13:08:15 +00:00
Csaba Ringhofer
6f2d9a24d8 IMPALA-13920: Allow running minicluster with Java 17
IMPALA-11941 allowed building Impala and running tests with Java 17,
but it still uses Java 8 for minicluster components (e.g. Hadoop) and
skips several tests that would restart Hive. It should be possible to
use Java 17 for everything so that Java 8 can be deprecated.

This patch mainly fixes Yarn+Hive+Tez startup issues with Java 17 by
setting JAVA_TOOL_OPTIONS.
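JAVA_TOOL_OPTIONS is read by every JVM at startup, which is what makes
it suitable for reaching the Yarn, Hive and Tez child processes the
minicluster spawns. A sketch; the specific --add-opens flags below are
assumptions (typical of Hadoop-era code on Java 17), not taken from the
patch:

```shell
# Every JVM prints "Picked up JAVA_TOOL_OPTIONS: ..." and applies these flags
# at startup, so child JVMs launched by Yarn/Hive/Tez inherit them too.
# The flag list is illustrative only.
export JAVA_TOOL_OPTIONS="--add-opens=java.base/java.lang=ALL-UNNAMED \
--add-opens=java.base/java.util=ALL-UNNAMED"
echo "$JAVA_TOOL_OPTIONS"
```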

Another issue fixed is in KuduHMSIntegrationTest: this test fails to
restart Kudu due to a bug in OpenJDK (see IMPALA-13856). The current
fix is to remove LD_PRELOAD to avoid loading libjsig (similar to the
case when MINICLUSTER_JAVA_HOME is set). This works, but it would be
nice to clean up this area in a future patch.

Testing:
- ran exhaustive tests with Java 17
- ran core tests with default Java 8

Change-Id: If58b64a21d14a4a55b12dfe9ea0b9c3d5fe9c9cf
Reviewed-on: http://gerrit.cloudera.org:8080/22705
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2025-04-04 17:50:01 +00:00
Fang-Yu Rao
3a2f5f28c9 IMPALA-12921, IMPALA-12985: Support running Impala with locally built Ranger
The goals and non-goals of this patch can be summarized as follows.
Goals:
 - Add changes to the minicluster configuration that allow a non-default
   version of Ranger (possibly built locally) to run in the context of
   the minicluster, and to be used as the authorization server by
   Impala.
 - Switch to the new constructor when instantiating
   RangerAccessRequestImpl. This resolves IMPALA-12985 and also makes
   Impala compatible with Apache Ranger if RangerAccessRequestImpl from
   Apache Ranger is consumed.
 - Prepare Ranger and Impala patches as supplemental material to verify
   which authorization-related tests pass when Apache Ranger is the
   authorization provider. Merging IMPALA-12921_addendum.diff into the
   Impala repository is out of scope for this patch, since the diff file
   changes the behavior of Impala and thus more discussion is required
   if we'd like to merge it in the future.

Non-goals:
 - Set up any automation for building Ranger from source.
 - Pass all Impala authorization-related tests with a non-default
   version of Ranger.

Instructions on running Impala with locally built Ranger:

Suppose the Ranger project is under the folder $RANGER_SRC_DIR. The
following command builds Apache Ranger; by default, the compressed
tarball is produced under $RANGER_SRC_DIR/target.

mvn clean compile -B -nsu -DskipCheck=true -Dcheckstyle.skip=true \
package install -DskipITs -DskipTests -Dmaven.javadoc.skip=true

After building Ranger, we need to build Impala's Java code so that it
can consume the locally produced Ranger classes. The following
environment variables must be exported before building Impala; this
prevents bootstrap_toolchain.py from trying to download the compressed
Ranger tarball.

1. export RANGER_VERSION_OVERRIDE=\
   $(mvn -f $RANGER_SRC_DIR/pom.xml -q help:evaluate \
   -Dexpression=project.version -DforceStdout)

2. export RANGER_HOME_OVERRIDE=$RANGER_SRC_DIR/target/\
   ranger-${RANGER_VERSION_OVERRIDE}-admin
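The same two overrides with a concrete version substituted for the
mvn help:evaluate call — both the checkout path and the version number
below are illustrative values, not taken from the commit:

```shell
# Stand-in for: mvn -f $RANGER_SRC_DIR/pom.xml -q help:evaluate \
#                   -Dexpression=project.version -DforceStdout
RANGER_SRC_DIR=$HOME/ranger            # hypothetical checkout location
export RANGER_VERSION_OVERRIDE=2.4.0   # illustrative version string
export RANGER_HOME_OVERRIDE=$RANGER_SRC_DIR/target/ranger-${RANGER_VERSION_OVERRIDE}-admin
echo "$RANGER_HOME_OVERRIDE"
```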

It then suffices to execute the following to point
Impala to the locally built Ranger server before starting Impala.

1. source $IMPALA_HOME/bin/impala-config.sh

2. tar zxv -f $RANGER_SRC_DIR/target/\
   ranger-${IMPALA_RANGER_VERSION}-admin.tar.gz \
   -C $RANGER_SRC_DIR/target/

3. $IMPALA_HOME/bin/create-test-configuration.sh

4. $IMPALA_HOME/bin/create-test-configuration.sh \
   -create_ranger_policy_db

5. $IMPALA_HOME/testdata/bin/run-ranger.sh
   (run-all.sh has to be executed instead if other underlying services
   have not been started)

6. $IMPALA_HOME/testdata/bin/setup-ranger.sh

Testing:
 - Manually verified that we could point Impala to a locally built
   Apache Ranger on the master branch (with tip being
   https://github.com/apache/ranger/commit/4abb993).
 - Manually verified that with RANGER-4771.diff and
   IMPALA-12921_addendum.diff, only 3 authorization-related tests
   failed. They failed because the 'storage-type' resource type is not
   yet supported in Apache Ranger, so the test cases added in
   IMPALA-10436 fail.
 - Manually verified that the log files of Apache and CDP Ranger's Admin
   server could be created under ${RANGER_LOG_DIR} after we start the
   Ranger service.
 - Verified that this patch passed the core tests when CDP Ranger is
   used.

Change-Id: I268d6d4d6e371da7497aac8d12f78178d57c6f27
Reviewed-on: http://gerrit.cloudera.org:8080/21160
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-06-15 10:25:13 +00:00
stiga-huang
5250cc14b6 IMPALA-12827: Fix failures in processing AbortTxnEvent due to aborted write ids being cleaned up
HdfsTable tracks the ValidWriteIdList from HMS. When the table is
reloaded, the ValidWriteIdList is updated to the latest state. An
ABORT_TXN event that is lagging behind could reference aborted write ids
that have already been cleaned up by the HMS housekeeping thread. Such
write ids can't be found in the cached ValidWriteIdList as open or
aborted write ids. This hits a Precondition check and fails event
processing.

This patch fixes the check to allow this case, and adds more logging
around write id handling.

Tests
 - Added a custom-cluster test that starts Hive with the housekeeping
   thread turned on and verified that such ABORT_TXN events are
   processed correctly.

Change-Id: I93b6f684d6e4b94961d804a0c022029249873681
Reviewed-on: http://gerrit.cloudera.org:8080/21071
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-02-27 17:54:42 +00:00
Michael Smith
0a42185d17 IMPALA-9627: Update utility scripts for Python 3 (part 2)
We're starting to see environments where the system Python ('python') is
Python 3. Updates utility and build scripts to work with Python 3, and
updates check-pylint-py3k.sh to check scripts that use system python.

Fixes other issues found during a full build and test run with Python
3.8 as the default for 'python'.

Fixes an impala-shell tip that was supposed to have been two tips (and
had no space after the period when they were printed).

Removes out-of-date deploy.py and various Python 2.6 workarounds.

Testing:
- Full build with /usr/bin/python pointed to python3
- run-all-tests passed with python pointed to python3
- ran push_to_asf.py

Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb
Reviewed-on: http://gerrit.cloudera.org:8080/19697
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-04-26 18:52:23 +00:00
Michael Smith
830625b104 IMPALA-9442: Add Ozone to minicluster
Adds Ozone as an alternative to hdfs in the minicluster. Select by
setting `export TARGET_FILESYSTEM=ozone`. With that flag,
run-mini-dfs.sh will start Ozone instead of HDFS. Requires a snapshot
because Ozone does not support HBase (HDDS-3589); snapshot loading
doesn't work yet primarily due to HDDS-5502.

Uses the o3fs interface because Ozone puts specific restrictions on
bucket names (no underscores, for instance), and it was a lot easier to
use an interface where everything is written to a single bucket than to
update all of Impala's use of HDFS-style paths to make `test-warehouse`
a bucket inside a volume.
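Selecting Ozone and the shape of the resulting paths can be sketched as
follows; the bucket/volume names are made up for illustration, and the
run-mini-dfs.sh invocation is commented out since it needs a full
Impala development environment:

```shell
# Ozone is selected with a single environment variable before the minicluster
# is (re)started; run-mini-dfs.sh then starts Ozone instead of HDFS.
export TARGET_FILESYSTEM=ozone
# ./testdata/bin/run-mini-dfs.sh

# With o3fs, everything is written to a single bucket, so warehouse paths keep
# their HDFS-style layout under one o3fs authority (names below are made up):
echo "o3fs://bucket1.volume1/test-warehouse/tpch.db/lineitem"
```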

Specifies reduced Ozone client retries during shutdown where Ozone may
not be available.

Passes tests with FE_TEST=false BE_TEST=false.

Change-Id: Ibf8b0f7b2d685d8b011df1926e12bf5434b5a2be
Reviewed-on: http://gerrit.cloudera.org:8080/18738
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2022-08-03 16:58:20 +00:00
Vihang Karajgaonkar
cc6f6d5c91 IMPALA-11028: Table loading can fail when events are cleaned up
IMPALA-10502 introduces a createEventId field of a table which
is updated when Impala creates a table. This is used by
the events processor to determine if the subsequent CREATE_TABLE
event which is received should be skipped or not.

When the table is loaded for the first time, TableLoader updates the
createEventId to the last CREATE_TABLE event id from the metastore in
order to avoid race conditions. To fetch the latest CREATE_TABLE event
id, it fetches all the events from the metastore since the last known
createEventId of the table. However, if there is a significant delay
(more than 24 hours) between the time a table is created or invalidated
and the time it is queried, it is possible that the metastore cleanup
thread has deleted the events generated since the table's
createEventId. In such a case, the HMS client method
getNextNotification() throws an IllegalStateException due to the
missing events. This exception causes the table load to fail and the
query to error out.

The fix is to stop relying on the HMS client method that throws the
IllegalStateException and instead use the backing thrift API directly.

Testing:
1. Introduced a custom cluster test which can reproduce this issue.
2. Test works after the patch.
3. Core tests.

Change-Id: I95e5e20e1a2086688a92abdfb28e89177e996a1a
Reviewed-on: http://gerrit.cloudera.org:8080/18038
Reviewed-by: Vihang Karajgaonkar <vihang@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-11-23 07:45:47 +00:00
Fang-Yu Rao
1b863132c6 IMPALA-10211 (Part 1): Add support for role-related statements
This patch adds the support for the following role-related statements.
1. CREATE ROLE <role_name>.
2. DROP ROLE <role_name>.
3. GRANT ROLE <role_name> TO GROUP <group_name>.
4. REVOKE ROLE <role_name> FROM GROUP <group_name>.
5. GRANT <privilege> ON <resource> TO ROLE <role_name>.
6. REVOKE <privilege> ON <resource> FROM ROLE <role_name>.
7. SHOW GRANT ROLE <role_name> ON <resource>.
8. SHOW ROLES.
9. SHOW CURRENT ROLES.
10. SHOW ROLE GRANT GROUP <group_name>.
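The ten statements above can be strung together into one session. This
is a hedged sketch: the role/group names are made up, and the
impala-shell invocation (commented out, since it needs a running
cluster) is a hypothetical way to run them. All privileges are revoked
before DROP ROLE, matching the Ranger requirement described later in
this message.

```shell
# The ten role-related statements as a single script. CREATE/DROP ROLE and
# GRANT/REVOKE ROLE require the connecting user to be a Ranger administrator.
SQL='
CREATE ROLE analyst;
GRANT ROLE analyst TO GROUP analysts;
GRANT SELECT ON DATABASE functional TO ROLE analyst;
SHOW GRANT ROLE analyst ON DATABASE functional;
SHOW ROLES;
SHOW CURRENT ROLES;
SHOW ROLE GRANT GROUP analysts;
REVOKE SELECT ON DATABASE functional FROM ROLE analyst;
REVOKE ROLE analyst FROM GROUP analysts;
DROP ROLE analyst;
'
# impala-shell -q "$SQL"          # hypothetical invocation
echo "$SQL" | grep -c ';'         # counts the statements: 10
```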

To support the first 4 statements, we implemented
createRole()/dropRole() and grantRoleToGroup()/revokeRoleFromGroup()
using their respective API calls provided by Ranger. To support the 5th
and 6th statements, we modified createGrantRevokeRequest() so that the
cases in which the grantee or revokee is a role can be processed. We
slightly extended getPrivileges() to include the case when the
principal is a role for the 7th statement. For the last 3 statements,
to make Impala's behavior consistent with that when Sentry was the
authorization provider, we based our implementation on
SentryImpaladAuthorizationManager#getRoles() at
https://gerrit.cloudera.org/c/15833/8/fe/src/main/java/org/apache/impala/authorization/sentry/SentryImpaladAuthorizationManager.java,
which was removed in IMPALA-9708 when we dropped the support for Sentry.

To test the implemented functionalities, we based our test cases on
those at
https://gerrit.cloudera.org/c/15833/8/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test.
We note that until our tests can be run automatically in a Kerberized
environment (IMPALA-9360), running the statements
CREATE/DROP ROLE <role_name>,
GRANT/REVOKE ROLE <role_name> TO/FROM GROUP <group_name>, and
SHOW ROLES requires a revised security-applicationContext.xml, one of
the files needed when the Ranger server is started, so that the
corresponding API calls can be performed in a non-Kerberized
environment.

During the process of adding test cases to grant_revoke.test, we found
the following differences in Impala's behavior between the case when
Ranger is the authorization provider and that when Sentry is the
authorization provider. Specifically, we have the following two major
differences.
1. Before dropping a role in Ranger, we have to remove all the
privileges granted to the role in advance, which is not the case when
Sentry is the authorization provider.
2. The resource has to be specified for the statement of
SHOW GRANT ROLE <role_name> ON <resource>, which is different when
Sentry is the authorization provider. This could be partly due to the
fact that there is no API provided by Ranger that allows Impala to
directly retrieve the list of all privileges granted to a specified
role.
Due to the differences in Impala's behavior described above, we had to
revise the test cases in grant_revoke.test accordingly.

On the other hand, to include as many test cases that were in the
original grant_revoke.test as possible, we had to explicitly add the
test section of 'USER' to specify the connecting user to Impala for some
queries that require the connecting user to be a Ranger administrator,
e.g., CREATE/DROP ROLE <role_name> and
GRANT/REVOKE ROLE <role_name> TO/FROM GROUP <group_name>. The user has
to be
'admin' in the current grant_revoke.test, whereas it could be the
default user 'getuser()' in the original grant_revoke.test because
previously 'getuser()' was also a Sentry administrator.

Moreover, for some test cases, we had to explicitly alter the owner of a
resource in the original grant_revoke.test when we would like to prevent
the original owner of the resource, e.g., the creator of the resource,
from accessing the resource since the original grant_revoke.test was run
without object ownership being taken into consideration.

We also note that in this patch we added the decorator of
@pytest.mark.execute_serially to each test in test_ranger.py since we
have observed that in some cases, e.g., if we are only running the E2E
tests in the Jenkins environment, some tests do not seem to be executed
sequentially.

Testing:
 - Briefly verified that the implemented statements work as expected in
   a Kerberized cluster.
 - Verified that test_ranger.py passes in a local development
   environment.
 - Verified that the patch passes the exhaustive tests in the DEBUG
   build.

Change-Id: Ic2b204e62a1d8ae1932d955b4efc28be22202860
Reviewed-on: http://gerrit.cloudera.org:8080/16837
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-12-21 14:29:52 +00:00
Vihang Karajgaonkar
f8c28f8adf IMPALA-9843: Add support for metastore db schema upgrade
This change adds support to upgrade the HMS database schema using the
hive schema tool. It adds a new option to the buildall.sh script
which can be provided to upgrade the HMS db schema. Alternatively,
users can directly upgrade the schema using the
create-test-configuration.sh script. The logs for the schema upgrade
are available in logs/cluster/schematool.log.

The following invocations upgrade the HMS database schema.

1. buildall.sh -upgrade_metastore_db
2. bin/create-test-configuration.sh -upgrade_metastore_db

This upgrade option is idempotent. It is a no-op if the metastore
schema is already at its latest version. In case of any errors, the
only fallback currently is to format the metastore schema and load
the test data again.

Testing:
Upgraded the HMS schema on my local dev environment and made
sure that the HMS service starts without any errors.

Change-Id: I85af8d57e110ff284832056a1661f94b85ed3b09
Reviewed-on: http://gerrit.cloudera.org:8080/16054
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-06-12 18:46:11 +00:00
Joe McDonnell
3e76da9f51 IMPALA-9708: Remove Sentry support
Impala 4 decided to drop Sentry support in favor of Ranger. This
removes Sentry support and related tests. It retires startup
flags related to Sentry and does the first round of removing
obsolete code. This does not adjust documentation to remove
references to Sentry, and other dead code will be removed
separately.

Some issues came up when implementing this. Here is a summary
of how this patch resolves them:
1. authorization_provider currently defaults to "sentry", but
   "ranger" requires extra parameters to be set. This changes the
   default value of authorization_provider to "", which translates
   internally to the noop policy that does no authorization.
2. These flags are Sentry specific and are now retired:
 - authorization_policy_provider_class
 - sentry_catalog_polling_frequency_s
 - sentry_config
3. The authorization_factory_class may be obsolete now that
   there is only one authorization policy, but this leaves it
   in place.
4. Sentry is the last component using CDH_COMPONENTS_HOME, so
   that is removed. There are still Maven dependencies coming
   from the CDH_BUILD_NUMBER repository, so that is not removed.
5. To make the transition easier, testdata/bin/kill-sentry-service.sh
   is not removed and it is still called from testdata/bin/kill-all.sh.

Testing:
 - Core job passes

Change-Id: I8e99c15936d6d250cf258e3a1dcba11d3eb4661e
Reviewed-on: http://gerrit.cloudera.org:8080/15833
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-20 17:43:40 +00:00
Tim Armstrong
29a7ce67f5 IMPALA-9679: Remove some jars from Docker images
This removes a few transitive dependencies that
don't appear to be needed at runtime.

This also removes the frontend test jar. The inclusion
of that jar was masking an issue where some configs
were not accessible from within the container, because
they were symlinks to paths on the host.

Testing:
Ran dockerized tests in precommit.

Ran regular tests with CDP hive.

Change-Id: I030e7cd28e29cd4e077c0b4addd4d14a8599eed6
Reviewed-on: http://gerrit.cloudera.org:8080/15753
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-16 22:39:40 +00:00
Joe McDonnell
8357f61886 IMPALA-9444: Fix URL for postgresql jar download
The current URL uses http://central.maven.org, which has been
decommissioned as part of the transition to HTTPS. See:
https://central.sonatype.org/articles/2019/Jul/15/central-http-deprecation-update/
https://central.sonatype.org/articles/2020/Jan/15/501-https-required-error/
This switches the URL to use https://repo.maven.apache.org/.

Testing:
 - Removed postgresql jar, ran bin/create-test-configuration.sh,
   verified that it downloaded the jar.

Change-Id: I7ee9a1ce77bc3f8c6b3f728633cafe4eb37e669d
Reviewed-on: http://gerrit.cloudera.org:8080/15337
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-03-03 00:23:27 +00:00
stiga-huang
cad156181b IMPALA-9304: Support starting Hive with Ranger in minicluster
Add a new flag -with_ranger in testdata/bin/run-hive-server.sh to start
Hive with Ranger integration. The relevant configuration files are
generated in bin/create-test-configuration.sh using a new variant,
ranger_auth, in hive-site.xml.py. Only Hive 3 is supported.

Current limitation:
A different username can't be used in Beeline via the -n option:
"select current_user()" keeps returning my username, while "select
logged_in_user()" does return the username given by the -n option, but
that value is not used in authorization.

Tests:
 - Ran bin/create-test-configuration.sh and verified the generated
   hive-site_ranger_auth.xml contains Ranger configurations.
 - Ran testdata/bin/run-hive-server.sh -with_ranger. Verified column
   masking and row filtering policies took effect in Beeline.
 - Added test in test_ranger.py for this mode.

Change-Id: I01e3a195b00a98388244a922a1a79e65146cec42
Reviewed-on: http://gerrit.cloudera.org:8080/15189
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-14 04:26:16 +00:00
Tim Armstrong
6f150d383c IMPALA-9361: manually configured kerberized minicluster
The kerberized minicluster is enabled by setting
IMPALA_KERBERIZE=true in impala-config-*.sh.

After setting it, you must run ./bin/create-test-configuration.sh and
then restart the minicluster.
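The enable-and-regenerate steps can be sketched as follows; the
regeneration command is commented out since it needs a full Impala
checkout, and the restart step is left to whatever you normally use:

```shell
# Enable the kerberized minicluster, regenerate the test configuration,
# then restart the minicluster.
export IMPALA_KERBERIZE=true      # normally set in impala-config-*.sh
# ./bin/create-test-configuration.sh
echo "IMPALA_KERBERIZE=$IMPALA_KERBERIZE"
```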

This adds a script to partially automate setup of a local KDC,
in lieu of the unmaintained minikdc support (which has been ripped
out).

Testing:
I was able to run some queries against pre-created HDFS tables
with kerberos enabled.

Change-Id: Ib34101d132e9c9d59da14537edf7d096f25e9bee
Reviewed-on: http://gerrit.cloudera.org:8080/15159
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-08 05:16:12 +00:00
skyyws
497a17dbdc IMPALA-9266: Use custom cluster test case to replace CreateKuduTableWithoutHMSTest.java

The CreateKuduTableWithoutHMSTest.java test case needs to restart the
Impala cluster. To keep the original cluster environment unchanged, we
wrote a custom cluster test case, test_kudu_table_create_without_hms.py,
to replace it.
Besides, we also added an env var 'WITHOUT_HMS' to keep the original
hive-site.xml unchanged, and instead generate a new hive-site_ext.xml
without HMS-related config.

Change-Id: Ic532574af42ed864612cf28eecee9e0416ef272c
Reviewed-on: http://gerrit.cloudera.org:8080/14962
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-01-04 01:39:04 +00:00
stiga-huang
b6b31e4cc4 IMPALA-9071: Handle translated external HDFS table in CTAS
After upgrading Hive-3 to a version containing HIVE-22158, managed
tables are no longer allowed to be non-transactional. Creating a
non-ACID table results in the creation of an external table with the
table property 'external.table.purge' set to true.

In Hive-3, the default location of external HDFS tables is
'metastore.warehouse.external.dir' if it's set. This property was added
by HIVE-19837 in Hive 2.7, but hasn't been added to Hive in cdh6 yet.

In a CTAS statement, we create a temporary HMS table for the analysis
of the Insert part. The table path is created assuming it's a managed
table, and the Insert part uses this path for insertion. However, in
Hive-3 the created table is translated to an external table, which is
not the same as what we passed to the HMS API: the created table is
located in 'metastore.warehouse.external.dir', while the table path we
assumed is in 'metastore.warehouse.dir'. This introduces a bug when
these two properties differ: the CTAS statement creates the table in
one place and inserts data in another.

This patch adds a new method in MetastoreShim to wrap the difference for
getting the default table path for non transactional tables between
Hive-2 and Hive-3.

Changes in the infra:
 - To support customizing hive configuration, add an env var,
   CUSTOM_CLASSPATH in bin/set-classpath.sh to be put in front of
   existing CLASSPATH. The customized hive-site.xml should be put inside
   CUSTOM_CLASSPATH.
 - Change hive-site.xml.py to generate a hive-site.xml with non default
   'metastore.warehouse.external.dir'
 - Add an option, --env_vars, in bin/start-impala-cluster.py to pass
   down CUSTOM_CLASSPATH.
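Putting the infra pieces together, a hedged sketch — the directory name
is hypothetical, and the exact --env_vars syntax is an assumption, not
taken from the patch:

```shell
# A customized hive-site.xml goes into a directory of its own; pointing
# CUSTOM_CLASSPATH at it makes bin/set-classpath.sh put it in front of the
# stock CLASSPATH, and --env_vars forwards it to the cluster processes.
export CUSTOM_CLASSPATH=/tmp/custom-hive-conf   # contains the custom hive-site.xml
# ./bin/start-impala-cluster.py --env_vars=CUSTOM_CLASSPATH=$CUSTOM_CLASSPATH
echo "$CUSTOM_CLASSPATH"
```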

Tests:
 - Add a custom cluster test to start Hive with
   metastore.warehouse.external.dir being set to non default value. Run
   it locally using CDP components with HIVE-22158. xfail the test until
   we bump CDP_BUILD_NUMBER to 1507246.
 - Run CORE tests using CDH components

Change-Id: I460a57dc877ef68ad7dd0864a33b1599b1e9a8d9
Reviewed-on: http://gerrit.cloudera.org:8080/14527
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2019-10-24 22:10:03 +00:00
Tim Armstrong
3b15a5c55a IMPALA-8650: Docker build should not depend on test config
Change-Id: Iaa70864f5d047d1ff5f21e69d8f6358306424c0b
Reviewed-on: http://gerrit.cloudera.org:8080/13597
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2019-06-13 15:19:29 +00:00
Todd Lipcon
17daa6efb9 IMPALA-8369 (part 2): Hive 3: switch to Tez-on-YARN execution
This switches away from Tez local mode to tez-on-YARN. After spending a
couple of days trying to debug issues with Tez local mode, it seemed
like it was just going to be too much of a lift.

This patch switches on the starting of a Yarn RM and NM when
USE_CDP_HIVE is enabled. It also switches to a new yarn-site.xml with a
minimized set of configurations, generated by the new python templating.

In order for everything to work properly I also had to update the Hadoop
dependency to come from CDP instead of CDH when using CDP Hive.
Otherwise, the classpath of the launched Tez containers had conflicting
versions of various Hadoop classes which caused tasks to fail.

I verified that this fixes concurrent query execution by running queries
in parallel in two beeline sessions. With local mode, these queries
would periodically fail due to various races (HIVE-21682). I'm also able
to get farther along in data loading.

Change-Id: If96064f271582b2790a3cfb3d135f3834d46c41d
Reviewed-on: http://gerrit.cloudera.org:8080/13224
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Todd Lipcon <todd@apache.org>
2019-05-10 13:42:55 +00:00
Austin Nobis
84addd2a4b IMPALA-8485: Authorization policy file clean up
This patch cleans up references to the deprecated authorization_policy_file
flag. The authz-policy.ini file is no longer created during the test config
creation. The reference is also removed from the gitignore.

Testing:
- All FE tests were run
- All authorization E2E tests were run
- test_authorization.py E2E test was updated to no longer have
  references to the authz-policy.ini file.

Change-Id: Ib1e90973cb3d5b243844d379e5cdcb2add4eec75
Reviewed-on: http://gerrit.cloudera.org:8080/13222
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-03 20:13:49 +00:00
Fredy Wijaya
5fa076e95c IMPALA-8329: Bump CDP_BUILD_NUMBER to 1013201
This patch bumps the CDP_BUILD_NUMBER to 1013201. This patch also
refactors the bootstrap_toolchain.py to be more generic for dealing with
CDP components, e.g. Ranger and Hive 3.

The patch also fixes some TODOs to replace the rangerPlugin.init() hack
with rangerPlugin.refreshPoliciesAndTags() API available in this Ranger
build.

Testing:
- Ran core tests
- Manually verified that no regression when starting Hive 3 with
  USE_CDP_HIVE=true

Change-Id: I18c7274085be4f87ecdaf0cd29a601715f594ada
Reviewed-on: http://gerrit.cloudera.org:8080/13002
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-17 05:30:33 +00:00
Laszlo Gaal
7bba2e4e4f IMPALA-8380: Bump Postgres JDBC driver version to 42.2.5
Testing on Ubuntu 18.04 with PostgreSQL 10 (the default for the OS)
revealed that HMS fails to start with the existing v9.0 Postgres JDBC
driver.

The patch bumps the driver version to 42.2.5, which allows HMS
and Sentry to start with PostgreSQL 10.

To ensure that existing platforms are not broken, core tests were run:
- in Docker for CentOS 6, CentOS 7 and Ubuntu 16.04
- on Amazon VMs for CentOS 6.4 and CentOS 7.4
- Ubuntu 18.04 was tested on a VM and in Docker as well.

This is a joint effort with Lars Volker and Fredy Wijaya.

Change-Id: Ica5423c18a9f8346dda7dae617b1764638b57b6c
Reviewed-on: http://gerrit.cloudera.org:8080/12894
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-15 18:43:40 +00:00
Todd Lipcon
bcd7b04245 Clean up generation of XML configuration files
hive-site.xml and sentry-site.xml in particular had grown multiple
slightly-different variants, differing only in a few small pieces. This
was difficult to maintain: in fact, while attempting to clean them up I
found a number of places that the MySQL and Postgres versions of
hive-site had diverged for no apparent reason.

This moves away from using the sed-based templating for these
configuration files, and instead uses python as a poor man's template
system. That enables much simpler conditional logic.

I briefly considered XSLT for this, but decided that Python is probably
easier for the average developer to follow, modify, and debug.

Along the way, I removed a few flags which appear to be no longer used
by Hive 2 or later, and a few items which were already commented out in
the previous template:

- hive.stats.dbclass
- hive.stats.dbconnectionstring
- hive.stats.jdbcdriver

These are no longer relevant after HIVE-12164 ("Remove jdbc stats
collection mechanism") in Hive 2.0.

- hive.metastore.rawstore.impl

This has always defaulted to 'ObjectStore' in Hive, so there was no
reason to set it explicitly.

- test.log.dir
- test.src.dir

These were listed in the config in a commented-out section. These were
commented out ever since 2012 when the file was first introduced.

This also fixes the postgres URL to not include a misplaced ';create'
parameter (which applies to Derby but not postgres).

Change-Id: Ief4434d80baae0fd7be7ffe7b2e07bae1ac45e47
Reviewed-on: http://gerrit.cloudera.org:8080/12930
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-06 00:08:50 +00:00
Vihang Karajgaonkar
6b77c61d94 IMPALA-8345 : Add option to set up minicluster to use Hive 3
As a first step to integrating Impala with Hive 3.1.0, this patch
modifies the minicluster scripts to optionally use Hive 3.1.0 instead of
CDH Hive 2.1.1.

To make sure that existing setups don't break, this is enabled via an
environment variable override in bin/impala-config.sh. When the
environment variable USE_CDP_HIVE is set to true, the bootstrap_toolchain
script downloads the Hive 3.1.0 tarballs and extracts them in the
toolchain directory. These binaries are used to start the Hive services
(HiveServer2 and metastore). The default is still CDH Hive 2.1.1.

Also, since Hive 3.1.0 uses an upgraded metastore schema, this patch
makes use of a different database name so that it is easy to switch from
working in one environment which uses the Hive 2.1.1 metastore to another
which uses the Hive 3.1.0 metastore.

In order to start a minicluster which uses Hive 3.1.0 users should
follow the steps below:

1. Make sure that minicluster, if running, is stopped
before you run the following commands.
2. Open a new terminal and run following commands.
> export USE_CDP_HIVE=true
> source bin/impala-config.sh
> bin/bootstrap_toolchain.py
  The above command downloads the Hive 3.1.0 tarballs and extracts them
in the toolchain/cdp_components-${CDP_BUILD_NUMBER} directory. This is a
no-op if the CDP_BUILD_NUMBER has not changed and the cdp_components
have already been downloaded by a previous invocation of the script.

> source bin/create-test-configuration.sh -create-metastore
   The "-create-metastore" argument should be provided only the first
time, so that a new metastore db is created and the Hive 3.1.0 schema is
initialized. For all subsequent invocations, the "-create-metastore"
argument can be skipped. We should still source this script since the
hive-site.xml of Hive 3.1.0 is different from that of Hive 2.1.1 and
needs to be regenerated.

> testdata/bin/run-all.sh

Note that the testing was performed locally by downloading the Hive 3.1
binaries into
toolchain/cdp_components-976603/apache-hive-3.1.0.6.0.99.0-9-bin. Once
the binaries are available in S3 bucket, the bootstrap_toolchain script
should automatically do this for you.

Testing Done:
1. Made sure that the cluster comes up with Hive 3.1 when the steps
above are performed.
2. Made sure that existing scripts work as they do currently when
USE_CDP_HIVE is not set.
3. Impala cluster comes up and connects to HMS 3.1.0 (Note that Impala
still uses the Hive 2.1.1 client. Upgrading client libraries in Impala
will be done as a separate change)

Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Reviewed-on: http://gerrit.cloudera.org:8080/12846
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-03-28 01:52:45 +00:00
Fredy Wijaya
f1fd72a8ee IMPALA-8261: Enhance create-test-configuration.sh to not fail when FE has not been built
This patch updates create-test-configuration.sh so that it no longer
fails due to a missing PostgreSQL JDBC driver when the FE has not been
built; instead, it downloads the driver from Maven Central. When the
JDBC driver already exists in ${POSTGRES_JDBC_DRIVER}, it uses that
instead.

Testing:
Manually ran create-test-configuration.sh with and without FE built.

Change-Id: I6536dcffc1124e79c1ed111ad92d257493cc8feb
Reviewed-on: http://gerrit.cloudera.org:8080/12630
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-03-05 09:20:12 +00:00
fwijaya
0cb7187841 IMPALA-8099: Update the build scripts to support Apache Ranger
This patch updates the build scripts to support Apache Ranger:
- Download Apache Ranger
- Set up the Apache Ranger database
- Create Apache Ranger configuration files
- Start/stop Apache Ranger

Testing:
- Ran ./buildall.sh -format on a clean repository and was able to start
  Ranger without any problem.
- Ran test-with-docker

Change-Id: I249cd64d74518946829e8588ed33d5ac454ffa7b
Reviewed-on: http://gerrit.cloudera.org:8080/12469
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-02-15 21:28:05 +00:00
Tim Armstrong
ff628d2b13 IMPALA-7986,IMPALA-7987: run daemons in docker containers
This refactors start-impala-cluster.py to allow multiple implementations
of the minicluster operations like start and stop. There are now
two classes implementing the same set of operations -
MiniClusterOperations and DockerMiniClusterOperations. The docker
versions start and stop the containers added in IMPALA-7948.

With some configuration (see instructions below), the containers can
connect back to services (HDFS, HMS, Kudu, Sentry, etc) running on the
host. Config generation was modified so that services optionally
communicate via the docker bridge network rather than loopback
(the host's loopback interface is not accessible to the containers).

Notes:
* I improved the container build to regenerate containers when cluster
  configs are regenerated (previously the containers could have stale
  configs).
* Switch from CMD to ENTRYPOINT to allow passing in arguments to "docker
  run" without clobbering default args.
* Python 2.6 is not supported for this code path. This only affects
  CentOS 6, which has limited support for docker anyway.
* I deferred implementing wait_for_cluster(), since the existing
  code requires surgery to abstract out assumptions about locating
  processes and web UI ports - see IMPALA-7988.

How to use:
==========
Create a docker network to use for internal cluster communication,
e.g.:
  docker network create -d bridge --gateway=172.17.0.1 \
      --subnet=172.17.0.1/16 impala-cluster

Add the gateway address of the docker network you created to
impala-config-local.sh, e.g.:

  export INTERNAL_LISTEN_HOST=172.17.0.1
  export DEFAULT_FS=hdfs://${INTERNAL_LISTEN_HOST}:20500

Regenerate configs and docker images:

  . bin/impala-config.sh
  ./bin/create-test-configuration.sh
  ninja -j $IMPALA_BUILD_THREADS docker_images

Restart the minicluster and Impala services to pick up the config:

  ./testdata/bin/run-all.sh
  start-impala-cluster.py --docker_network impala-cluster

You can connect with impala-shell and run some queries. You will
likely run into issues, particularly if running against an existing
data load, since "localhost" or "127.0.0.1" get baked into HMS
table definitions.

Testing:
Ran exhaustive tests (not using Docker) to make sure I didn't break
anything.

Change-Id: I5975cced33fa93df43101dd47d19b8af12e93d11
Reviewed-on: http://gerrit.cloudera.org:8080/12095
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-01-18 04:56:49 +00:00
Adam Holley
23f5338bf6 Revert "Revert "IMPALA-7074: Update OWNER privilege on CREATE, DROP, and SET OWNER""
The problem was caused by an update in Hive that changed notifications.
HIVE-15180 was added but was incomplete and resulted in the break.
HIVE-17747 fixed the issue by properly creating the messages.

Change-Id: I4b9276c36bf96afccd7b8ff48803a30b47062c3d
Reviewed-on: http://gerrit.cloudera.org:8080/11466
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-20 00:51:28 +00:00
Thomas Tauber-Marshall
23da624113 Revert "IMPALA-7074: Update OWNER privilege on CREATE, DROP, and SET OWNER"
This patch has been causing a large number of build failures. Revert
it until we figure out why.

Change-Id: I7f4fc028962d4c6a630456a12a65884a62f01442
Reviewed-on: http://gerrit.cloudera.org:8080/11456
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-18 02:11:48 +00:00
Adam Holley
e5b424ba4e IMPALA-7074: Update OWNER privilege on CREATE, DROP, and SET OWNER
This patch adds calls to automatically create or remove owner
privileges in the catalog based on the statement.  This is similar to
the existing pattern where after privileges are granted in Sentry,
they are created in the catalog directly instead of pulled from
Sentry.

When object ownership is enabled:
CREATE DATABASE will grant the user OWNER privileges to that database.
ALTER DATABASE SET OWNER will transfer the OWNER privileges to the
new owner.
DROP DATABASE will revoke the OWNER privileges from the owner.
This will apply to DATABASE, TABLE, and VIEW.

Example:
If ownership is enabled, when a table is created, the creator is the
owner, and Sentry will create owner privileges for the created table so
the user can continue working with it without waiting for Sentry
refresh.  Inserts will be available immediately.

Testing:
- Created new custom cluster tests for object ownership

Change-Id: I1e09332e007ed5aa6a0840683c879a8295c3d2b0
Reviewed-on: http://gerrit.cloudera.org:8080/11314
Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-14 06:03:44 +00:00
David Knupp
6e5ec22b12 IMPALA-7399: Emit a junit xml report when trapping errors
This patch will cause a junitxml file to be emitted in the case of
errors in build scripts. Instead of simply echoing a message to the
console, we set up a trap function that also writes out to a
junit xml report that can be consumed by jenkins.impala.io.

Main things to pay attention to:

- New file that gets sourced by all bash scripts when trapping
  within bash scripts:

  https://gerrit.cloudera.org/c/11257/1/bin/report_build_error.sh

- Installation of the python lib into impala-python venv for use
  from within python files:

  https://gerrit.cloudera.org/c/11257/1/bin/impala-python-common.sh

- Change to the generate_junitxml.py file itself, for ease of use:

  https://gerrit.cloudera.org/c/11257/1/lib/python/impala_py_lib/jenkins/generate_junitxml.py

Most of the other changes are to source the new report_build_error.sh
script to set up the trap function.
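As a rough sketch of the kind of report involved (assuming the standard JUnit XML element names; this is not the actual generate_junitxml.py):

```python
# Minimal sketch: render a one-failure JUnit XML report for a build
# error, in the format jenkins.impala.io can consume. Function and
# argument names here are hypothetical.
import xml.etree.ElementTree as ET

def generate_junitxml(phase, step, error_msg):
    # One testsuite with a single failed testcase representing the step
    # that broke, with the captured error text as the failure body.
    suite = ET.Element('testsuite', name=phase, tests='1', failures='1')
    case = ET.SubElement(suite, 'testcase', classname=phase, name=step)
    failure = ET.SubElement(case, 'failure', message='build error')
    failure.text = error_msg
    return ET.tostring(suite, encoding='unicode')
```

A bash trap can then invoke such a helper with the failing script's name and line before exiting.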

Change-Id: Idd62045bb43357abc2b89a78afff499149d3c3fc
Reviewed-on: http://gerrit.cloudera.org:8080/11257
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-23 18:33:58 +00:00
Tianyi Wang
2868bf569a IMPALA-7383: Configurable HMS and Sentry policy DB
Some developers keep multiple impala repos on their disk. Isolating
METASTORE_DB and SENTRY_POLICY_DB may help with switching between those
repos without reloading the data. This patch makes those DB names
configurable and default to an escaped IMPALA_HOME path.
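A minimal sketch of the escaping idea (the actual shell logic in impala-config.sh may differ; the function name and prefix are assumptions):

```python
# Hypothetical sketch: derive a per-repo database name by replacing
# every character that is not alphanumeric with '_', since Postgres
# database names cannot contain '/', '-', etc. unquoted.
import re

def db_name_for_repo(impala_home, prefix='hive'):
    escaped = re.sub(r'[^A-Za-z0-9]', '_', impala_home)
    return '%s%s' % (prefix, escaped)
```

Two checkouts at different paths then get distinct metastore and Sentry policy databases automatically.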

Change-Id: I190d657cb95dfdf73ebd05e5dd24ef2a8e3156b8
Reviewed-on: http://gerrit.cloudera.org:8080/11104
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-09 18:07:40 +00:00
Fredy Wijaya
a203733fac IMPALA-7295: Remove IMPALA_MINICLUSTER_PROFILE=2
This patch removes the use of IMPALA_MINICLUSTER_PROFILE. The code
specific to IMPALA_MINICLUSTER_PROFILE=2 is removed, and the code from
IMPALA_MINICLUSTER_PROFILE=3 becomes the default. To limit the number of
code changes in this patch, the shims are not touched; the shims for
IMPALA_MINICLUSTER_PROFILE=3 simply become the default implementation.

Testing:
- Ran core and exhaustive tests

Change-Id: Iba4a81165b3d2012dc04d4115454372c41e39f08
Reviewed-on: http://gerrit.cloudera.org:8080/10940
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-14 01:03:18 +00:00
Philip Zeyliger
783de170c9 IMPALA-4277: Support multiple versions of Hadoop ecosystem
Adds support for building against two sets of Hadoop ecosystem
components. The control variable is IMPALA_MINICLUSTER_PROFILE_OVERRIDE,
which can either be set to 2 (for Hadoop 2, Hive 1, and so on) or 3 (for
Hadoop 3, Hive 2, and so on).

We intend (in a trivial follow-on change soon) to make 3 the new default
and to explicitly deprecate 2, but this change does not switch the
default yet. We support both to facilitate a smoother transition, but
support for profile 2 will be removed soon in the Impala 3.x line.

The switch is done at build time, following the pattern from IMPALA-5184
(build fe against both Hive 1 & 2 APIs). Switching back and forth
requires running 'cmake' again. Doing this at build-time avoids
complicating the Java code with classloader configuration.

There are relatively few incompatible APIs. This implementation
encapsulates that by extracting some Java code into
fe/src/compat-minicluster-profile-{2,3}. (This follows the
pattern established by IMPALA-5184, but, to avoid a proliferation
of directories, I've moved the Hive files into the same tree.)

For Maven, I introduced Maven "profiles" to handle the two cases where
the dependencies (and exclusions) differ. These are driven by the
$IMPALA_MINICLUSTER_PROFILE environment variable.
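Roughly, such a profile looks like this (a hypothetical fragment, not the actual pom.xml):

```xml
<!-- Hypothetical sketch: a Maven profile activated by the
     IMPALA_MINICLUSTER_PROFILE environment variable. -->
<profiles>
  <profile>
    <id>minicluster-profile-3</id>
    <activation>
      <property>
        <name>env.IMPALA_MINICLUSTER_PROFILE</name>
        <value>3</value>
      </property>
    </activation>
    <dependencies>
      <!-- Hadoop 3 / Hive 2 dependencies and exclusions go here. -->
    </dependencies>
  </profile>
</profiles>
```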

For Sentry, exception class names changed. We work around this by adding
"isSentry...(Exception)" methods with two different implementations.
Sentry is also doing some odd shading, whereby some exceptions are
"sentry.org.apache.sentry..."; we handle both. Similarly, the mechanism
to create a SentryAuthProvider is slightly different. The easiest way to
see the differences is to run:

  diff -u fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/util/SentryUtil.java
  diff -u fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/authorization/SentryAuthProvider.java

The Sentry work is based on a change by Zach Amsden.

In addition, we recently added an explicit "refresh" permission.  In
Sentry 2, this required creating an ImpalaPrivilegeModel to capture
that. It's a slight customization of Hive's equivalent class.

For Parquet, the difference is even more mechanical. The package names
have gone from "parquet" to "org.apache.parquet". The affected code
was extracted into ParquetHelper, but only one copy exists. The second
copy is generated at build-time using sed.

In the rare cases where we need to behave differently at runtime,
MiniclusterProfile.MINICLUSTER_PROFILE is a class which encapsulates
which version we were built against. One of the cases is the results
expected by various frontend tests. I avoided the issue by translating
one error string into another, which handled the divergence in one place,
rather than complicating the several locations which look for "No
FileSystem for scheme..." errors.

The HBase APIs we use for splitting regions at test time changed.
This patch includes a re-write of that code for the new APIs. This
piece was contributed by Zach Amsden.

To work with newer versions of dependencies, I updated the version of
httpcomponents.core we use to 4.4.9.

We (Thomas Tauber-Marshall and I) uploaded new Hadoop/Hive/Sentry/HBase
binaries to s3://native-toolchain, and amended the shell scripts to
launch the right things. There are minor mechanical differences.  Some
of this was based on earlier work by Joe McDonnell and Zach Amsden.
Hive's logging changed in Hive 2, necessitating the creation of a
log4j2.properties template and using it appropriately. Furthermore,
Hadoop 3's new shell script rewrites do a certain amount of classpath
de-duplication, causing some issues with locating the relevant logging
configurations. Accommodations exist in the code to deal with that.

parquet-filtering.test was updated to turn off stats filtering. Older
Hive didn't write Parquet statistics, but newer Hive does. By turning
off stats filtering, we test what the test had intended to test.

For views-compatibility.test, it seems that Hive 2 has fixed certain
bugs that we were testing for in Hive. I've added a
HIVE=SUCCESS_PROFILE_3_ONLY mechanism to capture that.

For AuthorizationTest, different hive versions show slightly different
things for extended output.

To facilitate easier reviewing, the following files are 100% renames as identified by git; nothing
to see here.

 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetCatalogsReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetColumnsReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetFunctionsReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetInfoReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetSchemasReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetTablesReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/impala/compat/MetastoreShim.java (100%)
 rename fe/src/{compat-hive-2 => compat-minicluster-profile-3}/java/org/apache/impala/compat/MetastoreShim.java (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/kms-acls.xml.tmpl (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/kms-site.xml.tmpl (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/yarn-site.xml.tmpl (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-common (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-master (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-tserver (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/kudu/master.conf.tmpl (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/kudu/tserver.conf.tmpl (100%)

CreateTableLikeFileStmt had a chunk of code moved to ParquetHelper.java. This
was done manually, but without changing anything except what Java required in
terms of accessibility and boilerplate.

 rewrite fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java (80%)
 copy fe/src/{main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java => compat-minicluster-profile-3/java/org/apache/impala/analysis/ParquetHelper.java} (77%)

Testing: Ran core & exhaustive tests with both profiles.
Cherry-picks: not for 2.x.

Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7
Reviewed-on: http://gerrit.cloudera.org:8080/9716
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2018-03-23 20:56:00 +00:00
David Knupp
4ce23f72ba Move symlinked auxiliary tests/* to tests/functional/*
The layout of the Impala-auxiliary-tests/tests directory is
changing to allow for different kinds of tests to be saved
there. But just in case the new functional sub-directory
does not exist, preserve backwards compatibility with the
older layout.

Change-Id: Ifb2bbbebc38bbaf3d6a4ad01fa8dd918b7d99b3b
Reviewed-on: http://gerrit.cloudera.org:8080/8896
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins
2017-12-22 04:48:53 +00:00
Joe McDonnell
eecbbcb7c7 IMPALA-5941: Fix Metastore schema creation in create-test-configuration.sh
The Hive Metastore schema script includes other SQL
scripts using \i, which resolves relative paths against the
current working directory. Since we currently invoke it from
outside the schema script directory, it is unable to find those
included scripts.

The fix is to switch to the Hive Metastore script
directory when invoking the schema script.

Change-Id: Ic312df4597c7d211d4ecd551d572f751aea0cd24
Reviewed-on: http://gerrit.cloudera.org:8080/8081
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: David Knupp <dknupp@cloudera.com>
2017-09-19 22:59:04 +00:00
Thomas Tauber-Marshall
ce31572792 IMPALA-5426: Update Hive schema script to 1.1.0
A recent update to Hive changed its schema, which is causing
the metastore not to come up when run against the latest version
of Hive as we've hard coded the Hive schema script version in
bin/create-test-configuration.sh to an old version.

This patch updates the version to the latest. The schema script
is included in Hive in the toolchain and the new version will
already be present.

By itself, this patch does not actually change the Hive schema
by default as the version we usually build against doesn't have
the change. A following patch will update impala-config.sh to pull
in the latest version of Hive.

In the long run, we should switch to using Hive's schema tool,
which can do this for us automatically (IMPALA-5430).

Testing:
- Ran an exhaustive private Jenkins build that passed.

Change-Id: I9ea3269c1f95f76d8c02b76a5dea3ca3aa324b70
Reviewed-on: http://gerrit.cloudera.org:8080/7072
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
2017-06-05 21:09:16 +00:00
Henry Robinson
19de09ab7d IMPALA-4160: Remove Llama support.
Alas, poor Llama! I knew him, Impala: a system
of infinite jest, of most excellent fancy: we hath
borne him on our back a thousand times; and now, how
abhorred in my imagination it is!

Done:

* Removed QueryResourceMgr, ResourceBroker, CGroupsMgr
* Removed untested 'offline' mode and NM failure detection from
  ImpalaServer
* Removed all Llama-related Thrift files
* Removed RM-related arguments to MemTracker constructors
* Deprecated all RM-related flags, printing a warning if enable_rm is
  set
* Removed expansion logic from MemTracker
* Removed VCore logic from QuerySchedule
* Removed all reservation-related logic from Scheduler
* Removed RM metric descriptions
* Various misc. small class changes

Not done:

* Remove RM flags (--enable_rm etc.)
* Remove RM query options
* Changes to RequestPoolService (see IMPALA-4159)
* Remove estimates of VCores / memory from plan

Change-Id: Icfb14209e31f6608bb7b8a33789e00411a6447ef
Reviewed-on: http://gerrit.cloudera.org:8080/4445
Tested-by: Internal Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2016-09-20 23:50:43 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
   on the website.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
ishaan
db4bd623ed IMPALA-2131: The metastore database name should be a global constant.
Previously, we tried to dynamically name the metastore db. With the introduction of
metastore snapshots, this is no longer necessary and may cause naming ambiguity if the
Impala repository has a non-standard directory structure.

This patch uses a constant name - impala_hive - defined as an environment variable in
impala-config.

Change-Id: Iadc59db8c538113171c9c2b8cea3ef3f6b3bd4fc
Reviewed-on: http://gerrit.cloudera.org:8080/517
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-02-03 00:58:50 +00:00
Tim Armstrong
0c6628af18 Use psql -q consistently
Use psql -q to suppress verbose output during metastore creation.
Also use -q instead of redirection everywhere for consistency.

Change-Id: I539da86a50d18546474b2cfdc848f992745a7875
Reviewed-on: http://gerrit.cloudera.org:8080/1884
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-01-26 21:15:04 +00:00
Tim Armstrong
05931e0e7f IMPALA-2853: prevent hadoop info logging from getting into generated SQL
The hadoop shell command was used by some data loading steps. When
invoked during data loading, there was no log4j config file in
$HADOOP_CONF_DIR, so it used the default logging settings, which sent
INFO and above to stdout.

This patch adds a log4j.properties file in a location where it will be
picked up by the hadoop shell command after impala-config.sh is sourced.
INFO and above are sent to stderr instead of stdout.
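A minimal log4j.properties along these lines (illustrative, not necessarily the exact file added) would look like:

```properties
# Root logger at INFO, writing to stderr instead of stdout so that
# generated SQL on stdout stays clean.
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.Target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```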

Change-Id: I94ae8a363504c1e9c7d16d096711bb0ae637f271
Reviewed-on: http://gerrit.cloudera.org:8080/1798
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-01-20 05:15:04 +00:00
Tim Armstrong
c9cb00f4a1 IMPALA-2847: only recreate Sentry Policy DB when formatting cluster
We should only need to recreate the Sentry Policy DB when formatting a
cluster. Previously buildall.sh always tried to create the database
regardless of whether it was needed. E.g. if a machine was just building
Impala without running tests, there is no need to create any of the test
databases. This fixes a regression when running buildall.sh on a machine
without postgres set up.

Change-Id: I35bb1cb275bb4da3f91f496010a7f6ee4daa2792
Reviewed-on: http://gerrit.cloudera.org:8080/1782
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2016-01-14 08:05:29 +00:00
Casey Ching
cfb1ab5c2c IMPALA-2781: Fix shell error reporting after chdir
The original error reporting relied on $0 being accessible from the
current working dir, which failed if a script changed the working dir
and $0 was relative. This updates the error reporting command to cd back
to the original dir before accessing $0.

Change-Id: I2185af66e35e29b41dbe1bb08de24200bacea8a1
Reviewed-on: http://gerrit.cloudera.org:8080/1666
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2016-01-14 07:10:54 +00:00
Tim Armstrong
4b5ad8cbfd Reduce log output for postgres db operations
Various test scripts operating on postgres databases output
unhelpful log messages, including "ERROR" messages that
aren't actual errors when trying to drop a database that doesn't exist.

Send useless output to /dev/null and consistently use || true to
ignore errors from dropdb.

Change-Id: I95f123a8e8cc083bf4eb81fe1199be74a64180f5
Reviewed-on: http://gerrit.cloudera.org:8080/1753
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-01-13 03:58:50 +00:00
Casey Ching
e2bfb6ae2f Misc improvements to shell scripts about error reporting
Changes:
  1) Consistently use "set -euo pipefail".
  2) When an error happens, print the file and line.
  3) Consolidated some of the kill scripts.
  4) Added better error messages to the load data script.
  5) Changed use of #!/bin/sh to bash.

Change-Id: I14fef66c46c1b4461859382ba3fd0dee0fbcdce1
Reviewed-on: http://gerrit.cloudera.org:8080/1620
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2015-12-17 18:25:27 +00:00
Dimitris Tsirogiannis
7ecf10365f CDH-22383: Impala hangs when querying HBase tables with large number of
columns

This commit fixes the issue where Impala hangs when querying an HBase
table with a large (>500) number of columns. The issue was triggered by a
large memory allocation of a tuple buffer during the first GetNext call
of the HBase scanner that was causing an infinite loop where each iteration
was allocating a significant amount of memory. The fix is to dynamically
set the mem limit of a row batch based on the corresponding row size and to
dynamically set the maximum size of the tuple buffer so that it does not
exceed that limit.

Change-Id: Ia64f98b229772b50658af952fc641bf00f54f450
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4871
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4933
2014-10-23 15:29:51 -07:00
Lenni Kuff
4ecaf036ba [CDH5] Updates to support running Sentry Service via its service start scripts in /thirdparty
We previously had a wrapper script that started the Sentry Service in our test environment.
This ran into some issues with the upgrade to Sentry v1.4 due to classpath conflicts with
other components. The fix is to add Sentry to /thirdparty and use Sentry's startup scripts
to actually run the service. This is also a more realistic test environment.

The actual addition of Sentry to /thirdparty is not included in this change.

Change-Id: I4c5998cde4fc900b8a34037550459265298da4c4
2014-09-12 22:48:51 -07:00
Mike Yoder
75a97d3d7e [CDH5] Kerberize mini-cluster and Impala daemons
This is the first iteration of a kerberized development environment.
All the daemons start and use kerberos, with the sole exception of the
hive metastore.  This is sufficient to test impala authentication.

When buildall.sh is run using '-kerberize', it will stop before
loading data or attempting to run tests.

Loading data into the cluster is known to not work at this time, the
root causes being that Beeline -> HiveServer2 -> MapReduce throws
errors, and Beeline -> HiveServer2 -> HBase has problems.  These are
left for later work.

However, the impala daemons will happily authenticate using kerberos
both from clients (like the impala shell) and amongst each other.
This means that if you can get data into the mini-cluster, you could
query it.

Usage:
* Supply a '-kerberize' option to buildall.sh, or
* Supply a '-kerberize' option to create-test-configuration.sh, then
  'run-all.sh -format', re-source impala-config.sh, and then start
  impala daemons as usual.  You must reformat the cluster because
  kerberizing it will change the ownership of all files in HDFS.

Notable changes:
* Added clean start/stop script for the llama-minikdc
* Creation of Kerberized HDFS - namenode and datanodes
* Kerberized HBase (and Zookeeper)
* Kerberized Hive (minus the MetaStore)
* Kerberized Impala
* Loading of data very nearly working

Still to go:
* Kerberize the MetaStore
* Get data loading working
* Run all tests
* The unknown unknowns
* Extensive testing

Change-Id: Iee3f56f6cc28303821fc6a3bf3ca7f5933632160
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4019
Reviewed-by: Michael Yoder <myoder@cloudera.com>
Tested-by: jenkins
2014-09-05 12:36:21 -07:00
Lenni Kuff
18ae6cf7e6 [CDH5] Add updated CDH5.2 dependencies
Change-Id: I9b5727fd42bea787b0693be352bee631a51af2da
2014-08-09 22:18:04 -07:00