Adds a flag --mem_limit_includes_jvm that alters memory accounting to
include the amount of memory we think that the JVM is likely to use.
By default this flag is false, so behaviour is unchanged.
We're not ready to change the default but I want to check this in to
enable experimentation.
Two metrics are counted towards the process limit:
* The maximum JVM heap size. We count this because the JVM memory
consumption can expand up to this threshold at any time.
* JVM non-heap committed memory. This can be a non-trivial amount of
memory (e.g. I saw 150MB on one production cluster). There isn't a
hard upper bound on this memory that I know of, but it should not
grow rapidly.
This requires adjustments in a couple of other places:
* Admission control previously assumed that all of the process memory
limit was available to queries (an assumption that is not strictly
true because of untracked memory, etc., but close enough). However,
the JVM heap makes a large part of the process limit unusable to
queries, so we should only admit up to "process limit - max JVM heap
size" per node (see the sketch after this list).
* The buffer pool is now a percentage of the remaining process limit
after the JVM heap, instead of the total process limit.
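The combined effect, as a minimal Python sketch (the numbers and the
buffer pool percentage are illustrative, not from the patch):
  # Memory charged to the process limit for the JVM:
  max_jvm_heap = 1 * 1024**3               # -Xmx, counted up front
  jvm_non_heap_committed = 150 * 1024**2   # sampled; grows slowly
  jvm_charge = max_jvm_heap + jvm_non_heap_committed
  # Admission control only admits up to this much per node:
  process_mem_limit = 8 * 1024**3          # --mem_limit
  admission_capacity = process_mem_limit - max_jvm_heap
  # The buffer pool is a percentage of the remainder after the JVM
  # heap, instead of a percentage of the total process limit:
  buffer_pool_pct = 0.80                   # hypothetical percentage
  buffer_pool_bytes = int(buffer_pool_pct * (process_mem_limit - max_jvm_heap))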
Currently, end-to-end tests fail if run with this flag for two reasons:
* The default JVM heap size is 1/4 of physical memory, which means that
essentially all of the process memory limit is consumed by the JVM
heaps when running 3 impala daemons per host, unless -Xmx is
explicitly set.
* If the heap size is limited to 1-2GB as below, then most tests pass
but TestInsert.test_insert_large_string fails because IMPALA-4865
lets it create giant strings that eat up all the JVM heap.
start-impala-cluster.py \
--impalad_args=--mem_limit_includes_jvm=true --jvm_args="-Xmx1g"
Testing:
Add a custom cluster test that uses the new option and validates the
memory consumption values.
Change-Id: I39dd715882a32fc986755d573bd46f0fd9eefbfc
Reviewed-on: http://gerrit.cloudera.org:8080/10928
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this patch, running INVALIDATE METADATA when Sentry is
unavailable could cause an Impala query to hang. The PolicyReader
thread in SentryProxy serves two use cases: as a background thread
that periodically refreshes the Sentry policy, and as a synchronous
operation for INVALIDATE METADATA. For the background thread, we need
to swallow any exception thrown while refreshing the Sentry policy so
that the exception does not kill the background thread. For a
synchronous reset operation, such as INVALIDATE METADATA, swallowing
an exception causes the Impala catalog to wait indefinitely for
authorization catalog objects that never get processed due to Sentry
being unavailable. The patch updates the code to not swallow any
exception in INVALIDATE METADATA and to return the exception to the
caller.
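The pattern, as a minimal Python sketch (the class, method names, and
interval are illustrative, not the actual SentryProxy code):
  import logging, time

  class PolicyReaderSketch(object):
      """Illustrates the two calling modes; not the real implementation."""
      REFRESH_INTERVAL_S = 60

      def _refresh(self):
          raise RuntimeError("Sentry unavailable")  # stand-in for the RPC

      def background_loop(self):
          while True:
              try:
                  self._refresh()
              except Exception:
                  # Swallow so the periodic refresh thread stays alive.
                  logging.exception("background policy refresh failed")
              time.sleep(self.REFRESH_INTERVAL_S)

      def reset(self):
          # Synchronous path (INVALIDATE METADATA): let the exception
          # propagate so the caller fails fast instead of hanging.
          self._refresh()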
Testing:
- Ran all FE tests
- Added a new E2E test
- Ran all E2E authorization tests
Change-Id: Icff987a6184f62a338faadfdc1a0d349d912fc37
Reviewed-on: http://gerrit.cloudera.org:8080/11897
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reapplies the change after fixing where the frontend profile is placed
in the runtime profile.
When computing incremental statistics by fetching the stats directly
from catalogd, a potentially expensive RPC is made from the impalad
coordinator to catalogd. This change adds metrics to the frontend
section of the profile to track how long the request takes, the size
of the compressed bytes received, and the number of partitions received.
The profile for a 'compute incremental ...' command on a table with
no statistics looks like this:
Frontend:
- StatsFetch.CompressedBytes: 0
- StatsFetch.TotalPartitions: 24
- StatsFetch.NumPartitionsWithStats: 0
- StatsFetch.Time: 26ms
And the profile looks as follows when the table has stats, so the stats
are fetched:
Frontend:
- StatsFetch.CompressedBytes: 24622
- StatsFetch.TotalPartitions: 23
- StatsFetch.NumPartitionsWithStats: 23
- StatsFetch.Time: 14ms
Testing:
- manual inspection
- e2e test to check the profile (sketched below)
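The check itself can stay small; a sketch assuming the usual
ImpalaTestSuite conventions (execute_query and a runtime_profile
attribute on the result; the table name is arbitrary):
  from tests.common.impala_test_suite import ImpalaTestSuite

  class TestComputeStatsProfile(ImpalaTestSuite):
      def test_stats_fetch_counters(self, vector):
          result = self.execute_query(
              "compute incremental stats functional.alltypes")
          # All four counters from the examples above should be present.
          for counter in ["StatsFetch.CompressedBytes",
                          "StatsFetch.TotalPartitions",
                          "StatsFetch.NumPartitionsWithStats",
                          "StatsFetch.Time"]:
              assert counter in result.runtime_profile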
Change-Id: I94559a749500d44aa6aad564134d55c39e1d5273
Reviewed-on: http://gerrit.cloudera.org:8080/11670
Reviewed-by: Tianyi Wang <twang@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds the ability to create a new log for each spawn of the
Sentry service. This will enable better troubleshooting for the
custom cluster tests that restart the Sentry service.
Testing:
- Ran all custom cluster tests.
Change-Id: I6e538af7fd6e6ea21dc3f4442bdebf3b31558516
Reviewed-on: http://gerrit.cloudera.org:8080/11624
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When computing incremental statistics by fetching the stats directly
from catalogd, a potentially expensive RPC is made from the impalad
coordinator to catalogd. This change adds metrics to the frontend
section of the profile to track how long the request takes, the size
of the compressed bytes received, and the number of partitions received.
The profile for a 'compute incremental ...' command on a table with
no statistics looks like this:
Frontend:
- StatsFetch.CompressedBytes: 0
- StatsFetch.TotalPartitions: 24
- StatsFetch.NumPartitionsWithStats: 0
- StatsFetch.Time: 26ms
And the profile looks as follows when the table has stats, so the stats
are fetched:
Frontend:
- StatsFetch.CompressedBytes: 24622
- StatsFetch.TotalPartitions: 23
- StatsFetch.NumPartitionsWithStats: 23
- StatsFetch.Time: 14ms
Testing:
- manual inspection
- e2e test to check the profile
Change-Id: Ic9b268548c7a98c751eb99855ee08313d1d5a903
Reviewed-on: http://gerrit.cloudera.org:8080/11534
Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Fix tests to always pass query options via the query_options
parameter.
Modified the infrastructure to fail on non-erasure-coding builds if
tests pass in default query options in the wrong way.
Skip a restart test that makes assumptions about scheduling that EC
seems to break.
Testing:
Ran core tests with erasure coding enabled.
Change-Id: I4d809faedc0c45417519f13c73559efb6c54154e
Reviewed-on: http://gerrit.cloudera.org:8080/11536
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch simply adds a warning message to the log when the
authorization_policy_file run-time flag is used. Sentry has
deprecated the use of policy files, and they do not support
user-level privileges, which are required for object ownership.
SENTRY-1922 is the Jira tracking their removal.
Test:
- Added custom cluster test to validate logs
- Ran all custom cluster tests
Change-Id: Ibbb13f3ef1c3a00812c180ecef022ea638c2ebc7
Reviewed-on: http://gerrit.cloudera.org:8080/11502
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes several issues around granting and revoking of
privileges. This includes:
- REVOKE ALL ON SERVER where the privilege has the grant option was
removing the privilege from the cache but not from Sentry.
- With the addition of the grant option to the name in the catalog
object, refactoring was required to make grants and revokes work
correctly.
Assertions with regard to granting and revoking (see the sketch after
this list):
- If there is a privilege that has the grant option, that privilege
can be revoked simply with "REVOKE privilege..." or the grant option
can be removed with "REVOKE GRANT OPTION ON..."
- We should not limit the privilege being revoked simply because it
has the grant option.
- If a privilege already exists without the grant option, granting the
privilege with the grant option should add the grant option to it.
- If a privilege already exists with the grant option, granting the
privilege without the grant option will not change anything as the
expectation is if you want to remove the grant option, you should
explicitly use the "REVOKE GRANT OPTION ON...".
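A sketch of these semantics as statements run through a test client
(client.execute as in the test suites; the object and role names are
arbitrary, and the REVOKE syntax follows my reading of the grammar):
  client.execute("grant select on table functional.alltypes to role r1")
  # Upgrade: re-granting with the grant option adds it to the privilege.
  client.execute("grant select on table functional.alltypes to role r1 "
                 "with grant option")
  # No silent downgrade: granting again without the option is a no-op;
  # removing the option requires an explicit revoke.
  client.execute("revoke grant option for select on table "
                 "functional.alltypes from role r1")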
Testing:
- Added new grant/revoke tests that validate cache and Sentry refresh
- Ran all FE, E2E, and custom-cluster tests.
Change-Id: I3be5c8f15e9bc53e9661347578832bf446abaedc
Reviewed-on: http://gerrit.cloudera.org:8080/11483
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The bug that caused the erasure coding test failure was that the
default query options specified by the test overrode the
allow_erasure_coded_files option added by the custom cluster test
infrastructure when running erasure-coded tests.
Testing:
Manually ran a custom cluster test with and without ERASURE_CODING=true
and with --capture=no and confirmed the right arguments were passed
to start-impala-cluster.py.
Change-Id: I14f60ea8746657a731e48850b0e48300a2b7c66d
Reviewed-on: http://gerrit.cloudera.org:8080/11463
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The problem was caused by an update in Hive that changed
notifications. HIVE-15180 was added but was incomplete and resulted
in the break. HIVE-17747 fixed the issue by properly creating the
messages.
Change-Id: I4b9276c36bf96afccd7b8ff48803a30b47062c3d
Reviewed-on: http://gerrit.cloudera.org:8080/11466
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds calls to automatically create or remove owner
privileges in the catalog based on the statement. This is similar to
the existing pattern where after privileges are granted in Sentry,
they are created in the catalog directly instead of pulled from
Sentry.
When object ownership is enabled:
CREATE DATABASE will grant the user OWNER privileges to that database.
ALTER DATABASE SET OWNER will transfer the OWNER privileges to the
new owner.
DROP DATABASE will revoke the OWNER privileges from the owner.
This will apply to DATABASE, TABLE, and VIEW.
Example:
If ownership is enabled, when a table is created, the creator is the
owner, and Sentry will create owner privileges for the created table so
the user can continue working with it without waiting for Sentry
refresh. Inserts will be available immediately.
Testing:
- Created new custom cluster tests for object ownership
Change-Id: I1e09332e007ed5aa6a0840683c879a8295c3d2b0
Reviewed-on: http://gerrit.cloudera.org:8080/11314
Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This enables support for Sentry authorization when LocalCatalog is
enabled. The design is detailed in a change to the comment on
CatalogdMetaProvider, but to recap it briefly here:
At a high level, this patch takes the approach of duplicating the "v1"
catalog flow for PRINCIPAL and PRIVILEGE catalog objects. Namely, the catalog
daemon publishes complete objects into the statestore topic, and the
impalad fully replicates them locally.
I took this approach rather than trying to do fine-grained caching and
invalidation for the following reasons:
- The PRINCIPAL and PRIVILEGE metadata is typically many orders of magnitude
smaller than table metadata. So, the benefit of fine-grained caching
and eviction is not as great.
- The PRINCIPAL and PRIVILEGE catalog objects are fairly tightly intertwined
with relationships between them and backwards mappings maintained from
groups back to principals. This logic is implemented by the
AuthorizationPolicy class. Implementing similar mapping in a
fine-grained caching approach would be a reasonable amount of work.
- This bit of code is under some current flux as others are working on
implementing more fine grained permissioning. Thus, trying to
duplicate the logic in a "fetch-on-demand" implementation might turn
out to be chasing somewhat of a moving target.
In order to take this approach, the patch is organized as follows:
- refactored some of the role/principal removal logic from ImpaladCatalog
into AuthorizationPolicy. This makes it easier to perform the similar
"subscribe" with less duplicate cdoe.
- changed catalogd to publish PRINCIPAL and PRIVILEGE objects to v2
catalogs in addition to v1.
- passed through LocalCatalog.getAuthPolicy to CatalogdMetaProvider, and
added an AuthorizationPolicy member there. This member is maintained
when we see PRINCIPAL and PRIVILEGE objects come via the catalog
updates.
- had to implement LocalCatalog.isReady() to ensure that we don't allow
user access until the first topic update has been consumed.
- additionally had to copy some other code from ImpaladCatalog to
protect against various races -- we need a CatalogDeltaLog as well as
careful sequencing of the order in which the objects apply.
With this patch and the following one to enable UDF support, I was able
to run the tests in tests/authorization successfully with LocalCatalog
enabled.
Change-Id: Iccce5aabdb6afe466fdaeae0fb3700c66e658558
Reviewed-on: http://gerrit.cloudera.org:8080/11358
Reviewed-by: Todd Lipcon <todd@apache.org>
Tested-by: Todd Lipcon <todd@apache.org>
This implements cache invalidation inside CatalogdMetaProvider. The
design is as follows:
- when the catalogd collects updates into the statestore topic, it now
adds an additional entry for each table and database. These additional
entries are minimal - they only include the object's name, but no
metadata. This new behavior is conditional on a new flag
--catalog_topic_mode. The default mode is to keep the old style, but
it can be configured to mixed (support both v1 and v2) or v2-only.
- the old-style topic entries are prefixed with a '1:' whereas the new
minimal entries are prefixed with a '2:'. The impalad will subscribe
to one or the other prefix depending on whether it is running with
--use_local_catalog. Thus, old impalads will not be confused by the
new entries and vice versa.
- when the impalad gets these topic updates, it forwards them through to
the catalog implementation. The LocalCatalog implementation forwards
them to the CatalogdMetaProvider, which uses them to invalidate
cached metadata as appropriate (sketched below).
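Conceptually, the impalad-side handling looks like the following
Python sketch (the real code is C++/Java and the key format is
simplified):
  V1_PREFIX, V2_PREFIX = "1:", "2:"

  def apply_topic_update(cache, entries, use_local_catalog):
      prefix = V2_PREFIX if use_local_catalog else V1_PREFIX
      for key, value in entries:
          if not key.startswith(prefix):
              continue  # entries for the other catalog mode are ignored
          name = key[len(prefix):]
          if use_local_catalog:
              # Minimal entry: name only. Drop any cached metadata for
              # the object so the next access re-fetches from catalogd.
              cache.pop(name, None)
          else:
              # Full entry: replace the local replica of the object.
              cache[name] = value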
This patch includes some basic unit tests. I also did some manual
testing by connecting to different impalads and verifying that a session
connected to impalad #1 saw the effects of DDLs made by impalad #2
within a short period of time (the statestore topic update frequency).
Existing end-to-end tests cover these code paths pretty thoroughly:
- if we didn't automatically invalidate the cache on a coordinator
in response to DDL operations, then any test which expects to
"read its own writes" (eg access a table after creating one)
would fail
- if we didn't propagate invalidations via the statestore, then
all of the tests that use sync_ddl would fail.
I verified the test coverage above using some of the tests in
test_ddl.py -- I selectively commented out a few of the invalidation
code paths in the new code and verified that tests failed until I
re-introduced them. Along the way I also improved test_ddl so that, when
this code is broken, it properly fails with a timeout. It also has a bit
of expanded coverage for both the SYNC_DDL and non-SYNC cases.
I also wrote a new custom-cluster test for LocalCatalog that verifies
a few of the specific edge cases like detecting catalogd restart,
SYNC_DDL behavior in mixed mode, etc.
One notable exception here is the implementation of INVALIDATE METADATA.
This turned out to be complex to implement, so I left a lengthy TODO
describing the issue and filed a JIRA.
Change-Id: I615f9e6bd167b36cd8d93da59426dd6813ae4984
Reviewed-on: http://gerrit.cloudera.org:8080/11280
Reviewed-by: Todd Lipcon <todd@apache.org>
Tested-by: Todd Lipcon <todd@apache.org>
Currently, incremental stats can consume a substantial
amount of metadata memory (per table, partition, and column).
This metadata is transmitted from catalogd to all coordinators.
As a result, memory is consumed at all coordinators, all the time,
for every loaded table that uses incremental stats. A consequence
is that coordinators and catalogd die from OOM more often
when incremental stats are used, and more network bandwidth is used.
This change removes incremental stats from impalads. These stats
are only needed when computing incremental statistics and merging
new results with the existing results. They are not used by queries.
As a result, the change requires that coordinators fetch
incremental stats directly from catalogd when computing incremental stats.
In addition, catalogd no longer sends incremental stats to coordinators
via the statestore.
The option is enabled by setting a new flag, --pull_incremental_statistics,
on the catalogd and all impalad coordinators.
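For example, a local test cluster can be started with the flag on
both daemons (a sketch; assuming start-impala-cluster.py's
--impalad_args/--catalogd_args pass-through options):
  start-impala-cluster.py \
    --impalad_args=--pull_incremental_statistics=true \
    --catalogd_args=--pull_incremental_statistics=true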
Testing:
- manual testing
- added end-to-end tests with --pull_incremental_statistics enabled
for the compute-stats-incremental.test
- added fe CatalogTest for new catalogd service method
- passes exhaustive tests when --pull_incremental_statistics is enabled
and disabled
Change-Id: I9d564808ca5157afe4e091909ca6cdac76e60d6e
Reviewed-on: http://gerrit.cloudera.org:8080/11193
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds the flag --disable_catalog_data_ops_debug_only, which makes
catalogd skip loading files from the file system. The flag is false
by default and hidden. Its intent is to avoid time-consuming accesses
to the file system when debugging metadata issues where the
file-system contents are not available. For example, a recent ~18 GB
catalog takes 10 hours to load without the flag set vs. 1 hour to
load with the flag. The extra time comes from accessing the file
system, failing, and logging exceptions.
This flag specifically disables copying jars from the fs when loading
Java functions and it skips loading avro schema files. Additional cases
can be added under this flag if more are needed.
Testing:
- manually confirmed that jars and avro schema files are skipped.
- added a test to check the same behavior in a custom cluster test.
- ran core tests.
Change-Id: I15789fb489b285e2a6565025eb17c63cdc726354
Reviewed-on: http://gerrit.cloudera.org:8080/11191
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds support for loading Avro-formatted tables in LocalCatalog. In
the case that the table properties indicate a table is Avro-formatted,
the semantics are identical to the existing catalog implementation:
- if an explicit avro schema is specified, it overrides the schema
provided by the HMS
- if no explicit avro schema is specified, one is inferred, and then the
inferred schema takes the place of the one provided by the HMS (thus
promoting columns like TINYINT to INT)
- on COMPUTE STATS, if any discrepancy is discovered between the HMS
schema and the inferred schema, an error is emitted.
The semantics for LocalCatalog are slightly different in the case of
tables which have not been configured as Avro format on the table level:
The existing implementation has the behavior that, when a table is
loaded, all partitions are inspected, and, if any partition is
discovered with Avro format, the above rules are applied. This has some
very unexpected results, described in an earlier email to
dev@impala.apache.org [1]. To summarize that thread, the existing
behavior was deemed unintuitive and inconsistent with Hive.
Additionally, this behavior requires loading all partitions up-front,
which gets in the way of the goal of lazy/granular metadata loading
in LocalCatalog.
Thus, the LocalCatalog implementation differs as follows:
- the "schema override" behavior ONLY occurs if the Avro file format has
been selected at a table level.
- if an Avro partition is added to a non-Avro table, and that partition
has a schema that isn't compatible with the table's schema, an error
will occur on read.
The thread additionally discusses adding an error message on "alter" to
prevent users from adding an Avro partition to a table with an
incompatible schema. To keep the scope of this patch minimal, that is
not yet implemented here. I filed IMPALA-7309 to change the behavior of
the existing catalog implementation to match.
A new test verifies the behavior, set to 'xfail' when running on the
existing catalog implementation.
[1] https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@%3Cdev.impala.apache.org%3E
Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Reviewed-on: http://gerrit.cloudera.org:8080/10970
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>
This change removes the flag --use_krpc, which allowed users
to fall back to the Thrift based implementation of the DataStream
services. This flag was originally added during development of
IMPALA-2567. It has served its purpose.
As we port more ImpalaInternalServices to use KRPC, it's becoming
increasingly burdensome to maintain parallel implementation of the
RPC handlers. Therefore, going forward, KRPC is always enabled.
This change removes the Thrift based implementation of DataStreamServices
and also simplifies some of the tests which were skipped when KRPC
is disabled.
Testing done: core debug build.
Change-Id: Icfed200751508478a3d728a917448f2dabfc67c3
Reviewed-on: http://gerrit.cloudera.org:8080/10835
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This changes the default statestore interval for the custom cluster
tests. This can reduce the time taken for the cluster to start and
metadata to load. On some tests this resulted in saving 5+ seconds
per test. Overall it shaved around a minute off the custom cluster
tests.
Testing:
Ran 10 iterations of the tests.
Change-Id: Ia5d1612283ff420d95b0dd0ca5a2a67f56765f79
Reviewed-on: http://gerrit.cloudera.org:8080/10845
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In this patch we add a query option, ALLOW_ERASURE_CODED_FILES, that
allows us to enable or disable support for erasure-coded files. Even
though Impala should already be able to handle HDFS erasure-coded
files, this feature hasn't been tested thoroughly yet. Also, Impala
lacks metrics, observability, and DDL commands related to erasure
coding. This is a query option instead of a startup flag so that
advanced users can still enable the feature.
We may also need a follow-on patch to disable the write path with
this flag.
Cherry-picks: not for 2.x
Change-Id: Icd3b1754541262467a6e67068b0b447882a40fb3
Reviewed-on: http://gerrit.cloudera.org:8080/10646
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change enables the switch to use KRPC by default.
This change also fixes a bug in KrpcDataStreamMgr to
check if the maintenance thread was started before calling
Join() on it. This shows up in BE tests as the maintenance
thread isn't started in them.
Testing done: exhaustive build.
Change-Id: Iae736c1c1351758969b4d84e34fc5b2d048660a0
Reviewed-on: http://gerrit.cloudera.org:8080/9461
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
This change adds a flag "--use_krpc" to start-impala-cluster.py. The
flag is currently passed as an argument to the impalad daemon. In the
future it will also enable KRPC for the catalogd and statestored
daemons.
This change also adds a flag "--test_krpc" to pytest. When running tests
using "impala-py.test --test_krpc", the test cluster will be started
by passing "--use_krpc" to start-impala-cluster.py (see above).
This change also adds a SkipIf to skip tests based on whether the
cluster was started with KRPC support or not.
- SkipIf.not_krpc can be used to mark a test that depends on KRPC.
- SkipIf.not_thrift can be used to mark a test that depends on Thrift
RPC.
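Intended usage, as a sketch (module paths follow the usual test
layout; the test names are made up):
  from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
  from tests.common.skip import SkipIf

  class TestRpcSpecific(CustomClusterTestSuite):
      @SkipIf.not_krpc
      def test_krpc_only(self, vector):
          pass  # skipped unless the cluster was started with --use_krpc

      @SkipIf.not_thrift
      def test_thrift_only(self, vector):
          pass  # skipped when the cluster is running KRPC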
This change adds a meta test to make sure that the new SkipIf decorators
work correctly. The test should be removed as soon as real tests have
been added with the new decorators.
Change-Id: Ie01a5de2afac4a0f43d5fceff283f6108ad6a3ab
Reviewed-on: http://gerrit.cloudera.org:8080/9291
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins
Currently, impalad starts beeswax and hs2 servers even if the
catalog has not yet been initialized. As a result, client
connections see an error message stating that the impalad
is not yet ready.
This patch changes the impalad startup sequence to wait
until the catalog is received before opening beeswax and hs2 ports
and starting their servers.
Testing:
- python e2e tests that start a cluster without a catalog
and check that client connections are rejected as expected.
Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Reviewed-on: http://gerrit.cloudera.org:8080/8202
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
The -use_local_tz_for_unix_timestamp_conversion flag exists
to specify if TIMESTAMPs should be interpreted as localtime
or UTC when converting to/from Unix time via builtins:
from_unixtime(bigint unixtime)
unix_timestamp(string datetime[, ...])
unix_timestamp(timestamp datetime)
However, the KuduScanner was calling into code that, when
the gflag above was set, interpreted Unix times as local
time. Unfortunately the write path (KuduTableSink) and some
FE TIMESTAMP code (see KuduUtil.java) did not have this
behavior, i.e. we were handling the gflag inconsistently.
Tests:
* Adds a custom cluster test to run Kudu test cases with
-use_local_tz_for_unix_timestamp_conversion.
* Adds tests for the new builtin
unix_micros_to_utc_timestamp() which run in a custom
cluster test (added test_local_tz_conversion.py) as well
as in the regular tests (added to test_exprs.py).
Change-Id: I423a810427353be76aa64442044133a9a22cdc9b
Reviewed-on: http://gerrit.cloudera.org:8080/7311
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This commit introduces a new startup option, termed 'is_executor',
that determines whether an impalad process can execute query
fragments. The 'is_executor' option controls whether a specific host
is included in the scheduler's backend configuration and hence in
scheduling decisions.
Testing:
- Added a custom cluster test.
- Added a new scheduler test.
Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Reviewed-on: http://gerrit.cloudera.org:8080/6628
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
With this commit we add the ability to limit catalog updates to a
limited set of coordinator nodes. A new startup option, termed
'is_coordinator' is added to indicate if a node is a coordinator.
Coordinators accept connections through HS2 and Beeswax interfaces
and can also participate in query execution. Non-coordinator nodes
do not receive catalog updates from the statestore, do not initialize
a query scheduler and cannot accept Beeswax and HS2 client connections.
Testing:
- Added a custom cluster test that launches a cluster in which the
number of coordinators is less than the cluster size and runs a number
of smoke queries.
- Successfully run exhaustive tests.
Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3
Reviewed-on: http://gerrit.cloudera.org:8080/6344
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
This patch addresses warning messages from pytest re: the imported
TestMatrix, TestVector, and TestDimension classes, which were being
collected as potential test classes. The fix was to simply prepend
the class names with 'Impala':
git grep -l 'TestDimension' | xargs \
sed -i 's/TestDimension/ImpalaTestDimension/g'
git grep -l 'TestMatrix' | xargs \
sed -i 's/TestMatrix/ImpalaTestMatrix/g'
git grep -l 'TestVector' | xargs \
sed -i 's/TestVector/ImpalaTestVector/g'
The tests all passed in an exhaustive run on the upstream jenkins
server:
http://jenkins.impala.io:8080/view/Utility/job/pre-review-test/8/
Change-Id: I06b7bc6fd99fbb637a47ba376bf9830705c1fce1
Reviewed-on: http://gerrit.cloudera.org:8080/5794
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
test_scratch_disk fails sporadically when trying to assert the presence
of log messages. This is probably caused by log caching, since after
such failures the log files do contain the lines in question.
I manually tested this by running the tests repeatedly for 2 days (10k
runs).
To make future diagnosis of similar problems easier, this change also
adds more output to assert_impalad_log_contains().
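For reference, a sketch of how a test calls the helper (the signature
and the log message are assumed, not copied from the test):
  self.assert_impalad_log_contains(
      "INFO", "Could not remove and recreate directory.*",
      expected_count=1)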
Change-Id: I9f21284338ee7b4374aca249b6556282b0148389
Reviewed-on: http://gerrit.cloudera.org:8080/5669
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
on the website.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Many of our test scripts have import statements that look like
"from xxx import *". It is a good practice to explicitly name what
needs to be imported. This commit implements this practice. Also,
unused import statements are removed.
Change-Id: I6a33bb66552ae657d1725f765842f648faeb26a8
Reviewed-on: http://gerrit.cloudera.org:8080/3444
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
All versions of pytest contain various bugs regarding test marking
(including skips) when tests are both:
1. class-level marked
2. inherited
More info is available in IMPALA-3614 and IMPALA-2943, but the gist is
that it's possible for some tests to be skipped when they shouldn't be.
This is happening pretty badly with the custom cluster tests, because
CustomClusterTestSuite has a class level skipif mark.
The easiest workaround for now is to remove the pytest skipif mark in
CustomClusterTestSuite and skip using explicit pytest.skip() in the
setup_class() method. Some CustomClusterTestSuite children implemented
their own setup_* methods, and I made some adjustments to them both to
clean them up and implement proper parent method calling via super().
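The workaround boils down to this shape (a Python sketch; the skip
condition shown is illustrative):
  import os
  import pytest
  from tests.common.impala_test_suite import ImpalaTestSuite

  class CustomClusterTestSuite(ImpalaTestSuite):
      @classmethod
      def setup_class(cls):
          # Skip explicitly here rather than via a class-level skipif
          # mark, which pytest mishandles on inherited test classes.
          if os.environ.get("RUN_CUSTOM_CLUSTER_TESTS") != "true":
              pytest.skip("custom cluster tests disabled in this run")
          super(CustomClusterTestSuite, cls).setup_class()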
Testing:
I ran the following combinations of all the custom cluster tests:
DEBUG / HDFS / core
RELEASE / HDFS / exhaustive
DEBUG / LOCAL / core
DEBUG / S3 / core
Before, we'd get situations in which most of the tests were skipped.
Consider the RELEASE/HDFS/exhaustive situation:
custom_cluster/test_admission_controller.py .....
custom_cluster/test_alloc_fail.py ss
custom_cluster/test_breakpad.py sssss
custom_cluster/test_delegation.py sss
custom_cluster/test_exchange_delays.py ss
custom_cluster/test_hdfs_fd_caching.py s
custom_cluster/test_hive_parquet_timestamp_conversion.py ss
custom_cluster/test_insert_behaviour.py ss
custom_cluster/test_legacy_joins_aggs.py s
custom_cluster/test_parquet_max_page_header.py s
custom_cluster/test_permanent_udfs.py sss
custom_cluster/test_query_expiration.py sss
custom_cluster/test_redaction.py ssss
custom_cluster/test_s3a_access.py s
custom_cluster/test_scratch_disk.py ssss
custom_cluster/test_session_expiration.py s
custom_cluster/test_spilling.py ssss
authorization/test_authorization.py ss
authorization/test_grant_revoke.py s
Now, more tests run appropriately:
custom_cluster/test_admission_controller.py .....
custom_cluster/test_alloc_fail.py ss
custom_cluster/test_breakpad.py sssss
custom_cluster/test_delegation.py ...
custom_cluster/test_exchange_delays.py ss
custom_cluster/test_hdfs_fd_caching.py .
custom_cluster/test_hive_parquet_timestamp_conversion.py ..
custom_cluster/test_insert_behaviour.py ..
custom_cluster/test_kudu_not_available.py .
custom_cluster/test_legacy_joins_aggs.py .
custom_cluster/test_parquet_max_page_header.py .
custom_cluster/test_permanent_udfs.py ...
custom_cluster/test_query_expiration.py ...
custom_cluster/test_redaction.py ....
custom_cluster/test_s3a_access.py s
custom_cluster/test_scratch_disk.py ....
custom_cluster/test_session_expiration.py .
custom_cluster/test_spilling.py ....
authorization/test_authorization.py ..
authorization/test_grant_revoke.py .
Change-Id: Ie301b69718f8690322cc3b4130fb1c715344779c
Reviewed-on: http://gerrit.cloudera.org:8080/3265
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Michael Brown <mikeb@cloudera.com>
This change adds breakpad crash handling support to catalogd, impalad,
and statestored. The destination folder for minidump files can be
configured via the 'minidump_path' command line flag. Leaving it empty
will disable minidump generation. The daemons will rotate minidump
files. The number of files to keep can be configured with the
'max_minidumps' command line flag.
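For example (a sketch; the path and count are arbitrary):
  impalad --minidump_path=/var/log/impala-minidumps --max_minidumps=8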
Change-Id: I7a37a38488716ffe34296f3490ae291bbb7228d6
Reviewed-on: http://gerrit.cloudera.org:8080/2028
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
The problem: By default, all file descriptors opened by a process,
including sockets, are inherited by any forked child processes. This
includes the connection socket created at the beginning of each test
in ImpalaTestSuite.setup_class(). In
TestHiveMetaStoreFailure.test_hms_service_dies(), the Hive Metastore
is stopped and restarted, meaning the metastore is now a child process
of the test process. This causes the client connection not to be
closed when the parent process (the test) exits, meaning that one of a
finite number of connections (64) to Impala is left permanently in
use.
This would be barely noticeable except run-tests.py runs the mini
stress test with 4 * <num CPUs> concurrent clients by default. On our
build machines, this is 64 clients, which is also the default max
number of connections for an impalad. When a test process tries to
make the 65th connection (since the leaked connection is still there),
it blocks until a connection is freed up. Due to a quirk of the xdist
py.test plugin that I don't fully understand, the test framework will
not clean up test classes (and close the connections) until a number
of tests complete, causing the test process to deadlock.
The solution: use the close_fds argument to make sure the TCP socket
is closed in the spawned child process. This is also done in
CustomClusterTestSuite._start_impala_cluster() when it starts the new
cluster.
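The fix in miniature (Python; the command is a placeholder):
  import subprocess
  # close_fds=True keeps the restarted Hive Metastore child process
  # from inheriting the test's client sockets to impalad, so those
  # connections close when their owners exit.
  subprocess.check_call(["run-hive-server.sh"], close_fds=True)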
This patch also switches test_hms_failure.py to use check_call()
instead of call(), and explicitly caps the number of stress clients at
64.
Change-Id: I03feae922883a0624df1422ffb6ba5f1d83fb869
Reviewed-on: http://gerrit.cloudera.org:8080/1853
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
Check return code of start-impala-cluster.py and check that statestored
was found in test_custom_cluster. This avoids various strange scenarios
where the cluster wasn't created correctly.
Change-Id: Iebaf325d085b85ad156f2bf8a39dddcf6319fb09
Reviewed-on: http://gerrit.cloudera.org:8080/1765
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Allow Impala to start only with a running HMS (and no additional services like HDFS,
HBase, Hive, YARN) and use the local file system.
Skip all tests that need these services, use HDFS caching or assume that multiple impalads
are running.
To run Impala with the local filesystem, set TARGET_FILESYSTEM to 'local' and
WAREHOUSE_LOCATION_PREFIX to a location on the local filesystem where the current user has
permissions since this is the location where the test data will be extracted.
Test coverage (with core strategy) in comparison with HDFS and S3:
HDFS 1348 tests passed
S3 1157 tests passed
Local Filesystem 1161 tests passed
Change-Id: Ic9718c7e0307273382b1cc6baf203ff2fb2acd03
Reviewed-on: http://gerrit.cloudera.org:8080/1352
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Readability: Alex Behm <alex.behm@cloudera.com>
Previously Impala could erroneously decide to use non-writable scratch
directories, e.g. if /tmp/impala-scratch already exists and is not
writable by the current user.
With this change, if we cannot remove and recreate a fresh scratch directory,
it is not used. If we have no valid scratch directories, we log an
error and continue startup.
Add unit test for CreateDirectory to test behavior for success and
failure cases.
Add system tests to check logging and query execution in various
scenarios where we do not have scratch available.
Modify FilesystemUtil to use non-exception-throwing Boost functions to
avoid unhandled exceptions escaping into the rest of the Impala
codebase, which does not expect the use of exceptions.
Change-Id: Icaa8429051942424e1d811c54bde10102ac7f7b3
Reviewed-on: http://gerrit.cloudera.org:8080/565
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Many python files had a hashbang and the executable bit set though
they were not intended to be run as standalone scripts. That makes
determining which python files are actually scripts very difficult.
A future patch will update the hashbang in real python scripts so they
use $IMPALA_HOME/bin/impala-python.
Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba
Reviewed-on: http://gerrit.cloudera.org:8080/599
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
Add a few python custom cluster tests to check:
1) The server fails to start if redaction rules are bad, and the error
message appears in the log.
2) Without redaction rules set, Impala functions as before redaction was
introduced.
3) With redaction rules set, redacted values appear in the logs and web
ui instead of the "sensitive" raw values.
Change-Id: I70e6876d6df8e8afbf2c845f6c922c72d564cadb
Reviewed-on: http://gerrit.cloudera.org:8080/172
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This patch fixes two issues:
- Add API to buffered block mgr to allow an atomic Unpin and GetNewBlock. This has
the semantics of unpinning a block and giving the buffer to the new block. This
is necessary for the tuple stream to make sure another thread does not grab the
unpinned block in between.
- Buffer management reading an unpinned stream. Before moving onto a new block (and
unpinning the current), we need to make sure all the tuples returned from the
current block are returned up the operator tree.
Change-Id: I95ee58d1019dd971f6a7dc19ecafdfa54cdbf942
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4333
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
This change adds support for GRANT/REVOKE to Impala via the Sentry Service. This includes
support for creating and dropping roles, granting and revoking roles to/from groups,
granting/revoking privileges to/from roles, and commands to view role metadata.
The specific statements that are added in this patch are:
CREATE/DROP ROLE <roleName>
SHOW ROLES
SHOW ROLE GRANT GROUP <groupName>
GRANT/REVOKE ROLE <roleName> TO/FROM GROUP <groupName>
GRANT/REVOKE <privilegeSpec> TO/FROM <roleName>
It does not include some of the fancier bulk-op syntax like support for granting multiple
roles to multiple groups in one statement.
This patch does not add support for the WITH GRANT OPTION to delegate GRANT/REVOKE
privileges to other users.
TODO:
* Authorize these statements on the client side. The current Sentry Service design makes
it difficult to authorize any GRANT/REVOKE statement on the client (Impala) side.
Privilege checks are done within the Sentry Service itself. There are a few different
options available to let Impala "fail fast" and those changes will come in a follow
on patch.
Change-Id: Ic6bd19f5939d3290255222dcc1a42ce95bd345e2
The test works by submitting a number of queries (parameterized) with
some delay between submissions (parameterized) and the ability to
submit to one impalad or many. The queries are set with the WAIT debug
action so that we have more control over the state that the admission
controller uses to make decisions. Each query is submitted on a
separate thread. Depending on the test parameters a varying number of
queries will be admitted, queued, and rejected. Once queries are
admitted, the query execution blocks and we can cancel the query in
order to allow another queued query to be admitted. The test tracks
the state of the admission controller using metric counters on each
impalad.
Change-Id: I455484a7f899032890b22c38592fcea1875f5399
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1413
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
(cherry picked from commit bc2a74d6da622de877422f926ff1892bed867bb1)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1624
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
Test suites that derive from common.CustomClusterTestSuite have a brand
new cluster for every test case, which they can configure as they wish
with custom arguments using the @with_args() decorator.
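A minimal example of the pattern (a sketch; assuming the decorator
accepts the daemon arguments, e.g. as impalad_args, and the flag
shown is arbitrary):
  from tests.common.custom_cluster_test_suite import CustomClusterTestSuite

  class TestMyFeature(CustomClusterTestSuite):
      @CustomClusterTestSuite.with_args(impalad_args="--some_flag=true")
      def test_with_custom_flags(self, vector):
          pass  # runs against a cluster started with the args above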
A future improvement is to optionally only have one cluster per test
suite, to allow multiple tests to run more quickly if they share
configuration options.
Change-Id: I6abd5740e644996d7ca2800edf4ff11b839d1bc4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/882
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins