This change adds support for authorizing based on policy metadata read from the Sentry
Service. Authorization is role-based, and roles are granted to user groups. Each role
can have zero or more privileges associated with it, granting fine-grained access to
specific catalog objects at server, URI, database, or table scope. This patch only
adds support for authorizing against metadata read from the Sentry Policy Service; it does
not add support for GRANT/REVOKE statements in Impala.
The authorization metadata is read by the catalog server from the Sentry Service and
propagated to all nodes in the cluster in the "catalog-update" statestore topic. To
enable the Catalog Server to read policy metadata, the --sentry_config flag must be
set to a valid sentry-site.xml config file.
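For example, the Catalog Server could be started as follows (the config path is only an
illustration):
catalogd --sentry_config=/etc/impala/conf/sentry-site.xml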
On the impalad side, we continue to support authorization based on a file-based provider.
To enable file-based authorization, set the --authorization_policy_file flag to a
non-empty value. If --authorization_policy_file is not set, authorization will be done
based on cached policy metadata received from the Catalog Server (via the statestore).
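To illustrate the two impalad modes (flag values are examples only):
impalad --server_name=server1 --authorization_policy_file=/user/impala/policy.ini
uses the file-based provider, while omitting the policy file flag, e.g.
impalad --server_name=server1
leaves the impalad authorizing against the policy metadata cached from the Catalog Server.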
TODO: There are still some issues with the Sentry Service that require disabling some of
the authorization tests and adding some workarounds. I have added comments in the code
where these workarounds are needed.
Change-Id: I3765748d2cdbe00f59eefa3c971558efede38eb1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2552
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
All other CDH components use slf4j version 1.7.5; Impala's use of an earlier version
causes a lot of benign warnings. This patch changes Impala's version to be the same
as the rest of the stack.
Change-Id: I297903d146c6b7642de5b6fa4eefa28a6a08fafe
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2541
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
This just updates the versions; it doesn't touch anything in /thirdparty.
Change parquet version to append SNAPSHOT
Added the hadoop-hbase-compat jar to AUX_CLASSPATH and mapreduce/*.jar to HDFS
Change-Id: I4471ef4476997371cf49a9d54cfa63f2fda126e4
This re-enables a subset of the stable data errors tests and updates them to
work in our test framework. This includes support for updating results via --update_results.
This also lets us remove a lot of old code that was there only to support these disabled
tests.
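To regenerate expected results, the update flag is passed through to the test run, e.g.
(the wrapper script invocation is an assumed example; only the --update_results flag
comes from this change):
./tests/run-tests.sh --update_results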
Change-Id: I4c40c3976d00dfc710d59f3f96c99c1ed33e7e9b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1952
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2277
Changes include:
- version changes in impala-config
- version changes in various loading scripts
- hbase jars are no longer in hive/lib
- mini-llama script changes
- updates due to Sentry API changes
- JDBC tests disabled
- unsupported types tests disabled
Change-Id: If8cf1b7ad8e22aa4d23094b9a4b1047f7e9d93ee
This change cleans up our FE pom.xml file by removing unneeded
dependencies and system dependencies (system dependencies are now pulled in
from the Maven release repository).
The upside is that our pom is cleaner and it will also help reduce the likelihood of
broken dependencies since Maven will pull in the right versions. The downside
is that we now pull in quite a few more JARs.
Note: I was unable to find release artifacts for Sentry and Parquet, so I am leaving
those as "system" for now.
Change-Id: I0b917b09a02243d78d89747591ab6bccacf7cf38
Saving changes
Change-Id: I3697a7b44884c40e077b3e354fef76625e1b881d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1011
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
The following changes have been made:
-- Update HBase
-- Update Hive
-- Update Hadoop
-- Update the Parquet version to 1.2.5
Change-Id: Id6ceaef0e9eebab27ffd408160116fa84ed300fb
This change adds support for SQL statement authorization in Impala. The authorization
works by updating the Catalog API to require a User + Privilege when getting Table/Db
objects (and in the future can be extended to cover columns as well).
If the user doesn't have permission to access the object, an AuthorizationException is
thrown. The authorization checks are done during analysis as new Catalog objects are
encountered.
These changes build on top of the Hive Access code, which handles the actual
processing of authorization requests. The authorization is currently based
on a "policy file" which will be stored in HDFS. This policy file is read once
on startup and then reloaded every 5 minutes. It can also be reloaded on a
specific impalad by executing a "refresh" command.
Authorization is enabled by setting:
--server_name='server1'
and then pointing the impalad to the policy file using the flag:
--authorization_policy_file=/path/to/policy/file
Any authorization configuration problems will result in impalad failing to
start.
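Put together, an illustrative startup (the server name and policy path are example
values) looks like:
impalad --server_name=server1 --authorization_policy_file=/user/impala/policy/policy.ini
The cached policy on a specific impalad can be reloaded ahead of the 5-minute interval
by issuing the refresh command, e.g. from the shell (the impala-shell invocation is only
one way to issue it):
impala-shell -i <impalad-host> -q "refresh"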
This change adds support for auxiliary workloads, tests, and datasets. This is useful
to augment the regular test runs with some additional tests that do not belong in the
main Impala repo.
This patch adds
1. use boost uuid
2. add unit test for HiveServer2 metadata operation
3. add JDBC metadata unit test
4. implement the remaining HiveServer2 operations: GetFunctions and GetTableTypes
5. remove in-process impala server from fe-support
This patch implements the HiveServer2 API.
We have tested it with Lenni's patch against the tpch workload. It has also
been tested manually against Hive's beeline with queries and metadata operations.
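For reference, a manual beeline session against an impalad's HiveServer2 endpoint looks
roughly like the following (host, port, and user are placeholders, not values from this
patch):
beeline -u jdbc:hive2://<impalad-host>:<hs2-port>/default -n <user>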
All of the HiveServer2 code is implemented in impala-hs2-server.cc. Beeswax
code is refactored to impala-beeswax-server.cc.
HiveServer2 has a few more metadata operations. These operations go through
impala-hs2-server to ddl-executor and then to the FE. The logic is implemented in
fe/src/main/java/com/cloudera/impala/service/MetadataOp.java.
Because of the Thrift union issue, I had to modify the generated C++ files.
Therefore, all of the HiveServer2 Thrift-generated C++ code is checked into
be/src/service/hiveserver2/. Once the Thrift issue is resolved, I'll remove
these files.
Change-Id: I9a8fe5a09bf250ddc43584249bdc87b6da5a5881
With this change the Python tests will now be called as part of buildall and
the corresponding Java tests have been disabled. The new tests can also be
invoked by calling ./tests/run-tests.sh directly.
This includes a fix from Nong for an issue that caused wrong results for limit on
non-IO manager formats.
Added a script that starts an impalad "cluster" (impalad + state store) with
each impalad running on a different port. Also updated QueryTest to enable
running against an external impalad. This enables running all the tests against
a remote cluster or a local cluster set up with the script I added.
By default we run with the in-process impalad - to enable running against a
remote impalad use the flag:
mvn test -Duse_external_impalad=true
The same host/port flags work with this, for example:
mvn test -Duse_external_impalad=true -Dimpalad=hostName -Dfe_port=21000
This change enables running the query tests (and potentially other tests in
the future) against an in-process or external-process test environment. This
means that the tests can be run against a remote distributed cluster with
impalad deployed - or run locally in-process.
To target a remote environment execute the tests using the following two flags:
mvn test -Dimpalad=<hostname of coordinator> -Dfe_port=21000
If these are not specified, the existing (in-process) test environment is
used.
The major parts of this change are:
ImpaladClientExecutor - this is a new client executor class that uses the
beeswax thrift interface to communicate with a target impalad instance.
TestUtilities - This class was updated to add support for running queries
against impalad using the Impalad client executor.
As part of this change I also split the query tests into a few separate files:
JoinQueryTest, InsertQueryTest, HBaseQueryTest, etc... This will make it easier
to pick which subset of tests you want to run. It will also help reduce our max
test log file size in the Jenkins runs.
To enable this I created a new 'BaseQueryTest' class that does much of the work
of choosing which combinations of File format, compression, batch size, etc to
run with.
Current shortcomings:
1) It would be nice for "Executor" and "ImpaladClientExecutor" to share a common
interface. None currently exists, and I wasn't sure what a good one would be; any
thoughts on this would be appreciated. Because of this I had to resort to
passing an "Executor" of type "Object" for the time being.
2) The Beeswax API doesn't currently provide a way to specify things like the number
of execution nodes. For now we just ignore this parameter (it can be set by the
impalad instance).
3) Double and float values are formatted with a larger precision when executed
over the Beeswax interface. This causes results to differ and tests to fail.
A second checkin will update the in-process output to match that of Beeswax.
Depending on the execution mode, we will use a different set of test vectors so
we can help control test execution time. The idea is that for checkins the tests
are run in the 'reduced' input set mode. For nightly builds we will run the
exhaustive set of test combinations.
This is controlled with a new flag specified when running the tests:
mvn -DtestExecutionMode=exhaustive test
or
mvn -DtestExecutionMode=reduced test
Note: If -DtestExecutionMode is not specified, it will default to reduced.
As part of this change a bunch of the test files had to be updated to be
parameterized. If they are not parameterized, they will not benefit from the new
coverage that has been added.
This change currently is just for the Query Tests. I would like to extract some
of this logic and generalize it for more test suites with a future checkin.
- added option to run with the Derby metastore, based on whether env var METASTORE_IS_DERBY is set (see the example below)
- removed hardwired file locations from planner tests
- switched to linking statically against libthrift.a
Also added script rebuild.sh, which contains the build steps of buildall.sh (against impala sources).
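For example, the Derby metastore option is selected simply by exporting the variable
before running the build or tests:
export METASTORE_IS_DERBY=true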