87 Commits

Author SHA1 Message Date
Lenni Kuff
f34a0507bf [CDH5] Add support for Sentry Service to Impala
This change adds support for authorizing based on policy metadata read from the Sentry
Service. Authorization is role based and roles are granted to user groups. Each role
can have zero or more privileges associated with it, granting fine-grained access to
specific catalog objects at server, URI, database, or table scope. This patch only
adds support to authorize against metadata read from the Sentry Policy Service; it does
not add support for GRANT/REVOKE statements in Impala.

The authorization metadata is read by the catalog server from the Sentry Service and
propagated to all nodes in the cluster in the "catalog-update" statestore topic. To
enable the Catalog Server to read policy metadata, the --sentry_config flag must be
set to a valid sentry-site.xml config file.

On the impalad side, we continue to support authorization based on a file-based provider.
To enable file-based authorization, set the --authorization_policy_file flag to a
non-empty value. If --authorization_policy_file is not set, authorization will be done
based on cached policy metadata received from the Catalog Server (via the statestore).
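
The flag-driven choice described above can be sketched roughly as follows. This is only an illustration in Python, not Impala's implementation: the function name `choose_policy_source` and the returned labels `"file"` and `"catalog-cache"` are hypothetical.

```python
def choose_policy_source(authorization_policy_file: str = "") -> str:
    """Return which policy source an impalad would consult (illustrative only).

    The argument stands in for the value of --authorization_policy_file.
    """
    # A non-empty --authorization_policy_file selects the file-based provider.
    if authorization_policy_file:
        return "file"
    # Otherwise authorization uses policy metadata cached from the Catalog
    # Server, which arrives via the statestore.
    return "catalog-cache"
```

For example, `choose_policy_source("/path/to/policy/file")` selects the file-based provider, while calling it with no argument falls back to the cached catalog metadata.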

TODO: There are still some issues with the Sentry Service that require disabling some of
the authorization tests and adding some workarounds. I have added comments in the code
where these workarounds are needed.

Change-Id: I3765748d2cdbe00f59eefa3c971558efede38eb1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2552
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-06-03 07:19:52 -07:00
ishaan
10952da6e0 Change the slf4j version to harmonize with the rest of CDH.
All other CDH components use slf4j version 1.7.5; Impala's use of an earlier version
causes a lot of benign warnings. This patch changes Impala's version to be the same
as the rest of the stack.

Change-Id: I297903d146c6b7642de5b6fa4eefa28a6a08fafe
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2541
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-05-27 13:46:17 -07:00
Matthew Jacobs
ebc6c5894e External Data Source: Frontend and catalog changes
Initial frontend and catalog changes for external data sources.

Change-Id: Ia0e61ef97cfd7a4e138ef555c17f2e45bbf08c18
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2224
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit dfa14c828957f751db9c89bae0bdc040ce6f648c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2485
2014-05-08 14:56:19 -07:00
Lenni Kuff
13c794db91 [CDH5] Update dependency versions to CDH5.1.0
This just updates the versions, it doesn't touch anything in /thirdparty.
Change parquet version to append SNAPSHOT
Added hadoop-hbase-compat jar in AUX_CLASSPATH and mapreduce/*.jar to HDFS

Change-Id: I4471ef4476997371cf49a9d54cfa63f2fda126e4
2014-05-07 15:10:40 -07:00
Lenni Kuff
15327e8136 Migrate DataErrors tests to Python test framework, re-enable subset of tests
This re-enables a subset of the stable data errors tests and updates them to
work in our test framework. This includes support for updating results via --update_results.

This also lets us remove a lot of old code that was there only to support these disabled
tests.

Change-Id: I4c40c3976d00dfc710d59f3f96c99c1ed33e7e9b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1952
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2277
2014-04-18 02:25:11 -07:00
Matthew Jacobs
fc5ac1f707 [CDH5] Add yarn dependencies to frontend pom.xml necessary for RequestPoolUtils
Change-Id: I2869ec5d81481dd1803a90c5ceae1d0dd3662f6b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1761
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
2014-03-06 11:21:44 -08:00
Henry Robinson
a60c4779c8 [CDH5] Remove system dependencies for Sentry
Change-Id: Id0e95798b2c4060d906923756251c3ad7dee6ec5
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1590
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
2014-02-18 17:11:26 -08:00
Nong Li
53d7bbb97a [CDH5] Impala changes for updated thirdparty components.
Changes include:
  - version changes in impala-config
  - version changes in various loading scripts
  - hbase jars are no longer in hive/lib
  - mini-llama script changes
  - updates due to sentry api changes
  - JDBC tests disabled
  - unsupported types tests disabled.

Change-Id: If8cf1b7ad8e22aa4d23094b9a4b1047f7e9d93ee
2014-01-15 15:12:13 -08:00
Alex Behm
5ae53c2f80 Compilation fixes after rebasing.
Change-Id: I87348336b2489069d65f34821c1a3df3c5ca9512
2014-01-15 15:12:12 -08:00
Alex Behm
0614774706 Fixed reservation from MiniLlama by translating hosts of resource requests from impalad hostports to Hadoop DN hostports.
Change-Id: I7a9a26ec4309710f0ad62a1bd18fb076fe6dd120
2014-01-15 15:12:04 -08:00
Alex Behm
56e9c838fb [CDH5] Changed all Impala dependencies on Hive to use systemPath
to avoid pulling incompatible snapshot jars from Maven.
Added minimal Hive dependencies.

Change-Id: If36b7aa7a29d6c4fe22a4510e3aae52992dd7b74
2014-01-15 15:12:00 -08:00
Alex Behm
62f6c066a6 [CDH5] Added missing FE dependencies for FE tests. Accept LazyBinaryColumnarSerDe for RCFile tables (Hive's new default SerDe for RCFiles). Fixed Jdbc FE test by adding proper auth spec in the connection string.
Change-Id: I6ac3effa398ae01846949cebee1b4db273305aea
Reviewed-on: http://gerrit.ent.cloudera.com:8080/461
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-15 15:11:51 -08:00
Alex Behm
c295b5eda8 [CDH5] Fixed JDBC connectivity to Impala and Hive and related Impala tests. Hive now uses the simple SASL transport because its NOSASL transport is broken (HIVE-4232). Impala still uses the NOSASL transport. The changes also include more careful dependency management.
Change-Id: I16633dcef912dce20c8de8cf2f43c45a49460d20
2014-01-15 15:11:47 -08:00
Alex Behm
60003ad211 [CDH5] Changes to make Impala work on CDH5. Mostly fixing up dependency versions. Minor code changes to address HBase API changes.
Change-Id: Icbbeb13eefa29e38286328d45600117a383cd106
2014-01-15 15:11:23 -08:00
Lenni Kuff
01660374c6 Additional fe and testdata pom.xml cleanup
This change cleans up our FE pom.xml file by removing unneeded
dependencies and system dependencies (system dependencies are now pulled in
from the Maven release repository).

The upside is that our pom is cleaner and it will also help reduce the likelihood of
broken dependencies since Maven will pull in the right versions.  The downside
is that we now pull in quite a few more JARs.

Note: I was unable to find release artifacts for Sentry and Parquet, so I am leaving
those as "system" for now.

Change-Id: I0b917b09a02243d78d89747591ab6bccacf7cf38

Saving changes

Change-Id: I3697a7b44884c40e077b3e354fef76625e1b881d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1011
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:17 -08:00
Lenni Kuff
882b5f09b2 Update FE pom to use datanucleus 3.2 which is required by CDH4.5 Hive
Change-Id: I1c362eb68113e075eb72db71c2a94d74ceb7427f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/987
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:54:16 -08:00
ishaan
81b80c702c Upgrade thirdparty to use CDH4.5 bits.
The following changes have been made:
  -- Update hbase
  -- Update hive
  -- Update hadoop
  -- Update the parquet version to 1.2.5

Change-Id: Id6ceaef0e9eebab27ffd408160116fa84ed300fb
2014-01-08 10:54:09 -08:00
Nong Li
4800995d44 Add execution for Hive UDFs.
Change-Id: I6a5ad96fed77e2b8a2701f21a917a8eb7a11d500
Reviewed-on: http://gerrit.ent.cloudera.com:8080/458
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:25 -08:00
Lenni Kuff
24116169ad Disable DataErrorsTests due to IMPALA-614
Change-Id: I1c670c9a50ebea3ca875f552b02a8bda3b0906b4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/686
Tested-by: jenkins
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
2014-01-08 10:53:23 -08:00
Lenni Kuff
d6d1557fe7 Capture cluster logs with each test run / don't use mvn for starting cluster services
Change-Id: I708b547e49d035c5f029ea86119cc844ccbc5643
Reviewed-on: http://gerrit.ent.cloudera.com:8080/404
Tested-by: jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:52:40 -08:00
Nong Li
f63437e62a Move to parquet mr hive serde.
Change-Id: Id831b76b89b83c5ad1f270f76b34bf7390e6a06c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/200
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:52:10 -08:00
Lenni Kuff
8c264f0395 Added Sentry v1.1.0 to thirdparty 2014-01-08 10:51:48 -08:00
Alex Behm
8ad15fabcf IMPALA-372: Added CREATE/DROP/ALTER VIEW. 2014-01-08 10:51:35 -08:00
Lenni Kuff
7ac88e1fa9 IMPALA-400: Add support for SQL statement authorization
This change adds support for SQL statement authorization in Impala. The authorization
works by updating the Catalog API to require a User + Privilege when getting Table/Db
objects (and in the future can be extended to cover columns as well).
If the user doesn't have permission to access the object, an AuthorizationException is
thrown. The authorization checks are done during analysis as new Catalog objects are
encountered.

These changes build on top of the Hive Access code, which handles the actual
processing of authorization requests.  The authorization is currently based
on a "policy file" which will be stored in HDFS. This policy file is read once
on startup and then reloaded every 5 minutes. It can also be reloaded on a
specific impalad by executing a "refresh" command.

Authorization is enabled by setting:
--server_name='server1'
and then pointing the impalad to the policy file using the flag:
--authorization_policy_file=/path/to/policy/file

Any authorization configuration problems will result in impalad failing to
start.
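
The reload behaviour described above (read once on startup, reload every 5 minutes, reload on demand via "refresh") could be sketched as below. This is a simplified illustration under stated assumptions, not the actual code: the class `PolicyCache`, its `loader` and `clock` parameters, and the lazy reload-on-access strategy are all hypothetical; the real implementation may use a background timer instead.

```python
import time
from typing import Any, Callable

# The commit message states that the policy file is reloaded every 5 minutes.
RELOAD_INTERVAL_SECS = 5 * 60


class PolicyCache:
    """Caches a parsed policy and reloads it when it becomes stale."""

    def __init__(self, loader: Callable[[], Any],
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader  # hypothetical callable that reads the policy file
        self._clock = clock    # injectable so the sketch can be tested
        self.refresh()         # read once on startup

    def refresh(self) -> None:
        """Force a reload, as an explicit "refresh" command would."""
        self._policy = self._loader()
        self._loaded_at = self._clock()

    def get(self) -> Any:
        """Return the cached policy, reloading it first if it is stale."""
        if self._clock() - self._loaded_at >= RELOAD_INTERVAL_SECS:
            self.refresh()
        return self._policy
```

Injecting the clock keeps the sketch deterministic in tests; a loader that raises on a malformed policy would surface the failure at construction time, which loosely mirrors impalad failing to start on configuration problems.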
2014-01-08 10:50:56 -08:00
Lenni Kuff
2f7198292a Add support for auxiliary workloads, tests, and datasets
This change adds support for auxiliary workloads, tests, and datasets. This is useful
to augment the regular test runs with some additional tests that do not belong in the
main Impala repo.
2014-01-08 10:50:32 -08:00
Alan Choi
9c11c0ce2d HiveServer2 clean up
This patch makes the following changes:

1. use boost uuid
2. add unit test for HiveServer2 metadata operations
3. add JDBC metadata unit test
4. implement the remaining HiveServer2 operations: GetFunctions and GetTableTypes
5. remove in-process impala server from fe-support
Lenni Kuff
d2e4776731 Support passing snapshot file to buildall, add script to run all tests, remove old tests 2014-01-08 10:47:59 -08:00
Alan Choi
251a8a2bf1 IMP-57: rename fe_port to beeswax_port 2014-01-08 10:47:53 -08:00
Alan Choi
be98df19c8 HiveServer2
This patch implements the HiveServer2  API.

We have tested it with Lenni's patch against the tpch workload. It has also
been tested manually against Hive's beeline with queries and metadata operations.

All of the HiveServer2 code is implemented in impala-hs2-server.cc. Beeswax
code is refactored to impala-beeswax-server.cc.

HiveServer2 has a few more metadata operations. These operations go through
impala-hs2-server to ddl-executor and then to FE. The logic is implemented in
fe/src/main/java/com/cloudera/impala/service/MetadataOp.java.

Because of the Thrift union issue, I had to modify the generated C++ file.
Therefore, all of the Thrift-generated HiveServer2 C++ code is checked into
be/src/service/hiveserver2/. Once the thrift issue is resolved, I'll remove
these files.

Change-Id: I9a8fe5a09bf250ddc43584249bdc87b6da5a5881
2014-01-08 10:47:24 -08:00
Lenni Kuff
30dbf59ef2 Final changes to enable Python test infrastructure and tests
With this change the Python tests will now be called as part of buildall and
the corresponding Java tests have been disabled. The new tests can also be
invoked by calling ./tests/run-tests.sh directly.

This includes a fix from Nong for a bug that caused wrong results for limit on
non-io manager formats.
2014-01-08 10:46:57 -08:00
Henry Robinson
35e7e2a7a9 Move thirdparty library versions to environment variables 2014-01-08 10:46:38 -08:00
Henry Robinson
5f314a4d7e Move to Postgresql for metastore 2014-01-08 10:46:34 -08:00
Henry Robinson
2f339f2ed8 Add ASL license to all public files 2014-01-08 10:46:32 -08:00
Lenni Kuff
ae97ec5fd8 Enable support for running FindBugs 2 against the FE code 2014-01-08 10:46:12 -08:00
Nong Li
126971edbb Update Impala to use CDH4.1 rc3. 2014-01-08 10:45:04 -08:00
Henry Robinson
881c88f131 IMP-177: Add Postgres connector jar to frontend dependencies 2014-01-08 10:44:36 -08:00
Lenni Kuff
b96b9640ef Add script to start multiple Impalad instances locally and update query test to support an external ImpalaD
Added a script that starts an impalad "cluster" (impalad + state store) with
each impalad running on a different port. Also updated QueryTest to enable
running against an external impalad. This enables running all the tests against
a remote cluster or a local cluster setup with the script I added.

By default we run with the in-process impalad. To enable running against a
remote impalad, use the flag:

mvn test -Duse_external_impalad=true

The same host/port flags work with this, for example:

mvn test -Duse_external_impalad=true -Dimpalad=hostName -Dfe_port=21000
2014-01-08 10:44:34 -08:00
Alan Choi
f15ef994fb "mvn test" now uses impalad and beeswax api to submit query and fetch, including
insert query.

review issue: 260
2014-01-08 10:44:30 -08:00
Lenni Kuff
cef688d0fd IMP-95: Fix/recognize intermittent data load failures on jenkins
Builds now fail on data loading problems. Also a simple test fix.
2014-01-08 10:44:18 -08:00
Lenni Kuff
5af1869475 Enable running Query tests targeting an in-process or out of process (impalad) test env
This change enables running of the query tests (and potentially other tests in
the future) targeting an in-process or external process test environment. This
means that the tests can be run against a remote distributed cluster with
ImpalaD deployed - or run locally in-process.

To target a remote environment execute the tests using the following two flags:
mvn test -Dimpalad=<hostname of coordinator> -Dfe_port=21000

If these are not specified, then the existing (in-process) test environment is
used.
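
As a rough sketch of that selection rule (in Python for brevity, not the Java test code itself): the helper `resolve_target` is hypothetical, and treating a missing host or a missing port as "in-process" is an assumption, since the message only covers the case where neither flag is given.

```python
from typing import Mapping, Optional, Tuple


def resolve_target(props: Mapping[str, str]) -> Optional[Tuple[str, int]]:
    """Map -Dimpalad / -Dfe_port style properties to a test target.

    Returns (host, port) for an external impalad, or None to mean
    "use the in-process test environment".
    """
    host = props.get("impalad")
    port = props.get("fe_port")
    # Assumption: both values are needed to target an external impalad.
    if not host or not port:
        return None
    return host, int(port)
```

With `{"impalad": "coordinator-host", "fe_port": "21000"}` the sketch returns `("coordinator-host", 21000)`; with an empty mapping it returns `None`.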

The major parts of this change are:
ImpaladClientExecutor - this is a new client executor class that uses the
beeswax thrift interface to communicate with a target impalad instance.

TestUtilities - This class was updated to add support for running queries
against impalad using the Impalad client executor.

As part of this change I also split the query tests into a few separate files:
JoinQueryTest, InsertQueryTest, HBaseQueryTest, etc... This will make it easier
to pick which subset of tests you want to run. It will also help reduce our max
test log file size in the Jenkins runs.
To enable this I created a new 'BaseQueryTest' class that does much of the work
of choosing which combinations of File format, compression, batch size, etc to
run with.

Current shortcomings:
1) It would be nice for "Executor" and "ImpaladClientExecutor" to share a common
interface. None currently exists and I wasn't sure what a good one would be; any
thoughts on this would be appreciated. Because of this I had to resort to
passing an "Executor" of type "Object" for the time being.

2) Beeswax API doesn't currently provide a way to specify things like the number
of execution nodes. For now we just ignore this parameter (it can be set by the
impalad instance).

3) Double and float values are formatted with a larger precision when executed
over the Beeswax interface. This causes results to differ and tests to fail.
A second checkin will update the in-process output to match that of Beeswax.
2012-07-10 08:08:01 -07:00
Henry Robinson
ce2ae276c1 Build changes for CDH4 upgrade 2012-06-22 16:05:03 -07:00
Alan Choi
ef10afa439 This changes the Thrift version from 0.6.1 to 0.7.0. Please uninstall the old Thrift and download/install Thrift 0.7.0.
Beeswax service now depends on Hive metastore;
fix buildall.sh to clean generated-source in FE;
fix .gitignore to clean generated-source in BE;
2012-06-14 18:21:08 -07:00
Lenni Kuff
d4396801f1 This enables running the FE tests in two modes - 'exhaustive' and 'reduced'.
Depending on the execution mode, we will use a different set of test vectors so
we can help control test execution time. The idea is that for checkins the tests
are run in the 'reduced' input set mode. For nightly builds we will run the
exhaustive set of test combinations.

This is controlled with a new flag specified when running the tests:
mvn -DtestExecutionMode=exhaustive test
or
mvn -DtestExecutionMode=reduced test

Note: If the -DtestExecutionMode is not specified it will default to reduced.
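
One way to picture the two modes is the sketch below. It is written in Python purely for illustration; the dimension values, the function `build_test_vectors`, and the rule used to pick the reduced subset are all assumptions and are not taken from the actual test code.

```python
import itertools
from typing import List, Tuple

# Hypothetical test dimensions; the real sets live in the test code.
FILE_FORMATS = ["text", "rc", "seq"]
COMPRESSION = ["none", "snappy"]
BATCH_SIZES = [0, 16]


def build_test_vectors(mode: str = "reduced") -> List[Tuple[str, str, int]]:
    """Return the (format, compression, batch size) combinations to run."""
    if mode == "exhaustive":
        # Every combination of every dimension.
        return list(itertools.product(FILE_FORMATS, COMPRESSION, BATCH_SIZES))
    if mode == "reduced":
        # Assumption: vary the file format only and pin the other dimensions.
        return [(fmt, COMPRESSION[0], BATCH_SIZES[0]) for fmt in FILE_FORMATS]
    raise ValueError(f"unknown test execution mode: {mode}")
```

The default argument mirrors the documented default of `reduced`, and the reduced set is a strict subset of the exhaustive one, so a checkin run never exercises a combination the nightly run skips.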

As part of this change a bunch of the test files had to be updated to be
parameterized. If they are not parameterized then they will not benefit from the new
coverage that has been added.

This change currently is just for the Query Tests. I would like to extract some
of this logic and generalize it for more test suites with a future checkin.
2012-06-01 16:31:15 -07:00
Michael Ubell
7b14187bf1 Install snappy library
add create-load-data.sh
2012-05-02 07:31:10 -07:00
Henry Robinson
92673b7852 Add -noformat to buildall.sh. Fix java.library.path in pom.xml; clean up indentation 2012-04-12 16:59:52 -07:00
Marcel Kornacker
5d5333c228 Fixing problem in buildall.sh: hbase/hive-site.xml were generated prior to being wiped out as part of the clean-up phase.
Also moving around dependencies in pom.xml to work around problem w/ loading hsql.
2012-03-13 11:53:12 -07:00
Marcel Kornacker
be522a29cd Upgraded to surefire plugin version 2.12.
Temporarily disabled destruction of TestExecEnv in order to avoid crash in HdfsFsCache d'tor.
Activated backend tests in buildall.sh.
2012-03-12 18:18:09 -07:00
Marcel Kornacker
4a4a07fde7 A number of changes for the Jenkins build:
- added option to run with derby metastore, based on whether env var METASTORE_IS_DERBY is set
- removed hardwired file locations from planner tests
- switching to linking statically against libthrift.a

Also added script rebuild.sh, which contains the build steps of buildall.sh (against impala sources).
2012-03-08 16:19:47 -08:00
Alan Choi
727ee77ec4 HBase now runs on pseudo-distributed mode with 4 region servers
code review : http://review.sf.cloudera.com/r/14695/
2012-03-08 15:07:12 -08:00
Henry Robinson
ac03a01be2 Add hsqldb dependency for front-end JDBC test 2012-02-21 15:30:24 -08:00