150 Commits

Author SHA1 Message Date
Lenni Kuff
76fa3b2ded Update DDL to support 'STORED AS PARQUET' and 'STORED AS AVRO' syntax
This change updates our DDL syntax support to allow for using 'STORED AS PARQUET'
as well as 'STORED AS PARQUETFILE'. Moving forward we should prefer the new syntax,
but continue to support the old.  I made the same change for 'AVROFILE', but since
we have not yet documented the 'AVROFILE' syntax I left out support for the old syntax.

Change-Id: I10c73a71a94ee488c9ae205485777b58ab8957c9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1053
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:18 -08:00
Nong Li
c1a64d6863 Add kill-mini-llama to CDH4 branch.
This makes it easier to switch between our branches and is a no-op
for those of us staying on CDH4.

Change-Id: Ic07eb8a7ba7e48db118c06c221aabe5e124f3bfb
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1033
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:54:17 -08:00
ishaan
fcdcf1a9d8 Parallelize data loaded through Impala to speed up data loading.
Currently, we execute all the queries involved in data loading serially. This change
creates a separate .sql file for each file format, compression codec and compression
scheme combination, and executes all the files in parallel. Additionally, we now store all the
.sql files (independent of workload) in $IMPALA_HOME/data_load_files/<dataset_name>. Note
that only data loaded through Impala is parallelized; data loaded through Hive and HBase
remains serial.

On our build machines, the time taken to load all the data from snapshot was on the order
of 15 minutes.

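The scheme described above (one .sql file per file format, compression codec and compression scheme combination, all executed in parallel) could be sketched roughly as follows. This is an illustration only, not the code from this change: the format lists, the file naming pattern and the `run_sql_file` placeholder are all assumptions made up for the example.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

# Hypothetical values; the real combinations are defined by the test infrastructure.
FILE_FORMATS = ["text", "seq", "parquet"]
CODECS = ["none", "snap"]
SCHEMES = ["none", "block"]


def sql_file_name(dataset, file_format, codec, scheme):
    """Build one .sql file path per (format, codec, scheme) combination.

    The naming pattern is an assumption made for this sketch.
    """
    return f"data_load_files/{dataset}/load-{file_format}-{codec}-{scheme}.sql"


def run_sql_file(path):
    """Placeholder for executing a .sql file through Impala; returns the path."""
    return path


def load_in_parallel(dataset, max_workers=4):
    """Execute every generated .sql file concurrently and return the paths run."""
    combos = itertools.product(FILE_FORMATS, CODECS, SCHEMES)
    paths = [sql_file_name(dataset, *combo) for combo in combos]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order even though work runs concurrently.
        return list(pool.map(run_sql_file, paths))
```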
Change-Id: If8a862c43f0e75b506ca05d83eacdc05621cbbf8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/804
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:53 -08:00
Lenni Kuff
498c2529d4 Test CR: Change spacing in run-all.sh
Change-Id: I2362799213a7faca3892e38fb874bfbbd0c1718f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/803
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:50 -08:00
Lenni Kuff
8b2acf5c22 IMPALA-425: Detect read-only tables and disable INSERT/LOAD operations on these tables
With this change we now detect if a table is read-only and disable INSERT/LOAD operations
on these tables. A table is read-only if Impala does not have write permission on the HDFS
base directory of the table or any one of the partition directories (if
the table is partitioned).

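The read-only rule stated above can be sketched as a single predicate. This is a minimal illustration, not the code from this change: it checks local paths with `os.access` as a stand-in for the HDFS permission check, and the function and parameter names are hypothetical.

```python
import os


def is_read_only(base_dir, partition_dirs=(), can_write=None):
    """Return True if the base directory or any partition directory is not writable.

    can_write is an injectable predicate (an assumption of this sketch) so the
    rule can be exercised without touching a real filesystem.
    """
    if can_write is None:
        can_write = lambda d: os.access(d, os.W_OK)
    dirs = [base_dir, *partition_dirs]
    return any(not can_write(d) for d in dirs)
```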
Change-Id: I25515b2d0ffb7fe297359437fd937a3d6e0406a0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/713
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:37 -08:00
Alex Behm
51e914e911 Use hive-exec instead of hive-builtin because hive-builtin does not exist in CDH5 Hive.
Change-Id: I11993c7eebc9f5f07f112810d7e81d07ce157193
Reviewed-on: http://gerrit.ent.cloudera.com:8080/715
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:53:33 -08:00
Lenni Kuff
72e211ca4a Use Hive Metastore Service instead of HiveServer 1 in test infrastructure
Change-Id: I4e2ba02b2101bae95d196ab13f9453e1b3a9d7be
Reviewed-on: http://gerrit.ent.cloudera.com:8080/689
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:26 -08:00
Nong Li
4800995d44 Add execution for Hive UDFs.
Change-Id: I6a5ad96fed77e2b8a2701f21a917a8eb7a11d500
Reviewed-on: http://gerrit.ent.cloudera.com:8080/458
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:25 -08:00
Nong Li
6b9a7de02e Add symbol resolution during analysis for create function stmts.
Before this, we had to specify the entire mangled symbol. This can be quite
long and quite tedious (take a look at some of the create UDA test cases that
specify all the symbols).

This patch adds some code to convert from the user function signature to the
mangled name. This means the user can specify the unmangled name and we can
do the symbol lookup. The mangling rules are pretty convoluted but if it is
messed up, the user can always specify the full symbol.

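To give a feel for the signature-to-symbol conversion described above, here is a deliberately small sketch. It assumes the Itanium C++ ABI mangling scheme used by GCC and Clang on Linux, and it handles only plain or namespace-qualified function names with built-in value types. Pointers, references, class types and substitutions, which real UDF signatures need, are left out, so this is not the logic added by this patch.

```python
# Itanium ABI codes for a few built-in types (a subset chosen for this sketch).
BUILTIN_CODES = {
    "void": "v", "bool": "b", "char": "c", "short": "s",
    "int": "i", "long": "l", "float": "f", "double": "d",
}


def mangle(qualified_name, arg_types):
    """Mangle a function name with built-in argument types.

    qualified_name may contain '::' separators, e.g. 'ns::foo'.
    The special 'std' namespace abbreviation is not handled.
    """
    parts = qualified_name.split("::")
    # Each name component is encoded as <length><name>.
    encoded = "".join(f"{len(p)}{p}" for p in parts)
    if len(parts) > 1:
        # Nested names are wrapped in N ... E.
        encoded = f"N{encoded}E"
    # An empty parameter list is encoded as a single 'v' (void).
    args = "".join(BUILTIN_CODES[t] for t in arg_types) or "v"
    return f"_Z{encoded}{args}"
```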
Some other minor cleanup in:
  - JNI from FE to BE
  - UDFs/UDAs that are loaded as test data

Change-Id: I733dbf3a72cb7b06221c27e622d161bcca0d74a8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/624
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:20 -08:00
ishaan
8a43426879 Sleep after starting the hiveserver2 service to guard against it not starting on time.
Change-Id: I9a0de1cc63089cba2f9b59942ee45abc44b8662e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/643
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:53:17 -08:00
Lenni Kuff
b07a3ccfd6 Use an external Hive Metastore Service for local test runs
Using an external Hive Metastore Service for local test runs has a number of benefits.
It helps separate the metastore logs from the Impala
logs, and it is more representative of real cluster environments.
It may also help with some of the concurrency issues that we have been seeing when
running directly against the backend database, since we no longer spin up an in-process
metastore server for each client connection.

The metastore is started by running "run-hive-server.sh" which is invoked as part of
"run-all.sh".

Change-Id: If60fa97aa38e4ad5cf578b9b409eeea1e0e29375
Reviewed-on: http://gerrit.ent.cloudera.com:8080/628
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:15 -08:00
Skye Wanderman-Milne
b7f83bcd73 Add support for LLVM IR UDFs.
This patch also adds a number of improvements to NativeUdfExpr. Highlights include:

* Correctly handling the lowering of AnyVal struct types (required for ABI compatibility)
* A rudimentary library cache for reusing handles produced by dlopen
* More complicated test cases

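The library cache mentioned in the list above might look something like the following sketch: a map from library path to an already opened handle, so repeated loads of the same path reuse one handle. This is an illustration in Python, not the patch's code; `LibraryCache` and its `loader` parameter are hypothetical names, and `ctypes.CDLL` (which wraps dlopen on POSIX systems) is only the default loader.

```python
import ctypes
import threading


class LibraryCache:
    """Cache of shared-library handles keyed by path (illustrative sketch)."""

    def __init__(self, loader=ctypes.CDLL):
        self._loader = loader      # callable that opens a library by path
        self._handles = {}         # path -> opened handle
        self._lock = threading.Lock()

    def get(self, path):
        """Return the cached handle for path, opening the library on first use."""
        with self._lock:
            handle = self._handles.get(path)
            if handle is None:
                handle = self._loader(path)
                self._handles[path] = handle
            return handle
```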
Change-Id: Iab9acdd7d7c4308e5d7ee3210f21b033fda5a195
Reviewed-on: http://gerrit.ent.cloudera.com:8080/540
Tested-by: jenkins
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:53:03 -08:00
Nong Li
e5ed8e4105 Move minicluster_xml_conf to HADOOP_CONF_DIR.
The current location gets deleted on rebuild, which forces a restart of mini dfs.

Change-Id: If71b144534255fa8df2bfa187c0814ffdf28463e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/550
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:03 -08:00
Lenni Kuff
79cdeac3d6 Consolidate test cluster under IMPALA_HOME/cluster_logs + store logs during data loading
Change-Id: I8f6239e4ccb0515c85bf80193a475788fb18dedb
Reviewed-on: http://gerrit.ent.cloudera.com:8080/518
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-01-08 10:52:56 -08:00
Skye Wanderman-Milne
fd99db0300 First pass at UdfExpr.
Change-Id: I517bf56541749b5c2459554821c7bf838239fdf0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/439
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:52:50 -08:00
Henry Robinson
a46276325c IMPALA-415: Don't delete hidden files in the root directory for INSERT
OVERWRITE

INSERT OVERWRITE into an unpartitioned table is supposed to remove all
data files from the root. This should not include hidden files or
directories. This patch excludes hidden files from deletion, and adds a
test case.

Partition directories are still removed in their entirety: the cost of
statting a large number of files and directories rather than issuing a
single "rm -rf" outweighs the benefits of preserving hidden files for
now.

Hive does not preserve hidden files in either configuration.

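The behaviour for unpartitioned tables described above can be sketched as follows, using a local directory in place of HDFS. This is not the patch's code. Treating names that start with "." or "_" as hidden is an assumption of this sketch, as is skipping subdirectories entirely.

```python
import os
import tempfile

# Assumption: an entry counts as hidden if its name starts with one of these.
HIDDEN_PREFIXES = (".", "_")


def clear_visible_files(root):
    """Delete non-hidden regular files directly under root; return their names."""
    removed = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if name.startswith(HIDDEN_PREFIXES) or os.path.isdir(path):
            continue  # preserve hidden entries and leave directories alone
        os.remove(path)
        removed.append(name)
    return removed
```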
Change-Id: Ia73e55e011c26c88f14745075210cf359764e3c1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/418
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:50 -08:00
Lenni Kuff
a1f2f72f49 Add Impala DDL support for creation of AVRO tables + support for CREATE/ALTER SERDEPROPERTIES
This change adds Impala DDL support for creation of AVRO tables.
Additionally, it adds Impala support for CREATE and ALTER SERDEPROPERTIES
which are used when creating Avro backed tables. This syntax is not
exactly the same as the Hive support since it introduces a new
fileformat (AVROFILE) that implies the needed Serialization library,
input format, and output format.

Change-Id: I5047e419198a89599e9d014fdedfee1a20437a7d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/464
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:48 -08:00
Lenni Kuff
d6d1557fe7 Capture cluster logs with each test run / don't use mvn for starting cluster services
Change-Id: I708b547e49d035c5f029ea86119cc844ccbc5643
Reviewed-on: http://gerrit.ent.cloudera.com:8080/404
Tested-by: jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:52:40 -08:00
Lenni Kuff
9f54242941 Add retry loop around split-hbase to fix build breaks
Change-Id: I539407ce05d705b6b4e88d0791fc4ec236c79c80
Reviewed-on: http://gerrit.ent.cloudera.com:8080/399
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:39 -08:00
ishaan
6735e3983f Fix build failure because of hbase data loading.
Change-Id: I796656332c3733a1ffdc338d206009efa6c451ac
Reviewed-on: http://gerrit.ent.cloudera.com:8080/360
Tested-by: jenkins
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:52:37 -08:00
ishaan
53cd9eadab Treat HBase as a file format for functional tests
Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922
Reviewed-on: http://gerrit.ent.cloudera.com:8080/102
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:52:36 -08:00
Lenni Kuff
f264db1647 Automatically force load partitioned tables to ensure valid partition metadata
Change-Id: Ief91102f30d4669503d473299256a74a50d8fe3c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/261
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:17 -08:00
Lenni Kuff
17ed6ea177 Partition TPC-DS dataset and add additional TPC-DS workload queries
Change-Id: I5410e68fdfd818a8287e0974332c3e36c344c300
Reviewed-on: http://gerrit.ent.cloudera.com:8080/99
Tested-by: jenkins <kitchen-build@cloudera.com>
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
2014-01-08 10:52:13 -08:00
Skye Wanderman-Milne
6e7406df8b IMPALA-502: Impala does not return NULL for case where table has extra string column and data does not (it returns an empty string)
Change-Id: I0cfe5ce5fc279d46610a3cc191a501ccbc335296
Reviewed-on: http://gerrit.ent.cloudera.com:8080/127
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:52:02 -08:00
Skye Wanderman-Milne
3fecdeb793 IMPALA-441: support default values for Avro tables 2014-01-08 10:51:39 -08:00
Skye Wanderman-Milne
c8a8308ece Avro schema resolution (minus default values) 2014-01-08 10:51:26 -08:00
ishaan
e7c6d57f9c IMP-773: Add better logging/error detection to start-impala-cluster.py 2014-01-08 10:51:25 -08:00
Alan Choi
254ee6ef89 IMPALA-434 Support binary hbase encoding 2014-01-08 10:51:18 -08:00
Lenni Kuff
abdfae5b24 Update DESCRIBE FORMATTED results to match the Hive HS2 output 2014-01-08 10:51:14 -08:00
Lenni Kuff
7ac88e1fa9 IMPALA-400: Add support for SQL statement authorization
This change adds support for SQL statement authorization in Impala. The authorization
works by updating the Catalog API to require a User + Privilege when getting Table/Db
objects (and in the future can be extended to cover columns as well).
If the user doesn't have permission to access the object, an AuthorizationException is
thrown. The authorization checks are done during analysis as new Catalog objects are
encountered.

These changes build on top of the Hive Access code which handles the actual
processing of authorization requests. The authorization is currently based
on a "policy file" which will be stored in HDFS. This policy file is read once
on startup and then reloaded every 5 minutes. It can also be reloaded on a
specific impalad by executing a "refresh" command.

Authorization is enabled by setting:
--server_name='server1'
and then pointing the impalad to the policy file using the flag:
--authorization_policy_file=/path/to/policy/file

Any authorization configuration problems will result in impalad failing to
start.
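
The Catalog API shape described above (a User plus a Privilege is required to get a Table object, and an AuthorizationException is thrown on failure) can be sketched as follows. This is an illustration only: the in-memory `policy` dictionary stands in for the policy file, whose real format is not described here, and all class and method signatures are hypothetical.

```python
from enum import Enum


class Privilege(Enum):
    SELECT = "select"
    INSERT = "insert"


class AuthorizationException(Exception):
    """Raised when a user lacks the requested privilege on an object."""


class Catalog:
    def __init__(self, tables, policy):
        self._tables = tables  # table name -> table object
        self._policy = policy  # user -> {table name -> set of Privilege}

    def get_table(self, name, user, privilege):
        """Return the table only if user holds privilege on it."""
        granted = self._policy.get(user, {}).get(name, set())
        if privilege not in granted:
            raise AuthorizationException(
                f"User '{user}' does not have {privilege.name} on {name}")
        return self._tables[name]
```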
2014-01-08 10:50:56 -08:00
Alan Choi
2bdba77f61 Perform HBase deterministic region assignment and enable HBase scan range location test in the planner test 2014-01-08 10:50:54 -08:00
Skye Wanderman-Milne
1ab189c789 Fix build 2014-01-08 10:50:52 -08:00
Skye Wanderman-Milne
c8fd4f8016 IMPALA-362: impalad hangs when reading a sequence file without contents 2014-01-08 10:50:49 -08:00
Alan Choi
bd59bbb07a IMPALA-300/356
Always reload region server info.
Clear keyRange.start/stopkey before setting it in setKeyRangeStart/End.
Split HBase tables into multiple regions.
I had to disable the HBase scanrangelocations planner test because region assignment
is non-deterministic. I'll have a follow-up patch to address that.
2014-01-08 10:50:48 -08:00
Lenni Kuff
2f7198292a Add support for auxiliary workloads, tests, and datasets
This change adds support for auxiliary workloads, tests, and datasets. This is useful
to augment the regular test runs with some additional tests that do not belong in the
main Impala repo.
2014-01-08 10:50:32 -08:00
ishaan
f026354721 IMP-912 Make force killing an option. Update run-all-tests to pre-emptively force kill. 2014-01-08 10:50:29 -08:00
Skye Wanderman-Milne
223b1a8e47 IMPALA-293: Impala is unable to query RCFile tables which describe fewer columns than the file's header. 2014-01-08 10:50:17 -08:00
Skye Wanderman-Milne
cc6007cf9e IMPALA-262: Querying text/lzo table that is not indexed causes an impalad segfault 2014-01-08 10:49:52 -08:00
Nong Li
563cbfa3a8 Enable parquet testing 2014-01-08 10:49:40 -08:00
Lenni Kuff
cba9cd00dd Fix full data load build break due to constructing incorrect HDFS paths 2014-01-08 10:49:34 -08:00
Lenni Kuff
558d5ce755 Data loading: Exec DDL statements via Impala and don't recreate metadata if it exists 2014-01-08 10:49:28 -08:00
Nong Li
ebab23841a Add back pressure in hdfs-scan-node to prevent excessive buffer queueing. 2014-01-08 10:49:25 -08:00
Lenni Kuff
36e9fe1c1a Run compute table stats statements using Hive CLI
This works around a problem with computing table stats via the Hive Meta Store client
API. When executing these statements via the MetaStoreClient, all tables were getting a
num_rows=0 value returned from the ANALYZE TABLE query.
2014-01-08 10:49:19 -08:00
Elliott Clark
0e0c02b6bd Add the ability to Select into HBase table.
* Changed frontend analysis for HBase tables
* Changed Thrift messages to allow HBase as a sink type.
* JNI Wrapper around htable
* Create hbase-table-sink
* Create hbase-table-writer
* Static init lots of JNI related code for HBase.
* Cleaned up some cpplint issues.
* Changed junit analysis tests
* Create a new HBase test table.
* Added functional tests for HBase inserts.
2014-01-08 10:49:06 -08:00
Lenni Kuff
993da8fcba Fix bug in how insert tables are generated 2014-01-08 10:49:05 -08:00
Lenni Kuff
5f81becd84 Create tables used by insert tests in a supported insert format 2014-01-08 10:49:00 -08:00
Lenni Kuff
831ee529be Fixed data loading bugs, moved most tables out of load-dependent-tables 2014-01-08 10:48:56 -08:00
Lenni Kuff
7584312540 IMPALA-167: Impala should gracefully handle unsupported Hive table types 2014-01-08 10:48:56 -08:00
Skye Wanderman-Milne
811d5dd00b Create Avro schema directory in test warehouse 2014-01-08 10:48:50 -08:00
Nong Li
0df9476be1 Parquet data loading. 2014-01-08 10:48:48 -08:00