Commit Graph

176 Commits

Author SHA1 Message Date
casey
2351266d0e Replace single process mini-dfs with multiple processes
This should allow individual service components, such as a single nodemanager,
to be shut down for failure testing. The mini-cluster bundled with hadoop is a
single process that does not expose the ability to control individual roles.
Now each role can be controlled and configured independently of the others.

Change-Id: Ic1d42e024226c6867e79916464d184fce886d783
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1432
Tested-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2297
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-04-23 18:24:05 -07:00
Nong Li
87295a4e06 Decimal implementation.
This patch implements decimal support for text based formats.

Change-Id: I8e2c9e512ed149fe965216a72cb21fffd4f18e75
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1669
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2238
Tested-by: jenkins
2014-04-14 21:07:32 -07:00
Lenni Kuff
aa0b7a35f5 IMPALA-880: COMPUTE STATS should update partitions in batches
When updating partition metadata as part of COMPUTE STATS we would previously
attempt to update all partitions at once. This could lead to HMS socket timeouts
and also could run into issues if there were > 32K partitions.

In this change we now update the partitions in batches, with a max size of 500
partitions per batch. We also compare whether the row count has changed and only
update partitions that have been modified.

Change-Id: If7bfcc30f86fc2fdd79855b981067ac29a47b5e1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1913
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1918
2014-03-14 19:20:12 -07:00
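
The batching described in aa0b7a35f5 (IMPALA-880) can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function names and the dictionary representation of row counts are hypothetical; only the batch size of 500 and the "skip unchanged partitions" rule come from the commit message.

```python
from typing import Dict, Iterator, List

MAX_BATCH_SIZE = 500  # maximum partitions per batch, as stated in the commit message


def changed_partitions(old_counts: Dict[str, int],
                       new_counts: Dict[str, int]) -> List[str]:
    """Return the names of partitions whose row count differs from the stored one.

    Hypothetical helper: partitions missing from old_counts count as changed.
    """
    return [name for name, rows in new_counts.items()
            if old_counts.get(name) != rows]


def batches(items: List[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each yielded batch would then be sent to the metastore in its own update call, so no single request carries every partition of the table.
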
ishaan
9e043e862c Fix run-hbase.sh to correctly pick up the classpath.
We run wait-for-hbase-master.py after starting hbase to account for a race between
the master and region server. This script has not been working for some time. It caused
no ill effects since the race was absent. However, the race has manifested itself
again, so the script needs to be fixed. Setting the correct classpath does so.

Change-Id: I783a7473cfd24a9cb66711f5428f7052ceb96282
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1756
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-03-05 01:04:56 -08:00
ishaan
00724a47da Prefix the path to the local core-site to the classpath used by minillama
With a recent upstream change, a core-site.xml was introduced in a YARN test jar pulled in
by thirdparty. This causes MiniLlama to ignore options set in
fe/src/test/resources/core-site.xml. The problem manifests itself with the MiniDfsCluster
starting on an arbitrary port, but it would have also caused a lot of tests to fail as none
of the compression codecs are pulled in. This change prepends the classpath used by
minillama with the path to the internal core-site.

Change-Id: Iee267fe12e02301baec059a1f7469288c038d6fa
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1739
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-03-04 09:59:50 -08:00
Lenni Kuff
bf16b5cd0d IMPALA-749: Fetch partitions in batches, rather than all at once.
This updates how Impala fetches partition metadata from the Hive Metastore to fetch
partitions in batches, rather than all at once. This helps reduce the load on the
HMS and also lets Impala scale to above 32K partitions. The downside is that it
may require additional RPCs to get all the partitions.

This is done by first querying the metastore to get all the partition names that
exist, then splitting the list of names into separate batches to get the actual
partition metadata.

Impala uses a default size of 1000 partitions per batch, but it can be configured
by setting the 'hive.metastore.batch.retrieve.table.partition.max' parameter
in the hive-site.xml config file.

Change-Id: Ide0ec30ef8a9e00f79c26551aa8e5e7814c73034
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1662
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1698
2014-02-28 22:30:45 -08:00
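
The two-step fetch described in bf16b5cd0d (IMPALA-749) could look like the sketch below. The callables `list_names` and `get_by_names` are hypothetical stand-ins for the metastore calls, which the commit message does not name; only the default batch size of 1000 is taken from the message.

```python
from typing import Callable, Dict, List

DEFAULT_BATCH_SIZE = 1000  # default stated in the commit message


def fetch_partitions(list_names: Callable[[], List[str]],
                     get_by_names: Callable[[List[str]], List[Dict]],
                     batch_size: int = DEFAULT_BATCH_SIZE) -> List[Dict]:
    """Fetch all partition names first, then fetch metadata one batch at a time."""
    names = list_names()
    partitions: List[Dict] = []
    for start in range(0, len(names), batch_size):
        # One RPC per batch: more round trips, but each one stays small.
        partitions.extend(get_by_names(names[start:start + batch_size]))
    return partitions
```

The trade-off matches the one in the message: a table with 2,500 partitions needs three metadata calls instead of one, but no call asks for more than `batch_size` partitions.
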
Alex Behm
9cabee4a71 Wait for the Metastore to come up before starting HiveServer2.
Change-Id: Ic8e29efe63f6745e1ff44248657cbd7882bb16d9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1626
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1670
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-02-25 21:05:33 -08:00
Alex Behm
8223e1e44b Avoid Hive replication bug (CDH-17414) by 'warming up' HiveServer2 after it starts.
The purpose of this patch is to avoid CDH-17414 which causes data files loaded
with Hive to incorrectly have a replication factor of 1. When using beeline
this problem only appears to occur immediately after creating the first HBase table
since starting HiveServer2, i.e., subsequent loads seem to function correctly.
This patch adds a new script that creates an external HBase table in Hive to
'warm up' HiveServer2 immediately after it is started.
Subsequent loads should assign a correct replication factor.

Change-Id: Ic54c9401b67b748a8848d19f82b8e7df9535e845
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1640
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-02-25 17:33:53 -08:00
Lenni Kuff
b4f5c1edcf Enable lazy loading of table metadata for the CatalogService/Impalad
This change adds support for lazy loading of table metadata to the
CatalogService/Impalad. The way this works is that the CatalogService initially
sends out an update with only the databases and table names (wrapped as
IncompleteTables). When an Impalad encounters one of these tables, it will contact
the catalog service to get the metadata, possibly triggering a metadata load if the
catalog server has not yet loaded this table.

With these changes the catalog server starts up in just seconds, even for large
metastores since it only needs to call into the metastore to get the list of tables
and databases. The performance of "invalidate metadata" also improves for the same reason.

I also picked up the catalog cleanup patch I had to make the APIs a bit more consistent and
remove the need for using a LoadingCache for databases.

This also fixes up the FE tests to run in a more realistic fashion. The FE tests now run
against catalog objects received from the catalog server. This actually turned up some bugs
in our previous test configuration where we were not running with the correct column stats
(we were always running with avgSerializedSize = slotSize).  This changed some plans so the
planner tests needed to be updated.

Still TODO:
This does not include the changes to perform background metadata loading. I will send
that out as a separate patch on top of this.

Change-Id: Ied16f8a7f3a3393e89d6bfea78f0ba708d0ddd0e

Saving changes

Change-Id: I48c34408826b7396004177f5fc61a9523e664acc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1328
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1338
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-21 21:43:29 -08:00
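
The lazy-loading idea in b4f5c1edcf can be illustrated with a small sketch. It is an assumption-laden simplification: `LazyCatalog` and `loader` are hypothetical names, and a `None` entry stands in for the IncompleteTable wrapper mentioned in the message. The real change spans the CatalogService and Impalad and is considerably more involved.

```python
from typing import Callable, Dict, Iterable, Optional


class LazyCatalog:
    """Knows every table name up front but loads metadata only on first use."""

    def __init__(self, table_names: Iterable[str],
                 loader: Callable[[str], dict]) -> None:
        self._loader = loader
        # None marks a table whose metadata has not been loaded yet.
        self._tables: Dict[str, Optional[dict]] = {n: None for n in table_names}

    def get_table(self, name: str) -> dict:
        if name not in self._tables:
            raise KeyError(name)
        if self._tables[name] is None:
            # First access triggers the (expensive) metadata load.
            self._tables[name] = self._loader(name)
        return self._tables[name]
```

Construction touches only the list of names, which is why startup stays fast even for a large metastore; the cost of loading is paid per table, on first access.
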
Nong Li
04b501d3a1 [CDH5] Collect metadata for cached blocks.
Change-Id: I81026de2f9a08553dc15e07090b8297120aa7462
(cherry picked from commit 69414f67b20016e49b739a46d6e2b4b57e1d1a3c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1252
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-15 15:12:20 -08:00
Nong Li
53d7bbb97a [CDH5] Impala changes for updated thirdparty components.
Changes include:
  - version changes in impala-config
  - version changes in various loading scripts
  - hbase jars are no longer in hive/lib
  - mini-llama script changes
  - updates due to sentry api changes
  - JDBC tests disabled
  - unsupported types tests disabled.

Change-Id: If8cf1b7ad8e22aa4d23094b9a4b1047f7e9d93ee
2014-01-15 15:12:13 -08:00
Alex Behm
c70905628b Using MiniLlama's --write-hdfs-conf to dump the MiniDfs conf for our test setup.
Change-Id: I238f375bda4ef95fa3d5ae9a29bd1dfc2aa3e401
2014-01-15 15:12:06 -08:00
Alex Behm
760750af27 Enforcing reserved memory resources via mem limits.
Fixed codepath with rm disabled. Set enable_rm to false by default.

Change-Id: I3bf2d0525d91243ec3c0ea048b0c03680befcda2

Conflicts:
	be/src/runtime/runtime-state.cc
2014-01-15 15:12:05 -08:00
Alex Behm
dc7b398bd3 Impala reserves resources from YARN via Llama.
Impala reserves resources from YARN via Llama and handles resources
preemptions by cancelling affected queries. Adds the Impala Resource
Broker for interacting with Llama. Refactors scheduler and coordinator
to move fragment-to-host assignment logic into scheduler. Local test
setup uses MiniLlama.

Change-Id: Ic7b0fe43de52d30f4207b4e65cce7e6a294e54e1
2014-01-15 15:12:04 -08:00
Alex Behm
fc6ecd39e5 [CDH5] Fixed issue with data loading using JDK7 and Hive (HIVE-5068). Fixed missing dependency in testdata for HBase region splitting.
Change-Id: Iab002f652bc1b1c2f8ce60b7505f592eedcb9cc0
2014-01-15 15:11:32 -08:00
Alex Behm
60003ad211 [CDH5] Changes to make Impala work on CDH5. Mostly fixing up dependency versions. Minor code changes to address HBase API changes.
Change-Id: Icbbeb13eefa29e38286328d45600117a383cd106
2014-01-15 15:11:23 -08:00
Skye Wanderman-Milne
561da008c7 IMPALA-729: fix resource management in Parquet scanner for multiple row groups
We weren't attaching resources to the row batch when starting a new
row group, so it was possible for string data to be overwritten. This
patch removes CloseStreams() and merges its functionality with
AttachCompletedResources() so it's not possible to destroy streams
without transferring the resources first. It also merges and removes
ScannerContext::Close().

Also adds test cases for IMPALA-720.

Change-Id: Ia8f40c7d39d8702716f1d337fe797e2696bd0fcb
2014-01-08 10:56:26 -08:00
Lenni Kuff
fbe79fc47b Use separate log files for each of our mini-cluster services
Also adds a bit more logging on which individual services are starting.

Change-Id: I53f12e1825fbf738e2fb8325874c3126e55f3f44
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1147
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:54:37 -08:00
Alex Behm
c6397ca1e3 Revert "Revert to FROM-clause order if any table is lacking stats."
This reverts commit 7e84cbe3bab9bf30a57ac58d9ef525ebc10a7b7a.

Change-Id: I89d55ca2bcb8eb6eddc244d3e7b005074d04c26a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1104
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:29 -08:00
Alex Behm
df0b28d163 Revert to FROM-clause order if any table is lacking stats.
Change-Id: I7d09c0f393e2bfeefa386845fc6bbba4ab6c8812
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1095
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:28 -08:00
Skye Wanderman-Milne
9e17042185 Allow zero bit width dict/RLE decoders.
This allows us to read single-value dictionary-encoded columns
generated by parquet-mr.

Change-Id: I80903d910d0cc3a3e4ebf02e34212d868e94feb4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1098
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:27 -08:00
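
To see why a zero bit width is meaningful (9e17042185), consider a simplified bit-unpacking routine. This is a sketch in Python rather than the decoder from the commit; it assumes values are packed least-significant-bit first. With a width of zero there are no bits to read, so every value decodes to 0, which is the only index a single-entry dictionary has.

```python
from typing import List


def unpack(data: bytes, bit_width: int, count: int) -> List[int]:
    """Decode `count` fixed-width values packed LSB-first into `data`."""
    if bit_width == 0:
        # Nothing is stored: every value is implicitly 0.
        return [0] * count
    bits = int.from_bytes(data, "little")
    mask = (1 << bit_width) - 1
    return [(bits >> (i * bit_width)) & mask for i in range(count)]
```
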
Skye Wanderman-Milne
de531e15bd IMPALA-694: Allow Impala to read files produced by parquet-mr version <= 1.2.8
parquet-mr had a bug where it didn't include the dictionary page's
header in the total column size. We now compensate for this by
detecting these files and padding the scan range length. This required
changing how the scanner detects when it's finished: it now counts the
number of rows rather than checking eosr (since the scan range may be
longer than the column).

Change-Id: Id9933808b965003c0c3b3aa78c32fe29a0c4bcbe
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1097
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:27 -08:00
Lenni Kuff
e63cc59a94 Add partitioned tpcds planner tests (SQL-92 style joins)
Adds the TPCDS queries as planner tests and fixes a few small issues
with the Planner test file parser. This adds the TPC-DS queries using
SQL-92 style joins that have a hand optimized (although
not perfect) join order.

Change-Id: I2d81e66af740b2d826b8ebd0c5ba8553b5faf0a2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1019
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:26 -08:00
Skye Wanderman-Milne
acdc792355 IMPALA-695: Use the local path of Hive UDF jars in the FE.
The FE was creating class loaders with the HDFS locations of Hive UDF
libs, rather than the local locations created by the BE. Our tests
still passed since we only used UDFs already on the classpath
(e.g. Hive builtins).

Change-Id: Idbe9c98ad6adb84b70cb44efbf9ad0afc53366ca
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1081
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:25 -08:00
Skye Wanderman-Milne
b54d16dabd IMPALA-679: Append hash of HDFS path to filename in CopyHdfsFile() to avoid collisions.
Change-Id: Ia84fa81fe043a9604248d66ed963ef3f91b0601e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1018
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:22 -08:00
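
The collision fix in b54d16dabd (IMPALA-679) can be sketched as below. The hash function (SHA-1), the placement of the hash before the extension, and the function name are all assumptions made for illustration; the commit message only says that a hash of the HDFS path is appended to the filename.

```python
import hashlib
import posixpath


def local_copy_name(hdfs_path: str) -> str:
    """Build a local filename that is unique per HDFS path (hypothetical helper)."""
    digest = hashlib.sha1(hdfs_path.encode("utf-8")).hexdigest()
    root, ext = posixpath.splitext(posixpath.basename(hdfs_path))
    # Same basename in different directories yields different digests.
    return f"{root}.{digest}{ext}"
```

Two libraries both named `libudf.so` but stored in different HDFS directories therefore map to different local files, while repeated copies of the same path map to the same one.
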
Lenni Kuff
0bae3978c9 Update compute-stats.py to execute using Impala
Updates our compute stats script to execute using Impala. This allows us
to easily compute stats on all tables in a database or all tables in the
metastore.
The updated stats caused one of the TPCH plans to change so this also
updates the TPCH planner test results.

Change-Id: I17e5dcd1036a35e40eb4eb2c8e4a20702db9049c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1024
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:18 -08:00
Lenni Kuff
76fa3b2ded Update DDL to support 'STORED AS PARQUET' and 'STORED AS AVRO' syntax
This change updates our DDL syntax support to allow for using 'STORED AS PARQUET'
as well as 'STORED AS PARQUETFILE'. Moving forward we should prefer the new syntax,
but continue to support the old.  I made the same change for 'AVROFILE', but since
we have not yet documented the 'AVROFILE' syntax I left out support for the old syntax.

Change-Id: I10c73a71a94ee488c9ae205485777b58ab8957c9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1053
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:18 -08:00
Nong Li
c1a64d6863 Add kill-mini-llama to CDH4 branch.
This makes it easier to switch between our branches and is a no-op
for those of us staying on CDH4.

Change-Id: Ic07eb8a7ba7e48db118c06c221aabe5e124f3bfb
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1033
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:54:17 -08:00
ishaan
fcdcf1a9d8 Parallelize data loaded through Impala to speed up data loading.
Currently, we execute all the queries involved in data loading serially. This change
creates a separate .sql file for each file format, compression codec and compression
scheme combination, and executes all the files in parallel. Additionally, we now store all the
.sql files (independent of workload) in $IMPALA_HOME/data_load_files/<dataset_name>. Note
that only data loaded through Impala is parallelized; data loaded through hive and hbase
remains serial.

On our build machines, the time taken to load all the data from snapshot was on the order
of 15 minutes.

Change-Id: If8a862c43f0e75b506ca05d83eacdc05621cbbf8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/804
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:53 -08:00
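
The approach in fcdcf1a9d8 (one .sql file per format/codec/scheme combination, all executed in parallel) might be sketched as follows. The file-naming pattern, the `run_one` callable, and the use of a thread pool are assumptions; the commit message does not say how the files are named or executed, and in practice not every combination is necessarily valid.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def sql_file_names(file_formats: Iterable[str], codecs: Iterable[str],
                   schemes: Iterable[str]) -> List[str]:
    """One hypothetical .sql file name per (format, codec, scheme) combination."""
    return [f"load-{fmt}-{codec}-{scheme}.sql"
            for fmt, codec, scheme in itertools.product(file_formats, codecs, schemes)]


def run_all(sql_files: List[str], run_one: Callable[[str], object],
            max_workers: int = 4) -> List[object]:
    """Execute every file concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, sql_files))
```

Here `run_one` would wrap whatever submits a file to Impala; because the files are independent of each other, they can run side by side instead of serially.
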
Lenni Kuff
498c2529d4 Test CR: Change spacing in run-all.sh
Change-Id: I2362799213a7faca3892e38fb874bfbbd0c1718f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/803
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:50 -08:00
Lenni Kuff
8b2acf5c22 IMPALA-425: Detect read-only tables and disable INSERT/LOAD operations on these tables
With this change we now detect if a table is read-only and disable INSERT/LOAD operations
on these tables. A table is read-only if Impala does not have write permission on the HDFS
base directory of the table or any one of the partition directories (if
the table is partitioned).

Change-Id: I25515b2d0ffb7fe297359437fd937a3d6e0406a0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/713
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:37 -08:00
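
The read-only rule from 8b2acf5c22 (IMPALA-425) reduces to a single predicate. The sketch below checks local directories through `os.access` purely for illustration; the real check is against HDFS permissions, and the function name and the injectable `can_write` parameter are hypothetical.

```python
import os
from typing import Callable, Iterable


def is_read_only(base_dir: str, partition_dirs: Iterable[str] = (),
                 can_write: Callable[[str], bool] = lambda p: os.access(p, os.W_OK)
                 ) -> bool:
    """A table is read-only if the base directory or any partition directory
    is not writable."""
    return any(not can_write(d) for d in (base_dir, *partition_dirs))
```

A single unwritable partition directory is enough to mark the whole table read-only, which is when INSERT and LOAD would be rejected.
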
Alex Behm
51e914e911 Use hive-exec instead of hive-builtin because hive-builtin does not exist in CDH5 Hive.
Change-Id: I11993c7eebc9f5f07f112810d7e81d07ce157193
Reviewed-on: http://gerrit.ent.cloudera.com:8080/715
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:53:33 -08:00
Lenni Kuff
72e211ca4a Use Hive Metastore Service instead of HiveServer 1 in test infrastructure
Change-Id: I4e2ba02b2101bae95d196ab13f9453e1b3a9d7be
Reviewed-on: http://gerrit.ent.cloudera.com:8080/689
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:26 -08:00
Nong Li
4800995d44 Add execution for Hive UDFs.
Change-Id: I6a5ad96fed77e2b8a2701f21a917a8eb7a11d500
Reviewed-on: http://gerrit.ent.cloudera.com:8080/458
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:25 -08:00
Nong Li
6b9a7de02e Add symbol resolution during analysis for create function stmts.
Before this, we had to specify the entire mangled symbol. This can be quite
long and quite tedious (take a look at some of the create UDA test cases that
specify all the symbols).

This patch adds some code to convert from the user function signature to the
mangled name. This means the user can specify the unmangled name and we can
do the symbol lookup. The mangling rules are pretty convoluted but if it is
messed up, the user can always specify the full symbol.

Some other minor cleanup in:
  - JNI from FE to BE
  - UDFs/UDAs that are loaded as test data

Change-Id: I733dbf3a72cb7b06221c27e622d161bcca0d74a8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/624
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:20 -08:00
ishaan
8a43426879 Sleep after starting the hiveserver2 service to guard against it not starting on time.
Change-Id: I9a0de1cc63089cba2f9b59942ee45abc44b8662e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/643
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:53:17 -08:00
Lenni Kuff
b07a3ccfd6 Use an external Hive Metastore Service for local test runs
Using an external Hive Metastore Service for local test runs has a number of benefits.
Some of the benefits are that it helps separate the metastore logs from the impala
logs, and that it is more representative of what is on real cluster environments.
It also may help with some of the concurrency issues that we have been seeing when
running directly against the backend database since we no longer spin up an in-process
metastore server for each client connection.

The metastore is started by running "run-hive-server.sh" which is invoked as part of
"run-all.sh".

Change-Id: If60fa97aa38e4ad5cf578b9b409eeea1e0e29375
Reviewed-on: http://gerrit.ent.cloudera.com:8080/628
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:15 -08:00
Skye Wanderman-Milne
b7f83bcd73 Add support for LLVM IR UDFs.
This patch also adds a number of improvements to NativeUdfExpr. Highlights include:

* Correctly handling the lowering of AnyVal struct types (required for ABI compatibility)
* A rudimentary library cache for reusing handles produced by dlopen
* More complicated test cases

Change-Id: Iab9acdd7d7c4308e5d7ee3210f21b033fda5a195
Reviewed-on: http://gerrit.ent.cloudera.com:8080/540
Tested-by: jenkins
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:53:03 -08:00
Nong Li
e5ed8e4105 Move minicluster_xml_conf to HADOOP_CONF_DIR.
The current location gets deleted if you rebuild, forcing you to restart mini dfs.

Change-Id: If71b144534255fa8df2bfa187c0814ffdf28463e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/550
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:03 -08:00
Lenni Kuff
79cdeac3d6 Consolidate test cluster under IMPALA_HOME/cluster_logs + store logs during data loading
Change-Id: I8f6239e4ccb0515c85bf80193a475788fb18dedb
Reviewed-on: http://gerrit.ent.cloudera.com:8080/518
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-01-08 10:52:56 -08:00
Skye Wanderman-Milne
fd99db0300 First pass at UdfExpr.
Change-Id: I517bf56541749b5c2459554821c7bf838239fdf0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/439
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:52:50 -08:00
Henry Robinson
a46276325c IMPALA-415: Don't delete hidden files in the root directory for INSERT OVERWRITE

INSERT OVERWRITE into an unpartitioned table is supposed to remove all
data files from the root. This should not include hidden files or
directories. This patch excludes hidden files from deletion, and adds a
test case.

Partition directories are still removed in their entirety: the cost of
statting a large number of files and directories rather than issuing a
single "rm -rf" outweighs the benefits of preserving hidden files for
now.

Hive does not preserve hidden files in either configuration.

Change-Id: Ia73e55e011c26c88f14745075210cf359764e3c1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/418
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:50 -08:00
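
The deletion rule from a46276325c (IMPALA-415) for unpartitioned tables can be sketched as a simple filter. It assumes the common Hadoop convention that names starting with "." or "_" are hidden; the commit message does not spell out the exact definition, and both function names are hypothetical.

```python
from typing import Iterable, List


def is_hidden(name: str) -> bool:
    """Assumed convention: hidden entries start with '.' or '_'."""
    return name.startswith((".", "_"))


def files_to_delete(root_entries: Iterable[str]) -> List[str]:
    """Entries in the table root that INSERT OVERWRITE should remove."""
    return [name for name in root_entries if not is_hidden(name)]
```

Partition directories are not filtered this way; as the message notes, they are still removed in their entirety.
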
Lenni Kuff
a1f2f72f49 Add Impala DDL support for creation of AVRO tables + support for CREATE/ALTER SERDEPROPERTIES
This change adds Impala DDL support for creation of AVRO tables.
Additionally, it adds Impala support for CREATE and ALTER SERDEPROPERTIES
which are used when creating Avro backed tables. This syntax is not
exactly the same as the Hive support since it introduces a new
fileformat (AVROFILE) that implies the needed Serialization library,
input format, and output format.

Change-Id: I5047e419198a89599e9d014fdedfee1a20437a7d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/464
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:48 -08:00
Lenni Kuff
d6d1557fe7 Capture cluster logs with each test run / don't use mvn for starting cluster services
Change-Id: I708b547e49d035c5f029ea86119cc844ccbc5643
Reviewed-on: http://gerrit.ent.cloudera.com:8080/404
Tested-by: jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:52:40 -08:00
Lenni Kuff
9f54242941 Add retry loop around split-hbase to fix build breaks
Change-Id: I539407ce05d705b6b4e88d0791fc4ec236c79c80
Reviewed-on: http://gerrit.ent.cloudera.com:8080/399
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:39 -08:00
ishaan
6735e3983f Fix build failure because of hbase data loading.
Change-Id: I796656332c3733a1ffdc338d206009efa6c451ac
Reviewed-on: http://gerrit.ent.cloudera.com:8080/360
Tested-by: jenkins
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:52:37 -08:00
ishaan
53cd9eadab Treat HBase as a file format for functional tests
Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922
Reviewed-on: http://gerrit.ent.cloudera.com:8080/102
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:52:36 -08:00
Lenni Kuff
f264db1647 Automatically force load partitioned tables to ensure valid partition metadata
Change-Id: Ief91102f30d4669503d473299256a74a50d8fe3c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/261
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:17 -08:00
Lenni Kuff
17ed6ea177 Partition TPC-DS dataset and add additional TPC-DS workload queries
Change-Id: I5410e68fdfd818a8287e0974332c3e36c344c300
Reviewed-on: http://gerrit.ent.cloudera.com:8080/99
Tested-by: jenkins <kitchen-build@cloudera.com>
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
2014-01-08 10:52:13 -08:00
Skye Wanderman-Milne
6e7406df8b IMPALA-502: Impala does not return NULL for case where table has extra string column and data does not (it returns an empty string)
Change-Id: I0cfe5ce5fc279d46610a3cc191a501ccbc335296
Reviewed-on: http://gerrit.ent.cloudera.com:8080/127
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:52:02 -08:00