This change whitelists the filesystems that can be set as the default
filesystem for Impala to run on.
It also configures Impala to use S3 as the default filesystem, rather
than as a secondary filesystem as before.
Change-Id: I2f45bef6c94ece634045acb906d12591587ccfed
Reviewed-on: http://gerrit.cloudera.org:8080/1121
Reviewed-by: anujphadke <aphadke@cloudera.com>
Tested-by: Internal Jenkins
If a stale snapshot is detected, the full data load proceeds even
if the option to skip data load was set. A check is added to fail
immediately if this happens for Isilon or S3, because the full data
load does not currently work on these filesystems.
Change-Id: I98faaa4a66e5715bd86289a56d199599b9011f52
Reviewed-on: http://gerrit.cloudera.org:8080/2811
Reviewed-by: Harrison Sheinblatt <hs7@hotmail.com>
Tested-by: Internal Jenkins
Before this fix our "Waiting for something to happen" print
output would be buffered and dumped all at once when the
event we were waiting for succeeded or we hit a timeout.
After this fix the output of "print" is displayed on
the console immediately, as was originally intended.
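For reference, a minimal sketch of one way to force unbuffered output
when a Python helper is launched from a shell wrapper (the helper name
is hypothetical, and the actual fix may instead flush from within the
script itself):
python -u bin/wait-for-event.py    # -u disables Python's stdout buffering
stdbuf -oL bin/wait-for-event.py   # or force line-buffered output for any child process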
Change-Id: Icf341e81d0d459504918ae7c9e88918fe5e16c59
Reviewed-on: http://gerrit.cloudera.org:8080/2810
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
I. Start HBase per directions
1. https://hbase.apache.org/book.html#_configuration_files mentions a
'regionservers' file that points to a list of hosts on which to start
HBase RegionServers. When HBase starts in our mini-cluster there are
messages printed like this:
cat: /home/mikeb/Impala/fe/src/test/resources/regionservers: No such file or directory
Adding this file now starts a single RegionServer, which takes the
place of RegionServer 1 in the separate "additional region servers"
startup call.
2. The additional RegionServers are still started, but now we start only
two of them, beginning at index 2. See https://hbase.apache.org/book.html#quickstart_pseudo
There are still 3 total RegionServers using the same ports as before. We
are simply configuring our settings as directed in the documentation.
There were mentions in testdata/bin/run-hbase.sh of a "hbase race". One
possible such bug is https://issues.apache.org/jira/browse/HBASE-5780
which has been fixed for a while. I've removed the check to wait for
that Master, though I have not removed the Python script that does the
waiting. We could remove that later after we let this patch bake.
Also, https://issues.apache.org/jira/browse/HBASE-4467 has been marked
"not a problem", so I've removed references to that.
II. Implement HBase start retry
If starting either HBase Master or additional RegionServers fails, kill
all of HBase and try again. Do this for some number of attempts.
In order to keep errexit ("set -e") happy, we anticipate that some of
the startup attempts may fail and handle those cases with explicit
control flow. On the final attempt, errexit is allowed to fail the
script on our behalf.
There is some code duplication here, but because Bash gives us only a
line number on failure, not a stack trace, I chose not to use functions
to handle the reuse. We don't really have functions anywhere else
at the moment, either.
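For illustration only, a stripped-down sketch of the retry pattern
under errexit (the attempt count and the helper names are placeholders,
not the script's actual values):
MAX_ATTEMPTS=5
for ((ATTEMPT = 1; ATTEMPT <= MAX_ATTEMPTS; ATTEMPT++)); do
  if [[ $ATTEMPT -lt $MAX_ATTEMPTS ]]; then
    # Expected-failure case: control flow keeps errexit from aborting.
    if start-hbase-and-regionservers; then
      break
    fi
    echo "HBase failed to start; killing HBase and retrying ($ATTEMPT)"
    kill-hbase-cluster
  else
    # Last attempt: run the command bare so errexit fails on our behalf.
    start-hbase-and-regionservers
  fi
done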
Testing:
It's pretty difficult to trigger a real "HBase fails to start"
situation, so I tested my changes by faking HBase failures, both when
starting the Master and first RegionServer, and when starting
subsequent RegionServers.
Multiple private builds have passed.
Change-Id: Ib1d055a8a9098ce24e2f31b969501b6e090eab19
Reviewed-on: http://gerrit.cloudera.org:8080/2804
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
Previously Kudu would only be started when the test configuration was
the standard mini-cluster. That led to failures during data loading when
testing without the mini-cluster (ex: local file system). Kudu doesn't
require any other services so now it'll be started for all test
environments.
Change-Id: I92643ca6ef1acdbf4d4cd2fa5faf9ac97a3f0865
Reviewed-on: http://gerrit.cloudera.org:8080/2690
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This failure happens on filesystems other than HDFS because as a
part of IMPALA-2466, the $FILESYSTEM_PREFIX was not added to the
new directories that the patch tries to create in create-load-data.
Change-Id: I8de74db93893c5273ccc9c687f608959628f5004
Reviewed-on: http://gerrit.cloudera.org:8080/2644
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
The 20 lines we currently dump are often not enough to
diagnose a failure quickly. Increasing to 50 lines.
Printing 50 lines is also consistent with our run-step
script.
Change-Id: I353a2030be6fad1cd63879b4717e237344f85c73
Reviewed-on: http://gerrit.cloudera.org:8080/2632
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
All logs, test results and SQL files generated during data
loading and testing are now consolidated under a single new
directory $IMPALA_HOME/logs. The goal is to simplify archiving
in Jenkins runs and debugging.
The new structure is as follows:
$IMPALA_HOME/logs/cluster
- logs of Hadoop components and Impala
$IMPALA_HOME/logs/data_loading
- logs and SQL files produced in data loading
$IMPALA_HOME/logs/fe_tests
- logs and test output of Frontend unit tests
$IMPALA_HOME/logs/be_tests
- logs and test output of Backend unit tests
$IMPALA_HOME/logs/ee_tests
- logs and test output of end-to-end tests
$IMPALA_HOME/logs/custom_cluster_tests
- logs and test output of custom cluster tests
I tested this change with a full data load which
was successful.
Change-Id: Ief1f58f3320ec39d31b3c6bc6ef87f58ff7dfdfa
Reviewed-on: http://gerrit.cloudera.org:8080/2456
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
These tests functionally verify that the following types of files
can be scanned properly:
1) Add a parquet file with multiple blocks such that each node has to
scan multiple blocks.
2) Add a parquet file with multiple blocks but only one row group
that spans the entire file. Only one scan range should do any work
in this case.
Change-Id: I4faccd9ce3fad42402652c8f17d4e7aa3d593368
Reviewed-on: http://gerrit.cloudera.org:8080/1500
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
This is for review purposes only. This patch will be merged with David's
big merge patch.
Changes:
1) Make Kudu compilation dependent on the OS since not all OSs support
Kudu.
2) Only run Kudu related tests when Kudu is supported (see #1).
3) Look for Kudu locally, but in a different location. To use a local
build of Kudu, set KUDU_BUILD_DIR to the path Kudu was built in and
set KUDU_CLIENT_DIR to the path Kudu was installed in.
Example:
git clone https://github.com/cloudera/kudu.git
...build 3rd party etc...
mkdir -p $KUDU_BUILD_DIR
cd $KUDU_BUILD_DIR
cmake <path to Kudu source dir>
make
DESTDIR=$KUDU_CLIENT_DIR make install
4) Look for Kudu in the toolchain if not using a local Kudu build.
5) Add Kudu service startup scripts. The Kudu in the toolchain is
actually a parcel that has been renamed (the contents were not
modified in any way), which means the Kudu service binaries are there.
Those binaries are now used to run the Kudu service.
Change-Id: I3db88cbd27f2ea2394f011bc8d1face37411ed58
This merges the 'feature/kudu' branch with cdh5-trunk as of commit:
055500cc753f87f6d1c70627321fcc825044e183
This patch is not a pure merge patch, in the sense that it goes beyond conflict
resolution to also address reviews of the 'feature/kudu' branch as a whole.
The review items and their resolution can be inspected at:
http://gerrit.cloudera.org:8080/#/c/1403/
Change-Id: I6dd4270cd17a4f5c02811c343726db3504275a92
Previously, we tried to dynamically name the metastore db. With the introduction of
metastore snapshots, this is no longer necessary and may cause naming ambiguity if the
Impala repository has a non-standard directory structure.
This patch uses a constant name - impala_hive - defined as an environment variable in
impala-config.
Change-Id: Iadc59db8c538113171c9c2b8cea3ef3f6b3bd4fc
Reviewed-on: http://gerrit.cloudera.org:8080/517
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This patch is required for updating thirdparty.
Sentry does not ship with the Postgres JDBC driver anymore,
so we need to point it to ours in thirdparty. Sentry picks
up JARs from the HADOOP_CLASSPATH and not the CLASSPATH,
so this patch adds the JDBC driver there in run-sentry-service.sh.
Change-Id: Iee950dfcd2839b4ca0fc827a45da2a9386c4404d
Reviewed-on: http://gerrit.cloudera.org:8080/1991
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Use psql -q to suppress verbose output during metastore creation.
Also use -q instead of redirection everywhere for consistency.
Change-Id: I539da86a50d18546474b2cfdc848f992745a7875
Reviewed-on: http://gerrit.cloudera.org:8080/1884
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
In commit 960808 I forgot to update the data-loading script for the
conversion of a shell script to a python script. It turns out there were
a couple of other little problems too. I checked manually that the data
was loaded after these changes.
Change-Id: Id81fc423348515ab446835868025cb839c77f52c
Reviewed-on: http://gerrit.cloudera.org:8080/1851
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
The major changes are:
1) Collect backtrace and fatal log on crash.
2) Poll memory usage. The data is only displayed at this time.
3) Support Kerberos.
4) Add random queries.
5) Generate random and TPC-H nested data on a remote cluster. The
random data generator was converted to use MR for scaling.
6) Add a cluster abstraction to run data loading for #5 on a
remote or local cluster. This also moves and consolidates some
Cloudera Manager utilities that were in the stress test.
7) Clean up the wrappers around impyla; they were getting messy.
Change-Id: I4e4b72dbee1c867626a0b22291dd6462819e35d7
Reviewed-on: http://gerrit.cloudera.org:8080/1298
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Log the output of data loading steps to files and only print to stdout
if there is an actual failure. The output of some steps is very noisy,
and some steps even have output that looks like errors.
This is implemented with a run-step helper function in bash that handles
redirection and logging. Any bash command can be prefixed with run-step
<step description> <log file name> to redirect the output to a log file.
Sample output is:
Starting Impala cluster (logging to start-impala-cluster.log)... OK
Setting up HDFS environment (logging to setup-hdfs-env.log)... OK
Skipped loading the metadata.
Loading HBase data only (logging to load-hbase-only.log)... OK
Loading Hive UDFs (logging to build-and-copy-hive-udfs.log)... OK
Running custom post-load steps (logging to custom-post-load-steps.log)... OK
Caching test tables (logging to cache-test-tables.log)... OK
Loading external data sources (logging to load-ext-data-source.log)... OK
Splitting HBase (logging to create-hbase.log)... OK
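A hedged sketch of what such a helper can look like (argument handling,
the log directory, and the tail length are simplified and may differ
from the real function):
function run-step {
  local MSG=$1
  local LOG_FILE_NAME=$2
  shift 2
  local LOG=${LOG_DIR:-/tmp}/$LOG_FILE_NAME
  echo -n "$MSG (logging to $LOG_FILE_NAME)... "
  if ! "$@" &> "$LOG"; then
    echo "FAILED"
    echo "'$*' failed. Tail of log:"
    tail -n 50 "$LOG"
    exit 1
  fi
  echo "OK"
}
run-step "Starting Impala cluster" start-impala-cluster.log \
    "$IMPALA_HOME"/bin/start-impala-cluster.py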
Change-Id: I6396540858c408b084039a87efc81e1004626f39
Reviewed-on: http://gerrit.cloudera.org:8080/1760
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
This adds a new 'latest' symlink in be/build that links to the latest
build configuration. This makes our script behave better as we don't
need to hard-code specific build types but can rather depend on sensible
defaults.
This patch addresses this issue in the cluster startup and a script that
is executed in the context of data loading. There might be more places
but so far my search did not yield any additional places where we rely
on a hardcoded path.
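A minimal sketch of maintaining such a symlink after a build (the exact
paths and build type below are assumptions):
ln -sfn "$IMPALA_HOME"/be/build/debug "$IMPALA_HOME"/be/build/latest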
Change-Id: Ic814a1bef1d3088b2f8c1c34f25e2112b74315f8
Reviewed-on: http://gerrit.cloudera.org:8080/1797
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Use mvn-quiet.sh in a couple of places where it was missed.
Fix mvn warnings.
Provide the -q flag to git clean to prevent it from reporting all of the
files it deletes.
Change-Id: I77ec2265bf35f64ab1ac76b0a253e67c5f97eccd
Reviewed-on: http://gerrit.cloudera.org:8080/1804
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Maven's INFO log level is very verbose and includes a lot of progress
information that is minimally useful.
Maven doesn't have an option to output only ERROR and WARNING log
messages. As a workaround, use grep to filter out the majority of the
output, keeping only warnings, errors, tests, and success/failure lines.
Also add a header with relevant info about the maven command:
targets and working directory.
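Roughly, the header and filtering look like the following (the exact
grep pattern is illustrative, not necessarily the one in mvn-quiet.sh):
# Inside a wrapper like mvn-quiet.sh, "$@" holds the maven targets.
echo "========== Running: mvn $* =========="
echo "========== Directory: $(pwd) =========="
mvn "$@" | grep -E -e WARNING -e ERROR -e SUCCESS -e FAILURE -e Test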
Change-Id: I828b870edc2fc80a6460e6ed594d507c46e69c82
Reviewed-on: http://gerrit.cloudera.org:8080/1752
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This patch allows passing additional cluster startup flags.
This is needed when building with optimizations in release
mode as the default cluster startup would only pick up a
debug build.
Change-Id: Ib98d6814558f2d82bdeac0e3cce1fb7db048c459
Reviewed-on: http://gerrit.cloudera.org:8080/1775
Tested-by: Internal Jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
The original error reporting relied on $0 being accessible from the
current working dir, which failed if a script changed the working dir
and $0 was relative. This updates the error reporting command to cd back
to the original dir before accessing $0.
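A simplified sketch of the pattern (variable and function names are
illustrative):
STARTING_DIR=$(pwd)
function report_error {
  # $0 may be relative; cd back to where the script started so it resolves.
  cd "$STARTING_DIR"
  echo "Error in $0 at line $1"
}
trap 'report_error $LINENO' ERR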
Change-Id: I2185af66e35e29b41dbe1bb08de24200bacea8a1
Reviewed-on: http://gerrit.cloudera.org:8080/1666
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Impala could crash or return wrong results if it used the codegen'd
Avro decoding function to scan an Avro file that has a different
schema than the table schema. With the AVRO-1617 fix, we make sure
Impala doesn't use codegen if the table schema has fewer columns
than the file schema.
Change-Id: I268419e421404ad6b084482dee417634f17ecf60
Reviewed-on: http://gerrit.cloudera.org:8080/1696
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
Various test scripts operating on postgres databases output
unhelpful log messages, including "ERROR" messages that
aren't actual errors when trying to drop a database that doesn't exist.
Send useless output to /dev/null and consistently use || true to
ignore errors from dropdb.
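For instance (the database and user names are placeholders):
# Ignore a missing database and hide the chatter.
dropdb -U hiveuser hive_impala_test &> /dev/null || true
createdb -U hiveuser hive_impala_test &> /dev/null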
Change-Id: I95f123a8e8cc083bf4eb81fe1199be74a64180f5
Reviewed-on: http://gerrit.cloudera.org:8080/1753
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Changes:
1) Consistently use "set -euo pipefail".
2) When an error happens, print the file and line (see the sketch after
this list).
3) Consolidated some of the kill scripts.
4) Added better error messages to the load data script.
5) Changed use of #!/bin/sh to bash.
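A minimal sketch of items 1 and 2 (the exact message format in the
scripts may differ, and the failing command is a placeholder):
#!/bin/bash
set -euo pipefail
trap 'echo "Error in $0 at line $LINENO"' ERR
some-command-that-might-fail   # a failure here prints the file and line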
Change-Id: I14fef66c46c1b4461859382ba3fd0dee0fbcdce1
Reviewed-on: http://gerrit.cloudera.org:8080/1620
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This is for compatibility with docker containers. Before this patch,
when the scripts were run on the docker host, the scripts would try
to kill the mini-cluster in the docker containers and fail because they
didn't have permissions (the user is different). Now the scripts will
only try to kill mini-cluster processes that were started by the current
user.
Also some psutil availability checks were removed because psutil is now
provided by the python virtualenv.
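Sketched in shell form for illustration (the real scripts use psutil
from Python, and the process pattern here is only a placeholder):
# Match only mini-cluster processes owned by the current user, so that
# processes inside docker containers (a different user) are left alone.
pkill -u "$USER" -f 'org.apache.hadoop.hdfs.server' || true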
Change-Id: Ida371797bbaffd0a3bd84ab353cb9f466ca510fd
Reviewed-on: http://gerrit.cloudera.org:8080/1541
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Allow Impala to start only with a running HMS (and no additional services like HDFS,
HBase, Hive, YARN) and use the local file system.
Skip all tests that need these services, that use HDFS caching, or that assume multiple
impalads are running.
To run Impala with the local filesystem, set TARGET_FILESYSTEM to 'local' and
WAREHOUSE_LOCATION_PREFIX to a location on the local filesystem where the current user has
permissions, since this is the location where the test data will be extracted.
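For example (the warehouse location below is only an illustration):
export TARGET_FILESYSTEM=local
export WAREHOUSE_LOCATION_PREFIX=/home/$USER/impala-warehouse
. "$IMPALA_HOME"/bin/impala-config.sh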
Test coverage (with core strategy) in comparison with HDFS and S3:
HDFS 1348 tests passed
S3 1157 tests passed
Local Filesystem 1161 tests passed
Change-Id: Ic9718c7e0307273382b1cc6baf203ff2fb2acd03
Reviewed-on: http://gerrit.cloudera.org:8080/1352
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Readability: Alex Behm <alex.behm@cloudera.com>
The test_verify_runtime_profile test failed during C5.5 builds and
GVMs because this test relies on the table lineitem_multiblock
having 3 blocks. However, because the rules for loading the data were
not followed in the functional_schema_template.sql file, the table ended
up being stored with only one block.
This change moves the data load to the end of the create-load-data.sh
file, so the data is loaded even for snapshots.
Change-Id: I78030dd390d2453230c4b7b581ae33004dbf71be
Reviewed-on: http://gerrit.cloudera.org:8080/1153
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
Until now, we ignored the selectivity of predicates that are
pushed down to Kudu; this was wrong and yielded bad plans. This
patch makes sure that the selectivity of the predicates that are pushed
down to Kudu is used in the computeStats() calculation.
Change-Id: Iae587143dfafa0b008e00a356a80b55747b762e4
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/8426
Reviewed-by: Todd Lipcon <todd@cloudera.com>
Tested-by: jenkins
Recently, the full data load started failing because Hive ran out of heap space while
writing the nested tpch tables. This patch simply bumps up the heap space, and the query
now succeeds.
Change-Id: I92d0029659c41417d76a15f703df1d42e5187d5e
Reviewed-on: http://gerrit.cloudera.org:8080/776
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The combination of --force and text/lzo was broken if the partition
directories already contained data. For reasons explained in the
comments, the ALTER TABLE ADD PARTITION step is skipped in this case,
which causes Hive to not do a full overwrite with INSERT OVERWRITE.
Fix it by manually removing the directories.
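One way to do the manual removal before re-loading, assuming an HDFS
target and an illustrative partition directory path:
hadoop fs -rm -r -f /test-warehouse/tpch.lineitem_lzo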
Testing: Verified the following combinations of load-data.py for
text/lzo now work:
{--force, ""} x {no partition dirs, partition dirs with files}
Change-Id: I3ee34c4d85c58644345eadd8fc0976665c1bbaf5
Reviewed-on: http://gerrit.cloudera.org:8080/752
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Due to a possible change in behaviour in Hive/MR, it is no longer possible to use
arbitrarily large values for parquet.block.size. This breaks the loading of nested tpch
data on newer Hive. This patch addresses the problem by using a permissible value.
Change-Id: Ib5b14651fb579cec6aa8d45bd2253cecb4346eb9
Reviewed-on: http://gerrit.cloudera.org:8080/755
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
Before this patch, we used to accept any query referencing complex
types, regardless of the table/partition's file format being scanned.
We would ultimately hit a DCHECK in the BE when attempting to scan
complex types of a table/partition with an unsupported format.
This patch makes queries fail gracefully during planning if a scan
would access a table/partition in a format for which we do not
support complex types.
For mixed-format partitioned HDFS tables we perform this check
at the partition granularity, so such a table can be scanned as
long as only partitions with supported formats are accessed.
HBase tables with complex-typed columns can be scanned as long as
no complex-typed columns are accessed in the query.
Change-Id: I2fd2e386c9755faf2cfe326541698a7094fa0ffc
Reviewed-on: http://gerrit.cloudera.org:8080/705
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Add support for creating a table based on a parquet file which contains arrays,
structs and/or maps.
Change-Id: I56259d53a3d9b82f318228e864c783b48a03f9ae
Reviewed-on: http://gerrit.cloudera.org:8080/582
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
A script is added that generates two parquet files with nested data.
One file has modern nested types encoding and the other one has
legacy encoding. This data will be used for testing nested types
support for "create table like file" statement.
Change-Id: I8a4f64c9f7b3228583f3cb0af5507a9dd4d152ef
Reviewed-on: http://gerrit.cloudera.org:8080/610
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
Python tests and infra scripts will now use "python" from the virtualenv
via $IMPALA_HOME/bin/impala-python. Some scripts could be simplified now
that python 2.6 and a dependable set of third-party libraries are
available but that is not done as part of this commit.
Change-Id: If1cf96898d6350e78ea107b9026b12ba63a4162f
Reviewed-on: http://gerrit.cloudera.org:8080/603
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
Hive allows creating Avro tables without an explicit Avro schema since 0.14.0.
For such tables, the Avro schema is inferred from the column definitions,
and not stored in the metadata at all (no Avro schema literal or Avro schema file).
This patch adds support for loading the metadata of such tables, although Impala
currently cannot create such tables (expect a follow-on patch).
Change-Id: I9e66921ffbeff7ce6db9619bcfb30278b571cd95
Reviewed-on: http://gerrit.cloudera.org:8080/538
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Patch c0c9fbdf57df667f63632437f612a63baf1534dd: "Load Kudu as part of
the normal data loading workflow" passed the build when it was first
introduced because it had introduced changes to the datasets directory,
which caused the metadata loading not to be skipped. However, it failed
all subsequent times, as there were no further changes to the metadata
directory.
This patch makes data loading for Kudu run independently of whether
metadata load is skipped or not since a new Kudu cluster is now created
on each build.
This patch also removes one last reference to 'functional_kudu.liketbl'
in AnalyzeDDLTest since we don't create/load data for that table anymore.
Change-Id: Ibe9acc7da17062ac317dff06a8c57dd87cf566d6
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/7110
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: David Alves <david.alves@cloudera.com>
When loading a large nested table using the GROUP_CONCAT function,
Impala runs out of memory. We prevent this from happening by adding
an option to partition the table and load one partition at a time.
Change-Id: I8d517f94ef97e98d36eb8ebc8180865023655114
Reviewed-on: http://gerrit.cloudera.org:8080/448
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
This loads data into Kudu as part of end-to-end test setup, which
was previously disabled as INSERT was missing.
Specifically, this uses Impala loading (vs. Hive loading) and uses
Kudu-specific handling in two ways:
- Tables are created "managed", i.e. without the EXTERNAL keyword,
so that tables are actually dropped in Kudu as part of the setup.
This is required because Kudu does not support overwriting tables
(i.e. INSERT OVERWRITE).
- Insert statements for Kudu do not have the OVERWRITE keyword, since
tables in Kudu cannot be safely overwritten, so this replaces
OVERWRITE with INTO.
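So the Kudu load ends up issuing statements of this shape (table names
are placeholders):
impala-shell -q "INSERT INTO tpch_kudu.lineitem SELECT * FROM tpch.lineitem"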
Change-Id: I4ee3ded542e9b5c71207fd0cb494e0c2c0890667
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/6985
Tested-by: jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Until now, we were using the default data directory to store the data
used for Impala tests, without ever formatting it. This is contrary
to how the other Impala data sources behave, i.e. when "--format" is
passed to build-all.sh, only Kudu wouldn't be formatted.
This also moves Kudu's data directory inside of the Impala directory
structure, where it's easier to account for it.
Change-Id: Iae2870df0e625de07a761687e75999ef30f2be06
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/7055
Tested-by: jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
This allows specifying split keys in Kudu's CREATE TABLE statement as part
of the key/value pairs in TBLPROPERTIES.
Splits are expected to be specified as JSON arrays of arrays: [[key1], [key2], ...]
'key1' and 'key2' might be single values or comma-separated lists of values, depending
on whether the table has a simple or compound primary key.
This also adds a series of test tables to be created for the kudu table format
when load-data.py is executed.
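For illustration, the split-keys property for a compound primary key
might look like this (the property key name is an assumption, and the
other Kudu table properties are omitted for brevity):
impala-shell -q "CREATE TABLE split_keys_demo (id BIGINT, part INT, val STRING)
  TBLPROPERTIES ('kudu.split_keys' = '[[100, 1], [200, 2]]')"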
Change-Id: I1824199fda14abb2d7352800789f2b9c2f2124ae
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/6974
Tested-by: jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
This adds a new TABLE_PROPERTIES section that will be included in the
CREATE TABLE statement as TBLPROPERTIES. Each line in this new section
is expected to be in the form:
<file_format>:<key>=<value>
Properties are only added to create table statements of the file format
they specify.
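For example, a line like the following (the property shown is only an
illustration) would add the property only to Kudu CREATE TABLE
statements:
kudu:storage_handler=com.cloudera.kudu.hive.KuduStorageHandler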
Change-Id: I89ef7ced3351ecf2c727050ca426f6616f3e5bcd
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/6945
Tested-by: jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
This patch enables running Impala tests against Isilon as the default file system. The
intention is to run tests against a realistic deployment, i.e., Isilon replacing HDFS as
the underlying filesystem.
Specifically, it does the following:
- Adds a new environment variable DEFAULT_FS, which points to HDFS by default.
- Makes the fs.defaultFs property in core-site.xml use the DEFAULT_FS environment
variable, such that all clients talk to Isilon implicitly.
- Unsets FILESYSTEM_PREFIX when the TARGET_FILESYSTEM is Isilon, since path prefixes
are no longer needed.
- Only starts the Hive Metastore and the Impala service stack when running
tests against Isilon.
We don't start KMS/HBase because they're not relevant to Isilon. We also don't
start YARN, Hive and Llama because Hive queries are disabled with Isilon.
The scripts that start/stop Hive, YARN and Llama should be modified to point to a
filesystem other than HDFS in the future.
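For instance, pointing the tests at an Isilon deployment would look
roughly like this (the host, port, and filesystem value are
placeholders):
export TARGET_FILESYSTEM=isilon
export DEFAULT_FS=hdfs://my-isilon-host.example.com:8020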
Change-Id: Id66bfb160fe57f66a64a089b465b536c6c514b63
Reviewed-on: http://gerrit.cloudera.org:8080/449
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
The database will be used for testing in the future.
Change-Id: I60b54b36db9493a5bea308151b4027cd47d73047
Reviewed-on: http://gerrit.cloudera.org:8080/400
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins