Fixes a regression in the data load process that had been introduced
by commit 75a857c. To make check-schema-diff.sh work from anywhere,
we need to specify the git-dir and work-tree arguments everywhere we
call git.
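For illustration, the pattern is to pass explicit repo locations on every
invocation, e.g. (assuming IMPALA_HOME points at the checkout; the script may
resolve the path differently):
    git --git-dir="$IMPALA_HOME/.git" --work-tree="$IMPALA_HOME" log -1 --format=%H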
Change-Id: I32e0dce2c10c443763a038aa3b64b1c123ed62ad
Reviewed-on: http://gerrit.cloudera.org:8080/4726
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace the existing license text with, or add if missing, the ASF
license text given on the website (reproduced below for reference).
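For reference, the source header required by that policy reads as follows
(modulo line wrapping and the comment markers appropriate to each file type):
    Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
    either express or implied.  See the License for the specific
    language governing permissions and limitations under the
    License.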
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
The original error reporting relied on $0 being accessible from the
current working dir, which failed if a script changed the working dir
and $0 was relative. This updates the error reporting command to cd back
to the original dir before accessing $0.
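A minimal sketch of the approach (variable and function names here are
illustrative, not the actual ones in the scripts):
    STARTING_DIR="$PWD"
    function on_error {
      local LINE=$1
      # $0 may be a relative path, so resolve it from the directory the script started in.
      (cd "$STARTING_DIR" && echo "Error in $(readlink -f "$0") at line $LINE")
    }
    trap 'on_error $LINENO' ERR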
Change-Id: I2185af66e35e29b41dbe1bb08de24200bacea8a1
Reviewed-on: http://gerrit.cloudera.org:8080/1666
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Impala could crash or return wrong results if it used the codegen'd
Avro decoding function to scan an Avro file whose schema differs from
the table schema. With the AVRO-1617 fix, we make sure Impala doesn't
use codegen if the table schema has fewer columns than the file
schema.
Change-Id: I268419e421404ad6b084482dee417634f17ecf60
Reviewed-on: http://gerrit.cloudera.org:8080/1696
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
Changes:
1) Consistently use "set -euo pipefail".
2) When an error happens, print the file and line (see the sketch after this list).
3) Consolidated some of the kill scripts.
4) Added better error messages to the load data script.
5) Changed #!/bin/sh shebangs to #!/bin/bash.
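Illustrative only (the exact message and handler differ per script), items 1
and 2 combined look roughly like:
    #!/bin/bash
    set -euo pipefail
    # On any failing command, report which script and line hit the error.
    trap 'echo "Error in $0 at line $LINENO"' ERR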
Change-Id: I14fef66c46c1b4461859382ba3fd0dee0fbcdce1
Reviewed-on: http://gerrit.cloudera.org:8080/1620
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Allow Impala to start with only a running HMS (and no additional services like HDFS,
HBase, Hive, or YARN) and use the local file system.
Skip all tests that need these services, that use HDFS caching, or that assume multiple
impalads are running.
To run Impala with the local filesystem, set TARGET_FILESYSTEM to 'local' and
WAREHOUSE_LOCATION_PREFIX to a location on the local filesystem where the current user
has permissions, since this is where the test data will be extracted.
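For example (the warehouse path below is only a placeholder; any local
directory writable by the current user works):
    export TARGET_FILESYSTEM=local
    export WAREHOUSE_LOCATION_PREFIX=/tmp/impala-test-warehouse  # placeholder path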
Test coverage (with core strategy) in comparison with HDFS and S3:
HDFS 1348 tests passed
S3 1157 tests passed
Local Filesystem 1161 tests passed
Change-Id: Ic9718c7e0307273382b1cc6baf203ff2fb2acd03
Reviewed-on: http://gerrit.cloudera.org:8080/1352
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Readability: Alex Behm <alex.behm@cloudera.com>
This patch contains the following changes:
- Add a metastore_snapshot_file parameter to build.sh.
- Enable skipping the metadata load.
- Refactor create-load-data.sh into functions.
- A lot of scripts source impala-config, which creates a lot of log spew. This has now
been muted.
- Unnecessary log spew from compute-table-stats has been muted.
- build_thirdparty.sh now determines its parallelism from the system; it was previously
hardcoded to 4 (see the sketch after this list).
- Only force-load the data for a particular dataset if a schema change is detected.
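A rough sketch of the parallelism change (the actual variable names in
build_thirdparty.sh may differ):
    # Use the number of online cores instead of a hardcoded -j4.
    CORES=$(getconf _NPROCESSORS_ONLN)
    make -j"${CORES}"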
Change-Id: I909336451e5c1ca57d21f040eb94c0e831546837
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5540
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
This is the first iteration of a kerberized development environment.
All the daemons start and use kerberos, with the sole exception of the
hive metastore. This is sufficient to test impala authentication.
When buildall.sh is run using '-kerberize', it will stop before
loading data or attempting to run tests.
Loading data into the cluster is known to not work at this time, the
root causes being that Beeline -> HiveServer2 -> MapReduce throws
errors, and Beeline -> HiveServer2 -> HBase has problems. These are
left for later work.
However, the impala daemons will happily authenticate using kerberos
both from clients (like the impala shell) and amongst each other.
This means that if you can get data into the mini-cluster, you can
query it.
Usage:
* Supply a '-kerberize' option to buildall.sh, or
* Supply a '-kerberize' option to create-test-configuration.sh, then
run 'run-all.sh -format', re-source impala-config.sh, and then start
the impala daemons as usual (see the command sketch after this list).
You must reformat the cluster because kerberizing it will change the
ownership of all files in HDFS.
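Spelled out, the second option is roughly the following (the script paths
within the repo are assumed here, not taken from this change):
    ./bin/create-test-configuration.sh -kerberize
    ./testdata/bin/run-all.sh -format   # reformat: kerberizing changes HDFS file ownership
    source ./bin/impala-config.sh       # re-source to pick up the kerberized settings
    # ...then start the impala daemons as usual.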
Notable changes:
* Added clean start/stop script for the llama-minikdc
* Creation of Kerberized HDFS - namenode and datanodes
* Kerberized HBase (and Zookeeper)
* Kerberized Hive (minus the MetaStore)
* Kerberized Impala
* Loading of data very nearly working
Still to go:
* Kerberize the MetaStore
* Get data loading working
* Run all tests
* The unknown unknowns
* Extensive testing
Change-Id: Iee3f56f6cc28303821fc6a3bf3ca7f5933632160
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4019
Reviewed-by: Michael Yoder <myoder@cloudera.com>
Tested-by: jenkins
This patch checks the test-warehouse's stored githash (if it exists) to determine if the
current patch has changed the schema of a table. If a change is detected, we force-load
all the data.
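A rough sketch of the check (the paths and variable names here are
illustrative, not the actual ones used by the load scripts):
    # Compare the githash stored with the test-warehouse against the current checkout.
    STORED_HASH=$(hadoop fs -cat /test-warehouse/githash.txt 2>/dev/null || true)
    if [[ -n "$STORED_HASH" ]] && ! git diff --quiet "$STORED_HASH" HEAD -- testdata/datasets; then
      FORCE_LOAD=true  # a schema file changed since the warehouse was loaded
    fi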
Change-Id: I314f9f3364d3e6b2d66de38a9e6d9f57c4e279a7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3049
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins