Maven's batch (or non-interactive) mode prevents progress bar output
when Maven is downloading artifacts, which isn't generally useful.
Now that we keep Maven logs in logs/mvn/mvn.log, this makes
them slightly more tidy.
Change-Id: I5aa117272c2a86b63b0f9062099a4145324eb6fc
Reviewed-on: http://gerrit.cloudera.org:8080/9792
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
We use Java 1.7 in fe/pom.xml, where most of our Java code is. For
consistency, this updates the rest of our Maven configurations to use
the same version of Java. A change I'm working with uses
try-with-resources in HBase splitting, which is how I ran into
this.
Testing: ran core tests
Change-Id: I6cecddf367f00185a14a8b08c03456e3b756bd70
Reviewed-on: http://gerrit.cloudera.org:8080/9600
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins
This commit links together all the individual pom.xml files to have a
new "impala-parent" pom as the parent. This enables de-duplicating all
the repository configuration.
I ran the build to test this.
Change-Id: Id744e4357ee4d8e4be4e5490b2159bb76a2192f0
Reviewed-on: http://gerrit.cloudera.org:8080/8753
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
As a preparation to start generating Protobuf files
for IMPALA-4856, this change introduces a new build
target "gen-deps" which serves as an umbrella for all
build targets of generated code. For now, it only
includes thrift-deps and protobuf targets will be added
in the future.
Change-Id: I360c63773efdeab4c26ca96b915e0c8d0ce2b9c9
Reviewed-on: http://gerrit.cloudera.org:8080/7851
Reviewed-by: Lars Volker <lv@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
The server is no longer even up, so there is no point in having it in
our poms.
Change-Id: I2310000e51c5e6d85a0fa30874629f4f19427c6c
Reviewed-on: http://gerrit.cloudera.org:8080/6678
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Use CMake's dependency resolution always instead of serial execution of
targets via shell scripts. This improves parallelism by building fe,
be, and other targets at the same time and avoid some overhead from
invoking "make" multiple times. This reduces the time taken for
an incremental compilation of fe and be from 56s to 24s with this
command:
./buildall.sh -debug -noclean -notests -skiptests -ninja
Also use Impala-lzo's build script. This depends on the IMPALA-4277
fixes to the Impala-lzo build script.
Log directory creation is also moved from impala-config.sh to
buildall.sh. This means that impala-config.sh has no side-effects and
can be run concurrently with no issues.
Also make sure that "make" builds all the same artifacts as buildall.sh
when run with no args.
Testing:
Ran a jenkins core job, also experimented locally. Ran a jenkins core
job with distcc disabled - this exposed some concurrency bugs where
impala-config.sh fails if run concurrently.
Change-Id: I23617adf13bdeb034c24f6bba14b5ae480e8dd26
Reviewed-on: http://gerrit.cloudera.org:8080/4790
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
The main outcome of this change is to avoid making unnecessary
modification to the Impala or other source trees when we don't need the
test cluster.
To achieve that, this refactors the script to make the flow easier
to understand and makes it more consistent which build steps are
executed in which modes.
Change-Id: I429da7bc6681b16c07fe58bb3efac6d1a8579137
Reviewed-on: http://gerrit.cloudera.org:8080/4685
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
As part of the ASF transition, we need to replace references to
Cloudera in Impala with references to Apache. This primarily means
changing Java package names from com.cloudera.impala.* to
org.apache.impala.*
A prior patch renamed all the files as necessary, and this patch
performs the actual code changes. Most of the changes in this patch
were generated with some commands of the form:
find . | grep "\.java\|\.py\|\.h\|\.cc" | \
xargs sed -i s/'com\(.\)cloudera\(\.\)impala/org\1apache\2impala/g
along with some manual fixes.
After this patch, the remaining references to Cloudera in the repo
mostly fall into the categories:
- External components that have cloudera in their own package names,
eg. com.cloudera.kudu/llama
- URLs, eg. https://repository.cloudera.com/
Change-Id: I0d35fa6602a7fc0c212b2ef5e2b3322b77dde7e2
Reviewed-on: http://gerrit.cloudera.org:8080/3937
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
As part of the ASF transition, we need to replace references to
Cloudera in Impala with references to Apache. This primarily means
changing Java package names from com.cloudera.impala.* to
org.apache.impala.*
To make this easier to review, this patch only renames files,
eg. fe/src/main/java/com/cloudera -> fe/src/main/java/org/apache
A follow up patch performs the actual code updates.
Change-Id: I3767dd1ee86df767075fdf1b371eb6b0b06668db
Reviewed-on: http://gerrit.cloudera.org:8080/3936
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
on the website.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files_txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Maven was complaining that the source encoding was not set, and that the
version of a plugin was not specified.
Change-Id: I2bc6bbe95fc71575aeec5b6969cc869794309a49
Reviewed-on: http://gerrit.cloudera.org:8080/1741
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This patch introduces the concept of error codes for errors that are
recorded in Impala and are going to be presented to the client. These
error codes are used to aggregate and group incoming error / warning
messages to reduce the spill on the shell and increase the usefulness of
the messages. By splitting the message string from the implementation,
it becomes possible to edit the string independently of the code and
pave the way for internationalization.
Error messages are defined as a combination of an enum value and a
string. Both are defined in the Error.thrift file that is automatically
generated using the script in common/thrift/generate_error_codes.py. The
goal of the script is to have a central understandable repository of
error messages. Adding new messages to this file will require rebuilding
the thrift part. The proxy class ErrorMessage is responsible to
represent an error and capture the parameters that are used to format
the error message string.
When error messages are recorded they are recorded based on the
following algorithm:
- If an error message is of type GENERAL, do not aggregate this message
and simply add it to the total number of messages
- If an error messages is of specific type, record the first error
message as a sample and for all other occurrences increment the count.
- The coordinator will merge all error messages except the ones of type
GENERAL and display a count.
For example, in the case of the parquet file spanning multiple blocks
the output will look like:
Parquet files should not be split into multiple hdfs-blocks.
file=hdfs://localhost:20500/fid.parq (1 of 321 similar)
All messages are always logged to VLOG. In the coordinator error
messages are merged across all backends to retain readability in the
case of large clusters.
The current version of this patch adds these new error codes to some of
the most important error messages as a reference implementation.
Change-Id: I1f1811631836d2dd6048035ad33f7194fb71d6b8
Reviewed-on: http://gerrit.cloudera.org:8080/39
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
ext-data-source only needs a small subset of the thrift structures, so this
separates the dependencies between files so that just the necessary structs
are generated for ext-data-source. Afterwards, we can remove extra maven
dependencies which were using environment variables to get versions. While the
environment variables work when building the pom, they are not propagated to
dependencies so building fe/pom.xml ended up producing lots of warnings which
are now gone.
Change-Id: I267fe7bc7a54c3c21aad8c1ffce07cf1a1e07c5e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3748
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 1f738962ccb7a34834decfe6cb27307ed4548870)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3767
Syntax is "CREATE TABLE name LIKE fileformat '/path/to/file'".
Supports all options that CREATE TABLE does. Currently only PARQUET is supported.
Run testdata/bin/create-load-data.sh after pulling this patch.
Change-Id: Ibb9fbb89dbde6acceb850b914c48d12f22b33f55
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2720
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3158
* Rename getStats() to prepare()
* Adds TRowBatch.num_rows to indicate number of rows when no cols are
materialized
* Changes api and sample poms to produce source jars
Change-Id: I02dcc89e27716978708386cfc3f7940ee5dbc023
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2406
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 2d7fcba8b7442b54a388f8b994d0cfa08940bbd7)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2434
A few changes to the external data source thrift types:
* Change RowBatch to return entire columns. Adds Data.TColumnData to
represent an entire column.
* Makes all fields in ExternalDataSource (except for status fields on
the result structures) optional in case fields become deprecated in
the future.
* Adds a limit parameter to the TOpenParams structure in case the
data source needs to apply the limit itself.
Change-Id: I62db68bfb64d2190dfdd0c84be5925ad5db031ef
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2345
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
(cherry picked from commit faf220d628359be1368f898493900fc2e2913c53)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2385
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Adds the thrift structures for the public external data source API
and a new maven project containing the Java ExternalDataSource
interface and the generated Java thrift classes.
The ExternalDataSource.thrift structures can evolve in a backward
compatible way. The ExternalDataSource Java interface will always
contain a version number in the namespace (e.g.
com.cloudera.impala.extdatasource.v1 for V1) so we can potentially
make breaking changes to the interface in the future but still
support older versions.
A trivial implementation of the ExternalDataSource API is also
added for testing purposes.
TODO: Make the sample data source implementation realistic.
Change-Id: I827d6420a87ed7a2bce34e050362ca98ddc5dbcc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2241
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit f29814e9ede9d4c889f2648606fcf511feeb47ae)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2313