17 Commits

Author SHA1 Message Date
Sailesh Mukil
44e8bbffc3 IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"
We use the new libHDFS API hdfsGetLastExceptionRootCause() to return
the last seen HDFS error on that thread.

This patch depends on the recent HDFS commit:
fda86ef2a3

Testing: A test has been added which puts HDFS in safe mode and then
verifies that we see a 255 error with the root cause.

Change-Id: I181e316ed63b70b94d4f7a7557d398a931bb171d
Reviewed-on: http://gerrit.cloudera.org:8080/6894
Tested-by: Impala Public Jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2017-05-23 16:42:48 +00:00
Sailesh Mukil
edcc593ee5 IMPALA-5244 test_hdfs_file_open_fail fails on local filesystem build
This test had to be skipped for non-HDFS filesystems.

Change-Id: I5318a5eb27b15fed5df770b9c3ea23e7e1a97a4c
Reviewed-on: http://gerrit.cloudera.org:8080/6723
Reviewed-by: Michael Ho <kwho@cloudera.com>
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-25 10:50:18 +00:00
Sailesh Mukil
e0d1db51ed IMPALA-5198: Error messages are sometimes dropped before reaching client
The Status::ToThrift() function takes the ErrorMsg, and pushes both
the msg() and details() into the TStatus::error_msgs list.

However, when we unpack the TStatus object into a Status object, we
just copy all the TStatus::error_msgs to Status::ErrorMsg::details_
and leave Status::ErrorMsg::message_ blank.

This led to the error message not being printed in certain cases which
is now fixed.
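
The round trip described above can be sketched in a few lines of Python.
This is only an illustration, not the Impala C++ code: the ErrorMsg
dataclass and the three helper functions are hypothetical stand-ins, and
the "fixed" unpacking shows one plausible repair rather than the exact
change made in the patch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ErrorMsg:
    # Hypothetical stand-in for Status::ErrorMsg (message_ and details_).
    message: str = ""
    details: List[str] = field(default_factory=list)


def to_thrift(err: ErrorMsg) -> List[str]:
    # Flatten message and details into one list, like TStatus::error_msgs.
    return [err.message] + list(err.details)


def from_thrift_lossy(error_msgs: List[str]) -> ErrorMsg:
    # The behaviour described above: every entry lands in details and
    # the message is left blank.
    return ErrorMsg(message="", details=list(error_msgs))


def from_thrift_fixed(error_msgs: List[str]) -> ErrorMsg:
    # Assumed repair: treat the first entry as the message again.
    if not error_msgs:
        return ErrorMsg()
    return ErrorMsg(message=error_msgs[0], details=list(error_msgs[1:]))


original = ErrorMsg("Disk full", ["node 3", "retry failed"])
wire = to_thrift(original)
print(from_thrift_lossy(wire).message == "")   # message was dropped
print(from_thrift_fixed(wire) == original)     # round trip is lossless
```

Any caller that prints only the message would show nothing after the
lossy unpacking, which matches the symptom of dropped error messages.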

The PlanFragmentExecutor had some code to add query statuses to
the error_log (IMP-633). That code became unnecessary once a later
patch (IMPALA-762) explicitly returned the query status to the
client via get_log(), which made adding the query statuses to the
error_log redundant. The code in the PFE has been removed and a test
has been added to make sure that the case it previously tried to fix
doesn't regress.

Change-Id: I5d9d63610eb0d2acae3a9303ce46e1410727ce87
Reviewed-on: http://gerrit.cloudera.org:8080/6627
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-20 22:58:56 +00:00
David Knupp
f590bc0da6 IMPALA-4750: Rename test infra classes so they don't mimic test classes.
This patch addresses warning messages from pytest re: the imported
TestMatrix, TestVector, and TestDimension classes, which were being
collected as potential test classes. The fix was to simply prepend
the class names with Impala-

git grep -l 'TestDimension' | xargs \
    sed -i 's/TestDimension/ImpalaTestDimension/g'

git grep -l 'TestMatrix' | xargs \
    sed -i 's/TestMatrix/ImpalaTestMatrix/g'

git grep -l 'TestVector' | xargs \
    sed -i 's/TestVector/ImpalaTestVector/g'

The tests all passed in an exhaustive run on the upstream jenkins
server:

http://jenkins.impala.io:8080/view/Utility/job/pre-review-test/8/

Change-Id: I06b7bc6fd99fbb637a47ba376bf9830705c1fce1
Reviewed-on: http://gerrit.cloudera.org:8080/5794
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
2017-01-26 23:40:22 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace the existing ASF license text with the one given on the
   website, or add it if it is missing.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Tim Armstrong
bc8c55afcd IMPALA-3729: batch_size=1 coverage for avro scanner
Also fix a stale comment in the avro scanner header.

The main work here is to fix the handling of empty result sets in the
test result verifier. This is a problem because we wanted to verify
that the results in the test file were a superset of the rows
returned, and this was thrown off by superfluous '' rows in the expected
and actual result sets.

The basic problem is that the way test file sections
were parsed conflated an empty result section with a non-empty result
section that had a single empty string. I.e.:

---- RESULTS
====

vs
---- RESULTS

====

both got resolved to [''].
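
A minimal Python sketch of how this kind of conflation can arise, and
how keeping the rows as a list avoids it. The function names are
hypothetical and the real test file parser is not shown here. The sketch
assumes the section has already been split into lines.

```python
def extract_results(lines):
    # Return the rows between '---- RESULTS' and '====' as a list.
    start = lines.index("---- RESULTS") + 1
    end = lines.index("====", start)
    return lines[start:end]


def conflating_round_trip(rows):
    # Joining rows into one string and splitting again loses the
    # difference between "no rows" and "one empty row".
    return "\n".join(rows).split("\n")


empty_section = ["---- RESULTS", "===="]
one_empty_row = ["---- RESULTS", "", "===="]

print(extract_results(empty_section))   # no rows
print(extract_results(one_empty_row))   # a single empty-string row
print(conflating_round_trip(extract_results(empty_section)))
print(conflating_round_trip(extract_results(one_empty_row)))
```

The last two calls return the same value, which is the ambiguity
described above. The first two keep the cases distinct.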

Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Reviewed-on: http://gerrit.cloudera.org:8080/3413
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2016-07-19 23:30:02 -07:00
Taras Bobrovytsky
609b80410e Clean up Python test import statements
Many of our test scripts have import statements that look like
"from xxx import *". It is a good practice to explicitly name what
needs to be imported. This commit implements this practice. Also,
unused import statements are removed.

Change-Id: I6a33bb66552ae657d1725f765842f648faeb26a8
Reviewed-on: http://gerrit.cloudera.org:8080/3444
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
2016-07-15 23:26:18 +00:00
Skye Wanderman-Milne
01287a3ba9 IMPALA-3441, IMPALA-3659: check for malformed Avro data
This patch adds error checking to the Avro scanner (both the codegen'd
and interpreted paths), including out-of-bounds checks and data
validity checks.

I ran a local benchmark using the following queries:
  set num_scanner_threads=1;
  select count(i) from default.avro_bigints_big; # file contains only longs
  select max(l_orderkey) from biglineitem_avro; # file has tpch.lineitem schema

Both benchmark queries see negligible or no performance impact.

This patch adds a new Avro scanner unit test and an end-to-end test
that queries several corrupted files, as well as updates the zig-zag
varlen int unit test.
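
To illustrate the kind of out-of-bounds check involved, here is a
minimal Python sketch of decoding a zig-zag variable-length integer (the
encoding Avro uses for int and long) with bounds checking. It is not the
Impala scanner code. The function name, the ValueError error handling,
and the 10-byte limit for a 64-bit value are assumptions made for this
example.

```python
def read_zigzag_long(buf: bytes, pos: int = 0):
    """Decode one zig-zag varint from buf; return (value, next_pos).

    Raises ValueError rather than reading past the end of the buffer
    or following an encoding longer than MAX_BYTES.
    """
    MAX_BYTES = 10  # assumed limit: a 64-bit value fits in 10 varint bytes
    result = 0
    shift = 0
    for i in range(MAX_BYTES):
        if pos + i >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos + i]
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            # Last byte: undo the zig-zag mapping (0, -1, 1, -2, ...).
            value = (result >> 1) ^ -(result & 1)
            return value, pos + i + 1
        shift += 7
    raise ValueError("varint longer than %d bytes" % MAX_BYTES)


print(read_zigzag_long(b"\x02"))       # single-byte encoding
print(read_zigzag_long(b"\x80\x01"))   # two-byte encoding
```

A corrupted file that ends in the middle of a value, or that sets the
continuation bit on every byte, raises an error instead of reading
invalid memory or looping without bound.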

Change-Id: I801a11c496a128e02c564c2a9c44baa5a97be132
Reviewed-on: http://gerrit.cloudera.org:8080/3072
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-06-13 18:32:32 -07:00
Alex Behm
72e4c41400 IMPALA-3491: Use unique_database fixture in test_data_errors.py.
Testing: Ran the test locally 10 times in a loop on exhaustive.

Change-Id: I8337daf499b90819a253b883fedaa55bd6b6630e
Reviewed-on: http://gerrit.cloudera.org:8080/3087
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-05-23 08:40:19 -07:00
Sailesh Mukil
ed7f5ebf53 IMPALA-1878: Support INSERT and LOAD DATA on S3 and between filesystems
Previously Impala disallowed LOAD DATA and INSERT on S3. This patch
functionally enables LOAD DATA and INSERT on S3 without making major
changes for the sake of improving performance over S3. This patch also
enables both INSERT and LOAD DATA between file systems.

S3 does not support the rename operation, so the staged files in S3
are copied instead of renamed, which contributes to the slow
performance on S3.

The FinalizeSuccessfulInsert() function now does not make any
underlying assumptions of the filesystem it is on and works across
all supported filesystems. This is done by adding a full URI field to
the base directory for a partition in the TInsertPartitionStatus.
Also, the HdfsOp class now does not assume a single filesystem and
gets connections to the filesystems based on the URI of the file it
is operating on.

Added a python S3 client called 'boto3' to access S3 from the python
tests. A new class called S3Client is introduced which creates
wrappers around the boto3 functions and have the same function
signatures as PyWebHdfsClient by deriving from a base abstract class
BaseFileSystem so that they can be used interchangeably through a
'generic_client'. test_load.py is refactored to use this generic
client. The ImpalaTestSuite setup creates a client according to the
TARGET_FILESYSTEM environment variable and assigns it to the
'generic_client'.
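
The client arrangement described above can be sketched roughly as
follows. Only the names BaseFileSystem, TARGET_FILESYSTEM and
'generic_client' come from this commit message. The method names, the
in-memory FakeS3Client and FakeHdfsClient classes, and the
make_generic_client() helper are hypothetical, and the sketch does not
call boto3 or WebHDFS.

```python
import os
from abc import ABC, abstractmethod


class BaseFileSystem(ABC):
    # Shared interface; the method names here are illustrative only.
    @abstractmethod
    def create_file(self, path, data): ...

    @abstractmethod
    def exists(self, path): ...

    @abstractmethod
    def delete_file(self, path): ...


class _InMemoryClient(BaseFileSystem):
    # Stand-in storage so the sketch runs without any external service.
    def __init__(self):
        self._files = {}

    def create_file(self, path, data):
        self._files[path] = data

    def exists(self, path):
        return path in self._files

    def delete_file(self, path):
        self._files.pop(path, None)


class FakeS3Client(_InMemoryClient):
    """Hypothetical placeholder for a wrapper around boto3."""


class FakeHdfsClient(_InMemoryClient):
    """Hypothetical placeholder for a WebHDFS-based client."""


def make_generic_client(target_filesystem=None):
    # Pick the client from TARGET_FILESYSTEM unless one is given.
    target = target_filesystem or os.environ.get("TARGET_FILESYSTEM", "hdfs")
    clients = {"s3": FakeS3Client, "hdfs": FakeHdfsClient}
    if target not in clients:
        raise ValueError("unsupported filesystem: %s" % target)
    return clients[target]()


generic_client = make_generic_client("s3")
generic_client.create_file("/test-warehouse/t/data.txt", b"1,2\n")
print(generic_client.exists("/test-warehouse/t/data.txt"))
```

Because tests only call methods declared on the base class, the same
test body can run against either filesystem without changes.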

P.S: Currently, the test_load.py runs 4x slower on S3 than on
HDFS. Performance needs to be improved in future patches. INSERT
performance is slower than on HDFS too. This is mainly because of an
extra copy that happens between staging and the final location of a
file. However, larger INSERTs come closer to HDFS performance than
smaller inserts.

ACLs are not taken care of for S3 in this patch. It is something
that still needs to be discussed before implementing.

Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d
Reviewed-on: http://gerrit.cloudera.org:8080/2574
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:49 -07:00
Juan Yu
97af107729 IMPALA-2914: Fix DCHECK Check failed: HasDateOrTime()
Some TimestampValue conversion functions assume that the caller
ensures the TimestampValue instance has a valid date or time,
but that is not always true. Change those functions to return the
result in an output parameter and return a boolean indicating
whether the conversion succeeded.

Change-Id: I7a68a1e14d9c4ee5d83da760d4d76c20c36bc359
(cherry picked from commit 47d8977f5976b9be405f44add966820138fbda6f)
Reviewed-on: http://gerrit.cloudera.org:8080/2195
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
2016-02-24 13:31:00 -08:00
Vlad Berindei
b6c20b2a40 Allow Impala to run against local filesystem.
Allow Impala to start only with a running HMS (and no additional services like HDFS,
HBase, Hive, YARN) and use the local file system.

Skip all tests that need these services, use HDFS caching or assume that multiple impalads
are running.

To run Impala with the local filesystem, set TARGET_FILESYSTEM to 'local' and
WAREHOUSE_LOCATION_PREFIX to a location on the local filesystem where the current user has
permissions since this is the location where the test data will be extracted.

Test coverage (with core strategy) in comparison with HDFS and S3:
HDFS             1348 tests passed
S3               1157 tests passed
Local Filesystem 1161 tests passed

Change-Id: Ic9718c7e0307273382b1cc6baf203ff2fb2acd03
Reviewed-on: http://gerrit.cloudera.org:8080/1352
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Readability: Alex Behm <alex.behm@cloudera.com>
2015-12-05 06:48:32 +00:00
Casey Ching
074e5b4349 Remove hashbang from non-script python files
Many python files had a hashbang and the executable bit set though
they were not intended to be run as a standalone script. That makes
determining which python files are actually scripts very difficult.
A future patch will update the hashbang in real python scripts so they
use $IMPALA_HOME/bin/impala-python.

Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba
Reviewed-on: http://gerrit.cloudera.org:8080/599
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2015-08-04 05:26:07 +00:00
ishaan
09e5eaeda2 Introduce classes for pytest's skipif markers.
This patch encapsulates pytest's skipif markers in classes. It leads to the following
benefits:
  - Provide context and grouping for tests being skipped.
  - As we improve test reporting, annotations will give us a better idea of coverage.

Change-Id: Ib0557fb78c873047c214bb62bb6b045ceabaf0c9
Reviewed-on: http://gerrit.cloudera.org:8080/297
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
Reviewed-on: http://gerrit.cloudera.org:8080/343
2015-04-19 03:09:59 +00:00
Matthew Jacobs
7558a4752b IMPALA-1502: Fix and re-enable broken data errors tests
Re-enables data error tests which were not being included in
run-tests.py. Broken tests were updated, with one exception which
is tracked by IMPALA-1862. Depends on a related change to
Impala-lzo.

Change-Id: I4c42498bdebf9155a8722695a3305b63ecc6e5f3
Reviewed-on: http://gerrit.cloudera.org:8080/194
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-03-11 16:39:40 -07:00
ishaan
8369c3b13b Remove explicit references to functional_hbase tables from .test files.
Additionally, this patch also disabled the hbase/none test dimension if the
TARGET_FILESYSTEM environment variable is set to either s3 or isilon.

Change-Id: I63aecaa478d2ba9eb68de729e9640071359a2eeb
Reviewed-on: http://gerrit.cloudera.org:8080/74
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-02-23 23:32:41 +00:00
Lenni Kuff
15327e8136 Migrate DataErrors tests to Python test framework, re-enable subset of tests
This re-enables a subset of the stable data errors tests and updates them to
work in our test framework. This includes support for updating results via --update_results.

This also lets us remove a lot of old code that was there only to support these disabled
tests.

Change-Id: I4c40c3976d00dfc710d59f3f96c99c1ed33e7e9b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1952
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2277
2014-04-18 02:25:11 -07:00