25 Commits

Author SHA1 Message Date
Zoltan Borok-Nagy
20d289bd13 IMPALA-13814: test_iceberg_table_metrics fails in non-DFS builds
test_iceberg_table_metrics checks the value of memory-estimate-bytes
for Iceberg tables. The value depends on the number of blocks in the
table, which will be 0 in non-DFS (not HDFS, not Ozone) builds. With
this patch the test has different checks for filesystems that support
block location information and for filesystems that don't.
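
Sketch of the resulting branching (helper and expected values here are
illustrative, not the actual test code):

  def check_memory_estimate(metrics, fs_supports_block_locations):
    # Illustrative sketch only; the real test's helpers and expected
    # values differ.
    if fs_supports_block_locations:
      assert metrics['memory-estimate-bytes'] > 0
    else:
      # 0 blocks are reported, so only a weaker check is possible.
      assert metrics['memory-estimate-bytes'] >= 0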

Change-Id: Ie0c6d8a93e0a07cc680fac6e3f83218b853ca5e3
Reviewed-on: http://gerrit.cloudera.org:8080/22565
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-03 21:31:11 +00:00
Gaurav Singh
9a132bc436 IMPALA-12380: Securing dbcp.password for JDBC
external data source

In the current implementation of the external JDBC data source,
the user has to provide both the username and password in
plain text, which is not good practice.

This patch extends the existing implementation so that the user
can provide either:
a) a username and password, or
b) a username with a key and keystore

If the user provides a password, that password is used.
However, if no password is provided and the user supplies only the
key/keystore, the password is fetched from the secure jceks
keystore.
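
A rough sketch of that resolution order (the property names and the
keystore lookup helper are assumptions, not the shipped code):

  def resolve_password(props, fetch_from_keystore):
    # Hypothetical sketch; the property names are assumptions.
    if props.get('dbcp.password'):
      return props['dbcp.password']  # explicit password wins
    # Otherwise fetch the password from the secure jceks keystore.
    return fetch_from_keystore(props['dbcp.password.key'],
                               props['dbcp.password.keystore'])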

Testing:
- Added unit test TestExtDataSourcesWithKeyStore

Change-Id: Iec83a9b6e00456f0a1bbee747bd752b2cf9bf238
Reviewed-on: http://gerrit.cloudera.org:8080/20809
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-01-02 23:43:42 +00:00
Joe McDonnell
82bd087fb1 IMPALA-11973: Add absolute_import, division to all eligible Python files
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
 1. Python 3 requires absolute imports within packages. This
    can be emulated via "from __future__ import absolute_import"
 2. Python 3 changed division to "true" division that doesn't
    round to an integer. This can be emulated via
    "from __future__ import division"

This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.

I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
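
For reference, the added header and the division distinction look like
this:

  from __future__ import absolute_import, division, print_function

  rows = [1, 2, 3, 4, 5]
  ratio = len(rows) / 2     # true division: 2.5 on Python 2 and 3 alike
  mid_idx = len(rows) // 2  # integer division where an index is needed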

Testing:
 - Ran core tests

Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
yx91490
f4d306cbca IMPALA-11629: Support for huawei OBS FileSystem
This patch adds support for the Huawei OBS (Object Storage Service)
FileSystem. The implementation is similar to other remote FileSystems.

New flags for OBS:
- num_obs_io_threads: Number of OBS I/O threads. Defaults to 16.

Testing:
 - Upload hdfs test data to an OBS bucket. Modify all locations in HMS
   DB to point to the OBS bucket. Remove some hdfs caching params.
   Run CORE tests.

Change-Id: I84a54dbebcc5b71e9bcdd141dae9e95104d98cb1
Reviewed-on: http://gerrit.cloudera.org:8080/19110
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-02-09 08:10:19 +00:00
yacai
c953426692 IMPALA-11683: Support Aliyun OSS File System
This patch adds support for OSS (Aliyun Object Storage Service).
Using the hadoop-aliyun module, the implementation is similar to
other remote FileSystems.

Tests:
- Prepare:
  Initialize OSS-related environment variables:
  OSS_ACCESS_KEY_ID, OSS_SECRET_ACCESS_KEY, OSS_ACCESS_ENDPOINT.
Compile and create hdfs test data on an ECS instance. Upload test data
  to an OSS bucket.
- Modify all locations in HMS DB to point to the OSS bucket.
  Remove some hdfs caching params. Run CORE tests.

Change-Id: I267e6531da58e3ac97029fea4c5e075724587910
Reviewed-on: http://gerrit.cloudera.org:8080/19165
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-11-16 10:14:49 +00:00
Michael Smith
ded49055d3 IMPALA-9448: (Addendum) Avoid generic recreate for Ozone bucket
Avoids using `hadoop fs` to recreate an Ozone bucket during snapshot
load, as this removes the encryption key.

Change-Id: I31a3813fac6f2d68859ba7fed53b5c5a82502780
Reviewed-on: http://gerrit.cloudera.org:8080/19082
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-10-06 23:51:39 +00:00
Michael Smith
3577030df6 IMPALA-11562: Revert support for o3fs as default filesystem
Reverts support for o3fs as a default filesystem added in IMPALA-9442.
Updates test setup to use ofs instead.

Munges absolute paths in Iceberg metadata to match the new location
required for ofs. Ozone has strict requirements on volume and bucket
names, so all tables must be created within a bucket (e.g. inside
/impala/test-warehouse/).
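
The general o3fs-to-ofs mapping amounts to roughly this sketch (the
actual munging script is not reproduced here):

  import re

  # o3fs://bucket.volume.host:port/key -> ofs://host:port/volume/bucket/key
  def o3fs_to_ofs(uri):
    m = re.match(r'o3fs://([^.]+)\.([^.]+)\.([^/]+)(/.*)?$', uri)
    bucket, volume, host, key = m.groups()
    return 'ofs://%s/%s/%s%s' % (host, volume, bucket, key or '')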

Change-Id: I45e90d30b2e68876dec0db3c43ac15ee510b17bd
Reviewed-on: http://gerrit.cloudera.org:8080/19001
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-09-28 22:35:48 +00:00
Michael Smith
830625b104 IMPALA-9442: Add Ozone to minicluster
Adds Ozone as an alternative to hdfs in the minicluster. Select by
setting `export TARGET_FILESYSTEM=ozone`. With that flag,
run-mini-dfs.sh will start Ozone instead of HDFS. Requires a snapshot
because Ozone does not support HBase (HDDS-3589); snapshot loading
doesn't work yet primarily due to HDDS-5502.

Uses the o3fs interface because Ozone puts specific restrictions on
bucket names (no underscores, for instance), and it was a lot easier to
use an interface where everything is written to a single bucket than to
update all Impala's use of HDFS-style paths to make `test-warehouse` a
bucket inside a volume.

Specifies reduced Ozone client retries during shutdown where Ozone may
not be available.

Passes tests with FE_TEST=false BE_TEST=false.

Change-Id: Ibf8b0f7b2d685d8b011df1926e12bf5434b5a2be
Reviewed-on: http://gerrit.cloudera.org:8080/18738
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2022-08-03 16:58:20 +00:00
Fucun Chu
157086cb80 IMPALA-10771: Add Tencent COS support
This patch adds support for COS (Cloud Object Storage). Using the
hadoop-cos module, the implementation is similar to other remote
FileSystems.

New flags for COS:
- num_cos_io_threads: Number of COS I/O threads. Defaults to 16.

Follow-up:
- Support for caching COS file handles will be addressed in
   IMPALA-10772.
- test_concurrent_inserts and test_failing_inserts in
   test_acid_stress.py are skipped due to slow file listing on
   COS (IMPALA-10773).

Tests:
 - Upload hdfs test data to a COS bucket. Modify all locations in HMS
   DB to point to the COS bucket. Remove some hdfs caching params.
   Run CORE tests.

Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
Reviewed-on: http://gerrit.cloudera.org:8080/17503
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-12-08 16:32:02 +00:00
stiga-huang
2dfc68d852 IMPALA-7712: Support Google Cloud Storage
This patch adds support for GCS (Google Cloud Storage). Using the
gcs-connector, the implementation is similar to other remote
FileSystems.

New flags for GCS:
 - num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16.

Follow-up:
 - Support for spilling to GCS will be addressed in IMPALA-10561.
 - Support for caching GCS file handles will be addressed in
   IMPALA-10568.
 - test_concurrent_inserts and test_failing_inserts in
   test_acid_stress.py are skipped due to slow file listing on
   GCS (IMPALA-10562).
 - Some tests are skipped due to issues introduced by /etc/hosts setting
   on GCE instances (IMPALA-10563).

Tests:
 - Compile and create hdfs test data on a GCE instance. Upload test data
   to a GCS bucket. Modify all locations in HMS DB to point to the GCS
   bucket. Remove some hdfs caching params. Run CORE tests.
 - Compile and load snapshot data to a GCS bucket. Run CORE tests.

Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Reviewed-on: http://gerrit.cloudera.org:8080/17121
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-13 11:20:08 +00:00
Joe McDonnell
6b09612e76 IMPALA-8344: Add support for running the minicluster with S3Guard
Some tests can fail on S3 because some operations are only eventually
consistent. S3Guard stores extra metadata in a DynamoDB table to solve
several of these consistency issues.

This adds support for running the minicluster on S3 with S3Guard.
S3Guard is configured by the following environment variables:
S3GUARD_ENABLED: defaults to false, set to true to enable S3Guard
S3GUARD_DYNAMODB_TABLE: name of the DynamoDB table to use. This must
  be exclusively owned by this minicluster. The dataload scripts
  initialize this table and will purge entries if the table already
  exists. The table should be in the same region as the S3_BUCKET
  for the minicluster.
S3GUARD_DYNAMODB_REGION: AWS region for S3GUARD_DYNAMODB_TABLE
These environment variables only impact S3 configurations.
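
Roughly, the variables map onto the standard hadoop-aws S3Guard
settings (a sketch; the generated configuration may differ):

  import os

  s3guard_conf = {}
  if os.environ.get('S3GUARD_ENABLED', 'false') == 'true':
    s3guard_conf = {
      'fs.s3a.metadatastore.impl':
          'org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore',
      'fs.s3a.s3guard.ddb.table': os.environ['S3GUARD_DYNAMODB_TABLE'],
      'fs.s3a.s3guard.ddb.region': os.environ['S3GUARD_DYNAMODB_REGION'],
    }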

The support comes from three pieces:
1. Configuration changes in core-site.xml to add the appropriate
   parameters.
2. Updating dataload to initialize/purge the s3guard dynamodb table
   and import data appropriately.
3. Update tests to manipulate files through the HDFS command line
   rather than through s3 utilities. This takes the filesystem
   utility code for ABFS (which actually calls HDFS command line),
   makes it generic, and uses it for S3Guard.

Testing:
 - Ran multiple rounds of s3 tests
 - Aborted tests in the middle and restarted the s3 tests (to test
   the s3guard reinitialization code)

Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Reviewed-on: http://gerrit.cloudera.org:8080/13020
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Sahil Takiar <stakiar@cloudera.com>
2019-05-23 18:25:46 +00:00
stakiar
a103cb8ee2 IMPALA-6050: Query profiles should indicate storage layer(s) used
This patch updates Impala explain plans so that the Scan Node section clearly
displays which filesystems the Scan Node is reading data from (support
has been added for scans from HDFS, S3, ADLS, and the local filesystem).

Before this patch, if an Impala query scanned a table with partitions
across different storage layers, the explain plan would look like this:

 PLAN-ROOT SINK
 |
 01:EXCHANGE [UNPARTITIONED]
 |
 00:SCAN HDFS [functional.alltypes]
    partitions=24/24 files=24 size=478.45KB

Now the explain plan will look like this:

 PLAN-ROOT SINK
 |
 01:EXCHANGE [UNPARTITIONED]
 |
 00:SCAN S3 [functional.alltypes]
    ADLS partitions=4/24 files=4 size=478.45KB
    HDFS partitions=10/24 files=10 size=478.45KB
    S3 partitions=10/24 files=10 size=478.45KB

The explain plan differentiates "SCAN HDFS" vs "SCAN S3" by using the
root table path. This means that even scans of non-partitioned tables
will see their explain plans change from "SCAN HDFS" to "SCAN
[storage-layer-name]". This change affects explain plans that are stored on
an single storage layer as well: 'partitions=...' will become
'HDFS partitions-...'.

This patch makes several changes to PlannerTest.java so that by default
test files do not validate the value of the storage layer displayed in
the explain plan. This is necessary to support classes such as
S3PlannerTest which run test files against S3. It makes several changes
to impala_test_suite.py as well in order to support validation of
explain plans in test files that run via Python. Specifically, it adds
support for a new substitution variable in test files called
$FILESYSTEM_NAME which is the name of the storage layer the test is
being run against.
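
On the Python side, the substitution amounts to roughly this sketch
(how the name is derived from the build configuration is an
assumption):

  import os

  def substitute_filesystem_name(section_text):
    fs_name = os.environ.get('TARGET_FILESYSTEM', 'hdfs').upper()
    return section_text.replace('$FILESYSTEM_NAME', fs_name)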

Testing:
* Ran core tests
* Added new tests to PlannerTest
* Added ExplainTest to allow for more fine-grained testing of explain
plan logic

Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Reviewed-on: http://gerrit.cloudera.org:8080/12282
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-12 07:28:26 +00:00
Sean Mackrory
7a022cf36a IMPALA-7681. Add Azure Blob File System (ADLS Gen2) support.
HADOOP-15407 adds a new FileSystem implementation called "ABFS" for the
ADLS Gen2 service. It's in the hadoop-azure module as a replacement for
WASB. Filesystem semantics should be the same, so skipped tests and
other behavior changes have simply mirrored what is done for ADLS Gen1
by default. Tests skipped on ADLS Gen1 due to eventual consistency of
the Python client can be run against ADLS Gen2.

Change-Id: I5120b071760e7655e78902dce8483f8f54de445d
Reviewed-on: http://gerrit.cloudera.org:8080/11630
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-20 06:43:00 +00:00
Dan Hecht
495d778189 IMPALA-7071: make get_fs_path() idempotent
This avoids future errors like IMPALA-7068.
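
A minimal sketch of the idempotent form, assuming a FILESYSTEM_PREFIX
constant (the real helper may differ):

  FILESYSTEM_PREFIX = 's3a://impala-test'  # assumed example prefix

  def get_fs_path(path):
    # Prepend the prefix only when it isn't already there, so applying
    # the function twice yields the same result.
    if FILESYSTEM_PREFIX and not path.startswith(FILESYSTEM_PREFIX):
      return FILESYSTEM_PREFIX + path
    return path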

Change-Id: I10c7cbb36119883718e60660db2ac7313f14d388
Reviewed-on: http://gerrit.cloudera.org:8080/10517
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-08 00:14:37 +00:00
Tianyi Wang
21d92aacbf IMPALA-7019: Schedule EC as remote & disable failed tests
This patch schedules HDFS EC files without considering locality. Failed
tests are disabled and a Jenkins build should succeed with export
ERASURE_CODING=true.

Testing: It passes core tests.

Cherry-picks: not for 2.x.

Change-Id: I138738d3e28e5daa1718c05c04cd9dd146c4ff84
Reviewed-on: http://gerrit.cloudera.org:8080/10413
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-22 01:10:14 +00:00
Sailesh Mukil
50bd015f2d IMPALA-5333: Add support for Impala to work with ADLS
This patch leverages the AdlFileSystem in Hadoop to allow
Impala to talk to the Azure Data Lake Store. This patch contains
functional changes as well as test infrastructure for testing
Impala over ADLS.

We do not support ACLs on ADLS since the Hadoop ADLS
connector does not integrate ADLS ACLs with Hadoop users/groups.

For testing, we use the azure-data-lake-store-python client
from Microsoft. This client seems to have some consistency
issues. For example, a drop table through Impala will delete
the files in ADLS; however, listing that directory through
the Python client immediately after the drop will still show
the files. This behavior is unexpected since ADLS claims to be
strongly consistent. Some tests have been skipped due to this
limitation with the tag SkipIfADLS.slow_client. Tracked by
IMPALA-5335.

The azure-data-lake-store-python client also only works on CentOS 6.6
and newer, so the Python dependencies for Azure will not be downloaded
when the TARGET_FILESYSTEM is not "adls". ADLS tests are therefore
expected to run on a machine with at least CentOS 6.6.
Note: This is only a test limitation, not a functional one. Clusters
with older OSes like CentOS 6.4 will still work with ADLS.

Added another dependency to bootstrap_build.sh for the ADLS Python
client.

Testing: Ran core tests with and without TARGET_FILESYSTEM as
'adls' to make sure that all tests pass and that nothing breaks.

Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Reviewed-on: http://gerrit.cloudera.org:8080/6910
Tested-by: Impala Public Jenkins
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
2017-05-25 19:35:24 +00:00
Henry Robinson
34b5f1c416 IMPALA-(3895,3859): Don't log file data on parse errors
Logging file or table data is a bad idea, and doing it by default is
particularly bad. This patch changes HdfsScanNode::LogRowParseError() to
log a file and offset only.

Testing: See rewritten tests.

To support testing this change, we also fix IMPALA-3895 by introducing
a canonical string __HDFS_FILENAME__ that all Hadoop filenames in the
ERROR output are replaced with before comparing against the expected
results. This fixes a number of issues with the old way of matching
filenames, which purported to be a regex but really wasn't. In
particular, we can now match the rest of an ERROR line after the
filename, which was not possible before.

In some cases, we don't want to substitute filenames because the ERROR
output is looking for a very specific output. In that case we can write:

$NAMENODE/<filename>

and this patch will not perform _any_ filename substitutions on ERROR
sections that contain the $NAMENODE string.
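
Conceptually the substitution reduces to this sketch (the exact
pattern and control flow in the test framework may differ):

  import re

  def canonicalize_error_output(actual, expected):
    if '$NAMENODE' in expected:
      return actual  # exact filenames wanted: no substitution at all
    # Replace any HDFS path so the rest of the ERROR line still matches.
    return re.sub(r'hdfs://[^\s]+', '__HDFS_FILENAME__', actual)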

Finally, this patch fixes a bug where a test that had an ERRORS section
but no RESULTS section would silently pass without testing anything.

Change-Id: I5a604f8784a9ff7b4bf878f82ee7f56697df3272
Reviewed-on: http://gerrit.cloudera.org:8080/4020
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
2016-08-25 10:20:36 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
   on the website.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Sailesh Mukil
6f1fe4ebe7 IMPALA-3577, IMPALA-3486: Partitions on multiple filesystems breaks with S3_SKIP_INSERT_STAGING
The HdfsTableSink usually creates an HDFS connection to the filesystem
that the base table resides in. However, if we create a partition in
an FS different from that of the base table and set
S3_SKIP_INSERT_STAGING to "true", the table sink will try to write to
a different filesystem with the wrong filesystem connector.

This patch allows the table sink to work with different
filesystems by replacing the single FS connector with a
connector per partition.

This also reenables the multiple_filesystems test and modifies it to
use the unique_database fixture so that parallel runs on the same
bucket do not clash and cause failures.

This patch also introduces a SECONDARY_FILESYSTEM environment variable
which will be set by the test to allow S3, Isilon and the localFS to
be used as the secondary filesystems.

All jobs with HDFS as the default filesystem need to set the
appropriate environment for S3 and Isilon, i.e. the following:
 - export AWS_SECRET_ACCESS_KEY
 - export AWS_ACCESS_KEY_ID
 - export SECONDARY_FILESYSTEM (to whatever filesystem needs to be
   tested)

TODO: SECONDARY_FILESYSTEM, FILESYSTEM_PREFIX, and NAMENODE have a
lot of similarities. They need to be cleaned up in a follow-up patch.

Change-Id: Ib13b610eb9efb68c83894786cea862d7eae43aa7
Reviewed-on: http://gerrit.cloudera.org:8080/3146
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2016-05-31 23:32:11 -07:00
Sailesh Mukil
ed7f5ebf53 IMPALA-1878: Support INSERT and LOAD DATA on S3 and between filesystems
Previously Impala disallowed LOAD DATA and INSERT on S3. This patch
functionally enables LOAD DATA and INSERT on S3 without making major
changes for the sake of improving performance over S3. This patch also
enables both INSERT and LOAD DATA between file systems.

S3 does not support the rename operation, so the staged files in S3
are copied instead of renamed, which contributes to the slow
performance on S3.

The FinalizeSuccessfulInsert() function now does not make any
underlying assumptions of the filesystem it is on and works across
all supported filesystems. This is done by adding a full URI field to
the base directory for a partition in the TInsertPartitionStatus.
Also, the HdfsOp class now does not assume a single filesystem and
gets connections to the filesystems based on the URI of the file it
is operating on.

Added a Python S3 client called 'boto3' to access S3 from the Python
tests. A new class called S3Client is introduced which wraps the
boto3 functions and has the same function signatures as
PyWebHdfsClient by deriving from an abstract base class,
BaseFileSystem, so that the two can be used interchangeably through a
'generic_client'. test_load.py is refactored to use this generic
client. The ImpalaTestSuite setup creates a client according to the
TARGET_FILESYSTEM environment variable and assigns it to the
'generic_client'.
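
The shape of that abstraction is roughly as follows (only the class
names come from this message; the method set shown is an assumption):

  import boto3

  class BaseFileSystem(object):
    def create_file(self, path, file_data):
      raise NotImplementedError()

    def delete_file_dir(self, path, recursive=False):
      raise NotImplementedError()

  class S3Client(BaseFileSystem):
    def __init__(self, bucket):
      self.s3 = boto3.resource('s3')
      self.bucket = bucket

    def create_file(self, path, file_data):
      # S3 has no real directories; a file is just an object PUT.
      self.s3.Object(self.bucket, path).put(Body=file_data)

    def delete_file_dir(self, path, recursive=False):
      if recursive:
        self.s3.Bucket(self.bucket).objects.filter(Prefix=path).delete()
      else:
        self.s3.Object(self.bucket, path).delete()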

P.S.: Currently test_load.py runs 4x slower on S3 than on
HDFS. Performance needs to be improved in future patches. INSERT
performance is slower than on HDFS too. This is mainly because of an
extra copy that happens between staging and the final location of a
file. However, larger INSERTs come closer to HDFS performance than
smaller ones.

ACLs are not taken care of for S3 in this patch. It is something
that still needs to be discussed before implementing.

Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d
Reviewed-on: http://gerrit.cloudera.org:8080/2574
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:49 -07:00
Vlad Berindei
b6c20b2a40 Allow Impala to run against local filesystem.
Allow Impala to start only with a running HMS (and no additional services like HDFS,
HBase, Hive, YARN) and use the local file system.

Skip all tests that need these services, use HDFS caching, or assume that multiple
impalads are running.

To run Impala with the local filesystem, set TARGET_FILESYSTEM to 'local' and
WAREHOUSE_LOCATION_PREFIX to a location on the local filesystem where the current user
has permissions, since this is where the test data will be extracted.

Test coverage (with core strategy) in comparison with HDFS and S3:
HDFS             1348 tests passed
S3               1157 tests passed
Local Filesystem 1161 tests passed

Change-Id: Ic9718c7e0307273382b1cc6baf203ff2fb2acd03
Reviewed-on: http://gerrit.cloudera.org:8080/1352
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Readability: Alex Behm <alex.behm@cloudera.com>
2015-12-05 06:48:32 +00:00
Casey Ching
074e5b4349 Remove hashbang from non-script python files
Many Python files had a hashbang and the executable bit set though
they were not intended to be run as standalone scripts. That makes
determining which Python files are actually scripts very difficult.
A future patch will update the hashbang in real Python scripts so they
use $IMPALA_HOME/bin/impala-python.

Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba
Reviewed-on: http://gerrit.cloudera.org:8080/599
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2015-08-04 05:26:07 +00:00
ishaan
dbc78aaa2c Enable isilon end to end tests for Impala.
This patch introduces changes to run tests against Isilon, combined with minor cleanup of
the test and client code.
For Isilon, it:
  - Populates the SkipIfIsilon class with appropriate pytest markers.
  - Introduces a new default for the hdfs client in order to connect to Isilon.
  - Cleans up a few test files to take the underlying filesystem into account.
  - Cleans up the interface for metadata/test_insert_behaviour and query_test/test_ddl.

On the client side, we introduce a wrapper around a few of pywebhdfs's methods, specifically:
  - delete_file_dir does not throw an error if the file does not exist.
  - get_file_dir_status automatically strips the leading '/'
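
A sketch of those two behaviors (the wrapper class name and the
exception handling details are assumptions):

  class HdfsClientWrapper(object):
    def __init__(self, client):
      self.client = client  # a pywebhdfs PyWebHdfsClient instance

    def delete_file_dir(self, path, recursive=False):
      try:
        return self.client.delete_file_dir(path, recursive=recursive)
      except Exception:
        # Deleting a missing path is not treated as an error.
        return True

    def get_file_dir_status(self, path):
      # pywebhdfs expects paths without the leading '/'.
      return self.client.get_file_dir_status(path.lstrip('/'))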

Change-Id: Ic630886e253e43b2daaf5adc8dedc0a271b0391f
Reviewed-on: http://gerrit.cloudera.org:8080/370
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
2015-05-27 22:25:12 +00:00
ishaan
09e5eaeda2 Introduce classes for pytest's skipif markers.
This patch encapsulates pytest's skipif markers in classes. It leads to the following
benefits:
  - Provide context and grouping for tests being skipped.
  - As we improve test reporting, annotations will give us a better idea of coverage.
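
For illustration, the pattern looks roughly like this (marker names
and skip conditions are assumptions):

  import os
  import pytest

  ON_S3 = os.environ.get('TARGET_FILESYSTEM') == 's3'

  class SkipIfS3(object):
    insert = pytest.mark.skipif(ON_S3, reason="INSERT not supported on S3")
    caching = pytest.mark.skipif(ON_S3, reason="HDFS caching not on S3")

  # Usage on a test method:
  #   @SkipIfS3.caching
  #   def test_cache_ddl(self): ...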

Change-Id: Ib0557fb78c873047c214bb62bb6b045ceabaf0c9
Reviewed-on: http://gerrit.cloudera.org:8080/297
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
Reviewed-on: http://gerrit.cloudera.org:8080/343
2015-04-19 03:09:59 +00:00
ishaan
970a9d7b83 Explicitly use the target filesystem in metadata and authorization tests.
This patch also adds a small utility file for supporting different filesystems.

Change-Id: I28b1217b0cb901360e28e8d0ba269c9144117d2e
Reviewed-on: http://gerrit.cloudera.org:8080/124
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
2015-03-05 04:57:41 +00:00