This patch adds an end-to-end test that validates and characterizes
HMS behavior for external table creation after HIVE-25569, which
allows a user to create an external table associated with a single
file.
Change-Id: Ia4f57f07a9f543c660b102ebf307a6cf590a6784
Reviewed-on: http://gerrit.cloudera.org:8080/18033
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
With HIVE-24920, the HMS runs in a mode that prohibits
creating a managed directory where the default location
already exists. Some Impala test helper functions copied
files into a particular location and then created a
table without specifying the location in the create statement.
This is no longer possible.
This changes the helper functions in test/common/file_utils.py
to create the table before pulling files in.
Tests:
- Ran a core job against a Hive with HIVE-24920 and
verified that the failures due to changes in
behavior are gone.
- Ran a core job against the current Hive
Change-Id: Idfe5468a0b9e1025ec7a0ad3cdce4793f35ca7ba
Reviewed-on: http://gerrit.cloudera.org:8080/17956
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Vihang Karajgaonkar <vihang@cloudera.com>
ORC-189 and ORC-666 added support for a new timestamp type
'TIMESTAMP WITH LOCAL TIMEZONE' to the ORC library.
This patch adds support for reading such timestamps with Impala.
These are UTC-normalized timestamps, therefore we convert them
to the local timezone during scanning.
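The conversion described above can be illustrated with a minimal Python
sketch. This is not the scanner code from the patch: the helper name
utc_to_local() and its parameters are hypothetical, and it only shows
the idea of treating a stored value as UTC and shifting it to a reader's
timezone.

```python
from datetime import datetime, timedelta, timezone


def utc_to_local(utc_naive, local_tz):
    """Hypothetical helper: interpret a naive datetime as UTC and return
    the naive wall-clock time in local_tz (any tzinfo object)."""
    # Attach UTC to the stored, UTC-normalized value.
    aware_utc = utc_naive.replace(tzinfo=timezone.utc)
    # Shift to the target timezone, then drop tzinfo to get a plain timestamp.
    return aware_utc.astimezone(local_tz).replace(tzinfo=None)


if __name__ == "__main__":
    # Assumed reader timezone for illustration: a fixed UTC-8 offset.
    reader_tz = timezone(timedelta(hours=-8))
    print(utc_to_local(datetime(2021, 1, 1, 12, 0, 0), reader_tz))
```

With a fixed UTC-8 offset, 12:00 UTC maps to 04:00 local time. A
zoneinfo.ZoneInfo instance can be passed instead when daylight saving
rules matter.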
Testing:
* added test for CREATE TABLE LIKE ORC
* added scanner tests to test_scanners.py
Change-Id: Icb0c6a43ebea21f1cba5b8f304db7c4bd43967d9
Reviewed-on: http://gerrit.cloudera.org:8080/17347
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds the 'host:port' to all links on the webserver. This
will facilitate proxying connections to the debug webui through Knox
by allowing us to create rewrite rules that do the transform:
<a href="scheme://host:port/path">...</a>
=>
<a href="<knox-host>/topology/impalaui/path?scheme=scheme&host=host&port=port">...</a>
which allows us to have a single IMPALAUI Knox service that can proxy
connections to any impalad/statestored/catalogd webui in a cluster.
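As a rough illustration of the transform above, the mapping can be
written as a small Python function. This is only a sketch: the real
rewrite is done by Knox rewrite rules, not by Python, and the function
name knox_rewrite() and the example hostnames are hypothetical. Any
query string on the original link is ignored here for brevity.

```python
from urllib.parse import urlencode, urlsplit


def knox_rewrite(url, knox_base):
    """Hypothetical helper: map scheme://host:port/path to the proxied
    <knox-host>/topology/impalaui/path?scheme=...&host=...&port=... form."""
    parts = urlsplit(url)
    # Carry the original scheme, host and port as query parameters so the
    # proxy knows which daemon's webui to forward the request to.
    query = urlencode({
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
    })
    return "{0}/topology/impalaui{1}?{2}".format(knox_base, parts.path, query)


if __name__ == "__main__":
    print(knox_rewrite("http://impalad-1:25000/queries",
                       "https://knox.example.com"))
```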
Note that this works because currently all of the links on Impala's
webui are within the same webserver (it would also be possible to add
links to other Impala daemon webuis within a cluster, e.g. if we wanted
to add webui links on the /backends page). If we ever need to add
links to external pages, the Knox service definition will likely need
to be modified.
This patch also adds hidden fields to all forms for the scheme, host,
and port value, so that GET requests from forms will result in the
same form as the transformed url shown above.
Testing:
- Ran the webserver and manually clicked around on a bunch of links to
ensure everything works as expected.
- Ran in a cluster and verified the new Knox service definition works
as intended with this change.
- Added a test that uses a regex to check for template files that
don't conform to the requirements.
Change-Id: If1195709a0f21f39d9a1e484880a0c46c9967ed2
Reviewed-on: http://gerrit.cloudera.org:8080/14151
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_scanners.py has seen several flaky failures on
S3 due to eventual consistency. The symptom is Impala
being unable to read a file that it just loaded to S3.
A large number of tables used in test_scanners.py
use the file_utils helper functions for creating
the tables. These follow the pattern:
1. Copy files to a temporary directory in HDFS/S3/etc.
2. Create table
3. Run LOAD DATA to move the files to the table
In step #3, LOAD DATA gets the metadata for the
table before it runs the move statement on the
files. Subsequent queries on the table will not
need to reload metadata and can access the file
quickly after the move.
This changes the ordering to put the files in place
before loading metadata. This may improve the
likelihood that the filesystem is consistent by
the time we read it. Specifically, we now do:
1. Put the files in the directory that the table
will use when it is created.
2. Create table
Neither of these steps loads metadata, so the next
query that runs will load metadata.
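The new ordering can be sketched as follows. This is a simplified
illustration, not the code in file_utils: the function name
create_table_with_preloaded_files() and the copy_file and run_sql
callables are hypothetical stand-ins for the filesystem and Impala
clients that the real helpers use.

```python
def create_table_with_preloaded_files(copy_file, run_sql, create_stmt,
                                      table_dir, local_files):
    """Hypothetical helper: place data files first, then create the table.

    copy_file(src, dst_dir) and run_sql(stmt) are assumed callables
    supplied by the test."""
    # Step 1: put the files in the directory the table will use.
    for path in local_files:
        copy_file(path, table_dir)
    # Step 2: create the table. No metadata is loaded here, so the first
    # query against the table triggers the metadata load.
    run_sql(create_stmt)


if __name__ == "__main__":
    # Record the calls instead of touching a real filesystem or cluster.
    calls = []
    create_table_with_preloaded_files(
        lambda src, dst: calls.append(("copy", src, dst)),
        lambda stmt: calls.append(("sql", stmt)),
        "CREATE TABLE t (i INT)",
        "/warehouse/db.db/t",
        ["a.parq", "b.parq"])
    print(calls)
```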
Change-Id: Id042496beabe0d0226b347e0653b820fee369f4e
Reviewed-on: http://gerrit.cloudera.org:8080/11959
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds a warning message to the log when the
authorization_policy_file run-time flag is used. Sentry has
deprecated policy files, and they do not support the
user-level privileges required for object ownership.
Their removal is tracked in SENTRY-1922.
Test:
- Added custom cluster test to validate logs
- Ran all custom cluster tests
Change-Id: Ibbb13f3ef1c3a00812c180ecef022ea638c2ebc7
Reviewed-on: http://gerrit.cloudera.org:8080/11502
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Quite a few tests created a table and used
"hdfs dfs -copyFromLocal" to copy data files to the
warehouse directory for this table.
This operation needs some boilerplate code, which this
patch refactors into the new functions
create_table_from_parquet() and
create_table_and_copy_files().
Change-Id: Ie00a4561825facf8abe2e8e74a6b6e93194f416f
Reviewed-on: http://gerrit.cloudera.org:8080/11127
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>