Commit Graph

7 Commits

Author SHA1 Message Date
Fucun Chu
157086cb80 IMPALA-10771: Add Tencent COS support
This patch adds support for COS(Cloud Object Storage). Using the
hadoop-cos, the implementation is similar to other remote FileSystems.

New flags for COS:
- num_cos_io_threads: Number of COS I/O threads. Defaults to be 16.

Follow-up:
- Support for caching COS file handles will be addressed in
   IMPALA-10772.
- test_concurrent_inserts and test_failing_inserts in
   test_acid_stress.py are skipped due to slow file listing on
   COS (IMPALA-10773).

Tests:
 - Upload hdfs test data to a COS bucket. Modify all locations in HMS
   DB to point to the COS bucket. Remove some hdfs caching params.
   Run CORE tests.

Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
Reviewed-on: http://gerrit.cloudera.org:8080/17503
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-12-08 16:32:02 +00:00
stiga-huang
2dfc68d852 IMPALA-7712: Support Google Cloud Storage
This patch adds support for GCS(Google Cloud Storage). Using the
gcs-connector, the implementation is similar to other remote
FileSystems.

New flags for GCS:
 - num_gcs_io_threads: Number of GCS I/O threads. Defaults to be 16.

Follow-up:
 - Support for spilling to GCS will be addressed in IMPALA-10561.
 - Support for caching GCS file handles will be addressed in
   IMPALA-10568.
 - test_concurrent_inserts and test_failing_inserts in
   test_acid_stress.py are skipped due to slow file listing on
   GCS (IMPALA-10562).
 - Some tests are skipped due to issues introduced by /etc/hosts setting
   on GCE instances (IMPALA-10563).

Tests:
 - Compile and create hdfs test data on a GCE instance. Upload test data
   to a GCS bucket. Modify all locations in HMS DB to point to the GCS
   bucket. Remove some hdfs caching params. Run CORE tests.
 - Compile and load snapshot data to a GCS bucket. Run CORE tests.

Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Reviewed-on: http://gerrit.cloudera.org:8080/17121
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-13 11:20:08 +00:00
Csaba Ringhofer
5dccf8024b IMPALA-10171: Create query options for local time related flags
convert_legacy_hive_parquet_utc_timestamps and
use_local_tz_for_unix_timestamp_conversions were controllable only by
flags until now. After this change the old flags are only used on the
Coordinator to set the defaults for the query options.

If default_query_options also sets these query options, it will
take precedence over the old flags.

Possible follow up work:
- the old flags could be deprecated, as default_query_options
  can be used to set this on a server level
- testing these functionalities no longer needs custom cluster
  tests - rewriting the existing tests could speed up test
  execution

Testing:
- expr-test was rewritten to use the query option instead of the flag
- extended the custom cluster tests for the old flags to also use the
  query options
- ran related tests

Change-Id: I5c4252d1c8f8e224c1d0e8234f09374bcc0c6f68
Reviewed-on: http://gerrit.cloudera.org:8080/16469
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-18 22:12:47 +00:00
Tim Armstrong
48e08e42ba IMPALA-9445: skip test_local_tz_conversion for hbase
This skips the test for filesystems where we don't start
HBase. It's a bit unfortunate that we have to disable
all of this test, so this is a stopgap to get builds
green.

Change-Id: I9fb603ebe0a5820d2eed0b6ec5dca2980fd5521d
Reviewed-on: http://gerrit.cloudera.org:8080/15338
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2020-03-02 21:31:42 +00:00
Csaba Ringhofer
2c54dbe225 IMPALA-9385: Unix time conversion cleanup + ORC fix
ORC scanner uses TimestampValue::FromUnixTimeNanos() to convert
sec + nano representation to Impala's TimestampValue (day + nano).
FromUnixTimeNanos was affected by flag
use_local_tz_for_unix_timestamp_conversions, while that global option
should not affect ORC. By default there was no conversion, but if the
flag is 1, then timestamps were interpreted as UTC and converted to
local time.

This could be solved by creating a UTC version of FromUnixTimeNanos,
but I decided to change the interface in the hope of making To/From
timestamp functions less confusing.

Changes:
- Fixed the bug by passing UTC as timezone in the ORC scanner.
- Changed the interface of these TimestampValue functions to expect
  a timezone pointer, interpret null as UTC and skip conversion. It
  would be also possible to pass the actual UTC timezone and check
  for this in the functions, but I guess it is easier to optimize
  the inlined functions this way.
- Moved the checking of use_local_tz_for_unix_timestamp_conversions to
  RuntimeState and added property time_zone_for_unix_time_conversions()
  to return the timezone to use in Unix time conversions. This made
  TimestampValue's interface clearer and makes it easy to replace the
  flag with a query option if we want to.
- Changed RuntimeState and the Parquet scanner to skip timezone
  conversion if convert_legacy_hive_parquet_utc_timestamps=1 but the
  timezone is UTC. This allows users to avoid the performance penalty
  of this flag by setting query option timezone to UTC in their
  session (IMPALA-7557). CCTZ is not good at this, actually
  conversions are slower with fixed offset timezones (including UTC)
  than with timezones that have DST/historical rule changes.

Postponed changes:
- Didn't remove the UTC versions of the functions yet, as that would
  require changing (and possibly rethinking) several BE tests and
  benchmarks (IMPALA-9409).

Tests:
- Added regression test for Orc and other file formats to
  check that they are not affected by this flag.
- Extended test_hive_parquet_timestamp_conversion.py to cover the case
  when convert_legacy_hive_parquet_utc_timestamps=1 and timezone=UTC.
  Also did some cleanup there to use query option timezone instead of
  env var TZ.

Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Reviewed-on: http://gerrit.cloudera.org:8080/15222
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-22 02:02:56 +00:00
Csaba Ringhofer
dc32bf7703 IMPALA-7362: Add query option to set timezone
This change adds a new query option "timezone" which
defines the timezone used for utc<->local conversions.
The main goal is to simplify testing, but I think that
some users may also find it useful so it is added as a
"general" query option.

Examples:
set timezone=UTC;
set timezone="Europe/Budapest"

The timezones are validated, but as query options are not
sent to the coordinator immediately, the error checking
will only happen when running a query.

Leading/trailing " and 'characters are stripped because the
/ character cannot be entered unquoted in some contexts.

Currently the timezone has effect in the following cases:
-function now()
-conversions between unix time and timestamp if flag
 use_local_tz_for_unix_timestamp_conversions is true
-reading parquet timestamps written by Hive if flag
 convert_legacy_hive_parquet_utc_timestamps is true

In the near future Parquet timestamps's isAdjustedToUTC
property will be supported, which will decide whether
to do utc->local conversion on a per file+column basis.
This conversion will be also affected.

Testing:
- Extended test_local_tz_conversion.py to actually
  test utc<->local conversion. Until now the effect
  of flag use_local_tz_for_unix_timestamp_conversions
  was practically untested.
- Added a shell test to check that the default of the
  query option is the system's timezone.
- Added a shell test to check timezone validation.

Change-Id: I73de86eff096e1c581d3b56a0d9330d686f77272
Reviewed-on: http://gerrit.cloudera.org:8080/11064
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-03 17:45:25 +00:00
Matthew Jacobs
7a1ff1e5e9 IMPALA-5539: Fix Kudu timestamp with -use_local_tz_for_unix_ts
The -use_local_tz_for_unix_timestamp_conversion flag exists
to specify if TIMESTAMPs should be interpreted as localtime
or UTC when converting to/from Unix time via builtins:
  from_unixtime(bigint unixtime)
  unix_timestamp(string datetime[, ...])
  unix_timestamp(timestamp datetime)

However, the KuduScanner was calling into code that, when
the gflag above was set, interpreted Unix times as local
time.  Unfortunately the write path (KuduTableSink) and some
FE TIMESTAMP code (see KuduUtil.java) did not have this
behavior, i.e. we were handling the gflag inconsistently.

Tests:
* Adds a custom cluster test to run Kudu test cases with
  -use_local_tz_for_unix_timestamp_conversion.
* Adds tests for the new builtin
  unix_micros_to_utc_timestamp() which run in a custom
  cluster test (added test_local_tz_conversion.py) as well
  as in the regular tests (added to test_exprs.py).

Change-Id: I423a810427353be76aa64442044133a9a22cdc9b
Reviewed-on: http://gerrit.cloudera.org:8080/7311
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-07-19 22:17:13 +00:00