impala

mirror of https://github.com/apache/impala.git synced 2026-01-07 18:02:33 -05:00

Files

casey 87b9fac2ad IMPALA-1658: Add compatibility flag for Hive-Parquet-Timestamps

No changes were made to writing, and no changes were made to reading
Impala-written files.

Hive writes TIMESTAMP values to Parquet files differently from Impala.
Hive converts the value from local time to UTC before writing; Impala
does not. This change adds a startup flag that converts UTC to local
time when reading files written by Hive.
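
To make the mismatch concrete, here is a minimal sketch of the two write behaviors described above. It is not Impala or Hive code: the function names are hypothetical, and a fixed UTC-8 offset is assumed as a stand-in for the writer's local time zone.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-8 offset stands in for the writer's local time zone.
LOCAL_TZ = timezone(timedelta(hours=-8))


def hive_stored_value(local_wall_clock: datetime) -> datetime:
    """Hive-style write (hypothetical helper): treat the value as local time, store UTC."""
    aware = local_wall_clock.replace(tzinfo=LOCAL_TZ)
    return aware.astimezone(timezone.utc).replace(tzinfo=None)


def impala_stored_value(local_wall_clock: datetime) -> datetime:
    """Impala-style write (hypothetical helper): store the value unchanged."""
    return local_wall_clock


wall_clock = datetime(2015, 2, 11, 5, 28, 17)
hive_value = hive_stored_value(wall_clock)
impala_value = impala_stored_value(wall_clock)

# Reading the Hive value back without any conversion is off by the UTC offset.
skew = hive_value - impala_value
print(hive_value, impala_value, skew)
```

Under these assumptions the same wall-clock value ends up stored eight hours apart, which is the gap the new flag is meant to close on the read side.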

The Hive-file detection actually checks for "parquet-mr" (the library
Hive uses) in the file metadata. There is a slight possibility that
TIMESTAMP values written by something other than Hive, but also using
parquet-mr, will be converted incorrectly. The possibility should be
very small because TIMESTAMP values are stored and encoded in a
non-standard way that other applications are unlikely to be aware of.

Flags from be/src/exec/hdfs-parquet-scanner.cc:
  -convert_legacy_hive_parquet_utc_timestamps (When true, TIMESTAMPs
    read from files written by Parquet-MR (used by Hive) will be
    converted from UTC to local time. Writes are unaffected.) type: bool
    default: false
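
The read-side behavior described above can be sketched roughly as follows. This is an illustration only, not the scanner's implementation: the function name, the plain substring match, the example writer strings, and the fixed UTC-8 local zone are all assumptions. Only the flag name and the "parquet-mr" check come from the commit message.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-8 offset stands in for the reader's local time zone.
LOCAL_TZ = timezone(timedelta(hours=-8))


def read_timestamp(stored: datetime,
                   created_by: str,
                   convert_legacy_hive_parquet_utc_timestamps: bool = False,
                   local_tz: timezone = LOCAL_TZ) -> datetime:
    """Return the value a reader would surface for a stored TIMESTAMP.

    created_by is the writer identification taken from the file metadata.
    A simple substring match is assumed here.
    """
    written_by_parquet_mr = "parquet-mr" in created_by
    if convert_legacy_hive_parquet_utc_timestamps and written_by_parquet_mr:
        # Treat the stored value as UTC and shift it to local time.
        as_utc = stored.replace(tzinfo=timezone.utc)
        return as_utc.astimezone(local_tz).replace(tzinfo=None)
    # Flag off, or file not written by parquet-mr: value is returned as stored.
    return stored


stored = datetime(2015, 2, 11, 13, 28, 17)
# The writer strings below are illustrative, not taken from real files.
converted = read_timestamp(stored, "parquet-mr version 1.5.0", True)
flag_off = read_timestamp(stored, "parquet-mr version 1.5.0", False)
other_writer = read_timestamp(stored, "impala version 2.1.0", True)
print(converted, flag_off, other_writer)
```

Because the default is false, existing deployments keep their current read behavior unless the flag is turned on at startup.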

Change-Id: I79a499fe24049b7025ee2dd76c9c3e07010d346a
Reviewed-on: http://gerrit.cloudera.org:8080/35
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins

2015-02-11 13:28:17 +00:00

Name            Last commit date            Last commit message
functional      2015-02-11 13:28:17 +00:00  IMPALA-1658: Add compatibility flag for Hive-Parquet-Timestamps
hive-benchmark  2014-01-08 10:48:44 -08:00  Parquet writer.
tpcds           2014-08-08 02:20:40 -07:00  [CDH5] Modified TPCDS schema and queries to match Impala TPCDS kit
tpch            2014-10-29 22:07:33 -07:00  [CDH5] Modified TPCH queries to match the specification
README          2014-01-08 10:44:18 -08:00  Move functional data loading to new framework + initial changes for workload directory structure

README

This directory contains Impala test data sets. The directory layout is structured as follows:

datasets/
   <data set>/<data set>_schema_template.sql
   <data set>/<data files SF1>/data files
   <data set>/<data files SF2>/data files

SF is the scale factor that controls data size. This allows the same schema to be scaled to
different sizes depending on the target test environment.
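
As a small illustration of the layout above, the sketch below builds paths that follow the pattern. The helper names are hypothetical, and the README does not say how the per-scale-factor data directories are named, so that directory name is passed in by the caller rather than guessed.

```python
from pathlib import PurePosixPath


def schema_template_path(data_set: str, root: str = "datasets") -> PurePosixPath:
    """Path of <data set>/<data set>_schema_template.sql under the root."""
    return PurePosixPath(root) / data_set / f"{data_set}_schema_template.sql"


def data_file_path(data_set: str, data_dir: str, file_name: str,
                   root: str = "datasets") -> PurePosixPath:
    """Path of a data file inside one scale-factor directory.

    data_dir is whatever name the data set uses for a given scale factor;
    its format is an assumption left to the caller.
    """
    return PurePosixPath(root) / data_set / data_dir / file_name


template = schema_template_path("tpch")
# "sf1" and "lineitem.tbl" are placeholder names for illustration only.
example_file = data_file_path("tpch", "sf1", "lineitem.tbl")
print(template)
print(example_file)
```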