impala

mirror of https://github.com/apache/impala.git synced 2026-01-04 18:00:57 -05:00

Files

Skye Wanderman-Milne de531e15bd IMPALA-694: Allow Impala to read files produced by parquet-mr version <= 1.2.8

parquet-mr had a bug where it didn't include the dictionary page's
header in the total column size. We now compensate for this by
detecting these files and padding the scan range length. This required
changing how the scanner detects when it's finished: it now counts the
number of rows rather than checking eosr (since the scan range may be
longer than the column).

Change-Id: Id9933808b965003c0c3b3aa78c32fe29a0c4bcbe
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1097
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins

2014-01-08 10:54:27 -08:00
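
The approach described in the commit message above can be sketched roughly as follows. This is an illustration only, not Impala's actual scanner code (which is C++). The padding value, the function names, and the assumed shape of the writer string ("parquet-mr version X.Y.Z") are all assumptions made for the example.

```python
import re

# Hypothetical padding in bytes; the commit message does not state the real value.
ASSUMED_DICT_HEADER_PADDING = 100

# Assumed shape of the file's writer string, e.g. "parquet-mr version 1.2.8".
_VERSION_RE = re.compile(r"parquet-mr version (\d+)\.(\d+)\.(\d+)")


def has_dict_header_bug(created_by):
    """Return True if the writer string names parquet-mr version <= 1.2.8."""
    match = _VERSION_RE.search(created_by)
    if match is None:
        return False
    version = tuple(int(part) for part in match.groups())
    return version <= (1, 2, 8)


def scan_range_length(total_column_size, created_by,
                      padding=ASSUMED_DICT_HEADER_PADDING):
    """Pad the scan range when the reported column size may be too small."""
    if has_dict_header_bug(created_by):
        return total_column_size + padding
    return total_column_size


def pages_to_read(page_row_counts, expected_rows):
    """Stop on row count, not on end of scan range.

    The padded range may extend past the column, so pages that follow
    the last expected row are ignored. Returns the number of pages consumed.
    """
    rows_read = 0
    pages_read = 0
    for rows_in_page in page_row_counts:
        if rows_read >= expected_rows:
            break
        rows_read += rows_in_page
        pages_read += 1
    if rows_read != expected_rows:
        raise ValueError("row count mismatch: read %d, expected %d"
                         % (rows_read, expected_rows))
    return pages_read
```

Because the scan range may be longer than the column, the end of the range no longer marks the end of the data, which is why the sketch terminates on the row count.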

functional

IMPALA-694: Allow Impala to read files produced by parquet-mr version <= 1.2.8

2014-01-08 10:54:27 -08:00

hive-benchmark

Parquet writer.

2014-01-08 10:48:44 -08:00

tpcds

Don't load tpcds.store_sales_unpartitioned into any file format except text.

2014-01-08 10:52:43 -08:00

tpch

Refactor testing framework to generate Avro tables.

2014-01-08 10:48:45 -08:00

README

Move functional data loading to new framework + initial changes for workload directory structure

2014-01-08 10:44:18 -08:00

README

This directory contains Impala test data sets. The directory is laid out as follows:

datasets/
   <data set>/<data set>_schema_template.sql
   <data set>/<data files SF1>/data files
   <data set>/<data files SF2>/data files

SF is the scale factor, which controls data size. This allows the same schema to be scaled to
different sizes depending on the target test environment.
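
As a rough illustration of the layout above, the following sketch builds the expected paths for a data set. The function names are hypothetical, and the scale-factor directory name is passed in by the caller because the README only gives it as a placeholder.

```python
from pathlib import PurePosixPath


def schema_template_path(dataset, root="datasets"):
    """Path of <data set>/<data set>_schema_template.sql under the root."""
    return PurePosixPath(root) / dataset / (dataset + "_schema_template.sql")


def data_files_dir(dataset, scale_factor_dir, root="datasets"):
    """Directory holding the data files for one scale factor.

    scale_factor_dir is whatever name the data set uses for that scale
    factor; the README does not specify a naming scheme.
    """
    return PurePosixPath(root) / dataset / scale_factor_dir
```

For example, `schema_template_path("tpch")` gives `datasets/tpch/tpch_schema_template.sql`.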