impala

mirror of https://github.com/apache/impala.git synced 2026-01-08 03:02:48 -05:00


Matthew Jacobs 25428fdb21 Add support for streaming decompression of gzip text

Compressed text formats currently require entire compressed
files be read into memory to be decompressed in a single call
to the decompression codec. This changes the HdfsTextScanner
to drive gzip in a streaming mode, i.e. produce partial output
as input is consumed.

Change-Id: Id5c0805e18cf6b606bcf27a5df4b5f58895809fd
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5233
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 05c3cc55e7a601d97adc4eebe03f878c68a33e56)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5385

2014-11-23 01:55:55 -08:00
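
The streaming mode described in the commit message above can be illustrated outside Impala. This is a minimal Python sketch, not Impala's C++ HdfsTextScanner code: it assumes a single-member gzip stream and uses the standard zlib module, and the names stream_gunzip and read_in_chunks are hypothetical helpers introduced only for this illustration.

```python
import gzip
import io
import zlib


def read_in_chunks(stream, size):
    """Yield successive blocks of at most `size` bytes from a binary stream."""
    while True:
        block = stream.read(size)
        if not block:
            break
        yield block


def stream_gunzip(chunks):
    """Yield decompressed bytes as compressed chunks are consumed.

    Assumption: the input is a single gzip member. Data after the first
    member would end up in decomp.unused_data and is ignored here.
    """
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    for chunk in chunks:
        out = decomp.decompress(chunk)
        if out:
            yield out  # partial output, available before all input is read
    tail = decomp.flush()
    if tail:
        yield tail


if __name__ == "__main__":
    text = b"id,name\n" + b"1,alice\n" * 1000
    compressed = io.BytesIO(gzip.compress(text))
    total = sum(len(piece) for piece in stream_gunzip(read_in_chunks(compressed, 64)))
    print(total == len(text))
```

Only one compressed block and its decompressed output need to be held at a time, which is the property the commit message contrasts with reading the entire compressed file into memory first.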

functional

Add support for streaming decompression of gzip text

2014-11-23 01:55:55 -08:00

hive-benchmark

Parquet writer.

2014-01-08 10:48:44 -08:00

tpcds

[CDH5] Modified TPCDS schema and queries to match Impala TPCDS kit

2014-08-08 02:20:40 -07:00

tpch

[CDH5] Modified TPCH queries to match the specification

2014-10-29 22:07:33 -07:00

README

Move functional data loading to new framework + initial changes for workload directory structure

2014-01-08 10:44:18 -08:00

README

This directory contains Impala test data sets. It is laid out as follows:

datasets/
   <data set>/<data set>_schema_template.sql
   <data set>/<data files SF1>/data files
   <data set>/<data files SF2>/data files

Here SF is the scale factor, which controls the data size. This allows the same schema to be
scaled to different sizes depending on the target test environment.
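
The layout above can be sketched as a small path helper. This is only an illustration in Python: dataset_layout is a hypothetical function, and the per-scale-factor directory names are passed in by the caller because the README gives them only as placeholders.

```python
from pathlib import PurePosixPath


def dataset_layout(dataset, sf_dirs, root="datasets"):
    """Return the schema template path and data directories for one data set.

    `sf_dirs` holds the names of the per-scale-factor data directories.
    The README does not specify how they are named, so they are supplied
    by the caller rather than derived here.
    """
    base = PurePosixPath(root) / dataset
    return {
        "schema_template": base / f"{dataset}_schema_template.sql",
        "data_dirs": [base / name for name in sf_dirs],
    }


if __name__ == "__main__":
    # "sf1" and "sf2" are made-up directory names used only for this example.
    layout = dataset_layout("tpch", ["sf1", "sf2"])
    print(layout["schema_template"])
    for path in layout["data_dirs"]:
        print(path)
```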