Mirror of https://github.com/apache/impala.git, synced 2026-01-03 15:00:52 -05:00
Parquet splits with multiple columns are marked as completed via HdfsScanNodeBase::RangeComplete(), which counts the file type once per column codec type. As a result, the reported number of Parquet splits is the real count multiplied by the number of materialized columns. Furthermore, the Parquet format allows different compression codecs on different columns; this patch handles that case as well. A Parquet file using both the gzip and snappy compression codecs is now reported as:

  FileFormats: PARQUET/(GZIP,SNAPPY):1

This patch introduces a set of compression types to cover such cases.

Testing: Added end-to-end tests covering Parquet files with all columns compressed with snappy, and Parquet files using multiple compression codecs.

Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1
Reviewed-on: http://gerrit.cloudera.org:8080/8147
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This directory contains Impala test workloads. The directory layout for the workloads should follow:

workloads/
  <data set name>/<data set name>_dimensions.csv   <- The test dimension file
  <data set name>/<data set name>_core.csv         <- A test vector file
  <data set name>/<data set name>_pairwise.csv
  <data set name>/<data set name>_exhaustive.csv
  <data set name>/queries/<query test>.test        <- The queries for this workload