Mirror of https://github.com/apache/impala.git (synced 2026-01-08 12:02:54 -05:00)
IMPALA-9515: Full ACID Milestone 3: Read support for "original files"
"Original files" are files that don't have full ACID schema. We can see such files if we upgrade a non-ACID table to full ACID. Also, the LOAD DATA statement can load non-ACID files into full ACID tables. So such files don't store special ACID columns, that means we need to auto-generate their values. These are (operation, originalTransaction, bucket, rowid, and currentTransaction). With the exception of 'rowid', all of them can be calculated based on the file path, so I add their values to the scanner's template tuple. 'rowid' is the ordinal number of the row inside a bucket inside a directory. For now Impala only allows one file per bucket per directory. Therefore we can generate row ids for each file independently. Multiple files in a single bucket in a directory can only be present if the table was non-transactional earlier and we upgraded it to full ACID table. After the first compaction we should only see one original file per bucket per directory. In HdfsOrcScanner we calculate the first row id for our split then the OrcStructReader fills the rowid slot with the proper values. Testing: * added e2e tests to check if the generated values are correct * added e2e test to reject tables that have multiple files per bucket * added unit tests to the new auxiliary functions Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953 Reviewed-on: http://gerrit.cloudera.org:8080/16001 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Committed by Impala Public Jenkins
Parent: 94c1b6354b
Commit: 930264afbd
@@ -374,6 +374,31 @@ TBLPROPERTIES("hbase.table.name" = "functional_hbase.hbasealltypeserror");
---- DATASET
functional
---- BASE_TABLE_NAME
alltypes_promoted
---- PARTITION_COLUMNS
year int
month int
---- COLUMNS
id int COMMENT 'Add a comment'
bool_col boolean
tinyint_col tinyint
smallint_col smallint
int_col int
bigint_col bigint
float_col float
double_col double
date_string_col string
string_col string
timestamp_col timestamp
---- DEPENDENT_LOAD_HIVE
INSERT INTO TABLE {db_name}{db_suffix}.{table_name} SELECT * FROM {db_name}{db_suffix}.alltypes;
ALTER TABLE {db_name}{db_suffix}.{table_name} SET tblproperties('EXTERNAL'='FALSE','transactional'='true');
---- TABLE_PROPERTIES
transactional=false
====
---- DATASET
functional
---- BASE_TABLE_NAME
hbasecolumnfamilies
---- HBASE_COLUMN_FAMILIES
0
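The alltypes_promoted entry in the diff loads data with Hive while the table is still non-transactional (TABLE_PROPERTIES transactional=false) and only then runs ALTER TABLE ... 'transactional'='true', so the inserted files stay in the partition directories as "original files". Because Impala for now allows only one original file per bucket per directory, a check along the lines of the hypothetical FindDuplicateBucket below could flag layouts where several files exist for the same bucket, for example 000000_0 next to 000000_0_copy_1. This is a minimal sketch assuming Hive's six-digit bucket file names; it is not Impala's actual validation code.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Returns the six-digit bucket prefix of a Hive bucket file name
// ("000000_0" and "000000_0_copy_1" both give 0), or nullopt for other names.
std::optional<int> BucketId(const std::string& file_name) {
  if (file_name.size() < 6) return std::nullopt;
  for (size_t i = 0; i < 6; ++i) {
    if (file_name[i] < '0' || file_name[i] > '9') return std::nullopt;
  }
  return std::stoi(file_name.substr(0, 6));
}

// Splits "a/b/file" into {"a/b", "file"}.
std::pair<std::string, std::string> SplitPath(const std::string& path) {
  size_t slash = path.rfind('/');
  if (slash == std::string::npos) return {"", path};
  return {path.substr(0, slash), path.substr(slash + 1)};
}

// Returns the first directory holding more than one file for the same bucket,
// or nullopt when every (directory, bucket) pair has at most one file.
std::optional<std::string> FindDuplicateBucket(const std::vector<std::string>& paths) {
  std::map<std::pair<std::string, int>, int> files_per_bucket;
  for (const std::string& path : paths) {
    auto [dir, file] = SplitPath(path);
    std::optional<int> bucket = BucketId(file);
    if (!bucket) continue;  // not a bucket file; this sketch ignores it
    if (++files_per_bucket[{dir, *bucket}] > 1) return dir;
  }
  return std::nullopt;
}
```

For example, year=2009/month=1/000000_0 together with year=2009/month=1/000000_0_copy_1 is reported as a duplicate in year=2009/month=1, while a single 000000_0 in each month directory passes.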