impala/testdata at fb6d96e001c1a04475a8fd01f757dd0605cf3279

jprdonnelly/impala

mirror of https://github.com/apache/impala.git synced 2026-01-07 18:02:33 -05:00

skyyws fb6d96e001 IMPALA-9741: Support querying Iceberg table by impala

This patch adds support for querying Iceberg tables through Impala.
An external Iceberg table can be created with the following SQL:
    CREATE EXTERNAL TABLE default.iceberg_test (
        level string,
        event_time timestamp,
        message string
    )
    STORED AS ICEBERG
    LOCATION 'hdfs://xxx'
    TBLPROPERTIES ('iceberg_file_format'='parquet');
Alternatively, specify only the table name and location:
    CREATE EXTERNAL TABLE default.iceberg_test
    STORED AS ICEBERG
    LOCATION 'hdfs://xxx'
    TBLPROPERTIES ('iceberg_file_format'='parquet');
'iceberg_file_format' is the file format used by the Iceberg table.
Currently only PARQUET is supported; other formats will be supported
in the future. If this property is not specified in the SQL, the file
format defaults to PARQUET.
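
The defaulting rule above can be illustrated with a small sketch. This is
not Impala's code: the function name, the case-insensitive comparison, and
the error raised for unsupported formats are all assumptions made for
illustration. Only the property name and the PARQUET default come from the
description above.

```python
# Hypothetical sketch of the 'iceberg_file_format' defaulting rule.
# Only the property name and the PARQUET default come from the commit
# message; everything else here is an assumption.
SUPPORTED_FORMATS = {"PARQUET"}
DEFAULT_FORMAT = "PARQUET"


def resolve_iceberg_file_format(tblproperties):
    """Return the file format for a table given its TBLPROPERTIES dict."""
    raw = tblproperties.get("iceberg_file_format")
    if raw is None:
        # Property not specified: fall back to the default format.
        return DEFAULT_FORMAT
    fmt = raw.strip().upper()  # assumption: value is case-insensitive
    if fmt not in SUPPORTED_FORMATS:
        # Assumption: unsupported formats are rejected with an error.
        raise ValueError("unsupported iceberg_file_format: " + raw)
    return fmt
```

For example, `resolve_iceberg_file_format({})` falls back to the default,
while a value such as 'orc' is rejected in this sketch.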

This is implemented by treating the Iceberg table as a normal
unpartitioned HDFS table. When querying an Iceberg table, we push down
partition column predicates to Iceberg to decide which data files
need to be scanned, and then pass this information to the BE to
do the actual scan.
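
The file selection step described above can be sketched as a toy model. It
does not use the Iceberg API or any Impala code: `DataFile`, `prune_files`,
and the predicate representation are hypothetical names introduced only to
show how partition column predicates can narrow the set of data files
before the scan.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DataFile:
    """Hypothetical stand-in for one data file listed in table metadata."""
    path: str
    partition: Dict[str, str]  # partition column name -> value for this file


def prune_files(files: List[DataFile],
                predicates: Dict[str, Callable[[str], bool]]) -> List[DataFile]:
    """Keep only the files whose partition values satisfy every predicate."""
    selected = []
    for f in files:
        # A file survives only if each predicate's column is present in its
        # partition values and the predicate accepts that value.
        if all(col in f.partition and pred(f.partition[col])
               for col, pred in predicates.items()):
            selected.append(f)
    return selected


# Usage: only files in the 'ERROR' partition would be handed to the scanner.
files = [
    DataFile("data/level=ERROR/a.parquet", {"level": "ERROR"}),
    DataFile("data/level=INFO/b.parquet", {"level": "INFO"}),
]
to_scan = prune_files(files, {"level": lambda v: v == "ERROR"})
```

In this sketch the pruning happens entirely on metadata, so files outside
the matching partitions are never opened, which is the point of pushing the
predicates down before the scan.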

Testing:
- Unit test for Iceberg in FileMetadataLoaderTest
- Create table tests in functional_schema_template.sql
- Iceberg table query test in test_scanners.py

Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Reviewed-on: http://gerrit.cloudera.org:8080/16143
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

2020-09-06 02:12:07 +00:00

..

Fix IMP-297

2014-01-08 10:46:44 -08:00

AllTypesErrorNoNulls

…

IMPALA-1136, IMPALA-2161: Skip \u0000 characters when dealing Avro schemas

2015-09-02 00:37:28 +00:00

avro_schema_resolution

IMPALA-9068: Use different directories for external vs managed warehouse

2020-01-24 17:29:15 +00:00

IMPALA-8198: DATE: Read from avro.

2019-09-27 17:18:35 +00:00

bad_parquet_data

IMPALA-3745: parquet invalid data handling

2016-06-15 21:33:39 -07:00

Add HdfsLzoTextScanner

2014-01-08 10:46:35 -08:00

Add support for streaming decompression of gzip text

2014-11-23 01:55:55 -08:00

IMPALA-10064: Support constant propagation for eligible range predicates

2020-09-02 22:57:55 +00:00

IMPALA-3695: Remove KUDU_IS_SUPPORTED

2020-06-18 01:11:18 +00:00

Convert dataload hdfs copy commands to LOAD DATA statements

2020-02-24 21:22:18 +00:00

ComplexTypesTbl

IMPALA-6503: Support reading complex types from ORC

2019-03-08 04:39:08 +00:00

compressed_formats

IMPALA-1619: Support 64-bit allocations.

2016-07-08 15:42:09 -07:00

CustomerMultiBlock

IMPALA-4993: extend dictionary filtering to collections

2018-01-19 20:37:25 +00:00

IMPALA-9741: Support querying Iceberg table by impala

2020-09-06 02:12:07 +00:00

IMPALA-9741: Support querying Iceberg table by impala

2020-09-06 02:12:07 +00:00

…

…

ImpalaDemoDataset

Test data loading framework improvements

2014-01-08 10:46:49 -08:00

…

…

LineItemMultiBlock

IMPALA-5717: Support for reading ORC data files

2018-04-11 05:13:02 +00:00

IMPALA-3918: Remove Cloudera copyrights and add ASF license header

2016-08-09 08:19:41 +00:00

max_nesting_depth

Nested Types: Enforce and test maximum nesting depth of 100.

2015-10-05 11:30:54 -07:00

multi_compression_parquet_data

IMPALA-5448: fix invalid number of splits reported in Parquet scan node

2017-10-10 01:30:33 +00:00

IMPALA-8095: Detailed expression cardinality tests

2019-02-09 02:56:52 +00:00

IMPALA-13: Use SSE string functions that take an explicit length

2014-04-11 11:16:24 -07:00

parquet_nested_types_encodings

IMPALA-4725: Query option to control Parquet array resolution.

2017-03-09 05:07:44 +00:00

parquet_schema_resolution

IMPALA-3786: Replace "cloudera" with "apache" (part 2)

2016-09-29 21:14:13 +00:00

src/main/java/org/apache/impala/datagenerator

IMPALA-7061: Rework HBase splitting and assignment

2018-05-25 00:28:18 +00:00

Bump FE pom to Java 8 source/target version

2018-08-29 23:10:45 +00:00

TblWithRaggedColumns

IMP-380 handle '\r' at end of row.

2014-01-08 10:46:14 -08:00

IMP-232: Parallel INSERT OVERWRITE

2014-01-08 10:45:04 -08:00

…

tinytable_seq_snap

IMPALA-362: impalad hangs when read sequence file without contents

2014-01-08 10:50:49 -08:00

IMPALA-8043: Fix BE test failures related to SystemV timezones.

2019-01-15 17:04:55 +00:00

IMPALA-5050: Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS from Parquet

2018-11-14 20:16:14 +00:00

IMPALA-4171: Remove JAR from repo.

2016-09-22 02:00:50 +00:00

UnsupportedTypes

IMPALA-3812: Fix error message for unsupported types

2016-11-17 05:31:34 +00:00

IMPALA-9741: Support querying Iceberg table by impala

2020-09-06 02:12:07 +00:00

__init__.py

CDH-18416: Don't inline ReadWriteUtil::ReadZLong()

2014-04-28 15:58:15 -07:00

.gitignore

Update .gitignore

2018-10-26 22:19:35 +00:00

pom.xml

IMPALA-9192: Move Avro-Java and Parquet dependencies to the CDP version

2020-06-10 04:12:39 +00:00