Iceberg recently added a new partition transform called 'void':
https://iceberg.apache.org/spec/#partition-transforms
This patch adds support for this transform.
When the user wants to drop a column from the partition spec,
the VOID transform should be used instead of just omitting
the column. Simply omitting the column might cause problems when
the metadata tables are queried (which is currently only supported
by other engines).
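As an illustrative sketch (table, column, and transform arguments are
hypothetical), assuming the ALTER TABLE ... SET PARTITION SPEC
statement, the dropped partition column stays in the spec as a VOID
field instead of being omitted:
ALTER TABLE ice_t SET PARTITION SPEC (VOID(p), BUCKET(5, i));
-- SHOW CREATE TABLE ice_t then lists VOID(p) in the partition spec
-- rather than silently dropping the field.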
Testing:
* added SHOW CREATE TABLE test
* added e2e test
Change-Id: Icbe11d56cdeb82aaadedfdb3ad61dd7cc4c2f4d0
Reviewed-on: http://gerrit.cloudera.org:8080/18102
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this patch Impala inconsistently scheduled scan ranges for
Iceberg tables on HDFS in local catalog mode. It did so because
LocalIcebergTable reloaded all the file descriptors, and the HDFS
block locations were not consistent across the reloads. Impala's
scheduler uses the block location list for scan range assignment,
hence the assignments were inconsistent between queries. This has
a negative effect on caching and hence hurt performance quite badly.
It is redundant and expensive to reload file descriptors for each
query in local catalog mode. This patch extends the GetPartialInfo()
RPC with Iceberg-specific snapshot information, which means the
coordinator is now able to fetch Iceberg data file descriptors from
the CatalogD. This way scan range assignment becomes consistent
because we reuse the same file descriptors with the same block
location information.
Fixing the above revealed another bug: before this patch we didn't
handle self-events of Iceberg tables. When an Iceberg table is stored
in the HiveCatalog, Iceberg updates the HMS table on modifications
because it needs to update the table property 'metadata_location'
(which points to the new snapshot file). Catalogd then processes
these modifications again when they arrive via the event notification
mechanism. I fixed this by creating Iceberg transactions in which I
set the catalog service ID and the new catalog version for the
Iceberg table. Since we are using transactions now, Iceberg embeds
all table modifications in a single ALTER TABLE request to HMS, and
Catalogd can detect the corresponding alter event later via the
aforementioned catalog service ID and version.
Testing:
* added e2e test for the scan range assignment
* added e2e test for detecting self-events
Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Reviewed-on: http://gerrit.cloudera.org:8080/17857
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Qifan Chen <qchen@cloudera.com>
This change will allow usage of commands that do not require reading
the JSON files, like:
- CREATE TABLE <table> STORED AS JSONFILE
- SHOW CREATE TABLE <table>
- DESCRIBE <table>
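For example, the following statements (table name hypothetical) now
work without scanning any JSON data:
CREATE TABLE json_tbl (id INT, payload STRING) STORED AS JSONFILE;
SHOW CREATE TABLE json_tbl;
DESCRIBE json_tbl;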
Changes:
- Added JSON as a FileFormat to thrift and HdfsFileFormat.
- Allowed the SQL keyword 'jsonfile' and mapped it to the JSON format.
- Added the JSON SerDe.
- JSON files have the same input format as TextFile, so the SerDe
library in use must be consulted to differentiate between the two
formats. Overloaded the functions that determine the file format
based on input format to also consider the SerDe library.
- Added tests for the 'Create Table' and 'Show Create Table' commands
Pending Changes:
- test for Describe command - to be added with backend changes.
Change-Id: I5b8cb2f59df3af09902b49d3bdac16c19954b305
Reviewed-on: http://gerrit.cloudera.org:8080/17727
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Hive relies on the engine.hive.enabled=true table property being set
for Iceberg tables. Without it, Hive overwrites the table metadata
with a different storage handler and SerDe/InputFormat/OutputFormat
when it writes the table, making the table unusable.
With this patch Impala sets this table property during table creation.
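As a sketch of the effect (table name hypothetical), creating a table
now also records the property:
CREATE TABLE ice_t (i INT) STORED AS ICEBERG;
-- SHOW CREATE TABLE ice_t now includes
-- TBLPROPERTIES ('engine.hive.enabled'='true', ...)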
Testing:
* updated show-create-table.test
* tested Impala/Hive interop manually
Change-Id: I6aa0240829697a27f48d0defcce48920a5d6f49b
Reviewed-on: http://gerrit.cloudera.org:8080/17750
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for the following standard Iceberg properties:
write.parquet.compression-codec:
Parquet compression codec. Supported values are: NONE, GZIP, SNAPPY
(default value), LZ4, ZSTD. The table property will be ignored if
COMPRESSION_CODEC query option is set.
write.parquet.compression-level:
Parquet compression level. Used with ZSTD compression only.
Supported range is [1, 22]. Default value is 3. The table property
will be ignored if COMPRESSION_CODEC query option is set.
write.parquet.row-group-size-bytes:
Parquet row group size in bytes. Supported range is [8388608,
2146435072] (8MB - 2047MB). The table property will be ignored if
PARQUET_FILE_SIZE query option is set.
If neither the table property nor the PARQUET_FILE_SIZE query option
is set, the way Impala calculates row group size will remain
unchanged.
write.parquet.page-size-bytes:
Parquet page size in bytes. Used for PLAIN encoding. Supported range
is [65536, 1073741824] (64KB - 1GB).
If the table property is unset, the way Impala calculates page size
will remain unchanged.
write.parquet.dict-size-bytes:
Parquet dictionary page size in bytes. Used for dictionary encoding.
Supported range is [65536, 1073741824] (64KB - 1GB).
If the table property is unset, the way Impala calculates dictionary
page size will remain unchanged.
This patch also renames 'iceberg.file_format' table property to
'write.format.default' which is the standard Iceberg name for the
table property.
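For illustration (table name and values hypothetical), these
properties can be set at table creation:
CREATE TABLE ice_t (i INT) STORED AS ICEBERG
TBLPROPERTIES (
  'write.format.default'='parquet',
  'write.parquet.compression-codec'='zstd',
  'write.parquet.compression-level'='12',
  'write.parquet.row-group-size-bytes'='134217728'
);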
Change-Id: I3b8aa9a52c13c41b48310d2f7c9c7426e1ff5f23
Reviewed-on: http://gerrit.cloudera.org:8080/17654
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Two Iceberg commits got into the master branch in parallel. One of
them modified the DDL syntax, the other one added some tests.
They were correct on their own, but mixing the two caused
test failures.
The affected tests have been updated.
Change-Id: Id3cf6ff04b8da5782df2b84a580cdbd4a4a16d06
Reviewed-on: http://gerrit.cloudera.org:8080/17689
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Iceberg recently switched to using its Catalogs class to define
catalog and table properties. Catalog information is stored in a
configuration file such as hive-site.xml, and the table properties
contain information about which catalog is being used and what the
Iceberg table identifier is.
E.g. in the Hive conf we can have the following properties to define
catalogs:
iceberg.catalog.<catalog_name>.type = hadoop
iceberg.catalog.<catalog_name>.warehouse = somelocation
or
iceberg.catalog.<catalog_name>.type = hive
And at the table level we can have the following:
iceberg.catalog = <catalog_name>
name = <table_identifier>
Table property 'iceberg.catalog' refers to a catalog defined in the
configuration file. This contradicts Impala's current behavior,
where we are already using 'iceberg.catalog', and it can have the
following values:
* hive.catalog for HiveCatalog
* hadoop.catalog for HadoopCatalog
* hadoop.tables for HadoopTables
To be backward-compatible and also support the new Catalogs
properties, Impala still recognizes the above special values. But
from now on, Impala doesn't define 'iceberg.catalog' by default.
'iceberg.catalog' being NULL means HiveCatalog for both Impala and
Iceberg's Catalogs API, hence for Hive and Spark as well.
If 'iceberg.catalog' has a different value than the special values it
indicates that Iceberg's Catalogs API is being used, so Impala will
try to look up the catalog configuration from the Hive config file.
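For illustration (catalog and table names hypothetical), assuming
'iceberg.catalog.my_catalog.*' entries exist in hive-site.xml as
shown above, a table can be bound to that catalog like this:
CREATE TABLE ice_t (i INT)
STORED AS ICEBERG
TBLPROPERTIES ('iceberg.catalog'='my_catalog');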
Testing:
* added SHOW CREATE TABLE tests
* added e2e tests that create/insert/drop Iceberg tables with Catalogs
* manually tested interop behavior with Hive
Change-Id: I5dfa150986117fc55b28034c4eda38a736460ead
Reviewed-on: http://gerrit.cloudera.org:8080/17466
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently we have a DDL syntax for defining Iceberg partitions that
differs from SparkSQL:
https://iceberg.apache.org/spark-ddl/#partitioned-by
E.g. Impala is using the following syntax:
CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITION BY SPEC (i BUCKET 5, ts MONTH, d YEAR)
STORED AS ICEBERG;
The same in Spark is:
CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
USING ICEBERG
PARTITIONED BY (bucket(5, i), months(ts), years(d))
HIVE-25179 added the following syntax for Hive:
CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITIONED BY SPEC (bucket(5, i), months(ts), years(d))
STORED BY ICEBERG;
I.e. the same syntax as Spark, but adding the keyword "SPEC".
This patch makes Impala use Hive's syntax, i.e. we will also
use the PARTITIONED BY SPEC clause + the unified partition
transform syntax.
Testing:
* existing tests have been rewritten with the new syntax
Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Reviewed-on: http://gerrit.cloudera.org:8080/17575
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for CREATE TABLE AS SELECT statements
for Iceberg tables.
CTAS statements work like the following in Impala:
1. Analysis of the whole CTAS statement
2. Divide CTAS to CREATE stmt and INSERT stmt
3. Create temporary in-memory target table from the CREATE stmt
4. Analyse the INSERT statement by using the temporary target table
5. If everything is OK so far, create the target table
6. Execute the INSERT query
For Iceberg tables the non-trivial part was to create the temporary
target table without actually creating it via the Iceberg API. I've
created a new class 'IcebergCtasTarget' that mimics an FeIcebergTable.
It can be used with both catalog V1 and V2.
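A minimal sketch of a now-supported statement (names hypothetical):
CREATE TABLE ice_ctas
STORED AS ICEBERG
AS SELECT id, name FROM src_tbl WHERE id > 0;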
Testing:
* e2e CTAS tests in iceberg-ctas.test
* SHOW CREATE TABLE stmts in show-create-table.test
Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Reviewed-on: http://gerrit.cloudera.org:8080/17130
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
For convenience this patch adds support for the old-style
CREATE TABLE ... PARTITIONED BY ... syntax for Iceberg tables.
So users should be able to write the following:
CREATE TABLE ice_t (i int)
PARTITIONED BY (p int)
STORED AS ICEBERG;
Which should be equivalent to this:
CREATE TABLE ice_t (i int, p int)
PARTITION BY SPEC (p IDENTITY)
STORED AS ICEBERG;
Please note that the old-style CREATE TABLE statement creates
IDENTITY-partitioned tables. For other partition transforms the
users must use the new, more generic syntax.
Hive also supports the old PARTITIONED BY syntax with the same
behavior.
Testing:
* added e2e tests
Change-Id: I789876c161bc0987820955aa9ae01414e0dcb45d
Reviewed-on: http://gerrit.cloudera.org:8080/16979
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for creating required/optional fields for
Iceberg tables. If we set the 'NOT NULL' property on an Iceberg table
column in SQL, Impala creates a required field via the Iceberg API;
'NULL' or the default creates an optional field.
Besides, 'DESCRIBE XXX' for an Iceberg table will display the
nullability like this:
+------+--------+---------+----------+
| name | type | comment | nullable |
+------+--------+---------+----------+
| id | int | | false |
| name | string | | true |
| age | int | | true |
+------+--------+---------+----------+
And 'SHOW CREATE TABLE XXX' will also display the 'NULL'/'NOT NULL'
property for Iceberg table columns.
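A sketch of a table definition exercising both kinds of fields
(matching the DESCRIBE output above):
CREATE TABLE ice_t (
  id INT NOT NULL,
  name STRING NULL,
  age INT
) STORED AS ICEBERG;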
Tests:
* added new test in iceberg-create.test
* added new test in iceberg-negative.test
* added new test in show-create-table.test
* modified 'DESCRIBE XXX' results in iceberg-create.test
* modified 'DESCRIBE XXX' results in iceberg-alter.test
* modified create table results in show-create-table.test
Change-Id: I70b8014ba99f43df1b05149ff7a15cf06b6cd8d3
Reviewed-on: http://gerrit.cloudera.org:8080/16904
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
HiveCatalog is one of Iceberg's catalog implementations. It uses
the Hive metastore and it is the recommended catalog implementation
when the table data is stored in object stores like S3.
This commit updates the Iceberg version to a newer one, and it also
retrieves Iceberg from the CDP distribution because that version of
Iceberg is built against Hive 3 (Impala is only compatible with
Hive 3).
This commit makes HiveCatalog the default Iceberg catalog in Impala
because it can be used in more environments (e.g. cloud stores),
and it is more featureful. Also, other engines that store their
table metadata in HMS will probably use HiveCatalog as well.
Tables stored in HiveCatalog are similar to Kudu tables with HMS
integration, i.e. modifying an Iceberg table via the Iceberg APIs
also modifies the HMS table. So in CatalogOpExecutor we handle
such Iceberg tables similarly to integrated Kudu tables.
Testing:
* Added e2e tests for creating, writing, and altering Iceberg
tables
* Added SHOW CREATE TABLE tests
Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Reviewed-on: http://gerrit.cloudera.org:8080/16721
Reviewed-by: wangsheng <skyyws@163.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for querying Iceberg tables with the ORC
file format. We can use the following SQL to create a table with the
ORC file format:
CREATE TABLE default.iceberg_test (
  level string,
  event_time timestamp,
  message string
)
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg.file_format'='orc', 'iceberg.catalog'='hadoop.tables');
Note that there are still some problems when scanning ORC files with
TIMESTAMP columns; for more details please refer to IMPALA-9967.
We may add new tests with the TIMESTAMP type after this JIRA is fixed.
Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py
Change-Id: Ib579461aa57348c9893a6d26a003a0d812346c4d
Reviewed-on: http://gerrit.cloudera.org:8080/16568
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for Iceberg Bucket and Truncate partition
transforms. Both accept a parameter: number of buckets and width
respectively.
Usage:
CREATE TABLE tbl_name (i int, p1 int, p2 timestamp)
PARTITION BY SPEC (
p1 BUCKET 10,
p1 TRUNCATE 5
) STORED AS ICEBERG
TBLPROPERTIES ('iceberg.catalog'='hadoop.tables');
Testing:
- Extended AnalyzeStmtsTest to cover creating partitioned Iceberg
tables with the new partition transforms.
- Extended ParserTest.
- Extended iceberg-create.test to create Iceberg tables with the new
partition transforms.
- Extended show-create-table.test to check that the new partition
transforms are displayed with their parameters in the SHOW CREATE
TABLE output.
Change-Id: Idc75cd23045b274885607c45886319f4f6da19de
Reviewed-on: http://gerrit.cloudera.org:8080/16551
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We provided several new table properties in IMPALA-10164, such as
'iceberg.catalog'. To keep these properties consistent, we rename
'iceberg_file_format' to 'iceberg.file_format'. When creating an
Iceberg table, we should now use SQL like this:
CREATE TABLE default.iceberg_test (
  level string,
  event_time timestamp,
  message string
)
STORED AS ICEBERG
TBLPROPERTIES ('iceberg.file_format'='parquet',
               'iceberg.catalog'='hadoop.tables');
Change-Id: I722303fb765aca0f97a79bd6e4504765d355a623
Reviewed-on: http://gerrit.cloudera.org:8080/16550
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for creating Iceberg tables with
HadoopCatalog. We only supported the HadoopTables API before this
patch, but now we can also use HadoopCatalog to create Iceberg
tables. When creating a managed table, we can use SQL like this:
CREATE TABLE default.iceberg_test (
  level string,
  event_time timestamp,
  message string
)
STORED AS ICEBERG
TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
               'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test');
We now support two values ('hadoop.catalog', 'hadoop.tables') for
'iceberg.catalog'. If you don't specify this property in your SQL,
the default catalog type is 'hadoop.catalog'.
As for external Iceberg tables, you can use SQL like this:
CREATE EXTERNAL TABLE default.iceberg_test_external
STORED AS ICEBERG
TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test',
'iceberg.table_identifier'='default.iceberg_test');
We cannot set the table location for either managed or external
Iceberg tables with 'hadoop.catalog', and 'SHOW CREATE TABLE' will
not display the table location yet. We need to use 'DESCRIBE
FORMATTED/EXTENDED' to get this location info.
'iceberg.catalog_location' is necessary for 'hadoop.catalog' tables;
it is the location that stores the Iceberg table metadata and data,
and we use this location to load the table metadata from Iceberg.
'iceberg.table_identifier' is used for the Iceberg TableIdentifier.
If this property is not specified in SQL, Impala will use the
database and table name to load the Iceberg table, which is
'default.iceberg_test_external' in the SQL above. The property value
is split by '.', so you can also set a value like 'org.my_db.my_tbl'.
This property is valid for both managed and external tables.
Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py
- Iceberg table show create table test in test_show_create_table.py
Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Reviewed-on: http://gerrit.cloudera.org:8080/16446
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for querying Iceberg tables through Impala.
We can use the following SQL to create an external Iceberg table:
CREATE EXTERNAL TABLE default.iceberg_test (
  level string,
  event_time timestamp,
  message string
)
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
Or with just the table name and location, like this:
CREATE EXTERNAL TABLE default.iceberg_test
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
'iceberg_file_format' is the file format used in Iceberg. Currently
only PARQUET is supported; other formats will be supported in the
future. If you don't specify this property in your SQL, the default
file format is PARQUET.
We achieve this by treating the Iceberg table as a normal
unpartitioned HDFS table. When querying an Iceberg table, we push
down partition column predicates to Iceberg to decide which data
files need to be scanned, and then transfer this information to the
BE to do the real scan operation.
Testing:
- Unit test for Iceberg in FileMetadataLoaderTest
- Create table tests in functional_schema_template.sql
- Iceberg table query test in test_scanners.py
Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Reviewed-on: http://gerrit.cloudera.org:8080/16143
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for creating Iceberg tables through Impala.
We can use the following SQL to create a new Iceberg table:
create table iceberg_test(
level string,
event_time timestamp,
message string,
register_time date,
telephone array <string>
)
partition by spec(
level identity,
event_time identity,
event_time hour,
register_time day
)
stored as iceberg;
'identity' is one of Iceberg's partition transforms. 'identity' means
that the source data values are used to create partitions; other
partition transforms, such as BUCKET/TRUNCATE, will be supported in
the future. We can also use 'show create table iceberg_test' to
display the table schema, and 'show partitions iceberg_test' to
display partition column info. Note that a partition column must be
a source column of the table.
Testing:
- Add test cases in metadata/test_show_create_table.py.
- Add custom cluster test test_iceberg.py.
Change-Id: I8d85db4c904a8c758c4cfb4f19cfbdab7e6ea284
Reviewed-on: http://gerrit.cloudera.org:8080/15797
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This removes Impala-lzo from the Impala development environment.
Impala-lzo is not built as part of the Impala build. The LZO plugin
is no longer loaded. LZO tables are not loaded during dataload,
and LZO is no longer tested.
This removes some obsolete scan APIs that were only used by Impala-lzo.
With this commit, Impala-lzo would require code changes to build
against Impala.
The plugin infrastructure is not removed, and this leaves some
LZO support code in place. If someone were to decide to revive
Impala-lzo, they would still be able to load it as a plugin
and get the same functionality as before. This plugin support
may be removed later.
Testing:
- Dryrun of GVO
- Modified TestPartitionMetadataUncompressedTextOnly's
test_unsupported_text_compression() to add LZO case
Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e
Reviewed-on: http://gerrit.cloudera.org:8080/15814
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
HMS seems to be returning SQLPrimaryKeys in inconsistent orders.
This makes some of the primary keys tests flaky. This change sorts
the list of primary keys and stores them in canonical order within
Impala.
Testing:
- Modified the tests that were relying on HMS to return same order
every time.
- Ran parametrized job.
Change-Id: I0f798d7a2659c6cd061002db151f3fa787eb6370
Reviewed-on: http://gerrit.cloudera.org:8080/15106
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
in LocalCatalog Mode.
This change adds a new method 'loadConstraints()' to the MetaProvider
interface.
1. In the CatalogdMetaProvider implementation, we fetch the primary
key (PK) and foreign key (FK) information via the
GetPartialCatalogObject() RPC to the catalogd, which is modified to
include PK/FK information. This is because on the catalog side we
eagerly load PK/FK information, which can be sent over to the local
catalog in a single RPC. This information is then stored in the
TableMetaRef object for future consumers.
2. In the DirectMetaProvider implementation, we make two RPCs to HMS
to directly get PK/FK information.
Loading constraints can be extended to include other constraints
later (e.g. unique constraints).
Testing:
- Added tests in LocalCatalogTest, CatalogTest and PartialCatalogInfoTest
- This change also modifies ToSqlUtils for show create table
statements. Added a test for the same.
Change-Id: I7ea7e1bacf6eb502c67caf310a847b32687e0d58
Reviewed-on: http://gerrit.cloudera.org:8080/14731
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In HMS-3 the translation layer converts a managed kudu table into an
external kudu table and adds additional table property
'external.table.purge' to 'true'. This means any installation which
is using HMS-3 (or a Hive version which has HIVE-22158) will always
create Kudu tables as external tables. This is problematic since the
output of show create table will now be different and may confuse
the users.
In order to improve the user experience of such synchronized tables
(external tables with external.table.purge property set to true),
this patch adds support in Impala for creating external Kudu tables.
Previous versions of Impala disallowed creating an external Kudu
table if the Kudu table did not exist. After this patch, Impala
checks whether the Kudu table exists and, if it does not, creates a
Kudu table based on the schema provided in the create table
statement. The command errors out if the Kudu table already exists.
However, this applies only to synchronized tables. The previous way
to create a pure external table behaves the same.
The following syntax for creating a synchronized table is now allowed:
CREATE EXTERNAL TABLE foo (
id int PRIMARY KEY,
name string)
PARTITION BY HASH PARTITIONS 8
STORED AS KUDU
TBLPROPERTIES ('external.table.purge'='true')
The syntax is very similar to creating a managed table, except for
the EXTERNAL keyword and additional table property. A synchronized
table will behave similar to managed Kudu tables (drops and renames
are allowed). The output of show create table on a synchronized
table will display the full column and partition spec similar to the
managed tables.
Testing:
1. After the CDP version bump all of the existing Kudu tables now
create synchronized tables so there is good coverage there.
2. Added additional tests which create synchronized tables and
compares the show create table output.
3. Ran exhaustive tests with both CDP and CDH builds.
Change-Id: I76f81d41db0cf2269ee1b365857164a43677e14d
Reviewed-on: http://gerrit.cloudera.org:8080/14750
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Extended the SQL grammar with an optional and a default flag for
SORT BY, namely ZORDER and LEXICAL. If ZORDER is set, the new
'sort.algorithm' table property is set to ZORDER and the information
is passed down to the backend. The default order is indicated by
LEXICAL and can be omitted. Examples are:
CREATE TABLE t (a INT, b INT) PARTITIONED BY (c INT)
SORT BY ZORDER (a, b);
CREATE TABLE t SORT BY ZORDER (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY ZORDER (id,zip);
ALTER TABLE t SORT BY ZORDER (int_col,id);
The following two are the same statements:
CREATE TABLE t (a INT, b INT) SORT BY (a, b);
CREATE TABLE t (a INT, b INT) SORT BY LEXICAL (a, b);
For strings, varchars, floats and doubles Z-ordering is currently
not supported. It's not suitable for strings and varchars, but
support can be added for floats and doubles later. The supported
types are: boolean, int types, decimals, date, timestamp, and char.
Currently ZORDER has the same functionality as a simple SORT BY
clause, therefore it is hidden behind a feature flag: unlock_zorder.
The custom sorting with Z-ordering will come in a later commit.
Testing:
* Added tests for the ZORDER option for every SORT BY test.
* Modified some tests by adding the LEXICAL option.
* The .test workloads are temporarily put in separate test files
in order to set up the feature flag. These tests are run from
tests/custom_cluster/test_zorder.py which is a duplication of
the relevant tests, but with CustomClusterTestSuite decorator.
Change-Id: Ie122002ca8f52ca2c1e1ec8ff1d476ae1f4f875d
Reviewed-on: http://gerrit.cloudera.org:8080/13955
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes the NullPointerException in SHOW CREATE TABLE for HBase
tables.
Testing:
- Moved the content of hbase-show-create-table.test back to
show-create-table.test
- Ran show-create-table end-to-end tests
Change-Id: Ibe018313168fac5dcbd80be9a8f28b71a2c0389b
Reviewed-on: http://gerrit.cloudera.org:8080/9884
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:
CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);
ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();
Sort columns can only be specified for Hdfs tables and effectiveness may
vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.columns' table property and will result in an additional SORT node
being added to the plan before the final table sink. Specifying sort
columns also enables clustering during inserts, so the SORT node will
contain all partitioning columns first, followed by the sort columns. We
do this because sort columns add a SORT node to the plan and adding the
clustering columns to the SORT node is cheap.
Sort columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144). Until then, it is possible to specify
sort columns using both ways at the same time and the column lists
will be concatenated.
Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Reviewed-on: http://gerrit.cloudera.org:8080/6495
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
For a table that has both a table comment and a partition spec,
"show create table" incorrectly outputs the comment before the
partition clause. This is not the correct order, and it results in
invalid SQL. This change fixes the ordering (the partition clause
comes before the comment) and adds tests for this case.
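A sketch of the corrected output order (names hypothetical):
CREATE TABLE t (i INT)
PARTITIONED BY (p INT)
COMMENT 'table comment'
STORED AS TEXTFILE;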
Change-Id: I29a33cfd142b473997fdc3acfe3f0966bc7ed784
Reviewed-on: http://gerrit.cloudera.org:8080/5648
Tested-by: Impala Public Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
This commit fixes an issue where a SHOW CREATE VIEW statement throws an
analysis error if the view contains a subquery.
Change-Id: I4a89e46a022f0ccec198b6e3e2b30230103831ce
Reviewed-on: http://gerrit.cloudera.org:8080/5333
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
SHOW CREATE TABLE already outputs information for views. As a
convenience, this patch adds SHOW CREATE VIEW as an alias for SHOW
CREATE TABLE.
Switched some SHOW CREATE TABLE tests on views to use SHOW CREATE
VIEW and added an additional test for SHOW CREATE VIEW on a table so
that the expected behaviour is tested.
Change-Id: I9925e0789573e9b097a2ef52b5023964dcf8f32c
Reviewed-on: http://gerrit.cloudera.org:8080/1661
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This change implements support for PARTITIONED BY clauses in CTAS
statements. The syntax and semantics follow the PARTITION feature of
insert from select statements: inside the PARTITIONED BY (...) column
list the user must specify names of the columns to partition by. These
column names must appear in that particular order at the end of the
select statement. A remapping between columns of the source and
destination tables is not possible, because the destination table does
not yet exist. Specifying static values for the partition columns is
also not possible, as their type needs to be deduced from columns in the
select statement. Example:
CREATE TABLE t (a DOUBLE, b INT);
INSERT INTO t VALUES (1.5, 3);
CREATE TABLE p PARTITIONED BY (b) AS SELECT a, b FROM t;
This change also contains a fix for setting the PYTHONPATH environment
variable correctly, so you can run single python tests from the command
line.
Change-Id: I5f61854d36d1ee30cfcd1c6b2b3eb971f6cf4b2f
Reviewed-on: http://gerrit.cloudera.org:8080/1740
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
SHOW CREATE TABLE now supports views. It returns a CREATE VIEW statement
with column names and the original sql statement.
Authorization allows SHOW CREATE TABLE to be run on a view if the
user has the VIEW_METADATA privilege on the view and the SELECT
privilege on all underlying views and tables.
E.g. "SHOW CREATE TABLE some_view" returns output of form:
CREATE VIEW a_database.some_view (id, bool_col, tinyint_col) AS
SELECT id, bool_col, tinyint_col FROM functional.alltypes
Change-Id: Id633af2f5c1f5b0e01c13ed85c4bf9c045dc0666
Reviewed-on: http://gerrit.cloudera.org:8080/713
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Additionally, this patch also disabled the hbase/none test dimension
if the TARGET_FILESYSTEM environment variable is set to either s3 or
isilon.
Change-Id: I63aecaa478d2ba9eb68de729e9640071359a2eeb
Reviewed-on: http://gerrit.cloudera.org:8080/74
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This change updates our DDL syntax support to allow using 'STORED AS
PARQUET' as well as 'STORED AS PARQUETFILE'. Moving forward we should
prefer the new syntax, but continue to support the old. I made the
same change for 'AVROFILE', but since we have not yet documented the
'AVROFILE' syntax I left out support for the old syntax.
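For illustration (table names hypothetical), both spellings now
parse, with the first preferred going forward:
CREATE TABLE parquet_tbl (id INT) STORED AS PARQUET;
CREATE TABLE parquet_tbl2 (id INT) STORED AS PARQUETFILE;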
Change-Id: I10c73a71a94ee488c9ae205485777b58ab8957c9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1053
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Adds support for "show create table", a DDL statement that outputs a DDL statement that
creates the specified table.
In general, the output DDL works in Impala, so a user can copy the output and execute it
to create the same table. However, there are a few special cases that output Hive DDL
because we do not support creating some tables in Impala: HBase tables and tables with
LZO compressed text. When we do support creating these tables in Impala, users should
be able to execute the DDL in Impala as well.
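A minimal usage sketch (table name hypothetical); the output is
itself executable DDL:
SHOW CREATE TABLE my_db.t1;
-- Prints a CREATE TABLE statement for my_db.t1 that can be copied
-- and re-executed to recreate the table.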
Change-Id: I8c130297a657810dea5b994bf99d72b0e61b847b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/842
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>