Files
impala/testdata/bin/load-dependent-tables-hive2.sql
Todd Lipcon 3567a2b5d4 IMPALA-8369 (part 4): Hive 3: fixes for functional dataset loading
This fixes three issues for functional dataset loading:

- works around HIVE-21675, a bug in which 'CREATE VIEW IF NOT EXISTS'
  does not function correctly in our current Hive build. This has been
  fixed already, but the workaround is pretty simple, and actually the
  'drop and recreate' pattern is used more widely for data-loading than
  the 'create if not exists' one.

- Moves the creation of the 'hive_index' table from
  load-dependent-tables.sql to a new load-dependent-tables-hive2.sql
  file which is only executed on Hive 2.

- Moving from MR to Tez execution changed the behavior of data loading
  by disabling the auto-merging of small files. With Hive-on-MR, this
  behavior defaulted to true, but with Hive-on-Tez it defaults false.
  The change is likely motivated by the fact that Tez automatically
  groups small splits on the _input_ side and thus is less likely to
  produce lots of small files. However, that grouping functionality
  doesn't work properly in localhost clusters (TEZ-3310) so we aren't
  seeing the benefit. So, this patch enables the post-process merging of
  small files.

  Prior to this change, the 'alltypesaggmultifilesnopart' test table was
  getting 40+ files inside it, which broke various planner tests. With
  the change, it gets the expected 4 files.

Change-Id: Ic34930dc064da3136dde4e01a011d14db6a74ecd
Reviewed-on: http://gerrit.cloudera.org:8080/13251
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-15 11:00:45 +00:00

31 lines
1.2 KiB
SQL

-- Licensed to the Apache Software Foundation (ASF) under one
-- or more contributor license agreements. See the NOTICE file
-- distributed with this work for additional information
-- regarding copyright ownership. The ASF licenses this file
-- to you under the Apache License, Version 2.0 (the
-- "License"); you may not use this file except in compliance
-- with the License. You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing,
-- software distributed under the License is distributed on an
-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-- KIND, either express or implied. See the License for the
-- specific language governing permissions and limitations
-- under the License.
-- Create and load tables that depend upon data in the hive test-warehouse
-- already existing.
--
-- The queries in this file will only be executed on Hive 2 (and not later
-- versions).
USE functional;
DROP INDEX IF EXISTS hive_index ON alltypes;
CREATE INDEX hive_index ON TABLE alltypes (int_col)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD IN TABLE hive_index_tbl;