impala/testdata/bin
Joe McDonnell fd66890bf1 IMPALA-6579: Always force reload Kudu tables for dataload
When loading from an up-to-date snapshot, dataload will
load all of the metadata and load data into HDFS. Then,
it will skip load-data.py for functional/exhaustive,
tpch/core, and tpcds/core. It will invoke a special
round of load-data.py calls to populate Kudu tables,
and it always runs these with a force reload.

However, when loading from an old snapshot, dataload
still loads all of the metadata and loads the data into
HDFS, but it then also invokes load-data.py for
functional/exhaustive, tpch/core, and tpcds/core.
These invocations mostly do DDLs with very few load
statements. However, these invocations are a problem
for Kudu. The metadata of Impala tables referencing
Kudu entities has been imported along with all the other
metadata, but the Kudu entities have not been created, as
they are separate from HDFS. This means that these Kudu
tables exist in the imported metadata but have no
backing Kudu entity, so they are not valid.

Since Kudu has been added to the list of data formats
for tpch/core (see IMPALA-6475), load-data.py with
tpch/core will attempt to insert into these invalid
Kudu tables.

To avoid this, always force reload any Kudu tables.
generate-schema-statements.py will always generate a
drop table statement before any create of a Kudu table.
This guarantees that the create will also create the
corresponding Kudu entity.
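
The drop-before-create rule can be sketched roughly as
follows. This is an illustration only, not the code in
generate-schema-statements.py: the function name, its
parameters, and the exact DROP statement text are all
assumptions made for the example.

```python
# Hypothetical sketch: names and statement text are assumptions,
# not taken from generate-schema-statements.py.

def build_table_statements(db, table, file_format, create_sql,
                           force_reload=False):
    """Return the ordered DDL statements for a single table."""
    statements = []
    # Kudu tables are always dropped first, even without force_reload,
    # so the following create cannot be satisfied by stale metadata
    # that has no backing Kudu entity.
    if force_reload or file_format == "kudu":
        statements.append(
            "DROP TABLE IF EXISTS {0}.{1}".format(db, table))
    statements.append(create_sql)
    return statements
```

With this shape, a Kudu table always yields a drop followed
by a create, while other formats get a drop only when a
force reload is requested.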

Change-Id: I2d07f3513c543e2590f2f62b96b37472316868ee
Reviewed-on: http://gerrit.cloudera.org:8080/9445
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins
2018-02-25 03:04:58 +00:00