INSERT OVERWRITE commands in Hive only affect partitions that Hive knows about. If an external table is dropped and recreated, 'MSCK REPAIR TABLE' must be executed to recover any preexisting partitions. Otherwise, an INSERT OVERWRITE will not remove the data files in those partitions and will fail to move the new data into place. More information can be found here:
http://www.ericlin.me/hive-insert-overwrite-does-not-remove-existing-data

I tested the fix by running the following commands, making sure that the second run of the .sql script completed without errors, and validated that the number of lines was correct (10) after both runs:

export HS2_HOST_PORT=localhost:11050
export JDBC_URL="jdbc:hive2://${HS2_HOST_PORT}/default;"
beeline -n $USER -u "${JDBC_URL}" -f ${IMPALA_HOME}/testdata/avro_schema_resolution/create_table.sql
beeline -n $USER -u "${JDBC_URL}" -f ${IMPALA_HOME}/testdata/avro_schema_resolution/create_table.sql

Change-Id: I0f68eeb75ba2f43b96b8f3d82f902e291d3bd396
Reviewed-on: http://gerrit.cloudera.org:8080/6317
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
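A note on the export ordering above: HS2_HOST_PORT must be set before JDBC_URL is derived from it, because shell expansion happens at assignment time, not at use time. A minimal sketch (the host:port value is the one from the test commands; no Hive cluster is needed to see the expansion behavior):

```shell
# Set the host:port first; an unset variable would expand to an empty
# string inside the JDBC URL.
export HS2_HOST_PORT=localhost:11050
# The URL is expanded now, capturing the value set above.
export JDBC_URL="jdbc:hive2://${HS2_HOST_PORT}/default;"
echo "${JDBC_URL}"
```

Running the two exports in the opposite order would leave the host:port missing from the URL.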
This folder contains the files necessary to test Impala support for Avro schema resolution (along with the TestAvroSchemaResolution query test).

create_table.sql creates a functional_avro_snap.schema_resolution_test table and loads records1.avro and records2.avro. The .avro files were created via the following commands:

java -jar ~/avro-tools-1.7.4.jar fromjson --schema-file file_schema1.avsc --codec snappy records1.json > records1.avro
java -jar ~/avro-tools-1.7.4.jar fromjson --schema-file file_schema2.avsc --codec snappy records2.json > records2.avro

create_table.sql, file_schema1.avsc and file_schema2.avsc contain the relevant schema definitions.
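For background, the core idea being tested is Avro schema resolution: a file written with one schema can be read with another, with reader-schema defaults filling in fields the writer never wrote. A minimal, dependency-free sketch of that rule in Python (the schemas and field names here are illustrative, not the contents of file_schema1.avsc / file_schema2.avsc, and this is not the actual Avro library logic):

```python
def resolve_record(record, reader_fields):
    """Resolve a decoded writer record against reader-schema fields.

    reader_fields: list of (name, default) pairs; a default of None
    marks the field as required (no default in the reader schema).
    """
    resolved = {}
    for name, default in reader_fields:
        if name in record:
            # Field present in the writer's data: take it as-is.
            resolved[name] = record[name]
        elif default is not None:
            # Field missing from the writer's data: fall back to the
            # reader schema's default, as Avro resolution prescribes.
            resolved[name] = default
        else:
            raise ValueError(f"field {name!r} missing and has no default")
    return resolved

# A writer record lacking 'name' picks up the reader-schema default.
reader = [("id", None), ("name", "unknown")]
print(resolve_record({"id": 1}, reader))  # {'id': 1, 'name': 'unknown'}
```

Real Avro resolution also handles type promotion and field reordering; this sketch only shows the missing-field/default case that record-append tests like this one typically exercise.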