Hive 3 changed the typical storage model for tables to split them
between two directories:
- hive.metastore.warehouse.dir stores managed tables (which is now
defined to be only transactional tables)
- hive.metastore.warehouse.external.dir stores external tables
(everything that is not a transactional table)
In more recent commits of Hive, there is now validation that external
tables cannot be stored in the managed directory. In order to adopt
these newer versions of Hive, we need to use separate directories for
the external and managed warehouses.
Most of our test tables are not transactional, so they would reside
in the external directory. To keep the test changes small, this uses
/test-warehouse for the external directory and /test-warehouse/managed
for the managed directory. Having the managed directory be a subdirectory
of /test-warehouse means that the data snapshot code should not need to
change.
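As a sketch, the resulting settings and layout look like this (the
property names are Hive's; the values come from this change):

hive.metastore.warehouse.external.dir = /test-warehouse
hive.metastore.warehouse.dir          = /test-warehouse/managed

hdfs dfs -ls /test-warehouse          # external (non-transactional) tables
hdfs dfs -ls /test-warehouse/managed  # managed (transactional) tables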
The Hive 2 configuration doesn't change, as Hive 2 does not have this concept.
Since this changes the dataload layout, this also sets the CDH_MAJOR_VERSION
to 7 for USE_CDP_HIVE=true. This means that dataload will use a separate
location for data as compared to USE_CDP_HIVE=false. That should reduce
conflicts between the two configurations.
Testing:
- Ran exhaustive tests with USE_CDP_HIVE=false
- Ran exhaustive tests with USE_CDP_HIVE=true (with current Hive version)
- Verified that dataload succeeds and tests are able to run with a newer
Hive version.
Change-Id: I3db69f1b8ca07ae98670429954f5f7a1a359eaec
Reviewed-on: http://gerrit.cloudera.org:8080/15026
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change is a follow-up to IMPALA-7368 and adds support for DATE
type to the avro scanner.
Similarly to parquet, avro uses the DATE logical type for dates. The
DATE logical type annotates an INT32 that stores the number of days
since the unix epoch, 1 January 1970.
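For illustration, the stored INT32 can be computed from a date string;
this is a sketch using GNU date, and the helper name is made up:

days_since_epoch() {
  echo $(( $(date -u -d "$1" +%s) / 86400 ))
}
days_since_epoch 1970-01-02   # 1
days_since_epoch 2019-01-01   # 17897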
This representation introduces an avro interoperability issue between
Impala and older versions of Hive:
- Before version 3.1, Hive used the Julian calendar to represent dates
up to 1582-10-05 and the Gregorian calendar for dates starting with
1582-10-15. Dates between 1582-10-05 and 1582-10-15 were lost.
- Impala uses the proleptic Gregorian calendar, extending the Gregorian
calendar backward to dates preceding its official introduction on
1582-10-15.
This means that pre-1582-10-15 dates written to an avro table by Hive
will be read back incorrectly by Impala. For example, a date written by
an older Hive as 1582-10-04 comes back in Impala as 1582-10-14, off by
the 10 days by which the two calendars differ at that point.
Note that Hive 3.1 switched to the proleptic Gregorian calendar too, so
for Hive 3.1+ this is no longer an issue.
Dependency changes:
- BE uses avro 1.7.4-p5 from native-toolchain.
Change-Id: I7a9d5b93a22cf3a00244037e187f8c145cacc959
Reviewed-on: http://gerrit.cloudera.org:8080/13944
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
INSERT OVERWRITE commands in Hive will only affect partitions that Hive
knows about. If an external table gets dropped and recreated, then
'MSCK REPAIR TABLE' needs to be executed to recover any preexisting
partitions. Otherwise, an INSERT OVERWRITE will not remove the data
files in those partitions and will fail to move the new data in place.
More information can be found here:
http://www.ericlin.me/hive-insert-overwrite-does-not-remove-existing-data
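A minimal sketch of the pattern the fix applies (the table name is
illustrative):

beeline -n $USER -u "${JDBC_URL}" -e "MSCK REPAIR TABLE some_external_tbl"

After this, Hive knows about the preexisting partitions again, so a
subsequent INSERT OVERWRITE removes the old data files before moving the
new data in place.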
I tested the fix by running the following commands, making sure that the
second run of the .sql script completed without errors and validating
that the number of lines was correct (10) after both runs.
export HS2_HOST_PORT=localhost:11050
export JDBC_URL="jdbc:hive2://${HS2_HOST_PORT}/default;"
beeline -n $USER -u "${JDBC_URL}" -f ${IMPALA_HOME}/testdata/avro_schema_resolution/create_table.sql
beeline -n $USER -u "${JDBC_URL}" -f ${IMPALA_HOME}/testdata/avro_schema_resolution/create_table.sql
Change-Id: I0f68eeb75ba2f43b96b8f3d82f902e291d3bd396
Reviewed-on: http://gerrit.cloudera.org:8080/6317
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
This change whitelists the supported filesystems that can be set as
the default filesystem for Impala to run on.
This patch configures Impala to use S3 as the default filesystem,
rather than as a secondary filesystem as before.
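A sketch of the resulting setting (fs.defaultFS is the standard Hadoop
core-site.xml property; the bucket name is made up):

fs.defaultFS = s3a://impala-test-bucket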
Change-Id: I2f45bef6c94ece634045acb906d12591587ccfed
Reviewed-on: http://gerrit.cloudera.org:8080/1121
Reviewed-by: anujphadke <aphadke@cloudera.com>
Tested-by: Internal Jenkins
Impala could crash or return wrong results if it used the codegen'd
avro decoding function to scan an avro file that has a different
schema than the table schema. With the AVRO-1617 fix, we make sure
Impala doesn't use codegen if the table schema has fewer columns
than the file schema.
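For reference, a sketch of how to avoid the codegen'd path entirely when
verifying such a scan (DISABLE_CODEGEN is a standard query option; the
table name is made up):

impala-shell -q "set disable_codegen=true; select count(*) from avro_mismatched_tbl"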
Change-Id: I268419e421404ad6b084482dee417634f17ecf60
Reviewed-on: http://gerrit.cloudera.org:8080/1696
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
This patch contains the following changes:
- Add a metastore_snapshot_file parameter to build.sh (see the sketch below).
- Enable skipping the metadata load.
- create-load-data.sh is refactored into functions.
- A lot of scripts source impala-config, which creates a lot of log spew. This
has now been muted.
- Unnecessary log spew from compute-table-stats has been muted.
- build_thirdparty.sh now determines its parallelism from the system; it was
previously hardcoded to 4.
- Only force-load data for a particular dataset if a schema change is detected.
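A hypothetical invocation of the snapshot parameter (the flag name
follows the description above; the path is made up):

./build.sh -metastore_snapshot_file /tmp/hive_metastore_snapshot.txt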
Change-Id: I909336451e5c1ca57d21f040eb94c0e831546837
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5540
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Avro tables that were not created with a column-definition list do not have
their columns properly populated in the Metastore backend DB (HIVE-6308).
For such tables COMPUTE STATS and Hive's ANALYZE TABLE cannot succeed.
This patch fails COMPUTE STATS in analysis for such broken Avro tables
and adds tests for Avro tables with a mismatched column-definition list
and Avro schema.
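For reference, a sketch of how such a broken table comes about, using the
STORED AS AVRO shorthand from newer Hive (the table name, schema URL, and
JDBC_URL are illustrative; the column list is omitted on purpose):

beeline -n $USER -u "${JDBC_URL}" -e "
CREATE TABLE avro_no_cols
STORED AS AVRO
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/t.avsc')"

COMPUTE STATS on such a table now fails cleanly in analysis.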
Change-Id: I561ecea944ae2f83d69950b7a1ab9edaa89bdcea
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1892
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1920
Fixes a bug (regression) where the catalog server was not properly resolving column
names when a table's column definition did not match its Avro schema definition.
The expected behavior in this case is that the Avro schema definition should be
used instead of the table columns. We had no mismatched test tables, so this
wasn't caught.
The schema and columns are loaded when a table's metadata is loaded, so the fix
is to just add a toThrift() to Column and not reference
metastore.getSd().getCols() directly, since it might be the "wrong" set of columns.
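A sketch of a mismatched table of this kind (all names and the JDBC_URL
are made up): the column list declares an INT, the Avro schema declares
a string, and the Avro schema definition should win:

beeline -n $USER -u "${JDBC_URL}" <<'EOF'
CREATE EXTERNAL TABLE mismatched_tbl (c1 INT)
STORED AS AVRO
TBLPROPERTIES ('avro.schema.literal'=
'{"type":"record","name":"r","fields":[{"name":"c1","type":"string"}]}');
EOF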
Change-Id: I341a3a8834f5748f90c246d2093ddb983ecfdd4f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/770
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>