impala

mirror of https://github.com/apache/impala.git synced 2026-02-02 15:00:38 -05:00

Author	SHA1	Message	Date
Sai Hemanth Gantasala	a0cdb7b594	IMPALA-12231: Bump GBN to get HMS thrift API changes We need a couple of hive changes HIVE-27319 and HIVE-27337 for catalogD to work with latest HMS server to fix IMPALA-11768 and IMPALA-11939 respectively. Bump CDP_BUILD_NUMBER (GBN) to 44206393 Bump various CDP versiona numbers to be based on 7.2.18.0-273 TESTING: Exhaustive tests ran clean Added a couple of tests for IMPALA-11939 and IMPALA-11768 Change-Id: I117873b628aed3e24280f9fcd79643f918c8d5f3 Reviewed-on: http://gerrit.cloudera.org:8080/20420 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-09-12 09:36:57 +00:00
Steve Carlin	bc83d46a9a	IMPALA-12424: Allow third party JniFrontend interface. This patch allows a third party to inject their own frontend class instead of using the default JniFrontend included in the project. The test case includes an interface that runs queries as normal except for the "select 1" query which gets changed to "select 42". Change-Id: I89e677da557b39232847644b6ff17510e2b3c3d5 Reviewed-on: http://gerrit.cloudera.org:8080/20459 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-09-08 20:20:56 +00:00
Zoltan Borok-Nagy	a95859be0b	IMPALA-12359: Add missing package-info file used by HiveVersionInfo We create a minimal-impala-hive-exec.jar based on Hive's hive-exec.jar: https://github.com/apache/impala/blob/master/java/shaded-deps/hive-exec/pom.xml#L34 This excludes lots of class files, including org/apache/hive/common/package-info.class that is used by HiveVersionAnnotation and HiveVersionInfo classes. Because of this HiveVersionInfo returns "Unknown" version resulting in failing Iceberg operations. Change-Id: I444330a654d7d86e653588eb91d2f063d5be8c08 Reviewed-on: http://gerrit.cloudera.org:8080/20340 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-08-11 11:28:43 +00:00
Michael Smith	7fb6a9a1d2	IMPALA-11941: (Addendum) Use released jamm 0.4.0 Switches to the 0.4.0 release of jamm, as building a shaded JAR from source was a temporary measure. Change-Id: I5b88b479580f7d0baff502ad9551d2764971babf Reviewed-on: http://gerrit.cloudera.org:8080/20237 Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-07-25 00:27:56 +00:00
Laszlo Gaal	ee069687fc	IMPALA-12212: Bump Maven to 3.9.2, pull dependencies in parallel Maven 3.9.x offers a new dependency resolver, HttpClient, which allows downloading project dependencies in parallel. This patch bumps the Maven version installed by bootstrap_system.sh to v3.9.2, and adds the flags enabling the new resolver to download dependencies (including POM files) in parallel. Parallelism is set to 10 threads. The flags are added to a project-specific Maven setting file in the newly created java/.mvn directory. The settings file is added to the RAT exclusion list in bin/rat_exclude_files.txt. The --show-version flag is added for debugging purposes. The same flags are added to the JAMM subproject as well. The new resolver in Maven 3.9 has also changed the warning message emitted for missing component checksums, so the new warning string is added to the filter in bin/mvn-quiet.sh Unfortunately Maven 3.9 has also changed the way it responds to missing checksum files: the resolver now emits a stack trace when checksums cannot be determined, and missing checksums are not explicitly ignored. Detailed documentation for the new Maven resolver in Maven 3.9.0+ is located at: https://maven.apache.org/guides/mini/guide-resolver-transport.html resolver configuration reference: https://maven.apache.org/resolver/configuration.html Tests: - verified in a core-mode test run with Maven 3.9.2 installed - verified in a local build using an earlier version of Maven to verify that the new default setting does not cause regressions with the old dependency resolver. Change-Id: I75d05215effc724f5bd471646fb352f37443e185 Reviewed-on: http://gerrit.cloudera.org:8080/20142 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com>	2023-07-24 18:50:34 +00:00
Joe McDonnell	a281d8eb8e	IMPALA-12284: Use Maven's batch mode when building jamm This adds the --batch-mode flag to the maven invocation the builds jamm. That disables some of the download progress output, reducing the total size of the output. Testing: - Ran a build locally Change-Id: I1634240b191168b13cf3be7c9266e21a746844b1 Reviewed-on: http://gerrit.cloudera.org:8080/20196 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-07-14 23:45:28 +00:00
Michael Smith	87fd844d3e	IMPALA-11941: (Addendum) Produce shaded copy of Jamm Produces a shaded copy of a pre-release jamm jar that supports Java 17. Building a copy of jamm and directly depending on it meant any consumer of Impala would have to provide their own build of it. Testing: ran custom_cluster/test_local_catalog.py with Java 8 and 17 Change-Id: Ida42d720a2639b65391c07a9237556311e04fac6 Reviewed-on: http://gerrit.cloudera.org:8080/20147 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2023-07-01 01:10:12 +00:00
Daniel Becker	679d58fa6d	IMPALA-12238: RandomNestedDataGenerator should take a seed argument RandomNestedDataGenerator can be used to produce parquet files with random data from Avro schemas. This change makes it possible to provide a seed value for the random generator so the generated files are reproducible. The seed can be given as the last (optional) command line argument. It is parsed as a Java 'long'. Testing: - manually verified that when run with the same arguments (including the seed), the data generator produces the same results Change-Id: Iee33604bbfe12895100afbd0f98ac302dee9a238 Reviewed-on: http://gerrit.cloudera.org:8080/20136 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Daniel Becker <daniel.becker@cloudera.com>	2023-06-28 16:18:08 +00:00
Michael Smith	3b0705ba63	IMPALA-11941: Support Java 17 in Impala Enables building for Java 17 - and particularly using Java 17 in containers - but won't run a minicluster fully with Java 17 as some projects (Hadoop) don't yet support it. Starting with Java 15, ehcache.sizeof encounters UnsupportedOperationException: can't get field offset on a hidden class in class members pointing to capturing lambda functions. Java 17 also introduces new modules that need to be added to add-opens. Both of these pose problems for continued use of ehcache. Adds https://github.com/jbellis/jamm as a new cache weigher for Java 15+. We build from HEAD as an external project until Java 17 support is released (https://github.com/jbellis/jamm/issues/44). Adds the 'java_weigher' option to select 'sizeof' or 'jamm'; defaults to 'auto', which uses jamm for Java 15+ and sizeof for everything else. Also adds metrics for viewing cache weight results. Adds JAVA_HOME/lib/server to LD_LIBRARY_PATH in run-jvm-binary to simplify switching between JDK versions for testing. You can now - export IMPALA_JDK_VERSION=11 - source bin/impala-config.sh - start-impala-cluster.py and have Impala running a different JDK (11) version. Retains add-opens calls that are still necessary due to dependencies' use of lambdas for jamm, and all others for ehcache. Add-opens are still required as a fallback, as noted in https://github.com/jbellis/jamm#object-graph-crawling. We catch the exceptions jamm and ehcache throw - CannotAccessFieldException, UnsupportedOperationException - to avoid crashing Impala, and add it to the list of banned log messages (as we should add-opens when we find them). Testing: - container test run with Java 11 and 17 (excludes custom cluster) - manual custom_cluster/test_local_catalog.py + test_banned_log_messages.py run with Java 11 and 17 (Java 8 build) - full Java 11 build (passed except IMPALA-12184) - add test catalog cache entry size metrics fit reasonable bounds - add unit test for utility to find jamm jar file in classpath Change-Id: Ic378896f572e030a3a019646a96a32a07866a737 Reviewed-on: http://gerrit.cloudera.org:8080/19863 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-06-24 10:11:54 +00:00
Peter Rozsa	6b571eb7e4	IMPALA-12184: Java UDF increment on an empty string is inconsistent This change removes the Text-typed overload for BufferAlteringUDF to avoid ambiguous function matchings. It also changes the 2-parameter function in BufferAlteringUDF to cover Text typed arguments. Tests: - test_udfs.py manually executed Change-Id: I3a17240ce39fef41b0453f162ab5752f1c940f41 Reviewed-on: http://gerrit.cloudera.org:8080/20038 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-06-20 17:00:35 +00:00
Michael Smith	683bef1ca4	IMPALA-11253: Support testing with Java 11 (take 2) Adds new environment variable IMPALA_JDK_VERSION which can be 'system', '8', or '11'. The default is 'system', which uses the same logic as before. If set to 8 or 11, it will ignore the system java and search for java of that specific version (based on specific directories for Ubuntu and Redhat). This is used by bin/bootstrap_system.sh to determine whether to install java 8 or java 11 (other versions can come later). If IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the opens needed to deal with the ehcache issue. This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of bootstrap_system.sh. Instead, it provides a new environment variable IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over IMPALA_JDK_VERSION. This also updates the versions of Maven plugins related to the build. Source and target releases are still set to Java 8 compatibility. Adds a verifier to the end of run-all-tests that InaccessibleObjectException is not present in impalad logs. Tested with JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \ CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \ run-all-tests.sh Testing: ran test suite with Java 11 This reverts the revert commit `1b6011c`, restoring these changes minus code to update IMPALA_JDK_VERSION based on $JAVA -version as that could break subsequent sourcing of impala-config.sh. Change-Id: Ie16504ad5738b1f228f97044afd3d9017ccc6c53 Reviewed-on: http://gerrit.cloudera.org:8080/19928 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-05-25 16:04:29 +00:00
Michael Smith	1b6011c6a0	Revert "IMPALA-11253: Support testing with Java 11" This reverts commit `ee6395db76` as it is not flexible enough at detecting Java automatically in likely build environments. Change-Id: I836c9f7fd10740b15f7e40b2e7f889ac7ee61fc3 Reviewed-on: http://gerrit.cloudera.org:8080/19908 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com>	2023-05-21 14:00:14 +00:00
Michael Smith	ee6395db76	IMPALA-11253: Support testing with Java 11 Adds new environment variable IMPALA_JDK_VERSION which can be 'system', '8', or '11'. The default is 'system', which uses the same logic as before. If set to 8 or 11, it will ignore the system java and search for java of that specific version (based on specific directories for Ubuntu and Redhat). This is used by bin/bootstrap_system.sh to determine whether to install java 8 or java 11 (other versions can come later). If IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the opens needed to deal with the ehcache issue. This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of bootstrap_system.sh. Instead, it provides a new environment variable IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over IMPALA_JDK_VERSION. This also updates the versions of Maven plugins related to the build. Source and target releases are still set to Java 8 compatibility. Adds a verifier to the end of run-all-tests that InaccessibleObjectException is not present in impalad logs. Tested with JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \ CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \ run-all-tests.sh Testing: ran test suite with Java 11 Change-Id: I15d309e2092c12d7fdd2c99b727f3a8eed8bc07a Reviewed-on: http://gerrit.cloudera.org:8080/19539 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2023-05-19 22:32:00 +00:00
Michael Smith	d91cdb0cec	IMPALA-12077: Remove deprecated Avro methods Switches Avro methods deprecated in Avro 1.8 to new alternatives. Change-Id: I8c01886774eb4ca5964a82c2fa568d7c4354c70c Reviewed-on: http://gerrit.cloudera.org:8080/19772 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-04-21 22:53:46 +00:00
Michael Smith	f0289f3cbb	IMPALA-11273: Remove APIs deprecated in Java 11 Replaces constructor calls for object versions of primitives - Integer, Long, Float, Double, Boolean - with optimized valueOf calls as using constructors for these is deprecated according to jdeprscan. Removes override of finalize. Use of finalize is deprecated, and hive-udf-call.cc ensures we always call close when unloading the UDF. Adds try-with-resources to UdfExecutorTest to handle test cleanup. Updates BigDecimal.setScale to use RoundingMode. Change-Id: Idfb053223b6e098e6032502f873361696dd2da84 Reviewed-on: http://gerrit.cloudera.org:8080/19721 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-04-15 09:45:29 +00:00
Peter Rozsa	afe59f7f0d	IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs This change fixes the behavior of BytesWritable and TextWritable's getBytes() method. Now the returned byte array could be handled as the underlying buffer as it gets loaded before the UDF's evaluation, and tracks the changes as a regular Java byte array; the resizing operation still resets the reference. The operations that wrote back to the native heap were also removed as these operations are now handled in the byte array. ImpalaStringWritable class is also removed, writables that used it before now store the data directly. Tests: - Test UDFs added as BufferAlteringUdf and GenericBufferAlteringUdf - E2E test ran for UDFs Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212 Reviewed-on: http://gerrit.cloudera.org:8080/19507 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-03-08 19:54:38 +00:00
Csaba Ringhofer	67bb870aa3	IMPALA-11911: Fix NULL argument handling in Hive GenericUDFs Before this patch if an argument of a GenericUDF was NULL, then Impala passed it as null instead of a DeferredObject. This was incorrect, as a DeferredObject is expected with a get() function that returns null. See the Jira for more details and GenericUDF examples in Hive. TestGenericUdf's NULL handling was further broken in IMPALA-11549, leading to throwing null pointer exceptions when the UDF's result is NULL. This test bug was not detected, because Hive udf tests were running with default abort_java_udf_on_exception=false, which means that exceptions from Hive UDFs only led to warnings and returning NULL, which was the expected result in all affected test queries. This patch fixes the behavior in HiveUdfExecutorGeneric and improves FE/EE tests to catch null handling related issues. Most Hive UDF tests are run with abort_java_udf_on_exception=true after this patch to treat exceptions in UDFs as errors. The ones where the test checks that NULL is returned if an exception is thrown while abort_java_udf_on_exception is false are moved to new .test files. TestGenericUdf is also fixed (and simplified) to handle NULL return values correctly. Change-Id: I53238612f4037572abb6d2cc913dd74ee830a9c9 Reviewed-on: http://gerrit.cloudera.org:8080/19499 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-03-06 13:45:56 +00:00
yx91490	f4d306cbca	IMPALA-11629: Support for huawei OBS FileSystem This patch adds support for huawei OBS (Object Storage Service) FileSystem. The implementation is similar to other remote FileSystems. New flags for OBS: - num_obs_io_threads: Number of OBS I/O threads. Defaults to be 16. Testing: - Upload hdfs test data to an OBS bucket. Modify all locations in HMS DB to point to the OBS bucket. Remove some hdfs caching params. Run CORE tests. Change-Id: I84a54dbebcc5b71e9bcdd141dae9e95104d98cb1 Reviewed-on: http://gerrit.cloudera.org:8080/19110 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-02-09 08:10:19 +00:00
Peter Rozsa	1d05381b7b	IMPALA-11745: Add Hive's ESRI geospatial functions as builtins This change adds geospatial functions from Hive's ESRI library as builtin UDFs. Plain Hive UDFs are imported without changes, but the generic and varargs functions are handled differently; generic functions are added with all of the combinations of their parameters (cartesian product of the parameters), and varargs functions are unfolded as an nth parameter simple function. The varargs function wrappers are generated at build time and they can be configured in gen_geospatial_udf_wrappers.py. These additional steps are required because of the limitations in Impala's UDF Executor (lack of varargs support and only partial generics support) which could be further improved; in this case, the additional wrapping/mapping steps could be removed. Changes regarding function handling/creating are sourced from https://gerrit.cloudera.org/c/19177 A new backend flag was added to turn this feature on/off as "geospatial_library". The default value is "NONE" which means no geospatial function gets registered as builtin, "HIVE_ESRI" value enables this implementation. The ESRI geospatial implementation for Hive currently only available in Hive 4, but CDP Hive backported it to Hive 3, therefore for Apache Hive this feature is disabled regardless of the "geospatial_library" flag. Known limitations: - ST_MultiLineString, ST_MultiPolygon only works with the WKT overload - ST_Polygon supports a maximum of 6 pairs of coordinates - ST_MultiPoint, ST_LineString supports a maximum of 7 pairs of coordinates - ST_ConvexHull, ST_Union supports a maximum of 6 geoms These limits can be increased in gen_geospatial_udf_wrappers.py Tests: - test_geospatial_udfs.py added based on https://github.com/Esri/spatial-framework-for-hadoop Co-Authored-by: Csaba Ringhofer <csringhofer@cloudera.com> Change-Id: If0ca02a70b4ba244778c9db6d14df4423072b225 Reviewed-on: http://gerrit.cloudera.org:8080/19425 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-02-07 20:18:47 +00:00
Zoltan Borok-Nagy	b88cfadbbd	IMPALA-11777: Bump CDP_BUILD_NUMBER to get HIVE-24498 Without HIVE-24498 we get java.lang.NoClassDefFoundError exceptions when we write Iceberg tables via Hive. This makes it hard to write interop tests between Hive and Impala which use Iceberg tables. I also exclude some private Java components to get things built. Change-Id: I486c2b1b224f72e082e331a57cf25a37ebb9fa54 Reviewed-on: http://gerrit.cloudera.org:8080/19331 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com> Reviewed-by: Tamas Mate <tmater@apache.org>	2022-12-13 13:30:25 +00:00
Daniel Becker	a71e69f570	IMPALA-11792: Update Impala version to 4.3.0-SNAPSHOT As 4.2.0 has been released this commit updates the master to 4.3.0. This step needs to happen on each release. Testing: - Ran a build Change-Id: Iebedcfbc1fd8018391a6c78a9aca4a9d754780fa Reviewed-on: http://gerrit.cloudera.org:8080/19344 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-12-13 05:44:10 +00:00
Csaba Ringhofer	86740a7d35	IMPALA-11549: Support Hive GenericUdfs that return primitive java types Before this patch only the Writable* types were accepted in GenericUdfs as return types, while some GenericUdfs in the wild return primitive java types (e.g. Integer instead of IntWritable). For legacy Hive UDFs these return types were already handled, so the only change needed was to map the ObjectInspector subclasses (e.g. JavaIntObjectInspector) to the correct JavaUdfDataType in Impala. Testing: - Added a subclass for TestGenericUdf (TestGenericUdfWithJavaReturnTypes) that returns primitive java types (probably inheriting in the opposite direction would be more logical, but the diff is smaller this way). - Changed EE tests to also use TestGenericUdfWithJavaReturnTypes. - Changed FE tests (UdfExecutorTest) to check both TestGenericUdfWithJavaReturnTypes and TestGenericUdf. - Also added a test with BINARY type to UdfExecutorTest as this was forgotten during the original BINARY patch. Change-Id: I30679045d6693ebd35718b6f1a22aaa4963c1e63 Reviewed-on: http://gerrit.cloudera.org:8080/19304 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-12-08 17:51:00 +00:00
Michael Smith	c3ec9272c5	IMPALA-11724: Use CDP Ozone in test environment Updates the test environment to default to the CDP build of Ozone, as the latest build of CDP Hive depends on pre-release features unavailable in Ozone 1.2.1. Apache Ozone 1.2 can still be used by setting USE_APACHE_OZONE=true. The latest CDP build also includes a version of Ozone based on ozone#master with a candidate version of 1.3.0. Both Apache and CDP therefore have builds of Ozone we can test with that use the new artifact names introduced in Ozone 1.2, so this patch cleans up setup that was only needed for Ozone versions prior to 1.2. Change-Id: I1177a1b820fe21adca9f8c1cc51ff73ee001d3f2 Reviewed-on: http://gerrit.cloudera.org:8080/19247 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2022-11-16 22:13:06 +00:00
yacai	c953426692	IMPALA-11683: Support Aliyun OSS File System This patch adds support for OSS (Aliyun Object Storage Service). Using the hadoop-aliyun, the implementation is similar to other remote FileSystems. Tests: - Prepare: Initialize OSS-related environment variables: OSS_ACCESS_KEY_ID, OSS_SECRET_ACCESS_KEY, OSS_ACCESS_ENDPOINT. Compile and create hdfs test data on a ECS instance. Upload test data to an OSS bucket. - Modify all locations in HMS DB to point to the OSS bucket. Remove some hdfs caching params. Run CORE tests. Change-Id: I267e6531da58e3ac97029fea4c5e075724587910 Reviewed-on: http://gerrit.cloudera.org:8080/19165 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-11-16 10:14:49 +00:00
Michael Smith	83c5e6e409	IMPALA-11670: Upgrade components, add envvars for override Upgrades guava to 31.1-jre and jackson-databind to 2.13.4.2 to address CVEs. Adds environment variables for commonly-updated components so they can be customized via the branch-specific impala-config-branch.sh in a way that allows both to be updated regularly without merge conflicts. Also updates httpcomponents.httpcore to 4.4.14 to be consistent with other httpcomponents libraries included transitively. Change-Id: I1c2c4481ca3f498abf302aa05361d950b1ed1216 Reviewed-on: http://gerrit.cloudera.org:8080/19147 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-10-19 15:54:00 +00:00
Michael Smith	22e5ca3d0a	IMPALA-11667: Clean up Java dependency exclusions Use dependencyManagement to simplify Java dependencies by directly controlling versions of transitive dependencies instead of using exclusions and direct inclusion. Dependency management specifies versions authoritatively, so redundant version declarations are also removed. Change-Id: I424a175135855dcbd38ae432ea111cca5f562633 Reviewed-on: http://gerrit.cloudera.org:8080/19146 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-10-19 15:54:00 +00:00
Michael Smith	a1fddf1022	IMPALA-11628: Switch to reload4j, update slf4j Switches from log4j 1.x to reload4j, a maintained fork. Updates slf4j to the latest version so we can include all CVE fixes. slf4j 2.0.x requires Java 8 and adds a backward-compatible fluent logging api. Neither seems like a problem for Impala. Bans all use of log4j 1.x so we only use reload4j. Change-Id: I5238b9c8247af3e0f4cb05c0b76a75bfee37f5c8 Reviewed-on: http://gerrit.cloudera.org:8080/19102 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-10-11 21:23:11 +00:00
wzhou-code	010f1b943c	IMPALA-11639: Upgrade Spring framework to 5.3.20 This patch upgrade the Spring framework to 5.3.20 to address multiple CVEs: - CVE-2022-22971 - CVE-2022-22968 - CVE-2022-22970 Testing: - Ran core job - Ran custom cluster tests in exhaustive mode Change-Id: I33f4f1d22fc27227e31d744658a17c16b61b9677 Reviewed-on: http://gerrit.cloudera.org:8080/19091 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-10-05 19:34:39 +00:00
Steve Carlin	4e813b7085	IMPALA-11528: Catalogd should start up with a corrupt Hive function. This commit handles the case for a specific kind of corrupt function within the Hive Metastore in the following situation: A valid Hive SQL function gets created in HMS. This UDF is written in Java and must derive from the "UDF" class. After creating this function in Impala, we then replace the underlying jar file with a class that does NOT derive from the "UDF" class. In this scenario, catalogd should reject the function and still start up gracefully. Before this commit, catalogd wasn't coming up. The reason for this was because the Hive function FunctionUtils.getUDFClassType() has a dependency on UDAF and was throwing a LinkageError exception, so we need to include the UDAF class in the shaded jar. Change-Id: I54e7a1df6d018ba6cf5ecf32dc9946edf86e2112 Reviewed-on: http://gerrit.cloudera.org:8080/18927 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tamas Mate <tmater@apache.org>	2022-09-13 14:48:31 +00:00
Joe McDonnell	7581cedd52	IMPALA-11394: Update jackson-databind to 2.12.6.1 This updates jackson-databind to address CVE-2020-36518. Testing: - Ran a core job Change-Id: I8db403a102097a22c48f5d9d42ced3b85930078f Reviewed-on: http://gerrit.cloudera.org:8080/18891 Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-08-23 15:50:35 +00:00
Csaba Ringhofer	7ca11dfc7f	IMPALA-9482: Support for BINARY columns This patch adds support for BINARY columns for all table formats with the exception of Kudu. In Hive the main difference between STRING and BINARY is that STRING is assumed to be UTF8 encoded, while BINARY can be any byte array. Some other differences in Hive: - BINARY can be only cast from/to STRING - Only a small subset of built-in STRING functions support BINARY. - In several file formats (e.g. text) BINARY is base64 encoded. - No NDV is calculated during COMPUTE STATISTICS. As Impala doesn't treat STRINGs as UTF8, BINARY and STRING become nearly identical, especially from the backend's perspective. For this reason, BINARY is implemented a bit differently compared to other types: while the frontend treats STRING and BINARY as two separate types, most of the backend uses PrimitiveType::TYPE_STRING for BINARY too, e.g. in SlotDesc. Only the following parts of backend need to differentiate between STRING and BINARY: - table scanners - table writers - HS2/Beeswax service These parts have access to column metadata, which allows to add special handling for BINARY. Only a very few builtins are allowed for BINARY at the moment: - length - min/max/count - coalesce and similar "selector" functions Other STRING functions can be only used by casting to STRING first. Adding support for more of these functions is very easy, as simply the BINARY type has to be "connected" to the already existing STRING function's signature. Functions where the result depends on utf8_mode need to ensure that with BINARY it always works as if utf8_mode=0 (for example length() is mapped to bytes() as length count utf8 chars if utf8_mode=1). All kinds of UDFs (native, Hive legacy, Hive generic) support BINARY, though in case of legacy Hive UDFs it is only supported if the argument and return types are set explicitely to ensure backward compatibility. See IMPALA-11340 for details. The original plan was to behave as close to Hive as possible, but I realized that Hive has more relaxed casting rules than Impala, which led to STRING<->BINARY casts being necessary in more cases in Impala. This was needed to disallow passing a BINARY to functions that expect a STRING argument. An example for the difference is that in INSERT ... VALUES () string literals need to be explicitly cast to BINARY, while this is not needed in Hive. Testing: - Added functional.binary_tbl for all file formats (except Kudu) to test scanning. - Removed functional.unsupported_types and related tests, as now Impala supports all (non-complex) types that Hive does. - Added FE/EE tests mainly based on the ones added to the DATE type Change-Id: I36861a9ca6c2047b0d76862507c86f7f153bc582 Reviewed-on: http://gerrit.cloudera.org:8080/16066 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-08-19 13:55:42 +00:00
Joe McDonnell	4845f36b4e	IMPALA-11207: Use hadoop-cloud-storage for Cloud dependencies Hadoop provides hadoop-cloud-storage, which includes most of the dependencies that Impala currently uses like hadoop-aws, hadoop-azure, Knox's gateway-cloud-bindings, etc. Hadoop has put in a lot of work to make sure that this package includes the right version of dependencies (including shading some dependencies for GCS). It seems like this is a more reliable way to consume these dependencies. This switches the Java build to use hadoop-cloud-storage and removes the dependencies that it replaces. This eliminates the need to control the version of oauth and GCS, as those are determined by hadoop-cloud-storage. Change-Id: I3a1631289f990513823c2b17eb9241cc1b5a7ffd Reviewed-on: http://gerrit.cloudera.org:8080/18817 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-08-15 21:11:42 +00:00
Michael Smith	830625b104	IMPALA-9442: Add Ozone to minicluster Adds Ozone as an alternative to hdfs in the minicluster. Select by setting `export TARGET_FILESYSTEM=ozone`. With that flag, run-mini-dfs.sh will start Ozone instead of HDFS. Requires a snapshot because Ozone does not support HBase (HDDS-3589); snapshot loading doesn't work yet primarily due to HDDS-5502. Uses the o3fs interface because Ozone puts specific restrictions on bucket names (no underscores, for instance), and it was a lot easier to use an interface where everything is written to a single bucket than to update all Impala's use of HDFS-style paths to make `test-warehouse` a bucket inside a volume. Specifies reduced Ozone client retries during shutdown where Ozone may not be available. Passes tests with FE_TEST=false BE_TEST=false. Change-Id: Ibf8b0f7b2d685d8b011df1926e12bf5434b5a2be Reviewed-on: http://gerrit.cloudera.org:8080/18738 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-08-03 16:58:20 +00:00
Csaba Ringhofer	112f887ea5	IMPALA-11342: Fix class loading in Hive UDFs' constructors Loading new classes from the same jar in the constructor of UDFs did not work in the catalog because the URLClassLoader was closed too early. Extended the lifecycle of the class loader a bit to let the catalog finish all initialisation. Note that the instantiation of legacy Hive UDFs doesn't seem necessary in the catalog, we can get all relevant info from the class. Generic UDFs do need to be instantiated to be able to call initialize(). Testing: - added new classes to load in test UDFs and loaded these in constructor / initialize() - ran the Hive UDF ee tests Change-Id: If16e38b8fc3b2577a5d32104ea9e6948b9562e24 Reviewed-on: http://gerrit.cloudera.org:8080/18611 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-07-20 19:58:15 +00:00
Michael Smith	00db9a27df	IMPALA-11407: Upgrade google-oauth-client to 1.33.3 Upgrades google-oauth-client and google-oauth-client-java6 to 1.33.3 to address CVE-2021-22573. These are included as dependencies of com.google.cloud.bigdataoss/gcs-connector, which does not yet have a release that includes versions 1.33.3 or later. Change-Id: I8d95913f26e6073373374e169ee045881f40f065 Reviewed-on: http://gerrit.cloudera.org:8080/18683 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-07-01 04:10:13 +00:00
Riza Suminto	06b1db4675	IMPALA-11369: Separate thrift compiler for different component Impala used to have one thrift compiler version to compile C++, Java, and Python code. Most Thrift serialization/deserialization between minor versions are compatible with each other. So it is possible to have different thrift compiler versions for different target codes. It is beneficial to do so because it will allow Impala to upgrade separate components independently. This patch implements the infrastructure change required to do so. It replace most of the 'THRIFT_' environment variable and CMake variable with 'THRFIT_CPP_', 'THRFIT_JAVA_', and 'THRFIT_PY_' to compile C++, Java, and Python code accordingly. All three still refer to the same thrift version (thrift-0.11.0-p5). Testing: - Build Impala and pass core tests. Change-Id: I56479dc69b79024d1a4d09211bbe88a61fa0c6a4 Reviewed-on: http://gerrit.cloudera.org:8080/18636 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-06-21 02:40:59 +00:00
Tamas Mate	97d3b25be3	IMPALA-11338: Update IMPALA_VERSION to 4.2.0-SNAPSHOT As 4.1.0 has been released this commit updates the master to 4.2.0. This step needs to happen on each release, related changes are: IMPALA-10198, IMPALA-10057 Testing: - Ran a build Change-Id: Idab47eedb27ca4be42300dfc2eeb81eefe407b70 Reviewed-on: http://gerrit.cloudera.org:8080/18595 Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-06-07 22:50:50 +00:00
Steve Carlin	ca5ea4aeab	IMPALA-11162: Support GenericUDFs for Hive Hive has 2 types of UDFs. This commit contains limited support for the second generation UDFs called GenericUDFs. The main limitations are as follows: Decimal types are not supported. The Impala framework determines the precision and scale of the decimal return type. However, the Hive GenericUDFs allow the capability to choose its own return type based on the parameters. Until this can be resolved, it is safer to forbid decimals from being used. Note that this limitation currently exists in the first generation of Hive Java UDFs. Complex types are not supported. Functions are not extracted from the jar file. The first generation of Hive UDFs allowed this because the method prototypes are explicitly defined and can be determined at function creation time. For GenericUDFs, the return types are determined based on the parameters passed in when running a query. For the same reason as above, GenericUDFs cannot be made permanent. They will need to be recreated everytime the server is restarted. This is a severe limitation and will be resolved in the near future. Change-Id: Ie6fd09120db413fade94410c83ebe8ff104013cd Reviewed-on: http://gerrit.cloudera.org:8080/18295 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2022-05-11 15:10:28 +00:00
Michael Smith	e6ed98c22b	IMPALA-11201: update gitignore files Updates gitignore for files generated during bootstrap_development. Fixes deleting tracked files in be/src/thirdparty. Includes ignore rules for past versions of shell dependencies and updates ignores for current versions. Change-Id: I03deba5e7fb151ef8e34039becdcc3fb47684084 Reviewed-on: http://gerrit.cloudera.org:8080/18499 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-05-10 03:06:59 +00:00
Joe McDonnell	4118522b9c	IMPALA-10057: Fix log spew by using jars in the classpath Some tests saw log spew that causes the INFO log files to be filled with output like this: E0903 02:25:39.453887 12060 TransactionKeepalive.java:137] Unexpected exception thrown Java exception follows: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: org/apache/impala/common/TransactionKeepalive$HeartbeatContext at org.apache.impala.common.TransactionKeepalive$DaemonThread.run(TransactionKeepalive.java:114) at java.lang.Thread.run(Thread.java:748) ... It turns out that the catalogd/impalad use a CLASSPATH in tests that refers to fe/target/classes. The maven command that runs frontend tests recompiles these classes and causes the files in fe/target/classes to be deleted and recreated. There are race conditions where this causes the symptoms above. This changes the CLASSPATH to use the frontend jars, which are not impacted by the machinations on fe/target/classes. To find the appropriate jar, set-classpath.sh needs to know the Impala version. This adds IMPALA_VERSION in bin/impala-config.sh to provide an easy to use environment variable. To make the versioning more uniform, this modifies bin/save-version.sh to use this environment variable. It also adds a check to make sure that the Java pom.xml files use the same version as the environment variable. It fails the build if the Java pom.xml files do not match. Testing: - Ran core jobs - Checked the log file sizes on jobs - Changed a Java pom.xml's version and verified that bin/validate-java-pom-versions.sh fails Change-Id: Id35544e446c5bf283c322d3fe2e7ad475cfa12eb Reviewed-on: http://gerrit.cloudera.org:8080/18415 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>	2022-05-10 00:19:18 +00:00
Joe McDonnell	3627b027fe	IMPALA-11229: Upgrade Spring framework to 5.3.18 This upgrade the Spring framework to 5.3.18 to address multiple CVEs: - CVE-2022-22965 - CVE-2022-22950 - CVE-2021-22060 Testing: - Ran core job - Ran custom cluster tests in exhaustive mode Change-Id: Ie1b299c5b24e70c9db6eb0ce37fee9e32908423e Reviewed-on: http://gerrit.cloudera.org:8080/18405 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tamas Mate <tmater@apache.org>	2022-04-13 15:12:25 +00:00
Joe McDonnell	26398855bf	IMPALA-10930: Bump the Java artifact versions to 4.1.0-SNAPSHOT This changes the Maven pom.xml files to use verison 4.1.0-SNAPSHOT rather than 4.0.0-SNAPSHOT. In the past, these versions were a fixed value, but that changed with IMPALA-10198. This is a new step that needs to happen on each release. Testing: - Ran a build Change-Id: I10a589b4fbc15048199943a0e06d079f57840239 Reviewed-on: http://gerrit.cloudera.org:8080/18390 Reviewed-by: Tamas Mate <tmater@apache.org> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-04-11 16:06:46 +00:00
Joe McDonnell	aa404b856f	IMPALA-11197/IMPALA-11149: Address CVEs in pac4j/xmlsec This upgrades pac4j and several of its dependencies (including xmlsec) to address CVEs in those components. Specifically: - pac4j 4.5.5 addresses CVE-2021-44878 - xmlsec 2.2.3 addresses CVE-2021-40690 - bcprov 1.68 addresses CVE-2020-15522 This also upgrade springframework to 5.2.9.RELEASE to match the version for pac4j 4.5.5. Testing: - Ran core job Change-Id: I8421d867dd0fce8eeaa6bc13a511ca3e8dd05723 Reviewed-on: http://gerrit.cloudera.org:8080/18348 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-03-24 15:49:47 +00:00
Fucun Chu	4186727fe6	IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2 Like IMPALA-8369, this patch adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.2. we need adds a new Metastoreshim class under compat-apache-hive-3 directory. These shim classes implement method which are different in cdp-hive-3 vs apache-hive-3 and are used by front end code. At the build time, based on the environment variable IMPALA_HIVE_DIST_TYPE one of the two shims is added to as source using the fe/pom.xml build plugin. Some codes that directly use Hive 4 APIs need to be ignored in compilation, eg. fe/src/main/java/org/apache/impala/catalog/metastore/. Use Maven profile to ignore some codes, profile will automatically activated based on the IMPALA_HIVE_DIST_TYPE. Testing: 1. Code compiles and runs against both HMS-3 and ASF-HMS-3 2. Ran full-suite of tests against HMS-3 3. Running full-tests against ASF-HMS-3 will need more work supporting Tez in the mini-cluster (for dataloading) and HMS transaction support. This will be on-going effort and test failures on ASF-Hive-3 will be fixed in additional sub-tasks. Notes: 1. Patch uses a custom build of Apache Hive to be deployed in mini-cluster. This build has the fixes for HIVE-21569, HIVE-20038. This hack will be added to the build script in additional sub-tasks. Change-Id: I9f08db5f6da735ac431819063060941f0941f606 Reviewed-on: http://gerrit.cloudera.org:8080/17774 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-02-27 06:36:19 +00:00
Zoltan Haindrich	7131ee34d2	IMPALA-10934 (Part 1): Enable table definition over a single file Implements an abstraction layer to show files in a single directory. Impala side part - filesystem drivers are in HIVE-25569. Suppose that the filesystem has a directory in which there are multiple files: hdfs://somedir/f1.txt hdfs://somedir/f2.txt In case of a HMS backed table(s) - the contents of a directory could be considered as table. This patch enables a new file system wrapper 'sfs+' (sfs = single file system) which provides a view of a single file in a directory.' The '+' indicates that this wrapper can be added on top of multiple underlying file systems/object storage such as HDFS, S3 etc. The directory which contains the file could be specified: sfs+hdfs://somedir/f1.txt/#SINGLEFILE# This will be a directory containing only the f1.txt and nothing else. This patch was tested locally - with a custom build of Hive version which also had HIVE-25569. Change-Id: I32be936243aa4c8320f5d06d2b7fbf98822f82e7 Reviewed-on: http://gerrit.cloudera.org:8080/17878 Reviewed-by: Aman Sinha <amsinha@cloudera.com> Tested-by: Aman Sinha <amsinha@cloudera.com>	2022-01-05 03:32:11 +00:00
Fucun Chu	157086cb80	IMPALA-10771: Add Tencent COS support This patch adds support for COS(Cloud Object Storage). Using the hadoop-cos, the implementation is similar to other remote FileSystems. New flags for COS: - num_cos_io_threads: Number of COS I/O threads. Defaults to be 16. Follow-up: - Support for caching COS file handles will be addressed in IMPALA-10772. - test_concurrent_inserts and test_failing_inserts in test_acid_stress.py are skipped due to slow file listing on COS (IMPALA-10773). Tests: - Upload hdfs test data to a COS bucket. Modify all locations in HMS DB to point to the COS bucket. Remove some hdfs caching params. Run CORE tests. Change-Id: Idce135a7591d1b4c74425e365525be3086a39821 Reviewed-on: http://gerrit.cloudera.org:8080/17503 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-12-08 16:32:02 +00:00
Zoltan Borok-Nagy	7b2bb13ecc	IMPALA-10810: Bump json-smart from 2.3 to 2.4.7 I noticed that our json-smart dependency is stale and we could pick up a newer version. This patch upgrades it to 2.4.7 which is the newest version at the time of writing. Change-Id: I6b43f606f40e172aa267b55c564fa64d68515bd5 Reviewed-on: http://gerrit.cloudera.org:8080/17702 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-07-22 23:05:34 +00:00
Zoltan Borok-Nagy	e26543426c	IMPALA-9967: Add support for reading ORC's TIMESTAMP WITH LOCAL TIMEZONE ORC-189 and ORC-666 added support for a new timestamp type 'TIMESTMAP WITH LOCAL TIMEZONE' to the Orc library. This patch adds support for reading such timestamps with Impala. These are UTC-normalized timestamps, therefore we convert them to local timezone during scanning. Testing: * added test for CREATE TABLE LIKE ORC * added scanner tests to test_scanners.py Change-Id: Icb0c6a43ebea21f1cba5b8f304db7c4bd43967d9 Reviewed-on: http://gerrit.cloudera.org:8080/17347 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-05-07 14:11:08 +00:00
Csaba Ringhofer	94f67a3432	IMPALA-7825: Upgrade Thrift version to 0.11.0 Before this patch Impala mainly used Thrift 0.9.3, but it was possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0 Thrift lib was already included in the toolchain. Most of the changes are related to replacing boost:: with std:: shared_ptr-s in cpp code (this is a continuation of patch by Sahil). The Thrift upgrade also needs an Impyla release with Thrift 0.11.0, as Impala's test framework relies on Impyla. A thrift_sasl release is also needed, because it currently pins Thrift version to 0.9.3 for Python 2. The current patch uses alpha releases from Impyla and thrift_sasl that use thrift 0.11.0. Notable side effects: - old logic to compile thrift for impala-shell with 0.11.0 was removed - impala_shell's utf8 handling had to be updated as the new 0.11.0 compilation happens with no_utf8strings. This also made things a bit faster, e.g the following is ~0.22s instead of ~0.25 shell/impala_shell.py \ -B -q "select * from functional_parquet.alltypes;" > /dev/null - THRIFT-3921 changed the stream operators to print an enum's name instead of its number, leading to slightly different messages in some cases. - "templates" was added to the thift generator's parameters to avoid a compilation issue (related to IMPALA-10600). I didn't notice any change in compilation time. This option generated .tcc files with templetized readers/writers for Thrift types. Currently we don't use these, but they could potentially speed up (de)serialization. Testing: - ran Impyla's test suite with Python 2 and 3 - ran core tests Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Reviewed-on: http://gerrit.cloudera.org:8080/17170 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-04-27 13:36:54 +00:00
Joe McDonnell	267f4d67f4	IMPALA-10455: Reorder Maven repositories for cleaner mirror semantics When using a Maven mirror that uses a mirrorOf pattern, the order of repositories in the pom.xml has a strong influence on whether the build tries the mirror for a particular artifact. If an early repository matches the mirrorOf condition, Maven may try the mirror for all artifacts, even those that only exist in the s3 bucket. This extra check can slow down the build, especially if the mirror is slow to respond for unknown artifacts. For Impala, the common case is for a mirror to cover everything except the artifacts that come from the Kudu local repository or the s3 bucket. To optimize for that case, this reorders the Maven repositories to be in this order: 1. Local/S3 repositories 2. Regular repositories 3. Banned repositories The repositories are otherwise unchanged. Testing: - Ran an ordinary build - Ran a build with a mirrorOf "external:*,!impala.cdp.repo" and verified that the build went directly to the s3 bucket first. Change-Id: I7046c7ec5391833e98ee6a463fb8c08b6a04cb26 Reviewed-on: http://gerrit.cloudera.org:8080/17020 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-04-08 21:38:35 +00:00

1 2

62 Commits