Zoltan Borok-Nagy afa329fd89 IMPALA-13931: TestIcebergRestCatalog.test_rest_catalog_basic failed at setup
There were several issues with test_rest_catalog_basic which made it
fail in environments that used Ozone or S3.

Missing dependencies on Ozone and S3 classes:
* This is resolved in iceberg-rest-catalog-test/pom.xml by adding
  a dependency on impala-executor-deps

Hadoop configuration was not initialized properly:
* run-iceberg-rest-server.sh used Maven to run the Iceberg REST Catalog.
  In that case Maven is in charge of setting the CLASSPATH, but the
  core-site/ozone-site/etc. config files were not on it, so the
  REST Catalog used a default Hadoop configuration that didn't suit
  our environment.
* To overcome the CLASSPATH problem, iceberg-rest-catalog-test/pom.xml
  now builds a runnable JAR and also generates the proper CLASSPATH
  during compilation.
* run-iceberg-rest-server.sh now uses java -cp to run the REST
  Catalog
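Hadoop looks up site files such as core-site.xml as classpath resources, which is why a CLASSPATH without the config directory silently falls back to defaults. The following is a minimal diagnostic sketch, not part of this change: the class `ClasspathCheck` and its method `missingResources` are hypothetical names, and the sketch only reports which config files the class loader cannot see.

```java
import java.util.ArrayList;
import java.util.List;

public class ClasspathCheck {

  // Returns the names in `resources` that `loader` cannot find on its classpath.
  static List<String> missingResources(ClassLoader loader, String... resources) {
    List<String> missing = new ArrayList<>();
    for (String name : resources) {
      // getResource returns null when no classpath entry contains the file.
      if (loader.getResource(name) == null) {
        missing.add(name);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    ClassLoader loader = ClasspathCheck.class.getClassLoader();
    System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    List<String> missing = missingResources(loader, "core-site.xml", "ozone-site.xml");
    if (missing.isEmpty()) {
      System.out.println("all config files found on the classpath");
    } else {
      System.out.println("not on the classpath: " + missing);
    }
  }
}
```

Because a directory listed in `java -cp` exposes the files inside it as resources, running the check with and without the Hadoop config directory on `-cp` shows the difference between the old Maven-driven launch and the `java -cp` launch.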

S3 builds threw NoSuchMethodException for the "create" method of
ApacheHttpClientConfigurations:
* The Iceberg library dynamically loads its HTTP client builders
  to work around an error; see details in
  https://github.com/apache/iceberg/issues/6715
* So the Iceberg library tries to load the "create" method of its
  own ApacheHttpClientConfigurations class dynamically, but this
  fails with NoSuchMethodException.
* The critical code is invoked from Impala's IcebergMetadataScanner's
  ScanMetadataTable() method which happens to be invoked through
  JNI from the C++ backend.
* The context class loader of such threads is NULL, which means
  Java uses the bootstrap class loader to load classes and methods,
  but that loader doesn't have the proper resources on its classpath.
* To overcome this issue, we set the context class loader of the thread
  to the class loader that originally loaded the IcebergMetadataScanner
  class.
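The fix follows a common pattern: temporarily install the defining class loader as the thread's context class loader and restore the previous one afterwards. The following is a minimal sketch of that pattern, not Impala's actual code: `MetadataScannerSketch`, `withOwnClassLoader` and `scanMetadataTable` are hypothetical stand-ins for IcebergMetadataScanner and its ScanMetadataTable() method, and the null context loader simulates a JNI-attached thread.

```java
import java.util.concurrent.Callable;

public class MetadataScannerSketch {

  // Runs `body` with the thread's context class loader set to the loader
  // that defined this class, restoring the previous loader afterwards.
  static <T> T withOwnClassLoader(Callable<T> body) throws Exception {
    Thread current = Thread.currentThread();
    ClassLoader previous = current.getContextClassLoader();
    current.setContextClassLoader(MetadataScannerSketch.class.getClassLoader());
    try {
      return body.call();
    } finally {
      current.setContextClassLoader(previous);
    }
  }

  // Hypothetical stand-in for code that, like Iceberg's dynamic loading,
  // resolves a class through the thread's context class loader.
  static String scanMetadataTable() throws Exception {
    ClassLoader ccl = Thread.currentThread().getContextClassLoader();
    // With a null loader, Class.forName delegates to the bootstrap loader,
    // which cannot see application classes.
    Class<?> cls = Class.forName(MetadataScannerSketch.class.getName(), false, ccl);
    return cls.getSimpleName();
  }

  public static void main(String[] args) throws Exception {
    // Simulate a JNI-attached thread whose context class loader is null.
    Thread.currentThread().setContextClassLoader(null);
    try {
      scanMetadataTable();
    } catch (ClassNotFoundException e) {
      System.out.println("without fix: " + e.getClass().getSimpleName());
    }
    System.out.println("with fix: " + withOwnClassLoader(MetadataScannerSketch::scanMetadataTable));
  }
}
```

The sketch surfaces the failed lookup as a ClassNotFoundException; the exception a reflection helper actually reports (NoSuchMethodException in the case above) depends on how it wraps the failure. Restoring the previous loader in `finally` keeps the change scoped to the call, so other code running later on the same native thread sees its original context loader.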

Change-Id: I9dc0e30aeaff0b8de41426ba38506383b4af472c
Reviewed-on: http://gerrit.cloudera.org:8080/22818
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-05-09 17:01:56 +00:00

Welcome to Impala

Lightning-fast, distributed SQL queries for petabytes of data stored in open data and table formats.

Impala is a modern, massively-distributed, massively-parallel, C++ query engine that lets you analyze, transform and combine data from a variety of data sources.

More about Impala

The fastest way to try out Impala is a quickstart Docker container. You can try out running queries and processing data sets in Impala on a single machine without installing dependencies. It can automatically load test data sets into Apache Kudu and Apache Parquet formats and you can start playing around with Apache Impala SQL within minutes.

To learn more about Impala as a user or administrator, or to try Impala, please visit the Impala homepage. Detailed documentation for administrators and users is available at Apache Impala documentation.

If you are interested in contributing to Impala as a developer, or learning more about Impala's internals and architecture, visit the Impala wiki.

Supported Platforms

Impala only supports Linux at the moment. Impala supports x86_64 and has experimental support for arm64 (as of Impala 4.0). Impala Requirements contains more detailed information on the minimum CPU requirements.

Supported OS Distributions

Impala runs on Linux systems only. The supported distros are:

  • Ubuntu 16.04/18.04
  • CentOS/RHEL 7/8

Other systems, e.g. SLES12, may also be supported but are not tested by the community.

Export Control Notice

This distribution uses cryptographic software and may be subject to export controls. Please refer to EXPORT_CONTROL.md for more information.

Build Instructions

See Impala's developer documentation to get started.

The detailed build notes contain more information on the project layout and build.
