Configure separate compile and link pools for ninja. Configures link parallelism based on expected memory use, which can be reduced by setting IMPALA_MINIMAL_DEBUG_INFO=true or IMPALA_SPLIT_DEBUG_INFO=true. Adds IMPALA_MAKE_CMD to simplify using the ninja build tool for all make operations in scripts. Install ninja on Ubuntu. Adds a '-make' option to buildall.sh to force using 'make'. Adds MOLD_JOBS=1 to avoid overloading the system when trying 'mold' and linking test binaries. However 'mold' is not selected as the default due to test failures around SASL/GSSAPI (see IMPALA-14527). Switches bin/jenkins/all-tests.sh to use ninja and removes the guard in bootstrap_development.sh limiting IMPALA_BUILD_THREADS as it's no longer needed with ninja. SKIP_BE_TEST_PATTERN in run-backend-tests is unused (only used with TARGET_FILESYSTEM=local) so I don't attempt to make it work with ninja. Tested with local 'IMPALA_SPLIT_DEBUG_INFO=true buildall.sh -skiptests' with default (make) and IMPALA_MAKE_CMD=ninja. Change-Id: I0952dc19ace5c9c42bed0d2ffb61499656c0a2db Reviewed-on: http://gerrit.cloudera.org:8080/23572 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Pranav Lodha <pranav.lodha@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This document introduces the Impala project layout and some key configuration variables.
Beware that it may become stale over time as the project evolves.

# Detailed Build Notes

Impala can be built with pre-built components or components downloaded from S3.
The components needed to build Impala are Apache Hadoop, Hive, and HBase.
If you need to manually override the locations or versions of these components, you
can do so through the environment variables and scripts listed below.

## Scripts and directories

| Location                     | Purpose |
|------------------------------|---------|
| bin/impala-config.sh         | This script must be sourced to set up all environment variables properly so that other scripts work. |
| bin/impala-config-local.sh   | A script can be created in this location to set local overrides for any environment variables. |
| bin/impala-config-branch.sh  | A version of the above that can be checked into a branch for convenience. |
| bin/bootstrap_build.sh       | A helper script to bootstrap some of the build requirements. |
| bin/bootstrap_development.sh | A helper script to bootstrap a developer environment. Please read it before using. |
| be/build/                    | Impala build output goes here. |
| be/generated-sources/        | Thrift and other generated source will be found here. |

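As a minimal sketch of the local override mechanism listed in the table, a `bin/impala-config-local.sh` could look like the following. The values are illustrative assumptions rather than recommended settings; the variables themselves are described in the tables further down.

```shell
# Hypothetical contents of bin/impala-config-local.sh.
# Values are examples only; pick ones that suit your machine.

# Prefer ninja over the default make tool.
export IMPALA_MAKE_CMD=ninja

# Cap compile parallelism instead of using every processor.
export IMPALA_BUILD_THREADS=8
```

The overrides should be picked up the next time `bin/impala-config.sh` is sourced in the shell.
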
## Build Related Variables

| Environment variable | Default value | Description |
|----------------------|---------------|-------------|
| IMPALA_HOME          |               | Top level Impala directory |
| IMPALA_TOOLCHAIN     | "${IMPALA_HOME}/toolchain" | Native toolchain directory (for compilers, libraries, etc.) |
| SKIP_TOOLCHAIN_BOOTSTRAP | "false" | Skips downloading the toolchain and any Python dependencies if "true" |
| CDP_BUILD_NUMBER | | Identifier to indicate the CDP build number |
| CDP_COMPONENTS_HOME | "${IMPALA_HOME}/toolchain/cdp_components-${CDP_BUILD_NUMBER}" | Location of the CDP components within the toolchain. |
| CDH_MAJOR_VERSION | "7" | Identifier used to uniquify paths for potentially incompatible component builds. |
| IMPALA_CONFIG_SOURCED | "1" | Set by ${IMPALA_HOME}/bin/impala-config.sh (internal use) |
| IMPALA_JDK_VERSION | "" | Set to 8+ to select a system Java version. Empty value uses JAVA_HOME, or sets it based on system defaults. |
| JAVA_HOME | "" | Uses Java from JAVA_HOME unless IMPALA_JDK_VERSION is set. |
| JAVA | "${JAVA_HOME}/bin/java" | Java binary location. |
| CLASSPATH | | See bin/set-classpath.sh for details. |
| PYTHONPATH | | See bin/set-pythonpath.sh for details. |
| USE_APACHE_COMPONENTS | false | Use Apache components for Hadoop, HBase, Hive, Tez, Ranger. Sets each USE_APACHE_{HADOOP,HBASE,HIVE,TEZ,RANGER} variable to true unless it is already set to false. |
| USE_APACHE_HADOOP | false | Use Apache Hadoop |
| USE_APACHE_HBASE | false | Use Apache HBase |
| USE_APACHE_HIVE_3 | false | Use Apache Hive-3 |
| USE_APACHE_TEZ | false | Use Apache Tez |
| USE_APACHE_RANGER | false | Use Apache Ranger |
| DOWNLOAD_CDH_COMPONENTS | true | Download CDH components |
| DOWNLOAD_APACHE_COMPONENTS | true | Download Apache components |

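The interaction between USE_APACHE_COMPONENTS and the per-component switches can be sketched as below. This only mirrors the behaviour described in the table and is not the actual code from `bin/impala-config.sh`; the Hive variable name follows the USE_APACHE_HIVE_3 row, which is an assumption about which spelling applies.

```shell
# Example inputs: enable Apache components overall, but opt out of Tez.
export USE_APACHE_COMPONENTS=true
export USE_APACHE_TEZ=false

# Sketch of the documented defaulting rule: a per-component variable becomes
# "true" only if it has not already been given a value (such as "false").
if [ "${USE_APACHE_COMPONENTS:-false}" = "true" ]; then
  export USE_APACHE_HADOOP="${USE_APACHE_HADOOP:-true}"
  export USE_APACHE_HBASE="${USE_APACHE_HBASE:-true}"
  export USE_APACHE_HIVE_3="${USE_APACHE_HIVE_3:-true}"  # name assumed from the table row
  export USE_APACHE_TEZ="${USE_APACHE_TEZ:-true}"
  export USE_APACHE_RANGER="${USE_APACHE_RANGER:-true}"
fi

echo "hadoop=${USE_APACHE_HADOOP} tez=${USE_APACHE_TEZ}"
```

With these inputs the explicit `USE_APACHE_TEZ=false` survives, while the unset component variables default to true.
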
## Source Directories for Impala

| Environment variable | Default value | Description |
|----------------------|---------------|-------------|
| IMPALA_BE_DIR        | "${IMPALA_HOME}/be" | Backend directory. Build output is also stored here. |
| IMPALA_FE_DIR        | "${IMPALA_HOME}/fe" | Frontend directory |
| IMPALA_COMMON_DIR    | "${IMPALA_HOME}/common" | Common code (thrift, function registry) |

## Various Compilation Settings

| Environment variable | Default value | Description |
|----------------------|---------------|-------------|
| IMPALA_BUILD_THREADS | Number of processors. | Used for make -j and distcc -j settings. |
| IMPALA_LINK_THREADS  | Bounded based on available memory. | Link parallelism; used for ninja. |
| IMPALA_MAKE_CMD      | "make" | Make tool to use by default; options are make or ninja. |
| IMPALA_MAKE_FLAGS    | "" | Any extra settings to pass to make. Also used when copying udfs / udas into HDFS. |
| USE_SYSTEM_GCC       | "0" | If set to any other value, directs cmake to not set GCC_ROOT, CMAKE_C_COMPILER, CMAKE_CXX_COMPILER, as well as setting TOOLCHAIN_LINK_FLAGS |
| IMPALA_CXX_COMPILER  | "default" | Used by cmake (cmake_modules/toolchain and clang_toolchain.cmake) to select gcc / clang |
| IMPALA_LINKER        | "gold" | Specifies the linker to use; options are "gold", "mold", or "ld". |
| IS_OSX               | "false" | (Experimental) currently only used to disable Kudu. |

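To illustrate what "bounded based on available memory" can mean for IMPALA_LINK_THREADS, here is a rough sketch. The helper name and the per-link memory figure passed to it are hypothetical; the real bound is computed by Impala's build configuration and may use a different formula.

```shell
# Hypothetical helper: choose a link job count that fits in memory.
#   $1 = compile parallelism (e.g. IMPALA_BUILD_THREADS)
#   $2 = available memory in GB
#   $3 = assumed memory needed per link job in GB (illustrative figure)
bounded_link_threads() {
  build_threads=$1
  mem_gb=$2
  mem_per_link_gb=$3

  # How many link jobs fit in memory, with a floor of one job.
  by_mem=$(( mem_gb / mem_per_link_gb ))
  if [ "$by_mem" -lt 1 ]; then
    by_mem=1
  fi

  # Never exceed the compile parallelism.
  if [ "$by_mem" -lt "$build_threads" ]; then
    echo "$by_mem"
  else
    echo "$build_threads"
  fi
}

# 16 build threads, 32 GB of memory, 8 GB assumed per link job.
bounded_link_threads 16 32 8
```

The point of the sketch is only that link jobs are capped separately from compile jobs, since linking tends to be limited by memory rather than by processor count.
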
## Dependencies
| Environment variable | Default value | Description |
|----------------------|---------------|-------------|
| HADOOP_HOME          | "${CDP_COMPONENTS_HOME}/hadoop-${IMPALA_HADOOP_VERSION}/" | Used to locate Hadoop |
| HADOOP_INCLUDE_DIR   | "${HADOOP_HOME}/include" | For 'hdfs.h' |
| HADOOP_LIB_DIR       | "${HADOOP_HOME}/lib" | For 'libhdfs.a' or 'libhdfs.so' |
| HIVE_HOME            | "${CDP_COMPONENTS_HOME}/hive-${IMPALA_HIVE_VERSION}/" | |
| HBASE_HOME           | "${CDP_COMPONENTS_HOME}/hbase-${IMPALA_HBASE_VERSION}/" | |
| THRIFT_CPP_HOME      | "${IMPALA_TOOLCHAIN}/thrift-${IMPALA_THRIFT_CPP_VERSION}" | |
| THRIFT_JAVA_HOME     | "${IMPALA_TOOLCHAIN}/thrift-${IMPALA_THRIFT_JAVA_VERSION}" | |
| THRIFT_PY_HOME       | "${IMPALA_TOOLCHAIN}/thrift-${IMPALA_THRIFT_PY_VERSION}" | |

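The defaults in this table are composed from other variables. The following sketch expands the Hadoop entries using placeholder values for CDP_COMPONENTS_HOME and IMPALA_HADOOP_VERSION; both values are made up for illustration, and the real ones come from `bin/impala-config.sh`.

```shell
# Placeholder inputs (assumptions, not real Impala defaults).
CDP_COMPONENTS_HOME="/opt/impala/toolchain/cdp_components-12345"
IMPALA_HADOOP_VERSION="3.1.1"

# Defaults as documented in the table above.
HADOOP_HOME="${CDP_COMPONENTS_HOME}/hadoop-${IMPALA_HADOOP_VERSION}/"
HADOOP_INCLUDE_DIR="${HADOOP_HOME}/include"   # where hdfs.h is expected
HADOOP_LIB_DIR="${HADOOP_HOME}/lib"           # where libhdfs.a / libhdfs.so is expected

echo "HADOOP_HOME=${HADOOP_HOME}"
echo "HADOOP_INCLUDE_DIR=${HADOOP_INCLUDE_DIR}"
echo "HADOOP_LIB_DIR=${HADOOP_LIB_DIR}"
```

Because the documented HADOOP_HOME default ends in a slash, the derived paths contain a double slash, which POSIX path resolution treats the same as a single one.
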
## Hive Dependency Overrides
Typically used together to specify a local build of Apache Hive. Care should be taken
while using these variables: they take precedence over the defaults in
impala-config.sh, so they may cause confusion when switching between branches or versions of
Apache Impala.

| Environment variable | Description |
|----------------------|-------------|
| HIVE_VERSION_OVERRIDE | Used to specify a different Hive version from the default |
| HIVE_STORAGE_API_VERSION_OVERRIDE | Used to specify a different Hive Storage API version from the default |
| HIVE_METASTORE_THRIFT_DIR_OVERRIDE | Used to specify the location of the metastore thrift files to use during Thrift compilation |
| HIVE_HOME_OVERRIDE | Used to specify the location of Hive |

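A sketch of using the Hive overrides together is shown below. Every version number and path is a placeholder for a hypothetical local Hive build, and the exact directory each variable should point at depends on how that build is laid out. It assumes the variables are exported before `bin/impala-config.sh` is sourced.

```shell
# Hypothetical local Hive build; all versions and paths are placeholders.
HIVE_LOCAL_BUILD="${HOME}/src/hive-local-build"

export HIVE_VERSION_OVERRIDE="0.0.0-SNAPSHOT"
export HIVE_STORAGE_API_VERSION_OVERRIDE="0.0.0-SNAPSHOT"
export HIVE_HOME_OVERRIDE="${HIVE_LOCAL_BUILD}/hive-home"
export HIVE_METASTORE_THRIFT_DIR_OVERRIDE="${HIVE_LOCAL_BUILD}/metastore-thrift"

# When switching branches, clear the overrides so the defaults apply again:
#   unset HIVE_VERSION_OVERRIDE HIVE_STORAGE_API_VERSION_OVERRIDE \
#         HIVE_HOME_OVERRIDE HIVE_METASTORE_THRIFT_DIR_OVERRIDE
```

Keeping the overrides in the interactive shell, rather than in a checked-in script, makes it easier to notice and clear them when moving to another branch.
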
## Ranger Dependency Overrides
Typically used together to specify a local build of Apache Ranger. Care should be taken
while using these variables since they take precedence over the defaults in
impala-config.sh.

| Environment variable | Description |
|----------------------|-------------|
| RANGER_VERSION_OVERRIDE | Used to specify a different Ranger version from the default |
| RANGER_HOME_OVERRIDE | Used to specify the location of Ranger |