mirror of
https://github.com/apache/impala.git
synced 2025-12-19 18:12:08 -05:00
IMPALA-9646: clean up README
Misc improvements to get the README up-to-date and direct readers to the most
appropriate docs.

Change-Id: I05fb4a97b6a915fd6e460d9a2079b2d23134678f
Reviewed-on: http://gerrit.cloudera.org:8080/15719
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
69
README-build.md
Normal file
@@ -0,0 +1,69 @@
This document introduces the Impala project layout and some key configuration variables.
Beware that it may become stale over time as the project evolves.

# Detailed Build Notes

Impala can be built with pre-built components or components downloaded from S3.
The components needed to build Impala are Apache Hadoop, Hive, HBase, and Sentry.
If you need to manually override the locations or versions of these components, you
can do so through the environment variables and scripts listed below.

## Scripts and directories

| Location                     | Purpose |
|------------------------------|---------|
| bin/impala-config.sh         | This script must be sourced to set up all environment variables properly so that other scripts work. |
| bin/impala-config-local.sh   | A script can be created in this location to set local overrides for any environment variables. |
| bin/impala-config-branch.sh  | A version of the above that can be checked into a branch for convenience. |
| bin/bootstrap_build.sh       | A helper script to bootstrap some of the build requirements. |
| bin/bootstrap_development.sh | A helper script to bootstrap a developer environment. Please read it before using. |
| be/build/                    | Impala build output goes here. |
| be/generated-sources/        | Thrift and other generated source will be found here. |

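The local-override mechanism described for `bin/impala-config-local.sh` can be illustrated with a small stand-in. This is a sketch under stated assumptions, not the logic of the real `bin/impala-config.sh`: the scratch directory, the order in which overrides and defaults are applied, and the chosen value `4` are all made up for the example.

```shell
#!/bin/sh
# Sketch only: a stand-in for the override mechanism, not bin/impala-config.sh.
set -eu

# Hypothetical scratch checkout so the example does not touch a real tree.
demo_home="$(mktemp -d)"
mkdir -p "${demo_home}/bin"

# A local override file, as described for bin/impala-config-local.sh.
cat > "${demo_home}/bin/impala-config-local.sh" <<'EOF'
export IMPALA_BUILD_THREADS=4
EOF

# Stand-in for the config step: read local overrides if the file exists,
# then fall back to a default for anything still unset (assumed ordering).
IMPALA_HOME="${demo_home}"
export IMPALA_HOME
unset IMPALA_BUILD_THREADS
if [ -f "${IMPALA_HOME}/bin/impala-config-local.sh" ]; then
  . "${IMPALA_HOME}/bin/impala-config-local.sh"
fi
IMPALA_BUILD_THREADS="${IMPALA_BUILD_THREADS:-8}"
export IMPALA_BUILD_THREADS

echo "IMPALA_BUILD_THREADS=${IMPALA_BUILD_THREADS}"
```

In a real checkout the equivalent step is to source `bin/impala-config.sh` from the top-level directory after creating the local override file.
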
## Build Related Variables

| Environment variable | Default value | Description |
|----------------------|---------------|-------------|
| IMPALA_HOME          |               | Top level Impala directory |
| IMPALA_TOOLCHAIN     | "${IMPALA_HOME}/toolchain" | Native toolchain directory (for compilers, libraries, etc.) |
| SKIP_TOOLCHAIN_BOOTSTRAP | "false" | Skips downloading the toolchain and any Python dependencies if "true" |
| CDH_BUILD_NUMBER | | Identifier to indicate the CDH build number |
| CDH_COMPONENTS_HOME | "${IMPALA_HOME}/toolchain/cdh_components-${CDH_BUILD_NUMBER}" | Location of the CDH components within the toolchain. |
| CDH_MAJOR_VERSION | "5" | Identifier used to make paths unique for potentially incompatible component builds. |
| IMPALA_CONFIG_SOURCED | "1" |  Set by ${IMPALA_HOME}/bin/impala-config.sh (internal use) |
| JAVA_HOME | "/usr/lib/jvm/${JAVA_VERSION}" | Used to locate Java |
| JAVA_VERSION | "java-7-oracle-amd64" | Can override to set a local Java version. |
| JAVA | "${JAVA_HOME}/bin/java" | Java binary location. |
| CLASSPATH | | See bin/set-classpath.sh for details. |
| PYTHONPATH | | Will be changed to include: "${IMPALA_HOME}/shell/gen-py" "${IMPALA_HOME}/testdata" "${THRIFT_HOME}/python/lib/python2.7/site-packages" "${HIVE_HOME}/lib/py" |

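To make the relationships between these defaults concrete, the sketch below resolves a few of them the way the table describes: a value already present in the environment wins, otherwise the listed default is used. It is an illustration only and is not taken from `bin/impala-config.sh`. `IMPALA_HOME` and `CDH_BUILD_NUMBER` have no default in the table, so the fallback values `/opt/impala` and `0000` are placeholders.

```shell
#!/bin/sh
# Sketch only: mirrors the defaults listed in the table above.

# Inputs with no default in the table; the fallbacks are placeholders.
IMPALA_HOME="${IMPALA_HOME:-/opt/impala}"
CDH_BUILD_NUMBER="${CDH_BUILD_NUMBER:-0000}"

# Derived values: keep what the caller exported, else use the table default.
IMPALA_TOOLCHAIN="${IMPALA_TOOLCHAIN:-${IMPALA_HOME}/toolchain}"
CDH_COMPONENTS_HOME="${CDH_COMPONENTS_HOME:-${IMPALA_HOME}/toolchain/cdh_components-${CDH_BUILD_NUMBER}}"
JAVA_VERSION="${JAVA_VERSION:-java-7-oracle-amd64}"
JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/${JAVA_VERSION}}"
JAVA="${JAVA:-${JAVA_HOME}/bin/java}"

export IMPALA_HOME CDH_BUILD_NUMBER IMPALA_TOOLCHAIN CDH_COMPONENTS_HOME
export JAVA_VERSION JAVA_HOME JAVA

# Print the resolved values for inspection.
printf 'IMPALA_TOOLCHAIN=%s\n' "${IMPALA_TOOLCHAIN}"
printf 'CDH_COMPONENTS_HOME=%s\n' "${CDH_COMPONENTS_HOME}"
printf 'JAVA=%s\n' "${JAVA}"
```

Exporting, for example, `JAVA_VERSION` before running the snippet changes `JAVA_HOME` and `JAVA` accordingly, which is the kind of override the table's "Can override" note refers to.
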
## Source Directories for Impala

| Environment variable | Default value | Description |
|----------------------|---------------|-------------|
| IMPALA_BE_DIR        | "${IMPALA_HOME}/be" | Backend directory.  Build output is also stored here. |
| IMPALA_FE_DIR        | "${IMPALA_HOME}/fe" | Frontend directory |
| IMPALA_COMMON_DIR    | "${IMPALA_HOME}/common" | Common code (thrift, function registry) |

## Various Compilation Settings

| Environment variable | Default value | Description |
|----------------------|---------------|-------------|
| IMPALA_BUILD_THREADS | "8" or set to number of processors by default. | Used for make -j and distcc -j settings. |
| IMPALA_MAKE_FLAGS    | "" | Any extra settings to pass to make.  Also used when copying UDFs / UDAs into HDFS. |
| USE_SYSTEM_GCC       | "0" | If set to any other value, directs cmake to not set GCC_ROOT, CMAKE_C_COMPILER, CMAKE_CXX_COMPILER, as well as setting TOOLCHAIN_LINK_FLAGS |
| IMPALA_CXX_COMPILER  | "default" | Used by cmake (cmake_modules/toolchain and clang_toolchain.cmake) to select gcc / clang |
| USE_GOLD_LINKER      | "true" | Directs backend cmake to use gold. |
| IS_OSX               | "false" | (Experimental) currently only used to disable Kudu. |

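The `IMPALA_BUILD_THREADS` default ("8" or the number of processors) can be sketched as follows. This is an assumption about how such a fallback might be written, not the code Impala uses; `nproc` and `getconf _NPROCESSORS_ONLN` are ordinary system tools chosen for the example, and the final `echo` only prints a command line rather than running a build.

```shell
#!/bin/sh
# Sketch only: pick a build thread count when the caller has not set one.
if [ -z "${IMPALA_BUILD_THREADS:-}" ]; then
  # Prefer the processor count; fall back to 8 if it cannot be determined.
  IMPALA_BUILD_THREADS="$(nproc 2>/dev/null \
    || getconf _NPROCESSORS_ONLN 2>/dev/null \
    || echo 8)"
fi
export IMPALA_BUILD_THREADS

# Show how the value and IMPALA_MAKE_FLAGS would combine on a make command line.
echo "make -j${IMPALA_BUILD_THREADS} ${IMPALA_MAKE_FLAGS:-}"
```
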
## Dependencies

| Environment variable | Default value | Description |
|----------------------|---------------|-------------|
| HADOOP_HOME          | "${CDH_COMPONENTS_HOME}/hadoop-${IMPALA_HADOOP_VERSION}/" | Used to locate Hadoop |
| HADOOP_INCLUDE_DIR   | "${HADOOP_HOME}/include" | For 'hdfs.h' |
| HADOOP_LIB_DIR       | "${HADOOP_HOME}/lib" | For 'libhdfs.a' or 'libhdfs.so' |
| HIVE_HOME            | "${CDH_COMPONENTS_HOME}/hive-${IMPALA_HIVE_VERSION}/" | |
| HBASE_HOME           | "${CDH_COMPONENTS_HOME}/hbase-${IMPALA_HBASE_VERSION}/" | |
| SENTRY_HOME          | "${CDH_COMPONENTS_HOME}/sentry-${IMPALA_SENTRY_VERSION}/" | Used to set up test data |
| THRIFT_HOME          | "${IMPALA_TOOLCHAIN}/thrift-${IMPALA_THRIFT_VERSION}" | |

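When overriding `HADOOP_HOME`, it can help to confirm that the header and library named in the table are actually present. The helper below is a hypothetical sketch (the function name `check_hadoop_paths` is invented here and is not part of Impala); it only checks for `hdfs.h` and for either `libhdfs.a` or `libhdfs.so` under the default include and lib locations.

```shell
#!/bin/sh
# Sketch only: check_hadoop_paths is a hypothetical helper, not an Impala script.
check_hadoop_paths() {
  hadoop_home="$1"
  # Honor explicit overrides, else use the defaults from the table.
  include_dir="${HADOOP_INCLUDE_DIR:-${hadoop_home}/include}"
  lib_dir="${HADOOP_LIB_DIR:-${hadoop_home}/lib}"
  rc=0
  if [ ! -f "${include_dir}/hdfs.h" ]; then
    echo "missing: ${include_dir}/hdfs.h"
    rc=1
  fi
  if [ ! -f "${lib_dir}/libhdfs.a" ] && [ ! -f "${lib_dir}/libhdfs.so" ]; then
    echo "missing: ${lib_dir}/libhdfs.a or libhdfs.so"
    rc=1
  fi
  return "${rc}"
}

# Example call; /nonexistent is a placeholder used when HADOOP_HOME is unset.
check_hadoop_paths "${HADOOP_HOME:-/nonexistent}" || echo "Hadoop paths incomplete"
```
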
88
README.md
@@ -1,6 +1,6 @@
 # Welcome to Impala
 
-Lightning-fast, distributed [SQL](http://en.wikipedia.org/wiki/SQL) queries for petabytes
+Lightning-fast, distributed [SQL](https://en.wikipedia.org/wiki/SQL) queries for petabytes
 of data stored in Apache Hadoop clusters.
 
 Impala is a modern, massively-distributed, massively-parallel, C++ query engine that lets
@@ -8,18 +8,24 @@ you analyze, transform and combine data from a variety of data sources:
 
 * Best of breed performance and scalability.
 * Support for data stored in [HDFS](https://hadoop.apache.org/),
-[Apache HBase](http://hbase.apache.org/) and [Amazon S3](http://aws.amazon.com/s3/).
+[Apache HBase](https://hbase.apache.org/), [Apache Kudu](https://kudu.apache.org/),
+[Amazon S3](https://aws.amazon.com/s3/),
+[Azure Data Lake Storage](https://azure.microsoft.com/en-us/services/storage/data-lake-storage/),
+[Apache Hadoop Ozone](https://hadoop.apache.org/ozone/) and more!
 * Wide analytic SQL support, including window functions and subqueries.
-* On-the-fly code generation using [LLVM](http://llvm.org/) to generate CPU-efficient
+* On-the-fly code generation using [LLVM](http://llvm.org/) to generate lightning-fast
 code tailored specifically to each individual query.
-* Support for the most commonly-used Hadoop file formats, including the
-[Apache Parquet](https://parquet.apache.org/) project.
+* Support for the most commonly-used Hadoop file formats, including
+[Apache Parquet](https://parquet.apache.org/) and [Apache ORC](https://orc.apache.org).
+* Support for industry-standard security protocols, including Kerberos, LDAP and TLS.
 * Apache-licensed, 100% open source.
 
 ## More about Impala
 
 To learn more about Impala as a business user, or to try Impala live or in a VM, please
-visit the [Impala homepage](https://impala.apache.org).
+visit the [Impala homepage](https://impala.apache.org). Detailed documentation for
+administrators and users is available at
+[Apache Impala documentation](https://impala.apache.org/impala-docs.html).
 
 If you are interested in contributing to Impala as a developer, or learning more about
 Impala's internals and architecture, visit the
@@ -36,70 +42,8 @@ Please refer to EXPORT\_CONTROL.md for more information.
 
 ## Build Instructions
 
-See bin/bootstrap_build.sh.
+See [Impala's developer documentation](https://cwiki.apache.org/confluence/display/IMPALA/Impala+Home)
+to get started.
 
-### Detailed Build Notes
-
-Impala can be built with pre-built components or components downloaded from S3.
-The components needed to build Impala are Apache Hadoop, Hive, HBase, and Sentry.
-If you need to manually override the locations or versions of these components, you
-can do so through the environment variables and scripts listed below.
-
-##### Scripts and directories
-
-| Location | Purpose |
-|------------------------------|---------|
-| bin/impala-config.sh | This script must be sourced to setup all environment variables properly to allow other scripts to work |
-| bin/impala-config-local.sh | A script can be created in this location to set local overrides for any environment variables |
-| bin/impala-config-branch.sh | A version of the above that can be checked into a branch for convenience. |
-| bin/bootstrap_build.sh | A helper script to bootstrap some of the build requirements. |
-| bin/bootstrap_development.sh | A helper script to bootstrap a developer environment. Please read it before using. |
-| be/build/ | Impala build output goes here. |
-| be/generated-sources/ | Thrift and other generated source will be found here. |
-
-##### Build Related Variables
-
-| Environment variable | Default value | Description |
-|----------------------|---------------|-------------|
-| IMPALA_HOME | | Top level Impala directory |
-| IMPALA_TOOLCHAIN | "${IMPALA_HOME}/toolchain" | Native toolchain directory (for compilers, libraries, etc.) |
-| SKIP_TOOLCHAIN_BOOTSTRAP | "false" | Skips downloading the toolchain any python dependencies if "true" |
-| CDH_BUILD_NUMBER | | Identifier to indicate the CDH build number
-| CDH_COMPONENTS_HOME | "${IMPALA_HOME}/toolchain/cdh_components-${CDH_BUILD_NUMBER}" | Location of the CDH components within the toolchain. |
-| CDH_MAJOR_VERSION | "5" | Identifier used to uniqueify paths for potentially incompatible component builds. |
-| IMPALA_CONFIG_SOURCED | "1" | Set by ${IMPALA_HOME}/bin/impala-config.sh (internal use) |
-| JAVA_HOME | "/usr/lib/jvm/${JAVA_VERSION}" | Used to locate Java |
-| JAVA_VERSION | "java-7-oracle-amd64" | Can override to set a local Java version. |
-| JAVA | "${JAVA_HOME}/bin/java" | Java binary location. |
-| CLASSPATH | | See bin/set-classpath.sh for details. |
-| PYTHONPATH | Will be changed to include: "${IMPALA_HOME}/shell/gen-py" "${IMPALA_HOME}/testdata" "${THRIFT_HOME}/python/lib/python2.7/site-packages" "${HIVE_HOME}/lib/py" |
-
-##### Source Directories for Impala
-
-| Environment variable | Default value | Description |
-|----------------------|---------------|-------------|
-| IMPALA_BE_DIR | "${IMPALA_HOME}/be" | Backend directory. Build output is also stored here. |
-| IMPALA_FE_DIR | "${IMPALA_HOME}/fe" | Frontend directory |
-| IMPALA_COMMON_DIR | "${IMPALA_HOME}/common" | Common code (thrift, function registry) |
-
-##### Various Compilation Settings
-
-| Environment variable | Default value | Description |
-|----------------------|---------------|-------------|
-| IMPALA_BUILD_THREADS | "8" or set to number of processors by default. | Used for make -j and distcc -j settings. |
-| IMPALA_MAKE_FLAGS | "" | Any extra settings to pass to make. Also used when copying udfs / udas into HDFS. |
-| USE_SYSTEM_GCC | "0" | If set to any other value, directs cmake to not set GCC_ROOT, CMAKE_C_COMPILER, CMAKE_CXX_COMPILER, as well as setting TOOLCHAIN_LINK_FLAGS |
-| IMPALA_CXX_COMPILER | "default" | Used by cmake (cmake_modules/toolchain and clang_toolchain.cmake) to select gcc / clang |
-| USE_GOLD_LINKER | "true" | Directs backend cmake to use gold. |
-| IS_OSX | "false" | (Experimental) currently only used to disable Kudu. |
-
-##### Dependencies
-| Environment variable | Default value | Description |
-|----------------------|---------------|-------------|
-| HADOOP_HOME | "${CDH_COMPONENTS_HOME}/hadoop-${IMPALA_HADOOP_VERSION}/" | Used to locate Hadoop |
-| HADOOP_INCLUDE_DIR | "${HADOOP_HOME}/include" | For 'hdfs.h' |
-| HADOOP_LIB_DIR | "${HADOOP_HOME}/lib" | For 'libhdfs.a' or 'libhdfs.so' |
-| HIVE_HOME | "${CDH_COMPONENTS_HOME}/{hive-${IMPALA_HIVE_VERSION}/" | |
-| HBASE_HOME | "${CDH_COMPONENTS_HOME}/hbase-${IMPALA_HBASE_VERSION}/" | |
-| SENTRY_HOME | "${CDH_COMPONENTS_HOME}/sentry-${IMPALA_SENTRY_VERSION}/" | Used to setup test data |
-| THRIFT_HOME | "${IMPALA_TOOLCHAIN}/thrift-${IMPALA_THRIFT_VERSION}" | |
+[Detailed build notes](README-build.md) has some detailed information on the project
+layout and build.
@@ -92,7 +92,7 @@ be/src/util/cache/rl-cache-test.cc
 be/src/testutil/certificates-info.txt
 bin/README-RUNNING-BENCHMARKS
 LOGS.md
-README.md
+README*.md
 */README
 */README.dox
 */README.txt