For most clusters that have multiple users and production availability requirements, you might set up a proxy server to relay requests to and from Impala.
Currently, the Impala statestore mechanism does not include such proxying and load-balancing features. Set up a software package of your choice to perform these functions.
Using a load-balancing proxy server for Impala has the following advantages:
The following setup steps are a general outline that applies to any load-balancing proxy software:
Set up a port that the load balancer will listen on to relay Impala requests back and forth.
Consider enabling sticky sessions. Where practical, enable this setting so that stateless client applications such as
For Kerberized clusters, follow the instructions in
In particular, if you are using Hue or JDBC-based applications, you typically set up load balancing for both ports 21000 and 21050, because these client applications connect through port 21050 while the
Load-balancing software offers a number of algorithms to distribute requests. Each algorithm has its own characteristics that make it suitable in some situations but not others.
stick tables can cause long-running Hue sessions to disconnect; therefore, source affinity is often a better choice.
You might need to perform benchmarks and load testing to determine which setting is optimal for your use case. If some client applications have special characteristics, such as long-running Hue queries working best with source affinity, you might configure multiple virtual IP addresses with a different load-balancing algorithm for each.
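To make the trade-offs between algorithms more concrete, the following Python sketch models three common selection strategies: round-robin, least connections, and source affinity. It is an illustration only, not how any particular load balancer implements them. The host names, the function names, and the use of a CRC32 hash for source affinity are all assumptions made for this example.

```python
import itertools
import zlib

# Hypothetical Impala daemon hosts; substitute your own.
BACKENDS = [
    "impala-1.example.com",
    "impala-2.example.com",
    "impala-3.example.com",
]


def round_robin(backends):
    """Return a picker that cycles through the backends in order."""
    cycle = itertools.cycle(backends)
    return lambda: next(cycle)


def least_connections(active):
    """Pick the backend with the fewest open connections.

    `active` maps backend name -> current connection count.
    Ties go to the backend that appears first in the mapping.
    """
    return min(active, key=active.get)


def source_affinity(client_ip, backends):
    """Map a client IP to a fixed backend using a stable hash.

    CRC32 is an assumption for this sketch; real proxies use
    their own hashing schemes.
    """
    digest = zlib.crc32(client_ip.encode("utf-8"))
    return backends[digest % len(backends)]
```

In this model, source affinity keeps sending a given client address to the same backend as long as the backend list does not change, which is the property that helps long-running Hue sessions. Round-robin and least connections spread load more evenly but make no such guarantee.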
In a cluster using Kerberos, applications check host credentials to verify that the host they are connecting to is the same one that is actually processing the request, to prevent man-in-the-middle attacks. To clarify that the load-balancing proxy server is legitimate, perform these extra Kerberos setup steps:
If you are not already using a load-balancing proxy, you can experiment with
Install the load balancer:
Set up the configuration file:
Run the load balancer (on a single host, preferably one not running
In
This is the sample