Mirror of https://github.com/apache/impala.git, synced 2025-12-19 09:58:28 -05:00
[DOCS] Update impala_proxy.xml with the latest info
Change-Id: Ia9d80e21abb385704eea863d221e333441af9a39
Reviewed-on: http://gerrit.cloudera.org:8080/14857
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Balazs Jeszenszky <jeszyb@gmail.com>
Reviewed-by: Vincent Tran <vttran@cloudera.com>
Reviewed-by: Alex Rodoni <arodoni@cloudera.com>
@@ -19,9 +19,7 @@ under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="impala_jdbc">

<title id="jdbc">Configuring Impala to Work with JDBC</title>

<prolog>
<metadata>
<data name="Category" value="Impala"/>
@@ -34,181 +32,106 @@ under the License.
<data name="Category" value="Developers"/>
</metadata>
</prolog>

<conbody>

<p>
Impala supports the standard JDBC interface, allowing access from commercial Business
Intelligence tools and custom software written in Java or other programming languages. The
JDBC driver allows you to access Impala from a Java program that you write, or a Business
Intelligence or similar tool that uses JDBC to communicate with various database products.
</p>

<p>
Setting up a JDBC connection to Impala involves the following steps:
</p>
<ul>
<li>
Verifying the communication port where the Impala daemons in your cluster are listening
for incoming JDBC requests.
</li>
<li>
Installing the JDBC driver on every system that runs the JDBC-enabled application.
</li>
<li>
Specifying a connection string for the JDBC application to access one of the servers
running the <cmdname>impalad</cmdname> daemon, with the appropriate security settings.
</li>
</ul>

<p outputclass="toc inpage"/>
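<p>
The following is a minimal Java sketch of these steps end to end, assuming the Hive JDBC
driver JARs are already on the classpath and an <cmdname>impalad</cmdname> coordinator at the
placeholder host myhost.example.com is listening on the default port 21050:
</p>

<codeblock>import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaJdbcSketch {
    public static void main(String[] args) throws Exception {
        // Connection string for a cluster without Kerberos; host and port are placeholders.
        String url = "jdbc:hive2://myhost.example.com:21050/;auth=noSasl";

        // try-with-resources closes the connection, statement, and result set.
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT version()")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}</codeblock>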
</conbody>

<concept id="jdbc_port">

<title>Configuring the JDBC Port</title>

<conbody>

<p>
The following are the default ports through which the Impala server accepts JDBC connections:
<simpletable frame="all" relcolwidth="1.0* 1.03* 2.38*" id="simpletable_tr2_gnt_43b">
<strow>
<stentry><b>Protocol</b></stentry>
<stentry><b>Default Port</b></stentry>
<stentry><b>Flag to Specify an Alternate Port</b></stentry>
</strow>
<strow>
<stentry>HTTP</stentry>
<stentry>28000</stentry>
<stentry><codeph>--hs2_http_port</codeph></stentry>
</strow>
<strow>
<stentry>Binary TCP</stentry>
<stentry>21050</stentry>
<stentry><codeph>--hs2_port</codeph></stentry>
</strow>
</simpletable>
</p>
<p>
Make sure the port for the protocol you are using is available for communication with
clients, for example, that it is not blocked by firewall software.
</p>

<p>
If your JDBC client software connects to a different port, specify that alternative port
number with the flag in the above table when starting the <codeph>impalad</codeph>.
</p>
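<p>
If you need to confirm that the port is reachable from a client machine before digging into
driver settings, a quick TCP check can help. The following is a small Java sketch under the
assumption of the default binary port 21050 and a placeholder host name; substitute 28000 to
test the HTTP port instead:
</p>

<codeblock>import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    public static void main(String[] args) {
        String host = "myhost.example.com";  // placeholder coordinator host
        int port = 21050;                    // default binary port; use 28000 for HTTP

        try (Socket socket = new Socket()) {
            // Fails quickly if the port is blocked by a firewall or nothing is listening.
            socket.connect(new InetSocketAddress(host, port), 5000);
            System.out.println("Port " + port + " on " + host + " is reachable.");
        } catch (IOException e) {
            System.out.println("Cannot reach " + host + ":" + port + " - " + e.getMessage());
        }
    }
}</codeblock>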
</conbody>

</concept>

<concept id="jdbc_driver_choice">

<title>Choosing the JDBC Driver</title>

<prolog>
<metadata>
<data name="Category" value="Planning"/>
</metadata>
</prolog>

<conbody>
<p>
In Impala 2.0 and later, you can use the Hive 0.13 or higher JDBC driver. If you are already
using JDBC applications with an earlier Impala release, you should update your JDBC driver,
because the Hive 0.12 driver that was formerly the only choice is not compatible with
Impala 2.0 and later.
</p>

<p>
The Hive JDBC driver provides a substantial speed increase for JDBC applications with
Impala 2.0 and higher, for queries that return large result sets.
</p>
</conbody>

</concept>

<concept id="jdbc_setup">

<title>Enabling Impala JDBC Support on Client Systems</title>

<prolog>
<metadata>
<data name="Category" value="Installing"/>
</metadata>
</prolog>

<conbody>

<section id="install_hive_driver">

<title>Using the Hive JDBC Driver</title>
<p>
You install the Hive JDBC driver (<codeph>hive-jdbc</codeph> package) through the
Linux package manager, on hosts within the cluster. The driver consists of several
JAR files. The same driver can be used by Impala and Hive.
</p>

<p>
To get the JAR files, install the Hive JDBC driver on each host in the cluster that
will run JDBC applications.
</p>

<note>
The latest JDBC driver, corresponding to Hive 0.13, provides substantial performance
improvements for Impala queries that return large result sets. Impala 2.0 and later
are compatible with the Hive 0.13 driver. If you already have an older JDBC driver
installed, and are running Impala 2.0 or higher, consider upgrading to the latest Hive
JDBC driver for best performance with JDBC applications.
</note>

<p>
If you are using JDBC-enabled applications on hosts outside the cluster, you cannot
use the same install procedure on those hosts. Install the JDBC driver on at least
one cluster host using the preceding procedure. Then download the JAR files to each
client machine that will use JDBC with Impala:
</p>
<codeblock>commons-logging-X.X.X.jar
hadoop-common.jar
hive-common-X.XX.X.jar
hive-jdbc-X.XX.X.jar
@@ -222,185 +145,136 @@ under the License.
slf4j-api-X.X.X.jar
slf4j-logXjXX-X.X.X.jar
</codeblock>

<p>
<b>To enable JDBC support for Impala on the system where you run the JDBC
application:</b>
</p>
<ol>
<li>
Download the JAR files listed above to each client machine.
<note>
For Maven users, see <xref keyref="Impala-JDBC-Example">this sample github
page</xref> for an example of the dependencies you could add to a
<codeph>pom</codeph> file instead of downloading the individual JARs.
</note>
</li>

<li>
Store the JAR files in a location of your choosing, ideally a directory already
referenced in your <codeph>CLASSPATH</codeph> setting. For example:
<ul>
<li>
On Linux, you might use a location such as <codeph>/opt/jars/</codeph>.
</li>
<li>
On Windows, you might use a subdirectory underneath <filepath>C:\Program
Files</filepath>.
</li>
</ul>
</li>
<li>
To successfully load the Impala JDBC driver, client programs must be able to locate
the associated JAR files. This often means setting the <codeph>CLASSPATH</codeph>
for the client process to include the JARs. Consult the documentation for your JDBC
client for more details on how to install new JDBC drivers, but some examples of how
to set <codeph>CLASSPATH</codeph> variables include the following (a quick way to
confirm that the driver class is visible is sketched after this list):
<ul>
<li>
On Linux, if you extracted the JARs to <codeph>/opt/jars/</codeph>, you might
issue the following command to prepend the JAR files path to an existing
classpath:
<codeblock>export CLASSPATH=/opt/jars/*.jar:$CLASSPATH</codeblock>
</li>
<li>
On Windows, use the <b>System Properties</b> control panel item to modify the
<b>Environment Variables</b> for your system. Modify the environment variables
to include the path to which you extracted the files.
<note>
If the existing <codeph>CLASSPATH</codeph> on your client machine refers to
some older version of the Hive JARs, ensure that the new JARs are the first
ones listed. Either put the new JAR files earlier in the listings, or delete
the other references to Hive JAR files.
</note>
</li>
</ul>
</li>
</ol>
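<p>
A minimal sketch of such a check follows; it only verifies that the driver class named later
in this topic, <codeph>org.apache.hive.jdbc.HiveDriver</codeph>, resolves from the current
classpath:
</p>

<codeblock>public class DriverClasspathCheck {
    public static void main(String[] args) {
        try {
            // Resolves only if the Hive JDBC driver JARs are on the classpath.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            System.out.println("Hive JDBC driver found on the classpath.");
        } catch (ClassNotFoundException e) {
            System.out.println("Driver not found; check the CLASSPATH setting.");
        }
    }
}</codeblock>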
</section>

</conbody>

</concept>

<concept id="jdbc_connect">

<title>Establishing JDBC Connections</title>

<conbody>

<p>
The JDBC driver class depends on which driver you select.
</p>

<note conref="../shared/impala_common.xml#common/proxy_jdbc_caveat"/>

<section id="class_hive_driver">

<title>Using the Hive JDBC Driver</title>
<p>
For example, with the Hive JDBC driver, the class name is
<codeph>org.apache.hive.jdbc.HiveDriver</codeph>. Once you have configured Impala to
work with JDBC, you can establish connections between the two. To do so for a cluster
that does not use Kerberos authentication, use a connection string of the form
<codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/;auth=noSasl</codeph>.
<!--
Include the <codeph>auth=noSasl</codeph> argument
only when connecting to a non-Kerberos cluster; if Kerberos is enabled, omit the <codeph>auth</codeph> argument.
-->
For example, you might use:
</p>

<codeblock>jdbc:hive2://myhost.example.com:21050/;auth=noSasl</codeblock>

<p>
To connect to an instance of Impala that requires Kerberos authentication, use a
connection string of the form
<codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/;principal=<varname>principal_name</varname></codeph>.
The principal must be the same user principal you used when starting Impala. For
example, you might use:
</p>

<codeblock>jdbc:hive2://myhost.example.com:21050/;principal=impala/myhost.example.com@H2.EXAMPLE.COM</codeblock>

<p>
To connect to an instance of Impala that requires LDAP authentication, use a
connection string of the form
<codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/<varname>db_name</varname>;user=<varname>ldap_userid</varname>;password=<varname>ldap_password</varname></codeph>.
For example, you might use:
</p>

<codeblock>jdbc:hive2://myhost.example.com:21050/test_db;user=fred;password=xyz123</codeblock>

<p>
To connect to an instance of Impala over HTTP, specify the HTTP port, 28000 by
default, and <codeph>transportMode=http</codeph> in the connection string. For
example:
<codeblock>jdbc:hive2://myhost.example.com:28000/;transportMode=http</codeblock>
</p>
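<p>
To show how these URL forms look from application code, the following Java sketch builds one
connection string per authentication scheme; the host name, database, LDAP credentials, and
Kerberos realm are placeholders, and for the Kerberos case the client is assumed to have
already obtained a ticket (for example, with <cmdname>kinit</cmdname>) before connecting:
</p>

<codeblock>import java.sql.Connection;
import java.sql.DriverManager;

public class ImpalaConnectionStrings {
    // Placeholder host; substitute your impalad or load-balancer host.
    private static final String HOST = "myhost.example.com";

    public static void main(String[] args) throws Exception {
        // No Kerberos: binary transport on the default port 21050.
        String noSasl = "jdbc:hive2://" + HOST + ":21050/;auth=noSasl";

        // Kerberos: the principal matches the one used when starting Impala.
        String kerberos = "jdbc:hive2://" + HOST + ":21050/;principal=impala/"
                + HOST + "@H2.EXAMPLE.COM";

        // LDAP: the user name and password are carried in the connection string.
        String ldap = "jdbc:hive2://" + HOST + ":21050/test_db;user=fred;password=xyz123";

        // HTTP transport: the HTTP port (28000 by default) plus transportMode=http.
        String http = "jdbc:hive2://" + HOST + ":28000/;transportMode=http";

        // Open one of them to verify the setup.
        try (Connection conn = DriverManager.getConnection(noSasl)) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}</codeblock>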
<note>
<p conref="../shared/impala_common.xml#common/hive_jdbc_ssl_kerberos_caveat"/>
</note>
</section>

</conbody>

</concept>

<concept rev="2.3.0" id="jdbc_odbc_notes">

<title>Notes about JDBC and ODBC Interaction with Impala SQL Features</title>

<conbody>

<p>
Most Impala SQL features work equivalently through the <cmdname>impala-shell</cmdname>
interpreter or the JDBC or ODBC APIs. The following are some exceptions to keep in mind
when switching between the interactive shell and applications using the APIs:
</p>
<ul>
<li>
<p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
<ul>
<li>
<p>
Queries involving the complex types (<codeph>ARRAY</codeph>,
<codeph>STRUCT</codeph>, and <codeph>MAP</codeph>) require notation that might
not be available in all levels of JDBC and ODBC drivers. If you have trouble
querying such a table due to the driver level or inability to edit the queries
used by the application, you can create a view that exposes a <q>flattened</q>
version of the complex columns and point the application at the view. See
<xref href="impala_complex_types.xml#complex_types"/> for details.
</p>
</li>
<li>
<p>
The complex types available in <keyword keyref="impala23_full"/> and higher are
supported by the JDBC <codeph>getColumns()</codeph> API. Both
<codeph>MAP</codeph> and <codeph>ARRAY</codeph> are reported as the JDBC SQL
Type <codeph>ARRAY</codeph>, because this is the closest matching Java SQL type.
This behavior is consistent with Hive. <codeph>STRUCT</codeph> types are
reported as the JDBC SQL Type <codeph>STRUCT</codeph>.
</p>

<p>
To be consistent with Hive's behavior, the TYPE_NAME field is populated with the
primitive type name for scalar types, and with the full <codeph>toSql()</codeph>
for complex types. The resulting type names are somewhat inconsistent, because
nested types are printed differently than top-level types. A short sketch that
inspects these values through <codeph>getColumns()</codeph> follows this list.
For example, the following list shows how <codeph>toSql()</codeph> output for
Impala types is translated to <codeph>TYPE_NAME</codeph> values:
<codeblock><![CDATA[DECIMAL(10,10) becomes DECIMAL
CHAR(10) becomes CHAR
VARCHAR(10) becomes VARCHAR
ARRAY<DECIMAL(10,10)> becomes ARRAY<DECIMAL(10,10)>
@@ -413,27 +287,17 @@ ARRAY<VARCHAR(10)> becomes ARRAY<VARCHAR(10)>
</ul>
</li>
</ul>
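<p>
A minimal sketch of the <codeph>getColumns()</codeph> inspection mentioned above follows; the
connection URL and the table name <codeph>complex_tbl</codeph> are placeholders, and the
output simply prints the reported DATA_TYPE code and TYPE_NAME string for each column:
</p>

<codeblock>import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class ShowColumnTypes {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:hive2://myhost.example.com:21050/;auth=noSasl";  // placeholder
        try (Connection conn = DriverManager.getConnection(url)) {
            DatabaseMetaData meta = conn.getMetaData();
            // null catalog, "default" schema, hypothetical table, all columns.
            try (ResultSet cols = meta.getColumns(null, "default", "complex_tbl", "%")) {
                while (cols.next()) {
                    // DATA_TYPE is the java.sql.Types code (ARRAY, STRUCT, and so on);
                    // TYPE_NAME carries the full toSql() string for complex columns.
                    System.out.printf("%s  DATA_TYPE=%d  TYPE_NAME=%s%n",
                            cols.getString("COLUMN_NAME"),
                            cols.getInt("DATA_TYPE"),
                            cols.getString("TYPE_NAME"));
                }
            }
        }
    }
}</codeblock>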
</conbody>

</concept>

<concept id="jdbc_kudu">

<title>Kudu Considerations for DML Statements</title>

<conbody>
<p>
Currently, Impala <codeph>INSERT</codeph>, <codeph>UPDATE</codeph>, or other DML
statements issued through the JDBC interface against a Kudu table do not return JDBC
error codes for conditions such as duplicate primary key columns. Therefore, for
applications that issue a high volume of DML statements, prefer to use the Kudu Java API
directly rather than a JDBC application.
</p>
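<p>
For reference, the following is a minimal sketch of a direct insert through the Kudu Java
client; it assumes a Kudu table named <codeph>my_table</codeph> with columns
<codeph>id INT</codeph> and <codeph>name STRING</codeph>, a Kudu master at the placeholder
address kudu-master.example.com:7051, and the <codeph>kudu-client</codeph> library on the
classpath. Unlike the JDBC path, per-row failures such as duplicate primary keys are
surfaced on the operation response:
</p>

<codeblock>import org.apache.kudu.client.Insert;
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.KuduTable;
import org.apache.kudu.client.OperationResponse;
import org.apache.kudu.client.PartialRow;

public class KuduDirectInsert {
    public static void main(String[] args) throws Exception {
        KuduClient client =
                new KuduClient.KuduClientBuilder("kudu-master.example.com:7051").build();
        try {
            KuduTable table = client.openTable("my_table");   // hypothetical table
            KuduSession session = client.newSession();

            Insert insert = table.newInsert();
            PartialRow row = insert.getRow();
            row.addInt("id", 1);
            row.addString("name", "example");

            // In the default flush mode, apply() returns the per-operation response.
            OperationResponse response = session.apply(insert);
            if (response.hasRowError()) {
                // Duplicate-key and similar row-level failures are reported here.
                System.err.println("Row error: " + response.getRowError());
            }
            session.close();
        } finally {
            client.close();
        }
    }
}</codeblock>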
</conbody>

</concept>

</concept>
@@ -48,9 +48,7 @@ under the License.
</p>

<p>
Set up a software package of your choice to perform these functions.
</p>

<note>
@@ -107,9 +105,7 @@ under the License.
<li>
Select and download the load-balancing proxy software or other load-balancing hardware
appliance. It should only need to be installed and configured on a single host,
typically on an edge node.
</li>

<li>
@@ -117,13 +113,15 @@ under the License.
particular:
<ul>
<li>
To relay Impala requests back and forth, set up a port that the load balancer will
listen on.
</li>

<li>
Select a load balancing algorithm. See
<xref href="#proxy_balancing" format="dita"/> for load balancing algorithm options.
</li>

<li>
@@ -136,7 +134,7 @@ under the License.
<li>
If you are using Hue or JDBC-based applications, you typically set up load balancing
for both ports 21000 and 21050 because these client applications connect through port
21050 while the <cmdname>impala-shell</cmdname> command connects through port 21000.
See <xref href="impala_ports.xml#ports"/> for when to use port 21000, 21050, or
another value depending on what type of connections you are load balancing.
@@ -149,8 +147,8 @@ under the License.
<li>
For any scripts, jobs, or configuration settings for applications that formerly
connected to a specific <cmdname>impalad</cmdname> to run Impala SQL statements,
change the connection information (such as the <codeph>-i</codeph> option in
<cmdname>impala-shell</cmdname>) to point to the load balancer instead.
</li>
</ol>
@@ -231,10 +229,8 @@ under the License.
</dt>

<dd>
Distributes connections to all coordinator nodes. Typically not recommended for
Impala.
</dd>

</dlentry>
@@ -267,8 +263,7 @@ under the License.
<p>
In a cluster using Kerberos, applications check host credentials to verify that the host
they are connecting to is the same one that is actually processing the request.
</p>

<p>
@@ -278,13 +273,12 @@ under the License.
</p>

<p>
In <keyword keyref="impala212_full">Impala 2.12</keyword> and higher versions, when you
enable a proxy server in a Kerberized cluster, users have the option to connect to Impala
daemons directly from <cmdname>impala-shell</cmdname> using the <codeph>-b</codeph> /
<codeph>--kerberos_host_fqdn</codeph> flag. This option can be used for testing or
troubleshooting purposes, but is not recommended for live production environments as it
defeats the purpose of a load balancer/proxy.
</p>

<p>
@@ -305,8 +299,7 @@ impala-shell -i impalad-1.mydomain.com -k -b loadbalancer-1.mydomain.com
</p>

<p>
To validate the load-balancing proxy server, perform these extra Kerberos setup steps:
</p>

<ol>
@@ -321,26 +314,29 @@ impala-shell -i impalad-1.mydomain.com -k -b loadbalancer-1.mydomain.com
Choose the host you will use for the proxy server. Based on the Kerberos setup
procedure, it should already have an entry
<codeph>impala/<varname>proxy_host</varname>@<varname>realm</varname></codeph> in its
<filepath>keytab</filepath>. If not, go back over the initial Kerberos configuration
steps for the <filepath>keytab</filepath> on each host running the
<cmdname>impalad</cmdname> daemon.
</li>

<li>
Copy the <filepath>keytab</filepath> file from the proxy host to all other hosts in
the cluster that run the <cmdname>impalad</cmdname> daemon. Put the
<filepath>keytab</filepath> file in a secure location on each of these other hosts.
</li>

<li>
Add an entry
<codeph>impala/<varname>actual_hostname</varname>@<varname>realm</varname></codeph> to
the <filepath>keytab</filepath> on each host running the <cmdname>impalad</cmdname>
daemon.
</li>

<li>
For each <cmdname>impalad</cmdname> node, merge the existing
<filepath>keytab</filepath> with the proxy's <filepath>keytab</filepath> using
<cmdname>ktutil</cmdname>, producing a new <filepath>keytab</filepath> file. For
example:
<codeblock>$ ktutil
ktutil: read_kt proxy.keytab
ktutil: read_kt impala.keytab
@@ -349,44 +345,39 @@ impala-shell -i impalad-1.mydomain.com -k -b loadbalancer-1.mydomain.com
</li>

<li>
To verify that the <filepath>keytabs</filepath> are merged, run the command:
<codeblock>
klist -k <varname>keytabfile</varname>
</codeblock>
The command lists the credentials for both <codeph>principal</codeph> and
<codeph>be_principal</codeph> on all nodes.
</li>

<li>
Make sure that the <codeph>impala</codeph> user has permission to read this merged
<filepath>keytab</filepath> file.
</li>
<li>
For each coordinator <codeph>impalad</codeph> host in the cluster that participates in
the load balancing, add the following configuration options to receive client
connections coming through the load balancer proxy server:
<codeblock>
--principal=impala/<varname>proxy_host@realm</varname>
--be_principal=impala/<varname>actual_host@realm</varname>
--keytab_file=<varname>path_to_merged_keytab</varname>
</codeblock>
<p>
The <codeph>--principal</codeph> setting prevents a client from connecting to a
coordinator <codeph>impalad</codeph> using a principal other than the one specified.
</p>
<note>
Every host has a different <codeph>--be_principal</codeph> because the actual host
name is different on each host. Specify the fully qualified domain name (FQDN) for
the proxy host, not the IP address. Use the exact FQDN as returned by a reverse DNS
lookup for the associated IP address.
</note>
</li>

<li>
Modify the startup options. See
<xref href="impala_config_options.xml#config_options"/> for the procedure to
modify the startup options.
</li>

<li>
@@ -396,6 +387,40 @@ klist -k <varname>keytabfile</varname>
</li>
</ol>

<section id="section_fjz_mfn_yjb">

<title>Client Connection to Proxy Server in Kerberized Clusters</title>
<p>
When a client connects to Impala, the service principal specified by the client must
match the <codeph>-principal</codeph> setting of the Impala proxy server, and the
client should connect to the proxy server port.
</p>

<p>
In <filepath>hue.ini</filepath>, set the following to configure Hue to
automatically connect to the proxy server:
</p>
<codeblock>[impala]
server_host=<varname>proxy_host</varname>
impala_principal=impala/<varname>proxy_host</varname></codeblock>

<p>
The following are the JDBC connection string formats when connecting through the load
balancer with the load balancer's host name in the principal:
</p>

<codeblock>jdbc:hive2://<varname>proxy_host</varname>:<varname>load_balancer_port</varname>/;principal=impala/_HOST@<varname>realm</varname>
jdbc:hive2://<varname>proxy_host</varname>:<varname>load_balancer_port</varname>/;principal=impala/<varname>proxy_host</varname>@<varname>realm</varname></codeblock>

<p>
When starting <cmdname>impala-shell</cmdname>, specify the service principal via the
<codeph>-b</codeph> or <codeph>--kerberos_host_fqdn</codeph> flag.
</p>
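<p>
As an illustration, the following Java sketch connects through the proxy with the load
balancer's host name in the principal; the proxy host proxy.example.com, port 21051, and
realm EXAMPLE.COM are placeholders, and the client is assumed to already hold a Kerberos
ticket (for example, obtained with <cmdname>kinit</cmdname>):
</p>

<codeblock>import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ProxyKerberosConnect {
    public static void main(String[] args) throws Exception {
        // The principal carries the load balancer's host name, matching the
        // --principal setting of the impalad daemons behind the proxy.
        String url = "jdbc:hive2://proxy.example.com:21051/"
                + ";principal=impala/proxy.example.com@EXAMPLE.COM";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT version()")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}</codeblock>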
</section>

</conbody>

</concept>
@@ -512,8 +537,9 @@ klist -k <varname>keytabfile</varname>
<ul>
<li>
<p>
Install the load balancer:
</p>
<codeblock>yum install haproxy</codeblock>
</li>

<li>
@@ -604,7 +630,8 @@ listen stats :25002
stats enable
stats auth <varname>username</varname>:<varname>password</varname>

# Setup for Impala.
# Impala clients connect to load_balancer_host:25003.
# HAProxy will balance connections among the list of servers listed below.
# The Impalad servers listed below listen at port 21000 for beeswax (impala-shell) or the original ODBC driver.
# For JDBC or ODBC version 2.x driver, use port 21050 instead of 21000.
@@ -621,12 +648,13 @@ listen impala :25003
# Setup for Hue or other JDBC-enabled applications.
# In particular, Hue requires sticky sessions.
# The application connects to load_balancer_host:21051, and HAProxy balances
# connections to the associated hosts, where Impala listens for
# JDBC requests at port 21050.
listen impalajdbc :21051
mode tcp
option tcplog
balance source

server <varname>symbolic_name_5</varname> impala-host-1.example.com:21050 check
server <varname>symbolic_name_6</varname> impala-host-2.example.com:21050 check
server <varname>symbolic_name_7</varname> impala-host-3.example.com:21050 check
@@ -635,8 +663,8 @@ listen impalajdbc :21051
<note type="important">
Hue requires the <codeph>check</codeph> option at the end of each line in the above file to
ensure HAProxy can detect any unreachable <cmdname>impalad</cmdname> server, so that
failover can be successful. Without the TCP check, you may hit an error when the
<cmdname>impalad</cmdname> daemon to which Hue tries to connect is down.
</note>