<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="impala_jdbc">

  <title id="jdbc">Configuring Impala to Work with JDBC</title>

  <prolog>
    <metadata>
      <data name="Category" value="Impala"/>
      <data name="Category" value="JDBC"/>
      <data name="Category" value="Java"/>
      <data name="Category" value="SQL"/>
      <data name="Category" value="Querying"/>
      <data name="Category" value="Configuring"/>
      <data name="Category" value="Starting and Stopping"/>
      <data name="Category" value="Developers"/>
    </metadata>
  </prolog>

  <conbody>

    <p>
      Impala supports the standard JDBC interface, allowing access from commercial Business
      Intelligence tools and custom software written in Java or other programming languages.
      The JDBC driver allows you to access Impala from a Java program that you write, or a
      Business Intelligence or similar tool that uses JDBC to communicate with various
      database products.
    </p>

    <p>
      Setting up a JDBC connection to Impala involves the following steps:
    </p>

    <ul>
      <li>
        Verifying the communication port where the Impala daemons in your cluster are
        listening for incoming JDBC requests.
      </li>
      <li>
        Installing the JDBC driver on every system that runs the JDBC-enabled application.
      </li>
      <li>
        Specifying a connection string for the JDBC application to access one of the servers
        running the <cmdname>impalad</cmdname> daemon, with the appropriate security settings.
      </li>
    </ul>

    <p outputclass="toc inpage"/>

  </conbody>
  <concept id="jdbc_port">

    <title>Configuring the JDBC Port</title>

    <conbody>

      <p>
        The following are the default ports through which the Impala server accepts JDBC
        connections:
        <simpletable frame="all" relcolwidth="1.0* 1.03* 2.38*" id="simpletable_tr2_gnt_43b">
          <strow>
            <stentry><b>Protocol</b></stentry>
            <stentry><b>Default Port</b></stentry>
            <stentry><b>Flag to Specify an Alternate Port</b></stentry>
          </strow>
          <strow>
            <stentry>HTTP</stentry>
            <stentry>28000</stentry>
            <stentry><codeph>‑‑hs2_http_port</codeph></stentry>
          </strow>
          <strow>
            <stentry>Binary TCP</stentry>
            <stentry>21050</stentry>
            <stentry><codeph>‑‑hs2_port</codeph></stentry>
          </strow>
        </simpletable>
      </p>

      <p>
        Make sure the port for the protocol you are using is available for communication with
        clients, for example, that it is not blocked by firewall software.
      </p>

      <p>
        If your JDBC client software connects to a different port, specify that alternative
        port number with the corresponding flag from the table above when starting the
        <codeph>impalad</codeph> daemon.
      </p>

    </conbody>

  </concept>
  <concept id="jdbc_driver_choice">

    <title>Choosing the JDBC Driver</title>

    <prolog>
      <metadata>
        <data name="Category" value="Planning"/>
      </metadata>
    </prolog>

    <conbody>

      <p>
        In Impala 2.0 and later, you can use the Hive 0.13 or higher JDBC driver. If you are
        already using JDBC applications with an earlier Impala release, you should update your
        JDBC driver, because the Hive 0.12 driver that was formerly the only choice is not
        compatible with Impala 2.0 and later.
      </p>

      <p>
        The Hive JDBC driver provides a substantial speed increase for JDBC applications with
        Impala 2.0 and higher, for queries that return large result sets.
      </p>

    </conbody>

  </concept>
  <concept id="jdbc_setup">

    <title>Enabling Impala JDBC Support on Client Systems</title>

    <prolog>
      <metadata>
        <data name="Category" value="Installing"/>
      </metadata>
    </prolog>

    <conbody>

      <section id="install_hive_driver">

        <title>Using the Hive JDBC Driver</title>

        <p>
          You install the Hive JDBC driver (<codeph>hive-jdbc</codeph> package) through the
          Linux package manager, on hosts within the cluster. The driver consists of several
          JAR files. The same driver can be used by Impala and Hive.
        </p>

        <p>
          To get the JAR files, install the Hive JDBC driver on each host in the cluster that
          will run JDBC applications.
        </p>

        <note>
          The latest JDBC driver, corresponding to Hive 0.13, provides substantial performance
          improvements for Impala queries that return large result sets. Impala 2.0 and later
          are compatible with the Hive 0.13 driver. If you already have an older JDBC driver
          installed, and are running Impala 2.0 or higher, consider upgrading to the latest
          Hive JDBC driver for best performance with JDBC applications.
        </note>

        <p>
          If you are using JDBC-enabled applications on hosts outside the cluster, you cannot
          use the same install procedure on those hosts. Install the JDBC driver on at least
          one cluster host using the preceding procedure. Then download the following JAR
          files to each client machine that will use JDBC with Impala:
        </p>
        <codeblock>commons-logging-X.X.X.jar
hadoop-common.jar
hive-common-X.XX.X.jar
hive-jdbc-X.XX.X.jar
hive-metastore-X.XX.X.jar
hive-service-X.XX.X.jar
httpclient-X.X.X.jar
httpcore-X.X.X.jar
libfb303-X.X.X.jar
libthrift-X.X.X.jar
log4j-X.X.XX.jar
slf4j-api-X.X.X.jar
slf4j-logXjXX-X.X.X.jar
</codeblock>
        <p>
          <b>To enable JDBC support for Impala on the system where you run the JDBC
          application:</b>
        </p>

        <ol>
          <li>
            Download the JAR files listed above to each client machine.
            <note>
              For Maven users, see <xref keyref="Impala-JDBC-Example">this sample github
              page</xref> for an example of the dependencies you could add to a
              <codeph>pom</codeph> file instead of downloading the individual JARs.
            </note>
          </li>
          <li>
            Store the JAR files in a location of your choosing, ideally a directory already
            referenced in your <codeph>CLASSPATH</codeph> setting. For example:
            <ul>
              <li>
                On Linux, you might use a location such as <codeph>/opt/jars/</codeph>.
              </li>
              <li>
                On Windows, you might use a subdirectory underneath
                <filepath>C:\Program Files</filepath>.
              </li>
            </ul>
          </li>
          <li>
            To successfully load the Impala JDBC driver, client programs must be able to
            locate the associated JAR files. This often means setting the
            <codeph>CLASSPATH</codeph> for the client process to include the JARs. Consult
            the documentation for your JDBC client for more details on how to install new
            JDBC drivers, but some examples of how to set <codeph>CLASSPATH</codeph>
            variables include (a quick way to confirm the result is sketched after this
            list):
            <ul>
              <li>
                On Linux, if you extracted the JARs to <codeph>/opt/jars/</codeph>, you might
                issue the following command to prepend the JAR files path to an existing
                classpath:
                <codeblock>export CLASSPATH=/opt/jars/*.jar:$CLASSPATH</codeblock>
              </li>
              <li>
                On Windows, use the <b>System Properties</b> control panel item to modify the
                <b>Environment Variables</b> for your system. Modify the environment
                variables to include the path to which you extracted the files.
                <note>
                  If the existing <codeph>CLASSPATH</codeph> on your client machine refers to
                  some older version of the Hive JARs, ensure that the new JARs are the first
                  ones listed. Either put the new JAR files earlier in the listings, or
                  delete the other references to Hive JAR files.
                </note>
              </li>
            </ul>
          </li>
        </ol>
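        <p>
          As a quick sanity check of the <codeph>CLASSPATH</codeph> setup, you can try to
          load the driver class from a small Java program. This is only a sketch: the class
          name is the Hive JDBC driver class described in the next section, and the file
          name is illustrative.
        </p>

        <codeblock>// ClasspathCheck.java - minimal sketch to confirm that the Hive JDBC driver
// JARs configured above are visible to client programs.
public class ClasspathCheck {
  public static void main(String[] args) throws Exception {
    // Throws ClassNotFoundException if the driver JARs are not on the classpath.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    System.out.println("Hive JDBC driver found on the classpath.");
  }
}
</codeblock>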
      </section>

    </conbody>

  </concept>

  <concept id="jdbc_connect">

    <title>Establishing JDBC Connections</title>

    <conbody>
      <p>
        The JDBC driver class depends on which driver you select.
      </p>

      <note conref="../shared/impala_common.xml#common/proxy_jdbc_caveat"/>

      <section id="class_hive_driver">

        <title>Using the Hive JDBC Driver</title>

        <p>
          For example, with the Hive JDBC driver, the class name is
          <codeph>org.apache.hive.jdbc.HiveDriver</codeph>. Once you have configured Impala
          to work with JDBC, you can establish connections between your JDBC client and
          Impala. To do so for a cluster that does not use Kerberos authentication, use a
          connection string of the form
          <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/;auth=noSasl</codeph>.
<!--
          Include the <codeph>auth=noSasl</codeph> argument only when connecting to a
          non-Kerberos cluster; if Kerberos is enabled, omit the <codeph>auth</codeph> argument.
-->
          For example, you might use:
        </p>

        <codeblock>jdbc:hive2://myhost.example.com:21050/;auth=noSasl</codeblock>
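        <p>
          For instance, a minimal Java client using this connection string might look like
          the following sketch. The host name and query are illustrative; substitute your
          own values.
        </p>

        <codeblock>// Minimal sketch of a Java client for Impala using the Hive JDBC driver.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaJdbcExample {
  public static void main(String[] args) throws Exception {
    // Older driver versions may need the class loaded explicitly before connecting.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    String url = "jdbc:hive2://myhost.example.com:21050/;auth=noSasl";
    try (Connection conn = DriverManager.getConnection(url);
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT version()")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));
      }
    }
  }
}
</codeblock>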
        <p>
          To connect to an instance of Impala that requires Kerberos authentication, use a
          connection string of the form
          <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/;principal=<varname>principal_name</varname></codeph>.
          The principal must be the same user principal you used when starting Impala. For
          example, you might use:
        </p>

        <codeblock>jdbc:hive2://myhost.example.com:21050/;principal=impala/myhost.example.com@H2.EXAMPLE.COM</codeblock>

        <p>
          To connect to an instance of Impala that requires LDAP authentication, use a
          connection string of the form
          <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/<varname>db_name</varname>;user=<varname>ldap_userid</varname>;password=<varname>ldap_password</varname></codeph>.
          For example, you might use:
        </p>

        <codeblock>jdbc:hive2://myhost.example.com:21050/test_db;user=fred;password=xyz123</codeblock>

        <p>
          To connect to an instance of Impala over HTTP, specify the HTTP port, 28000 by
          default, and <codeph>transportMode=http</codeph> in the connection string. For
          example:
          <codeblock>jdbc:hive2://myhost.example.com:28000/;transportMode=http</codeblock>
        </p>
        <note>
          <p conref="../shared/impala_common.xml#common/hive_jdbc_ssl_kerberos_caveat"/>
        </note>

      </section>

    </conbody>

  </concept>

  <concept rev="2.3.0" id="jdbc_odbc_notes">

    <title>Notes about JDBC and ODBC Interaction with Impala SQL Features</title>

    <conbody>

      <p>
        Most Impala SQL features work equivalently through the
        <cmdname>impala-shell</cmdname> interpreter or the JDBC or ODBC APIs. The following
        are some exceptions to keep in mind when switching between the interactive shell and
        applications using the APIs:
      </p>
      <ul>
        <li>
          <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
          <ul>
            <li>
              <p>
                Queries involving the complex types (<codeph>ARRAY</codeph>,
                <codeph>STRUCT</codeph>, and <codeph>MAP</codeph>) require notation that
                might not be available in all levels of JDBC and ODBC drivers. If you have
                trouble querying such a table due to the driver level or inability to edit
                the queries used by the application, you can create a view that exposes a
                <q>flattened</q> version of the complex columns and point the application at
                the view. See <xref href="impala_complex_types.xml#complex_types"/> for
                details.
              </p>
            </li>
            <li>
              <p>
                The complex types available in <keyword keyref="impala23_full"/> and higher
                are supported by the JDBC <codeph>getColumns()</codeph> API. Both
                <codeph>MAP</codeph> and <codeph>ARRAY</codeph> are reported as the JDBC SQL
                Type <codeph>ARRAY</codeph>, because this is the closest matching Java SQL
                type. This behavior is consistent with Hive. <codeph>STRUCT</codeph> types
                are reported as the JDBC SQL Type <codeph>STRUCT</codeph>. A short sketch of
                calling <codeph>getColumns()</codeph> appears after this list.
              </p>
              <p>
                To be consistent with Hive's behavior, the TYPE_NAME field is populated with
                the primitive type name for scalar types, and with the full
                <codeph>toSql()</codeph> for complex types. The resulting type names are
                somewhat inconsistent, because nested types are printed differently than
                top-level types. For example, the following list shows how
                <codeph>toSql()</codeph> for Impala types is translated to
                <codeph>TYPE_NAME</codeph> values:
<codeblock><![CDATA[DECIMAL(10,10) becomes DECIMAL
CHAR(10) becomes CHAR
VARCHAR(10) becomes VARCHAR
ARRAY<DECIMAL(10,10)> becomes ARRAY<DECIMAL(10,10)>
ARRAY<CHAR(10)> becomes ARRAY<CHAR(10)>
ARRAY<VARCHAR(10)> becomes ARRAY<VARCHAR(10)>
]]>
</codeblock>
              </p>
            </li>
          </ul>
        </li>
      </ul>
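      <p>
        The following Java sketch shows one way an application might inspect the reported
        type names through the standard <codeph>DatabaseMetaData.getColumns()</codeph> call.
        The connection URL, schema, and table name are illustrative.
      </p>

      <codeblock>// Sketch: listing Impala column type names through JDBC metadata.
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class ShowColumnTypes {
  public static void main(String[] args) throws Exception {
    String url = "jdbc:hive2://myhost.example.com:21050/;auth=noSasl";
    try (Connection conn = DriverManager.getConnection(url)) {
      DatabaseMetaData meta = conn.getMetaData();
      // TYPE_NAME holds the names shown above; DATA_TYPE holds the java.sql.Types
      // code, for example java.sql.Types.ARRAY for MAP and ARRAY columns.
      try (ResultSet cols = meta.getColumns(null, "default", "my_complex_table", "%")) {
        while (cols.next()) {
          System.out.println(cols.getString("COLUMN_NAME") + " -> "
              + cols.getString("TYPE_NAME"));
        }
      }
    }
  }
}
</codeblock>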
    </conbody>

  </concept>

  <concept id="jdbc_kudu">

    <title>Kudu Considerations for DML Statements</title>

    <conbody>
      <p>
        Currently, Impala <codeph>INSERT</codeph>, <codeph>UPDATE</codeph>, or other DML
        statements issued through the JDBC interface against a Kudu table do not return JDBC
        error codes for conditions such as duplicate primary key values. Therefore, for
        applications that issue a high volume of DML statements, prefer to use the Kudu Java
        API directly rather than a JDBC application.
      </p>
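      <p>
        As a rough sketch of why the Kudu Java API is preferable here: it surfaces per-row
        errors, such as a duplicate primary key, that the JDBC path does not. The master
        address, table name, and column names below are illustrative, assuming an
        <codeph>INT</codeph> primary key column named <codeph>id</codeph>.
      </p>

      <codeblock>// Sketch: applying a row with the Kudu Java API and checking for per-row errors.
import org.apache.kudu.client.Insert;
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.KuduTable;
import org.apache.kudu.client.OperationResponse;
import org.apache.kudu.client.PartialRow;

public class KuduInsertExample {
  public static void main(String[] args) throws Exception {
    KuduClient client =
        new KuduClient.KuduClientBuilder("kudu-master.example.com:7051").build();
    try {
      KuduTable table = client.openTable("my_kudu_table");
      KuduSession session = client.newSession();
      Insert insert = table.newInsert();
      PartialRow row = insert.getRow();
      row.addInt("id", 1);           // primary key column in this example
      row.addString("name", "one");
      OperationResponse response = session.apply(insert);
      if (response.hasRowError()) {
        // A duplicate primary key shows up here as a row error.
        System.err.println("Row error: " + response.getRowError());
      }
      session.close();
    } finally {
      client.close();
    }
  }
}
</codeblock>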
    </conbody>

  </concept>

</concept>