mirror of
https://github.com/apache/impala.git
synced 2025-12-30 03:01:44 -05:00
For this change to land in master, the audience="hidden" code review needs to be completed first. Otherwise, the doc build would still work but the audience="hidden" content would be visible rather than hidden as desired. Some work happening in parallel might introduce additional instances of audience="Cloudera". I suggest addressing those in a followup CR so this global change can land quickly. Since the changes apply across so many different files, but are so narrow in scope, I suggest that the way to validate (check that no extraneous changes were introduced accidentally) is to diff just the changed lines: git diff -U0 HEAD^ HEAD In patch set 2, I updated other topics marked audience="Cloudera" by CRs that were pushed in the meantime. Change-Id: Ic93d89da77e1f51bbf548a522d98d0c4e2fb31c8 Reviewed-on: http://gerrit.cloudera.org:8080/5613 Reviewed-by: John Russell <jrussell@cloudera.com> Tested-by: Impala Public Jenkins
340 lines
14 KiB
XML
<?xml version="1.0" encoding="UTF-8"?><!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
|
|
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
|
|
<concept id="impala_jdbc">
|
|
|
|
<title id="jdbc">Configuring Impala to Work with JDBC</title>
|
|
<prolog>
|
|
<metadata>
|
|
<data name="Category" value="Impala"/>
|
|
<data name="Category" value="JDBC"/>
|
|
<data name="Category" value="Java"/>
|
|
<data name="Category" value="SQL"/>
|
|
<data name="Category" value="Querying"/>
|
|
<data name="Category" value="Configuring"/>
|
|
<data name="Category" value="Starting and Stopping"/>
|
|
<data name="Category" value="Developers"/>
|
|
</metadata>
|
|
</prolog>
|
|
|
|
<conbody>
|
|
|
|
<p>
|
|
<indexterm audience="hidden">JDBC</indexterm>
|
|
      Impala supports the standard JDBC interface, allowing access from commercial Business Intelligence (BI)
      tools and custom software written in Java or other programming languages. With the JDBC driver, you can
      query Impala from your own Java programs or from any BI or similar tool that uses JDBC to communicate
      with database products.
|
|
</p>
|
|
|
|
<p>
|
|
Setting up a JDBC connection to Impala involves the following steps:
|
|
</p>
|
|
|
|
<ul>
|
|
<li>
|
|
Verifying the communication port where the Impala daemons in your cluster are listening for incoming JDBC
|
|
requests.
|
|
</li>
|
|
|
|
<li>
|
|
Installing the JDBC driver on every system that runs the JDBC-enabled application.
|
|
</li>
|
|
|
|
<li>
|
|
Specifying a connection string for the JDBC application to access one of the servers running the
|
|
<cmdname>impalad</cmdname> daemon, with the appropriate security settings.
|
|
</li>
|
|
</ul>
|
|
|
|
<p outputclass="toc inpage"/>
|
|
</conbody>
|
|
|
|
<concept id="jdbc_port">
|
|
|
|
<title>Configuring the JDBC Port</title>
|
|
|
|
<conbody>
|
|
|
|
<p>
|
|
        By default, Impala accepts JDBC connections on port 21050, the default port used by JDBC 2.0 and later
        (as well as ODBC 2.x). Make sure this port is available for communication with other hosts on your
        network; for example, check that it is not blocked by firewall software. If your JDBC client software
        connects to a different port, specify that alternative port number with the
        <codeph>--hs2_port</codeph> option when starting <codeph>impalad</codeph>. See
        <xref href="impala_processes.xml#processes"/> for details about Impala startup options. See
        <xref href="impala_ports.xml#ports"/> for information about all ports used for communication between Impala
        and clients or between Impala components.
|
|
</p>
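      <p>
        One way to confirm that the port is not blocked is to attempt a plain TCP connection from the client
        host. The following is a minimal sketch, not part of Impala or the JDBC driver: the class name
        <codeph>PortCheck</codeph>, the helper <codeph>isReachable()</codeph>, the 3-second timeout, and the
        host name <codeph>myhost.example.com</codeph> are all placeholders chosen for illustration. A
        successful connection shows only that something is listening on the port, not that it is an Impala
        daemon.
      </p>

<codeblock><![CDATA[import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP port accepts connections.
class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder host; 21050 is the default JDBC port described above.
        String host = args.length > 0 ? args[0] : "myhost.example.com";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 21050;
        String status = isReachable(host, port, 3000) ? "reachable" : "NOT reachable";
        System.out.println(host + ":" + port + " is " + status);
    }
}
]]></codeblock>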
|
|
</conbody>
|
|
</concept>
|
|
|
|
<concept id="jdbc_driver_choice">
|
|
|
|
<title>Choosing the JDBC Driver</title>
|
|
<prolog>
|
|
<metadata>
|
|
<data name="Category" value="Planning"/>
|
|
</metadata>
|
|
</prolog>
|
|
|
|
<conbody>
|
|
|
|
<p>
|
|
In Impala 2.0 and later, you can use the Hive 0.13 JDBC driver. If you are
|
|
already using JDBC applications with an earlier Impala release, you should update
|
|
your JDBC driver, because the Hive 0.12 driver that was formerly the only choice
|
|
is not compatible with Impala 2.0 and later.
|
|
</p>
|
|
|
|
<p>
|
|
The Hive JDBC driver provides a substantial speed increase for JDBC
|
|
applications with Impala 2.0 and higher, for queries that return large result sets.
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/jdbc_odbc_complex_types"/>
|
|
<p conref="../shared/impala_common.xml#common/jdbc_odbc_complex_types_views"/>
|
|
|
|
</conbody>
|
|
</concept>
|
|
|
|
<concept id="jdbc_setup">
|
|
|
|
<title>Enabling Impala JDBC Support on Client Systems</title>
|
|
<prolog>
|
|
<metadata>
|
|
<data name="Category" value="Installing"/>
|
|
</metadata>
|
|
</prolog>
|
|
|
|
<conbody>
|
|
|
|
<section id="install_hive_driver">
|
|
<title>Using the Hive JDBC Driver</title>
|
|
<p>
|
|
You install the Hive JDBC driver (<codeph>hive-jdbc</codeph> package) through the Linux package manager, on
|
|
hosts within the cluster. The driver consists of several Java JAR files. The same driver can be used by Impala and Hive.
|
|
</p>
|
|
|
|
<p>
|
|
To get the JAR files, install the Hive JDBC driver on each host in the cluster that will run
|
|
JDBC applications. <!-- TODO: Find a URL to point to for instructions and downloads -->
|
|
</p>
|
|
|
|
<note>
|
|
The latest JDBC driver, corresponding to Hive 0.13, provides substantial performance improvements for
|
|
Impala queries that return large result sets. Impala 2.0 and later are compatible with the Hive 0.13
|
|
driver. If you already have an older JDBC driver installed, and are running Impala 2.0 or higher, consider
|
|
upgrading to the latest Hive JDBC driver for best performance with JDBC applications.
|
|
</note>
|
|
|
|
<p>
|
|
          If you are using JDBC-enabled applications on hosts outside the cluster, you cannot use the same install
          procedure on those hosts. Install the JDBC driver on at least one cluster host using the preceding
          procedure. Then download the JAR files to each client machine that will use JDBC with Impala:
|
|
</p>
|
|
|
|
<codeblock>commons-logging-X.X.X.jar
hadoop-common.jar
hive-common-X.XX.X.jar
hive-jdbc-X.XX.X.jar
hive-metastore-X.XX.X.jar
hive-service-X.XX.X.jar
httpclient-X.X.X.jar
httpcore-X.X.X.jar
libfb303-X.X.X.jar
libthrift-X.X.X.jar
log4j-X.X.XX.jar
slf4j-api-X.X.X.jar
slf4j-logXjXX-X.X.X.jar
</codeblock>
|
|
|
|
<p>
|
|
<b>To enable JDBC support for Impala on the system where you run the JDBC application:</b>
|
|
</p>
|
|
|
|
<ol>
|
|
<li>
|
|
Download the JAR files listed above to each client machine.
|
|
<note>
|
|
              For Maven users, see
              <xref href="https://github.com/onefoursix/Cloudera-Impala-JDBC-Example" scope="external" format="html">this
              sample GitHub page</xref> for an example of the dependencies you could add to a <codeph>pom</codeph>
              file instead of downloading the individual JARs.
|
|
</note>
|
|
</li>
|
|
|
|
<li>
|
|
Store the JAR files in a location of your choosing, ideally a directory already referenced in your
|
|
<codeph>CLASSPATH</codeph> setting. For example:
|
|
<ul>
|
|
<li>
|
|
On Linux, you might use a location such as
|
|
                <codeph>/opt/jars/</codeph>.
|
|
</li>
|
|
|
|
<li>
|
|
On Windows, you might use a subdirectory underneath <filepath>C:\Program Files</filepath>.
|
|
</li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li>
|
|
To successfully load the Impala JDBC driver, client programs must be able to locate the associated JAR
|
|
files. This often means setting the <codeph>CLASSPATH</codeph> for the client process to include the
|
|
JARs. Consult the documentation for your JDBC client for more details on how to install new JDBC drivers,
|
|
but some examples of how to set <codeph>CLASSPATH</codeph> variables include:
|
|
<ul>
|
|
<li>
|
|
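                On Linux, if you extracted the JARs to <codeph>/opt/jars/</codeph>, you might issue the following
                command to prepend the JAR files to an existing classpath. Java expands the classpath wildcard
                <codeph>/opt/jars/*</codeph> to every JAR file in that directory; a pattern such as
                <codeph>*.jar</codeph> is not recognized in a classpath entry:
<codeblock>export CLASSPATH="/opt/jars/*:$CLASSPATH"</codeblock>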
|
|
</li>
|
|
|
|
<li>
|
|
On Windows, use the <b>System Properties</b> control panel item to modify the <b>Environment
|
|
Variables</b> for your system. Modify the environment variables to include the path to which you
|
|
extracted the files.
|
|
<note>
|
|
If the existing <codeph>CLASSPATH</codeph> on your client machine refers to some older version of
|
|
the Hive JARs, ensure that the new JARs are the first ones listed. Either put the new JAR files
|
|
earlier in the listings, or delete the other references to Hive JAR files.
|
|
</note>
|
|
</li>
|
|
</ul>
|
|
</li>
|
|
</ol>
|
|
</section>
|
|
|
|
</conbody>
|
|
</concept>
|
|
|
|
<concept id="jdbc_connect">
|
|
|
|
<title>Establishing JDBC Connections</title>
|
|
|
|
<conbody>
|
|
|
|
<p>
|
|
The JDBC driver class depends on which driver you select.
|
|
</p>
|
|
|
|
<note conref="../shared/impala_common.xml#common/proxy_jdbc_caveat"/>
|
|
|
|
<section id="class_hive_driver">
|
|
<title>Using the Hive JDBC Driver</title>
|
|
|
|
<p>
|
|
          With the Hive JDBC driver, the class name is <codeph>org.apache.hive.jdbc.HiveDriver</codeph>.
          Once you have configured Impala to work with JDBC, you can establish connections from your JDBC
          application to Impala. For a cluster that does not use Kerberos authentication, use a connection
          string of the form
          <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/;auth=noSasl</codeph>.
|
|
<!--
|
|
Include the <codeph>auth=noSasl</codeph> argument
|
|
only when connecting to a non-Kerberos cluster; if Kerberos is enabled, omit the <codeph>auth</codeph> argument.
|
|
-->
|
|
For example, you might use:
|
|
</p>
|
|
|
|
<codeblock>jdbc:hive2://myhost.example.com:21050/;auth=noSasl</codeblock>
|
|
|
|
<p>
|
|
To connect to an instance of Impala that requires Kerberos authentication, use a connection string of the
|
|
form
|
|
<codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/;principal=<varname>principal_name</varname></codeph>.
|
|
The principal must be the same user principal you used when starting Impala. For example, you might use:
|
|
</p>
|
|
|
|
<codeblock>jdbc:hive2://myhost.example.com:21050/;principal=impala/myhost.example.com@H2.EXAMPLE.COM</codeblock>
|
|
|
|
<p>
|
|
To connect to an instance of Impala that requires LDAP authentication, use a connection string of the form
|
|
<codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/<varname>db_name</varname>;user=<varname>ldap_userid</varname>;password=<varname>ldap_password</varname></codeph>.
|
|
For example, you might use:
|
|
</p>
|
|
|
|
<codeblock>jdbc:hive2://myhost.example.com:21050/test_db;user=fred;password=xyz123</codeblock>
|
|
|
|
<note>
|
|
<p conref="../shared/impala_common.xml#common/hive_jdbc_ssl_kerberos_caveat"/>
|
|
</note>
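        <p>
          The connection strings above can be used from Java roughly as follows. This is a minimal sketch
          rather than a definitive client: the class name <codeph>ImpalaJdbcExample</codeph> and the three
          URL helper methods are hypothetical names introduced for illustration, the host, principal, and
          credentials are the placeholder values from the preceding examples, and the code assumes the Hive
          JDBC driver JAR files are already on the classpath. A Kerberos connection additionally requires a
          valid Kerberos ticket on the client, which this sketch does not set up.
        </p>

<codeblock><![CDATA[import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical example class; the helper names are not part of any driver API.
class ImpalaJdbcExample {

    // Connection string for a cluster without Kerberos authentication.
    static String noSaslUrl(String host, int port) {
        return "jdbc:hive2://" + host + ":" + port + "/;auth=noSasl";
    }

    // Connection string for a cluster that requires Kerberos authentication.
    static String kerberosUrl(String host, int port, String principal) {
        return "jdbc:hive2://" + host + ":" + port + "/;principal=" + principal;
    }

    // Connection string for a cluster that requires LDAP authentication.
    static String ldapUrl(String host, int port, String db, String user, String password) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db
            + ";user=" + user + ";password=" + password;
    }

    public static void main(String[] args) throws ClassNotFoundException, SQLException {
        // Load the Hive JDBC driver class named in this section.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Placeholder host and the default port 21050.
        String url = noSaslUrl("myhost.example.com", 21050);

        // try-with-resources closes the result set, statement, and connection.
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}
]]></codeblock>

        <p>
          Building the URL in one place keeps the choice of authentication method separate from the query
          code, so switching from <codeph>noSaslUrl()</codeph> to <codeph>kerberosUrl()</codeph> or
          <codeph>ldapUrl()</codeph> does not affect the rest of the program.
        </p>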
|
|
|
|
</section>
|
|
|
|
</conbody>
|
|
</concept>
|
|
|
|
<concept rev="2.3.0" id="jdbc_odbc_notes">
|
|
<title>Notes about JDBC and ODBC Interaction with Impala SQL Features</title>
|
|
<conbody>
|
|
<p>
|
|
        Most Impala SQL features work equivalently through the <cmdname>impala-shell</cmdname> interpreter
        or the JDBC or ODBC APIs. The following are some exceptions to keep in mind when switching between
        the interactive shell and applications using the APIs:
|
|
</p>
|
|
<ul>
|
|
<li>
|
|
<p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
|
|
<ul>
|
|
<li>
|
|
<p>
|
|
Queries involving the complex types (<codeph>ARRAY</codeph>, <codeph>STRUCT</codeph>, and <codeph>MAP</codeph>)
|
|
require notation that might not be available in all levels of JDBC and ODBC drivers.
|
|
If you have trouble querying such a table due to the driver level or
|
|
inability to edit the queries used by the application, you can create a view that exposes
|
|
a <q>flattened</q> version of the complex columns and point the application at the view.
|
|
See <xref href="impala_complex_types.xml#complex_types"/> for details.
|
|
</p>
|
|
</li>
|
|
<li>
|
|
<p>
|
|
The complex types available in <keyword keyref="impala23_full"/> and higher are supported by the
|
|
JDBC <codeph>getColumns()</codeph> API.
|
|
Both <codeph>MAP</codeph> and <codeph>ARRAY</codeph> are reported as the JDBC SQL Type <codeph>ARRAY</codeph>,
|
|
because this is the closest matching Java SQL type. This behavior is consistent with Hive.
|
|
<codeph>STRUCT</codeph> types are reported as the JDBC SQL Type <codeph>STRUCT</codeph>.
|
|
</p>
|
|
<p>
|
|
                To be consistent with Hive's behavior, the <codeph>TYPE_NAME</codeph> field is populated
                with the primitive type name for scalar types, and with the full <codeph>toSql()</codeph>
                output for complex types. The resulting type names are somewhat inconsistent,
                because nested types are printed differently than top-level types. For example,
                the following list shows how the <codeph>toSql()</codeph> output for Impala types is
                translated to <codeph>TYPE_NAME</codeph> values:
|
|
<codeblock><![CDATA[DECIMAL(10,10) becomes DECIMAL
CHAR(10) becomes CHAR
VARCHAR(10) becomes VARCHAR
ARRAY<DECIMAL(10,10)> becomes ARRAY<DECIMAL(10,10)>
ARRAY<CHAR(10)> becomes ARRAY<CHAR(10)>
ARRAY<VARCHAR(10)> becomes ARRAY<VARCHAR(10)>
]]>
</codeblock>
|
|
</p>
|
|
</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
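      <p>
        To see how a particular driver reports column types, an application can read the standard
        <codeph>COLUMN_NAME</codeph>, <codeph>DATA_TYPE</codeph>, and <codeph>TYPE_NAME</codeph> fields
        returned by <codeph>getColumns()</codeph>. The following is a minimal sketch under stated
        assumptions: the class name <codeph>ColumnTypes</codeph>, the helper <codeph>describe()</codeph>,
        and the table name <codeph>complex_tbl</codeph> are hypothetical, the host and database are
        placeholders, and <codeph>java.sql.JDBCType</codeph> requires Java 8 or later.
      </p>

<codeblock><![CDATA[import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.JDBCType;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical example class for inspecting column metadata.
class ColumnTypes {

    // Formats one column as "name: JDBC_SQL_TYPE (TYPE_NAME)".
    static String describe(String column, int sqlType, String typeName) {
        String sqlTypeName;
        try {
            // Map the java.sql.Types code to its symbolic name, such as ARRAY.
            sqlTypeName = JDBCType.valueOf(sqlType).getName();
        } catch (IllegalArgumentException e) {
            // Fall back to the raw code if the driver reports a nonstandard value.
            sqlTypeName = "type code " + sqlType;
        }
        return column + ": " + sqlTypeName + " (" + typeName + ")";
    }

    public static void main(String[] args) throws ClassNotFoundException, SQLException {
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Placeholder host; test_db and complex_tbl are assumed to exist.
        String url = "jdbc:hive2://myhost.example.com:21050/;auth=noSasl";
        try (Connection conn = DriverManager.getConnection(url)) {
            DatabaseMetaData meta = conn.getMetaData();
            // "%" matches every column of the table.
            try (ResultSet cols = meta.getColumns(null, "test_db", "complex_tbl", "%")) {
                while (cols.next()) {
                    System.out.println(describe(
                        cols.getString("COLUMN_NAME"),
                        cols.getInt("DATA_TYPE"),
                        cols.getString("TYPE_NAME")));
                }
            }
        }
    }
}
]]></codeblock>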
|
|
</conbody>
|
|
</concept>
|
|
|
|
</concept>
|