<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="impala_jdbc">

  <title id="jdbc">Configuring Impala to Work with JDBC</title>

  <prolog>
    <metadata>
      <data name="Category" value="Impala"/>
      <data name="Category" value="JDBC"/>
      <data name="Category" value="Java"/>
      <data name="Category" value="SQL"/>
      <data name="Category" value="Querying"/>
      <data name="Category" value="Configuring"/>
      <data name="Category" value="Starting and Stopping"/>
      <data name="Category" value="Developers"/>
    </metadata>
  </prolog>

  <conbody>

    <p>
      Impala supports the standard JDBC interface, allowing access from commercial Business
      Intelligence tools and custom software written in Java or other programming languages.
      The JDBC driver allows you to access Impala from a Java program that you write, or a
      Business Intelligence or similar tool that uses JDBC to communicate with various
      database products.
    </p>

    <p>
      Setting up a JDBC connection to Impala involves the following steps:
    </p>

    <ul>
      <li>
        Verifying the communication port where the Impala daemons in your cluster are
        listening for incoming JDBC requests.
      </li>
      <li>
        Installing the JDBC driver on every system that runs the JDBC-enabled application.
      </li>
      <li>
        Specifying a connection string for the JDBC application to access one of the servers
        running the <cmdname>impalad</cmdname> daemon, with the appropriate security settings.
      </li>
    </ul>

    <p outputclass="toc inpage"/>

  </conbody>
  <concept id="jdbc_port">

    <title>Configuring the JDBC Port</title>

    <conbody>

      <p>
        The following are the default ports through which the Impala server accepts JDBC
        connections:
        <simpletable frame="all" relcolwidth="1.0* 1.03* 2.38*" id="simpletable_tr2_gnt_43b">
          <strow>
            <stentry><b>Protocol</b></stentry>
            <stentry><b>Default Port</b></stentry>
            <stentry><b>Flag to Specify an Alternate Port</b></stentry>
          </strow>
          <strow>
            <stentry>HTTP</stentry>
            <stentry>28000</stentry>
            <stentry><codeph>‑‑hs2_http_port</codeph></stentry>
          </strow>
          <strow>
            <stentry>Binary TCP</stentry>
            <stentry>21050</stentry>
            <stentry><codeph>‑‑hs2_port</codeph></stentry>
          </strow>
        </simpletable>
      </p>

      <p>
        Make sure the port for the protocol you are using is available for communication with
        clients, for example, that it is not blocked by firewall software.
      </p>

      <p>
        If your JDBC client software connects to a different port, specify that alternative
        port number with the corresponding flag from the table above when starting the
        <codeph>impalad</codeph> daemon.
      </p>

    </conbody>

  </concept>
  <concept id="jdbc_driver_choice">

    <title>Choosing the JDBC Driver</title>

    <prolog>
      <metadata>
        <data name="Category" value="Planning"/>
      </metadata>
    </prolog>

    <conbody>

      <p>
        In Impala 2.0 and later, you can use the Hive 0.13 or higher JDBC driver. If you are
        already using JDBC applications with an earlier Impala release, you should update your
        JDBC driver, because the Hive 0.12 driver that was formerly the only choice is not
        compatible with Impala 2.0 and later.
      </p>

      <p>
        The Hive JDBC driver provides a substantial speed increase for JDBC applications with
        Impala 2.0 and higher, for queries that return large result sets.
      </p>

    </conbody>

  </concept>
  <concept id="jdbc_setup">

    <title>Enabling Impala JDBC Support on Client Systems</title>

    <prolog>
      <metadata>
        <data name="Category" value="Installing"/>
      </metadata>
    </prolog>

    <conbody>

      <section id="install_hive_driver">

        <title>Using the Hive JDBC Driver</title>

        <p>
          You install the Hive JDBC driver (<codeph>hive-jdbc</codeph> package) through the
          Linux package manager, on hosts within the cluster. The driver consists of several
          JAR files. The same driver can be used by Impala and Hive.
        </p>

        <p>
          To get the JAR files, install the Hive JDBC driver on each host in the cluster that
          will run JDBC applications.
        </p>

        <note>
          The latest JDBC driver, corresponding to Hive 0.13, provides substantial performance
          improvements for Impala queries that return large result sets. Impala 2.0 and later
          are compatible with the Hive 0.13 driver. If you already have an older JDBC driver
          installed, and are running Impala 2.0 or higher, consider upgrading to the latest
          Hive JDBC driver for best performance with JDBC applications.
        </note>

        <p>
          If you are using JDBC-enabled applications on hosts outside the cluster, you cannot
          use the same install procedure on those hosts. Install the JDBC driver on at least
          one cluster host using the preceding procedure. Then download the following JAR
          files to each client machine that will use JDBC with Impala:
        </p>
        <codeblock>commons-logging-X.X.X.jar
hadoop-common.jar
hive-common-X.XX.X.jar
hive-jdbc-X.XX.X.jar
hive-metastore-X.XX.X.jar
hive-service-X.XX.X.jar
httpclient-X.X.X.jar
httpcore-X.X.X.jar
libfb303-X.X.X.jar
libthrift-X.X.X.jar
log4j-X.X.XX.jar
slf4j-api-X.X.X.jar
slf4j-logXjXX-X.X.X.jar
</codeblock>
        <p>
          <b>To enable JDBC support for Impala on the system where you run the JDBC
          application:</b>
        </p>

        <ol>
          <li>
            Download the JAR files listed above to each client machine.
            <note>
              For Maven users, see <xref keyref="Impala-JDBC-Example">this sample github
              page</xref> for an example of the dependencies you could add to a
              <codeph>pom</codeph> file instead of downloading the individual JARs.
            </note>
          </li>
          <li>
            Store the JAR files in a location of your choosing, ideally a directory already
            referenced in your <codeph>CLASSPATH</codeph> setting. For example:
            <ul>
              <li>
                On Linux, you might use a location such as <codeph>/opt/jars/</codeph>.
              </li>
              <li>
                On Windows, you might use a subdirectory underneath
                <filepath>C:\Program Files</filepath>.
              </li>
            </ul>
          </li>
          <li>
            To successfully load the Impala JDBC driver, client programs must be able to
            locate the associated JAR files. This often means setting the
            <codeph>CLASSPATH</codeph> for the client process to include the JARs. Consult
            the documentation for your JDBC client for more details on how to install new
            JDBC drivers, but some examples of how to set <codeph>CLASSPATH</codeph>
            variables include (a quick way to confirm the result is sketched after this
            list):
            <ul>
              <li>
                On Linux, if you extracted the JARs to <codeph>/opt/jars/</codeph>, you might
                issue the following command to prepend the JAR files path to an existing
                classpath:
                <codeblock>export CLASSPATH=/opt/jars/*.jar:$CLASSPATH</codeblock>
              </li>
              <li>
                On Windows, use the <b>System Properties</b> control panel item to modify the
                <b>Environment Variables</b> for your system. Modify the environment
                variables to include the path to which you extracted the files.
                <note>
                  If the existing <codeph>CLASSPATH</codeph> on your client machine refers to
                  some older version of the Hive JARs, ensure that the new JARs are the first
                  ones listed. Either put the new JAR files earlier in the listings, or
                  delete the other references to Hive JAR files.
                </note>
              </li>
            </ul>
          </li>
        </ol>
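        <p>
          As a quick sanity check of the <codeph>CLASSPATH</codeph> setup, you can try to
          load the driver class from a small Java program. This is only a sketch: the class
          name is the Hive JDBC driver class described in the next section, and the file
          name is illustrative.
        </p>

        <codeblock>// ClasspathCheck.java - minimal sketch to confirm that the Hive JDBC driver
// JARs configured above are visible to client programs.
public class ClasspathCheck {
  public static void main(String[] args) throws Exception {
    // Throws ClassNotFoundException if the driver JARs are not on the classpath.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    System.out.println("Hive JDBC driver found on the classpath.");
  }
}
</codeblock>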
      </section>

    </conbody>

  </concept>

  <concept id="jdbc_connect">

    <title>Establishing JDBC Connections</title>

    <conbody>
      <p>
        The JDBC driver class depends on which driver you select.
      </p>

      <note conref="../shared/impala_common.xml#common/proxy_jdbc_caveat"/>

      <section id="class_hive_driver">

        <title>Using the Hive JDBC Driver</title>

        <p>
          For example, with the Hive JDBC driver, the class name is
          <codeph>org.apache.hive.jdbc.HiveDriver</codeph>. Once you have configured Impala
          to work with JDBC, you can establish connections between your JDBC client and
          Impala. To do so for a cluster that does not use Kerberos authentication, use a
          connection string of the form
          <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/;auth=noSasl</codeph>.
<!--
          Include the <codeph>auth=noSasl</codeph> argument only when connecting to a
          non-Kerberos cluster; if Kerberos is enabled, omit the <codeph>auth</codeph> argument.
-->
          For example, you might use:
        </p>

        <codeblock>jdbc:hive2://myhost.example.com:21050/;auth=noSasl</codeblock>
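        <p>
          For instance, a minimal Java client using this connection string might look like
          the following sketch. The host name and query are illustrative; substitute your
          own values.
        </p>

        <codeblock>// Minimal sketch of a Java client for Impala using the Hive JDBC driver.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaJdbcExample {
  public static void main(String[] args) throws Exception {
    // Older driver versions may need the class loaded explicitly before connecting.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    String url = "jdbc:hive2://myhost.example.com:21050/;auth=noSasl";
    try (Connection conn = DriverManager.getConnection(url);
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT version()")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));
      }
    }
  }
}
</codeblock>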
        <p>
          To connect to an instance of Impala that requires Kerberos authentication, use a
          connection string of the form
          <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/;principal=<varname>principal_name</varname></codeph>.
          The principal must be the same user principal you used when starting Impala. For
          example, you might use:
        </p>

        <codeblock>jdbc:hive2://myhost.example.com:21050/;principal=impala/myhost.example.com@H2.EXAMPLE.COM</codeblock>

        <p>
          To connect to an instance of Impala that requires LDAP authentication, use a
          connection string of the form
          <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/<varname>db_name</varname>;user=<varname>ldap_userid</varname>;password=<varname>ldap_password</varname></codeph>.
          For example, you might use:
        </p>

        <codeblock>jdbc:hive2://myhost.example.com:21050/test_db;user=fred;password=xyz123</codeblock>

        <p>
          To connect to an instance of Impala over HTTP, specify the HTTP port, 28000 by
          default, and <codeph>transportMode=http</codeph> in the connection string. For
          example:
          <codeblock>jdbc:hive2://myhost.example.com:28000/;transportMode=http</codeblock>
        </p>
        <note>
          <p conref="../shared/impala_common.xml#common/hive_jdbc_ssl_kerberos_caveat"/>
        </note>

      </section>

    </conbody>

  </concept>

  <concept rev="2.3.0" id="jdbc_odbc_notes">

    <title>Notes about JDBC and ODBC Interaction with Impala SQL Features</title>

    <conbody>

      <p>
        Most Impala SQL features work equivalently through the
        <cmdname>impala-shell</cmdname> interpreter or the JDBC or ODBC APIs. The following
        are some exceptions to keep in mind when switching between the interactive shell and
        applications using the APIs:
      </p>
      <ul>
        <li>
          <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
          <ul>
            <li>
              <p>
                Queries involving the complex types (<codeph>ARRAY</codeph>,
                <codeph>STRUCT</codeph>, and <codeph>MAP</codeph>) require notation that
                might not be available in all levels of JDBC and ODBC drivers. If you have
                trouble querying such a table due to the driver level or inability to edit
                the queries used by the application, you can create a view that exposes a
                <q>flattened</q> version of the complex columns and point the application at
                the view. See <xref href="impala_complex_types.xml#complex_types"/> for
                details.
              </p>
            </li>
            <li>
              <p>
                The complex types available in <keyword keyref="impala23_full"/> and higher
                are supported by the JDBC <codeph>getColumns()</codeph> API. Both
                <codeph>MAP</codeph> and <codeph>ARRAY</codeph> are reported as the JDBC SQL
                Type <codeph>ARRAY</codeph>, because this is the closest matching Java SQL
                type. This behavior is consistent with Hive. <codeph>STRUCT</codeph> types
                are reported as the JDBC SQL Type <codeph>STRUCT</codeph>. A short sketch of
                calling <codeph>getColumns()</codeph> appears after this list.
              </p>
              <p>
                To be consistent with Hive's behavior, the TYPE_NAME field is populated with
                the primitive type name for scalar types, and with the full
                <codeph>toSql()</codeph> for complex types. The resulting type names are
                somewhat inconsistent, because nested types are printed differently than
                top-level types. For example, the following list shows how
                <codeph>toSql()</codeph> for Impala types is translated to
                <codeph>TYPE_NAME</codeph> values:
<codeblock><![CDATA[DECIMAL(10,10) becomes DECIMAL
CHAR(10) becomes CHAR
VARCHAR(10) becomes VARCHAR
ARRAY<DECIMAL(10,10)> becomes ARRAY<DECIMAL(10,10)>
ARRAY<CHAR(10)> becomes ARRAY<CHAR(10)>
ARRAY<VARCHAR(10)> becomes ARRAY<VARCHAR(10)>
]]>
</codeblock>
              </p>
            </li>
          </ul>
        </li>
      </ul>
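      <p>
        The following Java sketch shows one way an application might inspect the reported
        type names through the standard <codeph>DatabaseMetaData.getColumns()</codeph> call.
        The connection URL, schema, and table name are illustrative.
      </p>

      <codeblock>// Sketch: listing Impala column type names through JDBC metadata.
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class ShowColumnTypes {
  public static void main(String[] args) throws Exception {
    String url = "jdbc:hive2://myhost.example.com:21050/;auth=noSasl";
    try (Connection conn = DriverManager.getConnection(url)) {
      DatabaseMetaData meta = conn.getMetaData();
      // TYPE_NAME holds the names shown above; DATA_TYPE holds the java.sql.Types
      // code, for example java.sql.Types.ARRAY for MAP and ARRAY columns.
      try (ResultSet cols = meta.getColumns(null, "default", "my_complex_table", "%")) {
        while (cols.next()) {
          System.out.println(cols.getString("COLUMN_NAME") + " -> "
              + cols.getString("TYPE_NAME"));
        }
      }
    }
  }
}
</codeblock>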
    </conbody>

  </concept>

  <concept id="jdbc_kudu">

    <title>Kudu Considerations for DML Statements</title>

    <conbody>
      <p>
        Currently, Impala <codeph>INSERT</codeph>, <codeph>UPDATE</codeph>, or other DML
        statements issued through the JDBC interface against a Kudu table do not return JDBC
        error codes for conditions such as duplicate primary key values. Therefore, for
        applications that issue a high volume of DML statements, prefer to use the Kudu Java
        API directly rather than a JDBC application.
      </p>
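      <p>
        As a rough sketch of why the Kudu Java API is preferable here: it surfaces per-row
        errors, such as a duplicate primary key, that the JDBC path does not. The master
        address, table name, and column names below are illustrative, assuming an
        <codeph>INT</codeph> primary key column named <codeph>id</codeph>.
      </p>

      <codeblock>// Sketch: applying a row with the Kudu Java API and checking for per-row errors.
import org.apache.kudu.client.Insert;
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.KuduTable;
import org.apache.kudu.client.OperationResponse;
import org.apache.kudu.client.PartialRow;

public class KuduInsertExample {
  public static void main(String[] args) throws Exception {
    KuduClient client =
        new KuduClient.KuduClientBuilder("kudu-master.example.com:7051").build();
    try {
      KuduTable table = client.openTable("my_kudu_table");
      KuduSession session = client.newSession();
      Insert insert = table.newInsert();
      PartialRow row = insert.getRow();
      row.addInt("id", 1);           // primary key column in this example
      row.addString("name", "one");
      OperationResponse response = session.apply(insert);
      if (response.hasRowError()) {
        // A duplicate primary key shows up here as a row error.
        System.err.println("Row error: " + response.getRowError());
      }
      session.close();
    } finally {
      client.close();
    }
  }
}
</codeblock>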
    </conbody>

  </concept>

</concept>