impala/docs/topics/impala_joins.xml

<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="joins">

  <title>Joins in Impala SELECT Statements</title>
  <titlealts audience="PDF"><navtitle>Joins</navtitle></titlealts>
  <prolog>
    <metadata>
      <data name="Category" value="Impala"/>
      <data name="Category" value="Data Analysts"/>
      <data name="Category" value="Developers"/>
      <data name="Category" value="SQL"/>
      <data name="Category" value="Querying"/>
    </metadata>
  </prolog>

  <conbody>

    <p>
      <indexterm audience="hidden">joins</indexterm>
      A join query is a <codeph>SELECT</codeph> statement that combines data from two or more tables,
      and returns a result set containing items from some or all of those tables. It is a way to
      cross-reference and correlate related data that is organized into multiple tables, typically
      using identifiers that are repeated in each of the joined tables.
    </p>

    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>

    <p conref="../shared/impala_common.xml#common/join_types"/>

<codeblock>SELECT <varname>select_list</varname> FROM
  <varname>table_or_subquery1</varname> [INNER] JOIN <varname>table_or_subquery2</varname> |
  <varname>table_or_subquery1</varname> {LEFT [OUTER] | RIGHT [OUTER] | FULL [OUTER]} JOIN <varname>table_or_subquery2</varname> |
  <varname>table_or_subquery1</varname> {LEFT | RIGHT} SEMI JOIN <varname>table_or_subquery2</varname> |
  <ph rev="2.0.0"><varname>table_or_subquery1</varname> {LEFT | RIGHT} ANTI JOIN <varname>table_or_subquery2</varname> |</ph>
    [ ON <varname>col1</varname> = <varname>col2</varname> [AND <varname>col3</varname> = <varname>col4</varname> ...] |
      USING (<varname>col1</varname> [, <varname>col2</varname> ...]) ]
  [<varname>other_join_clause</varname> ...]
[ WHERE <varname>where_clauses</varname> ]

SELECT <varname>select_list</varname> FROM
  <varname>table_or_subquery1</varname>, <varname>table_or_subquery2</varname> [, <varname>table_or_subquery3</varname> ...]
  [<varname>other_join_clause</varname> ...]
WHERE
    <varname>col1</varname> = <varname>col2</varname> [AND <varname>col3</varname> = <varname>col4</varname> ...]

SELECT <varname>select_list</varname> FROM
  <varname>table_or_subquery1</varname> CROSS JOIN <varname>table_or_subquery2</varname>
  [<varname>other_join_clause</varname> ...]
[ WHERE <varname>where_clauses</varname> ]</codeblock>

    <p>
      <b>SQL-92 and SQL-89 Joins:</b>
    </p>

    <p>
      Queries with the explicit <codeph>JOIN</codeph> keywords are known as SQL-92 style joins, referring to the
      level of the SQL standard where they were introduced. The corresponding <codeph>ON</codeph> or
      <codeph>USING</codeph> clauses clearly show which columns are used as the join keys in each case:
    </p>

<codeblock>SELECT t1.c1, t2.c2 FROM <b>t1 JOIN t2</b>
  <b>ON t1.id = t2.id and t1.type_flag = t2.type_flag</b>
  WHERE t1.c1 &gt; 100;

SELECT t1.c1, t2.c2 FROM <b>t1 JOIN t2</b>
  <b>USING (id, type_flag)</b>
  WHERE t1.c1 &gt; 100;</codeblock>

    <p>
      The <codeph>ON</codeph> clause is a general way to compare columns across the two tables, even if the column
      names are different. The <codeph>USING</codeph> clause is a shorthand notation for specifying the join
      columns, when the column names are the same in both tables. You can code equivalent <codeph>WHERE</codeph>
      clauses that compare the columns, instead of <codeph>ON</codeph> or <codeph>USING</codeph> clauses, but that
      practice is not recommended because mixing the join comparisons with other filtering clauses is typically
      less readable and harder to maintain.
    </p>

    <p>
      Queries with a comma-separated list of tables and subqueries are known as SQL-89 style joins. In these
      queries, the equality comparisons between columns of the joined tables go in the <codeph>WHERE</codeph>
      clause alongside other kinds of comparisons. This syntax is easy to learn, but it is also easy to
      accidentally remove a <codeph>WHERE</codeph> clause needed for the join to work correctly.
    </p>

<codeblock>SELECT t1.c1, t2.c2 FROM <b>t1, t2</b>
  WHERE
  <b>t1.id = t2.id AND t1.type_flag = t2.type_flag</b>
  AND t1.c1 &gt; 100;</codeblock>

    <p>
      <b>Self-joins:</b>
    </p>

    <p>
      Impala can do self-joins, for example to join on two different columns in the same table to represent
      parent-child relationships or other tree-structured data. There is no explicit syntax for this; just use the
      same table name for both the left-hand and right-hand table, and assign different table aliases to use when
      referring to the fully qualified column names:
    </p>

<codeblock>-- Combine fields from both parent and child rows.
SELECT lhs.id, rhs.parent, lhs.c1, rhs.c2 FROM tree_data lhs, tree_data rhs WHERE lhs.id = rhs.parent;</codeblock>

    <p>
      <b>Cartesian joins:</b>
    </p>

    <p>
      To avoid producing huge result sets by mistake, Impala does not allow Cartesian joins of the form:
<codeblock>SELECT ... FROM t1 JOIN t2;
SELECT ... FROM t1, t2;</codeblock>
      If you intend to join the tables based on common values, add <codeph>ON</codeph> or <codeph>WHERE</codeph>
      clauses to compare columns across the tables. If you truly intend to do a Cartesian join, use the
      <codeph>CROSS JOIN</codeph> keyword as the join operator. The <codeph>CROSS JOIN</codeph> form does not use
      any <codeph>ON</codeph> clause, because it produces a result set with all combinations of rows from the
      left-hand and right-hand tables. The result set can still be filtered by subsequent <codeph>WHERE</codeph>
      clauses. For example:
    </p>

<codeblock>SELECT ... FROM t1 CROSS JOIN t2;
SELECT ... FROM t1 CROSS JOIN t2 WHERE <varname>tests_on_non_join_columns</varname>;</codeblock>

    <p>
      <b>Inner and outer joins:</b>
    </p>

    <p>
      An inner join is the most common and familiar type: rows in the result set contain the requested columns from
      the appropriate tables, for all combinations of rows where the join columns of the tables have identical
      values. If a column with the same name occurs in both tables, use a fully qualified name or a column alias to
      refer to the column in the select list or other clauses. Impala performs inner joins by default for both
      SQL-89 and SQL-92 join syntax:
    </p>

<codeblock>-- The following 3 forms are all equivalent.
SELECT t1.id, c1, c2 FROM t1, t2 WHERE t1.id = t2.id;
SELECT t1.id, c1, c2 FROM t1 JOIN t2 ON t1.id = t2.id;
SELECT t1.id, c1, c2 FROM t1 INNER JOIN t2 ON t1.id = t2.id;</codeblock>

    <p>
      An outer join retrieves all rows from the left-hand table, or the right-hand table, or both; wherever there
      is no matching data in the table on the other side of the join, the corresponding columns in the result set
      are set to <codeph>NULL</codeph>. To perform an outer join, include the <codeph>OUTER</codeph> keyword in the
      join operator, along with either <codeph>LEFT</codeph>, <codeph>RIGHT</codeph>, or <codeph>FULL</codeph>:
    </p>

<codeblock>SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id;
SELECT * FROM t1 RIGHT OUTER JOIN t2 ON t1.id = t2.id;
SELECT * FROM t1 FULL OUTER JOIN t2 ON t1.id = t2.id;</codeblock>

    <p>
      For outer joins, Impala requires SQL-92 syntax; that is, the <codeph>JOIN</codeph> keyword instead of
      comma-separated table names. Impala does not support vendor extensions such as <codeph>(+)</codeph> or
      <codeph>*=</codeph> notation for doing outer joins with SQL-89 query syntax.
    </p>

    <p>
      <b>Equijoins and Non-Equijoins:</b>
    </p>

    <p>
      By default, Impala requires an equality comparison between the left-hand and right-hand tables, either
      through <codeph>ON</codeph>, <codeph>USING</codeph>, or <codeph>WHERE</codeph> clauses. These types of
      queries are classified broadly as equijoins. Inner, outer, full, and semi joins can all be equijoins based on
      the presence of equality tests between columns in the left-hand and right-hand tables.
    </p>

    <p>
      In Impala 1.2.2 and higher, non-equijoin queries are also possible, with comparisons such as
      <codeph>!=</codeph> or <codeph>&lt;</codeph> between the join columns. These kinds of queries require care to
      avoid producing huge result sets that could exceed resource limits. Once you have planned a non-equijoin
      query that produces a result set of acceptable size, you can code the query using the <codeph>CROSS
      JOIN</codeph> operator, and add the extra comparisons in the <codeph>WHERE</codeph> clause:
    </p>

<codeblock>SELECT * FROM t1 CROSS JOIN t2 WHERE t1.total &gt; t2.maximum_price;</codeblock>

    <p rev="2.3.0">
      In <keyword keyref="impala23_full"/> and higher, additional non-equijoin queries are possible due to the addition
      of nested loop joins. These queries typically involve <codeph>SEMI JOIN</codeph>,
      <codeph>ANTI JOIN</codeph>, or <codeph>FULL OUTER JOIN</codeph> clauses.
      Impala sometimes also uses nested loop joins internally when evaluating <codeph>OUTER JOIN</codeph>
      queries involving complex type columns.
      Query phases involving nested loop joins do not use the spill-to-disk mechanism if they
      exceed the memory limit. Impala decides internally when to use each join mechanism; you cannot
      specify any query hint to choose between the nested loop join or the original hash join algorithm.
    </p>

<codeblock rev="2.3.0">SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.int_col &lt; t2.int_col;</codeblock>

    <p>
      <b>Semi-joins:</b>
    </p>

    <p>
      Semi-joins are a relatively rarely used variation. With the left semi-join, only data from the left-hand
      table is returned, for rows where there is matching data in the right-hand table, based on comparisons
      between join columns in <codeph>ON</codeph> or <codeph>WHERE</codeph> clauses. Only one instance of each row
      from the left-hand table is returned, regardless of how many matching rows exist in the right-hand table.
      <ph rev="2.0.0">A right semi-join (available in Impala 2.0 and higher) reverses the comparison and returns
      data from the right-hand table.</ph>
    </p>

<codeblock>SELECT t1.c1, t1.c2, t1.c2 FROM t1 LEFT SEMI JOIN t2 ON t1.id = t2.id;</codeblock>

    <p>
      <b>Natural joins (not supported):</b>
    </p>

    <p>
      Impala does not support the <codeph>NATURAL JOIN</codeph> operator, again to avoid inconsistent or huge
      result sets. Natural joins do away with the <codeph>ON</codeph> and <codeph>USING</codeph> clauses, and
      instead automatically join on all columns with the same names in the left-hand and right-hand tables. This
      kind of query is not recommended for rapidly evolving data structures such as are typically used in Hadoop.
      Thus, Impala does not support the <codeph>NATURAL JOIN</codeph> syntax, which can produce different query
      results as columns are added to or removed from tables.
    </p>

    <p>
      If you do have any queries that use <codeph>NATURAL JOIN</codeph>, make sure to rewrite them with explicit
      <codeph>USING</codeph> clauses, because Impala could interpret the <codeph>NATURAL</codeph> keyword as a
      table alias:
    </p>

<codeblock>-- 'NATURAL' is interpreted as an alias for 't1' and Impala attempts an inner join,
-- resulting in an error because inner joins require explicit comparisons between columns.
SELECT t1.c1, t2.c2 FROM t1 NATURAL JOIN t2;
ERROR: NotImplementedException: Join with 't2' requires at least one conjunctive equality predicate.
  To perform a Cartesian product between two tables, use a CROSS JOIN.

-- If you expect the tables to have identically named columns with matching values,
-- list the corresponding column names in a USING clause.
SELECT t1.c1, t2.c2 FROM t1 JOIN t2 USING (id, type_flag, name, address);</codeblock>

    <p rev="2.0.0">
      <b>Anti-joins (<keyword keyref="impala20_full"/> and higher only):</b>
    </p>

    <p rev="2.0.0">
      Impala supports the <codeph>LEFT ANTI JOIN</codeph> and <codeph>RIGHT ANTI JOIN</codeph> clauses in
      <keyword keyref="impala20"/> and higher. The <codeph>LEFT</codeph> or <codeph>RIGHT</codeph>
      keyword is required for this kind of join. For <codeph>LEFT ANTI JOIN</codeph>, this clause returns those
      values from the left-hand table that have no matching value in the right-hand table. <codeph>RIGHT ANTI
      JOIN</codeph> reverses the comparison and returns values from the right-hand table. You can express this
      negative relationship either through the <codeph>ANTI JOIN</codeph> clause or through a <codeph>NOT
      EXISTS</codeph> operator with a subquery.
    </p>

<!-- Restriction lifted in Impala 2.0.
<p>
Impala does not support <codeph>WHERE</codeph> clauses
such as <codeph>IN (<varname>subquery</varname>)</codeph>,
<codeph>NOT IN (<varname>subquery</varname>)</codeph>,
<codeph>EXISTS (<varname>subquery</varname>)</codeph>,
and <codeph>NOT EXISTS (<varname>subquery</varname>)</codeph>.
Therefore from a practical standpoint, you cannot
express an anti-join condition, where values from one table
are returned only if no matching values are present in another table.
</p>
-->

    <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>

<!-- To do: reuse some complex types examples with joins here or under Examples farther down. -->

    <p rev="2.3.0">
      When referring to a column with a complex type (<codeph>STRUCT</codeph>, <codeph>ARRAY</codeph>, or <codeph>MAP</codeph>)
      in a query, you use join notation to <q>unpack</q> the scalar fields of the struct, the elements of the array, or
      the key-value pairs of the map. (The join notation is not required for aggregation operations, such as
      <codeph>COUNT()</codeph> or <codeph>SUM()</codeph> for array elements.) Because Impala recognizes which complex type elements are associated with which row
      of the result set, you use the same syntax as for a cross or cartesian join, without an explicit join condition.
      See <xref href="impala_complex_types.xml#complex_types"/> for details about Impala support for complex types.
    </p>

    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>

    <p>
      You typically use join queries in situations like these:
    </p>

    <ul>
      <li>
        When related data arrives from different sources, with each data set physically residing in a separate
        table. For example, you might have address data from business records that you cross-check against phone
        listings or census data.
        <note>
          Impala can join tables of different file formats, including Impala-managed tables and HBase tables. For
          example, you might keep small dimension tables in HBase, for convenience of single-row lookups and
          updates, and for the larger fact tables use Parquet or other binary file format optimized for scan
          operations. Then, you can issue a join query to cross-reference the fact tables with the dimension
          tables.
        </note>
      </li>

      <li>
        When data is normalized, a technique for reducing data duplication by dividing it across multiple tables.
        This kind of organization is often found in data that comes from traditional relational database systems.
        For example, instead of repeating some long string such as a customer name in multiple tables, each table
        might contain a numeric customer ID. Queries that need to display the customer name could <q>join</q> the
        table that specifies which customer ID corresponds to which name.
      </li>

      <li>
        When certain columns are rarely needed for queries, so they are moved into separate tables to reduce
        overhead for common queries. For example, a <codeph>biography</codeph> field might be rarely needed in
        queries on employee data. Putting that field in a separate table reduces the amount of I/O for common
        queries on employee addresses or phone numbers. Queries that do need the <codeph>biography</codeph> column
        can retrieve it by performing a join with that separate table.
      </li>

      <li>
        In <keyword keyref="impala23_full"/> or higher, when referring to complex type columns in queries.
        See <xref href="impala_complex_types.xml#complex_types"/> for details.
      </li>
    </ul>

    <p>
      When comparing columns with the same names in <codeph>ON</codeph> or <codeph>WHERE</codeph> clauses, use the
      fully qualified names such as <codeph><varname>db_name</varname>.<varname>table_name</varname></codeph>, or
      assign table aliases, column aliases, or both to make the code more compact and understandable:
    </p>

<codeblock>select t1.c1 as first_id, t2.c2 as second_id from
  t1 join t2 on first_id = second_id;

select fact.custno, dimension.custno from
  customer_data as fact join customer_address as dimension
  using (custno)</codeblock>

    <note>
      <p>
        Performance for join queries is a crucial aspect for Impala, because complex join queries are
        resource-intensive operations. An efficient join query produces much less network traffic and CPU overhead
        than an inefficient one. For best results:
      </p>
      <ul>
        <li rev="1.2">
          Make sure that both <xref href="impala_perf_stats.xml#perf_stats">table and column statistics</xref> are
          available for all the tables involved in a join query, and especially for the columns referenced in any
          join conditions. Impala uses the statistics to automatically deduce an efficient join order.
          Use <xref href="impala_show.xml#show"><codeph>SHOW TABLE STATS <varname>table_name</varname></codeph> and
          <codeph>SHOW COLUMN STATS <varname>table_name</varname></codeph></xref> to check if statistics are
          already present. Issue the <codeph>COMPUTE STATS <varname>table_name</varname></codeph> for a nonpartitioned table,
          or (in Impala 2.1.0 and higher) <codeph>COMPUTE INCREMENTAL STATS <varname>table_name</varname></codeph>
          for a partitioned table, to collect the initial statistics at both the table and column levels, and to keep the
          statistics up to date after any substantial <codeph>INSERT</codeph> or <codeph>LOAD DATA</codeph> operations.
        </li>

        <li rev="1.2">
          If table or column statistics are not available, join the largest table first. You can check the
          existence of statistics with the <codeph>SHOW TABLE STATS <varname>table_name</varname></codeph> and
          <codeph>SHOW COLUMN STATS <varname>table_name</varname></codeph> statements.
        </li>

        <li rev="1.2.2">
          If table or column statistics are not available, join subsequent tables according to which table has the
          most selective filter, based on overall size and <codeph>WHERE</codeph> clauses. Joining the table with
          the most selective filter results in the fewest number of rows being returned.
        </li>
      </ul>
      <p>
        For more information and examples of performance for join queries, see
        <xref href="impala_perf_joins.xml#perf_joins"/>.
      </p>
    </note>

    <p>
      To control the result set from a join query, include the names of corresponding column names in both tables
      in an <codeph>ON</codeph> or <codeph>USING</codeph> clause, or by coding equality comparisons for those
      columns in the <codeph>WHERE</codeph> clause.
    </p>

<codeblock>[localhost:21000] &gt; select c_last_name, ca_city from customer join customer_address where c_customer_sk = ca_address_sk;
+-------------+-----------------+
| c_last_name | ca_city         |
+-------------+-----------------+
| Lewis       | Fairfield       |
| Moses       | Fairview        |
| Hamilton    | Pleasant Valley |
| White       | Oak Ridge       |
| Moran       | Glendale        |
...
| Richards    | Lakewood         |
| Day         | Lebanon          |
| Painter     | Oak Hill         |
| Bentley     | Greenfield       |
| Jones       | Stringtown       |
+-------------+------------------+
Returned 50000 row(s) in 9.82s</codeblock>

    <p>
      One potential downside of joins is the possibility of excess resource usage in poorly constructed queries.
      Impala imposes restrictions on join queries to guard against such issues. To minimize the chance of runaway
      queries on large data sets, Impala requires every join query to contain at least one equality predicate
      between the columns of the various tables. For example, if <codeph>T1</codeph> contains 1000 rows and
      <codeph>T2</codeph> contains 1,000,000 rows, a query <codeph>SELECT <varname>columns</varname> FROM t1 JOIN
      t2</codeph> could return up to 1 billion rows (1000 * 1,000,000); Impala requires that the query include a
      clause such as <codeph>ON t1.c1 = t2.c2</codeph> or <codeph>WHERE t1.c1 = t2.c2</codeph>.
    </p>

    <p>
      Because even with equality clauses, the result set can still be large, as we saw in the previous example, you
      might use a <codeph>LIMIT</codeph> clause to return a subset of the results:
    </p>

<codeblock>[localhost:21000] &gt; select c_last_name, ca_city from customer, customer_address where c_customer_sk = ca_address_sk limit 10;
+-------------+-----------------+
| c_last_name | ca_city         |
+-------------+-----------------+
| Lewis       | Fairfield       |
| Moses       | Fairview        |
| Hamilton    | Pleasant Valley |
| White       | Oak Ridge       |
| Moran       | Glendale        |
| Sharp       | Lakeview        |
| Wiles       | Farmington      |
| Shipman     | Union           |
| Gilbert     | New Hope        |
| Brunson     | Martinsville    |
+-------------+-----------------+
Returned 10 row(s) in 0.63s</codeblock>

    <p>
      Or you might use additional comparison operators or aggregation functions to condense a large result set into
      a smaller set of values:
    </p>

<codeblock>[localhost:21000] &gt; -- Find the names of customers who live in one particular town.
[localhost:21000] &gt; select distinct c_last_name from customer, customer_address where
  c_customer_sk = ca_address_sk
  and ca_city = "Green Acres";
+---------------+
| c_last_name   |
+---------------+
| Hensley       |
| Pearson       |
| Mayer         |
| Montgomery    |
| Ricks         |
...
| Barrett       |
| Price         |
| Hill          |
| Hansen        |
| Meeks         |
+---------------+
Returned 332 row(s) in 0.97s

[localhost:21000] &gt; -- See how many different customers in this town have names starting with "A".
[localhost:21000] &gt; select count(distinct c_last_name) from customer, customer_address where
  c_customer_sk = ca_address_sk
  and ca_city = "Green Acres"
  and substr(c_last_name,1,1) = "A";
+-----------------------------+
| count(distinct c_last_name) |
+-----------------------------+
| 12                          |
+-----------------------------+
Returned 1 row(s) in 1.00s</codeblock>

    <p>
      Because a join query can involve reading large amounts of data from disk, sending large amounts of data
      across the network, and loading large amounts of data into memory to do the comparisons and filtering, you
      might do benchmarking, performance analysis, and query tuning to find the most efficient join queries for
      your data set, hardware capacity, network configuration, and cluster workload.
    </p>

    <p>
      The two categories of joins in Impala are known as <b>partitioned joins</b> and <b>broadcast joins</b>. If
      inaccurate table or column statistics, or some quirk of the data distribution, causes Impala to choose the
      wrong mechanism for a particular join, consider using query hints as a temporary workaround. For details, see
      <xref href="impala_hints.xml#hints"/>.
    </p>

    <p rev="2.5.0">
      <b>Handling NULLs in Join Columns:</b>
    </p>

    <p rev="2.5.0">
      By default, join key columns do not match if either one contains a <codeph>NULL</codeph> value.
      To treat such columns as equal if both contain <codeph>NULL</codeph>, you can use an expression
      such as <codeph>A = B OR (A IS NULL AND B IS NULL)</codeph>.
      In <keyword keyref="impala25_full"/> and higher, the <codeph>&lt;=&gt;</codeph> operator (shorthand for
      <codeph>IS NOT DISTINCT FROM</codeph>) performs the same comparison in a concise and efficient form.
      The <codeph>&lt;=&gt;</codeph> operator is more efficient in for comparing join keys in a <codeph>NULL</codeph>-safe
      manner, because the operator can use a hash join while the <codeph>OR</codeph> expression cannot.
    </p>

    <p conref="../shared/impala_common.xml#common/example_blurb"/>

    <p>
      The following examples refer to these simple tables containing small sets of integers:
<codeblock>[localhost:21000] &gt; create table t1 (x int);
[localhost:21000] &gt; insert into t1 values (1), (2), (3), (4), (5), (6);

[localhost:21000] &gt; create table t2 (y int);
[localhost:21000] &gt; insert into t2 values (2), (4), (6);

[localhost:21000] &gt; create table t3 (z int);
[localhost:21000] &gt; insert into t3 values (1), (3), (5);
</codeblock>
    </p>

<!-- To do: fill in examples for other join types. -->

    <p>
      The following example demonstrates an anti-join, returning the values from <codeph>T1</codeph> that do not
      exist in <codeph>T2</codeph> (in this case, the odd numbers 1, 3, and 5):
    </p>

<codeblock>[localhost:21000] &gt; select x from t1 left anti join t2 on (t1.x = t2.y);
+---+
| x |
+---+
| 1 |
| 3 |
| 5 |
+---+
</codeblock>

    <p conref="../shared/impala_common.xml#common/related_info"/>

    <p>
      See these tutorials for examples of different kinds of joins:
    </p>

    <ul>
      <li>
        <xref href="impala_tutorial.xml#tut_cross_join"/>
      </li>
    </ul>
  </conbody>
</concept>