mirror of
https://github.com/apache/impala.git
synced 2025-12-30 12:02:10 -05:00
This now gives a clean RAT check with bin/check-rat-report.py, which is one way for the Impala community to check compliance with ASF rules on intellectual property. Change-Id: I2ad06435f84a65ba126759e42a18fdaf52cd7036 Reviewed-on: http://gerrit.cloudera.org:8080/5232 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins Reviewed-by: John Russell <jrussell@cloudera.com>
452 lines
19 KiB
XML
452 lines
19 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
|
|
<!--
|
|
Licensed to the Apache Software Foundation (ASF) under one
|
|
or more contributor license agreements. See the NOTICE file
|
|
distributed with this work for additional information
|
|
regarding copyright ownership. The ASF licenses this file
|
|
to you under the Apache License, Version 2.0 (the
|
|
"License"); you may not use this file except in compliance
|
|
with the License. You may obtain a copy of the License at
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
Unless required by applicable law or agreed to in writing,
|
|
software distributed under the License is distributed on an
|
|
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
|
KIND, either express or implied. See the License for the
|
|
specific language governing permissions and limitations
|
|
under the License.
|
|
-->
|
|
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
|
|
<concept id="struct">
|
|
|
|
<title>STRUCT Complex Type (<keyword keyref="impala23"/> or higher only)</title>
|
|
|
|
<prolog>
|
|
<metadata>
|
|
<data name="Category" value="Impala"/>
|
|
<data name="Category" value="Impala Data Types"/>
|
|
<data name="Category" value="SQL"/>
|
|
<data name="Category" value="Developers"/>
|
|
<data name="Category" value="Data Analysts"/>
|
|
</metadata>
|
|
</prolog>
|
|
|
|
<conbody>
|
|
|
|
<p>
|
|
A complex data type, representing multiple fields of a single item. Frequently used as the element type of an <codeph>ARRAY</codeph>
|
|
or the <codeph>VALUE</codeph> part of a <codeph>MAP</codeph>.
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/syntax_blurb"/>
|
|
|
|
<codeblock><varname>column_name</varname> STRUCT < <varname>name</varname> : <varname>type</varname> [COMMENT '<varname>comment_string</varname>'], ... >
|
|
|
|
type ::= <varname>primitive_type</varname> | <varname>complex_type</varname>
|
|
</codeblock>
|
|
|
|
<p>
|
|
The names and number of fields within the <codeph>STRUCT</codeph> are fixed. Each field can be a different type. A field within a
|
|
<codeph>STRUCT</codeph> can also be another <codeph>STRUCT</codeph>, or an <codeph>ARRAY</codeph> or a <codeph>MAP</codeph>, allowing
|
|
you to create nested data structures with a maximum nesting depth of 100.
|
|
</p>
|
|
|
|
<p>
|
|
A <codeph>STRUCT</codeph> can be the top-level type for a column, or can itself be an item within an <codeph>ARRAY</codeph> or the
|
|
value part of the key-value pair in a <codeph>MAP</codeph>.
|
|
</p>
|
|
|
|
<p>
|
|
When a <codeph>STRUCT</codeph> is used as an <codeph>ARRAY</codeph> element or a <codeph>MAP</codeph> value, you use a join clause to
|
|
bring the <codeph>ARRAY</codeph> or <codeph>MAP</codeph> elements into the result set, and then refer to
|
|
<codeph><varname>array_name</varname>.ITEM.<varname>field</varname></codeph> or
|
|
<codeph><varname>map_name</varname>.VALUE.<varname>field</varname></codeph>. In the case of a <codeph>STRUCT</codeph> directly inside
|
|
an <codeph>ARRAY</codeph> or <codeph>MAP</codeph>, you can omit the <codeph>.ITEM</codeph> and <codeph>.VALUE</codeph> pseudocolumns
|
|
and refer directly to <codeph><varname>array_name</varname>.<varname>field</varname></codeph> or
|
|
<codeph><varname>map_name</varname>.<varname>field</varname></codeph>.
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/complex_types_combo"/>
|
|
|
|
<p>
|
|
A <codeph>STRUCT</codeph> is similar conceptually to a table row: it contains a fixed number of named fields, each with a predefined
|
|
type. To combine two related tables, while using complex types to minimize repetition, the typical way to represent that data is as an
|
|
<codeph>ARRAY</codeph> of <codeph>STRUCT</codeph> elements.
|
|
</p>
|
|
|
|
<p>
|
|
Because a <codeph>STRUCT</codeph> has a fixed number of named fields, it typically does not make sense to have a
|
|
<codeph>STRUCT</codeph> as the type of a table column. In such a case, you could just make each field of the <codeph>STRUCT</codeph>
|
|
into a separate column of the table. The <codeph>STRUCT</codeph> type is most useful as an item of an <codeph>ARRAY</codeph> or the
|
|
value part of the key-value pair in a <codeph>MAP</codeph>. A nested type column with a <codeph>STRUCT</codeph> at the lowest level
|
|
lets you associate a variable number of row-like objects with each row of the table.
|
|
</p>
|
|
|
|
<p>
|
|
The <codeph>STRUCT</codeph> type is straightforward to reference within a query. You do not need to include the
|
|
<codeph>STRUCT</codeph> column in a join clause or give it a table alias, as is required for the <codeph>ARRAY</codeph> and
|
|
<codeph>MAP</codeph> types. You refer to the individual fields using dot notation, such as
|
|
<codeph><varname>struct_column_name</varname>.<varname>field_name</varname></codeph>, without any pseudocolumn such as
|
|
<codeph>ITEM</codeph> or <codeph>VALUE</codeph>.
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/complex_types_describe"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/internals_blurb"/>
|
|
|
|
<p>
|
|
Within the Parquet data file, the values for each <codeph>STRUCT</codeph> field are stored adjacent to each other, so that they can be
|
|
encoded and compressed using all the Parquet techniques for storing sets of similar or repeated values. The adjacency applies even
|
|
when the <codeph>STRUCT</codeph> values are part of an <codeph>ARRAY</codeph> or <codeph>MAP</codeph>. During a query, Impala avoids
|
|
unnecessary I/O by reading only the portions of the Parquet data file containing the requested <codeph>STRUCT</codeph> fields.
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/added_in_230"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
|
|
|
|
<ul conref="../shared/impala_common.xml#common/complex_types_restrictions">
|
|
<li/>
|
|
</ul>
|
|
|
|
<p conref="../shared/impala_common.xml#common/example_blurb"/>
|
|
|
|
<note conref="../shared/impala_common.xml#common/complex_type_schema_pointer"/>
|
|
|
|
<p>
|
|
The following example shows a table with various kinds of <codeph>STRUCT</codeph> columns, both at the top level and nested within
|
|
other complex types. Practice the <codeph>CREATE TABLE</codeph> and query notation for complex type columns using empty tables, until
|
|
you can visualize a complex data structure and construct corresponding SQL statements reliably.
|
|
</p>
|
|
|
|
<codeblock><![CDATA[CREATE TABLE struct_demo
|
|
(
|
|
id BIGINT,
|
|
name STRING,
|
|
|
|
-- A STRUCT as a top-level column. Demonstrates how the table ID column
|
|
-- and the ID field within the STRUCT can coexist without a name conflict.
|
|
employee_info STRUCT < employer: STRING, id: BIGINT, address: STRING >,
|
|
|
|
-- A STRUCT as the element type of an ARRAY.
|
|
places_lived ARRAY < STRUCT <street: STRING, city: STRING, country: STRING >>,
|
|
|
|
-- A STRUCT as the value portion of the key-value pairs in a MAP.
|
|
memorable_moments MAP < STRING, STRUCT < year: INT, place: STRING, details: STRING >>,
|
|
|
|
-- A STRUCT where one of the fields is another STRUCT.
|
|
current_address STRUCT < street_address: STRUCT <street_number: INT, street_name: STRING, street_type: STRING>, country: STRING, postal_code: STRING >
|
|
)
|
|
STORED AS PARQUET;
|
|
]]>
|
|
</codeblock>
|
|
|
|
<p>
|
|
The following example shows how to examine the structure of a table containing one or more <codeph>STRUCT</codeph> columns by using
|
|
the <codeph>DESCRIBE</codeph> statement. You can visualize each <codeph>STRUCT</codeph> as its own table, with columns named the same
|
|
as each field of the <codeph>STRUCT</codeph>. If the <codeph>STRUCT</codeph> is nested inside another complex type, such as
|
|
<codeph>ARRAY</codeph>, you can extend the qualified name passed to <codeph>DESCRIBE</codeph> until the output shows just the
|
|
<codeph>STRUCT</codeph> fields.
|
|
</p>
|
|
|
|
<codeblock><![CDATA[DESCRIBE struct_demo;
|
|
+-------------------+--------------------------+
|
|
| name | type |
|
|
+-------------------+--------------------------+
|
|
| id | bigint |
|
|
| name | string |
|
|
| employee_info | struct< |
|
|
| | employer:string, |
|
|
| | id:bigint, |
|
|
| | address:string |
|
|
| | > |
|
|
| places_lived | array<struct< |
|
|
| | street:string, |
|
|
| | city:string, |
|
|
| | country:string |
|
|
| | >> |
|
|
| memorable_moments | map<string,struct< |
|
|
| | year:int, |
|
|
| | place:string, |
|
|
| | details:string |
|
|
| | >> |
|
|
| current_address | struct< |
|
|
| | street_address:struct< |
|
|
| | street_number:int, |
|
|
| | street_name:string, |
|
|
| | street_type:string |
|
|
| | >, |
|
|
| | country:string, |
|
|
| | postal_code:string |
|
|
| | > |
|
|
+-------------------+--------------------------+
|
|
]]>
|
|
</codeblock>
|
|
|
|
<p>
|
|
The top-level column <codeph>EMPLOYEE_INFO</codeph> is a <codeph>STRUCT</codeph>. Describing
|
|
<codeph><varname>table_name</varname>.<varname>struct_name</varname></codeph> displays the fields of the <codeph>STRUCT</codeph> as if
|
|
they were columns of a table:
|
|
</p>
|
|
|
|
<codeblock><![CDATA[DESCRIBE struct_demo.employee_info;
|
|
+----------+--------+
|
|
| name | type |
|
|
+----------+--------+
|
|
| employer | string |
|
|
| id | bigint |
|
|
| address | string |
|
|
+----------+--------+
|
|
]]>
|
|
</codeblock>
|
|
|
|
<p>
|
|
Because <codeph>PLACES_LIVED</codeph> is a <codeph>STRUCT</codeph> inside an <codeph>ARRAY</codeph>, the initial
|
|
<codeph>DESCRIBE</codeph> shows the structure of the <codeph>ARRAY</codeph>:
|
|
</p>
|
|
|
|
<codeblock><![CDATA[DESCRIBE struct_demo.places_lived;
|
|
+------+------------------+
|
|
| name | type |
|
|
+------+------------------+
|
|
| item | struct< |
|
|
| | street:string, |
|
|
| | city:string, |
|
|
| | country:string |
|
|
| | > |
|
|
| pos | bigint |
|
|
+------+------------------+
|
|
]]>
|
|
</codeblock>
|
|
|
|
<p>
|
|
Ask for the details of the <codeph>ITEM</codeph> field of the <codeph>ARRAY</codeph> to see just the layout of the
|
|
<codeph>STRUCT</codeph>:
|
|
</p>
|
|
|
|
<codeblock><![CDATA[DESCRIBE struct_demo.places_lived.item;
|
|
+---------+--------+
|
|
| name | type |
|
|
+---------+--------+
|
|
| street | string |
|
|
| city | string |
|
|
| country | string |
|
|
+---------+--------+
|
|
]]>
|
|
</codeblock>
|
|
|
|
<p>
|
|
Likewise, <codeph>MEMORABLE_MOMENTS</codeph> has a <codeph>STRUCT</codeph> inside a <codeph>MAP</codeph>, which requires an extra
|
|
level of qualified name to see just the <codeph>STRUCT</codeph> part:
|
|
</p>
|
|
|
|
<codeblock><![CDATA[DESCRIBE struct_demo.memorable_moments;
|
|
+-------+------------------+
|
|
| name | type |
|
|
+-------+------------------+
|
|
| key | string |
|
|
| value | struct< |
|
|
| | year:int, |
|
|
| | place:string, |
|
|
| | details:string |
|
|
| | > |
|
|
+-------+------------------+
|
|
]]>
|
|
</codeblock>
|
|
|
|
<p>
|
|
For a <codeph>MAP</codeph>, ask to see the <codeph>VALUE</codeph> field to see the corresponding <codeph>STRUCT</codeph> fields in a
|
|
table-like structure:
|
|
</p>
|
|
|
|
<codeblock><![CDATA[DESCRIBE struct_demo.memorable_moments.value;
|
|
+---------+--------+
|
|
| name | type |
|
|
+---------+--------+
|
|
| year | int |
|
|
| place | string |
|
|
| details | string |
|
|
+---------+--------+
|
|
]]>
|
|
</codeblock>
|
|
|
|
<p>
|
|
For a <codeph>STRUCT</codeph> inside a <codeph>STRUCT</codeph>, we can see the fields of the outer <codeph>STRUCT</codeph>:
|
|
</p>
|
|
|
|
<codeblock><![CDATA[DESCRIBE struct_demo.current_address;
|
|
+----------------+-----------------------+
|
|
| name | type |
|
|
+----------------+-----------------------+
|
|
| street_address | struct< |
|
|
| | street_number:int, |
|
|
| | street_name:string, |
|
|
| | street_type:string |
|
|
| | > |
|
|
| country | string |
|
|
| postal_code | string |
|
|
+----------------+-----------------------+
|
|
]]>
|
|
</codeblock>
|
|
|
|
<p>
|
|
Then we can use a further qualified name to see just the fields of the inner <codeph>STRUCT</codeph>:
|
|
</p>
|
|
|
|
<codeblock><![CDATA[DESCRIBE struct_demo.current_address.street_address;
|
|
+---------------+--------+
|
|
| name | type |
|
|
+---------------+--------+
|
|
| street_number | int |
|
|
| street_name | string |
|
|
| street_type | string |
|
|
+---------------+--------+
|
|
]]>
|
|
</codeblock>
|
|
|
|
<p>
|
|
The following example shows how to examine the structure of a table containing one or more <codeph>STRUCT</codeph> columns by using
|
|
the <codeph>DESCRIBE</codeph> statement. You can visualize each <codeph>STRUCT</codeph> as its own table, with columns named the same
|
|
as each field of the <codeph>STRUCT</codeph>. If the <codeph>STRUCT</codeph> is nested inside another complex type, such as
|
|
<codeph>ARRAY</codeph>, you can extend the qualified name passed to <codeph>DESCRIBE</codeph> until the output shows just the
|
|
<codeph>STRUCT</codeph> fields.
|
|
</p>
|
|
|
|
<!-- To do: See why the most verbose query form gives an error. -->
|
|
|
|
<codeblock><![CDATA[DESCRIBE struct_demo;
|
|
+-------------------+--------------------------+---------+
|
|
| name | type | comment |
|
|
+-------------------+--------------------------+---------+
|
|
| id | bigint | |
|
|
| name | string | |
|
|
| employee_info | struct< | |
|
|
| | employer:string, | |
|
|
| | id:bigint, | |
|
|
| | address:string | |
|
|
| | > | |
|
|
| places_lived | array<struct< | |
|
|
| | street:string, | |
|
|
| | city:string, | |
|
|
| | country:string | |
|
|
| | >> | |
|
|
| memorable_moments | map<string,struct< | |
|
|
| | year:int, | |
|
|
| | place:string, | |
|
|
| | details:string | |
|
|
| | >> | |
|
|
| current_address | struct< | |
|
|
| | street_address:struct< | |
|
|
| | street_number:int, | |
|
|
| | street_name:string, | |
|
|
| | street_type:string | |
|
|
| | >, | |
|
|
| | country:string, | |
|
|
| | postal_code:string | |
|
|
| | > | |
|
|
+-------------------+--------------------------+---------+
|
|
|
|
SELECT id, employee_info.id FROM struct_demo;
|
|
|
|
SELECT id, employee_info.id AS employee_id FROM struct_demo;
|
|
|
|
SELECT id, employee_info.id AS employee_id, employee_info.employer
|
|
FROM struct_demo;
|
|
|
|
SELECT id, name, street, city, country
|
|
FROM struct_demo, struct_demo.places_lived;
|
|
|
|
SELECT id, name, places_lived.pos, places_lived.street, places_lived.city, places_lived.country
|
|
FROM struct_demo, struct_demo.places_lived;
|
|
|
|
SELECT id, name, pl.pos, pl.street, pl.city, pl.country
|
|
FROM struct_demo, struct_demo.places_lived AS pl;
|
|
|
|
SELECT id, name, places_lived.pos, places_lived.street, places_lived.city, places_lived.country
|
|
FROM struct_demo, struct_demo.places_lived;
|
|
|
|
SELECT id, name, pos, street, city, country
|
|
FROM struct_demo, struct_demo.places_lived;
|
|
|
|
SELECT id, name, memorable_moments.key,
|
|
memorable_moments.value.year,
|
|
memorable_moments.value.place,
|
|
memorable_moments.value.details
|
|
FROM struct_demo, struct_demo.memorable_moments
|
|
WHERE memorable_moments.key IN ('Birthday','Anniversary','Graduation');
|
|
|
|
SELECT id, name, mm.key, mm.value.year, mm.value.place, mm.value.details
|
|
FROM struct_demo, struct_demo.memorable_moments AS mm
|
|
WHERE mm.key IN ('Birthday','Anniversary','Graduation');
|
|
|
|
SELECT id, name, memorable_moments.key, memorable_moments.value.year,
|
|
memorable_moments.value.place, memorable_moments.value.details
|
|
FROM struct_demo, struct_demo.memorable_moments
|
|
WHERE key IN ('Birthday','Anniversary','Graduation');
|
|
|
|
SELECT id, name, key, value.year, value.place, value.details
|
|
FROM struct_demo, struct_demo.memorable_moments
|
|
WHERE key IN ('Birthday','Anniversary','Graduation');
|
|
|
|
SELECT id, name, key, year, place, details
|
|
FROM struct_demo, struct_demo.memorable_moments
|
|
WHERE key IN ('Birthday','Anniversary','Graduation');
|
|
|
|
SELECT id, name,
|
|
current_address.street_address.street_number,
|
|
current_address.street_address.street_name,
|
|
current_address.street_address.street_type,
|
|
current_address.country,
|
|
current_address.postal_code
|
|
FROM struct_demo;
|
|
]]>
|
|
</codeblock>
|
|
|
|
<p>
|
|
For example, this table uses a struct that encodes several data values for each phone number associated with a person. Each person can
|
|
have a variable-length array of associated phone numbers, and queries can refer to the category field to locate specific home, work,
|
|
mobile, and so on kinds of phone numbers.
|
|
</p>
|
|
|
|
<codeblock><![CDATA[CREATE TABLE contact_info_many_structs
|
|
(
|
|
id BIGINT, name STRING,
|
|
phone_numbers ARRAY < STRUCT <category:STRING, country_code:STRING, area_code:SMALLINT, full_number:STRING, mobile:BOOLEAN, carrier:STRING > >
|
|
) STORED AS PARQUET;
|
|
]]>
|
|
</codeblock>
|
|
|
|
<p>
|
|
Because structs are naturally suited to composite values where the fields have different data types, you might use them to decompose
|
|
things such as addresses:
|
|
</p>
|
|
|
|
<codeblock><![CDATA[CREATE TABLE contact_info_detailed_address
|
|
(
|
|
id BIGINT, name STRING,
|
|
address STRUCT < house_number:INT, street:STRING, street_type:STRING, apartment:STRING, city:STRING, region:STRING, country:STRING >
|
|
);
|
|
]]>
|
|
</codeblock>
|
|
|
|
<p>
|
|
In a big data context, splitting out data fields such as the number part of the address and the street name could let you do analysis
|
|
on each field independently. For example, which streets have the largest number range of addresses, what are the statistical
|
|
properties of the street names, which areas have a higher proportion of <q>Roads</q>, <q>Courts</q> or <q>Boulevards</q>, and so on.
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/related_info"/>
|
|
|
|
<p>
|
|
<xref href="impala_complex_types.xml#complex_types"/>, <xref href="impala_array.xml#array"/>,
|
|
<!-- <xref href="impala_struct.xml#struct"/>, -->
|
|
<xref href="impala_map.xml#map"/>
|
|
</p>
|
|
|
|
</conbody>
|
|
|
|
</concept>
|