A complex data type, representing multiple fields of a single item. Frequently used as the element type of an ARRAY
or the VALUE part of a MAP.
column_name STRUCT < name : type [COMMENT 'comment_string'], ... >
type ::= primitive_type | complex_type
The names and number of fields within the STRUCT are fixed. Each field can be a different type. A field within a
STRUCT can also be another STRUCT, or an ARRAY or a MAP, allowing
you to create nested data structures with a maximum nesting depth of 100.
A STRUCT can be the top-level type for a column, or can itself be an item within an ARRAY or the
value part of the key-value pair in a MAP.
When a STRUCT is used as an ARRAY element or a MAP value, you use a join clause to
bring the ARRAY or MAP elements into the result set, and then refer to
array_name.ITEM.field or
map_name.VALUE.field. In the case of a STRUCT directly inside
an ARRAY or MAP, you can omit the .ITEM and .VALUE pseudocolumns
and refer directly to array_name.field or
map_name.field.
A STRUCT is similar conceptually to a table row: it contains a fixed number of named fields, each with a predefined
type. To combine two related tables, while using complex types to minimize repetition, the typical way to represent that data is as an
ARRAY of STRUCT elements.
Because a STRUCT has a fixed number of named fields, it typically does not make sense to have a
STRUCT as the type of a table column. In such a case, you could just make each field of the STRUCT
into a separate column of the table. The STRUCT type is most useful as an item of an ARRAY or the
value part of the key-value pair in a MAP. A nested type column with a STRUCT at the lowest level
lets you associate a variable number of row-like objects with each row of the table.
The STRUCT type is straightforward to reference within a query. You do not need to include the
STRUCT column in a join clause or give it a table alias, as is required for the ARRAY and
MAP types. You refer to the individual fields using dot notation, such as
struct_column_name.field_name, without any pseudocolumn such as
ITEM or VALUE.
Within the Parquet data file, the values for each STRUCT field are stored adjacent to each other, so that they can be
encoded and compressed using all the Parquet techniques for storing sets of similar or repeated values. The adjacency applies even
when the STRUCT values are part of an ARRAY or MAP. During a query, Impala avoids
unnecessary I/O by reading only the portions of the Parquet data file containing the requested STRUCT fields.
The following example shows a table with various kinds of STRUCT columns, both at the top level and nested within
other complex types. Practice the CREATE TABLE and query notation for complex type columns using empty tables, until
you can visualize a complex data structure and construct corresponding SQL statements reliably.
,
-- A STRUCT as the element type of an ARRAY.
places_lived ARRAY < STRUCT >,
-- A STRUCT as the value portion of the key-value pairs in a MAP.
memorable_moments MAP < STRING, STRUCT < year: INT, place: STRING, details: STRING >>,
-- A STRUCT where one of the fields is another STRUCT.
current_address STRUCT < street_address: STRUCT , country: STRING, postal_code: STRING >
)
STORED AS PARQUET;
]]>
The following example shows how to examine the structure of a table containing one or more STRUCT columns by using
the DESCRIBE statement. You can visualize each STRUCT as its own table, with columns named the same
as each field of the STRUCT. If the STRUCT is nested inside another complex type, such as
ARRAY, you can extend the qualified name passed to DESCRIBE until the output shows just the
STRUCT fields.
|
| places_lived | array> |
| memorable_moments | map> |
| current_address | struct< |
| | street_address:struct< |
| | street_number:int, |
| | street_name:string, |
| | street_type:string |
| | >, |
| | country:string, |
| | postal_code:string |
| | > |
+-------------------+--------------------------+
]]>
The top-level column EMPLOYEE_INFO is a STRUCT. Describing
table_name.struct_name displays the fields of the STRUCT as if
they were columns of a table:
Because PLACES_LIVED is a STRUCT inside an ARRAY, the initial
DESCRIBE shows the structure of the ARRAY:
|
| pos | bigint |
+------+------------------+
]]>
Ask for the details of the ITEM field of the ARRAY to see just the layout of the
STRUCT:
Likewise, MEMORABLE_MOMENTS has a STRUCT inside a MAP, which requires an extra
level of qualified name to see just the STRUCT part:
|
+-------+------------------+
]]>
For a MAP, ask to see the VALUE field to see the corresponding STRUCT fields in a
table-like structure:
For a STRUCT inside a STRUCT, we can see the fields of the outer STRUCT:
|
| country | string |
| postal_code | string |
+----------------+-----------------------+
]]>
Then we can use a further qualified name to see just the fields of the inner STRUCT:
The following example shows how to examine the structure of a table containing one or more STRUCT columns by using
the DESCRIBE statement. You can visualize each STRUCT as its own table, with columns named the same
as each field of the STRUCT. If the STRUCT is nested inside another complex type, such as
ARRAY, you can extend the qualified name passed to DESCRIBE until the output shows just the
STRUCT fields.
| |
| places_lived | array> | |
| memorable_moments | map> | |
| current_address | struct< | |
| | street_address:struct< | |
| | street_number:int, | |
| | street_name:string, | |
| | street_type:string | |
| | >, | |
| | country:string, | |
| | postal_code:string | |
| | > | |
+-------------------+--------------------------+---------+
SELECT id, employee_info.id FROM struct_demo;
SELECT id, employee_info.id AS employee_id FROM struct_demo;
SELECT id, employee_info.id AS employee_id, employee_info.employer
FROM struct_demo;
SELECT id, name, street, city, country
FROM struct_demo, struct_demo.places_lived;
SELECT id, name, places_lived.pos, places_lived.street, places_lived.city, places_lived.country
FROM struct_demo, struct_demo.places_lived;
SELECT id, name, pl.pos, pl.street, pl.city, pl.country
FROM struct_demo, struct_demo.places_lived AS pl;
SELECT id, name, places_lived.pos, places_lived.street, places_lived.city, places_lived.country
FROM struct_demo, struct_demo.places_lived;
SELECT id, name, pos, street, city, country
FROM struct_demo, struct_demo.places_lived;
SELECT id, name, memorable_moments.key,
memorable_moments.value.year,
memorable_moments.value.place,
memorable_moments.value.details
FROM struct_demo, struct_demo.memorable_moments
WHERE memorable_moments.key IN ('Birthday','Anniversary','Graduation');
SELECT id, name, mm.key, mm.value.year, mm.value.place, mm.value.details
FROM struct_demo, struct_demo.memorable_moments AS mm
WHERE mm.key IN ('Birthday','Anniversary','Graduation');
SELECT id, name, memorable_moments.key, memorable_moments.value.year,
memorable_moments.value.place, memorable_moments.value.details
FROM struct_demo, struct_demo.memorable_moments
WHERE key IN ('Birthday','Anniversary','Graduation');
SELECT id, name, key, value.year, value.place, value.details
FROM struct_demo, struct_demo.memorable_moments
WHERE key IN ('Birthday','Anniversary','Graduation');
SELECT id, name, key, year, place, details
FROM struct_demo, struct_demo.memorable_moments
WHERE key IN ('Birthday','Anniversary','Graduation');
SELECT id, name,
current_address.street_address.street_number,
current_address.street_address.street_name,
current_address.street_address.street_type,
current_address.country,
current_address.postal_code
FROM struct_demo;
]]>
For example, this table uses a struct that encodes several data values for each phone number associated with a person. Each person can
have a variable-length array of associated phone numbers, and queries can refer to the category field to locate specific home, work,
mobile, and so on kinds of phone numbers.
>
) STORED AS PARQUET;
]]>
Because structs are naturally suited to composite values where the fields have different data types, you might use them to decompose
things such as addresses:
);
]]>
In a big data context, splitting out data fields such as the number part of the address and the street name could let you do analysis
on each field independently. For example, which streets have the largest number range of addresses, what are the statistical
properties of the street names, which areas have a higher proportion of Roads
, Courts
or Boulevards
, and so on.
, ,