IMPALA-9495: Support struct in select list for ORC tables

This patch adds support for structs in the select list of inline views
and of the topmost query block. When a struct value is displayed it is
formatted as a JSON object and returned as a string. An example of such
a value:

SELECT struct_col FROM some_table;
'{"int_struct_member":12,"string_struct_member":"string value"}'

Another example where we query a nested struct:
SELECT outer_struct_col FROM some_table;
'{"inner_struct":{"string_member":"string value","int_member":12}}'

Note that the conversion from struct to JSON happens on the server side,
before the value is sent to the client via HS2. However, HS2 is also
capable of handling struct values, so a later change might add
functionality to send the struct itself over Thrift so that the client
can use the struct directly.

-- Internal representation of a struct:
When scanning a struct, the row batch holds the values of the struct's
children as if they were queried one by one directly in the select
list.

E.g. Taking the following table:
CREATE TABLE tbl (id int, s struct<a:int,b:string>) STORED AS ORC

And running the following query:
SELECT id, s FROM tbl;

After scanning, a row in the row batch holds the following values
(note that the largest slot comes first):
 1: The pointer for the string in s.b
 2: The length for the string in s.b
 3: The int value for s.a
 4: The int value of id
 5: A single null byte for all the slots: id, s, s.a, s.b

The size of a struct affects the memory layout of a row batch. A
struct's size is calculated by summing the sizes of its fields, and the
struct is then placed in the row batch ahead of all slots that are
smaller than it. Note that all the fields of a struct are laid out
consecutively in the row batch. Within a struct, the fields are
likewise ordered by size, just as primitive slots are in the regular
case.

When a struct is evaluated as a SlotRef, a newly introduced StructVal
is used to refer to the struct's actual values in the row batch.
This StructVal holds a vector of pointers where each pointer represents
a member of the struct. Following the above example the StructVal would
keep two pointers, one to point to an IntVal and one to point to a
StringVal.

-- Changes related to tuple and slot descriptors:
When a struct appears in the select list, there is a SlotDescriptor for
the struct slot in the topmost TupleDescriptor. Additionally, another
TupleDescriptor is created to hold SlotDescriptors for each of the
struct's children. The struct SlotDescriptor points to this newly
introduced TupleDescriptor using 'itemTupleId'.
The offsets for the children of the struct are calculated from the
beginning of the topmost TupleDescriptor, not from the TupleDescriptor
that directly holds the struct's children. The null indicator bytes are
likewise stored at the level of the topmost TupleDescriptor.

-- Changes related to scalar expressions:
A struct in the select list is translated into an expression tree whose
root is a SlotRef for the struct itself and whose children are SlotRefs
for the struct's members. When a struct SlotRef is evaluated, after the
null checks the evaluation is delegated to the child SlotRefs.

-- Restrictions:
  - Codegen support is not included in this patch.
  - Only ORC file format is supported by this patch.
  - Only the HS2 client supports returning structs. Beeswax support
    is not implemented as Beeswax is going to be deprecated anyway.
    Currently an error is returned when querying a struct through
    Beeswax.

-- Tests added:
  - The ORC and Parquet functional databases are extended with 3 new
    tables:
    1: A small table with one-level structs, holding different
    kinds of primitive types as members.
    2: A small table with 2 and 3 level nested structs.
    3: A bigger, partitioned table constructed from alltypes where all
    the columns except the 'id' column are put into a struct.
  - struct-in-select-list.test and nested-struct-in-select-list.test
    use these new tables to query structs directly or through an
    inline view.

Change-Id: I0fbe56bdcd372b72e99c0195d87a818e7fa4bc3a
Reviewed-on: http://gerrit.cloudera.org:8080/17638
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Author: Gabor Kaszab
Date: 2021-04-09 16:11:39 +02:00
Committed-by: Impala Public Jenkins
Parent: 6d47927668
Commit: 1e21aa6b96
63 changed files with 2151 additions and 351 deletions

@@ -468,12 +468,10 @@ Status HdfsOrcScanner::ResolveColumns(const TupleDescriptor& tuple_desc,
continue;
}
// 'col_path'(SchemaPath) of the SlotDescriptor won't map to a STRUCT column.
// We only deal with collection columns (ARRAY/MAP) and primitive columns here.
if (slot_desc->type().IsCollectionType()) {
if (slot_desc->type().IsComplexType()) {
// Recursively resolve nested columns
DCHECK(slot_desc->collection_item_descriptor() != nullptr);
const TupleDescriptor* item_tuple_desc = slot_desc->collection_item_descriptor();
DCHECK(slot_desc->children_tuple_descriptor() != nullptr);
const TupleDescriptor* item_tuple_desc = slot_desc->children_tuple_descriptor();
RETURN_IF_ERROR(ResolveColumns(*item_tuple_desc, selected_nodes, pos_slots));
} else {
VLOG(3) << "Add ORC column " << node->getColumnId() << " for "

@@ -886,7 +886,7 @@ void HdfsScanNodeBase::InitNullCollectionValues(const TupleDescriptor* tuple_des
continue;
}
// Recursively traverse collection items.
const TupleDescriptor* item_desc = slot_desc->collection_item_descriptor();
const TupleDescriptor* item_desc = slot_desc->children_tuple_descriptor();
if (item_desc->collection_slots().empty()) continue;
for (int i = 0; i < slot->num_tuples; ++i) {
int item_offset = i * item_desc->byte_size();

@@ -93,7 +93,7 @@ Status HdfsScanner::Open(ScannerContext* context) {
for (auto& entry : *(scan_node_->thrift_dict_filter_conjuncts_map())) {
SlotDescriptor* slot_desc = state_->desc_tbl().GetSlotDescriptor(entry.first);
TupleId tuple_id = (slot_desc->type().IsCollectionType() ?
slot_desc->collection_item_descriptor()->id() :
slot_desc->children_tuple_descriptor()->id() :
slot_desc->parent()->id());
auto conjunct_evals_it = conjunct_evals_map_.find(tuple_id);
DCHECK(conjunct_evals_it != conjunct_evals_map_.end());

@@ -48,7 +48,14 @@ OrcColumnReader* OrcColumnReader::Create(const orc::Type* node,
DCHECK(slot_desc != nullptr);
OrcColumnReader* reader = nullptr;
if (node->getKind() == orc::TypeKind::STRUCT) {
reader = new OrcStructReader(node, slot_desc, scanner);
if (slot_desc->type().IsStructType() &&
slot_desc->children_tuple_descriptor() != nullptr) {
// This is the case where we should materialize the struct and its children.
reader = new OrcStructReader(node, slot_desc,
slot_desc->children_tuple_descriptor(), scanner);
} else {
reader = new OrcStructReader(node, slot_desc, scanner);
}
} else if (node->getKind() == orc::TypeKind::LIST) {
reader = new OrcListReader(node, slot_desc, scanner);
} else if (node->getKind() == orc::TypeKind::MAP) {
@@ -291,7 +298,7 @@ bool OrcStructReader::EndOfBatch() {
inline uint64_t OrcComplexColumnReader::GetTargetColId(
const SlotDescriptor* slot_desc) const {
return slot_desc->type().IsCollectionType() ?
GetColId(slot_desc->collection_item_descriptor()):
GetColId(slot_desc->children_tuple_descriptor()):
GetColId(slot_desc);
}
@@ -381,6 +388,16 @@ OrcStructReader::OrcStructReader(const orc::Type* node,
}
}
OrcStructReader::OrcStructReader(const orc::Type* node, const SlotDescriptor* slot_desc,
const TupleDescriptor* children_tuple, HdfsOrcScanner* scanner)
: OrcComplexColumnReader(node, slot_desc, scanner) {
tuple_desc_ = children_tuple;
materialize_tuple_ = true;
for (SlotDescriptor* child_slot : tuple_desc_->slots()) {
CreateChildForSlot(node, child_slot);
}
}
OrcStructReader::OrcStructReader(const orc::Type* node,
const SlotDescriptor* slot_desc, HdfsOrcScanner* scanner)
: OrcComplexColumnReader(node, slot_desc, scanner) {
@@ -399,7 +416,7 @@ Status OrcStructReader::ReadValue(int row_idx, Tuple* tuple, MemPool* pool) {
return child->ReadValue(row_idx, tuple, pool);
}
if (IsNull(DCHECK_NOTNULL(batch_), row_idx)) {
for (OrcColumnReader* child : children_) child->SetNullSlot(tuple);
SetNullSlot(tuple);
return Status::OK();
}
for (OrcColumnReader* child : children_) {
@@ -472,6 +489,10 @@ void OrcStructReader::FillSyntheticRowId(ScratchTupleBatch* scratch_batch,
Status OrcStructReader::ReadValueBatch(int row_idx, ScratchTupleBatch* scratch_batch,
MemPool* pool, int scratch_batch_idx) {
if (materialize_tuple_) {
return OrcBatchedReader::ReadValueBatch(row_idx, scratch_batch, pool,
scratch_batch_idx);
}
for (OrcColumnReader* child : children_) {
RETURN_IF_ERROR(
child->ReadValueBatch(row_idx, scratch_batch, pool, scratch_batch_idx));
@@ -515,7 +536,7 @@ OrcCollectionReader::OrcCollectionReader(const orc::Type* node,
// This is a collection SlotDescriptor whose item TupleDescriptor matches
// 'node'. We should materialize the slot (creating a CollectionValue) and its
// collection tuples (see more in HdfsOrcScanner::AssembleCollection).
tuple_desc_ = slot_desc->collection_item_descriptor();
tuple_desc_ = slot_desc->children_tuple_descriptor();
materialize_tuple_ = true;
}
}
@@ -527,7 +548,9 @@ Status OrcCollectionReader::AssembleCollection(int row_idx, Tuple* tuple, MemPoo
}
auto coll_slot = reinterpret_cast<CollectionValue*>(GetSlot(tuple));
*coll_slot = CollectionValue();
const TupleDescriptor* tuple_desc = slot_desc_->collection_item_descriptor();
const TupleDescriptor* tuple_desc = slot_desc_->children_tuple_descriptor();
DCHECK(tuple_desc != nullptr) << "There is no children tuple for slot ID: " <<
slot_desc_->id();
CollectionValueBuilder builder(coll_slot, *tuple_desc, pool, scanner_->state_);
return scanner_->AssembleCollection(*this, row_idx, &builder);
}
@@ -605,7 +628,7 @@ void OrcListReader::CreateChildForSlot(const orc::Type* node,
// We have a position slot descriptor if it refers to this LIST ORC type, but it isn't
// a collection slot.
bool is_pos_slot = slot_col_id == node->getColumnId() &&
slot_desc->collection_item_descriptor() == nullptr;
slot_desc->children_tuple_descriptor() == nullptr;
if (is_pos_slot) {
DCHECK(pos_slot_desc_ == nullptr) << "Should have unique pos slot";
pos_slot_desc_ = slot_desc;

@@ -479,9 +479,8 @@ class OrcDecimal16ColumnReader
/// sub queries). The root reader is always an OrcStructReader since the root of the ORC
/// schema is represented as a STRUCT type.
///
/// For collection readers, they can be divided into two kinds by whether they should
/// materialize collection tuples (reflected by materialize_tuple_). (STRUCTs always
/// delegate materialization to their children.)
/// For complex readers, they can be divided into two kinds by whether they should
/// materialize their tuples (reflected by materialize_tuple_).
///
/// For collection type readers that materialize a CollectionValue they create a
/// CollectionValueBuilder when 'ReadValue' is called. Then recursively delegate the
@@ -569,6 +568,13 @@ class OrcStructReader : public OrcComplexColumnReader {
OrcStructReader(const orc::Type* node, const TupleDescriptor* table_tuple_desc,
HdfsOrcScanner* scanner);
/// Constructor for a slot that materializes all its children, e.g. when a struct is
/// given in the select list.
OrcStructReader(const orc::Type* node, const SlotDescriptor* slot_desc,
const TupleDescriptor* children_tuple, HdfsOrcScanner* scanner);
/// Constructor for a struct that is not mapped directly to a slot; instead it refers
/// to a descendant column.
OrcStructReader(const orc::Type* node, const SlotDescriptor* slot_desc,
HdfsOrcScanner* scanner);
@@ -623,6 +629,7 @@ class OrcStructReader : public OrcComplexColumnReader {
void SetNullSlot(Tuple* tuple) override {
for (OrcColumnReader* child : children_) child->SetNullSlot(tuple);
tuple->SetNull(DCHECK_NOTNULL(slot_desc_)->null_indicator_offset());
}
void CreateChildForSlot(const orc::Type* curr_node, const SlotDescriptor* slot_desc);

@@ -2497,8 +2497,8 @@ Status HdfsParquetScanner::CreateColumnReaders(const TupleDescriptor& tuple_desc
if (col_reader->IsCollectionReader()) {
// Recursively populate col_reader's children
DCHECK(slot_desc->collection_item_descriptor() != nullptr);
const TupleDescriptor* item_tuple_desc = slot_desc->collection_item_descriptor();
DCHECK(slot_desc->children_tuple_descriptor() != nullptr);
const TupleDescriptor* item_tuple_desc = slot_desc->children_tuple_descriptor();
CollectionColumnReader* collection_reader =
static_cast<CollectionColumnReader*>(col_reader);
RETURN_IF_ERROR(CreateColumnReaders(

@@ -119,7 +119,7 @@ bool CollectionColumnReader::ReadSlot(CollectionValue* slot, MemPool* pool) {
// Recursively read the collection into a new CollectionValue.
*slot = CollectionValue();
CollectionValueBuilder builder(
slot, *slot_desc_->collection_item_descriptor(), pool, parent_->state_);
slot, *slot_desc_->children_tuple_descriptor(), pool, parent_->state_);
bool continue_execution =
parent_->AssembleCollection(children_, new_collection_rep_level(), &builder);
if (!continue_execution) return false;

@@ -90,6 +90,9 @@ FunctionContext::TypeDesc AnyValUtil::ColumnTypeToTypeDesc(const ColumnType& typ
case TYPE_DATE:
out.type = FunctionContext::TYPE_DATE;
break;
case TYPE_STRUCT:
out.type = FunctionContext::TYPE_STRUCT;
break;
default:
DCHECK(false) << "Unknown type: " << type;
}

@@ -23,6 +23,7 @@
#include "runtime/decimal-value.h"
#include "runtime/string-value.inline.h"
#include "runtime/timestamp-value.h"
#include "udf/udf-internal.h"
#include "util/decimal-constants.h"
namespace impala {
@@ -42,6 +43,7 @@ struct ExprValue {
Decimal8Value decimal8_val;
Decimal16Value decimal16_val;
CollectionValue collection_val;
impala_udf::StructVal struct_val;
DateValue date_val;
ExprValue()
@@ -58,6 +60,7 @@ struct ExprValue {
decimal8_val(),
decimal16_val(),
collection_val(),
struct_val(),
date_val(0) {
}

@@ -91,6 +91,12 @@ Status ScalarExprEvaluator::Create(const ScalarExpr& root, RuntimeState* state,
DCHECK_EQ(root.fn_ctx_idx_, -1);
DCHECK((*eval)->fn_ctxs_ptr_ == nullptr);
}
if (root.type().IsStructType()) {
DCHECK(root.GetNumChildren() > 0);
Status status = Create(root.children(), state, pool, expr_perm_pool,
expr_results_pool, &((*eval)->childEvaluators_));
DCHECK((*eval)->childEvaluators_.size() == root.GetNumChildren());
}
(*eval)->initialized_ = true;
return Status::OK();
}
@@ -113,9 +119,15 @@ void ScalarExprEvaluator::CreateFnCtxs(RuntimeState* state, const ScalarExpr& ex
const int fn_ctx_idx = expr.fn_ctx_idx();
const bool has_fn_ctx = fn_ctx_idx != -1;
vector<FunctionContext::TypeDesc> arg_types;
for (const ScalarExpr* child : expr.children()) {
CreateFnCtxs(state, *child, expr_perm_pool, expr_results_pool);
if (has_fn_ctx) arg_types.push_back(AnyValUtil::ColumnTypeToTypeDesc(child->type()));
// It's not needed to create contexts for the children of structs here as Create() is
// called recursively for each of their children and that will take care of the context
// creation as well.
if (!expr.type().IsStructType()) {
for (const ScalarExpr* child : expr.children()) {
CreateFnCtxs(state, *child, expr_perm_pool, expr_results_pool);
if (has_fn_ctx) arg_types.push_back(
AnyValUtil::ColumnTypeToTypeDesc(child->type()));
}
}
if (has_fn_ctx) {
FunctionContext::TypeDesc return_type =
@@ -158,6 +170,7 @@ void ScalarExprEvaluator::Close(RuntimeState* state) {
delete fn_ctxs_[i];
}
fn_ctxs_.clear();
for (ScalarExprEvaluator* child : childEvaluators_) child->Close(state);
// Memory allocated by 'fn_ctx_' is still in the MemPools. It's the responsibility of
// the owners of those pools to free it.
closed_ = true;
@@ -355,6 +368,12 @@ void* ScalarExprEvaluator::GetValue(const ScalarExpr& expr, const TupleRow* row)
result_.collection_val.num_tuples = v.num_tuples;
return &result_.collection_val;
}
case TYPE_STRUCT: {
StructVal v = expr.GetStructVal(this, row);
if (v.is_null) return nullptr;
result_.struct_val = v;
return &result_.struct_val;
}
default:
DCHECK(false) << "Type not implemented: " << expr.type_.DebugString();
return nullptr;
@@ -413,6 +432,10 @@ CollectionVal ScalarExprEvaluator::GetCollectionVal(const TupleRow* row) {
return root_.GetCollectionVal(this, row);
}
StructVal ScalarExprEvaluator::GetStructVal(const TupleRow* row) {
return root_.GetStructVal(this, row);
}
TimestampVal ScalarExprEvaluator::GetTimestampVal(const TupleRow* row) {
return root_.GetTimestampVal(this, row);
}

@@ -41,6 +41,7 @@ using impala_udf::TimestampVal;
using impala_udf::StringVal;
using impala_udf::DecimalVal;
using impala_udf::CollectionVal;
using impala_udf::StructVal;
using impala_udf::DateVal;
class MemPool;
@@ -161,6 +162,7 @@ class ScalarExprEvaluator {
DoubleVal GetDoubleVal(const TupleRow* row);
StringVal GetStringVal(const TupleRow* row);
CollectionVal GetCollectionVal(const TupleRow* row);
StructVal GetStructVal(const TupleRow* row);
TimestampVal GetTimestampVal(const TupleRow* row);
DecimalVal GetDecimalVal(const TupleRow* row);
DateVal GetDateVal(const TupleRow* row);
@@ -211,6 +213,8 @@ class ScalarExprEvaluator {
/// not strip these symbols.
static void InitBuiltinsDummy();
std::vector<ScalarExprEvaluator*>& GetChildEvaluators() { return childEvaluators_; }
static const char* LLVM_CLASS_NAME;
protected:
@@ -231,6 +235,7 @@ class ScalarExprEvaluator {
private:
friend class ScalarExpr;
friend class SlotRef;
/// FunctionContexts for nodes in this Expr tree. Created by this ScalarExprEvaluator
/// and live in the same object pool as this evaluator (i.e. same life span as the
@@ -253,6 +258,10 @@ class ScalarExprEvaluator {
/// This is used in interpreted path when we need to return a void*.
ExprValue result_;
/// For a struct scalar expression there is one evaluator created for each child of
/// the struct. This is empty for non-struct expressions.
std::vector<ScalarExprEvaluator*> childEvaluators_;
/// True if this evaluator came from a Clone() call. Used to manage FunctionStateScope.
bool is_clone_ = false;

@@ -213,7 +213,12 @@ Status ScalarExpr::CreateNode(
Status ScalarExpr::OpenEvaluator(FunctionContext::FunctionStateScope scope,
RuntimeState* state, ScalarExprEvaluator* eval) const {
for (int i = 0; i < children_.size(); ++i) {
RETURN_IF_ERROR(children_[i]->OpenEvaluator(scope, state, eval));
ScalarExprEvaluator* child_eval = eval;
if (type_.IsStructType()) {
DCHECK_EQ(children_.size(), eval->GetChildEvaluators().size());
child_eval = eval->GetChildEvaluators()[i];
}
RETURN_IF_ERROR(children_[i]->OpenEvaluator(scope, state, child_eval));
}
return Status::OK();
}
@@ -386,6 +391,7 @@ SCALAR_EXPR_GET_VAL_INTERPRETED(TimestampVal);
SCALAR_EXPR_GET_VAL_INTERPRETED(DecimalVal);
SCALAR_EXPR_GET_VAL_INTERPRETED(DateVal);
SCALAR_EXPR_GET_VAL_INTERPRETED(CollectionVal);
SCALAR_EXPR_GET_VAL_INTERPRETED(StructVal);
string ScalarExpr::DebugString(const string& expr_name) const {
stringstream out;

@@ -56,6 +56,7 @@ using impala_udf::StringVal;
using impala_udf::DecimalVal;
using impala_udf::DateVal;
using impala_udf::CollectionVal;
using impala_udf::StructVal;
class FragmentState;
struct LibCacheEntry;
@@ -232,6 +233,7 @@ class ScalarExpr : public Expr {
friend class Predicate;
friend class ScalarExprEvaluator;
friend class ScalarFnCall;
friend class SlotRef;
/// For BE tests
friend class ExprTest;
@@ -242,7 +244,7 @@ class ScalarExpr : public Expr {
/// nodes which need FunctionContext in the tree. 'next_fn_ctx_idx' is the index
/// of the next available entry in the vector. It's updated as this function is
/// called recursively down the tree.
void AssignFnCtxIdx(int* next_fn_ctx_idx);
virtual void AssignFnCtxIdx(int* next_fn_ctx_idx);
int fn_ctx_idx() const { return fn_ctx_idx_; }
@@ -272,6 +274,7 @@ class ScalarExpr : public Expr {
DoubleVal GetDoubleVal(ScalarExprEvaluator*, const TupleRow*) const;
StringVal GetStringVal(ScalarExprEvaluator*, const TupleRow*) const;
CollectionVal GetCollectionVal(ScalarExprEvaluator*, const TupleRow*) const;
StructVal GetStructVal(ScalarExprEvaluator*, const TupleRow*) const;
TimestampVal GetTimestampVal(ScalarExprEvaluator*, const TupleRow*) const;
DecimalVal GetDecimalVal(ScalarExprEvaluator*, const TupleRow*) const;
DateVal GetDateVal(ScalarExprEvaluator*, const TupleRow*) const;
@@ -293,6 +296,8 @@ class ScalarExpr : public Expr {
virtual StringVal GetStringValInterpreted(ScalarExprEvaluator*, const TupleRow*) const;
virtual CollectionVal GetCollectionValInterpreted(
ScalarExprEvaluator*, const TupleRow*) const;
virtual StructVal GetStructValInterpreted(
ScalarExprEvaluator*, const TupleRow*) const;
virtual TimestampVal GetTimestampValInterpreted(
ScalarExprEvaluator*, const TupleRow*) const;
virtual DecimalVal GetDecimalValInterpreted(

@@ -37,6 +37,7 @@ namespace impala {
/// * ScalarExpr::GetStringVal()
/// * ScalarExpr::GetDateVal()
/// * ScalarExpr::GetCollectionVal()
/// * ScalarExpr::GetStructVal()
#pragma push_macro("SCALAR_EXPR_GET_VAL")
#define SCALAR_EXPR_GET_VAL(val_type, type_validation) \
typedef val_type (*val_type##Wrapper)(ScalarExprEvaluator*, const TupleRow*); \
@@ -63,6 +64,7 @@ SCALAR_EXPR_GET_VAL(StringVal, type_.IsStringType()
|| type_.type == PrimitiveType::TYPE_FIXED_UDA_INTERMEDIATE);
SCALAR_EXPR_GET_VAL(DateVal, type_.type == PrimitiveType::TYPE_DATE);
SCALAR_EXPR_GET_VAL(CollectionVal, type_.IsCollectionType());
SCALAR_EXPR_GET_VAL(StructVal, type_.IsStructType());
#pragma pop_macro("SCALAR_EXPR_GET_VAL")
}

@@ -46,7 +46,7 @@ SlotRef::SlotRef(const TExprNode& node)
slot_offset_(-1), // invalid
null_indicator_offset_(0, 0),
slot_id_(node.slot_ref.slot_id) {
// slot_/null_indicator_offset_ are set in Prepare()
// slot_/null_indicator_offset_ are set in Init()
}
SlotRef::SlotRef(const SlotDescriptor* desc)
@@ -54,7 +54,7 @@ SlotRef::SlotRef(const SlotDescriptor* desc)
slot_offset_(-1),
null_indicator_offset_(0, 0),
slot_id_(desc->id()) {
// slot_/null_indicator_offset_ are set in Prepare()
// slot_/null_indicator_offset_ are set in Init()
}
SlotRef::SlotRef(const SlotDescriptor* desc, const ColumnType& type)
@@ -62,7 +62,7 @@ SlotRef::SlotRef(const SlotDescriptor* desc, const ColumnType& type)
slot_offset_(-1),
null_indicator_offset_(0, 0),
slot_id_(desc->id()) {
// slot_/null_indicator_offset_ are set in Prepare()
// slot_/null_indicator_offset_ are set in Init()
}
SlotRef::SlotRef(const ColumnType& type, int offset, const bool nullable /* = false */)
@@ -74,7 +74,7 @@ SlotRef::SlotRef(const ColumnType& type, int offset, const bool nullable /* = fa
Status SlotRef::Init(
const RowDescriptor& row_desc, bool is_entry_point, FragmentState* state) {
DCHECK_EQ(children_.size(), 0);
DCHECK(type_.IsStructType() || children_.size() == 0);
RETURN_IF_ERROR(ScalarExpr::Init(row_desc, is_entry_point, state));
if (slot_id_ != -1) {
const SlotDescriptor* slot_desc = state->desc_tbl().GetSlotDescriptor(slot_id_);
@@ -85,7 +85,11 @@ Status SlotRef::Init(
LOG(INFO) << error.str();
return Status(error.str());
}
tuple_idx_ = row_desc.GetTupleIdx(slot_desc->parent()->id());
if (slot_desc->parent()->isTupleOfStructSlot()) {
tuple_idx_ = row_desc.GetTupleIdx(slot_desc->parent()->getMasterTuple()->id());
} else {
tuple_idx_ = row_desc.GetTupleIdx(slot_desc->parent()->id());
}
if (tuple_idx_ == RowDescriptor::INVALID_IDX) {
TupleDescriptor* d =
state->desc_tbl().GetTupleDescriptor(slot_desc->parent()->id());
@@ -95,7 +99,9 @@ Status SlotRef::Init(
return Status(error);
}
DCHECK(tuple_idx_ != RowDescriptor::INVALID_IDX);
tuple_is_nullable_ = row_desc.TupleIsNullable(tuple_idx_);
if (!slot_desc->parent()->isTupleOfStructSlot()) {
tuple_is_nullable_ = row_desc.TupleIsNullable(tuple_idx_);
}
slot_offset_ = slot_desc->tuple_offset();
null_indicator_offset_ = slot_desc->null_indicator_offset();
}
@@ -117,6 +123,17 @@ string SlotRef::DebugString() const {
return out.str();
}
void SlotRef::AssignFnCtxIdx(int* next_fn_ctx_idx) {
if (!type_.IsStructType()) {
ScalarExpr::AssignFnCtxIdx(next_fn_ctx_idx);
return;
}
fn_ctx_idx_start_ = *next_fn_ctx_idx;
fn_ctx_idx_ = 0;
fn_ctx_idx_end_ = 1;
for (ScalarExpr* child : children()) child->AssignFnCtxIdx(next_fn_ctx_idx);
}
// There are four possible cases we may generate:
// 1. Tuple is non-nullable and slot is non-nullable
// 2. Tuple is non-nullable and slot is nullable
@@ -451,4 +468,25 @@ CollectionVal SlotRef::GetCollectionValInterpreted(
return CollectionVal(coll_value->ptr, coll_value->num_tuples);
}
StructVal SlotRef::GetStructValInterpreted(
ScalarExprEvaluator* eval, const TupleRow* row) const {
DCHECK(type_.IsStructType() && children_.size() > 0);
DCHECK_EQ(children_.size(), eval->GetChildEvaluators().size());
Tuple* t = row->GetTuple(tuple_idx_);
if (t == nullptr || t->IsNull(null_indicator_offset_)) return StructVal::null();
FunctionContext* fn_ctx = eval->fn_context(fn_ctx_idx_);
DCHECK(fn_ctx != nullptr);
StructVal struct_val(fn_ctx, children_.size());
vector<ScalarExprEvaluator*>& child_evaluators = eval->GetChildEvaluators();
for (int i = 0; i < child_evaluators.size(); ++i) {
ScalarExpr* child_expr = children_[i];
ScalarExprEvaluator* child_eval = child_evaluators[i];
DCHECK(child_eval != nullptr);
void* child_val = child_eval->GetValue(*child_expr, row);
struct_val.addChild(child_val, i);
}
return struct_val;
}
} // namespace impala

@@ -58,14 +58,24 @@ class SlotRef : public ScalarExpr {
virtual int GetSlotIds(std::vector<SlotId>* slot_ids) const override;
const SlotId& slot_id() const { return slot_id_; }
static const char* LLVM_CLASS_NAME;
int GetTupleIdx() const { return tuple_idx_; }
NullIndicatorOffset GetNullIndicatorOffset() const { return null_indicator_offset_; }
int GetSlotOffset() const { return slot_offset_; }
protected:
friend class ScalarExpr;
friend class ScalarExprEvaluator;
/// For struct SlotRefs we need a FunctionContext so that we can use it later for
/// allocating memory to StructVals.
/// If this SlotRef is not a struct then the same function in ScalarExpr is called.
virtual void AssignFnCtxIdx(int* next_fn_ctx_idx) override;
GENERATE_GET_VAL_INTERPRETED_OVERRIDES_FOR_ALL_SCALAR_TYPES
virtual CollectionVal GetCollectionValInterpreted(
ScalarExprEvaluator*, const TupleRow*) const override;
virtual StructVal GetStructValInterpreted(
ScalarExprEvaluator*, const TupleRow*) const override;
private:
int tuple_idx_; // within row

@@ -1878,7 +1878,7 @@ TEST_F(ArrayTupleStreamTest, TestArrayDeepCopy) {
tuple0->SetNull(tuple_descs[0]->slots()[1]->null_indicator_offset());
tuple1->SetNull(tuple_descs[1]->slots()[0]->null_indicator_offset());
const SlotDescriptor* array_slot_desc = tuple_descs[0]->slots()[0];
const TupleDescriptor* item_desc = array_slot_desc->collection_item_descriptor();
const TupleDescriptor* item_desc = array_slot_desc->children_tuple_descriptor();
int array_len = array_lens[array_len_index++ % num_array_lens];
CollectionValue* cv = tuple0->GetCollectionSlot(array_slot_desc->tuple_offset());
@@ -1932,7 +1932,7 @@ TEST_F(ArrayTupleStreamTest, TestArrayDeepCopy) {
ASSERT_TRUE(tuple0->IsNull(tuple_descs[0]->slots()[1]->null_indicator_offset()));
ASSERT_TRUE(tuple1->IsNull(tuple_descs[1]->slots()[0]->null_indicator_offset()));
const TupleDescriptor* item_desc = array_slot_desc->collection_item_descriptor();
const TupleDescriptor* item_desc = array_slot_desc->children_tuple_descriptor();
int expected_array_len = array_lens[array_len_index++ % num_array_lens];
CollectionValue* cv = tuple0->GetCollectionSlot(array_slot_desc->tuple_offset());
ASSERT_EQ(expected_array_len, cv->num_tuples);
@@ -1987,7 +1987,7 @@ TEST_F(ArrayTupleStreamTest, TestComputeRowSize) {
// Tuple 0 has an array.
int expected_row_size = tuple_null_indicator_bytes + array_desc_->GetRowSize();
const SlotDescriptor* array_slot = tuple_descs[0]->slots()[0];
const TupleDescriptor* item_desc = array_slot->collection_item_descriptor();
const TupleDescriptor* item_desc = array_slot->children_tuple_descriptor();
int array_len = 128;
CollectionValue* cv = tuple0->GetCollectionSlot(array_slot->tuple_offset());
CollectionValueBuilder builder(

@@ -73,29 +73,30 @@ BufferedTupleStream::BufferedTupleStream(RuntimeState* state,
const TupleDescriptor* tuple_desc = desc_->tuple_descriptors()[i];
const int tuple_byte_size = tuple_desc->byte_size();
fixed_tuple_sizes_.push_back(tuple_byte_size);
CollectInlinedSlots(tuple_desc, ext_varlen_slots, i);
}
}
vector<SlotDescriptor*> tuple_string_slots;
vector<SlotDescriptor*> tuple_coll_slots;
for (int j = 0; j < tuple_desc->slots().size(); ++j) {
SlotDescriptor* slot = tuple_desc->slots()[j];
if (!slot->type().IsVarLenType()) continue;
if (ext_varlen_slots.find(slot->id()) == ext_varlen_slots.end()) {
if (slot->type().IsVarLenStringType()) {
tuple_string_slots.push_back(slot);
} else {
DCHECK(slot->type().IsCollectionType());
tuple_coll_slots.push_back(slot);
}
}
void BufferedTupleStream::CollectInlinedSlots(const TupleDescriptor* tuple_desc,
const set<SlotId>& ext_varlen_slots, int tuple_idx) {
vector<SlotDescriptor*> inlined_string_slots;
vector<SlotDescriptor*> inlined_coll_slots;
for (SlotDescriptor* slot : tuple_desc->string_slots()) {
if (ext_varlen_slots.find(slot->id()) == ext_varlen_slots.end()) {
inlined_string_slots.push_back(slot);
}
if (!tuple_string_slots.empty()) {
inlined_string_slots_.push_back(make_pair(i, tuple_string_slots));
}
if (!tuple_coll_slots.empty()) {
inlined_coll_slots_.push_back(make_pair(i, tuple_coll_slots));
}
for (SlotDescriptor* slot : tuple_desc->collection_slots()) {
if (ext_varlen_slots.find(slot->id()) == ext_varlen_slots.end()) {
inlined_coll_slots.push_back(slot);
}
}
if (!inlined_string_slots.empty()) {
inlined_string_slots_.push_back(make_pair(tuple_idx, inlined_string_slots));
}
if (!inlined_coll_slots.empty()) {
inlined_coll_slots_.push_back(make_pair(tuple_idx, inlined_coll_slots));
}
}
void BufferedTupleStream::CheckConsistencyFull(const ReadIterator& read_it) const {
@@ -919,7 +920,7 @@ void BufferedTupleStream::FixUpCollectionsForRead(
if (tuple->IsNull(slot_desc->null_indicator_offset())) continue;
CollectionValue* cv = tuple->GetCollectionSlot(slot_desc->tuple_offset());
const TupleDescriptor& item_desc = *slot_desc->collection_item_descriptor();
const TupleDescriptor& item_desc = *slot_desc->children_tuple_descriptor();
int coll_byte_size = cv->num_tuples * item_desc.byte_size();
cv->ptr = reinterpret_cast<uint8_t*>(read_iter->read_ptr_);
read_iter->AdvanceReadPtr(coll_byte_size);
@@ -964,7 +965,7 @@ int64_t BufferedTupleStream::ComputeRowSize(TupleRow* row) const noexcept {
for (auto it = slots.begin(); it != slots.end(); ++it) {
if (tuple->IsNull((*it)->null_indicator_offset())) continue;
CollectionValue* cv = tuple->GetCollectionSlot((*it)->tuple_offset());
const TupleDescriptor& item_desc = *(*it)->collection_item_descriptor();
const TupleDescriptor& item_desc = *(*it)->children_tuple_descriptor();
size += cv->num_tuples * item_desc.byte_size();
if (!item_desc.HasVarlenSlots()) continue;
@@ -1117,7 +1118,7 @@ bool BufferedTupleStream::CopyCollections(const Tuple* tuple,
for (const SlotDescriptor* slot_desc : collection_slots) {
if (tuple->IsNull(slot_desc->null_indicator_offset())) continue;
const CollectionValue* cv = tuple->GetCollectionSlot(slot_desc->tuple_offset());
const TupleDescriptor& item_desc = *slot_desc->collection_item_descriptor();
const TupleDescriptor& item_desc = *slot_desc->children_tuple_descriptor();
if (LIKELY(cv->num_tuples > 0)) {
int coll_byte_size = cv->num_tuples * item_desc.byte_size();
if (UNLIKELY(*data + coll_byte_size > data_end)) return false;

@@ -667,6 +667,12 @@ class BufferedTupleStream {
/// kept pinned until the caller calls UnpinStream().
bool pinned_ = true;
/// Populates 'inlined_string_slots_' and 'inlined_coll_slots_' under the index of
/// 'tuple_idx' with the inlined string and collection slots from the slots of
/// 'tuple_desc'. Excludes the slots in 'ext_varlen_slots'.
void CollectInlinedSlots(const TupleDescriptor* tuple_desc,
const std::set<SlotId>& ext_varlen_slots, int tuple_idx);
/// Return true if 'page' is the current page for the embedded read iterator.
bool is_read_page(const Page* page) const {
return read_it_.read_page_ != pages_.end() && &*read_it_.read_page_ == page;


@@ -98,24 +98,23 @@ ostream& operator<<(ostream& os, const NullIndicatorOffset& null_indicator) {
}
SlotDescriptor::SlotDescriptor(const TSlotDescriptor& tdesc,
const TupleDescriptor* parent, const TupleDescriptor* collection_item_descriptor)
const TupleDescriptor* parent, const TupleDescriptor* children_tuple_descriptor)
: id_(tdesc.id),
type_(ColumnType::FromThrift(tdesc.slotType)),
parent_(parent),
collection_item_descriptor_(collection_item_descriptor),
children_tuple_descriptor_(children_tuple_descriptor),
col_path_(tdesc.materializedPath),
tuple_offset_(tdesc.byteOffset),
null_indicator_offset_(tdesc.nullIndicatorByte, tdesc.nullIndicatorBit),
slot_idx_(tdesc.slotIdx),
slot_size_(type_.GetSlotSize()) {
DCHECK_NE(type_.type, TYPE_STRUCT);
DCHECK(parent_ != nullptr) << tdesc.parent;
if (type_.IsCollectionType()) {
if (type_.IsComplexType()) {
DCHECK(tdesc.__isset.itemTupleId);
DCHECK(collection_item_descriptor_ != nullptr) << tdesc.itemTupleId;
DCHECK(children_tuple_descriptor_ != nullptr) << tdesc.itemTupleId;
} else {
DCHECK(!tdesc.__isset.itemTupleId);
DCHECK(collection_item_descriptor == nullptr);
DCHECK(children_tuple_descriptor == nullptr);
}
}
@@ -138,8 +137,8 @@ string SlotDescriptor::DebugString() const {
out << col_path_[i];
}
out << "]";
if (collection_item_descriptor_ != nullptr) {
out << " collection_item_tuple_id=" << collection_item_descriptor_->id();
if (children_tuple_descriptor_ != nullptr) {
out << " children_tuple_id=" << children_tuple_descriptor_->id();
}
out << " offset=" << tuple_offset_ << " null=" << null_indicator_offset_.DebugString()
<< " slot_idx=" << slot_idx_ << " field_idx=" << slot_idx_
@@ -156,6 +155,10 @@ bool SlotDescriptor::LayoutEquals(const SlotDescriptor& other_desc) const {
return true;
}
inline bool SlotDescriptor::IsChildOfStruct() const {
return parent_->isTupleOfStructSlot();
}
ColumnDescriptor::ColumnDescriptor(const TColumnDescriptor& tdesc)
: name_(tdesc.name),
type_(ColumnType::FromThrift(tdesc.type)) {
@@ -342,8 +345,12 @@ TupleDescriptor::TupleDescriptor(const TTupleDescriptor& tdesc)
void TupleDescriptor::AddSlot(SlotDescriptor* slot) {
slots_.push_back(slot);
if (slot->type().IsVarLenStringType()) {
string_slots_.push_back(slot);
has_varlen_slots_ = true;
TupleDescriptor* target_tuple = this;
// If this is a tuple for struct children then we populate the 'string_slots_' of
// the topmost tuple and not this one.
if (isTupleOfStructSlot()) target_tuple = master_tuple_;
target_tuple->string_slots_.push_back(slot);
target_tuple->has_varlen_slots_ = true;
}
if (slot->type().IsCollectionType()) {
collection_slots_.push_back(slot);
@@ -351,16 +358,6 @@ void TupleDescriptor::AddSlot(SlotDescriptor* slot) {
}
}
bool TupleDescriptor::ContainsStringData() const {
if (!string_slots_.empty()) return true;
for (int i = 0; i < collection_slots_.size(); ++i) {
if (collection_slots_[i]->collection_item_descriptor_->ContainsStringData()) {
return true;
}
}
return false;
}
string TupleDescriptor::DebugString() const {
stringstream out;
out << "Tuple(id=" << id_ << " size=" << byte_size_;
@@ -614,8 +611,7 @@ Status DescriptorTbl::CreateInternal(ObjectPool* pool, const TDescriptorTable& t
(*tbl)->tbl_desc_map_[tdesc.id] = desc;
}
for (size_t i = 0; i < thrift_tbl.tupleDescriptors.size(); ++i) {
const TTupleDescriptor& tdesc = thrift_tbl.tupleDescriptors[i];
for (const TTupleDescriptor& tdesc : thrift_tbl.tupleDescriptors) {
TupleDescriptor* desc = pool->Add(new TupleDescriptor(tdesc));
// fix up table pointer
if (tdesc.__isset.tableId) {
@@ -624,15 +620,22 @@ Status DescriptorTbl::CreateInternal(ObjectPool* pool, const TDescriptorTable& t
(*tbl)->tuple_desc_map_[tdesc.id] = desc;
}
for (size_t i = 0; i < thrift_tbl.slotDescriptors.size(); ++i) {
const TSlotDescriptor& tdesc = thrift_tbl.slotDescriptors[i];
for (const TSlotDescriptor& tdesc : thrift_tbl.slotDescriptors) {
// Tuple descriptors are already populated in tbl
TupleDescriptor* parent = (*tbl)->GetTupleDescriptor(tdesc.parent);
DCHECK(parent != nullptr);
TupleDescriptor* collection_item_descriptor = tdesc.__isset.itemTupleId ?
TupleDescriptor* children_tuple_descriptor = tdesc.__isset.itemTupleId ?
(*tbl)->GetTupleDescriptor(tdesc.itemTupleId) : nullptr;
SlotDescriptor* slot_d = pool->Add(
new SlotDescriptor(tdesc, parent, collection_item_descriptor));
new SlotDescriptor(tdesc, parent, children_tuple_descriptor));
if (slot_d->type().IsStructType() && children_tuple_descriptor != nullptr &&
children_tuple_descriptor->getMasterTuple() == nullptr) {
TupleDescriptor* master_tuple = parent;
// If this struct is nested inside another struct then get the topmost tuple for
// the master.
if (parent->getMasterTuple() != nullptr) master_tuple = parent->getMasterTuple();
children_tuple_descriptor->setMasterTuple(master_tuple);
}
(*tbl)->slot_desc_map_[tdesc.id] = slot_d;
parent->AddSlot(slot_d);
}


@@ -115,8 +115,8 @@ class SlotDescriptor {
SlotId id() const { return id_; }
const ColumnType& type() const { return type_; }
const TupleDescriptor* parent() const { return parent_; }
const TupleDescriptor* collection_item_descriptor() const {
return collection_item_descriptor_;
const TupleDescriptor* children_tuple_descriptor() const {
return children_tuple_descriptor_;
}
/// Returns the column index of this slot, including partition keys.
/// (e.g., col_pos - num_partition_keys = the table column this slot corresponds to)
@@ -168,6 +168,8 @@ class SlotDescriptor {
void CodegenSetNullIndicator(LlvmCodeGen* codegen, LlvmBuilder* builder,
llvm::Value* tuple, llvm::Value* is_null) const;
/// Returns true if this slot is a child of a struct slot.
inline bool IsChildOfStruct() const;
private:
friend class DescriptorTbl;
friend class TupleDescriptor;
@@ -175,8 +177,8 @@ class SlotDescriptor {
const SlotId id_;
const ColumnType type_;
const TupleDescriptor* parent_;
/// Non-NULL only for collection slots
const TupleDescriptor* collection_item_descriptor_;
/// Non-NULL only for complex type slots
const TupleDescriptor* children_tuple_descriptor_;
// TODO for 2.3: rename to materialized_path_
const SchemaPath col_path_;
const int tuple_offset_;
@@ -189,9 +191,9 @@ class SlotDescriptor {
/// the byte size of this slot.
const int slot_size_;
/// collection_item_descriptor should be non-NULL iff this is a collection slot
/// 'children_tuple_descriptor' should be non-NULL iff this is a complex type slot.
SlotDescriptor(const TSlotDescriptor& tdesc, const TupleDescriptor* parent,
const TupleDescriptor* collection_item_descriptor);
const TupleDescriptor* children_tuple_descriptor);
/// Generate LLVM code at the insert position of 'builder' to get the i8 value of
/// the byte containing 'null_indicator_offset' in 'tuple'. If 'null_byte_ptr' is
@@ -452,8 +454,10 @@ class TupleDescriptor {
TupleId id() const { return id_; }
std::string DebugString() const;
/// Returns true if this tuple or any nested collection item tuples have string slots.
bool ContainsStringData() const;
bool isTupleOfStructSlot() const { return master_tuple_ != nullptr; }
TupleDescriptor* getMasterTuple() const { return master_tuple_; }
void setMasterTuple(TupleDescriptor* desc) { master_tuple_ = desc; }
/// Return true if the physical layout of this descriptor matches that of other_desc,
/// but not necessarily the id.
@@ -501,6 +505,20 @@ class TupleDescriptor {
/// collection, empty otherwise.
SchemaPath tuple_path_;
/// If this tuple represents the children of a struct slot then 'master_tuple_' is the
/// tuple that holds the topmost struct slot. For example:
/// - Tuple0
/// - Slot1 e.g. INT slot
/// - Slot2 e.g. STRUCT slot
/// - Tuple1 (Holds the children of the struct)
/// - Slot3 e.g. INT child of the STRUCT
/// - Slot4 e.g. STRING child of the STRUCT
/// In the above example the 'master_tuple_' for Tuple1 (the tuple that holds the
/// struct's children) would be Tuple0. If the STRUCT in Slot2 contained further
/// structs nested to any depth, the 'master_tuple_' of every tuple under Slot2
/// would again be Tuple0.
TupleDescriptor* master_tuple_ = nullptr;
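The master-tuple resolution described above can be sketched in a few lines. This is a minimal standalone model with hypothetical types (`TupleDesc`, `SetMaster` do not exist in Impala); it only mirrors the registration logic in `DescriptorTbl::CreateInternal()` where a nested struct's children tuple points at the topmost tuple rather than its direct parent.

```cpp
#include <cassert>

// Hypothetical simplified tuple descriptor: when it holds the children of a
// struct slot, it records the topmost "master" tuple.
struct TupleDesc {
  TupleDesc* master_tuple = nullptr;  // non-null iff this holds struct children
  bool IsTupleOfStructSlot() const { return master_tuple != nullptr; }
};

// Mirrors the registration logic: for a struct nested inside another struct,
// the children tuple's master is the topmost tuple, not the direct parent.
void SetMaster(TupleDesc* children, TupleDesc* parent) {
  TupleDesc* master = parent;
  if (parent->master_tuple != nullptr) master = parent->master_tuple;
  children->master_tuple = master;
}
```

With this, registering a struct-in-struct chain always resolves to the topmost tuple, which is why `AddSlot()` can redirect string slots of any nesting depth to one place.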
TupleDescriptor(const TTupleDescriptor& tdesc);
void AddSlot(SlotDescriptor* slot);


@@ -24,12 +24,15 @@
#include "runtime/raw-value.inline.h"
#include "runtime/string-value.inline.h"
#include "runtime/tuple.h"
#include "udf/udf-internal.h"
#include "util/ubsan.h"
#include "common/names.h"
namespace impala {
using impala_udf::StructVal;
const int RawValue::ASCII_PRECISION;
constexpr double RawValue::CANONICAL_DOUBLE_NAN;
constexpr float RawValue::CANONICAL_FLOAT_NAN;
@@ -194,6 +197,10 @@ void RawValue::Write(const void* value, void* dst, const ColumnType& type,
dest->ptr = src->ptr;
break;
}
case TYPE_STRUCT: {
// Structs should be handled by a different Write() function within this class.
DCHECK(false);
}
default:
DCHECK(false) << "RawValue::Write(): bad type: " << type.DebugString();
}
@@ -209,6 +216,69 @@ void RawValue::Write(const void* value, Tuple* tuple, const SlotDescriptor* slot
}
}
template <bool COLLECT_STRING_VALS>
void RawValue::Write(const void* value, Tuple* tuple,
const SlotDescriptor* slot_desc, MemPool* pool,
vector<StringValue*>* string_values) {
DCHECK(value != nullptr && tuple != nullptr && slot_desc != nullptr &&
string_values != nullptr);
DCHECK(string_values->size() == 0);
if (slot_desc->type().IsStructType()) {
WriteStruct<COLLECT_STRING_VALS>(value, tuple, slot_desc, pool, string_values);
} else {
WritePrimitive<COLLECT_STRING_VALS>(value, tuple, slot_desc, pool, string_values);
}
}
template <bool COLLECT_STRING_VALS>
void RawValue::WriteStruct(const void* value, Tuple* tuple,
const SlotDescriptor* slot_desc, MemPool* pool,
vector<StringValue*>* string_values) {
DCHECK(tuple != nullptr);
DCHECK(slot_desc->type().IsStructType());
DCHECK(slot_desc->children_tuple_descriptor() != nullptr);
if (value == nullptr) {
tuple->SetStructToNull(slot_desc);
return;
}
const StructVal* src = reinterpret_cast<const StructVal*>(value);
const TupleDescriptor* children_tuple_desc = slot_desc->children_tuple_descriptor();
DCHECK_EQ(src->num_children, children_tuple_desc->slots().size());
for (int i = 0; i < src->num_children; ++i) {
SlotDescriptor* child_slot = children_tuple_desc->slots()[i];
uint8_t* src_child = src->ptr[i];
if (child_slot->type().IsStructType()) {
// Recursive call in case of nested structs.
WriteStruct<COLLECT_STRING_VALS>(src_child, tuple, child_slot, pool,
string_values);
continue;
}
if (src_child == nullptr) {
tuple->SetNull(child_slot->null_indicator_offset());
} else {
WritePrimitive<COLLECT_STRING_VALS>(src_child, tuple, child_slot, pool,
string_values);
}
}
}
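The recursion in `WriteStruct()` above can be illustrated with a self-contained sketch. The `Val` type and `FlattenStruct()` below are hypothetical stand-ins (not Impala types): a struct value is a vector of child pointers, a null child pointer means NULL, and leaves are written out in order, just as `WriteStruct()` recurses into nested structs and delegates leaves to `WritePrimitive()`.

```cpp
#include <vector>

// Hypothetical simplified value model: either a scalar leaf or a struct whose
// children are pointers (nullptr meaning a NULL child).
struct Val {
  bool is_struct = false;
  int scalar = 0;                    // leaf payload when !is_struct
  std::vector<const Val*> children;  // populated when is_struct
};

// Flattens a (possibly nested) struct into its leaf scalars, skipping NULL
// children, the same way WriteStruct() recurses and writes only non-null leaves.
void FlattenStruct(const Val& v, std::vector<int>* out) {
  if (!v.is_struct) {
    out->push_back(v.scalar);
    return;
  }
  for (const Val* child : v.children) {
    if (child == nullptr) continue;  // NULL child: nothing is written
    FlattenStruct(*child, out);
  }
}
```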
template <bool COLLECT_STRING_VALS>
void RawValue::WritePrimitive(const void* value, Tuple* tuple,
const SlotDescriptor* slot_desc, MemPool* pool,
vector<StringValue*>* string_values) {
DCHECK(value != nullptr && tuple != nullptr && slot_desc != nullptr &&
string_values != nullptr);
DCHECK(!slot_desc->type().IsComplexType());
void* dst = tuple->GetSlot(slot_desc->tuple_offset());
Write(value, dst, slot_desc->type(), pool);
if (COLLECT_STRING_VALS && slot_desc->type().IsVarLenStringType()) {
string_values->push_back(reinterpret_cast<StringValue*>(dst));
}
}
void RawValue::PrintValue(
const void* value, const ColumnType& type, int scale, std::stringstream* stream) {
if (value == NULL) {
@@ -299,4 +369,25 @@ void RawValue::PrintValue(
// Undo setting stream to fixed
stream->flags(old_flags);
}
template void RawValue::Write<true>(const void* value, Tuple* tuple,
const SlotDescriptor* slot_desc, MemPool* pool,
std::vector<StringValue*>* string_values);
template void RawValue::Write<false>(const void* value, Tuple* tuple,
const SlotDescriptor* slot_desc, MemPool* pool,
std::vector<StringValue*>* string_values);
template void RawValue::WriteStruct<true>(const void* value, Tuple* tuple,
const SlotDescriptor* slot_desc, MemPool* pool,
std::vector<StringValue*>* string_values);
template void RawValue::WriteStruct<false>(const void* value, Tuple* tuple,
const SlotDescriptor* slot_desc, MemPool* pool,
std::vector<StringValue*>* string_values);
template void RawValue::WritePrimitive<true>(const void* value, Tuple* tuple,
const SlotDescriptor* slot_desc, MemPool* pool,
std::vector<StringValue*>* string_values);
template void RawValue::WritePrimitive<false>(const void* value, Tuple* tuple,
const SlotDescriptor* slot_desc, MemPool* pool,
std::vector<StringValue*>* string_values);
}


@@ -27,6 +27,7 @@ namespace impala {
class MemPool;
class SlotDescriptor;
struct StringValue;
class Tuple;
/// Useful utility functions for runtime values (which are passed around as void*).
@@ -125,6 +126,13 @@ class RawValue {
/// src must be non-NULL.
static void Write(const void* src, void* dst, const ColumnType& type, MemPool* pool);
/// Wrapper function for Write() to handle struct slots and their children. Additionally,
/// gathers the string slots of the slot tree into 'string_values'.
template <bool COLLECT_STRING_VALS>
static void Write(const void* value, Tuple* tuple,
const SlotDescriptor* slot_desc, MemPool* pool,
std::vector<StringValue*>* string_values);
/// Returns true if v1 == v2.
/// This is more performant than Compare() == 0 for string equality, mostly because of
/// the length comparison check.
@@ -146,5 +154,20 @@ class RawValue {
// Returns positive zero for floating point types.
static inline const void* PositiveFloatingZero(const ColumnType& type);
private:
/// Recursive helper function for Write() to handle struct slots.
template <bool COLLECT_STRING_VALS>
static void WriteStruct(const void* value, Tuple* tuple,
const SlotDescriptor* slot_desc, MemPool* pool,
std::vector<StringValue*>* string_values);
/// Gets the destination slot from 'tuple' and 'slot_desc' and writes 'value' to this
/// slot using Write(). Collects pointers to the string slots into 'string_values'.
/// 'slot_desc' has to be of a primitive type.
template <bool COLLECT_STRING_VALS>
static void WritePrimitive(const void* value, Tuple* tuple,
const SlotDescriptor* slot_desc, MemPool* pool,
std::vector<StringValue*>* string_values);
};
}


@@ -212,7 +212,7 @@ class RowBatchSerializeTest : public testing::Test {
}
if (type.IsCollectionType()) {
const TupleDescriptor& item_desc = *slot_desc->collection_item_descriptor();
const TupleDescriptor& item_desc = *slot_desc->children_tuple_descriptor();
CollectionValue* coll_value = reinterpret_cast<CollectionValue*>(slot);
CollectionValue* deserialized_coll_value =
reinterpret_cast<CollectionValue*>(deserialized_slot);
@@ -259,7 +259,7 @@ class RowBatchSerializeTest : public testing::Test {
break;
}
case TYPE_ARRAY: {
const TupleDescriptor* item_desc = slot_desc.collection_item_descriptor();
const TupleDescriptor* item_desc = slot_desc.children_tuple_descriptor();
int array_len = rand() % (MAX_ARRAY_LEN + 1);
CollectionValue cv;
CollectionValueBuilder builder(&cv, *item_desc, pool, runtime_state_, array_len);
@@ -721,7 +721,7 @@ TEST_F(RowBatchSerializeTest, DedupPathologicalFull) {
// The last tuple is a duplicated array with a large string inside.
const TupleDescriptor* array_tuple_desc = row_desc.tuple_descriptors()[array_tuple_idx];
const SlotDescriptor* array_slot_desc = array_tuple_desc->slots()[0];
const TupleDescriptor* array_item_desc = array_slot_desc->collection_item_descriptor();
const TupleDescriptor* array_item_desc = array_slot_desc->children_tuple_descriptor();
const SlotDescriptor* string_slot_desc = array_item_desc->slots()[0];
MemPool* pool = batch->tuple_data_pool();
for (int i = 0; i < num_distinct_array_tuples; ++i) {


@@ -72,7 +72,7 @@ int64_t Tuple::VarlenByteSize(const TupleDescriptor& desc) const {
if (IsNull((*slot)->null_indicator_offset())) continue;
const CollectionValue* coll_value = GetCollectionSlot((*slot)->tuple_offset());
uint8_t* coll_data = coll_value->ptr;
const TupleDescriptor& item_desc = *(*slot)->collection_item_descriptor();
const TupleDescriptor& item_desc = *(*slot)->children_tuple_descriptor();
for (int i = 0; i < coll_value->num_tuples; ++i) {
result += reinterpret_cast<Tuple*>(coll_data)->TotalByteSize(item_desc);
coll_data += item_desc.byte_size();
@@ -112,7 +112,7 @@ void Tuple::DeepCopyVarlenData(const TupleDescriptor& desc, MemPool* pool) {
DCHECK((*slot)->type().IsCollectionType());
if (IsNull((*slot)->null_indicator_offset())) continue;
CollectionValue* cv = GetCollectionSlot((*slot)->tuple_offset());
const TupleDescriptor* item_desc = (*slot)->collection_item_descriptor();
const TupleDescriptor* item_desc = (*slot)->children_tuple_descriptor();
int coll_byte_size = cv->num_tuples * item_desc->byte_size();
uint8_t* coll_data = reinterpret_cast<uint8_t*>(pool->Allocate(coll_byte_size));
memcpy(coll_data, cv->ptr, coll_byte_size);
@@ -156,7 +156,7 @@ void Tuple::DeepCopyVarlenData(const TupleDescriptor& desc, char** data, int* of
if (IsNull((*slot)->null_indicator_offset())) continue;
CollectionValue* coll_value = GetCollectionSlot((*slot)->tuple_offset());
const TupleDescriptor& item_desc = *(*slot)->collection_item_descriptor();
const TupleDescriptor& item_desc = *(*slot)->children_tuple_descriptor();
int coll_byte_size = coll_value->num_tuples * item_desc.byte_size();
memcpy(*data, coll_value->ptr, coll_byte_size);
uint8_t* coll_data = reinterpret_cast<uint8_t*>(*data);
@@ -197,7 +197,7 @@ void Tuple::ConvertOffsetsToPointers(const TupleDescriptor& desc, uint8_t* tuple
coll_value->ptr = tuple_data + offset;
uint8_t* coll_data = coll_value->ptr;
const TupleDescriptor& item_desc = *(*slot)->collection_item_descriptor();
const TupleDescriptor& item_desc = *(*slot)->children_tuple_descriptor();
for (int i = 0; i < coll_value->num_tuples; ++i) {
reinterpret_cast<Tuple*>(coll_data)->ConvertOffsetsToPointers(
item_desc, tuple_data);
@@ -233,20 +233,35 @@ void Tuple::MaterializeExprs(TupleRow* row, const TupleDescriptor& desc,
slot_desc->type() == evals[i]->root().type());
void* src = evals[i]->GetValue(row);
if (src != NULL) {
void* dst = GetSlot(slot_desc->tuple_offset());
RawValue::Write(src, dst, slot_desc->type(), pool);
if (COLLECT_STRING_VALS && slot_desc->type().IsVarLenStringType()) {
StringValue* string_val = reinterpret_cast<StringValue*>(dst);
*(non_null_string_values++) = string_val;
*total_string_lengths += string_val->len;
++(*num_non_null_string_values);
vector<StringValue*> string_values;
RawValue::Write<COLLECT_STRING_VALS>(src, this, slot_desc, pool, &string_values);
if (string_values.size() > 0) {
for (StringValue* string_val : string_values) {
*(non_null_string_values++) = string_val;
*total_string_lengths += string_val->len;
}
(*num_non_null_string_values) += string_values.size();
}
} else {
SetNull(slot_desc->null_indicator_offset());
if (slot_desc->type().IsStructType()) {
SetStructToNull(slot_desc);
} else {
SetNull(slot_desc->null_indicator_offset());
}
}
}
}
void Tuple::SetStructToNull(const SlotDescriptor* const slot_desc) {
DCHECK(slot_desc != nullptr && slot_desc->type().IsStructType());
DCHECK(slot_desc->children_tuple_descriptor() != nullptr);
SetNull(slot_desc->null_indicator_offset());
for (SlotDescriptor* child_slot : slot_desc->children_tuple_descriptor()->slots()) {
SetNull(child_slot->null_indicator_offset());
if (child_slot->type().IsStructType()) SetStructToNull(child_slot);
}
}
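The null-propagation pattern of `Tuple::SetStructToNull()` can be shown with a minimal sketch. The `Slot` type below is a hypothetical stand-in for `SlotDescriptor` plus its null indicator: nulling a struct slot also nulls every child slot, recursing into nested structs.

```cpp
#include <vector>

// Hypothetical slot model: a null flag plus children (non-empty for structs).
struct Slot {
  bool is_null = false;
  bool is_struct = false;
  std::vector<Slot*> children;
};

// Mirrors Tuple::SetStructToNull(): set the struct slot itself to null, then
// every child, recursing into nested struct children.
void SetStructToNull(Slot* slot) {
  slot->is_null = true;
  for (Slot* child : slot->children) {
    child->is_null = true;
    if (child->is_struct) SetStructToNull(child);
  }
}
```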
char* Tuple::AllocateStrings(const char* err_ctx, RuntimeState* state,
int64_t bytes, MemPool* pool, Status* status) noexcept {
char* buf = reinterpret_cast<char*>(pool->TryAllocateUnaligned(bytes));


@@ -250,6 +250,11 @@ class Tuple {
return (*null_indicator_byte & offset.bit_mask) != 0;
}
/// 'slot_desc' describes a struct slot in this tuple. Sets that slot to null in this
/// tuple, then iterates its children and sets all of them to null too, recursing
/// into nested structs.
void SetStructToNull(const SlotDescriptor* const slot_desc);
/// Set the null indicators on 'num_tuples' tuples. The first tuple is stored at
/// 'tuple_mem' and subsequent tuples must be stored at a stride of 'tuple_stride'
/// bytes.


@@ -306,6 +306,9 @@ TTypeEntry ColumnType::ToHs2Type() const {
(type == TYPE_CHAR) ? TTypeId::CHAR_TYPE : TTypeId::VARCHAR_TYPE);
break;
}
case TYPE_STRUCT:
type_entry.__set_type(TTypeId::STRING_TYPE);
break;
default:
// HiveServer2 does not have a type for invalid, date, datetime or
// fixed_uda_intermediate.


@@ -215,6 +215,10 @@ struct ColumnType {
return type == TYPE_STRUCT || type == TYPE_ARRAY || type == TYPE_MAP;
}
inline bool IsStructType() const {
return type == TYPE_STRUCT;
}
inline bool IsCollectionType() const {
return type == TYPE_ARRAY || type == TYPE_MAP;
}
@@ -224,8 +228,72 @@ struct ColumnType {
}
/// Returns the byte size of this type. Returns 0 for variable length types.
inline int GetByteSize() const {
switch (type) {
inline int GetByteSize() const { return GetByteSize(*this); }
/// Returns the size of a slot for this type.
inline int GetSlotSize() const { return GetSlotSize(*this); }
static inline int GetDecimalByteSize(int precision) {
DCHECK_GT(precision, 0);
if (precision <= MAX_DECIMAL4_PRECISION) return 4;
if (precision <= MAX_DECIMAL8_PRECISION) return 8;
return 16;
}
/// Returns the IR version of this ColumnType. Only implemented for scalar types. LLVM
/// optimizer can pull out fields of the returned ConstantStruct for constant folding.
llvm::ConstantStruct* ToIR(LlvmCodeGen* codegen) const;
apache::hive::service::cli::thrift::TTypeEntry ToHs2Type() const;
std::string DebugString() const;
/// Used to create a possibly nested type from the flattened Thrift representation.
///
/// 'idx' is an in/out parameter that is initially set to the index of the type in
/// 'types' being constructed, and is set to the index of the next type in 'types' that
/// needs to be processed (or the size 'types' if all nodes have been processed).
ColumnType(const std::vector<TTypeNode>& types, int* idx);
private:
/// Recursive implementation of ToThrift() that populates 'thrift_type' with the
/// TTypeNodes for this type and its children.
void ToThrift(TColumnType* thrift_type) const;
/// Helper function for GetSlotSize() so that struct size can be calculated
/// recursively.
static inline int GetSlotSize(const ColumnType& col_type) {
switch (col_type.type) {
case TYPE_STRUCT: {
int struct_size = 0;
for (const ColumnType& child_type : col_type.children) {
struct_size += GetSlotSize(child_type);
}
return struct_size;
}
case TYPE_STRING:
case TYPE_VARCHAR:
return 12;
case TYPE_CHAR:
case TYPE_FIXED_UDA_INTERMEDIATE:
return col_type.len;
case TYPE_ARRAY:
case TYPE_MAP:
return 12;
default:
return GetByteSize(col_type);
}
}
/// Helper function for GetByteSize()
static inline int GetByteSize(const ColumnType& col_type) {
switch (col_type.type) {
case TYPE_STRUCT: {
int struct_size = 0;
for (const ColumnType& child_type : col_type.children) {
struct_size += GetByteSize(child_type);
}
return struct_size;
}
case TYPE_ARRAY:
case TYPE_MAP:
case TYPE_STRING:
@@ -233,7 +301,7 @@ struct ColumnType {
return 0;
case TYPE_CHAR:
case TYPE_FIXED_UDA_INTERMEDIATE:
return len;
return col_type.len;
case TYPE_NULL:
case TYPE_BOOLEAN:
case TYPE_TINYINT:
@@ -251,58 +319,13 @@ struct ColumnType {
// This is the size of the slot, the actual size of the data is 12.
return 16;
case TYPE_DECIMAL:
return GetDecimalByteSize(precision);
return GetDecimalByteSize(col_type.precision);
case INVALID_TYPE:
default:
DCHECK(false) << "NYI: " << type;
DCHECK(false) << "NYI: " << col_type.type;
}
return 0;
}
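The recursive size computation above (a struct's size is the sum of its children's sizes, leaves have fixed sizes) can be demonstrated in isolation. `TypeNode` and `SlotSize` below are hypothetical simplifications of `ColumnType::GetSlotSize()`; the leaf sizes follow the values in this file (e.g. 4 for INT, 12 for a STRING slot).

```cpp
#include <vector>

// Hypothetical simplified type tree: a leaf carries a fixed size; a struct's
// size is the sum of its children's sizes, computed recursively.
struct TypeNode {
  int leaf_size = 0;               // used when 'children' is empty
  std::vector<TypeNode> children;  // non-empty for struct types
};

int SlotSize(const TypeNode& t) {
  if (t.children.empty()) return t.leaf_size;
  int size = 0;
  for (const TypeNode& child : t.children) size += SlotSize(child);
  return size;
}
```

For example, `struct<a:int,b:string>` would get a slot size of 4 + 12 = 16 under this model, and wrapping it in another struct with one more INT child yields 20.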
/// Returns the size of a slot for this type.
inline int GetSlotSize() const {
switch (type) {
case TYPE_STRING:
case TYPE_VARCHAR:
return 12;
case TYPE_CHAR:
case TYPE_FIXED_UDA_INTERMEDIATE:
return len;
case TYPE_ARRAY:
case TYPE_MAP:
return 12;
case TYPE_STRUCT:
DCHECK(false) << "TYPE_STRUCT slot not possible";
default:
return GetByteSize();
}
}
static inline int GetDecimalByteSize(int precision) {
DCHECK_GT(precision, 0);
if (precision <= MAX_DECIMAL4_PRECISION) return 4;
if (precision <= MAX_DECIMAL8_PRECISION) return 8;
return 16;
}
/// Returns the IR version of this ColumnType. Only implemented for scalar types. LLVM
/// optimizer can pull out fields of the returned ConstantStruct for constant folding.
llvm::ConstantStruct* ToIR(LlvmCodeGen* codegen) const;
apache::hive::service::cli::thrift::TTypeEntry ToHs2Type() const;
std::string DebugString() const;
private:
/// Used to create a possibly nested type from the flattened Thrift representation.
///
/// 'idx' is an in/out parameter that is initially set to the index of the type in
/// 'types' being constructed, and is set to the index of the next type in 'types' that
/// needs to be processed (or the size 'types' if all nodes have been processed).
ColumnType(const std::vector<TTypeNode>& types, int* idx);
/// Recursive implementation of ToThrift() that populates 'thrift_type' with the
/// TTypeNodes for this type and its children.
void ToThrift(TColumnType* thrift_type) const;
};
std::ostream& operator<<(std::ostream& os, const ColumnType& type);


@@ -17,6 +17,10 @@
#include "service/hs2-util.h"
#include <rapidjson/rapidjson.h>
#include <rapidjson/stringbuffer.h>
#include <rapidjson/writer.h>
#include "common/logging.h"
#include "exprs/scalar-expr-evaluator.h"
#include "runtime/date-value.h"
@@ -24,6 +28,7 @@
#include "runtime/raw-value.inline.h"
#include "runtime/row-batch.h"
#include "runtime/types.h"
#include "udf/udf-internal.h"
#include "util/bit-util.h"
#include <gutil/strings/substitute.h>
@@ -336,6 +341,60 @@ static void DecimalExprValuesToHS2TColumn(ScalarExprEvaluator* expr_eval,
}
}
// Gets a StructVal and writes its JSON representation through 'writer'. Uses
// 'column_type' to figure out field names and types. This function can call itself
// recursively in case of nested structs.
static void StructValToJSON(const StructVal& struct_val, const ColumnType& column_type,
rapidjson::Writer<rapidjson::StringBuffer>* writer) {
DCHECK(column_type.type == TYPE_STRUCT);
DCHECK_EQ(struct_val.num_children, column_type.children.size());
writer->StartObject();
for (int i = 0; i < struct_val.num_children; ++i) {
writer->String(column_type.field_names[i].c_str());
void* child = reinterpret_cast<void*>(struct_val.ptr[i]);
if (child == nullptr) {
writer->Null();
} else if (column_type.children[i].IsStructType()) {
StructValToJSON(*reinterpret_cast<StructVal*>(child), column_type.children[i], writer);
} else {
string tmp;
RawValue::PrintValue(child, column_type.children[i], -1, &tmp);
const ColumnType& child_type = column_type.children[i];
if (child_type.IsStringType() || child_type.IsDateType() ||
child_type.IsTimestampType()) {
writer->String(tmp.c_str());
} else if (child_type.IsBooleanType()) {
writer->Bool(*reinterpret_cast<bool*>(child));
} else {
writer->RawValue(tmp.c_str(), tmp.size(), rapidjson::kNumberType);
}
}
}
writer->EndObject();
}
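The shape of the JSON produced by `StructValToJSON()` can be sketched without rapidjson. The `JVal` type and `ToJson()` below are hypothetical stand-ins that only model the formatting rules visible above: field names come from the type, a NULL child renders as JSON `null`, string-like leaves are quoted, other leaves are emitted raw, and nested structs recurse.

```cpp
#include <string>
#include <vector>

// Hypothetical simplified value: a null, a pre-printed leaf, or a struct with
// named children.
struct JVal {
  bool is_null = false;
  bool is_struct = false;
  bool is_string = false;                // quote the leaf value if true
  std::string scalar;                    // pre-printed leaf value
  std::vector<std::string> field_names;  // for structs
  std::vector<JVal> children;            // for structs
};

// Mirrors the recursion in StructValToJSON(): nulls, quoted strings, raw
// numbers, and nested objects.
std::string ToJson(const JVal& v) {
  if (v.is_null) return "null";
  if (!v.is_struct) return v.is_string ? "\"" + v.scalar + "\"" : v.scalar;
  std::string out = "{";
  for (size_t i = 0; i < v.children.size(); ++i) {
    if (i > 0) out += ",";
    out += "\"" + v.field_names[i] + "\":" + ToJson(v.children[i]);
  }
  return out + "}";
}
```

This matches the examples in the commit message, e.g. `{"int_struct_member":12,"string_struct_member":"string value"}` (rapidjson additionally handles escaping, which this sketch omits).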
static void StructExprValuesToHS2TColumn(ScalarExprEvaluator* expr_eval,
const TColumnType& type, RowBatch* batch, int start_idx, int num_rows,
uint32_t output_row_idx, apache::hive::service::cli::thrift::TColumn* column) {
DCHECK(type.types.size() > 1);
ReserveSpace(num_rows, output_row_idx, &column->stringVal);
FOREACH_ROW_LIMIT(batch, start_idx, num_rows, it) {
StructVal struct_val = expr_eval->GetStructVal(it.Get());
if (struct_val.is_null) {
column->stringVal.values.emplace_back();
} else {
int idx = 0;
ColumnType column_type(type.types, &idx);
rapidjson::StringBuffer buffer;
rapidjson::Writer<rapidjson::StringBuffer> writer(buffer);
StructValToJSON(struct_val, column_type, &writer);
column->stringVal.values.emplace_back(buffer.GetString());
}
SetNullBit(output_row_idx, struct_val.is_null, &column->stringVal.nulls);
++output_row_idx;
}
}
// For V6 and above
void impala::ExprValuesToHS2TColumn(ScalarExprEvaluator* expr_eval,
const TColumnType& type, RowBatch* batch, int start_idx, int num_rows,
@@ -344,6 +403,11 @@ void impala::ExprValuesToHS2TColumn(ScalarExprEvaluator* expr_eval,
// the type for every row.
// TODO: instead of relying on stamped out implementations, we could codegen this loop
// to inline the expression evaluation into the loop body.
if (type.types[0].type == TTypeNodeType::STRUCT) {
StructExprValuesToHS2TColumn(
expr_eval, type, batch, start_idx, num_rows, output_row_idx, column);
return;
}
switch (type.types[0].scalar_type.type) {
case TPrimitiveType::NULL_TYPE:
case TPrimitiveType::BOOLEAN:
@@ -398,7 +462,7 @@ void impala::ExprValuesToHS2TColumn(ScalarExprEvaluator* expr_eval,
}
default:
DCHECK(false) << "Unhandled type: "
<< TypeToString(ThriftToType(type.types[0].scalar_type.type));
<< TypeToString(ThriftToType(type.types[0].scalar_type.type));
}
}


@@ -231,6 +231,11 @@ void ImpalaServer::get_results_metadata(ResultsMetadata& results_metadata,
results_metadata.schema.fieldSchemas.resize(result_set_md->columns.size());
for (int i = 0; i < results_metadata.schema.fieldSchemas.size(); ++i) {
const TColumnType& type = result_set_md->columns[i].columnType;
DCHECK_LE(1, type.types.size());
if (type.types[0].type != TTypeNodeType::SCALAR) {
RaiseBeeswaxException("Returning complex types is not supported through the "
"beeswax interface", SQLSTATE_GENERAL_ERROR);
}
DCHECK_EQ(1, type.types.size());
DCHECK_EQ(TTypeNodeType::SCALAR, type.types[0].type);
DCHECK(type.types[0].__isset.scalar_type);


@@ -402,45 +402,51 @@ int64_t HS2ColumnarResultSet::ByteSize(int start_idx, int num_rows) {
void HS2ColumnarResultSet::InitColumns() {
result_set_->__isset.columns = true;
for (const TColumn& col : metadata_.columns) {
DCHECK(col.columnType.types.size() == 1)
<< "Structured columns unsupported in HS2 interface";
ThriftTColumn column;
switch (col.columnType.types[0].scalar_type.type) {
case TPrimitiveType::NULL_TYPE:
case TPrimitiveType::BOOLEAN:
column.__isset.boolVal = true;
break;
case TPrimitiveType::TINYINT:
column.__isset.byteVal = true;
break;
case TPrimitiveType::SMALLINT:
column.__isset.i16Val = true;
break;
case TPrimitiveType::INT:
column.__isset.i32Val = true;
break;
case TPrimitiveType::BIGINT:
column.__isset.i64Val = true;
break;
case TPrimitiveType::FLOAT:
case TPrimitiveType::DOUBLE:
column.__isset.doubleVal = true;
break;
case TPrimitiveType::TIMESTAMP:
case TPrimitiveType::DATE:
case TPrimitiveType::DECIMAL:
case TPrimitiveType::VARCHAR:
case TPrimitiveType::CHAR:
case TPrimitiveType::STRING:
column.__isset.stringVal = true;
break;
default:
DCHECK(false) << "Unhandled column type: "
<< TypeToString(
ThriftToType(col.columnType.types[0].scalar_type.type));
for (const TColumn& col_input : metadata_.columns) {
ThriftTColumn col_output;
DCHECK(col_input.columnType.types.size() > 0);
if (col_input.columnType.types[0].type == TTypeNodeType::STRUCT) {
// Return structs as string.
col_output.__isset.stringVal = true;
} else {
DCHECK(col_input.columnType.types.size() == 1);
DCHECK(col_input.columnType.types[0].__isset.scalar_type);
TPrimitiveType::type input_type = col_input.columnType.types[0].scalar_type.type;
switch (input_type) {
case TPrimitiveType::NULL_TYPE:
case TPrimitiveType::BOOLEAN:
col_output.__isset.boolVal = true;
break;
case TPrimitiveType::TINYINT:
col_output.__isset.byteVal = true;
break;
case TPrimitiveType::SMALLINT:
col_output.__isset.i16Val = true;
break;
case TPrimitiveType::INT:
col_output.__isset.i32Val = true;
break;
case TPrimitiveType::BIGINT:
col_output.__isset.i64Val = true;
break;
case TPrimitiveType::FLOAT:
case TPrimitiveType::DOUBLE:
col_output.__isset.doubleVal = true;
break;
case TPrimitiveType::TIMESTAMP:
case TPrimitiveType::DATE:
case TPrimitiveType::DECIMAL:
case TPrimitiveType::VARCHAR:
case TPrimitiveType::CHAR:
case TPrimitiveType::STRING:
col_output.__isset.stringVal = true;
break;
default:
DCHECK(false) << "Unhandled column type: "
<< TypeToString(ThriftToType(input_type));
}
}
result_set_->columns.push_back(column);
result_set_->columns.push_back(col_output);
}
}

View File
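The switch above decides which HS2 TColumn buffer each output column uses; structs ride in the string buffer because they are serialized to JSON before being sent to the client. A toy sketch of that mapping (hypothetical method and type names, not the real thrift API):

```java
public class Hs2ColumnDemo {
    // Returns the HS2 TColumn buffer a value of the given type would use.
    // "STRUCT" maps to "stringVal" because structs are serialized to JSON
    // strings on the server side before being returned.
    static String hs2Buffer(String type) {
        switch (type) {
            case "BOOLEAN": return "boolVal";
            case "TINYINT": return "byteVal";
            case "SMALLINT": return "i16Val";
            case "INT": return "i32Val";
            case "BIGINT": return "i64Val";
            case "FLOAT":
            case "DOUBLE": return "doubleVal";
            case "STRUCT":  // serialized to a JSON string
            case "STRING": return "stringVal";
            default: throw new IllegalArgumentException(type);
        }
    }

    public static void main(String[] args) {
        System.out.println(hs2Buffer("STRUCT"));  // prints stringVal
    }
}
```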

@@ -315,6 +315,39 @@ struct CollectionVal : public AnyVal {
}
};
/// A struct is represented by a vector of pointers where these pointers point to the
/// children of the struct.
struct StructVal : public AnyVal {
int num_children;
/// Pointer to the start of the vector of child pointers. These child pointers are in
/// fact AnyVal pointers, where a null pointer means that the child is NULL.
/// The buffer is not null-terminated.
/// Memory for 'ptr' is allocated through FunctionContext, so StructVal does not have
/// to take care of deallocation; the FunctionContext releases the memory
/// automatically.
uint8_t** ptr;
StructVal() : AnyVal(true), num_children(0), ptr(nullptr) {}
StructVal(FunctionContext* ctx, int num_children) : AnyVal(),
num_children(num_children), ptr(nullptr) {
ReserveMemory(ctx);
}
static StructVal null() { return StructVal(); }
void addChild(void* child, int idx) {
assert(idx >= 0 && idx < num_children);
ptr[idx] = (uint8_t*)child;
}
private:
/// Uses FunctionContext to reserve memory for 'num_children' number of pointers. Sets
/// 'ptr' to the beginning of this allocated memory.
void ReserveMemory(FunctionContext* ctx);
};
#pragma GCC diagnostic ignored "-Winvalid-offsetof"
static_assert(sizeof(CollectionVal) == sizeof(StringVal), "Wrong size.");
static_assert(

View File

@@ -548,6 +548,19 @@ bool StringVal::Resize(FunctionContext* ctx, int new_len) noexcept {
return false;
}
void StructVal::ReserveMemory(FunctionContext* ctx) {
assert(ctx != nullptr);
assert(num_children >= 0);
assert(is_null == false);
if (num_children == 0) return;
ptr = reinterpret_cast<uint8_t**>(
ctx->impl()->AllocateForResults(sizeof(uint8_t*) * num_children));
if (UNLIKELY(ptr == nullptr)) {
num_children = 0;
is_null = true;
}
}
// TODO: why doesn't libudasample.so build if this in udf-ir.cc?
const FunctionContext::TypeDesc* FunctionContext::GetArgType(int arg_idx) const {
if (arg_idx < 0 || arg_idx >= impl_->arg_types_.size()) return NULL;

View File

@@ -90,7 +90,8 @@ class FunctionContext {
TYPE_DECIMAL,
TYPE_VARCHAR,
// A fixed-size buffer, passed as a StringVal.
TYPE_FIXED_UDA_INTERMEDIATE
TYPE_FIXED_UDA_INTERMEDIATE,
TYPE_STRUCT
};
struct TypeDesc {

View File

@@ -181,7 +181,7 @@ string PrintTuple(const Tuple* t, const TupleDescriptor& d) {
if (t->IsNull(slot_d->null_indicator_offset())) {
out << "null";
} else if (slot_d->type().IsCollectionType()) {
const TupleDescriptor* item_d = slot_d->collection_item_descriptor();
const TupleDescriptor* item_d = slot_d->children_tuple_descriptor();
const CollectionValue* coll_value =
reinterpret_cast<const CollectionValue*>(t->GetSlot(slot_d->tuple_offset()));
uint8_t* coll_buf = coll_value->ptr;

View File

@@ -1303,7 +1303,7 @@ public class Analyzer {
registerColumnPrivReq(result);
return result;
}
// SlotRefs with a scalar type are registered against the slot's
// SlotRefs with a scalar or struct type are registered against the slot's
// fully-qualified lowercase path.
String key = slotPath.toString();
Preconditions.checkState(key.equals(key.toLowerCase()),
@@ -2849,6 +2849,12 @@ public class Analyzer {
// Type compatible with the i-th exprs of all expr lists.
// Initialize with type of i-th expr in first list.
Type compatibleType = firstList.get(i).getType();
if (firstList.get(i) instanceof SlotRef &&
compatibleType.isStructType()) {
throw new AnalysisException(String.format(
"Set operations don't support STRUCT type. %s in %s", compatibleType.toSql(),
firstList.get(i).toSql()));
}
widestExprs.add(firstList.get(i));
for (int j = 1; j < exprLists.size(); ++j) {
Preconditions.checkState(exprLists.get(j).size() == firstList.size());

View File

@@ -140,7 +140,20 @@ public class DescriptorTable {
SlotDescriptor slotDesc = getSlotDesc(id);
if (slotDesc.isMaterialized()) continue;
slotDesc.setIsMaterialized(true);
affectedTuples.add(slotDesc.getParent());
// Don't add TupleDescriptors that hold the children of a struct.
if (slotDesc.getParent().getParentSlotDesc() == null) {
affectedTuples.add(slotDesc.getParent());
}
if (slotDesc.getType().isStructType()) {
TupleDescriptor childrenTuple = slotDesc.getItemTupleDesc();
Preconditions.checkNotNull(childrenTuple);
Preconditions.checkState(childrenTuple.getSlots().size() > 0);
List<SlotId> childrenIds = Lists.newArrayList();
for (SlotDescriptor childSlot : childrenTuple.getSlots()) {
childrenIds.add(childSlot.getId());
}
markSlotsMaterialized(childrenIds);
}
}
return affectedTuples;
}

View File
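The `markSlotsMaterialized()` change above makes materializing a struct slot also materialize every slot in its children tuple, transitively. The recursion can be sketched standalone (hypothetical `Slot` stand-in, not Impala's real descriptor classes):

```java
import java.util.ArrayList;
import java.util.List;

public class MaterializeDemo {
    // Minimal stand-in for a slot descriptor: a struct slot owns child slots.
    static class Slot {
        boolean materialized = false;
        List<Slot> children = new ArrayList<>();  // non-empty only for structs
    }

    // Mirrors the idea in markSlotsMaterialized(): materialize the slot,
    // then recurse into struct children so they are materialized too.
    // Returns how many slots were newly materialized.
    static int materialize(Slot slot) {
        if (slot.materialized) return 0;
        slot.materialized = true;
        int count = 1;
        for (Slot child : slot.children) count += materialize(child);
        return count;
    }

    public static void main(String[] args) {
        Slot inner = new Slot();
        Slot outer = new Slot();
        outer.children.add(inner);
        System.out.println(materialize(outer));  // prints 2: struct + its child
    }
}
```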

@@ -393,14 +393,20 @@ public class SelectStmt extends QueryStmt {
}
for (Expr expr: resultExprs_) {
// Complex types are currently not supported in the select list because
// Collection types are currently not supported in the select list because
// we'd need to serialize them in a meaningful way.
if (expr.getType().isComplexType()) {
if (expr.getType().isCollectionType()) {
throw new AnalysisException(String.format(
"Expr '%s' in select list returns a complex type '%s'.\n" +
"Only scalar types are allowed in the select list.",
"Expr '%s' in select list returns a collection type '%s'.\n" +
"Collection types are not allowed in the select list.",
expr.toSql(), expr.getType().toSql()));
}
if (expr.getType().isStructType()) {
if (!analyzer_.getQueryCtx().client_request.query_options.disable_codegen) {
throw new AnalysisException("Struct type in select list is not allowed " +
"when Codegen is ON. You might want to set DISABLE_CODEGEN=true");
}
}
if (!expr.getType().isSupported()) {
throw new AnalysisException("Unsupported type '"
+ expr.getType().toSql() + "' in '" + expr.toSql() + "'.");

View File

@@ -46,7 +46,7 @@ public class SlotDescriptor {
private Path path_;
private Type type_;
// Tuple descriptor for collection items. Only set if type_ is an array or map.
// Tuple descriptor for nested items. Set if type_ is an array, map or struct.
private TupleDescriptor itemTupleDesc_;
// for SlotRef.toSql() in the absence of a path
@@ -89,6 +89,7 @@ public class SlotDescriptor {
parent_ = parent;
type_ = src.type_;
itemTupleDesc_ = src.itemTupleDesc_;
if (itemTupleDesc_ != null) itemTupleDesc_.setParentSlotDesc(this);
path_ = src.path_;
label_ = src.label_;
sourceExprs_ = src.sourceExprs_;
@@ -120,6 +121,10 @@ public class SlotDescriptor {
itemTupleDesc_ == null, "Item tuple descriptor already set.");
itemTupleDesc_ = t;
}
public void clearItemTupleDesc() {
Preconditions.checkState(itemTupleDesc_ != null);
itemTupleDesc_ = null;
}
public boolean isMaterialized() { return isMaterialized_; }
public void setIsMaterialized(boolean value) {
if (isMaterialized_ == value) return;
@@ -145,7 +150,8 @@ public class SlotDescriptor {
public void setPath(Path path) {
Preconditions.checkNotNull(path);
Preconditions.checkState(path.isRootedAtTuple());
Preconditions.checkState(path.getRootDesc() == parent_);
Preconditions.checkState(path.getRootDesc() == parent_ ||
parent_.getType().isStructType());
path_ = path;
type_ = path_.destType();
label_ = Joiner.on(".").join(path.getRawPath());
@@ -239,8 +245,8 @@ public class SlotDescriptor {
Preconditions.checkState(path_.isResolved());
List<Integer> materializedPath = Lists.newArrayList(path_.getAbsolutePath());
// For scalar types, the materialized path is the same as path_
if (type_.isScalarType()) return materializedPath;
// For scalar types and structs the materialized path is the same as path_
if (type_.isScalarType() || type_.isStructType()) return materializedPath;
Preconditions.checkState(type_.isCollectionType());
Preconditions.checkState(path_.getFirstCollectionIndex() != -1);
// Truncate materializedPath after first collection element
@@ -324,6 +330,7 @@ public class SlotDescriptor {
.add("nullIndicatorBit", nullIndicatorBit_)
.add("slotIdx", slotIdx_)
.add("stats", stats_)
.add("itemTupleDesc", itemTupleDesc_)
.toString();
}

View File

@@ -21,7 +21,11 @@ import java.util.List;
import java.util.Set;
import org.apache.impala.analysis.Path.PathType;
import org.apache.impala.catalog.FeFsTable;
import org.apache.impala.catalog.FeTable;
import org.apache.impala.catalog.HdfsFileFormat;
import org.apache.impala.catalog.StructField;
import org.apache.impala.catalog.StructType;
import org.apache.impala.catalog.TableLoadingException;
import org.apache.impala.catalog.Type;
import org.apache.impala.common.AnalysisException;
@@ -42,6 +46,9 @@ public class SlotRef extends Expr {
// Results of analysis.
private SlotDescriptor desc_;
// The resolved path after resolving 'rawPath_'.
private Path resolvedPath_ = null;
public SlotRef(List<String> rawPath) {
super();
rawPath_ = rawPath;
@@ -64,7 +71,8 @@ public class SlotRef extends Expr {
public SlotRef(SlotDescriptor desc) {
super();
if (desc.isScanSlot()) {
rawPath_ = desc.getPath().getRawPath();
resolvedPath_ = desc.getPath();
rawPath_ = resolvedPath_.getRawPath();
} else {
rawPath_ = null;
}
@@ -82,6 +90,7 @@ public class SlotRef extends Expr {
*/
private SlotRef(SlotRef other) {
super(other);
resolvedPath_ = other.resolvedPath_;
rawPath_ = other.rawPath_;
label_ = other.label_;
desc_ = other.desc_;
@@ -108,20 +117,30 @@ public class SlotRef extends Expr {
return numDistinctValues;
}
/**
* Resetting a struct SlotRef removes its children, as a subsequent analyzeImpl() on
* this particular SlotRef will create the children again.
*/
@Override
public SlotRef reset() {
if (type_.isStructType()) clearChildren();
super.reset();
return this;
}
@Override
protected void analyzeImpl(Analyzer analyzer) throws AnalysisException {
// TODO: derived slot refs (e.g., star-expanded) will not have rawPath set.
// Change construction to properly handle such cases.
Preconditions.checkState(rawPath_ != null);
Path resolvedPath = null;
try {
resolvedPath = analyzer.resolvePathWithMasking(rawPath_, PathType.SLOT_REF);
resolvedPath_ = analyzer.resolvePathWithMasking(rawPath_, PathType.SLOT_REF);
} catch (TableLoadingException e) {
// Should never happen because we only check registered table aliases.
Preconditions.checkState(false);
}
Preconditions.checkNotNull(resolvedPath);
desc_ = analyzer.registerSlotRef(resolvedPath);
Preconditions.checkNotNull(resolvedPath_);
desc_ = analyzer.registerSlotRef(resolvedPath_);
type_ = desc_.getType();
if (!type_.isSupported()) {
throw new UnsupportedFeatureException("Unsupported type '"
@@ -134,17 +153,118 @@ public class SlotRef extends Expr {
throw new UnsupportedFeatureException("Unsupported type in '" + toSql() + "'.");
}
// Register scalar columns of a catalog table.
if (!resolvedPath.getMatchedTypes().isEmpty()
&& !resolvedPath.getMatchedTypes().get(0).isComplexType()) {
if (!resolvedPath_.getMatchedTypes().isEmpty()
&& !resolvedPath_.getMatchedTypes().get(0).isComplexType()) {
analyzer.registerScalarColumnForMasking(desc_);
}
numDistinctValues_ = adjustNumDistinctValues();
FeTable rootTable = resolvedPath.getRootTable();
FeTable rootTable = resolvedPath_.getRootTable();
if (rootTable != null && rootTable.getNumRows() > 0) {
// The NDV cannot exceed the #rows in the table.
numDistinctValues_ = Math.min(numDistinctValues_, rootTable.getNumRows());
}
if (type_.isStructType() && rootTable != null) {
if (!(rootTable instanceof FeFsTable)) {
throw new AnalysisException(String.format(
"%s is not supported when querying STRUCT type %s",
rootTable.getClass().toString(), type_.toSql()));
}
FeFsTable feTable = (FeFsTable)rootTable;
for (HdfsFileFormat format : feTable.getFileFormats()) {
if (format != HdfsFileFormat.ORC) {
throw new AnalysisException("Querying STRUCT is only supported for ORC file " +
"format.");
}
}
}
if (type_.isStructType()) expandSlotRefForStruct(analyzer);
}
// Expects this SlotRef to be a struct and creates SlotRefs to represent the children
// of the struct. Also creates slot and tuple descriptors for the children of the
// struct.
private void expandSlotRefForStruct(Analyzer analyzer) throws AnalysisException {
Preconditions.checkState(type_ != null && type_.isStructType());
// If the same struct is present multiple times in the select list we create only a
// single TupleDescriptor instead of one for each occurrence.
if (desc_.getItemTupleDesc() == null) {
checkForUnsupportedFieldsForStruct();
createStructTuplesAndSlots(analyzer, resolvedPath_);
}
addStructChildrenAsSlotRefs(analyzer, desc_.getItemTupleDesc());
}
// Expects the type of this SlotRef to be a StructType. Throws an AnalysisException
// if any of the struct's fields is a collection or an unsupported type.
private void checkForUnsupportedFieldsForStruct() throws AnalysisException {
Preconditions.checkState(type_ instanceof StructType);
for (StructField structField : ((StructType)type_).getFields()) {
if (!structField.getType().isSupported()) {
throw new AnalysisException("Unsupported type '"
+ structField.getType().toSql() + "' in '" + toSql() + "'.");
}
if (structField.getType().isCollectionType()) {
throw new AnalysisException("Struct containing a collection type is not " +
"allowed in the select list.");
}
}
}
/**
* Creates a TupleDescriptor to hold the children of a struct slot and then creates and
* adds SlotDescriptors as struct children to this TupleDescriptor. Sets the created
* parent TupleDescriptor to 'desc_.itemTupleDesc_'.
*/
public void createStructTuplesAndSlots(Analyzer analyzer, Path resolvedPath) {
TupleDescriptor structTuple =
analyzer.getDescTbl().createTupleDescriptor("struct_tuple");
if (resolvedPath != null) structTuple.setPath(resolvedPath);
structTuple.setType((StructType)type_);
structTuple.setParentSlotDesc(desc_);
for (StructField structField : ((StructType)type_).getFields()) {
SlotDescriptor slotDesc = analyzer.getDescTbl().addSlotDescriptor(structTuple);
// 'resolvedPath' could be null, e.g. when the query has an order by clause and
// this is the sorting tuple.
if (resolvedPath != null) {
Path relPath = Path.createRelPath(resolvedPath, structField.getName());
relPath.resolve();
slotDesc.setPath(relPath);
}
slotDesc.setType(structField.getType());
slotDesc.setIsMaterialized(true);
}
desc_.setItemTupleDesc(structTuple);
}
/**
* Assuming that 'structTuple' is the tuple for struct children this function iterates
* its slots, creates a SlotRef for each slot and adds them to 'children_' of this
* SlotRef.
*/
public void addStructChildrenAsSlotRefs(Analyzer analyzer,
TupleDescriptor structTuple) throws AnalysisException {
Preconditions.checkState(structTuple != null);
Preconditions.checkState(structTuple.getParentSlotDesc() != null);
Preconditions.checkState(structTuple.getParentSlotDesc().getType().isStructType());
for (SlotDescriptor childSlot : structTuple.getSlots()) {
SlotRef childSlotRef = new SlotRef(childSlot);
children_.add(childSlotRef);
if (childSlot.getType().isStructType()) {
childSlotRef.expandSlotRefForStruct(analyzer);
}
}
}
/**
* The TreeNode.collect() function shouldn't iterate the children of this SlotRef if
* this is a struct SlotRef. The desired functionality is to collect the struct
* SlotRefs but not their children.
*/
@Override
protected boolean shouldCollectRecursively() {
if (desc_ != null && desc_.getType().isStructType()) return false;
return true;
}
@Override
@@ -265,7 +385,9 @@ public class SlotRef extends Expr {
}
@Override
public Expr clone() { return new SlotRef(this); }
public Expr clone() {
return new SlotRef(this);
}
@Override
public String toString() {

View File
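`expandSlotRefForStruct()` above creates one child SlotRef per struct field and recurses into nested structs. The shape of that expansion can be sketched with a toy type model (hypothetical `Type` class and field list, not the real frontend catalog types):

```java
import java.util.ArrayList;
import java.util.List;

public class ExpandDemo {
    // Toy type model: a struct type is a list of field types; scalars have none.
    static class Type {
        final String name;
        final List<Type> fields = new ArrayList<>();  // empty => scalar
        Type(String name) { this.name = name; }
        boolean isStruct() { return !fields.isEmpty(); }
    }

    // Counts the SlotRefs a struct expansion would create: one per field,
    // recursing into nested structs, mirroring expandSlotRefForStruct().
    static int countExpandedSlotRefs(Type t) {
        int n = 0;
        for (Type f : t.fields) {
            n += 1;
            if (f.isStruct()) n += countExpandedSlotRefs(f);
        }
        return n;
    }

    public static void main(String[] args) {
        // outer_struct<inner_struct<int>, string>
        Type inner = new Type("inner");
        inner.fields.add(new Type("int"));
        Type outer = new Type("outer");
        outer.fields.add(inner);
        outer.fields.add(new Type("string"));
        System.out.println(countExpandedSlotRefs(outer));  // prints 3
    }
}
```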

@@ -23,6 +23,7 @@ import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import org.apache.impala.common.AnalysisException;
import org.apache.impala.common.TreeNode;
import org.apache.impala.planner.PlanNode;
import org.apache.impala.thrift.TSortingOrder;
@@ -254,7 +255,20 @@ public class SortInfo {
dstSlotDesc.initFromExpr(srcExpr);
}
dstSlotDesc.setSourceExpr(srcExpr);
outputSmap_.put(srcExpr.clone(), new SlotRef(dstSlotDesc));
SlotRef dstExpr = new SlotRef(dstSlotDesc);
if (dstSlotDesc.getType().isStructType() &&
dstSlotDesc.getItemTupleDesc() != null) {
dstSlotDesc.clearItemTupleDesc();
dstExpr.createStructTuplesAndSlots(analyzer, null);
try {
dstExpr.addStructChildrenAsSlotRefs(analyzer, dstSlotDesc.getItemTupleDesc());
} catch (AnalysisException ex) {
// Adding SlotRefs shouldn't throw here as the source SlotRef had already been
// analysed.
Preconditions.checkNotNull(null);
}
}
outputSmap_.put(srcExpr.clone(), dstExpr);
materializedExprs_.add(srcExpr);
}
}

View File

@@ -86,7 +86,9 @@ public class Subquery extends Expr {
List<Expr> stmtResultExprs = stmt_.getResultExprs();
if (stmtResultExprs.size() == 1) {
type_ = stmtResultExprs.get(0).getType();
Preconditions.checkState(!type_.isComplexType());
if (type_.isComplexType()) {
throw new AnalysisException("A subquery can't return complex types. " + toSql());
}
} else {
type_ = createStructTypeFromExprList();
}

View File

@@ -30,6 +30,7 @@ import org.apache.impala.catalog.FeFsTable;
import org.apache.impala.catalog.FeKuduTable;
import org.apache.impala.catalog.FeTable;
import org.apache.impala.catalog.StructType;
import org.apache.impala.common.Pair;
import org.apache.impala.thrift.TTupleDescriptor;
import com.google.common.base.Joiner;
@@ -109,6 +110,10 @@ public class TupleDescriptor {
// Tuple of the table masking view that masks this tuple's table.
private TupleDescriptor maskedByTuple_ = null;
// If this is a tuple representing the children of a struct slot then
// 'parentStructSlot_' is the struct slot this tuple belongs to. Otherwise it's null.
private SlotDescriptor parentStructSlot_ = null;
public TupleDescriptor(TupleId id, String debugName) {
id_ = id;
path_ = null;
@@ -166,7 +171,7 @@ public class TupleDescriptor {
public void setPath(Path p) {
Preconditions.checkNotNull(p);
Preconditions.checkState(p.isResolved());
Preconditions.checkState(p.destType().isCollectionType());
Preconditions.checkState(p.destType().isComplexType());
path_ = p;
if (p.destTable() != null) {
// Do not use Path.getTypeAsStruct() to only allow implicit path resolutions,
@@ -209,6 +214,14 @@ public class TupleDescriptor {
table.getDesc().maskedByTuple_ = this;
}
public void setParentSlotDesc(SlotDescriptor parent) {
Preconditions.checkState(parent.getType().isStructType(),
"Parent for a TupleDescriptor should be a STRUCT. Actual type is " +
parent.getType() + " Tuple ID: " + getId());
parentStructSlot_ = parent;
}
public SlotDescriptor getParentSlotDesc() { return parentStructSlot_; }
public String debugString() {
String tblStr = (getTable() == null ? "null" : getTable().getFullName());
List<String> slotStrings = new ArrayList<>();
@@ -223,6 +236,9 @@ public class TupleDescriptor {
.add("is_materialized", isMaterialized_)
.add("slots", "[" + Joiner.on(", ").join(slotStrings) + "]");
if (maskedTable_ != null) toStrHelper.add("masks", maskedTable_.getId());
if (parentStructSlot_ != null) {
toStrHelper.add("parentSlot", parentStructSlot_.getId());
}
return toStrHelper.toString();
}
@@ -274,8 +290,25 @@ public class TupleDescriptor {
computeMemLayout();
}
public void computeMemLayout() {
if (hasMemLayout_) return;
/**
* Computes the memory layout within this tuple, including the total size of the
* tuple, the size of each underlying slot, the slot offsets, the offset of the
* tuple-level null indicator bytes and the null indicator bits for the slots.
* For struct tuples the offsets are calculated from the topmost parent's beginning
* and do not necessarily start from zero within this tuple.
* Returns the nullIndicatorByte and nullIndicatorBit in a Pair<>. This is needed to
* handle the case when the tuple contains a struct or nested structs, and the
* top-level nullIndicatorByte and nullIndicatorBit have to be adjusted based on the
* actual structure of the structs.
*/
public Pair<Integer, Integer> computeMemLayout() {
if (parentStructSlot_ != null) {
// If this TupleDescriptor represents the children of a STRUCT then the slot
// offsets are adjusted with the parent struct's offset.
Preconditions.checkState(parentStructSlot_.getType().isStructType());
Preconditions.checkState(parentStructSlot_.getByteOffset() != -1);
}
if (hasMemLayout_) return null;
hasMemLayout_ = true;
boolean alwaysAddNullBit = hasNullableKuduScanSlots();
@@ -289,50 +322,59 @@ public class TupleDescriptor {
int totalSlotSize = 0;
for (SlotDescriptor d: slots_) {
if (!d.isMaterialized()) continue;
ColumnStats stats = d.getStats();
int slotSize = d.getType().getSlotSize();
int slotSize = getSlotSize(d);
addToAvgSerializedSize(d);
if (stats.hasAvgSize()) {
avgSerializedSize_ += d.getStats().getAvgSerializedSize();
} else {
// TODO: for computed slots, try to come up with stats estimates
avgSerializedSize_ += slotSize;
}
// Add padding for a KUDU string slot.
if (d.isKuduStringSlot()) {
slotSize += KUDU_STRING_PADDING;
avgSerializedSize_ += KUDU_STRING_PADDING;
}
if (!slotsBySize.containsKey(slotSize)) {
slotsBySize.put(slotSize, new ArrayList<>());
}
totalSlotSize += slotSize;
slotsBySize.get(slotSize).add(d);
if (d.getIsNullable() || alwaysAddNullBit) ++numNullBits;
totalSlotSize += slotSize;
numNullBits += getNumNullBits(d, alwaysAddNullBit);
}
// we shouldn't have anything of size <= 0
Preconditions.checkState(!slotsBySize.containsKey(0));
Preconditions.checkState(!slotsBySize.containsKey(-1));
// assign offsets to slots in order of descending size
numNullBytes_ = (numNullBits + 7) / 8;
// The total number of bytes for nullable scalar or nested struct fields will be
// computed for the struct at the top level (i.e., parentStructSlot_ == null).
// If this descriptor is inside a struct then we don't need to account for an
// additional null byte here, as the null indicator will be on the top-level tuple.
// In other words, the total number of bytes for nullable scalar or nested struct
// fields is computed for the struct at the top level (i.e., parentStructSlot_ ==
// null).
numNullBytes_ = (parentStructSlot_ == null) ? (numNullBits + 7) / 8 : 0;
int slotOffset = 0;
int nullIndicatorByte = totalSlotSize;
if (parentStructSlot_ != null) {
nullIndicatorByte = parentStructSlot_.getNullIndicatorByte();
}
int nullIndicatorBit = 0;
if (parentStructSlot_ != null) {
// If this is a child tuple from a struct then get the next available bit from the
// parent struct.
nullIndicatorBit = (parentStructSlot_.getNullIndicatorBit() + 1) % 8;
// If the parent struct ran out of null bits in the current null byte just before
// this tuple then start using a new byte.
if (nullIndicatorBit == 0) ++nullIndicatorByte;
}
// slotIdx is the index into the resulting tuple struct. The first (largest) field
// is 0, next is 1, etc.
int slotIdx = 0;
// sort slots in descending order of size
List<Integer> sortedSizes = new ArrayList<>(slotsBySize.keySet());
Collections.sort(sortedSizes, Collections.reverseOrder());
// assign offsets to slots in order of descending size
for (int slotSize: sortedSizes) {
if (slotsBySize.get(slotSize).isEmpty()) continue;
for (SlotDescriptor d: slotsBySize.get(slotSize)) {
Preconditions.checkState(d.isMaterialized());
d.setByteSize(slotSize);
d.setByteOffset(slotOffset);
d.setSlotIdx(slotIdx++);
d.setByteOffset((parentStructSlot_ == null) ? slotOffset :
parentStructSlot_.getByteOffset() + slotOffset);
slotOffset += slotSize;
d.setSlotIdx(slotIdx++);
// assign null indicator
if (d.getIsNullable() || alwaysAddNullBit) {
@@ -348,11 +390,22 @@ public class TupleDescriptor {
d.setNullIndicatorBit(-1);
d.setNullIndicatorByte(0);
}
// For struct slots, calculate the mem layout for the tuple representing its
// children.
if (d.getType().isStructType()) {
Pair<Integer, Integer> nullIndicators =
d.getItemTupleDesc().computeMemLayout();
// Adjust the null indicator byte and bit according to what is set in the
// struct's children.
nullIndicatorByte = nullIndicators.first;
nullIndicatorBit = nullIndicators.second;
}
}
}
Preconditions.checkState(slotOffset == totalSlotSize);
byteSize_ = totalSlotSize + numNullBytes_;
return new Pair<Integer, Integer>(nullIndicatorByte, nullIndicatorBit);
}
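The null-indicator handoff above is simple modular arithmetic: a struct's children tuple starts at the bit right after the parent struct's own null bit, rolling over into a fresh byte when bit 7 is exhausted. A standalone sketch (method and names are illustrative, not the real API):

```java
public class NullBitDemo {
    // Given the parent struct's null indicator position, returns the
    // {byte, bit} pair where the struct's first child gets its null bit:
    // the next bit in the same byte, or bit 0 of the following byte.
    static int[] nextNullIndicator(int parentByte, int parentBit) {
        int bit = (parentBit + 1) % 8;
        int b = parentByte;
        if (bit == 0) ++b;  // ran out of bits in the current null byte
        return new int[] {b, bit};
    }

    public static void main(String[] args) {
        int[] pos = nextNullIndicator(10, 7);
        System.out.println(pos[0] + "," + pos[1]);  // prints 11,0 (rolled over)
    }
}
```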
/**
@@ -363,6 +416,55 @@ public class TupleDescriptor {
hasMemLayout_ = false;
}
/**
* Returns the size of the given slot, including padding for Kudu string slots.
*/
private int getSlotSize(SlotDescriptor slotDesc) {
int slotSize = slotDesc.getType().getSlotSize();
// Add padding for a KUDU string slot.
if (slotDesc.isKuduStringSlot()) {
slotSize += KUDU_STRING_PADDING;
}
return slotSize;
}
/**
* Calculates the average serialized size of the given slot and adds the result to
* 'avgSerializedSize_'.
*/
private void addToAvgSerializedSize(SlotDescriptor slotDesc) {
ColumnStats stats = slotDesc.getStats();
if (stats.hasAvgSize()) {
avgSerializedSize_ += stats.getAvgSerializedSize();
} else {
// Note, there are no stats for complex types slots so can't use average serialized
// size from stats for them.
// TODO: for computed slots, try to come up with stats estimates
avgSerializedSize_ += slotDesc.getType().getSlotSize();
}
// Add padding for a KUDU string slot.
if (slotDesc.isKuduStringSlot()) {
avgSerializedSize_ += KUDU_STRING_PADDING;
}
}
// Calculates the number of null bits required for a slot descriptor. For a struct
// slot it calls itself recursively to get the null bits required for the struct's
// children.
private int getNumNullBits(SlotDescriptor slotDesc, boolean alwaysAddNullBit) {
Preconditions.checkState(!slotDesc.getType().isStructType() ||
slotDesc.getIsNullable());
if (!slotDesc.getIsNullable() && !alwaysAddNullBit) return 0;
if (!slotDesc.getType().isStructType()) return 1;
TupleDescriptor childrenTuple = slotDesc.getItemTupleDesc();
Preconditions.checkState(childrenTuple != null);
int numNullBits = 1;
for (SlotDescriptor child : childrenTuple.getSlots()) {
numNullBits += getNumNullBits(child, alwaysAddNullBit);
}
return numNullBits;
}
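The recursion in `getNumNullBits()` boils down to: one bit for the slot itself, plus the bits of every struct child. A standalone sketch with a toy slot model (hypothetical names, not the real descriptors):

```java
import java.util.ArrayList;
import java.util.List;

public class NullBitsCountDemo {
    // Toy slot: a nullable flag plus children (non-empty only for structs).
    static class Slot {
        final boolean nullable;
        final List<Slot> children = new ArrayList<>();
        Slot(boolean nullable) { this.nullable = nullable; }
    }

    // Mirrors getNumNullBits(): zero bits for a non-nullable slot (unless a
    // bit is forced), otherwise one bit for the slot itself plus the bits of
    // every struct child, computed recursively.
    static int numNullBits(Slot s, boolean alwaysAddNullBit) {
        if (!s.nullable && !alwaysAddNullBit) return 0;
        int bits = 1;
        for (Slot child : s.children) bits += numNullBits(child, alwaysAddNullBit);
        return bits;
    }

    public static void main(String[] args) {
        // struct<a:int,b:string> where the struct and both fields are nullable:
        Slot s = new Slot(true);
        s.children.add(new Slot(true));
        s.children.add(new Slot(true));
        System.out.println(numNullBits(s, false));  // prints 3
    }
}
```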
/**
* Returns true if this tuple has at least one materialized nullable Kudu scan slot.
*/

View File

@@ -95,6 +95,20 @@ public class StructType extends Type {
}
}
/**
* The size of a struct slot is the sum of the sizes of its children. We don't have
* to account for null indicators here, as they are not stored at the level of the
* struct slot but at the topmost tuple's level.
*/
@Override
public int getSlotSize() {
int size = 0;
for (StructField structField : fields_) {
size += structField.getType().getSlotSize();
}
return size;
}
@Override
public boolean equals(Object other) {
if (!(other instanceof StructType)) return false;

View File
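The `getSlotSize()` override above is a plain sum over the fields; e.g. for `struct<a:int,b:string>` it is 4 + 12 = 16 bytes, assuming Impala's 12-byte string slot (8-byte pointer + 4-byte length) as described in the commit message. A sketch of that arithmetic (toy method, not the real Type hierarchy):

```java
import java.util.Arrays;
import java.util.List;

public class StructSlotSizeDemo {
    // Sums the children's slot sizes; null indicators are not counted here
    // because they live in the topmost tuple's null bytes.
    static int structSlotSize(List<Integer> fieldSlotSizes) {
        int size = 0;
        for (int s : fieldSlotSizes) size += s;
        return size;
    }

    public static void main(String[] args) {
        // struct<a:int,b:string>: 4-byte int + 12-byte string slot (ptr + len).
        System.out.println(structSlotSize(Arrays.asList(4, 12)));  // prints 16
    }
}
```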

@@ -103,6 +103,12 @@ public abstract class TreeNode<NodeType extends TreeNode<NodeType>> {
return numNodes;
}
/**
* Decides whether collect() should call itself recursively on the children of a
* given TreeNode.
*/
protected boolean shouldCollectRecursively() { return true; }
/**
* Add all nodes in the tree that satisfy 'predicate' to the list 'matches'
* This node is checked first, followed by its children in order. If the node
@@ -120,7 +126,9 @@ public abstract class TreeNode<NodeType extends TreeNode<NodeType>> {
matches.add((D) this);
return;
}
for (NodeType child: children_) child.collect(predicate, matches);
if (shouldCollectRecursively()) {
for (NodeType child: children_) child.collect(predicate, matches);
}
}
/**

View File
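The `shouldCollectRecursively()` guard above can be illustrated with a toy tree (hypothetical `Node` class): a node that forbids recursion keeps its whole subtree out of the collected matches, which is exactly what a struct SlotRef wants for its expanded children.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectDemo {
    static class Node {
        final boolean match;    // does the predicate accept this node?
        final boolean recurse;  // stand-in for shouldCollectRecursively()
        final List<Node> children = new ArrayList<>();
        Node(boolean match, boolean recurse) {
            this.match = match;
            this.recurse = recurse;
        }
        // Mirrors TreeNode.collect(): a matching node is added and its
        // subtree skipped; otherwise recurse only if the guard allows it.
        void collect(List<Node> matches) {
            if (match) {
                matches.add(this);
                return;
            }
            if (recurse) {
                for (Node c : children) c.collect(matches);
            }
        }
    }

    public static void main(String[] args) {
        Node child = new Node(true, true);
        Node structRef = new Node(false, false);  // struct SlotRef: no recursion
        structRef.children.add(child);
        List<Node> matches = new ArrayList<>();
        structRef.collect(matches);
        System.out.println(matches.size());  // prints 0: child never visited
    }
}
```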

@@ -2066,7 +2066,7 @@ public class HdfsScanNode extends ScanNode {
columnByteSizes.add(computeMinScalarColumnMemReservation(column));
}
} else {
appendMinColumnMemReservationsForCollection(slot, columnByteSizes);
appendMinColumnMemReservationsForComplexType(slot, columnByteSizes);
}
}
}
@@ -2080,14 +2080,14 @@ public class HdfsScanNode extends ScanNode {
/**
* Helper for computeMinColumnMemReservations() - compute minimum memory reservations
* for all of the scalar columns read from disk when materializing collectionSlot.
* for all of the scalar columns read from disk when materializing complexSlot.
* Appends one number per scalar column to columnMemReservations.
*/
private void appendMinColumnMemReservationsForCollection(SlotDescriptor collectionSlot,
private void appendMinColumnMemReservationsForComplexType(SlotDescriptor complexSlot,
List<Long> columnMemReservations) {
Preconditions.checkState(collectionSlot.getType().isCollectionType());
Preconditions.checkState(complexSlot.getType().isComplexType());
boolean addedColumn = false;
for (SlotDescriptor nestedSlot: collectionSlot.getItemTupleDesc().getSlots()) {
for (SlotDescriptor nestedSlot: complexSlot.getItemTupleDesc().getSlots()) {
// Position virtual slots can be materialized by piggybacking on another slot.
if (!nestedSlot.isMaterialized() || nestedSlot.isArrayPosRef()) continue;
if (nestedSlot.getType().isScalarType()) {
@@ -2095,8 +2095,8 @@ public class HdfsScanNode extends ScanNode {
// reservation.
columnMemReservations.add(DEFAULT_COLUMN_SCAN_RANGE_RESERVATION);
addedColumn = true;
} else {
appendMinColumnMemReservationsForCollection(nestedSlot, columnMemReservations);
} else if (nestedSlot.getType().isComplexType()) {
appendMinColumnMemReservationsForComplexType(nestedSlot, columnMemReservations);
}
}
// Need to scan at least one column to materialize the pos virtual slot and/or

View File

@@ -2290,18 +2290,14 @@ public class AnalyzeDDLTest extends FrontendTestBase {
AnalysisError("create table t primary key (cs) partition by hash partitions 3" +
" stored as kudu as select cs from functional.chars_tiny",
"Cannot create table 't': Type CHAR(5) is not supported in Kudu");
AnalysisError("create table t primary key (id) partition by hash partitions 3" +
" stored as kudu as select id, s from functional.complextypes_fileformat",
"Expr 's' in select list returns a complex type 'STRUCT<f1:STRING,f2:INT>'.\n" +
"Only scalar types are allowed in the select list.");
AnalysisError("create table t primary key (id) partition by hash partitions 3" +
" stored as kudu as select id, m from functional.complextypes_fileformat",
"Expr 'm' in select list returns a complex type 'MAP<STRING,BIGINT>'.\n" +
"Only scalar types are allowed in the select list.");
"Expr 'm' in select list returns a collection type 'MAP<STRING,BIGINT>'.\n" +
"Collection types are not allowed in the select list.");
AnalysisError("create table t primary key (id) partition by hash partitions 3" +
" stored as kudu as select id, a from functional.complextypes_fileformat",
"Expr 'a' in select list returns a complex type 'ARRAY<INT>'.\n" +
"Only scalar types are allowed in the select list.");
"Expr 'a' in select list returns a collection type 'ARRAY<INT>'.\n" +
"Collection types are not allowed in the select list.");
// IMPALA-6454: CTAS into Kudu tables with primary key specified in upper case.
AnalyzesOk("create table part_kudu_tbl primary key(INT_COL, SMALLINT_COL, ID)" +
@@ -3083,13 +3079,20 @@ public class AnalyzeDDLTest extends FrontendTestBase {
"Incompatible return types 'INT' and 'STRING' of exprs " +
"'int_col' and 'string_col'.");
// View cannot have complex-typed columns because complex-typed exprs are
// View cannot have collection-typed columns because collection-typed exprs are
// not supported in the select list.
AnalysisError("create view functional.foo (a, b, c) as " +
"select int_array_col, int_map_col, int_struct_col " +
AnalysisError("create view functional.foo (a, b) as " +
"select int_array_col, int_map_col " +
"from functional.allcomplextypes",
"Expr 'int_array_col' in select list returns a complex type 'ARRAY<INT>'.\n" +
"Only scalar types are allowed in the select list.");
"Expr 'int_array_col' in select list returns a collection type 'ARRAY<INT>'.\n" +
"Collection types are not allowed in the select list.");
// It's allowed to do the same with struct as it is supported in the select list.
AnalysisContext ctx = createAnalysisCtx();
// TODO: Turning Codegen OFF could be removed once the Codegen support is implemented
// for structs given in the select list.
ctx.getQueryOptions().setDisable_codegen(true);
AnalyzesOk("create view functional.foo (a) as " +
"select tiny_struct from functional_orc_def.complextypes_structs", ctx);
// IMPALA-7679: Inserting a null column type without an explicit type should
// throw an error.



@@ -335,9 +335,10 @@ public class AnalyzeExprsTest extends AnalyzerTest {
AnalysisError("select 1 from functional.allcomplextypes where int_map_col = 1",
"operands of type MAP<STRING,INT> and TINYINT are not comparable: " +
"int_map_col = 1");
AnalysisError("select 1 from functional.allcomplextypes where int_struct_col = 1",
"operands of type STRUCT<f1:INT,f2:INT> and TINYINT are not comparable: " +
"int_struct_col = 1");
AnalysisError("select 1 from functional_orc_def.complextypes_structs where " +
"tiny_struct = true",
"operands of type STRUCT<b:BOOLEAN> and BOOLEAN are not comparable: " +
"tiny_struct = TRUE");
// Complex types are not comparable even if identical.
// TODO: Reconsider this behavior. Such a comparison should ideally work,
// but may require complex-typed SlotRefs and BE functions able to accept
@@ -645,9 +646,10 @@ public class AnalyzeExprsTest extends AnalyzerTest {
AnalysisError("select int_array_col or true from functional.allcomplextypes",
"Operand 'int_array_col' part of predicate 'int_array_col OR TRUE' should " +
"return type 'BOOLEAN' but returns type 'ARRAY<INT>'");
AnalysisError("select false and int_struct_col from functional.allcomplextypes",
"Operand 'int_struct_col' part of predicate 'FALSE AND int_struct_col' should " +
"return type 'BOOLEAN' but returns type 'STRUCT<f1:INT,f2:INT>'.");
AnalysisError("select false and tiny_struct from " +
"functional_orc_def.complextypes_structs",
"Operand 'tiny_struct' part of predicate 'FALSE AND tiny_struct' should " +
"return type 'BOOLEAN' but returns type 'STRUCT<b:BOOLEAN>'.");
AnalysisError("select not int_map_col from functional.allcomplextypes",
"Operand 'int_map_col' part of predicate 'NOT int_map_col' should return " +
"type 'BOOLEAN' but returns type 'MAP<STRING,INT>'.");
@@ -661,12 +663,13 @@ public class AnalyzeExprsTest extends AnalyzerTest {
AnalysisError("select 1 from functional.allcomplextypes where int_map_col is null",
"IS NULL predicate does not support complex types: int_map_col IS NULL");
AnalysisError("select * from functional.allcomplextypes where complex_struct_col " +
"is null", "IS NULL predicate does not support complex types: " +
"complex_struct_col IS NULL");
AnalysisError("select * from functional.allcomplextypes where nested_struct_col " +
"is not null", "IS NOT NULL predicate does not support complex types: " +
"nested_struct_col IS NOT NULL");
AnalysisError("select * from functional_orc_def.complextypes_structs where " +
"tiny_struct is null",
"IS NULL predicate does not support complex types: tiny_struct IS NULL");
AnalysisError("select * from functional_orc_def.complextypes_structs where " +
"tiny_struct is not null",
"IS NOT NULL predicate does not support complex types: tiny_struct " +
"IS NOT NULL");
}
@Test
@@ -767,10 +770,10 @@ public class AnalyzeExprsTest extends AnalyzerTest {
"where date_col between int_col and double_col",
"Incompatible return types 'DATE' and 'INT' " +
"of exprs 'date_col' and 'int_col'.");
AnalysisError("select 1 from functional.allcomplextypes " +
"where int_struct_col between 10 and 20",
"Incompatible return types 'STRUCT<f1:INT,f2:INT>' and 'TINYINT' " +
"of exprs 'int_struct_col' and '10'.");
AnalysisError("select 1 from functional_orc_def.complextypes_structs " +
"where tiny_struct between 10 and 20",
"Incompatible return types 'STRUCT<b:BOOLEAN>' and 'TINYINT' " +
"of exprs 'tiny_struct' and '10'.");
// IMPALA-7211: Do not cast decimal types to other decimal types
AnalyzesOk("select cast(1 as decimal(38,2)) between " +
"0.9 * cast(1 as decimal(38,3)) and 3");
@@ -1274,9 +1277,10 @@ public class AnalyzeExprsTest extends AnalyzerTest {
AnalysisError("select id, row_number() over (order by int_array_col) " +
"from functional_parquet.allcomplextypes", "ORDER BY expression " +
"'int_array_col' with complex type 'ARRAY<INT>' is not supported.");
AnalysisError("select id, count() over (partition by int_struct_col) " +
"from functional_parquet.allcomplextypes", "PARTITION BY expression " +
"'int_struct_col' with complex type 'STRUCT<f1:INT,f2:INT>' is not supported.");
AnalysisError("select id, count() over (partition by tiny_struct) from " +
"functional_orc_def.complextypes_structs",
"PARTITION BY expression 'tiny_struct' with complex type " +
"'STRUCT<b:BOOLEAN>' is not supported.");
}
/**
@@ -1731,9 +1735,10 @@ public class AnalyzeExprsTest extends AnalyzerTest {
"'string_col + INTERVAL 10 years' returns type 'STRING'. " +
"Expected type 'TIMESTAMP' or 'DATE'.");
AnalysisError(
"select int_struct_col + interval 10 years from functional.allcomplextypes",
"Operand 'int_struct_col' of timestamp/date arithmetic expression " +
"'int_struct_col + INTERVAL 10 years' returns type 'STRUCT<f1:INT,f2:INT>'. " +
"select tiny_struct + interval 10 years from " +
"functional_orc_def.complextypes_structs",
"Operand 'tiny_struct' of timestamp/date arithmetic expression " +
"'tiny_struct + INTERVAL 10 years' returns type 'STRUCT<b:BOOLEAN>'. " +
"Expected type 'TIMESTAMP' or 'DATE'.");
// Reversed interval and timestamp using addition.
AnalysisError("select interval 10 years + float_col from functional.alltypes",
@@ -1863,8 +1868,9 @@ public class AnalyzeExprsTest extends AnalyzerTest {
AnalyzesOk("select round(cast('1.1' as decimal), 1)");
// No matching signature for complex type.
AnalysisError("select lower(int_struct_col) from functional.allcomplextypes",
"No matching function with signature: lower(STRUCT<f1:INT,f2:INT>).");
AnalysisError("select lower(tiny_struct) from " +
"functional_orc_def.complextypes_structs",
"No matching function with signature: lower(STRUCT<b:BOOLEAN>).");
// Special cases for FROM in function call
AnalyzesOk("select extract(year from now())");
@@ -2170,10 +2176,10 @@ public class AnalyzeExprsTest extends AnalyzerTest {
AnalyzesOk("select if(bool_col, false, NULL) from functional.alltypes");
AnalyzesOk("select if(NULL, NULL, NULL) from functional.alltypes");
// No matching signature.
AnalysisError("select if(true, int_struct_col, int_struct_col) " +
"from functional.allcomplextypes",
AnalysisError("select if(true, tiny_struct, tiny_struct) " +
"from functional_orc_def.complextypes_structs",
"No matching function with signature: " +
"if(BOOLEAN, STRUCT<f1:INT,f2:INT>, STRUCT<f1:INT,f2:INT>).");
"if(BOOLEAN, STRUCT<b:BOOLEAN>, STRUCT<b:BOOLEAN>).");
// if() only accepts three arguments
AnalysisError("select if(true, false, true, true)",


@@ -421,16 +421,21 @@ public class AnalyzeStmtsTest extends AnalyzerTest {
/**
* Checks that the given SQL analyzes ok, and asserts that the last result expr in the
* parsed SelectStmt is a scalar SlotRef whose absolute path is identical to the given
* expected one. Also asserts that the slot's absolute path is equal to its
* parsed SelectStmt is a non-collection SlotRef whose absolute path is identical to
* the given expected one. Also asserts that the slot's absolute path is equal to its
* materialized path. Intentionally allows multiple result exprs to be analyzed to test
* absolute path caching, though only the last path is validated.
*/
private void testSlotRefPath(String sql, List<Integer> expectedAbsPath) {
SelectStmt stmt = (SelectStmt) AnalyzesOk(sql);
AnalysisContext ctx = createAnalysisCtx();
// TODO: Remove disabling codegen once codegen support is implemented
// for structs in the select list.
ctx.getQueryOptions().setDisable_codegen(true);
SelectStmt stmt = (SelectStmt) AnalyzesOk(sql, ctx);
Expr e = stmt.getResultExprs().get(stmt.getResultExprs().size() - 1);
Preconditions.checkState(e instanceof SlotRef);
Preconditions.checkState(e.getType().isScalarType());
Preconditions.checkState(!e.getType().isCollectionType());
SlotRef slotRef = (SlotRef) e;
List<Integer> actualAbsPath = slotRef.getDesc().getPath().getAbsolutePath();
Assert.assertEquals("Mismatched absolute paths.", expectedAbsPath, actualAbsPath);
@@ -520,13 +525,11 @@ public class AnalyzeStmtsTest extends AnalyzerTest {
// Array of structs. No name conflicts with implicit fields. Both implicit and
// explicit paths are allowed.
addTestTable("create table d.t2 (c array<struct<f:int>>)");
addTestTable("create table d.t2 (c array<struct<f:int>>) stored as orc");
testSlotRefPath("select f from d.t2.c", path(0, 0, 0));
testSlotRefPath("select item.f from d.t2.c", path(0, 0, 0));
testSlotRefPath("select pos from d.t2.c", path(0, 1));
AnalysisError("select item from d.t2.c",
"Expr 'item' in select list returns a complex type 'STRUCT<f:INT>'.\n" +
"Only scalar types are allowed in the select list.");
testSlotRefPath("select item from d.t2.c", path(0, 0));
AnalysisError("select item.pos from d.t2.c",
"Could not resolve column/field reference: 'item.pos'");
// Test star expansion.
@@ -535,16 +538,14 @@ public class AnalyzeStmtsTest extends AnalyzerTest {
// Array of structs with name conflicts. Both implicit and explicit
// paths are allowed.
addTestTable("create table d.t3 (c array<struct<f:int,item:int,pos:int>>)");
addTestTable("create table d.t3 (c array<struct<f:int,item:int,pos:int>>) " +
"stored as orc");
testSlotRefPath("select f from d.t3.c", path(0, 0, 0));
testSlotRefPath("select item.f from d.t3.c", path(0, 0, 0));
testSlotRefPath("select item.item from d.t3.c", path(0, 0, 1));
testSlotRefPath("select item.pos from d.t3.c", path(0, 0, 2));
testSlotRefPath("select pos from d.t3.c", path(0, 1));
AnalysisError("select item from d.t3.c",
"Expr 'item' in select list returns a complex type " +
"'STRUCT<f:INT,item:INT,pos:INT>'.\n" +
"Only scalar types are allowed in the select list.");
testSlotRefPath("select item from d.t3.c", path(0, 0));
// Test star expansion.
testStarPath("select * from d.t3.c", path(0, 0, 0), path(0, 0, 1), path(0, 0, 2));
testStarPath("select c.* from d.t3.c", path(0, 0, 0), path(0, 0, 1), path(0, 0, 2));
@@ -561,38 +562,49 @@ public class AnalyzeStmtsTest extends AnalyzerTest {
// Map with a scalar key and struct value. No name conflicts. Both implicit and
// explicit paths are allowed.
addTestTable("create table d.t5 (c map<int,struct<f:int>>)");
addTestTable("create table d.t5 (c map<int,struct<f:int>>) stored as orc");
testSlotRefPath("select key from d.t5.c", path(0, 0));
testSlotRefPath("select f from d.t5.c", path(0, 1, 0));
testSlotRefPath("select value.f from d.t5.c", path(0, 1, 0));
AnalysisError("select value.value from d.t5.c",
"Could not resolve column/field reference: 'value.value'");
AnalysisError("select value from d.t5.c",
"Expr 'value' in select list returns a complex type " +
"'STRUCT<f:INT>'.\n" +
"Only scalar types are allowed in the select list.");
testSlotRefPath("select value from d.t5.c", path(0, 1));
// Test star expansion.
testStarPath("select * from d.t5.c", path(0, 0), path(0, 1, 0));
testStarPath("select c.* from d.t5.c", path(0, 0), path(0, 1, 0));
// Map with a scalar key and struct value with name conflicts. Both implicit and
// explicit paths are allowed.
addTestTable("create table d.t6 (c map<int,struct<f:int,key:int,value:int>>)");
addTestTable("create table d.t6 (c map<int,struct<f:int,key:int,value:int>>) " +
"stored as orc");
testSlotRefPath("select key from d.t6.c", path(0, 0));
testSlotRefPath("select f from d.t6.c", path(0, 1, 0));
testSlotRefPath("select value.f from d.t6.c", path(0, 1, 0));
testSlotRefPath("select value.key from d.t6.c", path(0, 1, 1));
testSlotRefPath("select value.value from d.t6.c", path(0, 1, 2));
AnalysisError("select value from d.t6.c",
"Expr 'value' in select list returns a complex type " +
"'STRUCT<f:INT,key:INT,value:INT>'.\n" +
"Only scalar types are allowed in the select list.");
testSlotRefPath("select value from d.t6.c", path(0, 1));
// Test star expansion.
testStarPath("select * from d.t6.c",
path(0, 0), path(0, 1, 0), path(0, 1, 1), path(0, 1, 2));
testStarPath("select c.* from d.t6.c",
path(0, 0), path(0, 1, 0), path(0, 1, 1), path(0, 1, 2));
// Map with nested struct value with name conflict. Both implicit and explicit paths
// are allowed.
addTestTable("create table d.t6_nested (c map<int," +
"struct<f:int,key:int,value:int,s:struct<f:int,key:int,value:int>>>)" +
" stored as orc");
testSlotRefPath("select key from d.t6_nested.c", path(0, 0));
testSlotRefPath("select value from d.t6_nested.c", path(0, 1));
testSlotRefPath("select f from d.t6_nested.c", path(0, 1, 0));
testSlotRefPath("select value.key from d.t6_nested.c", path(0, 1, 1));
testSlotRefPath("select value.value from d.t6_nested.c", path(0, 1, 2));
testSlotRefPath("select value.s from d.t6_nested.c", path(0, 1, 3));
testSlotRefPath("select value.s.f from d.t6_nested.c", path(0, 1, 3, 0));
testSlotRefPath("select value.s.key from d.t6_nested.c", path(0, 1, 3, 1));
testSlotRefPath("select value.s.value from d.t6_nested.c", path(0, 1, 3, 2));
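The `path(...)` expectations above can be sketched as follows. This is a hypothetical illustration of how the ordinal paths are derived (map key = 0, map value = 1, struct fields by declaration order); the schema dictionary and `resolve` helper below are illustrative names, not Impala's actual resolver.

```python
# Schema of d.t6_nested's column c: map<int, struct<f, key, value, s:struct<...>>>.
# Each node is (kind, children); leaves are None.
SCHEMA = {
    "c": ("map", {"key": None,
                  "value": ("struct", {"f": None, "key": None, "value": None,
                                       "s": ("struct", {"f": None, "key": None,
                                                        "value": None})})})
}

def resolve(dotted):
    """Return the absolute path (list of ordinals) for a dotted reference
    rooted at the table, e.g. 'c.value.s.f' -> [0, 1, 3, 0]."""
    node = ("struct", SCHEMA)  # the table's top level behaves like a struct
    path = []
    for part in dotted.split("."):
        _kind, children = node
        names = list(children)          # insertion order = declaration order
        path.append(names.index(part))  # ordinal within the enclosing type
        node = children[part] or (None, {})
    return path
```

For example, `resolve("c.value.s.f")` walks c (0), the map value (1), field s (3), then inner field f (0), matching `path(0, 1, 3, 0)` in the tests above.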
// Test implicit/explicit paths on a complicated schema.
addTestTable("create table d.t7 (" +
"c1 int, " +
@@ -600,7 +612,8 @@ public class AnalyzeStmtsTest extends AnalyzerTest {
"c3 array<struct<a1:array<int>,a2:array<struct<x:int,y:int,a3:array<int>>>>>, " +
"c4 bigint, " +
"c5 map<int,struct<m1:map<int,string>," +
" m2:map<int,struct<x:int,y:int,m3:map<int,int>>>>>)");
" m2:map<int,struct<x:int,y:int,m3:map<int,int>>>>>) " +
"stored as orc");
// Test paths with c3.
testTableRefPath("select 1 from d.t7.c3.a1", path(2, 0, 0), null);
@@ -615,6 +628,10 @@ public class AnalyzeStmtsTest extends AnalyzerTest {
testTableRefPath("select 1 from d.t7.c3.item.a2.item.a3", path(2, 0, 1, 0, 2), null);
testSlotRefPath("select item from d.t7.c3.a2.a3", path(2, 0, 1, 0, 2, 0));
testSlotRefPath("select item from d.t7.c3.item.a2.item.a3", path(2, 0, 1, 0, 2, 0));
AnalysisContext ctx = createAnalysisCtx();
ctx.getQueryOptions().setDisable_codegen(true);
AnalysisError("select item from d.t7.c3", ctx,
"Struct containing a collection type is not allowed in the select list.");
// Test path assembly with multiple tuple descriptors.
testTableRefPath("select 1 from d.t7, t7.c3, c3.a2, a2.a3",
path(2, 0, 1, 0, 2), path(2, 0, 1, 0, 2));
@@ -755,27 +772,42 @@ public class AnalyzeStmtsTest extends AnalyzerTest {
"Illegal column/field reference 'complex_nested_struct_col.f2.f11' with " +
"intermediate collection 'f2' of type " +
"'ARRAY<STRUCT<f11:BIGINT,f12:MAP<STRING,STRUCT<f21:BIGINT>>>>'");
// Check the support of struct in the select list for different file formats.
AnalysisContext ctx = createAnalysisCtx();
ctx.getQueryOptions().setDisable_codegen(true);
AnalysisError("select alltypes from functional_parquet.complextypes_structs", ctx,
"Querying STRUCT is only supported for ORC file format.");
AnalyzesOk("select alltypes from functional_orc_def.complextypes_structs", ctx);
// Check if a struct in the select list raises an error if it contains collections.
addTestTable(
"create table nested_structs (s1 struct<s2:struct<i:int>>) stored as orc");
addTestTable("create table nested_structs_with_list " +
"(s1 struct<s2:struct<a:array<int>>>) stored as orc");
AnalyzesOk("select s1 from nested_structs", ctx);
AnalyzesOk("select s1.s2 from nested_structs", ctx);
AnalysisError("select s1 from nested_structs_with_list", ctx, "Struct containing " +
"a collection type is not allowed in the select list.");
AnalysisError("select s1.s2 from nested_structs_with_list", ctx, "Struct " +
"containing a collection type is not allowed in the select list.");
}
@Test
public void TestSlotRefPathAmbiguity() {
addTestDb("a", null);
addTestTable("create table a.a (a struct<a:struct<a:int>>)");
addTestTable("create table a.a (a struct<a:struct<a:int>>) stored as orc");
// Slot path is not ambiguous.
AnalyzesOk("select a.a.a.a.a from a.a");
AnalyzesOk("select t.a.a.a from a.a t");
// Slot path is not ambiguous but resolves to a struct.
AnalysisError("select a from a.a",
"Expr 'a' in select list returns a complex type 'STRUCT<a:STRUCT<a:INT>>'.\n" +
"Only scalar types are allowed in the select list.");
AnalysisError("select t.a from a.a t",
"Expr 't.a' in select list returns a complex type 'STRUCT<a:STRUCT<a:INT>>'.\n" +
"Only scalar types are allowed in the select list.");
AnalysisError("select t.a.a from a.a t",
"Expr 't.a.a' in select list returns a complex type 'STRUCT<a:INT>'.\n" +
"Only scalar types are allowed in the select list.");
// Slot path is not ambiguous and resolves to a struct.
AnalysisContext ctx = createAnalysisCtx();
ctx.getQueryOptions().setDisable_codegen(true);
AnalyzesOk("select a from a.a", ctx);
AnalyzesOk("select t.a from a.a t", ctx);
AnalyzesOk("select t.a.a from a.a t", ctx);
// Slot paths are ambiguous. A slot path can legally resolve to a non-scalar type,
// even though we currently do not support non-scalar SlotRefs in the select list
@@ -970,7 +1002,7 @@ public class AnalyzeStmtsTest extends AnalyzerTest {
}
/**
* Test that complex types are not allowed in the select list.
* Test that complex types are supported in the select list.
*/
@Test
public void TestComplexTypesInSelectList() {
@@ -990,35 +1022,50 @@ public class AnalyzeStmtsTest extends AnalyzerTest {
"tables only have complex-typed columns.");
// Empty star expansion, but non empty result exprs.
AnalyzesOk("select 1, * from only_complex_types");
// Illegal complex-typed expr in select list.
AnalysisError("select int_struct_col from functional.allcomplextypes",
"Expr 'int_struct_col' in select list returns a " +
"complex type 'STRUCT<f1:INT,f2:INT>'.\n" +
"Only scalar types are allowed in the select list.");
// Structs in the select list work only if codegen is off.
AnalysisContext ctx = createAnalysisCtx();
ctx.getQueryOptions().setDisable_codegen(false);
AnalysisError("select alltypes from functional_orc_def.complextypes_structs", ctx,
"Struct type in select list is not allowed when Codegen is ON. You might want " +
"to set DISABLE_CODEGEN=true");
ctx.getQueryOptions().setDisable_codegen(true);
AnalyzesOk("select alltypes from functional_orc_def.complextypes_structs", ctx);
// Illegal complex-typed expr in a union.
AnalysisError("select int_struct_col from functional.allcomplextypes " +
"union all select int_struct_col from functional.allcomplextypes",
"Expr 'int_struct_col' in select list returns a " +
"complex type 'STRUCT<f1:INT,f2:INT>'.\n" +
"Only scalar types are allowed in the select list.");
AnalysisError("select int_array_col from functional.allcomplextypes ",
"Expr 'int_array_col' in select list returns a collection type 'ARRAY<INT>'.\n" +
"Collection types are not allowed in the select list.");
// Illegal complex-typed expr in a union.
AnalysisError("select int_array_col from functional.allcomplextypes " +
"union all select int_array_col from functional.allcomplextypes",
"Expr 'int_array_col' in select list returns a collection type 'ARRAY<INT>'.\n" +
"Collection types are not allowed in the select list.");
AnalysisError("select tiny_struct from functional_orc_def.complextypes_structs " +
"union all select tiny_struct from functional_orc_def.complextypes_structs", ctx,
"Set operations don't support STRUCT type. STRUCT<b:BOOLEAN> in tiny_struct");
// Illegal complex-typed expr inside inline view.
AnalysisError("select 1 from " +
"(select int_struct_col from functional.allcomplextypes) v",
"Expr 'int_struct_col' in select list returns a " +
"complex type 'STRUCT<f1:INT,f2:INT>'.\n" +
"Only scalar types are allowed in the select list.");
"(select int_array_col from functional.allcomplextypes) v",
"Expr 'int_array_col' in select list returns a collection type 'ARRAY<INT>'.\n" +
"Collection types are not allowed in the select list.");
// Structs are allowed in an inline view.
AnalyzesOk("select v.ts from (select tiny_struct as ts from " +
"functional_orc_def.complextypes_structs) v;", ctx);
// Illegal complex-typed expr in an insert.
AnalysisError("insert into functional.allcomplextypes " +
"select int_struct_col from functional.allcomplextypes",
"Expr 'int_struct_col' in select list returns a " +
"complex type 'STRUCT<f1:INT,f2:INT>'.\n" +
"Only scalar types are allowed in the select list.");
"select int_array_col from functional.allcomplextypes",
"Expr 'int_array_col' in select list returns a collection type 'ARRAY<INT>'.\n" +
"Collection types are not allowed in the select list.");
// Illegal complex-typed expr in a CTAS.
AnalysisError("create table new_tbl as " +
"select int_struct_col from functional.allcomplextypes",
"Expr 'int_struct_col' in select list returns a " +
"complex type 'STRUCT<f1:INT,f2:INT>'.\n" +
"Only scalar types are allowed in the select list.");
"select int_array_col from functional.allcomplextypes",
"Expr 'int_array_col' in select list returns a collection type 'ARRAY<INT>'.\n" +
"Collection types are not allowed in the select list.");
AnalysisError("create table new_tbl as " +
"select tiny_struct from functional_orc_def.complextypes_structs", ctx,
"Unable to INSERT into target table (default.new_tbl) because the column " +
"'tiny_struct' has a complex type 'STRUCT<b:BOOLEAN>' and Impala doesn't " +
"support inserting into tables containing complex type columns");
}
@Test
@@ -3019,9 +3066,9 @@ public class AnalyzeStmtsTest extends AnalyzerTest {
"ORDER BY timestamp_col");
// Ordering by complex-typed expressions is not allowed.
AnalysisError("select * from functional_parquet.allcomplextypes " +
"order by int_struct_col", "ORDER BY expression 'int_struct_col' with " +
"complex type 'STRUCT<f1:INT,f2:INT>' is not supported.");
AnalysisError("select * from functional_orc_def.complextypes_structs " +
"order by tiny_struct", "ORDER BY expression 'tiny_struct' with " +
"complex type 'STRUCT<b:BOOLEAN>' is not supported.");
AnalysisError("select * from functional_parquet.allcomplextypes " +
"order by int_array_col", "ORDER BY expression 'int_array_col' with " +
"complex type 'ARRAY<INT>' is not supported.");


@@ -122,9 +122,8 @@ public class AnalyzeUpsertStmtTest extends AnalyzerTest {
"SELECT item FROM b.int_array_col, functional.alltypestiny");
// Illegal complex-typed expr
AnalysisError("upsert into functional_kudu.testtbl " +
"select int_struct_col from functional.allcomplextypes",
"Expr 'int_struct_col' in select list returns a " +
"complex type 'STRUCT<f1:INT,f2:INT>'.\n" +
"Only scalar types are allowed in the select list.");
"select int_array_col from functional.allcomplextypes",
"Expr 'int_array_col' in select list returns a collection type 'ARRAY<INT>'.\n" +
"Collection types are not allowed in the select list.");
}
}

New binary test files (contents not shown):
testdata/ComplexTypesTbl/structs.orc
testdata/ComplexTypesTbl/structs.parq
(plus two further binary files, not named in this view)


@@ -761,6 +761,70 @@ INSERT OVERWRITE TABLE {db_name}{db_suffix}.{table_name} SELECT * FROM functiona
---- DATASET
functional
---- BASE_TABLE_NAME
alltypes_structs
---- PARTITION_COLUMNS
year int
month int
---- COLUMNS
id int
struct_val struct<bool_col:boolean, tinyint_col:tinyint, smallint_col:smallint, int_col:int, bigint_col:bigint, float_col:float, double_col:double, date_string_col:string, string_col:string, timestamp_col:timestamp>
---- DEPENDENT_LOAD_HIVE
INSERT INTO {db_name}{db_suffix}.{table_name}
PARTITION (year, month)
SELECT
id,
named_struct(
'bool_col', bool_col,
'tinyint_col', tinyint_col,
'smallint_col', smallint_col,
'int_col', int_col,
'bigint_col', bigint_col,
'float_col', float_col,
'double_col', double_col,
'date_string_col', date_string_col,
'string_col', string_col,
'timestamp_col', timestamp_col),
year,
month
FROM {db_name}{db_suffix}.alltypes;
---- LOAD
====
---- DATASET
functional
---- BASE_TABLE_NAME
complextypes_structs
---- COLUMNS
id int
str string
alltypes struct<ti:tinyint, si:smallint, i:int, bi:bigint, b:boolean, f:float, do:double, da:date, ts:timestamp, s1:string, s2:string, c1:char(1), c2:char(3), vc:varchar(10), de1:decimal(5, 0), de2:decimal(10, 3)>
tiny_struct struct<b:boolean>
small_struct struct<i:int, s:string>
---- DEPENDENT_LOAD
`hadoop fs -mkdir -p /test-warehouse/complextypes_structs_parquet && \
hadoop fs -put -f ${IMPALA_HOME}/testdata/ComplexTypesTbl/structs.parq \
/test-warehouse/complextypes_structs_parquet/
---- DEPENDENT_LOAD_ACID
LOAD DATA LOCAL INPATH '{impala_home}/testdata/ComplexTypesTbl/structs.orc' OVERWRITE INTO TABLE {db_name}{db_suffix}.{table_name};
---- LOAD
====
---- DATASET
functional
---- BASE_TABLE_NAME
complextypes_nested_structs
---- COLUMNS
id int
outer_struct struct<str:string,inner_struct1:struct<str:string,de:decimal(8,2)>,inner_struct2:struct<i:int,str:string>,inner_struct3:struct<s:struct<i:int,s:string>>>
---- DEPENDENT_LOAD
`hadoop fs -mkdir -p /test-warehouse/complextypes_nested_structs_parquet && \
hadoop fs -put -f ${IMPALA_HOME}/testdata/ComplexTypesTbl/structs_nested.parq \
/test-warehouse/complextypes_nested_structs_parquet/
---- DEPENDENT_LOAD_ACID
LOAD DATA LOCAL INPATH '{impala_home}/testdata/ComplexTypesTbl/structs_nested.orc' OVERWRITE INTO TABLE {db_name}{db_suffix}.{table_name};
---- LOAD
====
---- DATASET
functional
---- BASE_TABLE_NAME
complextypestbl_minor_compacted
---- COLUMNS
id bigint


@@ -97,6 +97,12 @@ table_name:complextypestbl_medium, constraint:restrict_to, table_format:orc/def/
table_name:complextypestbl_non_transactional, constraint:restrict_to, table_format:orc/def/block
table_name:pos_item_key_value_complextypestbl, constraint:restrict_to, table_format:orc/def/block
table_name:pos_item_key_value_complextypestbl, constraint:restrict_to, table_format:parquet/none/none
table_name:alltypes_structs, constraint:restrict_to, table_format:parquet/none/none
table_name:alltypes_structs, constraint:restrict_to, table_format:orc/def/block
table_name:complextypes_structs, constraint:restrict_to, table_format:parquet/none/none
table_name:complextypes_structs, constraint:restrict_to, table_format:orc/def/block
table_name:complextypes_nested_structs, constraint:restrict_to, table_format:parquet/none/none
table_name:complextypes_nested_structs, constraint:restrict_to, table_format:orc/def/block
table_name:alltypeserror, constraint:exclude, table_format:parquet/none/none
table_name:alltypeserrornonulls, constraint:exclude, table_format:parquet/none/none


@@ -0,0 +1,35 @@
====
---- QUERY
COMPUTE STATS complextypes_structs
---- RESULTS
'Updated 1 partition(s) and 2 column(s).'
---- TYPES
STRING
====
---- QUERY
# Checks that there are no stats generated for struct columns.
SHOW COLUMN STATS complextypes_structs
---- RESULTS
'id','INT',6,0,4,4.0,-1,-1
'str','STRING',6,0,11,10.3333330154,-1,-1
'alltypes','STRUCT<ti:TINYINT,si:SMALLINT,i:INT,bi:BIGINT,b:BOOLEAN,f:FLOAT,do:DOUBLE,da:DATE,ts:TIMESTAMP,s1:STRING,s2:STRING,c1:CHAR(1),c2:CHAR(3),vc:VARCHAR(10),de1:DECIMAL(5,0),de2:DECIMAL(10,3)>',-1,-1,-1,-1.0,-1,-1
'tiny_struct','STRUCT<b:BOOLEAN>',-1,-1,-1,-1.0,-1,-1
'small_struct','STRUCT<i:INT,s:STRING>',-1,-1,-1,-1.0,-1,-1
---- TYPES
STRING,STRING,BIGINT,BIGINT,BIGINT,DOUBLE,BIGINT,BIGINT
====
---- QUERY
COMPUTE STATS complextypes_nested_structs
---- RESULTS
'Updated 1 partition(s) and 1 column(s).'
---- TYPES
STRING
====
---- QUERY
# Checks that there are no stats generated for struct columns.
SHOW COLUMN STATS complextypes_nested_structs
---- RESULTS
'id','INT',5,0,4,4.0,-1,-1
'outer_struct','STRUCT<str:STRING,inner_struct1:STRUCT<str:STRING,de:DECIMAL(8,2)>,inner_struct2:STRUCT<i:INT,str:STRING>,inner_struct3:STRUCT<s:STRUCT<i:INT,s:STRING>>>',-1,-1,-1,-1.0,-1,-1
---- TYPES
STRING,STRING,BIGINT,BIGINT,BIGINT,DOUBLE,BIGINT,BIGINT


@@ -0,0 +1,155 @@
====
---- QUERY
# Select a struct that contains multiple structs.
select id, outer_struct
from functional_orc_def.complextypes_nested_structs;
---- RESULTS
1,'{"str":"somestr1","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":333222111,"str":"somestr3"},"inner_struct3":{"s":{"i":112288,"s":null}}}'
2,'{"str":"str","inner_struct1":null,"inner_struct2":{"i":100,"str":"str3"},"inner_struct3":{"s":{"i":321,"s":"dfgs"}}}'
3,'NULL'
4,'{"str":"","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":1,"str":"string"},"inner_struct3":{"s":null}}'
5,'{"str":null,"inner_struct1":null,"inner_struct2":null,"inner_struct3":null}'
---- TYPES
INT,STRING
====
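The quoting convention in these results — a NULL struct printed as the bare string `NULL`, null members inside a non-NULL struct rendered as JSON `null` — can be sketched with a small Python illustration. This mirrors the displayed output only; it is an assumption-level sketch, not Impala's server-side implementation.

```python
import json

def format_struct(value):
    # A NULL struct (e.g. id 3 above) prints as the bare string NULL; null
    # members inside a non-NULL struct (e.g. id 5) become JSON null.
    if value is None:
        return "NULL"
    # Compact separators match the output above (no spaces after , or :).
    return json.dumps(value, separators=(",", ":"))

row5 = {"str": None, "inner_struct1": None,
        "inner_struct2": None, "inner_struct3": None}
print(format_struct(row5))
# -> {"str":null,"inner_struct1":null,"inner_struct2":null,"inner_struct3":null}
print(format_struct(None))
# -> NULL
```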
---- QUERY
# Select a struct that contains multiple structs using a filter on a non-struct field.
select id, outer_struct
from functional_orc_def.complextypes_nested_structs
where id > 2;
---- RESULTS
3,'NULL'
4,'{"str":"","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":1,"str":"string"},"inner_struct3":{"s":null}}'
5,'{"str":null,"inner_struct1":null,"inner_struct2":null,"inner_struct3":null}'
---- TYPES
INT,STRING
====
---- QUERY
# Select a struct that contains multiple structs using a filter on a struct field.
select id, outer_struct
from functional_orc_def.complextypes_nested_structs
where length(outer_struct.str) > 3;
---- RESULTS
1,'{"str":"somestr1","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":333222111,"str":"somestr3"},"inner_struct3":{"s":{"i":112288,"s":null}}}'
---- TYPES
INT,STRING
====
---- QUERY
# Select a nested struct with an order by.
select id, outer_struct
from functional_orc_def.complextypes_nested_structs
order by id;
---- RESULTS: VERIFY_IS_EQUAL_SORTED
1,'{"str":"somestr1","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":333222111,"str":"somestr3"},"inner_struct3":{"s":{"i":112288,"s":null}}}'
2,'{"str":"str","inner_struct1":null,"inner_struct2":{"i":100,"str":"str3"},"inner_struct3":{"s":{"i":321,"s":"dfgs"}}}'
3,'NULL'
4,'{"str":"","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":1,"str":"string"},"inner_struct3":{"s":null}}'
5,'{"str":null,"inner_struct1":null,"inner_struct2":null,"inner_struct3":null}'
---- TYPES
INT,STRING
====
---- QUERY
# Select a nested struct with an order by.
select id, outer_struct
from functional_orc_def.complextypes_nested_structs
order by id desc;
---- RESULTS: VERIFY_IS_EQUAL_SORTED
5,'{"str":null,"inner_struct1":null,"inner_struct2":null,"inner_struct3":null}'
4,'{"str":"","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":1,"str":"string"},"inner_struct3":{"s":null}}'
3,'NULL'
2,'{"str":"str","inner_struct1":null,"inner_struct2":{"i":100,"str":"str3"},"inner_struct3":{"s":{"i":321,"s":"dfgs"}}}'
1,'{"str":"somestr1","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":333222111,"str":"somestr3"},"inner_struct3":{"s":{"i":112288,"s":null}}}'
---- TYPES
INT,STRING
====
---- QUERY
# Select the same nested struct multiple times in one query.
select id, outer_struct, outer_struct
from functional_orc_def.complextypes_nested_structs;
---- RESULTS
1,'{"str":"somestr1","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":333222111,"str":"somestr3"},"inner_struct3":{"s":{"i":112288,"s":null}}}','{"str":"somestr1","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":333222111,"str":"somestr3"},"inner_struct3":{"s":{"i":112288,"s":null}}}'
2,'{"str":"str","inner_struct1":null,"inner_struct2":{"i":100,"str":"str3"},"inner_struct3":{"s":{"i":321,"s":"dfgs"}}}','{"str":"str","inner_struct1":null,"inner_struct2":{"i":100,"str":"str3"},"inner_struct3":{"s":{"i":321,"s":"dfgs"}}}'
3,'NULL','NULL'
4,'{"str":"","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":1,"str":"string"},"inner_struct3":{"s":null}}','{"str":"","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":1,"str":"string"},"inner_struct3":{"s":null}}'
5,'{"str":null,"inner_struct1":null,"inner_struct2":null,"inner_struct3":null}','{"str":null,"inner_struct1":null,"inner_struct2":null,"inner_struct3":null}'
---- TYPES
INT,STRING,STRING
====
---- QUERY
# Select the same nested struct multiple times in one query and order the results.
select id, outer_struct, outer_struct
from functional_orc_def.complextypes_nested_structs
order by id desc;
---- RESULTS: VERIFY_IS_EQUAL_SORTED
5,'{"str":null,"inner_struct1":null,"inner_struct2":null,"inner_struct3":null}','{"str":null,"inner_struct1":null,"inner_struct2":null,"inner_struct3":null}'
4,'{"str":"","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":1,"str":"string"},"inner_struct3":{"s":null}}','{"str":"","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":1,"str":"string"},"inner_struct3":{"s":null}}'
3,'NULL','NULL'
2,'{"str":"str","inner_struct1":null,"inner_struct2":{"i":100,"str":"str3"},"inner_struct3":{"s":{"i":321,"s":"dfgs"}}}','{"str":"str","inner_struct1":null,"inner_struct2":{"i":100,"str":"str3"},"inner_struct3":{"s":{"i":321,"s":"dfgs"}}}'
1,'{"str":"somestr1","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":333222111,"str":"somestr3"},"inner_struct3":{"s":{"i":112288,"s":null}}}','{"str":"somestr1","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":333222111,"str":"somestr3"},"inner_struct3":{"s":{"i":112288,"s":null}}}'
---- TYPES
INT,STRING,STRING
====
---- QUERY
# Similar to the above query but here the 'id' field is not in the select list but still
# used in the order by.
select outer_struct, outer_struct
from functional_orc_def.complextypes_nested_structs
order by id desc;
---- RESULTS: VERIFY_IS_EQUAL_SORTED
'{"str":null,"inner_struct1":null,"inner_struct2":null,"inner_struct3":null}','{"str":null,"inner_struct1":null,"inner_struct2":null,"inner_struct3":null}'
'{"str":"","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":1,"str":"string"},"inner_struct3":{"s":null}}','{"str":"","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":1,"str":"string"},"inner_struct3":{"s":null}}'
'NULL','NULL'
'{"str":"str","inner_struct1":null,"inner_struct2":{"i":100,"str":"str3"},"inner_struct3":{"s":{"i":321,"s":"dfgs"}}}','{"str":"str","inner_struct1":null,"inner_struct2":{"i":100,"str":"str3"},"inner_struct3":{"s":{"i":321,"s":"dfgs"}}}'
'{"str":"somestr1","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":333222111,"str":"somestr3"},"inner_struct3":{"s":{"i":112288,"s":null}}}','{"str":"somestr1","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":333222111,"str":"somestr3"},"inner_struct3":{"s":{"i":112288,"s":null}}}'
---- TYPES
STRING,STRING
====
---- QUERY
# WITH clause creates an inline view containing a nested struct.
with sub as (
select id, outer_struct from functional_orc_def.complextypes_nested_structs)
select sub.id, sub.outer_struct from sub;
---- RESULTS
1,'{"str":"somestr1","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":333222111,"str":"somestr3"},"inner_struct3":{"s":{"i":112288,"s":null}}}'
2,'{"str":"str","inner_struct1":null,"inner_struct2":{"i":100,"str":"str3"},"inner_struct3":{"s":{"i":321,"s":"dfgs"}}}'
3,'NULL'
4,'{"str":"","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":1,"str":"string"},"inner_struct3":{"s":null}}'
5,'{"str":null,"inner_struct1":null,"inner_struct2":null,"inner_struct3":null}'
---- TYPES
INT,STRING
====
---- QUERY
# WITH clause creates an inline view containing a nested struct. Also has a filter on
# the inline view and ordering by a non-complex item from the view.
with sub as (
select id, outer_struct
from functional_orc_def.complextypes_nested_structs
where length(outer_struct.str) > 3)
select sub.id, sub.outer_struct from sub order by sub.id desc;
---- RESULTS
1,'{"str":"somestr1","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":333222111,"str":"somestr3"},"inner_struct3":{"s":{"i":112288,"s":null}}}'
---- TYPES
INT,STRING
====
---- QUERY
# Checks that "SELECT nested_struct.* ..." omits the nested structs from the output.
select id, outer_struct.* from functional_orc_def.complextypes_nested_structs;
---- RESULTS
1,'somestr1'
2,'str'
3,'NULL'
4,''
5,'NULL'
---- TYPES
INT,STRING
====
---- QUERY
# Subquery that returns a complex type is not supported.
# IMPALA-9500
select outer_struct
from functional_orc_def.complextypes_nested_structs
where outer_struct in
(select outer_struct from functional_orc_def.complextypes_nested_structs);
---- CATCH
AnalysisException: A subquery can't return complex types. (SELECT outer_struct FROM functional_orc_def.complextypes_nested_structs)
====
====
---- QUERY
# Checking that a primitive column is masked to NULL while there is a struct in the
# select list.
# Note 1: functional_orc_def is hard-coded here. Once we implement direct struct reads
# for Parquet as well, the DB name can be removed. IMPALA-9496
# Note 2: turning off codegen can be removed once we implement codegen support for
# this. IMPALA-10851
select id, str, alltypes from functional_orc_def.complextypes_structs
---- RESULTS
1,'NULL','{"ti":100,"si":12348,"i":156789012,"bi":163234345342,"b":true,"f":1234.56005859375,"do":65323423.33,"da":"2021-05-30","ts":"2021-06-01 10:19:04","s1":"some string","s2":"another str","c1":"x","c2":"xyz","vc":"somevarcha","de1":12345,"de2":null}'
2,'NULL','{"ti":123,"si":4567,"i":1562322212,"bi":334333345342,"b":false,"f":NaN,"do":23233423.099,"da":null,"ts":"2020-06-11 12:10:04","s1":null,"s2":"NULL","c1":"a","c2":"ab ","vc":"varchar","de1":11223,"de2":null}'
3,'NULL','{"ti":null,"si":null,"i":null,"bi":null,"b":null,"f":null,"do":null,"da":null,"ts":null,"s1":null,"s2":null,"c1":null,"c2":null,"vc":null,"de1":null,"de2":null}'
4,'NULL','{"ti":90,"si":30482,"i":1664336,"bi":23567459873,"b":true,"f":0.5600000023841858,"do":NaN,"da":"2000-12-31","ts":"2024-01-01 00:00:00.123400000","s1":"random string","s2":"","c1":"c","c2":"d ","vc":"addsdrr","de1":33357,"de2":null}'
5,'NULL','NULL'
6,'NULL','{"ti":127,"si":100,"i":234732212,"bi":664233223342,"b":true,"f":34.56000137329102,"do":99523423.33,"da":"1985-11-19","ts":"2020-09-15 03:11:22","s1":"string1","s2":"string2","c1":"z","c2":" ","vc":"cv","de1":346,"de2":6235.600}'
---- TYPES
INT,STRING,STRING
====
====
---- QUERY
# Select a simple struct with one bool member.
select id, tiny_struct from functional_orc_def.complextypes_structs;
---- RESULTS
1,'{"b":true}'
2,'{"b":false}'
3,'{"b":true}'
4,'{"b":null}'
5,'{"b":false}'
6,'NULL'
---- TYPES
INT,STRING
====
---- QUERY
# Similar query as above but with an order by.
select id, tiny_struct from functional_orc_def.complextypes_structs order by id;
---- RESULTS: VERIFY_IS_EQUAL_SORTED
1,'{"b":true}'
2,'{"b":false}'
3,'{"b":true}'
4,'{"b":null}'
5,'{"b":false}'
6,'NULL'
---- TYPES
INT,STRING
====
---- QUERY
# Ordering by a member of the struct.
# Forced to use a SORT node instead of a TOPN.
set disable_outermost_topn = 1;
select id, alltypes from functional_orc_def.complextypes_structs
order by alltypes.ti;
---- RESULTS: VERIFY_IS_EQUAL_SORTED
4,'{"ti":90,"si":30482,"i":1664336,"bi":23567459873,"b":true,"f":0.5600000023841858,"do":NaN,"da":"2000-12-31","ts":"2024-01-01 00:00:00.123400000","s1":"random string","s2":"","c1":"c","c2":"d ","vc":"addsdrr","de1":33357,"de2":null}'
1,'{"ti":100,"si":12348,"i":156789012,"bi":163234345342,"b":true,"f":1234.56005859375,"do":65323423.33,"da":"2021-05-30","ts":"2021-06-01 10:19:04","s1":"some string","s2":"another str","c1":"x","c2":"xyz","vc":"somevarcha","de1":12345,"de2":null}'
2,'{"ti":123,"si":4567,"i":1562322212,"bi":334333345342,"b":false,"f":NaN,"do":23233423.099,"da":null,"ts":"2020-06-11 12:10:04","s1":null,"s2":"NULL","c1":"a","c2":"ab ","vc":"varchar","de1":11223,"de2":null}'
6,'{"ti":127,"si":100,"i":234732212,"bi":664233223342,"b":true,"f":34.56000137329102,"do":99523423.33,"da":"1985-11-19","ts":"2020-09-15 03:11:22","s1":"string1","s2":"string2","c1":"z","c2":" ","vc":"cv","de1":346,"de2":6235.600}'
3,'{"ti":null,"si":null,"i":null,"bi":null,"b":null,"f":null,"do":null,"da":null,"ts":null,"s1":null,"s2":null,"c1":null,"c2":null,"vc":null,"de1":null,"de2":null}'
5,'NULL'
---- TYPES
INT,STRING
====
---- QUERY
# Querying two simple structs. There is a string in one of them and also a non-struct
# string in the select list.
select id, str, tiny_struct, small_struct from functional_orc_def.complextypes_structs;
---- RESULTS
1,'first item','{"b":true}','NULL'
2,'second item','{"b":false}','{"i":19191,"s":"small_struct_str"}'
3,'third item','{"b":true}','{"i":98765,"s":null}'
4,'fourth item','{"b":null}','{"i":null,"s":"str"}'
5,'fifth item','{"b":false}','{"i":98765,"s":"abcde f"}'
6,'sixth item','NULL','{"i":null,"s":null}'
---- TYPES
INT,STRING,STRING,STRING
====
---- QUERY
# Similar query as above but with an order by.
select id, str, tiny_struct, small_struct
from functional_orc_def.complextypes_structs
order by id;
---- RESULTS: VERIFY_IS_EQUAL_SORTED
1,'first item','{"b":true}','NULL'
2,'second item','{"b":false}','{"i":19191,"s":"small_struct_str"}'
3,'third item','{"b":true}','{"i":98765,"s":null}'
4,'fourth item','{"b":null}','{"i":null,"s":"str"}'
5,'fifth item','{"b":false}','{"i":98765,"s":"abcde f"}'
6,'sixth item','NULL','{"i":null,"s":null}'
---- TYPES
INT,STRING,STRING,STRING
====
---- QUERY
# Querying the same struct multiple times in one query.
select id, small_struct, small_struct from functional_orc_def.complextypes_structs;
---- RESULTS
1,'NULL','NULL'
2,'{"i":19191,"s":"small_struct_str"}','{"i":19191,"s":"small_struct_str"}'
3,'{"i":98765,"s":null}','{"i":98765,"s":null}'
4,'{"i":null,"s":"str"}','{"i":null,"s":"str"}'
5,'{"i":98765,"s":"abcde f"}','{"i":98765,"s":"abcde f"}'
6,'{"i":null,"s":null}','{"i":null,"s":null}'
---- TYPES
INT,STRING,STRING
====
---- QUERY
# The same struct multiple times in the select list where there is an ordering in the
# results.
select id, tiny_struct, tiny_struct
from functional_orc_def.complextypes_structs
order by id desc;
---- RESULTS: VERIFY_IS_EQUAL_SORTED
6,'NULL','NULL'
5,'{"b":false}','{"b":false}'
4,'{"b":null}','{"b":null}'
3,'{"b":true}','{"b":true}'
2,'{"b":false}','{"b":false}'
1,'{"b":true}','{"b":true}'
---- TYPES
INT,STRING,STRING
====
---- QUERY
# Similar to the above query but here the 'id' field is not in the select list but still
# used in the order by.
select tiny_struct, tiny_struct
from functional_orc_def.complextypes_structs
order by id desc;
---- RESULTS: VERIFY_IS_EQUAL_SORTED
'NULL','NULL'
'{"b":false}','{"b":false}'
'{"b":null}','{"b":null}'
'{"b":true}','{"b":true}'
'{"b":false}','{"b":false}'
'{"b":true}','{"b":true}'
---- TYPES
STRING,STRING
====
---- QUERY
# Querying a struct that has all the primitive types as children.
# There are multiple string columns to check that none of them overwrites the others.
# There is a row where all the children of the struct are null but the struct itself is
# non-null. Another row holds a struct that is itself null.
select id, str, alltypes from functional_orc_def.complextypes_structs;
---- RESULTS
1,'first item','{"ti":100,"si":12348,"i":156789012,"bi":163234345342,"b":true,"f":1234.56005859375,"do":65323423.33,"da":"2021-05-30","ts":"2021-06-01 10:19:04","s1":"some string","s2":"another str","c1":"x","c2":"xyz","vc":"somevarcha","de1":12345,"de2":null}'
2,'second item','{"ti":123,"si":4567,"i":1562322212,"bi":334333345342,"b":false,"f":NaN,"do":23233423.099,"da":null,"ts":"2020-06-11 12:10:04","s1":null,"s2":"NULL","c1":"a","c2":"ab ","vc":"varchar","de1":11223,"de2":null}'
3,'third item','{"ti":null,"si":null,"i":null,"bi":null,"b":null,"f":null,"do":null,"da":null,"ts":null,"s1":null,"s2":null,"c1":null,"c2":null,"vc":null,"de1":null,"de2":null}'
4,'fourth item','{"ti":90,"si":30482,"i":1664336,"bi":23567459873,"b":true,"f":0.5600000023841858,"do":NaN,"da":"2000-12-31","ts":"2024-01-01 00:00:00.123400000","s1":"random string","s2":"","c1":"c","c2":"d ","vc":"addsdrr","de1":33357,"de2":null}'
5,'fifth item','NULL'
6,'sixth item','{"ti":127,"si":100,"i":234732212,"bi":664233223342,"b":true,"f":34.56000137329102,"do":99523423.33,"da":"1985-11-19","ts":"2020-09-15 03:11:22","s1":"string1","s2":"string2","c1":"z","c2":" ","vc":"cv","de1":346,"de2":6235.600}'
---- TYPES
INT,STRING,STRING
====
---- QUERY
# Similar query as above but with an order by.
select id, str, alltypes from functional_orc_def.complextypes_structs order by id;
---- RESULTS: VERIFY_IS_EQUAL_SORTED
1,'first item','{"ti":100,"si":12348,"i":156789012,"bi":163234345342,"b":true,"f":1234.56005859375,"do":65323423.33,"da":"2021-05-30","ts":"2021-06-01 10:19:04","s1":"some string","s2":"another str","c1":"x","c2":"xyz","vc":"somevarcha","de1":12345,"de2":null}'
2,'second item','{"ti":123,"si":4567,"i":1562322212,"bi":334333345342,"b":false,"f":NaN,"do":23233423.099,"da":null,"ts":"2020-06-11 12:10:04","s1":null,"s2":"NULL","c1":"a","c2":"ab ","vc":"varchar","de1":11223,"de2":null}'
3,'third item','{"ti":null,"si":null,"i":null,"bi":null,"b":null,"f":null,"do":null,"da":null,"ts":null,"s1":null,"s2":null,"c1":null,"c2":null,"vc":null,"de1":null,"de2":null}'
4,'fourth item','{"ti":90,"si":30482,"i":1664336,"bi":23567459873,"b":true,"f":0.5600000023841858,"do":NaN,"da":"2000-12-31","ts":"2024-01-01 00:00:00.123400000","s1":"random string","s2":"","c1":"c","c2":"d ","vc":"addsdrr","de1":33357,"de2":null}'
5,'fifth item','NULL'
6,'sixth item','{"ti":127,"si":100,"i":234732212,"bi":664233223342,"b":true,"f":34.56000137329102,"do":99523423.33,"da":"1985-11-19","ts":"2020-09-15 03:11:22","s1":"string1","s2":"string2","c1":"z","c2":" ","vc":"cv","de1":346,"de2":6235.600}'
---- TYPES
INT,STRING,STRING
====
---- QUERY
# Similar query as above but with an order by desc.
select id, str, alltypes from functional_orc_def.complextypes_structs order by id desc;
---- RESULTS: VERIFY_IS_EQUAL_SORTED
6,'sixth item','{"ti":127,"si":100,"i":234732212,"bi":664233223342,"b":true,"f":34.56000137329102,"do":99523423.33,"da":"1985-11-19","ts":"2020-09-15 03:11:22","s1":"string1","s2":"string2","c1":"z","c2":" ","vc":"cv","de1":346,"de2":6235.600}'
5,'fifth item','NULL'
4,'fourth item','{"ti":90,"si":30482,"i":1664336,"bi":23567459873,"b":true,"f":0.5600000023841858,"do":NaN,"da":"2000-12-31","ts":"2024-01-01 00:00:00.123400000","s1":"random string","s2":"","c1":"c","c2":"d ","vc":"addsdrr","de1":33357,"de2":null}'
3,'third item','{"ti":null,"si":null,"i":null,"bi":null,"b":null,"f":null,"do":null,"da":null,"ts":null,"s1":null,"s2":null,"c1":null,"c2":null,"vc":null,"de1":null,"de2":null}'
2,'second item','{"ti":123,"si":4567,"i":1562322212,"bi":334333345342,"b":false,"f":NaN,"do":23233423.099,"da":null,"ts":"2020-06-11 12:10:04","s1":null,"s2":"NULL","c1":"a","c2":"ab ","vc":"varchar","de1":11223,"de2":null}'
1,'first item','{"ti":100,"si":12348,"i":156789012,"bi":163234345342,"b":true,"f":1234.56005859375,"do":65323423.33,"da":"2021-05-30","ts":"2021-06-01 10:19:04","s1":"some string","s2":"another str","c1":"x","c2":"xyz","vc":"somevarcha","de1":12345,"de2":null}'
---- TYPES
INT,STRING,STRING
====
---- QUERY
# Setting BATCH_SIZE to force the results to fit in multiple row batches.
set BATCH_SIZE=2;
select id, str, alltypes from functional_orc_def.complextypes_structs;
---- RESULTS
1,'first item','{"ti":100,"si":12348,"i":156789012,"bi":163234345342,"b":true,"f":1234.56005859375,"do":65323423.33,"da":"2021-05-30","ts":"2021-06-01 10:19:04","s1":"some string","s2":"another str","c1":"x","c2":"xyz","vc":"somevarcha","de1":12345,"de2":null}'
2,'second item','{"ti":123,"si":4567,"i":1562322212,"bi":334333345342,"b":false,"f":NaN,"do":23233423.099,"da":null,"ts":"2020-06-11 12:10:04","s1":null,"s2":"NULL","c1":"a","c2":"ab ","vc":"varchar","de1":11223,"de2":null}'
3,'third item','{"ti":null,"si":null,"i":null,"bi":null,"b":null,"f":null,"do":null,"da":null,"ts":null,"s1":null,"s2":null,"c1":null,"c2":null,"vc":null,"de1":null,"de2":null}'
4,'fourth item','{"ti":90,"si":30482,"i":1664336,"bi":23567459873,"b":true,"f":0.5600000023841858,"do":NaN,"da":"2000-12-31","ts":"2024-01-01 00:00:00.123400000","s1":"random string","s2":"","c1":"c","c2":"d ","vc":"addsdrr","de1":33357,"de2":null}'
5,'fifth item','NULL'
6,'sixth item','{"ti":127,"si":100,"i":234732212,"bi":664233223342,"b":true,"f":34.56000137329102,"do":99523423.33,"da":"1985-11-19","ts":"2020-09-15 03:11:22","s1":"string1","s2":"string2","c1":"z","c2":" ","vc":"cv","de1":346,"de2":6235.600}'
---- TYPES
INT,STRING,STRING
====
---- QUERY
# Querying a struct in the select list and filtering on one member of the struct.
set BATCH_SIZE=0;
select id, str, alltypes
from functional_orc_def.complextypes_structs
where alltypes.b = true;
---- RESULTS
1,'first item','{"ti":100,"si":12348,"i":156789012,"bi":163234345342,"b":true,"f":1234.56005859375,"do":65323423.33,"da":"2021-05-30","ts":"2021-06-01 10:19:04","s1":"some string","s2":"another str","c1":"x","c2":"xyz","vc":"somevarcha","de1":12345,"de2":null}'
4,'fourth item','{"ti":90,"si":30482,"i":1664336,"bi":23567459873,"b":true,"f":0.5600000023841858,"do":NaN,"da":"2000-12-31","ts":"2024-01-01 00:00:00.123400000","s1":"random string","s2":"","c1":"c","c2":"d ","vc":"addsdrr","de1":33357,"de2":null}'
6,'sixth item','{"ti":127,"si":100,"i":234732212,"bi":664233223342,"b":true,"f":34.56000137329102,"do":99523423.33,"da":"1985-11-19","ts":"2020-09-15 03:11:22","s1":"string1","s2":"string2","c1":"z","c2":" ","vc":"cv","de1":346,"de2":6235.600}'
---- TYPES
INT,STRING,STRING
====
---- QUERY
# Query a single struct slot.
select alltypes from functional_orc_def.complextypes_structs;
---- RESULTS
'{"ti":100,"si":12348,"i":156789012,"bi":163234345342,"b":true,"f":1234.56005859375,"do":65323423.33,"da":"2021-05-30","ts":"2021-06-01 10:19:04","s1":"some string","s2":"another str","c1":"x","c2":"xyz","vc":"somevarcha","de1":12345,"de2":null}'
'{"ti":123,"si":4567,"i":1562322212,"bi":334333345342,"b":false,"f":NaN,"do":23233423.099,"da":null,"ts":"2020-06-11 12:10:04","s1":null,"s2":"NULL","c1":"a","c2":"ab ","vc":"varchar","de1":11223,"de2":null}'
'{"ti":null,"si":null,"i":null,"bi":null,"b":null,"f":null,"do":null,"da":null,"ts":null,"s1":null,"s2":null,"c1":null,"c2":null,"vc":null,"de1":null,"de2":null}'
'{"ti":90,"si":30482,"i":1664336,"bi":23567459873,"b":true,"f":0.5600000023841858,"do":NaN,"da":"2000-12-31","ts":"2024-01-01 00:00:00.123400000","s1":"random string","s2":"","c1":"c","c2":"d ","vc":"addsdrr","de1":33357,"de2":null}'
'NULL'
'{"ti":127,"si":100,"i":234732212,"bi":664233223342,"b":true,"f":34.56000137329102,"do":99523423.33,"da":"1985-11-19","ts":"2020-09-15 03:11:22","s1":"string1","s2":"string2","c1":"z","c2":" ","vc":"cv","de1":346,"de2":6235.600}'
---- TYPES
STRING
====
---- QUERY
# Query a single struct slot and order by a member of the struct.
select alltypes from functional_orc_def.complextypes_structs order by alltypes.si;
---- RESULTS: VERIFY_IS_EQUAL_SORTED
'{"ti":127,"si":100,"i":234732212,"bi":664233223342,"b":true,"f":34.56000137329102,"do":99523423.33,"da":"1985-11-19","ts":"2020-09-15 03:11:22","s1":"string1","s2":"string2","c1":"z","c2":" ","vc":"cv","de1":346,"de2":6235.600}'
'{"ti":123,"si":4567,"i":1562322212,"bi":334333345342,"b":false,"f":NaN,"do":23233423.099,"da":null,"ts":"2020-06-11 12:10:04","s1":null,"s2":"NULL","c1":"a","c2":"ab ","vc":"varchar","de1":11223,"de2":null}'
'{"ti":100,"si":12348,"i":156789012,"bi":163234345342,"b":true,"f":1234.56005859375,"do":65323423.33,"da":"2021-05-30","ts":"2021-06-01 10:19:04","s1":"some string","s2":"another str","c1":"x","c2":"xyz","vc":"somevarcha","de1":12345,"de2":null}'
'{"ti":90,"si":30482,"i":1664336,"bi":23567459873,"b":true,"f":0.5600000023841858,"do":NaN,"da":"2000-12-31","ts":"2024-01-01 00:00:00.123400000","s1":"random string","s2":"","c1":"c","c2":"d ","vc":"addsdrr","de1":33357,"de2":null}'
'{"ti":null,"si":null,"i":null,"bi":null,"b":null,"f":null,"do":null,"da":null,"ts":null,"s1":null,"s2":null,"c1":null,"c2":null,"vc":null,"de1":null,"de2":null}'
'NULL'
---- TYPES
STRING
====
---- QUERY
# Query struct slots only.
select small_struct, alltypes from functional_orc_def.complextypes_structs;
---- RESULTS
'NULL','{"ti":100,"si":12348,"i":156789012,"bi":163234345342,"b":true,"f":1234.56005859375,"do":65323423.33,"da":"2021-05-30","ts":"2021-06-01 10:19:04","s1":"some string","s2":"another str","c1":"x","c2":"xyz","vc":"somevarcha","de1":12345,"de2":null}'
'{"i":19191,"s":"small_struct_str"}','{"ti":123,"si":4567,"i":1562322212,"bi":334333345342,"b":false,"f":NaN,"do":23233423.099,"da":null,"ts":"2020-06-11 12:10:04","s1":null,"s2":"NULL","c1":"a","c2":"ab ","vc":"varchar","de1":11223,"de2":null}'
'{"i":98765,"s":null}','{"ti":null,"si":null,"i":null,"bi":null,"b":null,"f":null,"do":null,"da":null,"ts":null,"s1":null,"s2":null,"c1":null,"c2":null,"vc":null,"de1":null,"de2":null}'
'{"i":null,"s":"str"}','{"ti":90,"si":30482,"i":1664336,"bi":23567459873,"b":true,"f":0.5600000023841858,"do":NaN,"da":"2000-12-31","ts":"2024-01-01 00:00:00.123400000","s1":"random string","s2":"","c1":"c","c2":"d ","vc":"addsdrr","de1":33357,"de2":null}'
'{"i":98765,"s":"abcde f"}','NULL'
'{"i":null,"s":null}','{"ti":127,"si":100,"i":234732212,"bi":664233223342,"b":true,"f":34.56000137329102,"do":99523423.33,"da":"1985-11-19","ts":"2020-09-15 03:11:22","s1":"string1","s2":"string2","c1":"z","c2":" ","vc":"cv","de1":346,"de2":6235.600}'
---- TYPES
STRING,STRING
====
---- QUERY
# Query struct slot in a join query.
select allt.id, comt.alltypes
from functional_orc_def.alltypes allt
join functional_orc_def.complextypes_structs comt on allt.id = comt.id;
---- RESULTS
1,'{"ti":100,"si":12348,"i":156789012,"bi":163234345342,"b":true,"f":1234.56005859375,"do":65323423.33,"da":"2021-05-30","ts":"2021-06-01 10:19:04","s1":"some string","s2":"another str","c1":"x","c2":"xyz","vc":"somevarcha","de1":12345,"de2":null}'
2,'{"ti":123,"si":4567,"i":1562322212,"bi":334333345342,"b":false,"f":NaN,"do":23233423.099,"da":null,"ts":"2020-06-11 12:10:04","s1":null,"s2":"NULL","c1":"a","c2":"ab ","vc":"varchar","de1":11223,"de2":null}'
3,'{"ti":null,"si":null,"i":null,"bi":null,"b":null,"f":null,"do":null,"da":null,"ts":null,"s1":null,"s2":null,"c1":null,"c2":null,"vc":null,"de1":null,"de2":null}'
4,'{"ti":90,"si":30482,"i":1664336,"bi":23567459873,"b":true,"f":0.5600000023841858,"do":NaN,"da":"2000-12-31","ts":"2024-01-01 00:00:00.123400000","s1":"random string","s2":"","c1":"c","c2":"d ","vc":"addsdrr","de1":33357,"de2":null}'
5,'NULL'
6,'{"ti":127,"si":100,"i":234732212,"bi":664233223342,"b":true,"f":34.56000137329102,"do":99523423.33,"da":"1985-11-19","ts":"2020-09-15 03:11:22","s1":"string1","s2":"string2","c1":"z","c2":" ","vc":"cv","de1":346,"de2":6235.600}'
---- TYPES
INT,STRING
====
---- QUERY
# Similar join query as above but with different join order.
select allt.id, comt.alltypes
from functional_orc_def.complextypes_structs comt
join functional_orc_def.alltypes allt on comt.id = allt.id;
---- RESULTS
1,'{"ti":100,"si":12348,"i":156789012,"bi":163234345342,"b":true,"f":1234.56005859375,"do":65323423.33,"da":"2021-05-30","ts":"2021-06-01 10:19:04","s1":"some string","s2":"another str","c1":"x","c2":"xyz","vc":"somevarcha","de1":12345,"de2":null}'
2,'{"ti":123,"si":4567,"i":1562322212,"bi":334333345342,"b":false,"f":NaN,"do":23233423.099,"da":null,"ts":"2020-06-11 12:10:04","s1":null,"s2":"NULL","c1":"a","c2":"ab ","vc":"varchar","de1":11223,"de2":null}'
3,'{"ti":null,"si":null,"i":null,"bi":null,"b":null,"f":null,"do":null,"da":null,"ts":null,"s1":null,"s2":null,"c1":null,"c2":null,"vc":null,"de1":null,"de2":null}'
4,'{"ti":90,"si":30482,"i":1664336,"bi":23567459873,"b":true,"f":0.5600000023841858,"do":NaN,"da":"2000-12-31","ts":"2024-01-01 00:00:00.123400000","s1":"random string","s2":"","c1":"c","c2":"d ","vc":"addsdrr","de1":33357,"de2":null}'
5,'NULL'
6,'{"ti":127,"si":100,"i":234732212,"bi":664233223342,"b":true,"f":34.56000137329102,"do":99523423.33,"da":"1985-11-19","ts":"2020-09-15 03:11:22","s1":"string1","s2":"string2","c1":"z","c2":" ","vc":"cv","de1":346,"de2":6235.600}'
---- TYPES
INT,STRING
====
---- QUERY
# Querying IS NULL on a member of a struct.
select id, str, alltypes
from functional_orc_def.complextypes_structs
where alltypes.da is null;
---- RESULTS
2,'second item','{"ti":123,"si":4567,"i":1562322212,"bi":334333345342,"b":false,"f":NaN,"do":23233423.099,"da":null,"ts":"2020-06-11 12:10:04","s1":null,"s2":"NULL","c1":"a","c2":"ab ","vc":"varchar","de1":11223,"de2":null}'
3,'third item','{"ti":null,"si":null,"i":null,"bi":null,"b":null,"f":null,"do":null,"da":null,"ts":null,"s1":null,"s2":null,"c1":null,"c2":null,"vc":null,"de1":null,"de2":null}'
5,'fifth item','NULL'
---- TYPES
INT,STRING,STRING
====
---- QUERY
# Querying a struct that is inside a nested array. Directly referencing the inner array
# in the FROM clause. This also triggers a re-analysis of the statement as the table is
# full ACID.
select inner_arr.ITEM, inner_arr.ITEM.e, inner_arr.ITEM.f
from functional_orc_def.complextypestbl.nested_struct.c.d.ITEM as inner_arr;
---- RESULTS
'{"e":-1,"f":"nonnullable"}',-1,'nonnullable'
'{"e":10,"f":"aaa"}',10,'aaa'
'{"e":-10,"f":"bbb"}',-10,'bbb'
'{"e":11,"f":"c"}',11,'c'
'{"e":null,"f":null}',NULL,'NULL'
'{"e":10,"f":"aaa"}',10,'aaa'
'{"e":null,"f":null}',NULL,'NULL'
'{"e":-10,"f":"bbb"}',-10,'bbb'
'{"e":null,"f":null}',NULL,'NULL'
'{"e":11,"f":"c"}',11,'c'
'NULL',NULL,'NULL'
'NULL',NULL,'NULL'
---- TYPES
STRING,INT,STRING
====
---- QUERY
# Querying a struct that is inside a nested array. Referencing the inner array through a
# join with the base table.
select tbl.id, inner_arr.ITEM
from functional_orc_def.complextypestbl tbl, tbl.nested_struct.c.d.ITEM as inner_arr;
---- RESULTS
8,'{"e":-1,"f":"nonnullable"}'
1,'{"e":10,"f":"aaa"}'
1,'{"e":-10,"f":"bbb"}'
1,'{"e":11,"f":"c"}'
2,'{"e":null,"f":null}'
2,'{"e":10,"f":"aaa"}'
2,'{"e":null,"f":null}'
2,'{"e":-10,"f":"bbb"}'
2,'{"e":null,"f":null}'
2,'{"e":11,"f":"c"}'
2,'NULL'
7,'NULL'
---- TYPES
BIGINT,STRING
====
---- QUERY
# Querying a struct that is inside a nested array. Uses 2 joins to reference the inner
# array from the FROM clause.
select tbl.id, inner_arr.ITEM
from functional_orc_def.complextypestbl tbl,
tbl.nested_struct.c.d as outer_arr, outer_arr.ITEM as inner_arr;
---- RESULTS
8,'{"e":-1,"f":"nonnullable"}'
1,'{"e":10,"f":"aaa"}'
1,'{"e":-10,"f":"bbb"}'
1,'{"e":11,"f":"c"}'
2,'{"e":null,"f":null}'
2,'{"e":10,"f":"aaa"}'
2,'{"e":null,"f":null}'
2,'{"e":-10,"f":"bbb"}'
2,'{"e":null,"f":null}'
2,'{"e":11,"f":"c"}'
2,'NULL'
7,'NULL'
---- TYPES
BIGINT,STRING
====
---- QUERY
# Querying a struct that is inside a nested array. Uses different kinds of joins to
# reference the inner array from the FROM clause.
select tbl.id, inner_arr.ITEM
from functional_orc_def.complextypestbl tbl left join
tbl.nested_struct.c.d as outer_arr inner join outer_arr.ITEM as inner_arr;
---- RESULTS
8,'{"e":-1,"f":"nonnullable"}'
1,'{"e":10,"f":"aaa"}'
1,'{"e":-10,"f":"bbb"}'
1,'{"e":11,"f":"c"}'
2,'{"e":null,"f":null}'
2,'{"e":10,"f":"aaa"}'
2,'{"e":null,"f":null}'
2,'{"e":-10,"f":"bbb"}'
2,'{"e":null,"f":null}'
2,'{"e":11,"f":"c"}'
2,'NULL'
7,'NULL'
---- TYPES
BIGINT,STRING
====
---- QUERY
# Similar query as above but with an order by.
select tbl.id, inner_arr.ITEM
from functional_orc_def.complextypestbl tbl,
tbl.nested_struct.c.d as outer_arr, outer_arr.ITEM as inner_arr
order by tbl.id;
---- RESULTS: VERIFY_IS_EQUAL_SORTED
1,'{"e":10,"f":"aaa"}'
1,'{"e":-10,"f":"bbb"}'
1,'{"e":11,"f":"c"}'
2,'{"e":null,"f":null}'
2,'{"e":10,"f":"aaa"}'
2,'{"e":null,"f":null}'
2,'{"e":-10,"f":"bbb"}'
2,'{"e":null,"f":null}'
2,'{"e":11,"f":"c"}'
2,'NULL'
7,'NULL'
8,'{"e":-1,"f":"nonnullable"}'
---- TYPES
BIGINT,STRING
====
---- QUERY
# Structs are allowed in an inline view.
select v.ts from
(select tiny_struct as ts from functional_orc_def.complextypes_structs) v
---- RESULTS
'{"b":true}'
'{"b":false}'
'{"b":true}'
'{"b":null}'
'{"b":false}'
'NULL'
---- TYPES
STRING
====
---- QUERY
# Structs in an inline view where the underlying file format does not support structs.
select v.ts from
(select int_struct_col as ts from functional.allcomplextypes) v
---- CATCH
AnalysisException: Querying STRUCT is only supported for ORC file format.
====
---- QUERY
# Structs in an inline view with order by.
select v.id, v.ts from
(select id, tiny_struct as ts
from functional_orc_def.complextypes_structs
order by id
limit 3) v
---- RESULTS
1,'{"b":true}'
2,'{"b":false}'
3,'{"b":true}'
---- TYPES
INT,STRING
====
---- QUERY
select v.id, v.ts from
(select id, tiny_struct as ts
from functional_orc_def.complextypes_structs
order by id
limit 3) v
order by id desc
---- RESULTS: VERIFY_IS_EQUAL_SORTED
3,'{"b":true}'
2,'{"b":false}'
1,'{"b":true}'
---- TYPES
INT,STRING
====
---- QUERY
select v.id, v.ts from
(select id, tiny_struct as ts
from functional_orc_def.complextypes_structs) v
order by id desc
---- RESULTS: VERIFY_IS_EQUAL_SORTED
6,'NULL'
5,'{"b":false}'
4,'{"b":null}'
3,'{"b":true}'
2,'{"b":false}'
1,'{"b":true}'
---- TYPES
INT,STRING
====
---- QUERY
# CREATE VIEW AS SELECT where the select returns a struct.
create view $DATABASE.struct_view as select id, small_struct
from functional_orc_def.complextypes_structs;
select id, small_struct from $DATABASE.struct_view;
---- RESULTS
1,'NULL'
2,'{"i":19191,"s":"small_struct_str"}'
3,'{"i":98765,"s":null}'
4,'{"i":null,"s":"str"}'
5,'{"i":98765,"s":"abcde f"}'
6,'{"i":null,"s":null}'
---- TYPES
INT,STRING
====
---- QUERY
# WITH clause creates an inline view containing a struct.
with sub as (select id, small_struct from functional_orc_def.complextypes_structs)
select sub.id, sub.small_struct from sub;
---- RESULTS
1,'NULL'
2,'{"i":19191,"s":"small_struct_str"}'
3,'{"i":98765,"s":null}'
4,'{"i":null,"s":"str"}'
5,'{"i":98765,"s":"abcde f"}'
6,'{"i":null,"s":null}'
---- TYPES
INT,STRING
====
---- QUERY
# WITH clause creates an inline view containing a struct. Also has a filter on the inline
# view and ordering by a non-complex item from the view.
with sub as (
select id, small_struct
from functional_orc_def.complextypes_structs
where small_struct.i > 19200)
select sub.id, sub.small_struct from sub order by sub.id desc;
---- RESULTS: VERIFY_IS_EQUAL_SORTED
5,'{"i":98765,"s":"abcde f"}'
3,'{"i":98765,"s":null}'
---- TYPES
INT,STRING
====
---- QUERY
# Create a view containing structs and query the view.
create view tmp_view as
select id, str, tiny_struct, alltypes from functional_orc_def.complextypes_structs;
select id, alltypes, tiny_struct from tmp_view;
---- RESULTS
1,'{"ti":100,"si":12348,"i":156789012,"bi":163234345342,"b":true,"f":1234.56005859375,"do":65323423.33,"da":"2021-05-30","ts":"2021-06-01 10:19:04","s1":"some string","s2":"another str","c1":"x","c2":"xyz","vc":"somevarcha","de1":12345,"de2":null}','{"b":true}'
2,'{"ti":123,"si":4567,"i":1562322212,"bi":334333345342,"b":false,"f":NaN,"do":23233423.099,"da":null,"ts":"2020-06-11 12:10:04","s1":null,"s2":"NULL","c1":"a","c2":"ab ","vc":"varchar","de1":11223,"de2":null}','{"b":false}'
3,'{"ti":null,"si":null,"i":null,"bi":null,"b":null,"f":null,"do":null,"da":null,"ts":null,"s1":null,"s2":null,"c1":null,"c2":null,"vc":null,"de1":null,"de2":null}','{"b":true}'
4,'{"ti":90,"si":30482,"i":1664336,"bi":23567459873,"b":true,"f":0.5600000023841858,"do":NaN,"da":"2000-12-31","ts":"2024-01-01 00:00:00.123400000","s1":"random string","s2":"","c1":"c","c2":"d ","vc":"addsdrr","de1":33357,"de2":null}','{"b":null}'
5,'NULL','{"b":false}'
6,'{"ti":127,"si":100,"i":234732212,"bi":664233223342,"b":true,"f":34.56000137329102,"do":99523423.33,"da":"1985-11-19","ts":"2020-09-15 03:11:22","s1":"string1","s2":"string2","c1":"z","c2":" ","vc":"cv","de1":346,"de2":6235.600}','NULL'
---- TYPES
INT,STRING,STRING
====
---- QUERY
# Query a struct from a partitioned table to check multi-fragment execution.
set disable_outermost_topn = 1;
select id, struct_val from functional_orc_def.alltypes_structs order by id desc limit 5;
---- RESULTS: VERIFY_IS_EQUAL_SORTED
7299,'{"bool_col":false,"tinyint_col":9,"smallint_col":9,"int_col":9,"bigint_col":90,"float_col":9.899999618530273,"double_col":90.89999999999999,"date_string_col":"12/31/10","string_col":"9","timestamp_col":"2010-12-31 05:09:13.860000000"}'
7298,'{"bool_col":true,"tinyint_col":8,"smallint_col":8,"int_col":8,"bigint_col":80,"float_col":8.800000190734863,"double_col":80.8,"date_string_col":"12/31/10","string_col":"8","timestamp_col":"2010-12-31 05:08:13.780000000"}'
7297,'{"bool_col":false,"tinyint_col":7,"smallint_col":7,"int_col":7,"bigint_col":70,"float_col":7.699999809265137,"double_col":70.7,"date_string_col":"12/31/10","string_col":"7","timestamp_col":"2010-12-31 05:07:13.710000000"}'
7296,'{"bool_col":true,"tinyint_col":6,"smallint_col":6,"int_col":6,"bigint_col":60,"float_col":6.599999904632568,"double_col":60.59999999999999,"date_string_col":"12/31/10","string_col":"6","timestamp_col":"2010-12-31 05:06:13.650000000"}'
7295,'{"bool_col":false,"tinyint_col":5,"smallint_col":5,"int_col":5,"bigint_col":50,"float_col":5.5,"double_col":50.5,"date_string_col":"12/31/10","string_col":"5","timestamp_col":"2010-12-31 05:05:13.600000000"}'
---- TYPES
INT,STRING
====
---- QUERY
# Query the same struct multiple times from a partitioned table.
select id, struct_val, struct_val from functional_orc_def.alltypes_structs order by id limit 2;
---- RESULTS: VERIFY_IS_EQUAL_SORTED
0,'{"bool_col":true,"tinyint_col":0,"smallint_col":0,"int_col":0,"bigint_col":0,"float_col":0,"double_col":0,"date_string_col":"01/01/09","string_col":"0","timestamp_col":"2009-01-01 00:00:00"}','{"bool_col":true,"tinyint_col":0,"smallint_col":0,"int_col":0,"bigint_col":0,"float_col":0,"double_col":0,"date_string_col":"01/01/09","string_col":"0","timestamp_col":"2009-01-01 00:00:00"}'
1,'{"bool_col":false,"tinyint_col":1,"smallint_col":1,"int_col":1,"bigint_col":10,"float_col":1.100000023841858,"double_col":10.1,"date_string_col":"01/01/09","string_col":"1","timestamp_col":"2009-01-01 00:01:00"}','{"bool_col":false,"tinyint_col":1,"smallint_col":1,"int_col":1,"bigint_col":10,"float_col":1.100000023841858,"double_col":10.1,"date_string_col":"01/01/09","string_col":"1","timestamp_col":"2009-01-01 00:01:00"}'
---- TYPES
INT,STRING,STRING
====
---- QUERY
# Query a struct from a partitioned table with a WHERE clause on the struct's members.
select id, struct_val
from functional_orc_def.alltypes_structs
where struct_val.tinyint_col=8 and struct_val.timestamp_col > "2010-12-30";
---- RESULTS
7288,'{"bool_col":true,"tinyint_col":8,"smallint_col":8,"int_col":8,"bigint_col":80,"float_col":8.800000190734863,"double_col":80.8,"date_string_col":"12/30/10","string_col":"8","timestamp_col":"2010-12-30 04:58:13.330000000"}'
7298,'{"bool_col":true,"tinyint_col":8,"smallint_col":8,"int_col":8,"bigint_col":80,"float_col":8.800000190734863,"double_col":80.8,"date_string_col":"12/31/10","string_col":"8","timestamp_col":"2010-12-31 05:08:13.780000000"}'
---- TYPES
INT,STRING
====
---- QUERY
# It's not supported to create a view with structs from a table type that doesn't
# support selecting structs.
create view tmp_view as select id, int_struct_col from functional_hbase.allcomplextypes;
---- CATCH
is not supported when querying STRUCT type STRUCT<f1:INT,f2:INT>
====
---- QUERY
# It's not supported to create a view with structs from a file format that doesn't
# support selecting structs.
create view tmp_view as select id, int_struct_col from functional.allcomplextypes;
---- CATCH
AnalysisException: Querying STRUCT is only supported for ORC file format.
====
---- QUERY
# Querying IS NULL on a struct is not supported.
# IMPALA-3060
select id, str, alltypes
from functional_orc_def.complextypes_structs
where alltypes is null;
---- CATCH
AnalysisException: IS NULL predicate does not support complex types: alltypes IS NULL
====
---- QUERY
# Subquery that returns a complex type is not supported.
# IMPALA-9500
select alltypes
from functional_orc_def.complextypes_structs
where alltypes in (select alltypes from functional_orc_def.complextypes_structs);
---- CATCH
AnalysisException: A subquery can't return complex types. (SELECT alltypes FROM functional_orc_def.complextypes_structs)
====
---- QUERY
select tbl.nested_struct from functional_orc_def.complextypestbl tbl;
---- CATCH
AnalysisException: Struct containing a collection type is not allowed in the select list.
====
---- QUERY
select tbl.nested_struct.c from functional_orc_def.complextypestbl tbl;
---- CATCH
AnalysisException: Struct containing a collection type is not allowed in the select list.
====
---- QUERY
# Unioning structs is not supported.
# IMPALA-10752
select id, tiny_struct from functional_orc_def.complextypes_structs
union all
select id, tiny_struct from functional_orc_def.complextypes_structs;
---- CATCH
AnalysisException: Set operations don't support STRUCT type. STRUCT<b:BOOLEAN> in tiny_struct
====
---- QUERY
# Ordering by struct column is not supported.
select id, tiny_struct from functional_orc_def.complextypes_structs
order by tiny_struct
---- CATCH
AnalysisException: ORDER BY expression 'tiny_struct' with complex type 'STRUCT<b:BOOLEAN>' is not supported.
====
---- QUERY
# Ordering by struct column (using the index of the column) is not supported.
select id, tiny_struct from functional_orc_def.complextypes_structs
order by 2
---- CATCH
AnalysisException: ORDER BY expression 'tiny_struct' with complex type 'STRUCT<b:BOOLEAN>' is not supported.
====
---- QUERY
# Check that ORDER BY doesn't confuse the 3rd ordinal with a member of the struct.
select id, tiny_struct from functional_orc_def.complextypes_structs
order by 3
---- CATCH
AnalysisException: ORDER BY: ordinal exceeds the number of items in the SELECT list: 3
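Note on the expected values throughout this file: a struct renders as compact JSON, and a NULL struct renders as the bare string 'NULL'. A minimal Python sketch of this formatting convention (the helper name `format_struct` is hypothetical, not part of the patch):

```python
import json

def format_struct(struct_val):
    """Mimic how a struct value appears in the expected results above."""
    # A NULL struct renders as the bare string 'NULL' (see rows like "1,'NULL'");
    # NULL members inside a non-NULL struct stay as JSON null.
    if struct_val is None:
        return 'NULL'
    # Compact separators match the expected-result strings exactly.
    return json.dumps(struct_val, separators=(',', ':'))

print(format_struct({"i": 19191, "s": "small_struct_str"}))
# -> {"i":19191,"s":"small_struct_str"}
print(format_struct({"i": 98765, "s": None}))
# -> {"i":98765,"s":null}
```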


@@ -29,6 +29,8 @@ from getpass import getuser
from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
from tests.common.skip import (SkipIfS3, SkipIfABFS, SkipIfADLS, SkipIfIsilon,
SkipIfLocal, SkipIfHive2, SkipIfGCS)
from tests.common.test_dimensions import (create_client_protocol_dimension,
create_exec_option_dimension, create_orc_dimension)
from tests.util.hdfs_util import NAMENODE
from tests.util.calculation_util import get_random_id
@@ -1573,3 +1575,48 @@ class TestRangerColumnMaskingTpchNested(CustomClusterTestSuite):
      for col in tbl_cols[tbl]:
        policy_name = "%s_%s_mask" % (tbl, col)
        TestRanger._remove_policy(policy_name)


class TestRangerColumnMaskingComplexTypesInSelectList(CustomClusterTestSuite):
  """
  Tests Ranger policies when complex types are given in the select list. This is a
  separate class because directly querying complex types works only over HS2, while
  some tests in TestRanger need the Beeswax interface and would otherwise fail.
  """

  @classmethod
  def get_workload(cls):
    return 'functional-query'

  @classmethod
  def add_test_dimensions(cls):
    cls.ImpalaTestMatrix.add_dimension(create_client_protocol_dimension())
    cls.ImpalaTestMatrix.add_dimension(create_orc_dimension(cls.get_workload()))
    cls.ImpalaTestMatrix.add_constraint(lambda v:
        v.get_value('protocol') == 'hs2')
    cls.ImpalaTestMatrix.add_dimension(create_exec_option_dimension(
        disable_codegen_options=[True]))

  @pytest.mark.execute_serially
  @CustomClusterTestSuite.with_args(
    impalad_args=IMPALAD_ARGS, catalogd_args=CATALOGD_ARGS)
  def test_column_masking_with_structs_in_select_list(self, vector, unique_name):
    user = getuser()
    db = "functional_orc_def"
    # Create another client for the admin user since the current user doesn't have
    # privileges to create/drop databases or refresh authorization.
    admin_client = self.create_impala_client()
    policy_cnt = 0
    try:
      # Add a policy on a primitive column of a table which contains nested columns.
      TestRanger._add_column_masking_policy(
          unique_name + str(policy_cnt), user, "functional_orc_def",
          "complextypes_structs", "str", "MASK_NULL")
      policy_cnt += 1
      self.execute_query_expect_success(admin_client, "refresh authorization",
          user=ADMIN)
      self.run_test_case("QueryTest/ranger_column_masking_struct_in_select_list",
          vector, use_db=db)
    finally:
      for i in range(policy_cnt):
        TestRanger._remove_policy(unique_name + str(i))
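The MASK_NULL policy added above replaces every value of the masked column ("str") with NULL while leaving other columns, including structs, untouched. A conceptual sketch of that behavior (hypothetical helper, not Ranger's actual implementation):

```python
def apply_mask_null(rows, masked_cols):
    # MASK_NULL: every value of a masked column becomes NULL (None);
    # all other columns, nested or not, pass through unchanged.
    return [
        {col: (None if col in masked_cols else val) for col, val in row.items()}
        for row in rows
    ]

rows = [{"id": 1, "str": "first item", "tiny_struct": {"b": True}}]
print(apply_mask_null(rows, {"str"}))
# -> [{'id': 1, 'str': None, 'tiny_struct': {'b': True}}]
```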


@@ -109,6 +109,11 @@ def create_parquet_dimension(workload):
      TableFormatInfo.create_from_string(dataset, 'parquet/none'))


def create_orc_dimension(workload):
  dataset = get_dataset_from_workload(workload)
  return ImpalaTestDimension('table_format',
      TableFormatInfo.create_from_string(dataset, 'orc/def'))


def create_avro_snappy_dimension(workload):
  dataset = get_dataset_from_workload(workload)
  return ImpalaTestDimension('table_format',


@@ -34,6 +34,8 @@ from tests.common.skip import (
    SkipIfLocal,
    SkipIfNotHdfsMinicluster
)
from tests.common.test_dimensions import (create_exec_option_dimension,
    create_exec_option_dimension_from_dict, create_client_protocol_dimension)
from tests.common.test_vector import ImpalaTestDimension
from tests.util.filesystem_utils import WAREHOUSE, get_fs_path, IS_HDFS
@@ -112,6 +114,85 @@ class TestNestedTypes(ImpalaTestSuite):
                       use_db='tpch_nested' + db_suffix)


class TestNestedTypesInSelectList(ImpalaTestSuite):
  """Functional tests for nested types provided in the select list."""

  @classmethod
  def get_workload(self):
    return 'functional-query'

  @classmethod
  def add_test_dimensions(cls):
    super(TestNestedTypesInSelectList, cls).add_test_dimensions()
    cls.ImpalaTestMatrix.add_constraint(lambda v:
        v.get_value('table_format').file_format in ['parquet', 'orc'])
    cls.ImpalaTestMatrix.add_dimension(
        ImpalaTestDimension('mt_dop', 0, 2))
    cls.ImpalaTestMatrix.add_dimension(
        create_exec_option_dimension_from_dict({
            'disable_codegen': ['False', 'True']}))
    cls.ImpalaTestMatrix.add_dimension(create_client_protocol_dimension())
    cls.ImpalaTestMatrix.add_constraint(lambda v:
        v.get_value('protocol') == 'hs2')

  def test_struct_in_select_list(self, vector, unique_database):
    """Queries where a struct column is in the select list."""
    if vector.get_value('table_format').file_format == 'parquet':
      pytest.skip()
    if vector.get_value('exec_option')['disable_codegen'] == 'False':
      pytest.skip()
    self.run_test_case('QueryTest/struct-in-select-list', vector, unique_database)

  def test_nested_struct_in_select_list(self, vector, unique_database):
    """Queries where a nested struct column is in the select list."""
    if vector.get_value('table_format').file_format == 'parquet':
      pytest.skip()
    if vector.get_value('exec_option')['disable_codegen'] == 'False':
      pytest.skip()
    self.run_test_case('QueryTest/nested-struct-in-select-list', vector,
        unique_database)


# Moved to a separate test class from TestNestedTypesInSelectList because this needs
# a narrower test vector.
class TestNestedTypesInSelectListWithBeeswax(ImpalaTestSuite):
  """Functional tests for nested types provided in the select list."""

  @classmethod
  def get_workload(self):
    return 'functional-query'

  @classmethod
  def add_test_dimensions(cls):
    cls.ImpalaTestMatrix.add_dimension(create_client_protocol_dimension())
    cls.ImpalaTestMatrix.add_constraint(lambda v:
        v.get_value('protocol') == 'beeswax')
    cls.ImpalaTestMatrix.add_dimension(create_exec_option_dimension(
        disable_codegen_options=[True]))

  def test_struct_with_beeswax(self, vector):
    expected_err = "Returning complex types is not supported through the beeswax " + \
        "interface"
    err = self.execute_query_expect_failure(self.client,
        "select tiny_struct from functional_orc_def.complextypes_structs",
        vector.get_value('exec_option'))
    assert expected_err in str(err)


class TestComputeStatsWithNestedTypes(ImpalaTestSuite):
  """Functional tests for running compute stats on tables that have nested types in
  the columns."""

  @classmethod
  def get_workload(self):
    return 'functional-query'

  @classmethod
  def add_test_dimensions(cls):
    super(TestComputeStatsWithNestedTypes, cls).add_test_dimensions()
    cls.ImpalaTestMatrix.add_constraint(lambda v:
        v.get_value('table_format').file_format in ['parquet', 'orc'])

  def test_compute_stats_with_structs(self, vector):
    """COMPUTE STATS and SHOW COLUMN STATS for tables with structs."""
    self.run_test_case('QueryTest/compute-stats-with-structs', vector)


class TestNestedTypesNoMtDop(ImpalaTestSuite):
  """Functional tests for nested types that do not need to be run with mt_dop > 0."""

  @classmethod