impala/testdata/workloads/functional-query/queries/QueryTest/insert_null.test
Lenni Kuff a2cbd2820e Add Catalog Service and support for automatic metadata refresh
The Impala CatalogService manages the caching and dissemination of cluster-wide metadata.
The CatalogService combines the metadata from the Hive Metastore, the NameNode,
and potentially additional sources in the future. The CatalogService uses the
StateStore to broadcast metadata updates across the cluster.
The CatalogService also directly executes metadata update requests (DDL requests) from
impalad servers. It exposes a Thrift interface that lets impalads connect directly and
execute their DDL operations.
The CatalogService has two main components. The first is a C++ server that implements
StateStore integration, the Thrift service, and export of the debug webpage/metrics.
The second is the Java Catalog, which manages caching and updating of all the metadata.
For each StateStore heartbeat, a delta of all metadata updates is broadcast
to the rest of the cluster.
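
The per-heartbeat delta can be illustrated with a minimal sketch. It assumes every
catalog object is stamped with the catalog version at which it last changed. The names
CatalogSketch, update, and delta_since are hypothetical, chosen for illustration, and are
not the classes added by this change.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogSketch:
    """Hypothetical stand-in for a versioned metadata cache (not Impala's API)."""
    version: int = 0
    # Maps object name -> (version of last change, payload).
    objects: dict = field(default_factory=dict)

    def update(self, name, payload):
        """Record a change and return the catalog version that contains it."""
        self.version += 1
        self.objects[name] = (self.version, payload)
        return self.version

    def delta_since(self, last_sent_version):
        """Return only the objects changed after last_sent_version."""
        return {
            name: payload
            for name, (changed_at, payload) in self.objects.items()
            if changed_at > last_sent_version
        }
```

A heartbeat handler in this sketch would call delta_since with the version it sent last
time, so unchanged objects are not rebroadcast.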

Some Notes On the Changes
---
* The metadata is all sent as Thrift structs. To do this, all catalog objects (Tables/Views,
Databases, UDFs) have a Thrift struct to represent them. These are sent with each statestore
delta update.
* The existing Catalog class has been separated into two subclasses: an
ImpaladCatalog and a CatalogServiceCatalog. See the comments on those classes for more
details.

What is working:
* New CatalogService created
* Working with statestore delta updates and latest UDF changes
* DDL performed on Node 1 is now visible on all other nodes without a "refresh".
* Each DDL operation against the Catalog Service will return the catalog version that
  contains the change. An impalad will wait for the statestore heartbeat that contains this
  version before returning from the DDL command.
* All table types (HBase, HDFS, Views) getting their metadata propagated properly
* Block location information included in CS updates and used by Impalads
* Column and table stats included in CS updates and used by Impalads
* Query tests are all passing
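
The "wait for the catalog version" behavior described in the list above can be sketched
as follows. This is an illustration under stated assumptions, not the code in this change:
ImpaladCatalogSketch, apply_heartbeat, and wait_for_version are hypothetical names, and a
condition variable is only one possible way to block until the version arrives.

```python
import threading


class ImpaladCatalogSketch:
    """Hypothetical local catalog that tracks the last applied catalog version."""

    def __init__(self):
        self._version = 0
        self._cond = threading.Condition()

    def apply_heartbeat(self, version):
        """Called when a statestore update carrying catalog `version` is applied."""
        with self._cond:
            self._version = max(self._version, version)
            self._cond.notify_all()

    def wait_for_version(self, min_version, timeout=None):
        """Block until the local catalog reaches min_version.

        Returns True if the version was reached, False on timeout.
        """
        with self._cond:
            return self._cond.wait_for(
                lambda: self._version >= min_version, timeout=timeout
            )
```

In this sketch, a DDL handler would call wait_for_version with the version returned by
the Catalog Service before reporting success to the client.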

Still TODO:
* Directly return catalog object metadata from DDL requests
* Poll the Hive Metastore to detect new/dropped/modified tables
* Reorganize the FE code for the Catalog Service. I don't think we want everything in the
  same JAR.

Change-Id: I8c61296dac28fb98bcfdc17361f4f141d3977eda
Reviewed-on: http://gerrit.ent.cloudera.com:8080/601
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:11 -08:00


====
---- QUERY
# Test that we properly write null values to text tables.
insert overwrite table nullinsert
select NULL, "", "NULL", "\\N", NULL from alltypes limit 1
---- SETUP
RESET nullinsert
---- RESULTS
: 1
====
---- QUERY
select * from nullinsert
---- TYPES
string, string, string, string, int
---- RESULTS
'NULL','','NULL','\N',NULL
====
---- QUERY
select * from nullinsert_alt
---- TYPES
string
---- SETUP
RESET nullinsert_alt
---- RESULTS
'\N,,NULL,\\N,\N'
====
---- QUERY
# Test NULL partition keys using static partition insert. Both partition keys are NULL.
insert overwrite table alltypesinsert
partition(year=NULL, month=NULL)
select id, bool_col, tinyint_col, smallint_col, int_col, bigint_col,
float_col, double_col, date_string_col, string_col, timestamp_col
from alltypessmall
where year=2009 and month=4
---- SETUP
DROP PARTITIONS alltypesinsert
---- RESULTS
year=__HIVE_DEFAULT_PARTITION__/month=__HIVE_DEFAULT_PARTITION__/: 25
====
---- QUERY
# Verify contents of alltypesinsert.
select count(*) from alltypesinsert where year is null and month is null
---- TYPES
bigint
---- RESULTS
25
====
---- QUERY
# Verify that dropping NULL partitions works in the SETUP section.
select * from alltypesinsert
---- SETUP
DROP PARTITIONS alltypesinsert
---- TYPES
int, int, int, boolean, tinyint, smallint, int, bigint, float, double, string, string, timestamp
---- RESULTS
====
---- QUERY
# Test NULL partition keys using static partition insert. Year partition key is NULL.
insert overwrite table alltypesinsert
partition(year=NULL, month=10)
select id, bool_col, tinyint_col, smallint_col, int_col, bigint_col,
float_col, double_col, date_string_col, string_col, timestamp_col
from alltypessmall
where year=2009 and month=4
---- SETUP
DROP PARTITIONS alltypesinsert
---- RESULTS
year=__HIVE_DEFAULT_PARTITION__/month=10/: 25
====
---- QUERY
# Verify contents of alltypesinsert.
select count(*) from alltypesinsert where year is null and month=10
---- TYPES
bigint
---- RESULTS
25
====
---- QUERY
# Test NULL partition keys using static partition insert. Month partition key is NULL.
insert overwrite table alltypesinsert
partition(year=2008, month=NULL)
select id, bool_col, tinyint_col, smallint_col, int_col, bigint_col,
float_col, double_col, date_string_col, string_col, timestamp_col
from alltypessmall
where year=2009 and month=4
---- SETUP
DROP PARTITIONS alltypesinsert
---- RESULTS
year=2008/month=__HIVE_DEFAULT_PARTITION__/: 25
====
---- QUERY
# Verify contents of alltypesinsert.
select count(*) from alltypesinsert where year=2008 and month is null
---- TYPES
bigint
---- RESULTS
25
====
---- QUERY
# Test NULL partition keys using dynamic partition insert.
insert overwrite table alltypesinsert
partition(year, month)
select id, bool_col, tinyint_col, smallint_col, int_col, bigint_col,
float_col, double_col, date_string_col, string_col, timestamp_col,
cast(if(bool_col, NULL, 2007) as int) as year, cast(if(tinyint_col % 3 = 0, NULL, 6) as int) as month
from alltypessmall
where year=2009 and month=4
---- RESULTS: VERIFY_IS_EQUAL_SORTED
year=2007/month=6/: 8
year=2007/month=__HIVE_DEFAULT_PARTITION__/: 5
year=__HIVE_DEFAULT_PARTITION__/month=6/: 7
year=__HIVE_DEFAULT_PARTITION__/month=__HIVE_DEFAULT_PARTITION__/: 5
====
---- QUERY
# Verify contents of each new partition in alltypesinsert.
select count(*) from alltypesinsert where year=2007 and month=6
---- TYPES
bigint
---- RESULTS
8
====
---- QUERY
# Verify contents of each new partition in alltypesinsert.
select count(*) from alltypesinsert where year=2007 and month is null
---- TYPES
bigint
---- RESULTS
5
====
---- QUERY
# Verify contents of each new partition in alltypesinsert.
select count(*) from alltypesinsert where year is null and month=6
---- TYPES
bigint
---- RESULTS
7
====
---- QUERY
# Verify contents of each new partition in alltypesinsert.
select count(*) from alltypesinsert where year is null and month is null
---- TYPES
bigint
---- RESULTS
5
====
---- QUERY
# Insert nulls and non-null values into table with
# custom table property serialization.null.format='xyz'
insert overwrite nullformat_custom
select 1, NULL, NULL, NULL, NULL union all
select 2, true, "", 1, 1 union all
select 3, false, "NULL", 2, 2 union all
select 4, false, "xyz", 3, 3 union all
select 5, false, "xyzbar", 4, 4
---- RESULTS
: 5
====
---- QUERY
# Test correct interpretation of NULLs with custom
# table property serialization.null.format='xyz'
select id, a, b, b is null, c, d from nullformat_custom order by id limit 10
---- TYPES
int, boolean, string, boolean, int, double
---- RESULTS
1,NULL,'NULL',true,NULL,NULL
2,true,'',false,1,1
3,false,'NULL',false,2,2
4,false,'NULL',true,3,3
5,false,'xyzbar',false,4,4
====