Impala incorrectly returned NULLs in the "Max Size" column of the SHOW
COLUMN STATS result when executed through the HS2 interface. The issue
was that the column was specified to be type INT in the result schema,
but the actual type of the contents that we inserted into it was
"long". The reason why this is not an issue in Impala shell is because
we stringify the contents without inspecting the metadata for beeswax
results.
The issue was fixed by changing the type from INT to BIGINT.
Change-Id: I419657744635dfdc2e1562fe60a597617fff446e
Reviewed-on: http://gerrit.cloudera.org:8080/6109
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
HIVE-15653 is a Hive Metastore bug that results in ALTER TABLE
commands wiping the table stats of unpartitioned tables.
Until the Hive bug is fixed, this patch adds a workaround
to Impala that forces the Metastore to preserve the table stats.
Testing: Private core/hdfs run passed.
Change-Id: Ic191c765f73624bc716badadd7215c8dca9d6b1f
Reviewed-on: http://gerrit.cloudera.org:8080/5731
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
A handful of fixes to codegen memory usage:
* Delete the IR module when we're done with it (it can be fairly large)
* Track the compiled code size (typically not that large, but it can add
up if there are many fragments).
* Estimate optimisation memory requirements and track it in the memory
tracker. This is very crude but much better than not tracking it.
A handful of fixes to improve codegen time/cost, particularly targeted
at compute stats workloads:
* Avoid over-inlining when there are many aggregate functions,
conjuncts, etc by adding "NoInline" attributes.
* Don't codegen non-grouping merge aggregations. They will only process
one row per Impala daemon, so codegen is not worth it.
* Make the Hll algorithm more efficient by specialising the hash function
based on decimal width.
Limitations:
* This doesn't tackle over-inlining of large expr trees, but a similar
approach will be used there in a follow-on patch.
Perf:
Compute stats on functional_parquet.widetable_1000_cols goes from 1min+
of codegen to ~ 5s codegen on my machine. Local perf runs of tpc-h
and targeted perf showed no regressions and some moderate improvements
(1-2%).
Also did an experiment to understand the perf consequences of disabling
inlining. I manually set CODEGEN_INLINE_EXPRS_THRESHOLD to 0, and ran:
drop stats tpch_20_parquet.lineitem
compute stats tpch_20_parquet.lineitem;
There was no difference in time spent in the agg node: 30.7s with
inlining, 30.5s without.
Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Reviewed-on: http://gerrit.cloudera.org:8080/4956
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
This commit handles partition related DDL in a more general way. We can
now use compound predicates to specify a list of partitions in
statements like ALTER TABLE DROP PARTITION and COMPUTE INCREMENTAL
STATS, etc. It will also make sure some statements only accept one
partition at a time, such as PARTITION SET LOCATION and LOAD DATA. ALTER
TABLE ADD PARTITION remains using the old PartitionKeyValue's logic.
The changed partition related DDLs are as follows,
Table: p (i int) partitioned by (j int, k string)
Partitions:
+-------+---+-------+--------+------+--------------+-------------------+
| j | k | #Rows | #Files | Size | Bytes Cached | Cache Replication |
+-------+---+-------+--------+------+--------------+-------------------+
| 1 | a | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| 1 | b | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| 1 | c | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| 2 | d | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| 2 | e | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| 2 | f | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| Total | | -1 | 0 | 0B | 0B | |
+-------+---+-------+--------+------+--------------+-------------------+
1. show files in p partition (j<2, k='a');
2. alter table p partition (j<2, k in ("b","c") set cached in 'testPool';
// j can appear more than once,
3.1. alter table p partition (j<2, j>0, k<>"d") set uncached;
// it is the same as
3.2. alter table p partition (j<2 and j>0, not k="e") set uncached;
// we can also do 'or'
3.3. alter table p partition (j<2 or j>0, k like "%") set uncached;
// missing 'k' matches all values of k
4. alter table p partition (j<2) set fileformat textfile;
5. alter table p partition (k rlike ".*") set serdeproperties ("k"="v");
6. alter table p partition (j is not null) set tblproperties ("k"="v");
7. alter table p drop partition (j<2);
8. compute incremental stats p partition(j<2);
The remaining old partition related DDLs are as follows,
1. load data inpath '/path/from' into table p partition (j=2, k="d");
2. alter table p add partition (j=2, k="g");
3. alter table p partition (j=2, k="g") set location '/path/to';
4. insert into p partition (j=2, k="g") values (1), (2), (3);
General partition expressions or partially specified partition specs
allows partition predicates to return empty partition set no matter
'IF EXISTS' is specified.
Examples:
[localhost.localdomain:21000] >
alter table p drop partition (j=2, k="f");
Query: alter table p drop partition (j=2, k="f")
+-------------------------+
| summary |
+-------------------------+
| Dropped 1 partition(s). |
+-------------------------+
Fetched 1 row(s) in 0.78s
[localhost.localdomain:21000] >
alter table p drop partition (j=2, k<"f");
Query: alter table p drop partition (j=2, k<"f")
+-------------------------+
| summary |
+-------------------------+
| Dropped 2 partition(s). |
+-------------------------+
Fetched 1 row(s) in 0.41s
[localhost.localdomain:21000] >
alter table p drop partition (k="a");
Query: alter table p drop partition (k="a")
+-------------------------+
| summary |
+-------------------------+
| Dropped 1 partition(s). |
+-------------------------+
Fetched 1 row(s) in 0.25s
[localhost.localdomain:21000] > show partitions p;
Query: show partitions p
+-------+---+-------+--------+------+--------------+-------------------+
| j | k | #Rows | #Files | Size | Bytes Cached | Cache Replication |
+-------+---+-------+--------+------+--------------+-------------------+
| 1 | b | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| 1 | c | -1 | 0 | 0B | NOT CACHED | NOT CACHED |
| Total | | -1 | 0 | 0B | 0B | |
+-------+---+-------+--------+------+--------------+-------------------+
Fetched 3 row(s) in 0.01s
Change-Id: I2c9162fcf9d227b8daf4c2e761d57bab4e26408f
Reviewed-on: http://gerrit.cloudera.org:8080/3942
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Hive expects types for column stats to be specified as all lower
case. For some reason, it doesn't check this when the stats are
first written, but it does check when performing an 'alter table'.
This causes it to drop stats that Impala wrote because we specify
type names in upper case.
This patch converts the types that Impala sends to Hive for the
column stats to all lower case and adds a regression test.
I also filed HIVE-15061 to track the issue from the Hive end.
Change-Id: Ia373ec917efa7ab9f2a59b8a870b7ebc30175dda
Reviewed-on: http://gerrit.cloudera.org:8080/4845
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
This patch makes it a little easier to use the unique_database fixture
with .test files. The RESULTS section can now contain $DATABASE which
is replaced with the current database by the test framework.
Testing:
- ran the test locally on exhaustive
- ran the test on hdfs and the local filesystem on Jenkins
Change-Id: I8655eb769003f88c0e1ec1b254118e4ec3353b48
Reviewed-on: http://gerrit.cloudera.org:8080/2947
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Hive allows creating Avro tables without an explicit Avro schema since 0.14.0.
For such tables, the Avro schema is inferred from the column definitions,
and not stored in the metadata at all (no Avro schema literal or Avro schema file).
This patch adds support for loading the metadata of such tables, although Impala
currently cannot create such tables (expect a follow-on patch).
Change-Id: I9e66921ffbeff7ce6db9619bcfb30278b571cd95
Reviewed-on: http://gerrit.cloudera.org:8080/538
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Based on Google's HyperLogLog++ paper. Uses a bias correcting
interpolation as a sub algorithm for Hll estimates within a specific
range.
Change-Id: If4fe692b4308f6a57aea6167e9bc00db11eaaab9
Reviewed-on: http://gerrit.cloudera.org:8080/415
Tested-by: Internal Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
This patch adds a 'location' column to the output of SHOW TABLE STATS /
SHOW PARTITIONS. This helps users understand the effects of ALTER TABLE
SET LOCATION commands, particularly for partitions, and is easier to
identify than the output of DESCRIBE FORMATTED.
Some existing tests in alter-table.test have been updated to include
checking the location output before and after a SET LOCATION
command. The tests in show.test have also been updated to check for the
location; all other tests that use SHOW [TABLE STATS|PARTITIONS] use a
generic regex to avoid overly verbose tests.
Change-Id: I9d276f7b133c38c9319e0906397ca1c31cec95bb
Reviewed-on: http://gerrit.cloudera.org:8080/316
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
The following commits disabled tests to unblock the full data load:
a00a9a5e53f7a8e7a1e3c931ea0e4b7db21c6f00
bf29d06f2e53bb924d250275d51f5ccd1213531d
This patch re-enables those tests and adds new tests to guard against
regressions to HIVE-6308.
Unfortunately, we cannot completely remove the analysis check for HIVE-6308
in our code, because there is still one case where COMPUTE STATS will fail on
a Hive-created Avro table: If there is a mismatch in column names between
the Avro schema and the column defs given to a CREATE TABLE in Hive.
Change-Id: I81ae6b526db02fdfc634e09eeb9d12036e2adfdd
Reviewed-on: http://gerrit.cloudera.org:8080/180
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Add skip markers for S3 that can be used to categorize the tests that
are skipped against S3 to help see what coverage is missing. Soon
we'll be reworking some tests and/or adding new tests to get back the
important gaps.
Also, add a mechanism to parameterize paths in the .test files, and
start using these new variables. This is a step toward enabling some
more tests against S3.
Finally, a fix for buildall.sh to stop the minicluster before applying
the metastore snapshot. Otherwise, this fails since the ms db is in
use.
Change-Id: I142434ed67bed407e61d7b2c90f825734fc0dce0
Reviewed-on: http://gerrit.cloudera.org:8080/127
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Additionally, this patch also disabled the hbase/none test dimension if the
TARGET_FILESYSTEM environment variable is set to either s3 of isilon.
Change-Id: I63aecaa478d2ba9eb68de729e9640071359a2eeb
Reviewed-on: http://gerrit.cloudera.org:8080/74
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
The problem was that VARCHAR was not present in a few switch statements
for updating/populating column stats in various places.
Change-Id: I0b2b316b734d27a7ff08701b0986014be2473443
Reviewed-on: http://gerrit.cloudera.org:8080/65
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This patch adds the possibility to specify the number of replicas that
should be cached in main memory. This can be useful in high QPS
scenarios as the majority of the load is no longer the single cached
replica, but a set of cached replicas. While the cache replication
factor can be larger than the block replication factor on disk, the
difference will be ignored by HDFS until more replicas become
available.
This extends the current syntax for specifying the cache pool in the
following way:
cached in 'poolName'
is extended with the optional replication factor
cached in 'poolName' with replication = XX
By default, the cache replication factor is set to 1. As this value is
not yet configurable in HDFS it's defined as a constant in the JniCatalog
thrift specification. If a partitioned table is cached, all its child
partitions inherit this cache replication factor. If child partitions
have a custom cache replication factor, changing the cache replication
factor on the partitioned table afterwards will overwrite this custom
value. If a new partition is added to the table, it will again inherit
the cache replication factor of the parent independent of the cache pool
that is used to cache the partition.
To review changes and status of the replication factor for tables and
partitions the replication factor is part of output of the "show
partitions" command.
Change-Id: I2aee63258d6da14fb5ce68574c6b070cf948fb4d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5533
Tested-by: jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
This patch adds the ability to compute and drop column and table
statistics at partition granularity.
The following commands are added. Detail about the implementation
follows.
COMPUTE INCREMENTAL STATS <tbl_name> [PARTITION <partition_spec>]
This variant of COMPUTE STATS will, ultimately, do the same thing as the
traditional COMPUTE STATS statement, but does so by caching the
intermediate state of the computation for each partition in the Hive
MetaStore. If the PARTITION clause is added, the computation is
performed for only that partition. If the PARTITION clause is omitted,
incremental stats are updated only for those partitions with missing
incremental stats (e.g. one column does not have stats, or incremental
stats was never computed for this partition). In this patch, incremental
stats are only invalidated when a DROP STATS variant is executed. Future
patches can automatically invalidate the statistics after REFRESH or
INSERT queries, etc.
DROP INCREMENTAL STATS <tbl_name> PARTITION <part_spec>
This variant of DROP stats removes the incremental statistics for the
given table. It does *not* recalculate the statistics for the whole
table, so this should be used only to invalidate the intermediate state
for a partition which will shortly be subject to COMPUTE INCREMENTAL
STATS. The point of this variant is to allow users to notify Impala when
they believe a partition has changed significantly enough to warrant
recomputation of its statistics. It is not necessary for new partitions;
Impala will detect that they do not have any valid statistics.
--------
This is achieved by adapting the existing HLL UDA via swapping its
finalize method for a new one which returns the intermediate HLL
buckets, rather than aggregating and then disposing of them. This
intermediate state is then returned to Impala's catalog-op-executor.cc,
which then passes the intermediate state back to the frontend to be
ultimately stored in the HMS.
This intermediate state is computed on a per-partition basis by grouping
the input to the UDA by partition. Thus, the incremental computation
produces one row for each partition selected (the set of which might be
quite small, if there are few partitions without valid incremental
stats: this is the point of the new commands).
At the same time, the query coordinator aggregates the output of the UDA
to produce table-level statistics. This computation incorporates any
existing (and not re-computed) intermediate partition state which is
passed to the coordinator by the frontend. The resulting statistics are
saved to the table as normal.
Intermediate statistics are serialised to the HMS by writing a Thrift
structure's serialised form to the partition's 'parameters' map. There
is a schema-imposed limit of 4000 characters to the serialised string,
which is exacerbated by the fact that the Thrift representation must
first be base-64 encoded to avoid type errors in the HMS. The current
patch breaks the encoded structure into 4k chunks, and then recombines
them on read. The alltypes table (11 columns) takes about three of these
chunks. This may mean that incremental stats are not suitable for
particularly wide tables: these structures could be zipped before
encoding for some space savings. In the meantime, the NDV estimates are
run-length encoded (since they are generally sparse); this can result in
substantial space savings.
Change-Id: If82cf4753d19eb532265acb556f798b95fbb0f34
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4475
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5408
Because we add 'total' to the last row in SHOW PARTITIONS, we set the
partition key columns to be string. At least, that's what the comment
said, but we didn't do that in fact.
This patch also corrects the column type for max width, which should be INT.
Change-Id: I787ab17be27f45107340119017e528c58a3daad3
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4678
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
This patch addresses the following issues:
1. Allow creating Avro tables without col defs in Impala. Compute stats works on them.
2. Handle table creation with inconsistent col defs and Avro schema as follows:
The table creation will succeed and ignore the col defs in favor of the Avro schema.
A warning is issued that the col defs and the Avro schema are inconsistent.
Compute stats works on such tables.
This patch does not address the issue of compute stats after Avro schema evolution.
Change-Id: Iea6b737d238d81491dc2097012ebc149a89d03ba
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4182
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4250
Tested-by: jenkins
The query used to generate the stats does a GROUP BY on the partition keys,
and so empty partitions will not get any results. Detect the empty partition
case and set the number of rows to 0.
Change-Id: I1ccb7d2016f35026aa1b418155c4534024f3cee5
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4029
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 128a02f508cdb280b53b8a8429e6b90491e43956)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4042
Semi or anti-joined table references are now only visible inside the
On-clause of the corresponding join.
Change-Id: Id93e53ecdf2a74baf9736aa427fa7af15358ca27
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3789
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
This patch improves the performance of the DDL queries
"Alter table add partition" and "Alter table drop partition"
as the number of partitions is scaled up.
The issue was that every time a partition was added or dropped,
the entire block metadata for that table was reloaded. This
operation was highly expensive especially as the number
of partitions became larger.
This patch handles this by adding/dropping only the added/dropped
partition's metadata to the hdfsTable (adding/dropping it to/from
the internal partition list), and incrementally updating the
corresponding data structures instead of refreshing them from scratch.
The following are the time improvements observed.
Number of partitions Time taken to add/drop Time to add/drop
(existing) a new partition (before) a new partition (now)
1 1.02s 1.02s
10 0.27s 0.27s
100 0.14s 0.14s
500 0.35s 0.35s
1000 0.91s 0.51s
10000 11.72s 0.85s
20000 21.92s 0.87s
Out of this total time (for the worst case), around 0.50s is spent in
adding and dropping the partition to the hive meta store and rest of the
time is spent in updating the catalog.
Change-Id: I359ab0af921543c0fdcb975c14b05f80f93fe803
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3291
Reviewed-by: Anusha Dasarakothapalli <anusha.dasarakothapalli@cloudera.com>
Tested-by: jenkins
Adds support for dropping all table and column stats from a table. Once incremental
stats are supported, this will provide the user a way to force a recompute of all
stats.
Change-Id: I27e03d5986b64eb91852bfc3417ffa971d432d6b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3533
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit f1f074f24bfdc77c4cef147fe9d26f27df80ab81)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3551
With IMPALA-1033 we disabled the counting of the number of NULLs in each column,
and that gave a 2x speed-up in the computation. But erroneously the value 0 was
being placed in the number of NULLs, instead of the correct -1 that indicates
'unknown'.
Change-Id: Ib882eb2a87e7e2469f606081cb2881461b441a45
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3377
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3378
The compute stats statement was not quoting the DB and table names. If those names
were aliasing with keywords, then the compute stats would not execute due to a syntax
error.
Change-Id: Ie08421246bb54a63a44eaf19d0d835da780b7033
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3170
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3198
This change adds DDL support for HDFS caching. The DDL allows the user to indicate a
table or partition should be cached and which pool to cache the data into:
* Create a cached table: CREATE TABLE ... CACHED IN 'poolName'
* Cache a table/partition: ALTER TABLE ... [partitionSpec] SET CACHED IN 'poolName'
* Uncache a table/partition: ALTER TABLE ... [partitionSpec] SET UNCACHED
When a table/partition is marked as cached, a new HDFS caching request is submitted
to cache the location (HDFS path) of the table/partition and the ID of that request
is stored with in the table metadata (in the table properties). This is stored as:
'cache_directive_id'='<requestId>'. The cache requests and IDs are managed by HDFS
and persisted across HDFS restarts.
When a cached table or partition is dropped it is important to uncache the cached data
(drop the associated cache request). For partitioned tables, this means dropping all
cache requests from all cached partitions in the table.
Likewise, if a partitioned table is created as cached, new partitions should be marked
as cached by default.
It is desirable to know which cache pools exists early on (in analysis) so the query
will fail without hitting HDFS/CatalogServer if a non-existent pool is specified. To
support this, a new cache pool catalog object type was introduced. The catalog server
caches the known pools (periodically refreshing the cache) and sends the known pools out
in catalog updates. This allows impalads to perform analysis checks on cache pool
existence going to HDFS. It would be easy to use this to add basic cache pool management
in the future (ADD/DROP/SHOW CACHE POOL).
Waiting for the table/partition to become cached may take a long time. Instead of
blocking the user from access the time during this period we will wait for the cache
requests to complete in the background and once they have finished the table metadata
will be automatically refreshed.
Change-Id: I1de9c6e25b2a3bdc09edebda5510206eda3dd89b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2310
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Avro tables that were not created with a column-definition list do not have
their columns properly populated in the Metastore backend DB (HIVE-6308).
For such tables COMPUTE STATS and Hive's ANALYZE TABLE cannot succeed.
This patch fails COMPUTE STATS in analysis for such broken Avro tables
and adds tests for Avro tables with mismatched a column-definition list
and Avro schema.
Change-Id: I561ecea944ae2f83d69950b7a1ab9edaa89bdcea
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1892
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1920
The parquet file stores the application version that wrote it so
is different between our c4 and c5 branches.
HBase storage is also not guaranteed to be identical across versions.
Change-Id: I02984a55e0678756e50c1fff6db22c43788d3916
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1028
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
A compute stats command computes the table and column stats for a given
table and persists them in the metastore.
The table stats consist of the per-partition and per-table row count.
The column stats are computed on a per-table basis and consist of the
number of distinct values and the number of NULLs per column.
This patch introduces a new 'child query' concept that
compute stats utilizes. Child queries are cancelled
if the parent query is cancelled. A compute stats stmt is
executed by the following query hirarchy:
parent: compute stats query (DDL)
- child: compute table stats query (QUERY)
- child: compute column stats query (QUERY)
The new child query concept is necessary to decouple child query fetches
from parent query fetches, i.e., we could not execute a child query as
part of the original compute stats query, because then a client could
fetch the results we need for updating the Metastore statistics. The
reason why our existing CTAS works without this decoupling
is that its insert 'child query' is not fetchable.
Change-Id: I560533e3cb09bcbbdb3eea7fcf0b460bc6b36dcd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/873
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins