318 Commits

Author SHA1 Message Date
Alex Behm
3f54240fed PlannerTest uses explain level 'normal'. Only add stats and costs to explain output in 'verbose' mode.
Change-Id: I827b4c7085b5aa2dc5521f8748d8973178f43f4c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/678
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:23 -08:00
Alex Behm
c5c2ccb56c Fix build break due to machine-dependent explain output.
Change-Id: I6b72e4e6cf2a7b38d4687c6f0f860e9744c9cedb
Reviewed-on: http://gerrit.ent.cloudera.com:8080/675
Tested-by: jenkins
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
2014-01-08 10:53:22 -08:00
Alex Behm
4bb8b38cde Added stats and cost estimates to explain output.
Change-Id: I1273745a439fd25cefa4e08ecc075c98cc8bfc45
Reviewed-on: http://gerrit.ent.cloudera.com:8080/602
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:53:22 -08:00
Skye Wanderman-Milne
8692e7df8d Add timestamp support to CodegenAnyVal
Change-Id: I2bbeae16660709c2c15d545e6d1c791912e880db
Reviewed-on: http://gerrit.ent.cloudera.com:8080/655
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:21 -08:00
Nong Li
6b9a7de02e Add symbol resolution during analysis for create function stmts.
Before this, we had to specify the entire mangled symbol. This can be quite
long and quite tedious (take a look at some of the create UDA test cases that
specify all the symbols).

This patch adds some code to convert from the user function signature to the
mangled name. This means the user can specify the unmangled name and we can
do the symbol lookup. The mangling rules are pretty convoluted but if it is
messed up, the user can always specify the full symbol.
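
As a rough illustration of what such a conversion involves, the sketch below mangles a free function under the Itanium C++ ABI scheme when every parameter is a builtin value type. It is not the code from this patch: the `mangle` helper and the `BUILTIN_CODES` table are hypothetical, and real UDF signatures take pointers and references to class types, which need the ABI's substitution rules and are not covered here.

```python
# Hypothetical sketch: Itanium C++ ABI mangling for a free function whose
# parameters are all builtin value types. Pointer, reference, and class
# parameters (and the substitution rules they need) are deliberately omitted.
BUILTIN_CODES = {
    "void": "v", "bool": "b", "char": "c", "short": "s",
    "int": "i", "long": "l", "float": "f", "double": "d",
}


def mangle(name, param_types, namespaces=()):
    """Return the mangled symbol for name(param_types) in the given namespaces."""
    if namespaces:
        # Nested names are wrapped in N...E, each part prefixed by its length.
        parts = "".join(f"{len(p)}{p}" for p in (*namespaces, name))
        encoded_name = f"N{parts}E"
    else:
        encoded_name = f"{len(name)}{name}"
    # A function with no parameters is encoded as taking a single 'void'.
    params = "".join(BUILTIN_CODES[t] for t in param_types) or "v"
    return f"_Z{encoded_name}{params}"
```

For example, `mangle("add", ["int", "double"], namespaces=("udf",))` yields `_ZN3udf3addEid`, the symbol a compiler following this ABI would emit for `udf::add(int, double)`.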

Some other minor cleanup in:
  - JNI from FE to BE
  - UDFs/UDAs that are loaded as test data

Change-Id: I733dbf3a72cb7b06221c27e622d161bcca0d74a8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/624
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:20 -08:00
Nong Li
c031cd4e96 Update RLE encoding to pad literal groups to 8.
Change-Id: I77cb2b80b888b569ff715c583f16aea4e39fe680
Reviewed-on: http://gerrit.ent.cloudera.com:8080/644
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:17 -08:00
Nong Li
15db34e356 AggregationNode refactoring
This patch redoes how the aggregation node is implemented. The functionality is
now split between aggregation-node, agg-expr and aggregate-functions. This is a work in
progress (there's still a lot of debug stuff I added that needs to be cleaned up) but
it does pass the tests.

Aggregation-node is now very simple and now only deals with the grouping part.
Aggregate-expr serves as the glue between the agg node and the aggregate functions.
The aggregation functions are implemented with the UDA interface. I've reimplemented
our existing aggregate functions with this setup. For true UDAs, the binaries would be
loaded in aggregate-expr.

This also includes some preliminary changes in the FE. We now need to annotate each
AggNode as executing the update vs. merge phase (root aggs execute update, others
execute merge) and if it needs a finalize step (only the root does). This is more
general than our builtins which are too simple to need this structure.
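
To make the phases concrete, here is a minimal sketch of an average computed through separate init, update, merge and finalize steps. All names are hypothetical and nothing here is taken from the patch. It only shows how the phases compose, not which plan node runs which phase.

```python
# Hypothetical illustration of the init/update/merge/finalize phases.
def avg_init():
    # Intermediate state: differs from the final output type (a single number).
    return {"sum": 0.0, "count": 0}


def avg_update(state, value):
    # Update consumes one raw input value; NULLs (None) are skipped.
    if value is not None:
        state["sum"] += value
        state["count"] += 1


def avg_merge(dst, src):
    # Merge combines two intermediate states.
    dst["sum"] += src["sum"]
    dst["count"] += src["count"]


def avg_finalize(state):
    # Finalize turns the intermediate state into the output value.
    return state["sum"] / state["count"] if state["count"] else None


def run_two_phase(partitions):
    """Run update per partition, then merge the partial states and finalize once."""
    partials = []
    for rows in partitions:
        state = avg_init()
        for value in rows:
            avg_update(state, value)
        partials.append(state)
    merged = avg_init()
    for partial in partials:
        avg_merge(merged, partial)
    return avg_finalize(merged)
```

Because the intermediate state (sum and count) is not the same shape as the output, the sketch also hints at why a separate intermediate type is useful.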

There is a big TODO here to allow the intermediate types between agg nodes to change.
For example, in distinct estimate, the input type is the column type and the output type
is a bigint. We'd like the intermediate type to be CHAR(256). This is different since
currently, the intermediate type and output type have always been the same. We've hacked
around this by having both the intermediate and output type be TYPE_STRING. I've left
this for another patch (changing the BE to support this is trivial).

For aggregates that result in strings, we used to store some additional stuff past the
end of the tuple. The layout was:
<tuple> <length of 1st string buffer>,<length of 2nd string buffer>, etc

The rationale for this is that we want to reuse the buffer for min/max and grow the buffer
more quickly for group_concat. This breaks down the abstraction between agg-expr and
agg-node and is not something UDAs can use in general. Rather than try to hack around
this, I think the proper solution is for the intermediate type to not be StringValue and
to contain the buffer length itself.

This patch also resurrects the distinct estimate code. The distinct estimate functions
exercise all of the code paths.

Change-Id: Ic152a2cd03bc1713967673681e1e6204dcd80346
Reviewed-on: http://gerrit.ent.cloudera.com:8080/564
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:13 -08:00
Lenni Kuff
a2cbd2820e Add Catalog Service and support for automatic metadata refresh
The Impala CatalogService manages the caching and dissemination of cluster-wide metadata.
The CatalogService combines the metadata from the Hive Metastore, the NameNode,
and potentially additional sources in the future. The CatalogService uses the
StateStore to broadcast metadata updates across the cluster.
The CatalogService also directly executes metadata update requests from
impalad servers (DDL requests). It exposes a Thrift interface that allows impalads to
connect directly and execute their DDL operations.
The CatalogService has two main components. The first is a C++ server that implements StateStore
integration, the Thrift service implementation, and exporting of the debug webpage/metrics.
The other main component is the Java Catalog that manages caching and updating of all
the metadata. For each StateStore heartbeat, a delta of all metadata updates is broadcast
to the rest of the cluster.

Some Notes On the Changes
---
* The metadata is all sent as thrift structs. To do this, all catalog objects (Tables/Views,
Databases, UDFs) have a thrift struct to represent them. These are sent with each statestore
delta update.
* The existing Catalog class has been separated into two separate sub-classes: an
ImpaladCatalog and a CatalogServiceCatalog. See the comments on those classes for more
details.

What is working:
* New CatalogService created
* Working with statestore delta updates and latest UDF changes
* DDL performed on Node 1 is now visible on all other nodes without a "refresh".
* Each DDL operation against the Catalog Service will return the catalog version that
  contains the change. An impalad will wait for the statestore heartbeat that contains this
  version before returning from the DDL command.
* All table types (Hbase, Hdfs, Views) getting their metadata propagated properly
* Block location information included in CS updates and used by Impalads
* Column and table stats included in CS updates and used by Impalads
* Query tests are all passing
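
The "wait for the catalog version" behaviour described in the list above can be sketched as follows. This is only an assumption about the shape of the logic: `CatalogVersionWaiter` and its methods are hypothetical names, and the real implementation lives in the impalad and statestore code rather than in anything like this class.

```python
import threading


class CatalogVersionWaiter:
    """Hypothetical impalad-side helper tracking the last applied catalog version."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0

    def apply_heartbeat(self, version):
        # Called when a statestore delta carrying `version` has been applied.
        with self._cond:
            if version > self._version:
                self._version = version
                self._cond.notify_all()

    def wait_for_version(self, version, timeout=None):
        # Block until the local catalog has caught up to `version`.
        # Returns False if the timeout expires first.
        with self._cond:
            return self._cond.wait_for(lambda: self._version >= version, timeout)
```

A DDL handler would take the version returned by the Catalog Service and call `wait_for_version` with it before replying to the client, so the change is visible locally once the DDL returns.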

Still TODO:
* Directly return catalog object metadata from DDL requests
* Poll the Hive Metastore to detect new/dropped/modified tables
* Reorganize the FE code for the Catalog Service. I don't think we want everything in the
  same JAR.

Change-Id: I8c61296dac28fb98bcfdc17361f4f141d3977eda
Reviewed-on: http://gerrit.ent.cloudera.com:8080/601
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:11 -08:00
Nong Li
1eb2b7a964 Add execution for vararg UDFs.
Change-Id: I46e5670c09ac0b8e62f39dfc832fe880dd1dc995
Reviewed-on: http://gerrit.ent.cloudera.com:8080/572
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:09 -08:00
Nong Li
4bb1e8c854 Add varargs to UDF/UDA parser/analyzer.
Change-Id: I4c3f2e74f6c29cee4b0b787c058b0455b16a11fd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/548
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:05 -08:00
Skye Wanderman-Milne
b7f83bcd73 Add support for LLVM IR UDFs.
This patch also adds a number of improvements to NativeUdfExpr. Highlights include:

* Correctly handling the lowering of AnyVal struct types (required for ABI compatibility)
* A rudimentary library cache for reusing handles produced by dlopen
* More complicated test cases

Change-Id: Iab9acdd7d7c4308e5d7ee3210f21b033fda5a195
Reviewed-on: http://gerrit.ent.cloudera.com:8080/540
Tested-by: jenkins
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:53:03 -08:00
Nong Li
8963d79f51 Fix build break from UdfContext rename.
Change-Id: Ia3df23fcba7d3812ae90565daab89916cbb50861
Reviewed-on: http://gerrit.ent.cloudera.com:8080/549
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:01 -08:00
Nong Li
e39de94316 Add parser/analysis to support UDAs.
I looked around some and I think having create/drop/show [aggregate] function
seems reasonable and extends nicely for UDTs.

The create aggregate function can accept a lot of arguments. For the non-essential ones, I
went with resolving them by name rather than position (i.e. argName="value"). I think
this is better for the user than specifying them by position.

The grammar is:
CREATE AGGREGATE FUNCTION <name>(<arg_types>) RETURNS <type> [INTERMEDIATE <type>]
LOCATION '/path' UpdateFn='Fn' [comment='comment']
[SerializeFn='symbol'] [MergeFn='symbol'] [InitFn='symbol'] [FinalizeFn='symbol']

The optional args at the end can be in any order. If the other symbols are not
specified, we derive them from the required UpdateFn symbol. The analyzer
tries to resolve them and fails if it can't find a derived symbol in the binary.

The simplest example would be:
CREATE AGGREGATE FUNCTION count(float) RETURNS BIGINT LOCATION '/path'
UpdateFn='CountUpdateFn';

In which case we assume the intermediate type is the return type and the other functions
are called 'CountInitFn', 'CountSerializeFn', 'CountMergeFn' and 'CountFinalizeFn'.
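
The derivation can be sketched as below. The exact rule the analyzer uses is not stated in this message, so replacing the last occurrence of "Update" in the UpdateFn symbol is an assumption, and `derive_uda_symbols` is a hypothetical helper rather than the analyzer's API.

```python
def derive_uda_symbols(update_fn, **explicit):
    """Fill in missing UDA symbols from the UpdateFn symbol.

    Assumption: derived names are formed by replacing the last "Update" in
    the UpdateFn symbol. Explicitly supplied symbols always win.
    """
    idx = update_fn.rfind("Update")
    if idx < 0:
        raise ValueError(f"cannot derive symbols from {update_fn!r}")
    prefix = update_fn[:idx]
    suffix = update_fn[idx + len("Update"):]
    symbols = {"UpdateFn": update_fn}
    for phase in ("Init", "Serialize", "Merge", "Finalize"):
        key = phase + "Fn"
        symbols[key] = explicit.get(key) or f"{prefix}{phase}{suffix}"
    return symbols
```

Calling `derive_uda_symbols("CountUpdateFn")` produces the five names listed above. Passing `MergeFn="MyMerge"` overrides only that entry.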

Change-Id: Iefc5741293050f5b295df28e9d1a7d039ead8675
Reviewed-on: http://gerrit.ent.cloudera.com:8080/513
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:52:59 -08:00
Alex Behm
39f9a067fa IMPALA-444: Fixed accuracy of string to double conversion. Falling back to strtod for scientific notation.
Change-Id: I9a5d948620907d34601ef041e58b1c9bb2172f71
Reviewed-on: http://gerrit.ent.cloudera.com:8080/507
Tested-by: jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:52:56 -08:00
Alex Behm
6253b21834 IMPALA-505: Fixed conjunct evaluation against partition columns in hdfs scan node when there are no materialized slots.
Change-Id: Ia003347bd7ee4986f5411c7175057192635a4c6c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/509
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:52:54 -08:00
Skye Wanderman-Milne
fd99db0300 First pass at UdfExpr.
Change-Id: I517bf56541749b5c2459554821c7bf838239fdf0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/439
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:52:50 -08:00
Nong Li
a0bf45a0b4 Add udf type.
Change-Id: Ic5f52c127750cc9c847a3e34d3fdcfc78bee5a8a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/454
Tested-by: jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:52:48 -08:00
Alex Behm
33000b8c15 Fixed codegen of floating-point modulo.
Change-Id: Idd28c6a71a659471aa632a6e26d970557daeb3bf
Reviewed-on: http://gerrit.ent.cloudera.com:8080/385
Tested-by: jenkins
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
2014-01-08 10:52:46 -08:00
Nong Li
308650f208 Fix create function ddl test setup issue.
Change-Id: I30c9a4342efbdb17bd53fb14bdcee172506cdadb
Reviewed-on: http://gerrit.ent.cloudera.com:8080/447
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:52:44 -08:00
Nong Li
8eb727b585 UDF ddl cleanup
Change-Id: I381fed277b5809727d2d8bf430258c01d2d0ae1f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/436
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-01-08 10:52:43 -08:00
Nong Li
b22d1f41a7 Change all "Status Close()" to "void Close()"
Doing it this way makes sure we don't bail early on the Close path
which is rarely the right thing to do. This found a few places where
we were not doing proper cleanup because of this.
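
The difference can be shown with a small sketch. The classes and functions are hypothetical stand-ins, not code from the tree: a status-returning close tends to stop at the first error, while a void close keeps going and records failures on the side.

```python
class Resource:
    """Hypothetical resource whose release() may report an error string."""

    def __init__(self, name, error=None):
        self.name = name
        self.error = error
        self.released = False

    def release(self):
        self.released = True
        return self.error


def close_with_status(resources):
    """Old style: returns the first error, so later resources are skipped."""
    for resource in resources:
        err = resource.release()
        if err is not None:
            return err  # early return: remaining resources are never released
    return None


def close_void(resources, log):
    """New style: nothing to return, so every resource is always released."""
    for resource in resources:
        err = resource.release()
        if err is not None:
            log.append(err)  # record the failure and keep cleaning up
```

With one failing resource followed by a healthy one, `close_with_status` leaves the second unreleased, whereas `close_void` releases both and only logs the error.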

Change-Id: Ie663c68398c14589b5cbc1bd980644b0b10fd865
Reviewed-on: http://gerrit.ent.cloudera.com:8080/373
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:52:38 -08:00
ishaan
53cd9eadab Treat HBase as a file format for functional tests
Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922
Reviewed-on: http://gerrit.ent.cloudera.com:8080/102
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:52:36 -08:00
Nong Li
af90c8a133 Fix memory usage tracking.
Changes MemLimit to MemTracker:
- the limit is optional
- it also records a label and an optional parent
- Consume() and Release() also update the ancestors and there's also a new
  AnyLimitExceeded(), which also checks the ancestors
- the consumption counter is a HighwaterMarkCounter and can optionally be created
  as part of a profile

Each fragment instance now has a MemTracker that is part of a 3-level
hierarchy: process, query, fragment instance.
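
A minimal sketch of such a tracker hierarchy follows. It mirrors the behaviour listed above (optional limit, label, optional parent, ancestor updates, high-water mark) but is an illustration only: the method names are Python renderings and the profile integration is left out.

```python
class MemTracker:
    """Illustrative tracker: optional limit, a label, and an optional parent."""

    def __init__(self, label, limit=None, parent=None):
        self.label = label
        self.limit = limit      # None means "no limit"
        self.parent = parent
        self.consumption = 0
        self.peak = 0           # high-water mark of consumption

    def _self_and_ancestors(self):
        tracker = self
        while tracker is not None:
            yield tracker
            tracker = tracker.parent

    def consume(self, nbytes):
        # Charge this tracker and every ancestor.
        for tracker in self._self_and_ancestors():
            tracker.consumption += nbytes
            tracker.peak = max(tracker.peak, tracker.consumption)

    def release(self, nbytes):
        for tracker in self._self_and_ancestors():
            tracker.consumption -= nbytes

    def limit_exceeded(self):
        return self.limit is not None and self.consumption > self.limit

    def any_limit_exceeded(self):
        # True if this tracker or any ancestor is over its limit.
        return any(t.limit_exceeded() for t in self._self_and_ancestors())
```

A three-level chain would be built as `process`, then `query` with `parent=process`, then `fragment` with `parent=query`. Consuming on the fragment tracker then shows up at all three levels.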

Change-Id: I5f580f4956fdf07d70bd9a6531032439aaf0fd07
Reviewed-on: http://gerrit.ent.cloudera.com:8080/339
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:52:36 -08:00
Nong Li
2394ae2e66 UDF parsing and analysis.
Change-Id: If8058c1cb66bf5e9c7049d4b78f5882b46c03fc1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/318
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:52:32 -08:00
Aaron Davidson
cafb7b72f8 External sorting
This is an experimental implementation of external sorting. This patch includes the following additions:
(1) creation and implementation of the Sorter interface, which can sort Impala Tuples.
(2) normalization of Tuples to allow memcmp-able sorting.
(3) a testing framework for the Sorter,
(4) a benchmark to compare the current state of the Sorter with other sorts,
(5) an implementation of a Vector which can store data whose size is only known at runtime,
(6) a sorting algorithm (basically a dumbed down STL sort) which can operate over such a vector,
(7) implementation of a simple in-memory Merger, and
(8) logic to stream blocks of memory in and out of memory for the actual external merging.

I have a local branch for experimental optimizations and benchmarking -- this should be considered
a "basic", working sort.

The following optimizations have been implemented:
(i)   Optionally extracting keys instead of writing them in place.
(ii)  Optionally parallelize run building opportunistically (sorting and preparing for output).
(iii) Maximize disk IO and minimize buffer recycling by writing buffers out, but also keeping
      them in memory until right when they're needed.
(iv)  Prepare auxiliary data backwards so the buffers can be released as we go, and still
      go out in an order which preserves the first buffers of the run.
(v)   Always merge maximum number of runs at a time, taking from the next merge level if
      available.
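
Two of the ideas above, memcmp-able key normalization and merging sorted runs, can be sketched in a few lines. This is a simplified illustration with hypothetical helper names: it handles only signed 32-bit integer keys and keeps the runs in memory, whereas the real Sorter works on Impala Tuples and streams blocks to and from disk.

```python
import heapq


def normalize_int32(value):
    """Encode a signed 32-bit int so that bytewise order equals numeric order.

    Adding 2**31 flips the sign bit; big-endian layout puts the most
    significant byte first, which is what a memcmp-style comparison needs.
    """
    return (value + 2**31).to_bytes(4, "big")


def denormalize_int32(key):
    return int.from_bytes(key, "big") - 2**31


def build_runs(values, run_size):
    """Split the input into runs of at most run_size keys and sort each run."""
    keys = [normalize_int32(v) for v in values]
    return [sorted(keys[i:i + run_size]) for i in range(0, len(keys), run_size)]


def merge_runs(runs):
    """Merge the sorted runs into one sorted sequence of decoded values."""
    return [denormalize_int32(k) for k in heapq.merge(*runs)]
```

For instance, `merge_runs(build_runs([5, -3, 0, 7, -8, 2], 2))` sorts three runs of two keys each and merges them into one ascending list.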

Change-Id: I1d7304d54d73152da929b1efffc1e851e5fb8fd4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/126
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Aaron Davidson <aaron.davidson@cloudera.com>
2014-01-08 10:52:27 -08:00
Aaron Davidson
00275ce3a9 (IMPALA-422) Add string concatenation function
Implements a group_concat() function which concatenates all the values in a group together.

The format is group_concat(str_col, [separator]). The default separator is ', '. NULLs
are ignored.
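
The semantics described above can be sketched as a plain function. Returning `None` when every input is NULL is an assumption made for this illustration. The message does not say what the function returns in that case.

```python
def group_concat(values, separator=", "):
    """Concatenate the non-NULL strings of one group with the given separator."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None  # assumption: a group with no non-NULL values yields NULL
    return separator.join(non_null)
```

For example, `group_concat(["a", None, "b"])` gives `"a, b"`, and `group_concat(["a", "b"], "|")` gives `"a|b"`.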

Change-Id: If152df6f528401117dba81d66ef691bfb548cc7d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/117
Reviewed-by: Aaron Davidson <aaron.davidson@cloudera.com>
Tested-by: Aaron Davidson <aaron.davidson@cloudera.com>
2014-01-08 10:52:21 -08:00
Skye Wanderman-Milne
efac6f82fd Print errors to shell in BaseSequenceScanner.
Change-Id: I0d1b041695c0d61b8c4994833f0a703e3bfa9c6a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/278
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:52:20 -08:00
Lenni Kuff
d66d3bfce3 IMPALA-161: Add Impala support for CREATE TABLE AS SELECT
This adds support for CREATE TABLE AS SELECT to Impala. It supports all functionality a
regular CREATE TABLE statement includes, except it does not allow specifying
partition columns. Hive also has this limitation and it wouldn't be too hard to support
in the future.

Change-Id: I4ca3c3b8f1576441b8bb5ed9dc521d7dfa96ab74
Reviewed-on: http://gerrit.ent.cloudera.com:8080/157
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:17 -08:00
ishaan
e9e23bff5d Fix build because of a change in parquetfile.
This changes QueryTest/create.test to unblock the builds.

Change-Id: If91ac43e349c2f81034ba7504c27890781f33260
Reviewed-on: http://gerrit.ent.cloudera.com:8080/255
Tested-by: jenkins <kitchen-build@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:52:16 -08:00
Nong Li
a3bc1ce133 Some parquet encoder/decoder refactoring. Added dictionary to other types.
Split out the encoder/type for parquet reader/writer. I think this puts us
in a better place to support future encodings.

On the tpch lineitem table, the results are:
Before:
  BytesWritten: 236.45 MB
  Per Column Sizes:
    l_comment: 75.71 MB
    l_commitdate: 8.64 MB
    l_discount: 11.19 MB
    l_extendedprice: 33.02 MB
    l_linenumber: 4.56 MB
    l_linestatus: 869.98 KB
    l_orderkey: 8.99 MB
    l_partkey: 27.02 MB
    l_quantity: 11.58 MB
    l_receiptdate: 8.65 MB
    l_returnflag: 1.40 MB
    l_shipdate: 8.65 MB
    l_shipinstruct: 1.45 MB
    l_shipmode: 2.17 MB
    l_suppkey: 21.91 MB
    l_tax: 10.68 MB
After:
  BytesWritten: 198.63 MB           (84%)
  Per Column Sizes:
    l_comment: 75.71 MB             (100%)
    l_commitdate: 8.64 MB           (100%)
    l_discount: 2.89 MB             (25.8%)
    l_extendedprice: 33.13 MB       (100.33%)
    l_linenumber: 1.50 MB           (32.89%)
    l_linestatus: 870.26 KB         (100.032%)
    l_orderkey: 9.18 MB             (102.11%)
    l_partkey: 27.10 MB             (100.29%)
    l_quantity: 4.32 MB             (37.31%)
    l_receiptdate: 8.65 MB          (100%)
    l_returnflag: 1.40 MB           (100%)
    l_shipdate: 8.65 MB             (100%)
    l_shipinstruct: 1.45 MB         (100%)
    l_shipmode: 2.17 MB             (100%)
    l_suppkey: 10.11 MB             (46.14%)
    l_tax: 2.89 MB                  (27.06%)

The table is overall 84% as big (i.e. 16% smaller). A few columns got marginally
bigger. If the file filled the 1 GB, I'd expect the overhead to decrease even
more.
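
For readers unfamiliar with the technique, here is a minimal sketch of dictionary encoding a column. It is not the Parquet writer's implementation, and the function names are hypothetical: each distinct value is stored once and the column becomes a list of small indices. That would plausibly explain why a column with few distinct values, such as l_discount, shrinks so much.

```python
def dictionary_encode(values):
    """Return (dictionary, indices): each distinct value is stored only once."""
    dictionary = []
    index_of = {}
    indices = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        indices.append(index_of[value])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original column from the dictionary and the indices."""
    return [dictionary[i] for i in indices]
```

The indices are small integers with many repeats, which is what makes them cheap to store with a run-length or bit-packed encoding afterwards.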

The restructuring to use a virtual call doesn't seem to change things much and
will go away when we codegen the scanner.

Here's what the query times look like with this patch (note this is on the before data files,
so only string cols are dictionary encoded).

Before query times:
  Insert Time: 8.5 sec
  select *: 2.3 sec
  select avg(l_orderkey): .33 sec

After query times:
  Insert Time: 9.5 sec                  <-- Longer due to doing dictionary encoding
  select *: 2.4 sec                     <-- kind of noisy, possibly a slight slow down
  select avg(l_orderkey): .33 sec

Change-Id: I213fdca1bb972cc200dc0cd9fb14b77a8d36d9e6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/238
Tested-by: jenkins <kitchen-build@cloudera.com>
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:52:16 -08:00
Skye Wanderman-Milne
b9ea32e9b7 Fix IMPALA-129, IMPALA-534, and other scanner bugs.
Change-Id: Idbd29af3fcc35b9e1173d08ac55b5780751c5938
Reviewed-on: http://gerrit.ent.cloudera.com:8080/196
Tested-by: jenkins <kitchen-build@cloudera.com>
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:52:14 -08:00
Alex Behm
9a201645cd IMPALA-496: Fix escaping of field delimiter and escape character in inserts
Change-Id: I49c36ae9823b35dcb9e92d1a13bef270657e36f2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/163
Tested-by: jenkins <kitchen-build@cloudera.com>
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:09 -08:00
Alex Behm
f0e2d539fc IMPALA-495: Views Sometimes Not Utilizing Partition Pruning.
Change-Id: I65daebbe8c4b72b956a409fe28edd3773fda7cb7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/128
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:52:04 -08:00
Alex Behm
c9965e5a5c Fix build break due to views defined by a constant select.
Change-Id: I5deeeb03469494f5ba6ed7a911354bbdd6c98195
Reviewed-on: http://gerrit.ent.cloudera.com:8080/149
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
2014-01-08 10:52:04 -08:00
Alex Behm
2b427208e5 IMPALA-507: Creating a VIEW that does not reference a table fails with IllegalStateException.
Change-Id: I11470ba919bbfced76730adae2a46647c4ef110b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/146
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:52:04 -08:00
Alex Behm
52c9d26d16 IMPALA-475: Impala should avoid the use of c_# style autogenerated column aliases unless necessary.
Change-Id: I959e35bcee1698ebc35534dc4f390c5c2c7dc919
Reviewed-on: http://gerrit.ent.cloudera.com:8080/141
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:52:03 -08:00
Alex Behm
9754f5bf52 IMPALA-504: Right and full outer joins do not return row with NULL value for rhs table.
Change-Id: Ia3f8d474fb30189b36fb587b2920d7b9b224ea71
Reviewed-on: http://gerrit.ent.cloudera.com:8080/129
Tested-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:52:03 -08:00
Skye Wanderman-Milne
6e7406df8b IMPALA-502: Impala does not return NULL for case where table has extra string column and data does not (it returns an empty string)
Change-Id: I0cfe5ce5fc279d46610a3cc191a501ccbc335296
Reviewed-on: http://gerrit.ent.cloudera.com:8080/127
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:52:02 -08:00
Nong Li
fd53edbbe4 Fix parquet writer bug with not setting dictionary metadata.
Change-Id: Ia5c0886497678d31b82cb5052e06df437bb201be
Reviewed-on: http://gerrit.ent.cloudera.com:8080/114
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Marcel Kornacker <marcel@cloudera.com>
2014-01-08 10:52:02 -08:00
Lenni Kuff
faeb7f5fa3 Add scanner test case for scenario where data and table schema do not match
Change-Id: I16f007ad1cb2caac47506914512c5665fc3d5f56
Reviewed-on: http://gerrit.ent.cloudera.com:8080/98
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:01 -08:00
Skye Wanderman-Milne
3fecdeb793 IMPALA-441: support default values for Avro tables
2014-01-08 10:51:39 -08:00
Alex Behm
8ad15fabcf IMPALA-372: Added CREATE/DROP/ALTER VIEW.
2014-01-08 10:51:35 -08:00
Alex Behm
3bba336bbf IMPALA-359: Return proper tuple id of inline view with distinct aggregation.
2014-01-08 10:51:26 -08:00
Alan Choi
254ee6ef89 IMPALA-434 Support binary hbase encoding
2014-01-08 10:51:18 -08:00
Skye Wanderman-Milne
e8344bb0d0 Dictionary encoding/decoding
2014-01-08 10:51:15 -08:00
Lenni Kuff
c2cfc7e2a3 IMPALA-373: Add support for 'LOAD DATA' statements
This change adds Impala support for LOAD DATA statements. This allows the user
to load one or more files into a table or partition from a given HDFS location. The
load operation only moves files, it does not convert data to match the target
table/partition's file format.
2014-01-08 10:51:02 -08:00
Alex Behm
045038e479 IMPALA-374: Added WITH clause without recursion.
2014-01-08 10:51:00 -08:00
Henry Robinson
79b36a5eb3 IMPALA-375: Add column permutation clause to INSERT statement
2014-01-08 10:50:59 -08:00
Alan Choi
15a3d92492 Qualify table with database
2014-01-08 10:50:57 -08:00
Alan Choi
58687d16b8 IMPALA-406 Raise an error when inserting into HBase table using a null row key.
2014-01-08 10:50:56 -08:00