Commit Graph

222 Commits

Author SHA1 Message Date
Alex Behm
7fcd7cd64e Add list of tables missing stats to explain header and mem-limit exceeded error.
Change-Id: Ibe8f329d5513ae84a8134b9ddb3645fa174d8a66
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1501
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1880
2014-03-12 21:15:22 -07:00
Alex Behm
58950a52a3 IMPALA-798: Distributed execution of CTAS and explain CTAS.
Change-Id: I32004a4b31c54cf5c185169fece143a61213d12d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1850
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1867
2014-03-12 16:51:50 -07:00
Matthew Jacobs
8fa8a0f828 IMPALA-843: Do not close reader contexts until plan fragment close
Fixes a crash that occurs in some cases when io buffers are still used and
child nodes are closed early. We close child nodes early when all rows have
been consumed and resources are transfered, but in some cases io buffers are
still in use when a scan node is closed. We avoid this problem by only
closing reader contexts when the entire fragment is closed.

Change-Id: Ie62cdecdcd530bdc61dd4e83cd9ecfc7d2c93ef6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1806
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 66f14a47b953b7b7153c73f4e018d03461dcd5ef)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1859
2014-03-12 14:44:18 -07:00
Alex Behm
748ea3f38b Fix test_partitioning.py and expected results.
Change-Id: I21148f3a10abbda4f9e587f83cbabdd2a79c6147
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1861
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1866
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-03-12 11:25:17 -07:00
Lenni Kuff
08417c875f IMPALA-849: Impala does not work with boolean partition key columns
This is because in HdfsTable we call call "expr.castTo(colType)", but BooleanLiteral
(incorrectly) didn't implement "uncheckedCastTo()". This meant that instead of a
BooleanLiteral being returned we got back a CastExpr, which cannot be cast to LiteralExpr.

As part of this change it turns out Boolean partition columns are also broken in Hive. I
filed HIVE-6590 for these issues and we decided to disable INSERT into a boolean partition
column for Impala due to this bug.

Change-Id: I3e295bb96aadc08d64faf551f6393a7128a7ef27
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1755
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-03-11 18:42:08 -07:00
Alex Behm
47c52ade84 IMPALA-866: Make HdfsScanNode.computeStats() idempotent with respect to totalBytes_.
Change-Id: I1c243b089db82c0544586a2a1428081aa2dbcd52
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1844
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1852
2014-03-11 18:20:15 -07:00
Nong Li
5022aa08fb IMPALA-869: Fix result initialization for MIN().
Change-Id: I50eceb04c0eb1c9eedb9c963cb75d2fc5aeb4825
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1847
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-03-11 17:26:31 -07:00
Nong Li
f6de8d9e30 IMPALA-765: Fix subexpr elimination codegen optimization.
The previous implementation did not properly handle replacing the is_null
return argument from expr calls.

Change-Id: I96cd0dfca8876b4f914b0cbc4eb459ea3dcdf230
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1795
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-03-10 15:20:53 -07:00
Alex Behm
a615ebc549 IMPALA-822,IMP-1271: Binding predicates on an aggregation now properly trigger slot materialization.
The bug was that the number of materialized agg-tuple slots did not correspond to the number
of materialized agg functions, due to binding predicates against an AggNode causing slot
materialization after SelectStmt.materializeRequiredSlots().
This patch fixes the issue by taking binding predicates (bound to a slot in an agg tuple)
into consideration in SelectStmt.materializeRequiredSlots().
I added a new sanity check in AggregationNode.toThrift() surfaced another issue with slot
materialization that is also fixed in this patch. The ordering exprs must be marked before
the agg exprs in SelectStmt.materializeRequiredSlots() because the odering exprs may contain
agg exprs that are only referenced inside the ORDER BY clause.

Change-Id: I1bdc0466f583907bed625ce6608938e59faee83f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1639
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1818
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-03-08 00:25:26 -08:00
Alex Behm
f7c2781afe IMPALA-845: Transfer predicates to 2nd phase merge agg in some cases.
Having predicates need to be transferred to the 2nd phase merge agg
for distinct + non-distinct aggregates without group by.
For distinct + non-distinct aggregates with group by, it is correct
to evaluate the predicates at the 2nd phase (non-merge) agg.

Change-Id: I71d73c4ef92becbb81e142bc0cb5f54e790b1fb5
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1743
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1817
2014-03-07 21:45:16 -08:00
Alex Behm
66a6c1f312 Fix UDF query test files.
Change-Id: Idea277ea2d20c47b2a81b0f2f06c48455de2ea45
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1780
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-03-06 07:37:14 -08:00
Skye Wanderman-Milne
6ceed1e632 UDF API additions
This patch introduces the ability to specify a prepare and close
function for a UDF, as well as FunctionContext methods for maintaining
state across UDF invocations within a query. Many of the changes are
related to adding an Expr::Open() function which calls the UDF's
prepare function, if specified (it has to be called in Open() since
the LLVM module must be compiled first).

Change-Id: I581d90d03dff71f7ff5d4a6bef839ba6bc46b443
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1693
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 8e2ed7fb9051d98f89327715fdebd6f5ed22d6ee)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1757
2014-03-05 07:32:34 -08:00
Alex Behm
69a840d965 Consistent memory estimates for explain tests.
Our new build machines (e.g., beefy) have more cores than our other machines,
so scan nodes may have a different memory estimate causing the explain tests
to fail. This patch fixes the num_scanner_threads to 1 for explain tests
to ensure consisteny estimates.

Change-Id: Ie6194f3c3b17d04aa141d04fcddb7ac948e92fcf
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1735
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1753
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-03-05 05:38:30 -08:00
Skye Wanderman-Milne
203fc66456 Add GetTypeDesc() method to FunctionContext.
This is currently only implemented for NativeUdfExpr.

Change-Id: I81b442c5668dff43d0486d1cfc445bca2af66606
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1664
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit e1087c3a78e6e12938b583c302907bd32c59f524)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1720
2014-03-01 20:24:30 -08:00
Nong Li
80658d9eab IMPALA-828: Fix avro codegen if conjuncts cannot be codegend.
Change-Id: I9ff0214e541eb958132fbe5b7798883db91ef025
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1695
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-02-27 19:58:13 -08:00
Alex Behm
cb8150e8ee IMPALA-817: Check equality of function name in Function.equals().
Change-Id: Ib9b4ee3a21f90fdb0d7ebccd89462dc67040bd1e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1594
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1611
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
2014-02-19 17:13:51 -08:00
Nong Li
0d2919fe7f Refactor scalar and aggregate function analysis and execution.
This patch cleans up analysis and execution of scalar and aggregate functions
so that there is no difference between how builtins and user functions are
handled. The only difference is that the catalog is populated with the builtins
all the time.

The BE always gets a TFunction object and just executes it (builtins will have
an empty hdfs file location).

This removes the opcode registry and all of the functionality is subsumed by
the catalog, most of which was already duplicated there anyway.

This also introduces the concept of a system database; databases that the
user cannot modify and is populated automatically on startup.

Change-Id: Iaa3f84dad0a1a57691f5c7d8df7305faf01d70ed
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1386
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1577
2014-02-18 18:40:08 -08:00
Lenni Kuff
95404d4888 Support prioritized background table loading
The overall goal of this change allow for table metadata to be loaded in the background
but also to allow prioritization of loading on an as-needed basis. As part of analysis,
any tables that are not loaded are tracked and if analysis fails the Impalad will make
an RPC to the CatalogServer to requiest the metadata loading of these tables be
prioritized and analysis will be restarted.

To support this, the CatalogServer now has a deque of the tables to load. For
background loading, tables to load are added to the tail of the deque. However, a new
CatalogServer RPC was added that can prioritize the loading of one or more tables in
which case they will get added to the head of the deque. The next table to load is
always taken from the head. This helps prioritize loading but is admittedly not the most
fair approach.

The support the prioritized loading, some changes had to made on the Impalad side during
analysis:
- During analysis, any tables that are missing metadata are tracked.
- Analysis now runs in a loop. If it fails due to an AnalysisException AND at least 1
  table/view was missing metadata, these tables missing metadata are requested to be
  loaded by calling the CatalogServer.
- The impalad will wait until the required tables are received (by getting notified each
  time there is a call to updateCatalog()), and waiting to run analysis until all tables
  are available. Once the tables are available, analysis will restart.

This change also introduces two new flags:

--load_catalog_in_background (bool). When this is true (the default) the catalog server
will run a period background thread to queue all unloaded tables for loading. This is
generally the desired behavior, but there may be some cases (very large metastores) where
this may need to be disabled.

--num_metadata_loading_threads (int32). The number of threads to use when loading catalog
metadata (degree of parallelism). The default is 16, but it can be increased to improve
performance at the cost of stressing the Hive metastore/HDFS.

Change-Id: Ib94dbbf66ffcffea8c490f50f5c04d19fb2078ad
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1476
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1538
2014-02-13 23:43:06 -08:00
Nong Li
d5d4b4785b Fix broken udf test case. Should not specify DB.
Change-Id: I5f6343cbef9f52d349130360e029b38b23d0187a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1505
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-02-10 11:34:56 -08:00
Nong Li
1a55133f0a IMPALA-735. Fix codegen bug affecting outer joins.
Change-Id: I99ca45b558fb2ed694f261a22e7e91e59f1ad675
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1496
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-02-10 05:00:21 -08:00
Nong Li
7d578a9e54 Cleanup for IMPALA-774 fix.
Change-Id: I47bce71c482b3576957e88980f764c30f45229a9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1454
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1470
2014-02-05 22:58:51 -08:00
Henry Robinson
16af29ea5f IMPALA-770: Fix crash in aggregation node with zero-width tuple
The select exprs of an inline view may not always be materialised, yet
the output tuple itself may be. This patch fixes a crash in this
situation in the backend aggregation node which assumed its output tuple
would always have at least one materialised slot.

The cause was a couple of too-conservative DCHECKs that failed if the
tuple was NULL. In fact, the code was robust to this possibility without
the checks, so this bug didn't affect release builds of Impala.

Change-Id: If0b90809d30fcd196f55197953392452d1ac9c4f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1431
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 8c1c21b66c43e900760ace54d090305f32a85a1f)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1471
Tested-by: Henry Robinson <henry@cloudera.com>
2014-02-05 22:01:35 -08:00
Nong Li
ccd8c0338f IMPALA-774: Fix runtimestate setup when evaluating expr from FE.
We weren't initializing the udf mem pool causing UDFs to return strings to crash if used as part
of a constant expression.

Change-Id: Ic3a0e556aec8ce03a9e59f3ccf6980c682046b50
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1447
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
2014-02-05 11:02:27 -08:00
Alex Behm
3f68be2caa IMP-1227: Ignore columns of unsupported types in compute stats. Enclose identifiers that are Impala keywords in quotes.
Change-Id: Ie7fa6da2869090428c9229c44b973ecccbb49e8e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1357
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1368
2014-01-28 17:18:17 -08:00
Lenni Kuff
e97a1b52e0 Remove flaky verification in ALTER/CREATE table tests
This fixes the flaky ALTER/CREATE tests by removing a verification step that
didn't add value and was non-deterministic. The verficiation step that was
removed verified that CREATE/ALTER set the appropriate file format by
changing the format to something that didn't match the underlying data files,
then attempting to read the data. This is already covered by the positive
test case where the file format is changed to match the underlying data.

Change-Id: I66f485405234f472f3b83f3e776bf7f2c10de874
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1379
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1382
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-28 16:03:02 -08:00
Alex Behm
22ab2595d6 [CDH5] Fixes to expected explain plans due to new HBase version.
Change-Id: I33f09283dcea278ca07f9d2d44e542e644def8ca
2014-01-15 15:12:24 -08:00
Nong Li
53d7bbb97a [CDH5] Impala changes for updated thirdparty components.
Changes include:
  - version changes in impala-config
  - version changes in various loading scripts
  - hbase jars are no longer in hive/lib
  - mini-llama script changes
  - updates due to sentry api changes
  - JDBC tests disabled
  - unsupported types tests disabled.

Change-Id: If8cf1b7ad8e22aa4d23094b9a4b1047f7e9d93ee
2014-01-15 15:12:13 -08:00
Alex Behm
6799c93922 Simplified/enhanced explain plans with a total of four explain levels.
There are now 4 explain levels summarized as follows:
- Level 0: MINIMAL
  Non-fragmented parallel plan only showing plan nodes with minimal attributes
- Level 1: STANDARD
  Non-fragmented parallel plan with some details in plan nodes
- Level 2: EXTENDED
  Non-fragmented parallel plan with full details in plan nodes including
  the table/column stats, row size, #hosts, cardinality,
  and estimated per-host memory requirement
- Level 3: VERBOSE
  Fragmented parallel plan with full details (like level 2)

This patch also includes several bugfixes related to plan costing and/or
testing of explain plans.

Change-Id: I622310f01d1b3d53ea1031adaf3b3ffdd94eba30
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1211
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-10 19:17:59 -08:00
Skye Wanderman-Milne
561da008c7 IMPALA-729: fix resource management in Parquet scanner for multiple row groups
We weren't attaching resources to the row batch when starting a new
row group, so it was possible for string data to be overwritten. This
patch removes CloseStreams() and merges its functionality with
AttachCompletedResources() so it's not possible to destroy streams
without transferring the resources first. It also merges and removes
ScannerContext::Close().

Also adds test cases for IMPALA-720.

Change-Id: Ia8f40c7d39d8702716f1d337fe797e2696bd0fcb
2014-01-08 10:56:26 -08:00
Alan Choi
57b961168d IMP-1188 Fix HBase row key predicates issues
This patch fixes a few row key issues:

1. We used to assert that the row key filter must be a string literal.
However, it can also be a constant function. We need to eval the expr
and then use the result as the start/stop key.

2. Cast(row_key as int) simply failed.
This should not be transformed into start/stop key.

3. We used to assert that lower bound < upper bound.
This query:
  select * from tbl where row_key > 'b' and row_key < 'a'
would simply ASSERT. We should simply not return any rows.

4. Handle NULL predicate
HBase row key can't be null. If either upper/lower bound is null, we simply
don't need to return any rows.

Change-Id: Ia03590a862888b377bf1f48bcb838b99193fa241
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1180
Reviewed-by: Alan Choi <alan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:40 -08:00
Alan Choi
468ca0aa5d IMPALA-723 Fix union with aggregate
The problem is that with Union, AggregateInfo.materializeRequiredSlots() is being called more than once.
Other "materializeSlots" related calls are idempotent, but this one is not.
That's because materializedAggregateSlots_ is an array list and we keep adding the same duplicate value
to the array list. We can fix it by making materializeRequiredSlots() idempotent.

Change-Id: Ic18f89010c088fe9018b15f0281bc9340b8a2d14
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1195
Tested-by: jenkins
Reviewed-by: Alan Choi <alan@cloudera.com>
Tested-by: Alan Choi <alan@cloudera.com>
2014-01-08 10:54:40 -08:00
Lenni Kuff
6afea60704 Update test logging to print executable SQL statements and log all actions executed
This is the first step in cleaning up the test logging. It provides a common connection
interface that provides tracing around all operations. When a test fails the output will
be executable SQL. It also logs actions such as when a connection is opened, close, or
when an operation is cancelled. Currently only beeswax connections are supported, but
I have a seperate patch that adds support for executing using HS2 as well as Beeswax.

Example of new logging:
-- connecting to: localhost:21000
-- executing against localhost:21000
use functional;

SET disable_codegen=False;
SET abort_on_error=1;
SET batch_size=0;
SET num_nodes=0;

-- executing against localhost:21000
select a.timestamp_col from alltypessmall a inner join alltypessmall b on
(a.timestamp_col = b.timestamp_col)
where a.year=2009 and a.month=1 and b.year=2009 and b.month=1;
-- closing connection to: localhost:21000

Change-Id: Iedc7d4d3a84bfeff6cc1daae6ed1ca97613d7700
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1133
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:54:40 -08:00
Alan Choi
00e912b372 IMPALA-715 HBase scanner should use CallIntMethod for int return type function call
The hbase-table-scanner used CallShortMethod to retrieve the size of the array.
    Short will overflow. Because the size of the array is an int, we should use
    CallIntMethod instead.

Change-Id: I941981f7504ee04adf998398f8baf6beae76d000
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1171
Reviewed-by: Alan Choi <alan@cloudera.com>
Tested-by: Alan Choi <alan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:39 -08:00
Matthew Jacobs
967346b0c4 IMPALA-630: Add fn to get the PID of the impalad to which the user is connected
Change-Id: I2d8b304bfb22883489bbbbe33e07478d164583b9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1127
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
2014-01-08 10:54:37 -08:00
Chris Channing
7e98708d7d IMPALA-114: Add support for custom date/time formats
This change set adds support for dealing with custom date/time formats in Impala.  The following date/time tokens are supported:

y – Year
M – Month
d – Day
H – Hour
m – Minute
s – second
S – Fractional second

The token names and usage have been modeled on the SimpleDateFormat class used in Java. This allows the use of repeating tokens to indicate zero padding for an output scenario (TS -> String) and a guide for reading data to a given length in a parsing scenario. Representing literals months is achieved by specifying three repeating tokens e.g. yyyy-MMM-dd -> 2013-Nov-21.

Formatting character groups can appear in any order along with any separators e.g.

yyyy/MM/dd
dd-MMM-yy
(dd)(MM)(yyyy)  HH:mm:sss
..etc..

The following features are not supported with this patch:

    - Long literal months e.g. MMMM
    - Nested strings e.g. “Year: “ yyyy “Month: “ mm “Day: “ dd
    - Lazy formatting

Change-Id: Ibba2eaed366fd736b921b31b8d0d517ac1248bca
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1001
Reviewed-by: Christopher Channing <cchanning@cloudera.com>
Tested-by: Christopher Channing <cchanning@cloudera.com>
2014-01-08 10:54:34 -08:00
Alex Behm
74164e8f99 IMPALA-688: Fix column stats computation for HBase row key. Use regex to fix flaky tests.
Change-Id: I1d3fb915921bbc5366da0ee51608fd54aa237777
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1135
Tested-by: jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:54:33 -08:00
Alex Behm
e4ad086dee Added max/avg length for string columns in COMPUTE STATS.
Change-Id: I6f61de2323ee12681642684ec633ed4bb7506de2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1079
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:30 -08:00
Alex Behm
dd0409e9d6 IMPALA-509: Minimal type promotion for arithmetic exprs.
Change-Id: I576fe9baf3bae7d46ee08e29ececc4adda97e9df
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1078
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:54:30 -08:00
Matthew Jacobs
93368e20b1 Fix CROSS JOIN handling in join order optimization and add tests
Cross joins should be handled like outer joins in the join order
optimization in that the right table referenced by a cross join may not
be reordered anywhere before tables referenced to the left of the cross
join. If there are inner joins to the right of the cross join, those
tables may be reordered before the cross join.

E.g., if we have A JOIN B CROSS JOIN C JOIN D, then C must come after A
and B, but D may be reordered to come before C.

Also adds test cases for join order optimization and predicate propagation.

Change-Id: I6b1022dd3e862efbff81e283b43284d846c8eca4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1096
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:29 -08:00
Skye Wanderman-Milne
9e17042185 Allow zero bit width dict/RLE decoders.
This allows us to read single-value dictionary-encoded columns
generated by parquet-mr.

Change-Id: I80903d910d0cc3a3e4ebf02e34212d868e94feb4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1098
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:27 -08:00
Skye Wanderman-Milne
de531e15bd IMPALA-694: Allow Impala to read files produced by parquet-mr version <= 1.2.8
parquet-mr had a bug where it didn't include the dictionary page's
header in the total column size. We now compensate for this by
detecting these files and padding the scan range length. This required
changing how the scanner detects when it's finished: it now counts the
number of rows rather than checking eosr (since the scan range may be
longer than the column).

Change-Id: Id9933808b965003c0c3b3aa78c32fe29a0c4bcbe
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1097
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:27 -08:00
Matthew Jacobs
f327431a8e IMPALA-171: Add CROSS JOIN
Adds a CROSS JOIN (cartesian product). Common join code is moved from to
a new abstract base class BlockingJoinNode. We must keep all build RowBatches in
memory in order to iterate over them for every row from the left child. The
TupleRowList provides a convenient way to iterate over all of the rows.
A future change will address codegen for the CrossJoinNode.

Change-Id: I5e0caa6fb4ec802a9c87e700f9dd6238cea8cdf2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/970
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:25 -08:00
Skye Wanderman-Milne
acdc792355 IMPALA-695: Use the local path of Hive UDF jars in the FE.
The FE was creating class loaders with the HDFS locations of Hive UDF
libs, rather than the local locations created by the BE. Our tests
still passed since we only used UDFs already on the classpath
(e.g. Hive builtins).

Change-Id: Idbe9c98ad6adb84b70cb44efbf9ad0afc53366ca
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1081
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:25 -08:00
Skye Wanderman-Milne
b54d16dabd IMPALA-679: Append hash of HDFS path to filename in CopyHdfsFile() to avoid collisions.
Change-Id: Ia84fa81fe043a9604248d66ed963ef3f91b0601e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1018
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:22 -08:00
Alex Behm
24662b1941 Allow ALTER TABLE to set per-partition serde and table properties.
The main motivation is to allow users to set the per-partition number
of rows for manual incremental stats maintenance, as well as a means
to 'drop' stats that may have caused undesirable plan changes.

Change-Id: Iff38317a993e5d7952ea4df839947f5ec341e930
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1010
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:22 -08:00
Lenni Kuff
bfb16ff552 Disable SHOW STATS tests because results are unstable (IMPALA-688)
Change-Id: Ib4b4fe3a29d3bd0e3c7ece8b5b21c4ec4b5eb289
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1060
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:54:22 -08:00
Nong Li
e3fdef7839 Fix subexpr elimination IR rewriting.
Change-Id: Iabdcc1686951e71136a603ed30f9d16fb1c1ec46
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1056
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:22 -08:00
Lenni Kuff
0bae3978c9 Update compute-stats.py to execute using Impala
Updates our compute stats script to execute using Impala. This allows us
to easily compute stats on all tables in a database or all tables in the
metastore.
The updated stats caused one of the TPCH plans to change so this also
updates the TPCH planner test results.

Change-Id: I17e5dcd1036a35e40eb4eb2c8e4a20702db9049c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1024
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:18 -08:00
Lenni Kuff
76fa3b2ded Update DDL to support 'STORED AS PARQUET' and 'STORED AS AVRO' syntax
This change updates our DDL syntax support to allow for using 'STORED AS PARQUET'
as well as 'STORED AS PARQUETFILE'. Moving forward we should prefer the new syntax,
but continue to support the old.  I made the same change for 'AVROFILE', but since
we have not yet documented the 'AVROFILE' syntax I left out support for the old syntax.

Change-Id: I10c73a71a94ee488c9ae205485777b58ab8957c9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1053
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:18 -08:00
Nong Li
ab21dde002 Update compute stats test to use regex for parquet/hbase file size.
The parquet file stores the application version that wrote it so
is different between our c4 and c5 branches.

HBase storage is also not guaranteed to be identical across versions.

Change-Id: I02984a55e0678756e50c1fff6db22c43788d3916
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1028
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:17 -08:00