225 Commits

Author SHA1 Message Date
Dan Hecht
1fee56cb26 IMPALA-1080: Implement "SET <query_option>" as SQL statement.
Also add support for "SET", which returns a table of query options and
their respective values.

The front-end parses the option into a (key, value) pair and then the
existing backend logic is used to set the option, or return the result
sets.
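
As a rough illustration of the (key, value) split described above, here is a
minimal Python sketch. The name parse_set_statement and the regular expression
are hypothetical; the real front end does this in its SQL grammar, not with a
regex.

```python
import re

# Hypothetical pattern, for illustration only: "SET", optionally followed
# by "<key> = <value>", optionally terminated by a semicolon.
_SET_RE = re.compile(r"^\s*SET(?:\s+(\w+)\s*=\s*(.+?))?\s*;?\s*$", re.IGNORECASE)


def parse_set_statement(sql):
    """Return (key, value) for 'SET k=v', or None for a bare 'SET'."""
    match = _SET_RE.match(sql)
    if match is None:
        raise ValueError("not a SET statement: %r" % sql)
    key, value = match.group(1), match.group(2)
    if key is None:
        return None  # bare SET: the caller would list all query options
    return key, value.strip()
```

A bare SET maps to None here so a caller can branch into the "return a table
of options" path.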

Change-Id: I40dbd98537e2a73bdd5b27d8b2575a2fe6f8295b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3582
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
Tested-by: jenkins
(cherry picked from commit aa0f6a2fc1d3fe21f22cc7bc56887e1fdb02250b)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3614
2014-07-25 10:25:09 -07:00
Nong Li
cfa58a4567 Run test_rows_availability serially.
Change-Id: Id87a209a614f889209456f8c0d9aedd8ad0e513f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3565
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3584
2014-07-22 14:35:46 -07:00
Nong Li
7dc57aaa9e Change buffered block mgr to support multiple clients.
This patch does a few things:
1. Moves the buffer block mgr from the sorter to the runtime state. This is now
   one that is shared across the query fragment. The partitioned hash join and agg
   will use this as well.
2. Adds a Client interface to the block mgr. Each exec node is a different client
   and can reserve a minimum number of buffers. This avoids starvation.
3. Updates the BufferedBlockMgr interface for getting pinned blocks, collapsing
   two existing APIs.
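
The reservation idea in item 2 can be sketched as a toy model in Python. The
class and method names below are hypothetical and do not mirror the real
BufferedBlockMgr API; the sketch only shows how a per-client minimum keeps one
client from starving another.

```python
class BlockMgrSketch:
    """Toy shared buffer pool with per-client minimum reservations."""

    def __init__(self, total_buffers):
        self.total_buffers = total_buffers
        self.reserved = {}  # client name -> guaranteed minimum buffers
        self.pinned = {}    # client name -> buffers currently held

    def register_client(self, name, min_buffers):
        if sum(self.reserved.values()) + min_buffers > self.total_buffers:
            raise MemoryError("cannot guarantee %d buffers" % min_buffers)
        self.reserved[name] = min_buffers
        self.pinned[name] = 0

    def _unused_reservation(self, other_than):
        # Buffers that must stay free so the other clients can still
        # reach their guaranteed minimum.
        return sum(max(self.reserved[c] - self.pinned[c], 0)
                   for c in self.reserved if c != other_than)

    def try_pin(self, name):
        free = self.total_buffers - sum(self.pinned.values())
        if free - self._unused_reservation(name) <= 0:
            return False  # pinning would eat into another client's minimum
        self.pinned[name] += 1
        return True
```

With 4 buffers and two clients that each reserve 1, the first client can pin
at most 3 buffers; the last one stays available to the second client.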

Change-Id: Ibb31fbe480f3726048457f26e24a9e33f7201d86
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3504
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3574
2014-07-22 12:45:37 -07:00
Nong Li
a25400c94e Increase timeout in test_rows_availability to make sure query state is what we expect.
Change-Id: Id4feebcc7b7cecb07555009219e6420e48a0c82b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3534
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3579
2014-07-22 12:12:13 -07:00
Nong Li
202d656ddc Stop setting query state to EXCEPTION for non-exception cases.
We were setting the state to exception on Cancel() all the time.
We use the cancellation path as the normal cleanup path so this
gets called even when the query went fine (e.g. UnregisterQuery
calls Cancel()). We had already plumbed through a 'cause' argument
to differentiate.
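
A minimal Python sketch of the distinction, assuming a cause of None means
"ordinary cleanup". The class and the state strings are illustrative only and
are not the backend's actual types.

```python
class QueryExecStateSketch:
    """Illustrative only: models the state transition, not the real class."""

    def __init__(self):
        self.state = "RUNNING"

    def cancel(self, cause=None):
        # Only a real error moves the query to EXCEPTION. A cause of None
        # means cancel() is being used as the normal cleanup path.
        if cause is not None:
            self.state = "EXCEPTION"
        # Resource release would happen here in either case.

    def unregister(self):
        self.cancel()  # normal teardown leaves the state untouched
```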

Change-Id: Icf1091c165dec36d3dad7ce308367bbbc9edee4f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3524
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3575
2014-07-22 04:08:28 -07:00
Nong Li
188a0ea833 Rework structure of hash table.
This patch does two things in preparation for external joins. The
hash table used to contain a directory structure (buckets and nodes),
both of which were contiguous. The nodes contained the tuple ptrs
within them.

This patch changes it so the nodes are not stored contiguously but are
allocated in pages (this structure is dense and does not require
random lookups by index). The bucket structure is still contiguous
since we rely on the doubling property and random lookup by index.

The second change is that the nodes no longer store the tuple ptrs
within them. This makes it easier to build the hash table on top of
existing data.
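
The layout can be sketched in Python as follows. This is a toy model under
stated assumptions: PAGE_SIZE, the class name, and the use of a row index in
place of a tuple pointer are all hypothetical, chosen only to show contiguous
buckets, paged nodes, and nodes that reference rows stored elsewhere.

```python
PAGE_SIZE = 4  # nodes per page; deliberately tiny for illustration


class PagedHashTableSketch:
    def __init__(self, num_buckets=8):
        # Buckets stay in one contiguous list so they can be indexed
        # directly by hash value.
        self.buckets = [None] * num_buckets
        # Nodes live in fixed-size pages that are appended as needed.
        self.pages = [[]]

    def _alloc_node(self, node):
        if len(self.pages[-1]) == PAGE_SIZE:
            self.pages.append([])
        self.pages[-1].append(node)
        return (len(self.pages) - 1, len(self.pages[-1]) - 1)

    def insert(self, row_idx, key):
        b = hash(key) % len(self.buckets)
        # The node records where the row lives instead of embedding it.
        node = {"row_idx": row_idx, "next": self.buckets[b]}
        self.buckets[b] = self._alloc_node(node)

    def find(self, rows, key_of, key):
        """Yield every row in 'rows' whose key equals 'key'."""
        ref = self.buckets[hash(key) % len(self.buckets)]
        while ref is not None:
            page, slot = ref
            node = self.pages[page][slot]
            row = rows[node["row_idx"]]
            if key_of(row) == key:
                yield row
            ref = node["next"]
```

Because nodes only hold an index into rows, the same table can be built over
data that already exists without copying it.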

Here's a quick benchmark doing a self join on tpch lineitem. Both
build and probe times decreased a bit.

Before:
 HASH_JOIN_NODE (id=2):(Total: 1s139ms, non-child: 985.939ms, % non-child: 86.50%)
         - BuildBuckets: 2.10M (2097152)
         - BuildRows: 6.00M (6001215)
         - BuildTime: 527.991ms
         - LeftChildRows: 6.00M (6001215)
         - LeftChildTime: 451.964ms
         - LoadFactor: 0.50
         - RowsReturned: 30.01M (30012985)
         - RowsReturnedRate: 26.33 M/sec
After:
HASH_JOIN_NODE (id=2):(Total: 1s019ms, non-child: 835.350ms, % non-child: 81.97%)
         - BuildBuckets: 2.10M (2097152)
         - BuildRows: 6.00M (6001215)
         - BuildTime: 423.175ms
         - LeftChildRows: 6.00M (6001215)
         - LeftChildTime: 406.67ms
         - LoadFactor: 0.50
         - RowsReturned: 30.01M (30012985)
         - RowsReturnedRate: 29.45 M/sec
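
For reference, the relative change in the two timing counters above can be
computed directly from the profile values. The helper name pct_decrease is
only for illustration. BuildTime comes out roughly 20% lower and LeftChildTime
roughly 10% lower.

```python
def pct_decrease(before_ms, after_ms):
    """Percentage by which after_ms is lower than before_ms."""
    return 100.0 * (before_ms - after_ms) / before_ms


# Values copied from the Before/After profiles above.
build = pct_decrease(527.991, 423.175)      # BuildTime
left_child = pct_decrease(451.964, 406.67)  # LeftChildTime
print("BuildTime: %.1f%% lower" % build)
print("LeftChildTime: %.1f%% lower" % left_child)
```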

Change-Id: I79e209a24c24fb4f2f99574bcf187746fddadc06
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3245
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-07-15 16:57:09 -07:00
Henry Robinson
9d0173c647 [CDH5] Disable ACL tests
The tests pass every time locally (in a 60 minute run), but fail
intermittently on our build machines.

Change-Id: I62d5ea0df8c42728a538b29bd16006be3179bfd3
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3489
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-07-14 15:38:11 -07:00
Henry Robinson
ff32821c6b [CDH5] Test to confirm that ACLs are inherited correctly on INSERT
Change-Id: I781a6b7203c2e12b484162954abae51a6443bead
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3076
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-07-09 19:04:55 -07:00
Matthew Jacobs
65c1a6f21e Remove SOURCE keyword by parsing as an identifier and checking the value
Reverts "IMPALA-1033: Remove SOURCE keyword; very common identifier"

Change-Id: I3fcf6d02786e00287b564cff0a823d0c19504e7a
2014-06-30 16:47:47 -07:00
Alex Behm
7777fbff53 Clean up expr substitution and cloning.
Before: The pre- and postconditions of expr substitution and cloning,
in particular, their effect on the isAnalyzed_ flag were unclear and
sometimes inconsistent e.g., some literal exprs set isAnalyzed_ to
true in their c'tor. As a result, several places required ad-hoc
solutions like Expr.unsetIsAnalyzed() and Expr.reanalyze().

This patch cleans up expr substitution and cloning, summarized as follows:

Expr analysis:
All exprs start out with isAnalyzed_ = false. The flag is set to true
iff analyze() has been called on the expr.

Expr.clone():
Creates a deep copy of an expr including all its analysis state.

Expr.equals():
Comparison of expr trees ignores implicit casts. This simplifies expr
substitution because un/analyzed exprs can be easily compared/substituted.

ExprSubstitutionMap:
When adding a mapping, the rhs expr must be analyzed to allow
substitution across query blocks. There is no requirement on the lhs expr.

Expr substitution:
Substitution returns an analyzed clone of the original expr with exprs
substituted. While performing the substitution, implicit casts and analysis
state are removed such that the returned result has minimal implicit casts
and types.
There are two versions of the substitute functions, one that throws exceptions and
one that does not, because callers may have different expectations on
whether a substitution must succeed.

Numeric literals:
This patch combines IntLiteral and DecimalLiteral into a NumericLiteral.
Its main benefit is that analyze() always produces the same type, even if
the literal was implicitly cast and/or isAnalyzed was unset because of
expr substitution. This was not the case before because an implicit cast
could permanently turn an IntLiteral into a DecimalLiteral.

There is no more need for unsetIsAnalyzed() or reanalyze().
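
The contract above can be sketched in Python. This is a simplified model, not
the front end's Java code: ExprSketch and substitute are hypothetical names,
implicit casts are not modelled at all, and equality simply ignores the
analysis flag.

```python
import copy


class ExprSketch:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.is_analyzed = False  # every expr starts out unanalyzed

    def analyze(self):
        for child in self.children:
            child.analyze()
        self.is_analyzed = True

    def clone(self):
        return copy.deepcopy(self)  # deep copy, analysis state included

    def __eq__(self, other):
        # The analysis flag is deliberately left out of the comparison, so
        # analyzed and unanalyzed exprs can match each other.
        return (isinstance(other, ExprSketch)
                and self.name == other.name
                and self.children == other.children)


def substitute(expr, smap):
    """Return an analyzed clone of expr with each lhs replaced by its rhs.

    smap is a list of (lhs, rhs) pairs; every rhs must already be analyzed.
    """
    def walk(e):
        for lhs, rhs in smap:
            if e == lhs:
                assert rhs.is_analyzed, "rhs of a mapping must be analyzed"
                return rhs.clone()
        # Rebuild the node from scratch, dropping its old analysis state.
        return ExprSketch(e.name, [walk(c) for c in e.children])

    result = walk(expr)
    result.analyze()
    return result
```

The original expr is never mutated, which is why no unsetIsAnalyzed() or
reanalyze() style helper is needed in this model.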

Change-Id: I646110e3714cff8ae8d5a378c25a107dd43334b6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3228
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3318
2014-06-30 10:18:26 -07:00
Dimitris Tsirogiannis
5a6f53db16 Add partition pruning tests
The following changes are included in this commit:
1. Modified the alltypesagg table to include an additional partition key
that has nulls.
2. Added a number of tests in hdfs.test that exercise the partition
pruning logic (see IMPALA-887).
3. Modified all the tests that are affected by the change in alltypesagg.

Change-Id: I1a769375aaa71273341522eb94490ba5e4c6f00d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2874
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3236
2014-06-24 02:14:27 -07:00
Alex Behm
881f3a8c33 Re-order union operands descending by their estimated per-host memory.
Re-order union operands descending by their estimated per-host memory,
so that parent nodes can gauge the peak memory consumption of a MergeNode after
opening it during execution (a MergeNode opens its first operand in Open()).
Scan nodes are always ordered last because they can dynamically scale down their
memory usage, whereas many other nodes cannot (e.g., joins, aggregations).
One goal is to decrease the likelihood of a SortNode parent claiming too much
memory in its Open(), possibly causing the mem limit to be hit when subsequent
union operands are executed.
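
The ordering rule can be sketched as a single sort in Python. The dict fields
and the function name are hypothetical, and ordering the scan operands among
themselves by memory is an assumption; the text above only says that scans
come last.

```python
def order_union_operands(operands):
    """Order operands: non-scans first, each group by descending memory.

    Each operand is a dict with 'name', 'is_scan' and 'est_mem_bytes'.
    """
    # False sorts before True, so non-scan operands come first; negating
    # the estimate gives descending order inside each group.
    return sorted(operands,
                  key=lambda op: (op["is_scan"], -op["est_mem_bytes"]))
```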

Change-Id: Ia51caaffd55305ea3dbd2146cd55acc7da67f382
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3146
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3213
Tested-by: jenkins
2014-06-20 18:46:10 -07:00
Taras Bobrovytsky
7faaa65996 Added order by query tests
- Added static order by tests to test_queries.py and QueryTest/sort.test
- test_order_by.py also contains tests with static queries that are run with
  multiple memory limits.
- Added stress, scratch disk and failpoints tests
- Incorporated Srinath's change that copied all order by with limit tests into
  the top-n.test file

Extra time required:

Serial:
scratch disk: 42 seconds
test queries sort : 77 seconds
test sort: 56 seconds
sort stress: 142 seconds
TOTAL: 5 min 17 seconds

Parallel(8 threads):
scratch disk: 40 seconds
test queries sort: 42 seconds
test sort: 49 seconds
sort stress: 93 seconds
TOTAL: 3 min 44 sec

Change-Id: Ic5716bcfabb5bb3053c6b9cebc9bfbbb9dc64a7c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2820
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3205
2014-06-20 13:35:10 -07:00
Dimitris Tsirogiannis
7dbd3a5860 IMPALA-1040: Reading a decimal partitioned column with invalid values
This commit fixes IMPALA-1040: when an invalid value was inserted into a
decimal partition column through Hive, Impala returned an uninformative
error message and, in some cases, the associated table disappeared from
Impala's catalog. With this fix, Impala always throws a more informative
error message that indicates the insertion of an invalid partition key
value.

Change-Id: I2855ea69944e269fb7e02b3825f44e64352151e7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3062
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3200
2014-06-20 12:46:52 -07:00
Ippokratis Pandis
6026f1ebe1 IMPALA-1055: Compute stats query statements don't quote DB and table names
The compute stats statement was not quoting the DB and table names. If those names
collided with keywords, compute stats would fail with a syntax
error.
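
A small Python sketch of the quoting, assuming backtick-quoted identifiers.
The helper names and the COUNT(*) statement are hypothetical stand-ins for
whatever queries compute stats actually generates.

```python
def quote_ident(name):
    # Assumes the identifier itself contains no backtick.
    return "`%s`" % name


def row_count_query(db, table):
    # Hypothetical stand-in for a statement issued by compute stats.
    return "SELECT COUNT(*) FROM %s.%s" % (quote_ident(db), quote_ident(table))
```

With quoting, a table that happens to be named after a keyword still yields a
parseable statement.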

Change-Id: Ie08421246bb54a63a44eaf19d0d835da780b7033
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3170
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3198
2014-06-20 09:32:52 -07:00
Alex Behm
aacd8bcf72 Change UnionNode to open its first child in UnionNode::Open().
This patch ensures that rows are available for clients to fetch
after we advance the query to FINISHED if the coordinator
fragment is rooted at a UnionNode.

Change-Id: I9b4ad3f70b46c7e7720bdd5ca9ad85479c2cb7fd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3139
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3168
2014-06-19 16:44:43 -07:00
ishaan
dc3dc3dc1e Enable tpch queries to run on text to unblock the full data load build.
Some planner tests depend on data being populated in the tpch tmp tables (in text format).
This change re-enables the tpch query tests to run on text so that they pass.

Change-Id: I4ed09f55e05cb01978cb6f0808c6395552c0f129
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3176
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-06-19 16:19:13 -07:00
Lenni Kuff
0ac0527643 Reduce test execution time by limiting long running tests to exhaustive exec strategy
I looked at the latest run from master and took the test suites that had long
execution times. This cleans those test suites up to either completely disable them
on 'core' or add constraints to limit the number of test vectors. It shouldn't impact
nightly coverage since we still run the same tests exhaustively.

Change-Id: I10c78c35155b00de0c36d9fc0923b2b1fc6b44de
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3119
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3125
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-06-18 16:18:17 -07:00
anusha
6b3689e8c7 IMPALA-973: Fix for invalidate metadata behaviour
Change-Id: Ie0c4c458b0919978b03ebaba28bf37950dd34643
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3009
Tested-by: jenkins
Reviewed-by: Anusha Dasarakothapalli <anusha.dasarakothapalli@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3091
2014-06-17 12:18:50 -07:00
Dimitris Tsirogiannis
67eb5eb3a8 IMPALA-1028: Cardinality estimate is wrong for partitioned tables if we
filter out all partitions

This commit fixes IMPALA-1028 in which the cardinality estimate is not
correct when all the partitions of a partitioned table are filtered out.
To fix this issue we make sure that the estimated result cardinality of
the scan node is zero when all the partitions are filtered out.
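
The intended behaviour can be sketched in Python. The function name, the dict
fields, and the use of -1 for "unknown" are assumptions made for this example
and are not taken from the planner.

```python
def scan_cardinality(partitions, keep):
    """Estimate scan output rows after partition pruning.

    partitions: list of dicts with 'key' and 'num_rows' (None if unknown).
    keep: predicate on the partition key.
    """
    selected = [p for p in partitions if keep(p["key"])]
    if not selected:
        return 0  # every partition was pruned, so the scan returns nothing
    if any(p["num_rows"] is None for p in selected):
        return -1  # assumption: -1 stands for an unknown cardinality
    return sum(p["num_rows"] for p in selected)
```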

Change-Id: I225949eb2e8f905a5d0f678d7f199fb95ba4aab0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3063
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3083
2014-06-16 20:36:13 -07:00
Srinath Shankar
0df773eed6 Check RuntimeState for cancellation in sorter.
Currently, cancellation checking when a SortNode is executing only
happens when a batch is being added to the sorter (SortNode::SortInput()) or
when a batch is being retrieved from the sorter (SortNode::GetNext()).

This fix passes in a RuntimeState into the Sorter instance itself, which
checks for cancellation at the following points:
i) During an in-memory sort (In Partition() and SortHelper()). In Partition(),
 the cancellation check may be delayed if the input is completely sorted.
ii) During an intermediate merge before each batch of rows from a merge is
 copied into a run.
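
Point ii) can be sketched in Python as a merge loop that polls a shared flag
before copying each batch. The names RuntimeStateSketch, Cancelled and
merge_runs are hypothetical, and heapq.merge stands in for the sorter's own
merger.

```python
import heapq
import itertools


class Cancelled(Exception):
    pass


class RuntimeStateSketch:
    def __init__(self):
        self.is_cancelled = False


def merge_runs(state, runs, batch_size=2):
    """Merge sorted runs, checking for cancellation before each batch."""
    merged = heapq.merge(*runs)
    out = []
    while True:
        if state.is_cancelled:
            raise Cancelled()
        batch = list(itertools.islice(merged, batch_size))
        if not batch:
            return out
        out.extend(batch)
```

Checking once per batch rather than once per row keeps the overhead of the
check small while still bounding how long a cancelled query keeps running.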

Change-Id: I5c28c7244ee2e40627cf14542b99f872e3a8c343
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3007
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3059
2014-06-14 17:48:40 -07:00
anusha
ffc334a735 IMPALA-834: Fix for Create Table like Views
Change-Id: Ied1f706c48a1106e1d6fc2aa73e57746f52ea333
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2939
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3014
Reviewed-by: Anusha Dasarakothapalli <anusha.dasarakothapalli@cloudera.com>
2014-06-12 22:13:30 -07:00
Skye Wanderman-Milne
1cc628d32d IMPALA-950: Skip computing stats for decimal columns.
This patch also adds a mechanism to return analysis warnings to
client, which is used to log skipped decimal columns.

Change-Id: I30c246044a68ec8861cd5bed072bd54e65a079e6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2822
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit fc77422acef7e6f93fdeb5448309414b905f0725)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2984
2014-06-11 19:16:34 -07:00
Skye Wanderman-Milne
6ac9a8104b IMPALA-1009: UDF/UDA leaks should not fail queries
With this change, leaky UDFs built with the SDK will still fail when
using the test harness, but leaky UDFs running in Impala will only
trigger a warning. This change also updates the test infrastructure to
always check for non-fatal errors/warnings.

Change-Id: I5615349b9d691e4eddea3e03e152ef12e73835e7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2844
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 60ce5190d96add6104aba642d2354d87a26000fa)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2938
2014-06-10 21:46:47 -07:00
Nong Li
5e49150a22 Speed up views compat test.
- Use a smaller table so hive runs faster
- Don't invalidate the catalog, just the view created in hive
- This lets us run it in parallel

Change-Id: I8085d8967dc96cbbb20e2d719072b29fe591cd98
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2958
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-10 20:53:23 -07:00
Nong Li
ad534429df [CDH5] Disable flaky hdfs caching test.
Change-Id: I19900ae029876d8f74169eda0f08f5be3509fbaf
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2946
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-10 18:24:42 -07:00
Lenni Kuff
b3ebfddadd Allow tests to access query result column values by col alias or col position
For example, you can now do something like:
result_set = execute("select * from tbl")
result_row = result_set[0]
result_row['col_alias'] or result_row[4]

to access column values. If the column alias/position does not exist an exception is
thrown.
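
One way such a row object could work, sketched in Python. ResultRowSketch is
a hypothetical name and this is not the test framework's actual class.

```python
class ResultRowSketch:
    """Row whose values can be read by column alias or by position."""

    def __init__(self, labels, values):
        self._labels = list(labels)
        self._values = list(values)

    def __getitem__(self, key):
        if isinstance(key, int):
            if not 0 <= key < len(self._values):
                raise IndexError("no column at position %d" % key)
            return self._values[key]
        if key not in self._labels:
            raise KeyError("no column with alias %r" % key)
        return self._values[self._labels.index(key)]
```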

Change-Id: Ie4b65619ed17fd90bf39e0966a7fc7e1180dbc5c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2719
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2922
2014-06-09 23:24:26 -07:00
Victor Bittorf
09aff77a6c IMPALA-943: removed database udf_test from front-end tests
Added CATCH section to test files.

Change-Id: I28ba3a6e5ae4c53df5b86505573793d7b150863b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2782
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 5b616715958f3ebfdc45b8dc0e4baa82bd55f1d2)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2912
2014-06-09 19:06:15 -07:00
Matthew Jacobs
89ec6b3d7a IMPALA-1033: Remove SOURCE keyword; very common identifier
The SOURCE keyword was introduced for DATA SOURCE ddl commands, but
it is also a very common identifier. This removes the SOURCE and
SOURCES keywords and instead uses DATASOURCE and DATASOURCES.

Change-Id: Ic6c2897d1e23efa169aa8787752fe4aa2bb125d5
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2895
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 267c13f9b46d249bfd1b8711fd3fadf6853dc1ef)
2014-06-09 17:17:14 -07:00
Srinath Shankar
5755b0bdee Order by without limit for Impala
Enable order-by without limit
Added BufferedBlockMgr to allocate buffers and spill to disk.
Added Sorter for the external sort implementation
Added new SortNode execution node that completely sorts its input
Changes to enable writing in IoMgr went in a separate patch.

Reviewed-on: http://gerrit.ent.cloudera.com:8080/1539
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins

Conflicts:

	testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test

Change-Id: I3ece32affe5b006f53bbdfcc03ded01471e818ac
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2900
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
2014-06-09 16:58:08 -07:00
Matthew Jacobs
4f804f9a34 IMPALA-1029: DROP DATA SOURCE should remove jar from lib caches
Change-Id: I2a3f0f4e54474aa4fd4e0515bfcdc272d1535544
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2846
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit f2ae745a09b5025c53919a4ce71e3943b034f87c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2899
2014-06-09 11:37:16 -07:00
Henry Robinson
60cbe1b0e1 IMPALA-741: Support partitions with non-existent HDFS locations
If a partition had a location that did not exist in HDFS, Impala would
refuse to load its metadata. This meant a typo could render a table
unloadable. We fix this problem by removing the existence check from the
frontend, and by inheriting access from the first extant parent of the
partition directory.

Fixing this exposed a second issue, where Impala wouldn't create
directories for partitions in the right place after an INSERT if the
partition location had been changed. To get this right we have to plumb
the partition ID through to Coordinator::FinalizeSuccessfulInsert(), so
that the coordinator can look up the partition's location from the
query-wide descriptor table. As a by-product, this patch rationalises
the per-partition, per-fragment statistics gathering a little bit by
putting almost all the per-partition stats into TInsertPartitionStatus.
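
Finding the "first extant parent" can be sketched in Python. The function
name is hypothetical, and the exists callback is injected so that the example
does not need HDFS; in Impala the check would go against the filesystem API.

```python
import os
import posixpath


def first_existing_ancestor(path, exists=os.path.exists):
    """Walk up from path until a location that exists is found."""
    current = path
    while not exists(current):
        parent = posixpath.dirname(current)
        if parent == current:
            # Reached the root without finding anything that exists.
            raise FileNotFoundError(path)
        current = parent
    return current
```

Access for a missing partition directory would then be derived from the
location this function returns.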

Change-Id: I9ee0a1a1ef62cf28f55be3249e8142c362083163
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2851
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
2014-06-08 18:44:45 -07:00
ishaan
db97981ab9 [CDH5] Switch the tpcds schemas to use decimal instead of float/double.
This patch converts the tpcds schemas to use decimal instead of float/double. Currently,
Impala can only r/w decimal in text, therefore, the tables are constrained to text. The
schemas were obtained from the official tpc spec:
http://www.tpc.org/tpcds/spec/tpcds_1.1.0.pdf

Change-Id: I1ef0113dcb48bad52af75ee93b47b08adf9e1a69
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2403
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-06-08 11:47:23 -07:00
Matthew Jacobs
2f9b2ae785 Fix SHOW DATA SOURCE test; must execute setup/cleanup serially
The SHOW DATA SOURCE tests were run as part of the other SHOW * tests
in test_show(), but the setup/cleanup for data sources can't be run
in parallel. This change moves the SHOW DATA SOURCE tests into a separate
test method and the setup/cleanup code is only run for this test (i.e.
not using setup_method() and teardown_method()). The test is then
only executed serially.

Change-Id: I221145f49cfe7290e132c6a87a5295b747c1fcc7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2864
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 5bcd769eae3a694d7f6f42d093f9197e8a4e8b77)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2870
2014-06-05 20:07:57 -07:00
Nong Li
5d80942d42 [CDH5] IMPALA-1019: Fix cancellation path in io mgr for cached reads.
Change-Id: I11efd65d1efa900f79afe88b781262a44ac5006a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2703
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-05-30 19:14:39 -07:00
Nong Li
84f851b5a5 IMPALA-959: Fix ASAN decimal crashes.
Not quite sure what the underlying issue is but these fixes seem to work.

Change-Id: I759804eb8338ba86969c0214a1e6e35588c94297
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2726
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-05-30 16:47:07 -07:00
Skye Wanderman-Milne
c8b2017093 Add decimal UDF/UDA support.
Change-Id: Ie48c1cb8e978c7282593b7f602dd68added6d3fd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2625
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 5048f04b332c13b1bff32fb257272b0fea4b8584)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2739
2014-05-29 20:49:53 -07:00
Lenni Kuff
c45e9a70d9 [CDH5] Add DDL support for HDFS caching
This change adds DDL support for HDFS caching. The DDL allows the user to indicate a
table or partition should be cached and which pool to cache the data into:
* Create a cached table: CREATE TABLE ... CACHED IN 'poolName'
* Cache a table/partition: ALTER TABLE ... [partitionSpec] SET CACHED IN 'poolName'
* Uncache a table/partition: ALTER TABLE ... [partitionSpec] SET UNCACHED

When a table/partition is marked as cached, a new HDFS caching request is submitted
to cache the location (HDFS path) of the table/partition and the ID of that request
is stored within the table metadata (in the table properties). This is stored as:
'cache_directive_id'='<requestId>'. The cache requests and IDs are managed by HDFS
and persisted across HDFS restarts.

When a cached table or partition is dropped it is important to uncache the cached data
(drop the associated cache request). For partitioned tables, this means dropping all
cache requests from all cached partitions in the table.
Likewise, if a partitioned table is created as cached, new partitions should be marked
as cached by default.

It is desirable to know which cache pools exist early on (in analysis) so the query
will fail without hitting HDFS/CatalogServer if a non-existent pool is specified. To
support this, a new cache pool catalog object type was introduced. The catalog server
caches the known pools (periodically refreshing the cache) and sends the known pools out
in catalog updates. This allows impalads to perform analysis checks on cache pool
existence without going to HDFS. It would be easy to use this to add basic cache pool management
in the future (ADD/DROP/SHOW CACHE POOL).

Waiting for the table/partition to become cached may take a long time. Instead of
blocking the user from accessing the table during this period, we wait for the cache
requests to complete in the background; once they have finished, the table metadata
is automatically refreshed.

Change-Id: I1de9c6e25b2a3bdc09edebda5510206eda3dd89b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2310
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-05-27 16:47:15 -07:00
Matthew Jacobs
f9c9a7ca13 Add SHOW DATA SOURCES
Change-Id: Ieeb0df107f45a58b8a99f717e96453da93ee7270
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2529
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit b2392c5bfe9fc928ad19af6ff6737e6dc6324e63)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2614
2014-05-19 17:52:27 -07:00
Skye Wanderman-Milne
edbbe6035e Decimal: read from Avro
Allows reading decimal columns with or without codegen. Includes tests
based on a data file posted on HIVE-5823.

Change-Id: Ie541c6b98bd24543691850cb45a434af60b5a5a6
(cherry picked from commit 6983dcefdf70cce14724e17d03bc061ffb8f671c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2596
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-05-16 22:26:11 -07:00
ishaan
0298e8b6ab Fix the ASAN build by xfailing test_decimal when ASAN_OPTIONS is set.
Adding decimal columns crashes an ASAN built impalad. This change skips the test.

Change-Id: Ic94055a3f0d00f89354177de18bc27d2f4cecec2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2532
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2594
2014-05-16 18:14:30 -07:00
Matthew Jacobs
19f34d9187 test_data_source_tables should only run for 'text/none'
Change-Id: I784fb4305f8cff92c2582b0a7008f836c7aa9fa4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2504
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 9f3621ec5d270c60258e93e8a2a596329c31f4e6)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2508
2014-05-09 19:32:18 -07:00
Matthew Jacobs
0c533bb152 External Data Source: Backend changes
Change-Id: Ifa62b4ea231da47facb31c3f8d43e5e3ac73591f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2284
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
(cherry picked from commit f1e5db2853135c4346788192e2dbc632d4fe1dfb)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2497
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
2014-05-09 02:24:41 -07:00
Henry Robinson
38befd2126 IMPALA-724: Support infinite / nan values in text files
This patch allows the text scanner to read 'inf' or 'Infinity' from a
row and correctly translate it into floating-point infinity. It also
adds is_inf() and is_nan() builtins.

Finally, we change the text table writer to write Infinity and NaN for
compatibility with Hive.
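
The read and write sides can be sketched in Python. The function names are
hypothetical, and the case-insensitive matching, the sign handling, and the
"-Infinity" spelling for negative infinity are assumptions not stated above.

```python
import math


def parse_text_float(token):
    """Parse a text-file float, accepting inf / Infinity / NaN spellings."""
    t = token.strip().lower()
    sign = 1.0
    if t and t[0] in "+-":
        sign = -1.0 if t[0] == "-" else 1.0
        t = t[1:]
    if t in ("inf", "infinity"):
        return sign * math.inf
    if t == "nan":
        return math.nan
    return float(token)


def write_text_float(value):
    """Write special values in the Infinity / NaN spelling."""
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "Infinity" if value > 0 else "-Infinity"
    return repr(value)
```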

In the future, we might consider adding nan / inf literals to our
grammar (postgres has this, see:
http://www.postgresql.org/docs/9.3/static/datatype-numeric.html).

Change-Id: I796f2852b3c6c3b72e9aae9dd5ad228d188a6ea3
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2393
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 58091355142cadd2b74874d9aa7c8ab6bf3efe2f)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2483
2014-05-08 12:28:53 -07:00
Nong Li
03e5665e56 Decimal: Read/Write to parquet.
This adds support for the FIXED_LEN_BYTE_ARRAY parquet type and
encoding for decimals.
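
For context, Parquet stores a decimal in a fixed-length byte array as the
big-endian two's-complement unscaled integer. A minimal Python sketch of that
encoding follows; the function names are hypothetical and this is not the
scanner or writer code from the patch.

```python
from decimal import Decimal


def encode_decimal(value, scale, num_bytes):
    """Encode a Decimal as a fixed-length big-endian two's-complement value.

    Digits beyond 'scale' are truncated by the int() conversion.
    """
    unscaled = int(value.scaleb(scale))
    return unscaled.to_bytes(num_bytes, byteorder="big", signed=True)


def decode_decimal(raw, scale):
    """Inverse of encode_decimal."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)
```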

Change-Id: I9d5780feb4530989b568ec8d168cbdc32b7039bd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1727
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2432
2014-05-02 16:38:35 -07:00
Nong Li
5adbcbbce5 Update decimal tests to only run on text/none.
Change-Id: I9a35f9e1687171fc3f06c17516bca2ea4b9af9e1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2217
Tested-by: jenkins
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2431
Reviewed-by: Nong Li <nong@cloudera.com>
2014-05-02 12:18:37 -07:00
Nong Li
bb3feb675e Dynamically scale down mem usage in scanners and io mgr.
This patch scales down the amount of buffering in the io mgr and the number
of scanner threads if the query is close to mem limits.

Change-Id: I68ef247a68642939b98ec7c429dfd393b23a20d2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1906
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2417
2014-05-01 15:04:07 -07:00
Skye Wanderman-Milne
60db4d4d82 CDH-18416: Don't inline ReadWriteUtil::ReadZLong()
For wide Avro tables, ReadZLong() would get inlined many times into a
single function body, causing LLVM to crash. Not inlining doesn't seem
to have a performance impact on narrow tables, and helps with wide
tables.

This change also adds tests over wide (i.e. many-column) tables. The
test tables are produced by specifying shell commands to generate test
tables in functional_schema_template.sql, which are executed in
generate-schema-statements.py. In the SQL templates, sections starting
with a ` are treated as shell commands. The output of the shell
command is then used as the section text. This is only a starting
point; it isn't currently implemented for all sections, and may have
to be tweaked if we use this mechanism for all tables.
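
The backtick convention can be sketched in Python. The name eval_section is
hypothetical and this is not the code in generate-schema-statements.py; it
only illustrates "run the section as a shell command and use its output as
the section text".

```python
import subprocess


def eval_section(text):
    """Return the section text, or the output of its shell command.

    A section whose first non-blank character is a backtick is treated as
    a shell command; anything else is returned unchanged.
    """
    stripped = text.lstrip()
    if not stripped.startswith("`"):
        return text
    command = stripped[1:]
    return subprocess.check_output(command, shell=True,
                                   universal_newlines=True)
```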

Change-Id: Ife0d857d19b21534167a34c8bc06bc70bef34910
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2206
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
(cherry picked from commit 1c5951e3cce25a048208ab9bb3a3aed95e41cf67)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2353
Tested-by: jenkins
2014-04-28 15:58:15 -07:00
Skye Wanderman-Milne
bd2fc2d1d4 IMPALA-934: Refresh cached UDF library when creating a new function
This change adds the ability to refresh a local cache entry, causing
the old cache entry to be dropped and the library to be reloaded from
HDFS. This is used in ResolveSymbolLookup(), which is called by the
frontend when creating a new function, and in ImpalaServer when
receiving a "create function" heartbeat. This change also makes sure
the FE calls into the backend for jars, so jars get refreshed as well.
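
The refresh behaviour can be sketched in Python as a cache that drops and
refetches an entry on request. LibCacheSketch and its fetch callback are
hypothetical; the real cache copies libraries from HDFS to local disk.

```python
class LibCacheSketch:
    def __init__(self, fetch):
        self._fetch = fetch  # callable: hdfs_path -> library contents
        self._entries = {}

    def get(self, hdfs_path, refresh=False):
        if refresh:
            # Drop the old entry so the library is reloaded below.
            self._entries.pop(hdfs_path, None)
        if hdfs_path not in self._entries:
            self._entries[hdfs_path] = self._fetch(hdfs_path)
        return self._entries[hdfs_path]
```

In this model, creating a function would call get(path, refresh=True), while
ordinary lookups keep using the cached copy.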

Change-Id: I5fd61c1bc2e04838449335d5a68b61af8b101b01
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2286
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit e8587794b3b82438190c91b2ebe9d1e12db73981)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2348
2014-04-24 19:39:16 -07:00
Lenni Kuff
bb09b5270f IMPALA-839: Update tests to be more thorough when run exhaustively
Some tests have constraints that were there only to help reduce runtime, which
reduces coverage when running in exhaustive mode. The majority of the constraints
exist because it adds no value to run the test across additional dimensions (or
it is invalid to run with those dimensions). Updates the tests that have
legitimate constraints to use two new helper methods for constraining the table format
dimension:
create_uncompressed_text_dimension()
create_parquet_dimension()

These will create a dimension that will produce a single test vector, either
uncompressed text or parquet respectively.

Change-Id: Id85387c1efd5d192f8059ef89934933389bfe247
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2149
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit e02acbd469bc48c684b2089405b4a20552802481)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2290
2014-04-18 20:11:31 -07:00