Commit Graph

25 Commits

Author SHA1 Message Date
Dimitris Tsirogiannis
5a6f53db16 Add partition pruning tests
The following changes are included in this commit:
1. Modified the alltypesagg table to include an additional partition key
that has nulls.
2. Added a number of tests in hdfs.test that exercise the partition
pruning logic (see IMPALA-887).
3. Modified all the tests that are affected by the change in alltypesagg.

Change-Id: I1a769375aaa71273341522eb94490ba5e4c6f00d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2874
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3236
2014-06-24 02:14:27 -07:00
Alex Behm
881f3a8c33 Re-order union operands descending by their estimated per-host memory.
Re-order union operands descending by their estimated per-host memory, so that
parent nodes can gauge the peak memory consumption of a MergeNode after
opening it during execution (a MergeNode opens its first operand in Open()).
Scan nodes are always ordered last because they can dynamically scale down their
memory usage, whereas many other nodes cannot (e.g., joins, aggregations).
One goal is to decrease the likelihood of a SortNode parent claiming too much
memory in its Open(), possibly causing the mem limit to be hit when subsequent
union operands are executed.
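
For illustration, a minimal Java sketch of the ordering rule described above, using a hypothetical Operand type and field names rather than the actual planner code: scans sort last, everything else sorts by descending estimated per-host memory.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class UnionOperandOrdering {
    // Hypothetical stand-in for a planned union operand.
    record Operand(String name, long estPerHostMemBytes, boolean isScan) {}

    // Scans go last (they can scale their memory usage down dynamically);
    // all other operands are ordered by descending estimated per-host memory.
    static void orderOperands(List<Operand> operands) {
        operands.sort(Comparator
                .comparing(Operand::isScan)  // false (non-scan) sorts before true (scan)
                .thenComparing(Comparator.comparingLong(Operand::estPerHostMemBytes).reversed()));
    }

    public static void main(String[] args) {
        List<Operand> ops = new ArrayList<>(List.of(
                new Operand("SCAN HDFS t1", 176_000_000L, true),
                new Operand("AGGREGATE", 15_000_000L, false),
                new Operand("HASH JOIN", 941_000L, false)));
        orderOperands(ops);
        ops.forEach(op -> System.out.println(op.name()));
        // Prints: AGGREGATE, HASH JOIN, SCAN HDFS t1
    }
}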

Change-Id: Ia51caaffd55305ea3dbd2146cd55acc7da67f382
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3146
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3213
Tested-by: jenkins
2014-06-20 18:46:10 -07:00
Alex Behm
ef6705d7e0 Rename MergeNode to UnionNode.
Change-Id: I9e3675a103757db1345b04bd1d102d2719efddd0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3128
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3154
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-06-19 12:44:21 -07:00
Alex Behm
677062be3d Rework planning of unions s.t. a UnionStmt produces a single MergeNode.
This patch changes the planning of a UnionStmt so that it always produces a single fragment
with a MergeNode connecting all child fragments as its root.
The data partition of the returned fragment and how the child fragments are merged
depends on the data partitions of the child fragments:
- All child fragments are unpartitioned or partitioned: The returned fragment
  has an UNPARTITIONED or RANDOM data partition, respectively. The MergeNode absorbs
  the plan trees of all child fragments.
- Mixed partitioned/unpartitioned child fragments: The returned fragment is
  RANDOM partitioned. The plan trees of all partitioned child fragments are absorbed
  into the MergeNode. All unpartitioned child fragments are connected to the
  MergeNode via a RANDOM exchange, and remain unchanged otherwise.

Also adds support for random partitioned data exchanges.
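
For illustration, a rough Java sketch of the partitioning rules above, using hypothetical types rather than Impala's planner API: the result partition, and whether a child plan is absorbed or connected through a RANDOM exchange, follow from the child partitions.

import java.util.List;

public class UnionFragmentPlanning {
    enum DataPartition { UNPARTITIONED, RANDOM }

    // Hypothetical child fragment: only its data partition matters for this sketch.
    record ChildFragment(String name, DataPartition partition) {}

    record Plan(DataPartition resultPartition, List<String> absorbed, List<String> exchanged) {}

    static Plan planUnion(List<ChildFragment> children) {
        boolean anyPartitioned =
                children.stream().anyMatch(c -> c.partition() == DataPartition.RANDOM);
        boolean anyUnpartitioned =
                children.stream().anyMatch(c -> c.partition() == DataPartition.UNPARTITIONED);

        // All unpartitioned -> UNPARTITIONED result; all partitioned or mixed -> RANDOM result.
        DataPartition result = anyPartitioned ? DataPartition.RANDOM : DataPartition.UNPARTITIONED;

        // In the mixed case only partitioned children are absorbed into the merge node;
        // unpartitioned children are connected through a RANDOM exchange.
        boolean mixed = anyPartitioned && anyUnpartitioned;
        List<String> absorbed = children.stream()
                .filter(c -> !mixed || c.partition() == DataPartition.RANDOM)
                .map(ChildFragment::name).toList();
        List<String> exchanged = children.stream()
                .filter(c -> mixed && c.partition() == DataPartition.UNPARTITIONED)
                .map(ChildFragment::name).toList();
        return new Plan(result, absorbed, exchanged);
    }

    public static void main(String[] args) {
        Plan p = planUnion(List.of(
                new ChildFragment("scan-fragment", DataPartition.RANDOM),
                new ChildFragment("constant-select", DataPartition.UNPARTITIONED)));
        System.out.println(p);  // RANDOM result; scan-fragment absorbed, constant-select exchanged
    }
}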

Change-Id: I82b2d12c104d98c4e7133234653ee1b67658ef7a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2876
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3143
2014-06-19 00:56:58 -07:00
Srinath Shankar
895bdeddd8 Ignore order-by without limit in INSERT and CTAS
Order-by without limit in the query statement corresponding to an INSERT
or CTAS must be ignored because
i) There is no guarantee on row ordering when the target table is scanned again
   i.e. 'select * from table' may return rows in any order, regardless of how the
   rows were inserted, and
ii) Ignoring it (rather than flagging an error) is consistent with the treatment of
   order-by without limit in nested queries, union operands, etc.
Currently, an order-by w/o limit in a QueryStmt is only evaluated if the analyzer is
the root analyzer (has no ancestors).
However, a new child analyzer is not created for the QueryStmt in an InsertStmt, so this
technique fails for inserts. The correct thing to do is to use a child analyzer for that
QueryStmt, but this has spill-over scoping effects for analysis of with clauses.

This patch adds a flag to the analyzer, similar to the isExplain flag, to identify
insert statements.
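
A minimal sketch of the rule, with made-up class and method names standing in for Impala's QueryStmt/Analyzer rather than their real APIs: when the analyzer is flagged as an insert context, an ORDER BY without LIMIT is silently dropped.

public class InsertOrderByHandling {
    static final class Analyzer {
        private final boolean isInsertContext;  // analogous to the new flag, like isExplain
        Analyzer(boolean isInsertContext) { this.isInsertContext = isInsertContext; }
        boolean isInsertContext() { return isInsertContext; }
    }

    static final class QueryStmt {
        boolean hasOrderBy = true;
        Long limit = null;  // no LIMIT clause

        void analyzeOrderBy(Analyzer analyzer) {
            // ORDER BY without LIMIT is meaningless for INSERT/CTAS: scanning the target
            // table later gives no ordering guarantee, so the clause is silently dropped
            // (consistent with nested queries and union operands).
            if (hasOrderBy && limit == null && analyzer.isInsertContext()) {
                hasOrderBy = false;  // ignore, do not raise an error
            }
        }
    }

    public static void main(String[] args) {
        QueryStmt stmt = new QueryStmt();
        stmt.analyzeOrderBy(new Analyzer(true));
        System.out.println("order by kept? " + stmt.hasOrderBy);  // false
    }
}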

Change-Id: I9ded587cfea75eca0b7a43ee9b0df0a6c8ecb602
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3044
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3060
2014-06-14 18:36:43 -07:00
Nong Li
5d903efca3 ExecSummary
The runtime profile as we present it is not very useful, and I think its structure
makes it hard to consume. This patch adds a new client-facing, structured set of
counters that are collected from the runtime profiles. For example, with this structure
it would be easy to have the shell fetch the stats of a running query and print a useful
progress report, or to check the most relevant metrics for diagnosing issues.

Here's an example of the output for one of the tpch queries:
Operator              #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------
09:MERGING-EXCHANGE        1   79.738us   79.738us        5           5         0        -1.00 B  UNPARTITIONED
05:TOP-N                   3   84.693us   88.810us        5           5  12.00 KB       120.00 B
04:AGGREGATE               3    5.263ms    6.432ms        5           5  44.00 KB       10.00 MB  MERGE FINALIZE
08:AGGREGATE               3   16.659ms   27.444ms   52.52K     600.12K   3.20 MB       15.11 MB  MERGE
07:EXCHANGE                3    2.644ms      5.1ms   52.52K     600.12K         0              0  HASH(o_orderpriority)
03:AGGREGATE               3  342.913ms  966.291ms   52.52K     600.12K  10.80 MB       15.11 MB
02:HASH JOIN               3    2s165ms    2s171ms  144.87K     600.12K  13.63 MB      941.01 KB  INNER JOIN, BROADCAST
|--06:EXCHANGE             3    8.296ms    8.692ms   57.22K      15.00K         0              0  BROADCAST
|  01:SCAN HDFS            2    1s412ms    1s978ms   57.22K      15.00K  24.21 MB      176.00 MB  tpch.orders o
00:SCAN HDFS               3    8s032ms    8s558ms    3.79M     600.12K  32.29 MB      264.00 MB  tpch.lineitem l
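
For illustration, a small Java sketch of the kind of structured, per-operator summary row shown above, with per-host timings collapsed into avg/max columns; the record fields and names are assumptions for the sketch, not the actual counter structures.

import java.util.List;
import java.util.LongSummaryStatistics;

public class ExecSummarySketch {
    // One row of the summary table, collected per plan operator across all hosts.
    record OperatorSummary(String operator, int hosts, double avgTimeMs, long maxTimeMs, long rows) {}

    static OperatorSummary summarize(String operator, List<Long> perHostTimeMs, long rowsReturned) {
        LongSummaryStatistics stats =
                perHostTimeMs.stream().mapToLong(Long::longValue).summaryStatistics();
        return new OperatorSummary(operator, perHostTimeMs.size(),
                stats.getAverage(), stats.getMax(), rowsReturned);
    }

    public static void main(String[] args) {
        OperatorSummary row = summarize("00:SCAN HDFS", List.of(8032L, 8558L, 7990L), 3_790_000L);
        System.out.printf("%-15s %6d %10.1fms %8dms %10d%n",
                row.operator(), row.hosts(), row.avgTimeMs(), row.maxTimeMs(), row.rows());
    }
}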

Change-Id: Iaad4b9dd577c375006313f19442bee6d3e27246a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2964
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-11 03:10:11 -07:00
Srinath Shankar
5755b0bdee Order by without limit for Impala
Enable order-by without limit
Added BufferedBlockMgr to allocate buffers and spill to disk.
Added Sorter for the external sort implementation
Added new SortNode execution node that completely sorts its input
Changes to enable writing in IoMgr went in a separate patch.
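
A toy external-sort sketch in the spirit of the Sorter described above, assuming sorted runs bounded by a memory budget followed by a k-way merge; it keeps the "spilled" runs in memory for brevity and is not the actual BufferedBlockMgr/Sorter code.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSortSketch {
    // Build sorted runs no larger than maxRunSize (stand-in for the memory budget),
    // then k-way merge them with a priority queue, as a disk-based sorter would.
    static List<Integer> externalSort(List<Integer> input, int maxRunSize) {
        List<List<Integer>> runs = new ArrayList<>();
        for (int i = 0; i < input.size(); i += maxRunSize) {
            List<Integer> run =
                    new ArrayList<>(input.subList(i, Math.min(i + maxRunSize, input.size())));
            Collections.sort(run);  // a real sorter would spill this run to disk
            runs.add(run);
        }

        // Merge phase: repeatedly pull the smallest head element among all runs.
        record Head(int value, Iterator<Integer> rest) {}
        PriorityQueue<Head> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.value(), b.value()));
        for (List<Integer> run : runs) {
            Iterator<Integer> it = run.iterator();
            if (it.hasNext()) heap.add(new Head(it.next(), it));
        }
        List<Integer> out = new ArrayList<>(input.size());
        while (!heap.isEmpty()) {
            Head h = heap.poll();
            out.add(h.value());
            if (h.rest().hasNext()) heap.add(new Head(h.rest().next(), h.rest()));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(externalSort(List.of(9, 1, 7, 3, 8, 2, 6, 4, 5), 3));
        // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}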

Reviewed-on: http://gerrit.ent.cloudera.com:8080/1539
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins

Conflicts:

	testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test

Change-Id: I3ece32affe5b006f53bbdfcc03ded01471e818ac
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2900
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
2014-06-09 16:58:08 -07:00
Alex Behm
15e05082c0 IMPALA-831: Distributed aggregation and top-n over unions.
Change-Id: I056e8271421008378db93e8b2393861cc9dd4b90
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1840
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1886
2014-03-13 15:42:31 -07:00
Alex Behm
6799c93922 Simplified/enhanced explain plans with a total of four explain levels.
There are now four explain levels, summarized as follows:
- Level 0: MINIMAL
  Non-fragmented parallel plan only showing plan nodes with minimal attributes
- Level 1: STANDARD
  Non-fragmented parallel plan with some details in plan nodes
- Level 2: EXTENDED
  Non-fragmented parallel plan with full details in plan nodes including
  the table/column stats, row size, #hosts, cardinality,
  and estimated per-host memory requirement
- Level 3: VERBOSE
  Fragmented parallel plan with full details (like level 2)

This patch also includes several bugfixes related to plan costing and/or
testing of explain plans.
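
For illustration, a compact sketch of how the four levels might gate the detail emitted for a plan node; the enum and method names are assumptions for the sketch, not Impala's actual explain code.

public class ExplainLevels {
    // MINIMAL(0) .. VERBOSE(3), mirroring the four levels listed above.
    enum ExplainLevel { MINIMAL, STANDARD, EXTENDED, VERBOSE }

    static String explainNode(String name, long cardinality, int hosts, ExplainLevel level) {
        StringBuilder sb = new StringBuilder(name);
        if (level.ordinal() >= ExplainLevel.STANDARD.ordinal()) {
            sb.append(" [node-specific attributes]");
        }
        if (level.ordinal() >= ExplainLevel.EXTENDED.ordinal()) {
            // Extended and verbose include stats, cardinality, #hosts, memory estimates.
            sb.append(String.format(" cardinality=%d hosts=%d", cardinality, hosts));
        }
        // VERBOSE additionally prints the fragmented (distributed) plan, omitted here.
        return sb.toString();
    }

    public static void main(String[] args) {
        for (ExplainLevel l : ExplainLevel.values()) {
            System.out.println(l + ": " + explainNode("01:SCAN HDFS", 600_120L, 3, l));
        }
    }
}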

Change-Id: I622310f01d1b3d53ea1031adaf3b3ffdd94eba30
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1211
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-10 19:17:59 -08:00
Marcel Kornacker
fd182201bc Predicate propagation plus join order optimization.
Introduces STRAIGHT_JOIN keyword to prevent join order optimization.

Structural changes to the planning framework:
- slot materialization: the decision whether to materialize a slot now happens *prior* to
  plan generation. This is needed to generate accurate cost estimates
  at plan generation time; see QueryStmt.materializeRequiredSlots().
- added PlanNode.init(), which initializes the entire state of a PlanNode; this subsumes
  finalize()
  * computeMemLayout() now happens per-tuple in the corresponding ScanNode's init()
  * init() calls computeStats() by default; also marks slots as materialized and calls
    TupleDescriptor.computeMemLayout()
- added PlanNode.tblRefIds_
- restructured UnionStmt and union plan generation to fit pred propagation model:
  all tuples are created (and equiv predicates registered) prior to plan generation
- added Expr.isAuxExpr
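
A hypothetical skeleton, not the actual PlanNode class, sketching the init()-subsumes-finalize() idea described above: init() establishes the node's full state, including stats, and a scan node also lays out its tuple memory there.

public class PlanNodeInitSketch {
    // Hypothetical analyzer handle; the real one carries global analysis state.
    static class Analyzer {}

    abstract static class PlanNode {
        protected long cardinality = -1;
        protected float avgRowSize = 0;
        protected int numNodes = 1;

        // init() fully initializes the node during plan generation,
        // subsuming the old finalize() step.
        void init(Analyzer analyzer) {
            computeStats(analyzer);
        }

        // Sets cardinality, avgRowSize, numNodes from table/column stats.
        abstract void computeStats(Analyzer analyzer);
    }

    static class ScanNode extends PlanNode {
        @Override
        void init(Analyzer analyzer) {
            super.init(analyzer);
            // A scan node also marks required slots as materialized and computes the
            // memory layout of its tuple here (per-tuple computeMemLayout()).
            computeMemLayout();
        }

        @Override
        void computeStats(Analyzer analyzer) {
            cardinality = 600_120L;  // would come from table stats
            avgRowSize = 88.0f;
            numNodes = 3;
        }

        void computeMemLayout() { /* assign slot offsets within the tuple */ }
    }

    public static void main(String[] args) {
        PlanNode scan = new ScanNode();
        scan.init(new Analyzer());
        System.out.println("cardinality=" + scan.cardinality + " hosts=" + scan.numNodes);
    }
}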

Change-Id: I475c1645bfca9e84ae6e5f529e7781d9532e5c9a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/955
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:24 -08:00
Matthew Jacobs
8a55982105 Add OFFSET to skip rows returned with a LIMIT
Adds support for skipping a number of rows with an ORDER BY clause and a LIMIT. Hive
does not support OFFSET, so creating a view with an OFFSET will not work in Hive.

For example, "SELECT * FROM T1 ORDER BY ID LIMIT 20 OFFSET 5" will do the sorting, skip
5 rows, then return the next 20. OFFSET requires an ORDER BY clause.

Note that this is not very efficient, as we must actually keep (limit + offset) rows in memory
in the top-n node, and all child sort nodes must do so as well. Users should be careful when
using this feature.
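
For illustration, a small Java sketch of the limit-plus-offset behavior described above: the top-n structure retains (limit + offset) rows, and the offset rows are only discarded on output. The names are hypothetical, not the actual top-n node.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNWithOffset {
    // Keeps the smallest (limit + offset) values seen, then discards the first
    // `offset` of them on output, which is why memory grows with limit + offset.
    static List<Integer> topN(Iterable<Integer> input, int limit, int offset) {
        int capacity = limit + offset;
        // Max-heap holding the current smallest `capacity` values.
        PriorityQueue<Integer> heap = new PriorityQueue<>(Comparator.reverseOrder());
        for (int v : input) {
            heap.add(v);
            if (heap.size() > capacity) heap.poll();  // drop the largest
        }
        List<Integer> sorted = new ArrayList<>(heap);
        sorted.sort(Comparator.naturalOrder());
        // Skip the offset rows; at most `limit` rows remain.
        return sorted.subList(Math.min(offset, sorted.size()), sorted.size());
    }

    public static void main(String[] args) {
        // Analogous to: SELECT * FROM t ORDER BY id LIMIT 3 OFFSET 2
        System.out.println(topN(List.of(9, 3, 7, 1, 8, 2, 6, 5, 4), 3, 2));  // [3, 4, 5]
    }
}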

Change-Id: I4d7021c278296e7bdbfa0e6f2699cd6f23eef59d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/900
Tested-by: jenkins
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
2014-01-08 10:54:02 -08:00
Alex Behm
3f54240fed PlannerTest uses explain level 'normal'. Only add stats and costs to explain output in 'verbose' mode.
Change-Id: I827b4c7085b5aa2dc5521f8748d8973178f43f4c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/678
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:23 -08:00
Alex Behm
4bb8b38cde Added stats and cost estimates to explain output.
Change-Id: I1273745a439fd25cefa4e08ecc075c98cc8bfc45
Reviewed-on: http://gerrit.ent.cloudera.com:8080/602
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:53:22 -08:00
Nong Li
2b9105cd11 IMPALA-487: don't compact data from rhs of join if it is going through an exchange node.
Change-Id: I442445e7370218352cd6d3137f2a454c9afb73ba
Reviewed-on: http://gerrit.ent.cloudera.com:8080/476
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-01-08 10:52:50 -08:00
ishaan
53cd9eadab Treat HBase as a file format for functional tests
Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922
Reviewed-on: http://gerrit.ent.cloudera.com:8080/102
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:52:36 -08:00
Alan Choi
2d25f11ec3 IMPALA-91 new explain plan output 2014-01-08 10:50:10 -08:00
Marcel Kornacker
5bfc477ccc IMPALA-291: Plans should explicitly mention the join strategy 2014-01-08 10:49:59 -08:00
Marcel Kornacker
398e725a23 make broadcast joins the default join strategy 2014-01-08 10:49:34 -08:00
Marcel Kornacker
d7e22f44bb Partitioned hash joins
- added PlanNode.numNodes, PlanNode.avgRowSize and PlanNode.computeStats()
- fixed up some cardinality estimates
- Planner now makes a cost-based decision between broadcast join and join with full repartitioning (of both inputs)
- ExchangeNode now distinguishes between its input and output row descriptor: the output potentially contains more tuples
- fixed a problem related to cancellation and concurrent hash table builds.

Not included:
- partitioned joins that take advantage of existing partitions of the inputs; those will have to wait for a follow-on change
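
For illustration, a toy cost comparison in the spirit of the broadcast-versus-repartition decision above; the formula is an assumption for the sketch, not Impala's actual cost model.

public class JoinStrategyChoice {
    enum Strategy { BROADCAST, PARTITIONED }

    // Toy cost model: broadcasting ships the full build side to every join node;
    // repartitioning ships both sides across the network once (hash-partitioned).
    static Strategy choose(long lhsBytes, long rhsBytes, int numJoinNodes) {
        long broadcastCost = rhsBytes * numJoinNodes;
        long partitionCost = lhsBytes + rhsBytes;
        return broadcastCost <= partitionCost ? Strategy.BROADCAST : Strategy.PARTITIONED;
    }

    public static void main(String[] args) {
        // Small build side: broadcast wins even on many nodes.
        System.out.println(choose(10_000_000_000L, 20_000_000L, 50));    // BROADCAST
        // Build side comparable to the probe side: repartition both inputs instead.
        System.out.println(choose(10_000_000_000L, 8_000_000_000L, 50)); // PARTITIONED
    }
}
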
2014-01-08 10:49:29 -08:00
Marcel Kornacker
c02d25baa8 IMPALA-20: Limit clause in inline view not handled correctly by planner
- this adds a SelectNode that evaluates conjuncts and enforces the limit
- all limits are now distributed: enforced both by the child plan fragment and
  by the merging ExchangeNode
- all limits w/ Order By are now distributed: enforced both by the child plan fragment and
  by the merging TopN node
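
For illustration, a small sketch of the distributed-limit idea with hypothetical names (not the actual execution nodes): each child fragment caps its output at the limit, and the merging step enforces the same limit again over the merged stream.

import java.util.ArrayList;
import java.util.List;

public class DistributedLimitSketch {
    // Each child fragment enforces the limit locally: it never sends more than
    // `limit` rows to the merging exchange.
    static List<Integer> childFragment(List<Integer> rows, long limit) {
        return rows.stream().limit(limit).toList();
    }

    // The merging exchange (or merging top-n) enforces the limit again over the
    // union of all child outputs, so the final result is also capped at `limit`.
    static List<Integer> mergingExchange(List<List<Integer>> childOutputs, long limit) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> out : childOutputs) merged.addAll(out);
        return merged.stream().limit(limit).toList();
    }

    public static void main(String[] args) {
        long limit = 3;
        List<Integer> frag1 = childFragment(List.of(1, 2, 3, 4, 5), limit);  // [1, 2, 3]
        List<Integer> frag2 = childFragment(List.of(6, 7), limit);           // [6, 7]
        System.out.println(mergingExchange(List.of(frag1, frag2), limit));   // [1, 2, 3]
    }
}
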
2014-01-08 10:48:29 -08:00
ishaan
09d6d931f4 Change the way data is loaded 2014-01-08 10:48:09 -08:00
Alan Choi
476a665763 IMP-620: print number of scanned partitions and total scanned bytes 2014-01-08 10:46:57 -08:00
Marcel Kornacker
fd77f06f15 Moving functional-newplanner back to functional-planner (and renaming NewPlanner to Planner) 2014-01-08 10:46:20 -08:00
Marcel Kornacker
04d12f03fc cleaning up logging output 2014-01-08 10:44:28 -08:00
Lenni Kuff
04edc8f534 Update benchmark tests to run against generic workload, data loading with scale factor, +more
This change updates the run-benchmark script to enable it to target one or more
workloads. Now benchmarks can be run like:

./run-benchmark --workloads=hive-benchmark,tpch

We look up the workload in the workloads directory, then read the associated
query .test files and start executing them.

To ensure the queries are not duplicated between benchmark and query tests, I
moved all existing queries (under fe/src/test/resources/*) to the workloads
directory. You do NOT need to look through all the .test files; I've just moved
them. The one new file is 'hive-benchmark.test', which contains the hive
benchmark queries.

Also added support for generating schemas for different scale factors, as well as
executing against these scale factors. For example, let's say we have a dataset
with a scale factor called "SF3". We would first generate the schema using:

./generate_schema_statements --workload=<workload> --scale_factor="SF3"
This will create tables with names distinct from those of the other scale factors.

Run the generated .sql file to load the data. Alternatively, the data can be loaded
by running a new python script:
./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor]
For example: ./bin/load-data.py -w tpch -e core -s SF3

Then run against this:
./run-benchmark --workloads=<workload> --scale_factor=SF3

This changeset also includes a few other minor tweaks to some of the test
scripts.

Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6
2014-01-08 10:44:22 -08:00