This patch adds test coverage for partitioned inserts where the memory
limit will be exceeded by the table writer.
Testing:
Ran the test with exploration_strategy=exhaustive locally, then ran an exhaustive
private build. Manually inspected the memory limit report to make sure
it behaved as expected (writer memory was being tracked correctly, etc.).
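A minimal sketch of the scenario the new test exercises (the table names and the
mem_limit value are illustrative only, not taken from the actual test):
SET mem_limit=64m;
-- Partitioned insert whose table writer buffers are expected to push memory
-- consumption past the limit and fail cleanly.
INSERT INTO target_tbl PARTITION (year, month)
SELECT * FROM source_tbl;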
Change-Id: I8583c60d648af9eedc956315df5ac3c3d6608704
Reviewed-on: http://gerrit.cloudera.org:8080/3245
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
INSERTs on S3 are slower because of double buffering: we buffer the data
once locally and once in a staging directory in S3 before moving the
file(s) to the final location. On HDFS, moving a file from the staging
directory to the final location is a quick rename, which is only a
metadata operation. S3, however, does not support renames, so the move
becomes a full file copy instead of a metadata-only operation.
This patch introduces a boolean query option "s3_skip_insert_staging"
which avoids the staging step on S3 and allows the sinks to write to
the final location directly.
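A hedged usage sketch (the table names are hypothetical):
SET s3_skip_insert_staging=true;
-- With the option enabled, the table sink writes its files directly to the
-- table's S3 location instead of a staging directory.
INSERT INTO s3_tbl SELECT * FROM src_tbl;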
This trades consistency for performance. If one or more nodes fail
during the query, we can end up with inconsistent results in the final
location.
P.S.: This option is disabled for INSERT OVERWRITE queries, as that
would require cleaning the destination directory before moving the
final files there. However, the coordinator is responsible for the
cleaning, which takes place only after the table sinks have moved
the files to the final location. Thus, INSERT OVERWRITE queries must
still have their files moved to a staging location by the table sinks.
Performance gains:
- For non-partitioned tables, the INSERT queries run 4-4.5x faster on
S3. (Tested on a 63GB INSERT to a table)
- For heavily partitioned tables, there is a considerable improvement,
on the order of 4-5 minutes on queries that take ~27 minutes, but
queries are still slow because of IMPALA-3482, where the catalog
takes too long to update all the metadata. (Tested with a query
that creates 2.4K partitions in a table totalling ~19GB)
Change-Id: Iff9620d41ba0d5fb1aa0c9f4abb48866fc2b0698
Reviewed-on: http://gerrit.cloudera.org:8080/2905
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
For IMPALA-1740 we added a test to insert.test, which creates a table and
inserts data. The table was created on HDFS by default and thus inserts with
compression enabled did not work. This change adds the required table to the
functional schema in the same way we do it for the other insert tests.
Change-Id: Ie68e7067b7a16218d27935820d5d1ce7035d2e6c
Reviewed-on: http://gerrit.cloudera.org:8080/2919
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
HIVE-5795 introduced a parameter skip.header.line.count to skip header
lines from input files. This change introduces the capability to skip
an arbitrary number of header lines from CSV input files on HDFS. The
size of the total file header must be smaller than
max_scan_range_length; otherwise an error will be reported. This is
necessary because scan ranges are not read in disk order, so there is
no way of identifying header lines except by counting from the start
of the first scan range.
[localhost:21000] > alter table t1 set
tblproperties('skip.header.line.count'='1');
Query: alter table t1 set tblproperties('skip.header.line.count'='1')
[localhost:21000] > select * from t1;
Query: select * from t1
+----+----+
| c1 | c2 |
+----+----+
| 1  | 1  |
| 2  | 2  |
| 3  | 3  |
+----+----+
Fetched 3 row(s) in 0.32s
[localhost:21000] > alter table t1 set
tblproperties('skip.header.line.count'='0');
Query: alter table t1 set tblproperties('skip.header.line.count'='0')
[localhost:21000] > select * from t1;
Query: select * from t1
+------+------+
| c1   | c2   |
+------+------+
| NULL | NULL |
| 1    | 1    |
| 2    | 2    |
| 3    | 3    |
+------+------+
WARNINGS: Error converting column: 0 TO INT (Data is: num1)
Error converting column: 1 TO DOUBLE (Data is: num2)
file: hdfs://localhost:20500/test-warehouse/t1/test.txt
record: num1,num2
Fetched 4 row(s) in 0.41s
Change-Id: I595f01a165d41499ca1956fe748ba3840a6eb543
Reviewed-on: http://gerrit.cloudera.org:8080/2110
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
Add support for creating a table based on a parquet file which contains arrays,
structs and/or maps.
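A hedged example, assuming the existing CREATE TABLE ... LIKE PARQUET '<path>'
syntax that this change extends (the file path and table name are hypothetical):
CREATE TABLE nested_schema_copy
LIKE PARQUET '/test-warehouse/nested_data/part-00000.parq';
-- Parquet LIST, MAP, and group types in the file map to ARRAY, MAP, and STRUCT
-- columns in the new table.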
Change-Id: I56259d53a3d9b82f318228e864c783b48a03f9ae
Reviewed-on: http://gerrit.cloudera.org:8080/582
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This patch fixes an issue where incorrect results are produced by a CTAS or IAS
that is fed from a QueryStmt that has outer-joined inline views with constants or
conditionals in the select list. The regression was introduced in this commit:
b8f642710ea9d311a7aca32611eaa7cac6cd86df
Now that the final expression substitution with TupleIsNullPredicate() wrapping
is performed in planning, the InsertStmt's result expressions should be taken
from the feeding QueryStmt's result expressions, and not the QueryStmt's
(already substituted) base table result expressions.
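A hedged sketch of the affected pattern (tables and columns are hypothetical):
CREATE TABLE ctas_target AS
SELECT a.id, v.flag
FROM a LEFT OUTER JOIN (SELECT id, 1 AS flag FROM b) v ON a.id = v.id;
-- Before the fix, the constant 'flag' from the outer-joined inline view could be
-- materialized without the TupleIsNullPredicate() wrapping, so unmatched rows
-- could get 1 instead of NULL.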
Change-Id: Iae29683638df01f140d0f74976cca8ca9ba0852d
Reviewed-on: http://gerrit.cloudera.org:8080/637
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The bug was that we were not substituting the partition key exprs of an InsertStmt
with the root plan node's output smap during single-node planning.
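A hedged sketch of the kind of statement affected (names are hypothetical;
num_nodes=1 is one way to force single-node planning):
SET num_nodes=1;
INSERT INTO part_tbl PARTITION (part_col)
SELECT val, part_col FROM src GROUP BY val, part_col;
-- Without the substitution, the partition key exprs could still reference slots
-- from below the root plan node's output.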
Change-Id: I16eff4bab0b1d95c7f30fd89b14af2628d6f865f
Reviewed-on: http://gerrit.cloudera.org:8080/580
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Before: Constant conjuncts used to be registered in the analyzer together with
non-constant conjuncts. Since constant conjuncts are not bound by any slot or
tuple, they were incorrectly placed into whatever plan node called init() first
and were then incorrectly marked as assigned. For handling queries with a
limit 0 we had special code in the BE.
After: Since constant conjuncts do not fit well into the existing slot/tuple
based assignment logic, this patch treats them specially as follows. Constant
conjuncts that do not originate from the ON clause of an outer join are evaluated
directly. Depending on which clause the conjunct came from, either the entire
query block is marked as returning an empty set (HAVING clause) or the block
is marked as having an empty select-project-join portion (ON and WHERE clauses).
In the latter case, aggregations (if any) must still be performed.
The plan sub-trees that are guaranteed to return an empty result set are
implemented by an EmptySetNode. Constant conjuncts from the ON clause of an
outer join are assigned to the node implementing the join.
Similarly, query blocks with a limit 0 are marked as returning an empty result,
and planned as an EmptySetNode.
As a side effect, this patch also fixes:
IMPALA-89: Make our behavior of INSERT OVERWRITE ... LIMIT 0
consistent with Hive's. The target table is left empty after
such an operation.
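Hedged examples of the affected cases (table names are hypothetical):
SELECT count(*) FROM t WHERE 1 = 0;
-- empty select-project-join portion; the aggregation still runs and returns 0
SELECT c, count(*) FROM t GROUP BY c HAVING 1 = 0;
-- the whole query block returns an empty set
SELECT * FROM t LIMIT 0;
-- planned as an EmptySetNode
INSERT OVERWRITE TABLE t2 SELECT * FROM t LIMIT 0;
-- leaves t2 empty, matching Hive (IMPALA-89)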
Change-Id: Ia35679ac0b3a9d94edae7f310efc4d934c1bfb0d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3653
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3800
This patch allows the text scanner to read 'inf' or 'Infinity' from a
row and correctly translate it into floating-point infinity. It also
adds is_inf() and is_nan() builtins.
Finally, we change the text table writer to write Infinity and NaN for
compatibility with Hive.
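A hedged usage sketch of the new builtins (assuming the string casts parse to the
corresponding special double values):
SELECT is_inf(cast('inf' AS double)),  -- true for +/- infinity
       is_nan(cast('nan' AS double));  -- true for NaN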
In the future, we might consider adding nan / inf literals to our
grammar (postgres has this, see:
http://www.postgresql.org/docs/9.3/static/datatype-numeric.html).
Change-Id: I796f2852b3c6c3b72e9aae9dd5ad228d188a6ea3
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2393
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 58091355142cadd2b74874d9aa7c8ab6bf3efe2f)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2483
The Impala CatalogService manages the caching and dissemination of cluster-wide metadata.
The CatalogService combines the metadata from the Hive Metastore, the NameNode,
and potentially additional sources in the future. The CatalogService uses the
StateStore to broadcast metadata updates across the cluster.
The CatalogService also directly handles executing metadata update requests from
impalad servers (DDL requests). It exposes a Thrift interface that allows impalads to
connect directly and execute their DDL operations.
The CatalogService has two main components. The first is a C++ server that implements
StateStore integration, the Thrift service implementation, and the export of the debug
webpage/metrics. The other main component is the Java Catalog, which manages the caching
and updating of all the metadata. For each StateStore heartbeat, a delta of all metadata
updates is broadcast to the rest of the cluster.
Some Notes On the Changes
---
* The metadata is all sent as thrift structs. To do this, all catalog objects (Tables/Views,
Databases, UDFs) have a thrift struct to represent them. These are sent with each statestore
delta update.
* The existing Catalog class has been separated into two separate sub-classes, an
ImpaladCatalog and a CatalogServiceCatalog. See the comments on those classes for more
details.
What is working:
* New CatalogService created
* Working with statestore delta updates and latest UDF changes
* DDL performed on Node 1 is now visible on all other nodes without a "refresh" (see the
sketch after this list).
* Each DDL operation against the Catalog Service will return the catalog version that
contains the change. An impalad will wait for the statestore heartbeat that contains this
version before returning from the DDL command.
* All table types (Hbase, Hdfs, Views) getting their metadata propagated properly
* Block location information included in CS updates and used by Impalads
* Column and table stats included in CS updates and used by Impalads
* Query tests are all passing
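A hedged illustration of the cross-node DDL visibility noted above (node labels and
the table name are hypothetical):
-- On impalad node 1:
CREATE TABLE new_tbl (id INT);
-- On impalad node 2, once the statestore heartbeat carrying the new catalog version
-- has arrived (no manual "refresh" needed):
SHOW TABLES LIKE 'new_tbl';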
Still TODO:
* Directly return catalog object metadata from DDL requests
* Poll the Hive Metastore to detect new/dropped/modified tables
* Reorganize the FE code for the Catalog Service. I don't think we want everything in the
same JAR.
Change-Id: I8c61296dac28fb98bcfdc17361f4f141d3977eda
Reviewed-on: http://gerrit.ent.cloudera.com:8080/601
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
Split out the encoder/type for parquet reader/writer. I think this puts us
in a better place to support future encodings.
On the tpch lineitem table, the results are:
Before:
BytesWritten: 236.45 MB
Per Column Sizes:
l_comment: 75.71 MB
l_commitdate: 8.64 MB
l_discount: 11.19 MB
l_extendedprice: 33.02 MB
l_linenumber: 4.56 MB
l_linestatus: 869.98 KB
l_orderkey: 8.99 MB
l_partkey: 27.02 MB
l_quantity: 11.58 MB
l_receiptdate: 8.65 MB
l_returnflag: 1.40 MB
l_shipdate: 8.65 MB
l_shipinstruct: 1.45 MB
l_shipmode: 2.17 MB
l_suppkey: 21.91 MB
l_tax: 10.68 MB
After:
BytesWritten: 198.63 MB (84%)
Per Column Sizes:
l_comment: 75.71 MB (100%)
l_commitdate: 8.64 MB (100%)
l_discount: 2.89 MB (25.8%)
l_extendedprice: 33.13 MB (100.33%)
l_linenumber: 1.50 MB (32.89%)
l_linestatus: 870.26 KB (100.032%)
l_orderkey: 9.18 MB (102.11%)
l_partkey: 27.10 MB (100.29%)
l_quantity: 4.32 MB (37.31%)
l_receiptdate: 8.65 MB (100%)
l_returnflag: 1.40 MB (100%)
l_shipdate: 8.65 MB (100%)
l_shipinstruct: 1.45 MB (100%)
l_shipmode: 2.17 MB (100%)
l_suppkey: 10.11 MB (46.14%)
l_tax: 2.89 MB (27.06%)
The table is overall 84% as big (i.e. 16% smaller). A few columns got marginally
bigger. If the file filled the 1 GB, I'd expect the overhead to decrease even
more.
The restructuring to use a virtual call doesn't seem to change things much and
will go away when we codegen the scanner.
Here's what the query times look like with this patch (note this is on the before data
files, so only string cols are dictionary encoded).
Before query times:
Insert Time: 8.5 sec
select *: 2.3 sec
select avg(l_orderkey): .33 sec
After query times:
Insert Time: 9.5 sec <-- Longer due to doing dictionary encoding
select *: 2.4 sec <-- kind of noisy, possibly a slight slow down
select avg(l_orderkey): .33 sec
Change-Id: I213fdca1bb972cc200dc0cd9fb14b77a8d36d9e6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/238
Tested-by: jenkins <kitchen-build@cloudera.com>
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
This is the first set of changes required to start getting our functional test
infrastructure moved from JUnit to Python. After investigating a number of
options, I decided to go with a Python test executor named py.test
(http://pytest.org/). It is very flexible, open source (MIT licensed), and will
enable us to do some cool things like parallel test execution.
As part of this change, we now use our "test vectors" for query test execution.
This will be very nice because it means that if you load the "core" dataset you
know you will be able to run the "core" query tests (specified by
--exploration_strategy when running the tests).
You will see that now each combination of table format + query exec options is
treated like an individual test case. This will make it much easier to debug
exactly where something failed.
These new tests can be run using the script at tests/run-tests.sh
This change updates the run-benchmark script to enable it to target one or more
workloads. Now benchmarks can be run like:
./run-benchmark --workloads=hive-benchmark,tpch
We look up the workload in the workloads directory, then read the associated
query .test files and start executing them.
To ensure the queries are not duplicated between benchmark and query tests, I
moved all existing queries (under fe/src/test/resources/*) to the workloads
directory. You do NOT need to look through all the .test files; I've just moved
them. The one new file is 'hive-benchmark.test', which contains the hive
benchmark queries.
Also added support for generating schemas for different scale factors as well as
executing against these scale factors. For example, let's say we have a dataset
with a scale factor called "SF3". We would first generate the schema using:
./generate_schema_statements --workload=<workload> --scale_factor="SF3"
This will create tables with names that are unique from the other scale factors.
Run the generated .sql file to load the data. Alternatively, the data can be loaded
by running a new python script:
./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor]
For example: ./bin/load-data.py -w tpch -e core -s SF3
Then run against this:
./run-benchmark --workloads=<workload> --scale_factor=SF3
This changeset also includes a few other minor tweaks to some of the test
scripts.
Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6