Now that IMPALA-4735 has been fixed, this suffix is no longer needed
to make test names prefix-free.
Change-Id: Ie63d145c94c02ec67e81d0c51a33d20685fba73e
Reviewed-on: http://gerrit.cloudera.org:8080/5886
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
This patch addresses warning messages from pytest about the imported
TestMatrix, TestVector, and TestDimension classes, which were being
collected as potential test classes. The fix is simply to prefix the
class names with 'Impala':
git grep -l 'TestDimension' | xargs \
sed -i 's/TestDimension/ImpalaTestDimension/g'
git grep -l 'TestMatrix' | xargs \
sed -i 's/TestMatrix/ImpalaTestMatrix/g'
git grep -l 'TestVector' | xargs \
sed -i 's/TestVector/ImpalaTestVector/g'
The tests all passed in an exhaustive run on the upstream Jenkins
server:
http://jenkins.impala.io:8080/view/Utility/job/pre-review-test/8/
Change-Id: I06b7bc6fd99fbb637a47ba376bf9830705c1fce1
Reviewed-on: http://gerrit.cloudera.org:8080/5794
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
IMPALA-2521 introduced clustering for insert statements. This change
makes the HdfsTableSink aware of clustered inputs, so that partitions
are opened, written, and closed one by one.
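A minimal sketch of the idea (Python pseudocode; the real sink is C++,
and the helper names are illustrative):

  # With clustered input, rows arrive grouped by partition, so at most
  # one partition writer needs to be open at any time.
  def consume_clustered(row_batches):
      current_key, writer = None, None
      for batch in row_batches:
          for row in batch:
              key = partition_key(row)  # hypothetical helper
              if key != current_key:
                  if writer is not None:
                      writer.close()  # partition is complete; close it
                  writer = open_partition_writer(key)  # hypothetical helper
                  current_key = key
              writer.write(row)
      if writer is not None:
          writer.close()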
This change also adds/modifies tests in several ways:
- clustered insert tests switch from selecting all rows from
alltypessmall to alltypes. Together with varying settings for
batch_size, this results in a larger number of row batches being
written.
- clustered insert tests select from alltypes instead of
functional.alltypes to make sure we also select from various input
formats.
- clustered insert tests have been added to select from alltypestiny to
create inserts with 1 and 2 rows per partition respectively.
- exhaustive insert tests now use different values for batch_size: 1,
16, and 0 (meaning the default, 1024). This is limited to uncompressed
parquet files to maintain a reasonable runtime. On my machine, execution
of test_insert took 1778 seconds, compared to 1002 seconds with just the
default row batch size.
- There is additional testing in test_insert_behaviour.py to make sure
that insertion over several row batches only creates one file per
partition.
- It renames the test_insert method to make it unique in the file and
allow for effective filtering with -k.
- It adds tests to the Analyzer test suite.
Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Reviewed-on: http://gerrit.cloudera.org:8080/4863
Reviewed-by: Lars Volker <lv@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace the existing license text with, or add, the ASF license
text given on the website.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Many of our test scripts have import statements of the form
"from xxx import *". It is better practice to explicitly name what
needs to be imported; this commit does so and also removes unused
import statements.
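An illustrative before/after of the style change (module and class
names are examples):

  # Before: pulls every public name into the module namespace.
  # from tests.common.impala_test_suite import *

  # After: imports are explicit, so provenance is clear and unused
  # names are easy to spot.
  from tests.common.impala_test_suite import ImpalaTestSuite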
Change-Id: I6a33bb66552ae657d1725f765842f648faeb26a8
Reviewed-on: http://gerrit.cloudera.org:8080/3444
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
The text scanner had some pathological behaviour when reading
significantly past the end of its scan range, e.g. reading a 256MB
string that's split across blocks. ScannerContext defaulted to issuing
1KB reads, even if the scan node requested significantly more data;
e.g. if the Parquet scanner called ReadBytes(16MB), this was chopped up
into 1KB reads, which were reassembled in boundary_buffer_.
Increase the minimum read size in this case to 64KB. Reading that
amount of data should not add any significant overhead even if we only
read a few bytes past the end of the scan range.
ScannerContext now implements a saner default algorithm that works
better when scanners make many small reads: it starts with 64KB reads
and doubles the size of each successive read past the end of the scan
range. We also correctly pass 'read_past_size' into GetNextBuffer(), so
that we always read the right amount of data.
Also save some time by pre-sizing the boundary buffer to the correct
size instead of reallocating it multiple times.
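A sketch of the resulting read-past-end policy (Python pseudocode; the
real logic lives in the C++ ScannerContext, and the function shown is
an illustration):

  DEFAULT_READ_PAST_SIZE = 64 * 1024  # 64KB floor for reads past the range end

  def read_past_size(scanner_hint, prev_size):
      # Honor an explicit hint from the scanner (e.g. Parquet's
      # ReadBytes(16MB)) so the whole request is read in one go;
      # otherwise start at 64KB and double on each successive read
      # past the end of the scan range.
      if scanner_hint > 0:
          return scanner_hint
      return DEFAULT_READ_PAST_SIZE if prev_size == 0 else prev_size * 2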
Testing:
Add a test case that exercises the code paths for very large strings.
Performance:
The queries in the test case are vastly faster than before. E.g. 0.6s
vs ~60s for the count(*) query.
Change-Id: Id90c5dea44f07dba5dd465cf325fbff28be34137
Reviewed-on: http://gerrit.cloudera.org:8080/3518
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Previously Impala disallowed LOAD DATA and INSERT on S3. This patch
functionally enables LOAD DATA and INSERT on S3 without making major
changes for the sake of improving performance over S3. This patch also
enables both INSERT and LOAD DATA between file systems.
S3 does not support the rename operation, so the staged files in S3
are copied instead of renamed, which contributes to the slow
performance on S3.
The FinalizeSuccessfulInsert() function no longer makes assumptions
about the underlying filesystem and works across all supported
filesystems. This is done by adding a full URI field to the base
directory for a partition in the TInsertPartitionStatus.
Also, the HdfsOp class no longer assumes a single filesystem and gets
connections to filesystems based on the URI of the file it is
operating on.
Added a python S3 client, 'boto3', to access S3 from the python
tests. A new class called S3Client is introduced which wraps the boto3
functions and has the same function signatures as PyWebHdfsClient, by
deriving from an abstract base class BaseFileSystem, so that the two
can be used interchangeably through a 'generic_client'. test_load.py is
refactored to use this generic client. The ImpalaTestSuite setup
creates a client according to the TARGET_FILESYSTEM environment
variable and assigns it to the 'generic_client'.
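A rough sketch of the abstraction (the method shown and its body are
assumptions modeled on the description, not the actual interface):

  class BaseFileSystem(object):
      """Common interface so tests can drive any filesystem through a
      'generic_client'."""
      def create_file(self, path, file_data):
          raise NotImplementedError

  class S3Client(BaseFileSystem):
      def __init__(self, bucket_name):
          import boto3  # the S3 library named above
          self.bucket = boto3.resource('s3').Bucket(bucket_name)

      def create_file(self, path, file_data):
          # Same signature as PyWebHdfsClient.create_file(path, file_data).
          self.bucket.put_object(Key=path, Body=file_data)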
P.S.: Currently test_load.py runs 4x slower on S3 than on HDFS;
performance needs to be improved in future patches. INSERT performance
is also slower than on HDFS, mainly because of the extra copy that
happens between staging and the final location of a file. However,
larger INSERTs come closer to HDFS performance than smaller ones.
ACLs are not handled for S3 in this patch; how to support them still
needs to be discussed before implementing.
Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d
Reviewed-on: http://gerrit.cloudera.org:8080/2574
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
test_insert_wide_table, when run in parallel, attempts to DROP/CREATE
tables with the same name in the same database. A race thus exists in
which one test can stomp on another's tables.
The fix is to use the unique_database fixture, in which each test can
run in parallel within its own database, and thus its own table.
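Illustrative use of the fixture (the test body is a simplified
assumption, not the actual test):

  # 'unique_database' hands each test its own scratch database, so
  # parallel runs of the same test no longer race on a shared table name.
  def test_insert_wide_table(self, unique_database):
      self.client.execute("create table %s.wide_tbl (c0 int)" % unique_database)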
Change-Id: If3b22f1b089d20c6024d6d680af82f102342394e
Reviewed-on: http://gerrit.cloudera.org:8080/2871
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Michael Brown <mikeb@cloudera.com>
Allow Impala to start with only a running HMS (and no additional
services like HDFS, HBase, Hive, YARN) and use the local file system.
Skip all tests that need these services, that use HDFS caching, or that
assume multiple impalads are running.
To run Impala with the local filesystem, set TARGET_FILESYSTEM to 'local' and
WAREHOUSE_LOCATION_PREFIX to a location on the local filesystem where the current user has
permissions since this is the location where the test data will be extracted.
Test coverage (with core strategy) in comparison with HDFS and S3:
HDFS 1348 tests passed
S3 1157 tests passed
Local Filesystem 1161 tests passed
Change-Id: Ic9718c7e0307273382b1cc6baf203ff2fb2acd03
Reviewed-on: http://gerrit.cloudera.org:8080/1352
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Readability: Alex Behm <alex.behm@cloudera.com>
Many python files had a hashbang and the executable bit set even
though they were not intended to be run as standalone scripts. That
made it very difficult to determine which python files are actually
scripts.
A future patch will update the hashbang in real python scripts so they
use $IMPALA_HOME/bin/impala-python.
Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba
Reviewed-on: http://gerrit.cloudera.org:8080/599
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
This patch encapsulates pytest's skipif markers in classes, which
leads to the following benefits:
- Provide context and grouping for tests being skipped.
- As we improve test reporting, annotations will give us a better idea of coverage.
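A minimal sketch of the pattern (the condition and names are
assumptions, not the actual markers):

  import os
  import pytest

  class SkipIfS3:
      # Named markers document *why* a test is skipped and give one
      # place to update the condition.
      insert = pytest.mark.skipif(
          os.environ.get('TARGET_FILESYSTEM') == 's3',
          reason='INSERT not supported on S3 yet')

  @SkipIfS3.insert
  def test_insert_something():
      pass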
Change-Id: Ib0557fb78c873047c214bb62bb6b045ceabaf0c9
Reviewed-on: http://gerrit.cloudera.org:8080/297
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
Reviewed-on: http://gerrit.cloudera.org:8080/343
Add skip markers for S3 that categorize the tests skipped against S3,
to help see what coverage is missing. Soon we'll be reworking some
tests and/or adding new tests to fill the important gaps.
Also, add a mechanism to parameterize paths in the .test files, and
start using these new variables. This is a step toward enabling some
more tests against S3.
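A sketch of how such path variables might be expanded (the variable
name is an assumption for illustration):

  def expand_path_vars(section_text, filesystem_prefix):
      # e.g. '$FILESYSTEM_PREFIX/test-warehouse/tbl' becomes
      # 's3a://my-bucket/test-warehouse/tbl' when running against S3.
      return section_text.replace('$FILESYSTEM_PREFIX', filesystem_prefix)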
Finally, a fix for buildall.sh to stop the minicluster before applying
the metastore snapshot. Otherwise this step fails, since the metastore
DB is in use.
Change-Id: I142434ed67bed407e61d7b2c90f825734fc0dce0
Reviewed-on: http://gerrit.cloudera.org:8080/127
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This patch introduces a new pytest marker that skips tests that
currently don't work when S3 is used as the underlying file system. The
set of blacklisted tests is a superset of the tests that cannot be run
against S3; follow-up patches will remove some of the test files from
the blacklist.
Change-Id: I39a58223d3435f0bd6496ffd00a2d483b751693d
Reviewed-on: http://gerrit.cloudera.org:8080/82
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
Introduces support for writing tables stored as Avro files. This supports writing all
data types except TIMESTAMP. Supports the following COMPRESSION_CODECs: NONE, DEFLATE,
SNAPPY.
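Illustrative usage of the option (the 'client' handle and table names
are hypothetical):

  client.execute("SET COMPRESSION_CODEC=SNAPPY")
  client.execute("INSERT INTO avro_tbl SELECT * FROM src_tbl")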
Change-Id: Ica62063a4f172533c30dd1e8b0a11856da452467
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3863
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 15c6066d05d5077bee0d5123d26777b0715eb9c6)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4056
This supports both uncompressed and block-compressed formats;
row-compressed formats are not supported. The type of compression is
specified using the query parameter COMPRESSION_CODEC, with values
NONE, GZIP, BZIP2, and SNAPPY.
Note: this patch only has basic testing. More extensive testing will be done when this
avro writer is used in data loading.
Change-Id: Id284bd4f3a28e27e49d56b1127cdc83c736feb61
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3541
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
This patch does a few things:
1) Move the metadata tests into their own folder under tests/. I think it's useful to
loosely categorize them so it's easier to run a subset of the tests that are most
useful for the changes you are making.
2) Reduce the test vectors for query_tests. We should have identical
coverage in the daily exhaustive runs, but the normal runs should be
much faster. In particular, this de-emphasizes scanner tests, since
that code is more stable now.
3) Misc test cleanup/consolidate python test files/etc.
Change-Id: I03c2f34877aed192c2a50665bd5e15fa85e12f1e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3831
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
Realized that if we run it with text/codec it will successfully insert
a record ("hi", "there") into tinytable, which is undesirable. Skip
this test for compressed text.
Change-Id: I8076f4fcac17d7f7530ac4993b0d4b3bd89d5fa0
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3668
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Previously the test expected that inserting with any text/codec
combination where codec != none would fail. With the compressed-text
read patch, such an insert succeeds by inserting uncompressed text,
ignoring the compression type. This is a temporary fix to keep this
(code coverage) test passing until the text writers are in.
Change-Id: Ib34bf661ee90e3bcbb76a225df8809ae13e835fa
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3659
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
For example, you can now do something like:
result_set = execute("select * from tbl")
result_row = result_set[0]
result_row['col_alias'] or result_row[4]
to access column values. If the column alias/position does not exist an exception is
thrown.
Change-Id: Ie4b65619ed17fd90bf39e0966a7fc7e1180dbc5c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2719
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2922
For wide Avro tables, ReadZLong() would get inlined many times into a
single function body, causing LLVM to crash. Not inlining doesn't seem
to have a performance impact on narrow tables, and helps with wide
tables.
This change also adds tests over wide (i.e. many-column) tables. The
test tables are produced by specifying shell commands to generate test
tables in functional_schema_template.sql, which are executed in
generate-schema-statements.py. In the SQL templates, sections starting
with a ` are treated as shell commands. The output of the shell
command is then used as the section text. This is only a starting
point; it isn't currently implemented for all sections, and may have
to be tweaked if we use this mechanism for all tables.
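A sketch of the convention (the resolver function is an assumption,
though subprocess-based expansion matches the description above):

  import subprocess

  def resolve_section(section_text):
      # Sections starting with a backtick are shell commands; their
      # stdout becomes the section text used by
      # generate-schema-statements.py.
      if section_text.startswith('`'):
          return subprocess.check_output(section_text[1:], shell=True)
      return section_text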
Change-Id: Ife0d857d19b21534167a34c8bc06bc70bef34910
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2206
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
(cherry picked from commit 1c5951e3cce25a048208ab9bb3a3aed95e41cf67)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2353
Tested-by: jenkins
Partition column expressions are analysed twice for INSERT statements -
once to infer the type and so add a possible cast, and once to compute
stats on the resulting expr. However, this process resulted in a
partition column expr that was an IntLiteral getting the smallest type
that would contain its value, rather than retaining the
column-compatible type that had been assigned to it.
This patch does the minimum thing, which is make IntLiteral.analyze()
idempotent. Doing the same thing to Expr and LiteralExpr unearths some
other bugs, which we will have to fix in a follow-on patch (see
IMPALA-884).
Change-Id: Ie22fc5d3f4832c735a1ebc0ef78f50d736f597fd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1931
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 1912d65ea21a5025d385948642f0d4aadad91abf)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1947
This is because in HdfsTable we call "expr.castTo(colType)", but
BooleanLiteral (incorrectly) didn't implement "uncheckedCastTo()". This
meant that instead of a BooleanLiteral being returned, we got back a
CastExpr, which cannot be cast to LiteralExpr.
While making this change, it turned out that boolean partition columns
are also broken in Hive. I filed HIVE-6590 for these issues, and we
decided to disable INSERT into a boolean partition column in Impala
due to this bug.
Change-Id: I3e295bb96aadc08d64faf551f6393a7128a7ef27
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1755
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
This change adds support for cluster-synchronized catalog operations.
This provides the guarantee that after a catalog op completes, all
other subscribers to the catalog topic have also processed that update.
This is useful when load balancing, because a common workflow is to
target a different impalad for each statement executed.
For example if each of the following were executed sequentially, but targeting
a different node:
1) CREATE TABLE Foo
2) INSERT INTO Foo
3) SELECT * FROM Foo
4) INSERT INTO Foo ....
Since both the INSERT and the CREATE update the catalog, it would not work as expected
without this patch. The user might either get a "table not found" error or would be
missing partition information from the INSERT.
The downside is that this approach to DDL takes a bit longer because we need to wait
until all subscribers have processed an update. If all nodes are healthy, this overhead
should not be significantly longer than the current DDL time. However, a single bad node
might slow down or completely block the completion of all DDL operations. By default
this feature is disabled, but it can be enabled using a new query option: SYNCED_DDL=1
To test this, the base test suite was updated to support selecting a random impalad
to execute each query section in a query test file. This is currently only enabled
for the insert and DDL tests, but could be leveraged by more tests in the future.
TODO: Add additional failure tests around this functionality.
TODO: Add an explicit "sync" statement so users do not need to run all their DDL
in this mode (since it is slower).
Change-Id: I45e757a931bf2a4740cc0cdd1e76ce49a1e22b83
Reviewed-on: http://gerrit.ent.cloudera.com:8080/899
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
This change adds support for faster DDL via the CatalogServer by directly
returning the TCatalogObject from each catalog operation and using this result
to update the local impalad's catalog cache directly, rather than waiting
for a state store heartbeat that contains the change.
Because the Impalad's catalog can now be updated in two ways, it means that
we need to be careful when applying updates to ensure no work gets "undone".
For example, consider the following sequence of events:
t1: [Direct Update] - Add item A - (Catalog Version 9)
t2: [Direct Update] - Drop item A - (Catalog Version 10)
t3: [StateStore Update] - (From Catalog Version 9)
In this case, we need to ensure that the state store update in t3 does not undo the
drop in t2, even though that update will contain the change to "add item A".
To support this, we now check the catalog versions before adding any item to ensure
that an existing item does not overwrite an item with a newer catalog version.
To handle the case of removals, a new CatalogUpdateLog is introduced.
This log tracks the catalog version in which each item was removed from
the catalog. When adding a new catalog object, we check whether it was
removed in a catalog version greater than the version of the incoming
object. If so, the update is ignored.
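A sketch of the two checks described above (Python pseudocode; the real
logic is in the impalad catalog cache):

  def apply_add(cache, deletion_log, obj):
      existing = cache.get(obj.name)
      # Never let an older object overwrite a newer one.
      if existing is not None and existing.catalog_version >= obj.catalog_version:
          return
      # Ignore adds for objects dropped in a later catalog version.
      if deletion_log.get(obj.name, -1) > obj.catalog_version:
          return
      cache[obj.name] = obj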
This covers most updates, but there is still one concurrency issue that is not covered
with this change. If someone issues an "invalidate metadata" concurrently with a
direct catalog operation, it may briefly set the catalog back in time. This seems like
okay behavior to me (the command is invalidating the catalog metadata). If we want
to address this the CatalogUpdateLog could be extended to track additions to the catalog
and we could replay the log after invalidating the metadata (as one possible solution).
Change-Id: Icc9bdecc3c32436708bf9e9e7974f91d40e514f2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/864
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
The Impala CatalogService manages the caching and dissemination of cluster-wide metadata.
The CatalogService combines the metadata from the Hive Metastore, the NameNode,
and potentially additional sources in the future. The CatalogService uses the
StateStore to broadcast metadata updates across the cluster.
The CatalogService also directly handles metadata update requests from
impalad servers (DDL requests). It exposes a Thrift interface to allow
impalads to connect directly and execute their DDL operations.
The CatalogService has two main components: a C++ server that
implements StateStore integration, the Thrift service implementation,
and the export of the debug webpage/metrics; and the Java Catalog that
manages the caching and updating of all the metadata. For each
StateStore heartbeat, a delta of all metadata updates is broadcast to
the rest of the cluster.
Some Notes On the Changes
---
* The metadata is all sent as thrift structs. To do this, all catalog
objects (Tables/Views, Databases, UDFs) have a thrift struct to
represent them. These are sent with each statestore delta update.
* The existing Catalog class has been separated into two separate
sub-classes, ImpaladCatalog and CatalogServiceCatalog. See the comments
on those classes for more details.
What is working:
* New CatalogService created
* Working with statestore delta updates and latest UDF changes
* DDL performed on Node 1 is now visible on all other nodes without a "refresh".
* Each DDL operation against the Catalog Service returns the catalog
version that contains the change. An impalad will wait for the
statestore heartbeat that contains this version before returning from
the DDL command (see the sketch after this list).
* All table types (Hbase, Hdfs, Views) getting their metadata propagated properly
* Block location information included in CS updates and used by Impalads
* Column and table stats included in CS updates and used by Impalads
* Query tests are all passing
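A sketch of the wait described in the DDL bullet above (Python
pseudocode; the names are illustrative, not the actual API):

  import time

  def wait_for_catalog_version(local_catalog, target_version, poll_s=0.1):
      # The local version advances as statestore delta updates are
      # applied; return once the update containing the DDL's change
      # has arrived.
      while local_catalog.current_version() < target_version:
          time.sleep(poll_s)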
Still TODO:
* Directly return catalog object metadata from DDL requests
* Poll the Hive Metastore to detect new/dropped/modified tables
* Reorganize the FE code for the Catalog Service. I don't think we want everything in the
same JAR.
Change-Id: I8c61296dac28fb98bcfdc17361f4f141d3977eda
Reviewed-on: http://gerrit.ent.cloudera.com:8080/601
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
Split out the encoder/type for parquet reader/writer. I think this puts us
in a better place to support future encodings.
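For reference, a minimal sketch of dictionary encoding, the technique
behind the size wins below (Python; the real encoder is C++):

  def dict_encode(values):
      dictionary, codes, index = [], [], {}
      for v in values:
          if v not in index:
              index[v] = len(dictionary)
              dictionary.append(v)
          codes.append(index[v])
      # Low-cardinality columns (l_discount, l_linenumber, ...) shrink
      # a lot; near-unique ones (l_comment) do not benefit.
      return dictionary, codes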
On the tpch lineitem table, the results are:
Before:
BytesWritten: 236.45 MB
Per Column Sizes:
l_comment: 75.71 MB
l_commitdate: 8.64 MB
l_discount: 11.19 MB
l_extendedprice: 33.02 MB
l_linenumber: 4.56 MB
l_linestatus: 869.98 KB
l_orderkey: 8.99 MB
l_partkey: 27.02 MB
l_quantity: 11.58 MB
l_receiptdate: 8.65 MB
l_returnflag: 1.40 MB
l_shipdate: 8.65 MB
l_shipinstruct: 1.45 MB
l_shipmode: 2.17 MB
l_suppkey: 21.91 MB
l_tax: 10.68 MB
After:
BytesWritten: 198.63 MB (84%)
Per Column Sizes:
l_comment: 75.71 MB (100%)
l_commitdate: 8.64 MB (100%)
l_discount: 2.89 MB (25.8%)
l_extendedprice: 33.13 MB (100.33%)
l_linenumber: 1.50 MB (32.89%)
l_linestatus: 870.26 KB (100.032%)
l_orderkey: 9.18 MB (102.11%)
l_partkey: 27.10 MB (100.29%)
l_quantity: 4.32 MB (37.31%)
l_receiptdate: 8.65 MB (100%)
l_returnflag: 1.40 MB (100%)
l_shipdate: 8.65 MB (100%)
l_shipinstruct: 1.45 MB (100%)
l_shipmode: 2.17 MB (100%)
l_suppkey: 10.11 MB (46.14%)
l_tax: 2.89 MB (27.06%)
The table is overall 84% as big (i.e. 16% smaller). A few columns got marginally
bigger. If the file filled the 1 GB, I'd expect the overhead to decrease even
more.
The restructuring to use a virtual call doesn't seem to change things much and
will go away when we codegen the scanner.
Here's what the query times look like with this patch (note this is on
the before data files, so only string cols are dictionary encoded).
Before query times:
Insert Time: 8.5 sec
select *: 2.3 sec
select avg(l_orderkey): .33 sec
After query times:
Insert Time: 9.5 sec <-- Longer due to doing dictionary encoding
select *: 2.4 sec <-- kind of noisy, possibly a slight slow down
select avg(l_orderkey): .33 sec
Change-Id: I213fdca1bb972cc200dc0cd9fb14b77a8d36d9e6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/238
Tested-by: jenkins <kitchen-build@cloudera.com>
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
With this change the Python tests are now called as part of buildall
and the corresponding Java tests have been disabled. The new tests can
also be invoked by calling ./tests/run-tests.sh directly.
This includes a fix from Nong for a bug that caused wrong results for
limit on non-io-manager formats.
This is the first set of changes required to start moving our
functional test infrastructure from JUnit to Python. After
investigating a number of options, I decided to go with a python test
executor named py.test (http://pytest.org/). It is very flexible, open
source (MIT licensed), and will enable us to do some cool things like
parallel test execution.
As part of this change, we now use our "test vectors" for query test
execution. This is very nice because it means that if you load the
"core" dataset, you know you will be able to run the "core" query tests
(specified by --exploration_strategy when running the tests).
You will see that each combination of table format + query exec options
is now treated as an individual test case. This will make it much
easier to debug exactly where something failed.
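As an illustration of the idea (plain pytest parametrization shown for
clarity; the actual harness uses its own test-vector mechanism):

  import pytest

  @pytest.mark.parametrize('file_format', ['text', 'seq', 'parquet'])
  @pytest.mark.parametrize('batch_size', [0, 1, 16])
  def test_query(file_format, batch_size):
      # Each (file_format, batch_size) combination is reported as its
      # own test case, so failures pinpoint the exact configuration.
      pass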
These new tests can be run using the script at tests/run-tests.sh