Commit Graph

26 Commits

Author SHA1 Message Date
Matthew Jacobs
f413e236a8 IMPALA-3579: Strict handling of numeric overflow in text parsing
Adds a query option 'strict_mode' which treats integer and
floating pt overflows as parse errors. In the past,
overflows were ignored and the max value was returned. When
this query option is set, overflowing values are treated as if
they were completely invalid data, i.e. NULL is returned.
When abort_on_error is enabled, this means the query is
aborted.

Notes:
* DECIMAL overflow/underflow is already treated as an error.
* The handling in text-converter treats underflows the same
  as overflows, so they would result in the same behavior.
  However, floating point parsing never returns an underflow
  today.
* We may also want to handle numeric values that are truncated
  when parsing to integer types, e.g. 10.5 -> 10.

Change-Id: I7409c31ec0cb6fe0b2d9842b9f58fe1670914836
Reviewed-on: http://gerrit.cloudera.org:8080/3150
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2016-05-23 08:40:20 -07:00
Alex Behm
51d59354a8 Fix expected output of show.test to work on S3.
Change-Id: I7e24c1a875126320717f636b6e4c91bf5d2aa979
Reviewed-on: http://gerrit.cloudera.org:8080/1758
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
2016-01-10 23:22:44 +00:00
Alex Behm
85de03f294 Fix bad 'show files' test that relies on previous tests.
Fixes tests in show.test that executes 'show files' on the
'insert_string_partitioned' table in our functional db. The expected
output relied on modifications to 'insert_string_partitioned' made
in test_insert.py. Tests should not rely on the overall ordering
of test execution.

This patch also fixes 'show files' to produce a consistently ordered
output.

Change-Id: Ic736b94b70677b0e3f4f8a9838ffdfdde2ba17ab
Reviewed-on: http://gerrit.cloudera.org:8080/1748
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-01-08 10:53:45 +00:00
Alex Behm
1a4a830a6d IMPALA-2776: Remove escapechartesttable and associated tests.
The original purpose of the escapechartesttable was to test
Impala's behavior on text tables that have the same character
as line terminator and escape character. Recent changes in
Hive have made creating such a table impossible because
1) Only newline is allowed as the line terminator
2) Newline is forbidden as the escape character

See HIVE-11785 for details on the Hive changes.

This commit removes escapechartesttable and all associated
tests, but does not add the same enforcement rules as Hive.
These enforcement rules should be added in a follow-on change.

Change-Id: I2bd9755f4c2cc3d7dfd8d67c3759885951550f08
Reviewed-on: http://gerrit.cloudera.org:8080/1690
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2016-01-05 06:04:41 +00:00
Tim Armstrong
a280b93a37 IMPALA-2070: include the database comment when showing databases
As part of change, refactor catalog and frontend functions to return
TDatabase/Db objects instead of just the string names of databases -
this required a lot of method/variable renamings.

Add test for creating database with comment. Modify existing tests
that assumed only a single column in SHOW DATABASES results.

Change-Id: I400e99b0aa60df24e7f051040074e2ab184163bf
Reviewed-on: http://gerrit.cloudera.org:8080/620
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-12-04 01:40:41 +00:00
zuowang
16792b28be IMPALA-1437: Implement SHOW FILES IN <table>
Query:SHOW FILES IN db.table

Result:
+---------------------------------------------+------+---------------------+
| path                                        | size | partition           |
+---------------------------------------------+------+---------------------+
| hdfs://namenode/path/to/partition/file1.dat | 128B | year=2010, month=11 |
| hdfs://namenode/path/to/partition/file2.dat | 256B | year=2010, month=12 |
| hdfs://namenode/path/to/partition/file3.dat | 1.3G | year=2011, month=1  |
+---------------------------------------------+------+---------------------+

Query:SHOW FILES IN db.table PARTITION(year=2010, month=12)

Result:
+---------------------------------------------+------+---------------------+
| path                                        | size | partition           |
+---------------------------------------------+------+---------------------+
| hdfs://namenode/path/to/partition/file2.dat | 256B | year=2010, month=12 |
+---------------------------------------------+------+---------------------+

Only support Hdfs tables. Will throw exceptions for other kinds of table.

Partition is optional. Will throw exception if specified partition cannot be found.

Change-Id: I6480ed87ab6cdfb02a60bffa72a8047a161f92ab
Reviewed-on: http://gerrit.cloudera.org:8080/19
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
2015-03-05 05:13:50 +00:00
Alex Behm
19bab59854 Create/alter/describe tables with complex types.
This patch adds parsing of complex types and tests for using complex
types in various exprs and create/alter/describe stmts.

Change-Id: Ibc211a560c889f5ccfb616813700b923c89d8245
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3577
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3594
2014-07-23 17:26:14 -07:00
Matthew Jacobs
2f9b2ae785 Fix SHOW DATA SOURCE test; must execute setup/cleanup serially
The SHOW DATA SOURCE tests were run as part of the other SHOW * tests
in test_show(), but the setup/cleanup for data sources can't be run
in parallel. This change moves the SHOW DATA SOURCE tests into a separate
test method and the setup/cleanup code is only run for this test (i.e.
not using setup_method() and teardown_method()). The test is then
only executed serially.

Change-Id: I221145f49cfe7290e132c6a87a5295b747c1fcc7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2864
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 5bcd769eae3a694d7f6f42d093f9197e8a4e8b77)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2870
2014-06-05 20:07:57 -07:00
Matthew Jacobs
12b72c4330 IMPALA-1011: Handle SHOW DATA SOURCES when no sources configured
Change-Id: I367b90c7603aea973d442f9186a6b32598a66a28
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2716
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 4df5c6d741237e9c91e84e39fd6ea760ccb40cf5)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2723
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
2014-05-28 20:38:41 -07:00
Matthew Jacobs
f9c9a7ca13 Add SHOW DATA SOURCES
Change-Id: Ieeb0df107f45a58b8a99f717e96453da93ee7270
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2529
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit b2392c5bfe9fc928ad19af6ff6737e6dc6324e63)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2614
2014-05-19 17:52:27 -07:00
Lenni Kuff
bfb16ff552 Disable SHOW STATS tests because results are unstable (IMPALA-688)
Change-Id: Ib4b4fe3a29d3bd0e3c7ece8b5b21c4ec4b5eb289
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1060
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:54:22 -08:00
Lenni Kuff
0bae3978c9 Update compute-stats.py to execute using Impala
Updates our compute stats script to execute using Impala. This allows us
to easily compute stats on all tables in a database or all tables in the
metastore.
The updated stats caused one of the TPCH plans to change so this also
updates the TPCH planner test results.

Change-Id: I17e5dcd1036a35e40eb4eb2c8e4a20702db9049c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1024
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:18 -08:00
Alex Behm
93e5b262c2 Added COMPUTE STATS command for gathering table and column stats.
A compute stats command computes the table and column stats for a given
table and persists them in the metastore.
The table stats consist of the per-partition and per-table row count.
The column stats are computed on a per-table basis and consist of the
number of distinct values and the number of NULLs per column.

This patch introduces a new 'child query' concept that
compute stats utilizes. Child queries are cancelled
if the parent query is cancelled. A compute stats stmt is
executed by the following query hirarchy:
parent: compute stats query (DDL)
- child: compute table stats query (QUERY)
- child: compute column stats query (QUERY)

The new child query concept is necessary to decouple child query fetches
from parent query fetches, i.e., we could not execute a child query as
part of the original compute stats query, because then a client could
fetch the results we need for updating the Metastore statistics. The
reason why our existing CTAS works without this decoupling
is that its insert 'child query' is not fetchable.

Change-Id: I560533e3cb09bcbbdb3eea7fcf0b460bc6b36dcd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/873
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:14 -08:00
Skye Wanderman-Milne
49f4bd285a Change test_metadata_query_statements.py::test_show to ignore Parquet
file sizes since they may fluctuate slightly.

Change-Id: I3ddb6ceebe6dcc86cc1c58b35b0cd96986ec43e1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/988
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:14 -08:00
Alex Behm
1497002013 Added SHOW TABLE/COLUMN STATS command.
Fixed the following stats-related bugs:
- Per-partition row count was not distributed properly via CatalogService
- HBase column stats were not loaded and distributed properly

Enhancements to test framework:
- Allow regex specification of expected row or column values
- Fixed expected results of some tests because the test framework
  did not catch that they were incorrect

Change-Id: I1fa8e710bbcf0ddb62b961fdd26ecd9ce7b75d51
Reviewed-on: http://gerrit.ent.cloudera.com:8080/813
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:51 -08:00
ishaan
53cd9eadab Treat HBase as a file format for functional tests
Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922
Reviewed-on: http://gerrit.ent.cloudera.com:8080/102
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:52:36 -08:00
Alex Behm
9a201645cd IMPALA-496: Fix escaping of field delimiter and escape character in inserts
Change-Id: I49c36ae9823b35dcb9e92d1a13bef270657e36f2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/163
Tested-by: jenkins <kitchen-build@cloudera.com>
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:09 -08:00
ishaan
5ed84d7f65 IMP-739 Results for show queries should check for subset, not equality. 2014-01-08 10:48:46 -08:00
ishaan
09d6d931f4 Change the way data is loaded 2014-01-08 10:48:09 -08:00
Lenni Kuff
1896701399 IMPALA-44: Database names are case sensitive 2014-01-08 10:47:34 -08:00
Lenni Kuff
9d981984e7 Update expected results of the 'show table/database' test to remove trevni tables 2014-01-08 10:47:10 -08:00
Lenni Kuff
30dbf59ef2 Final changes to enable Python test infrastructure and tests
With this change the Python tests will now be called as part of buildall and
the corresponding Java tests have been disabled. The new tests can also be
invoked calling ./tests/run-tests.sh directly.

This includes a fix from Nong that caused wrong results for limit on non-io
manager formats.
2014-01-08 10:46:57 -08:00
Lenni Kuff
ef48f65e76 Add test framework for running Impala query tests via Python
This is the first set of changes required to start getting our functional test
infrastructure moved from JUnit to Python. After investigating a number of
option, I decided to go with a python test executor named py.test
(http://pytest.org/). It is very flexible, open source (MIT licensed), and will
enable us to do some cool things like parallel test execution.

As part of this change, we now use our "test vectors" for query test execution.
This will be very nice because it means if load the "core" dataset you know you
will be able to run the "core" query tests (specified by --exploration_strategy
when running the tests).

You will see that now each combination of table format + query exec options is
treated like an individual test case. this will make it much easier to debug
exactly where something failed.

These new tests can be run using the script at tests/run-tests.sh
2014-01-08 10:46:50 -08:00
Henry Robinson
91c3b979ca IMP-370: SHOW TABLES IN support and IMP-363: SHOW DATABASES
Change-Id: Ic41c4b0767a0480f0a18e1e985f25de3bc2ca947
2014-01-08 10:45:59 -08:00
Henry Robinson
540673763f Add session key handling to ThriftServer, and session support to the frontend 2014-01-08 10:45:59 -08:00
Henry Robinson
e3e6ba984b Show / describe 2014-01-08 10:44:49 -08:00