This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:
CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);
ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();
Sort columns can only be specified for Hdfs tables and effectiveness may
vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.columns' table property and will result in an additional SORT node
being added to the plan before the final table sink. Specifying sort
columns also enables clustering during inserts, so the SORT node will
contain all partitioning columns first, followed by the sort columns. We
do this because sort columns add a SORT node to the plan and adding the
clustering columns to the SORT node is cheap.
Sort columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144). Until then, it is possible to specify
sort columns using both ways at the same time and the column lists
will be concatenated.
Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Reviewed-on: http://gerrit.cloudera.org:8080/6495
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
Adds new parametrization to the unique database fixture:
- num_dbs: allows creating multiple unique databases at once;
the 2nd, 3rd, etc. datbase name is generated by appending
"2", "3", etc., to the first database name
- sync_ddl: allows creating the dabatases(s) with sync_ddl
which is needed by most tests in test_ddl.py
Testing: I ran debug/core and debug/exhaustive on HDFS and
core/debug on S3. Also ran the test locally in a loop on
exhaustive.
Change-Id: Idf667dd5e960768879c019e2037cf48ad4e4241b
Reviewed-on: http://gerrit.cloudera.org:8080/4155
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This is the first in a series of patches to clean up test_ddl.py
Summary of changes:
- Break up test_create() and corresponding .test files into:
* test_create_database()
* test_create_table()
* test_create_table_like_table()
* test_create_table_like_file()
* test_create_table_as_select()
- Merge test_nested() into the tests above
- Move a test into test_hms_integration.py
- Add a new test_ddl_base.py as base class for DDL tests.
The plan is to split up test_ddl.py into several smaller
.py files in subsequent patches.
Testing: I tested test_ddl.py and test_hms_integration.py on
exhaustive locally as well as in private builds on all filesystems.
Change-Id: I5f4c044d39e165c2535961b8d0a765c8dbbd051c
Reviewed-on: http://gerrit.cloudera.org:8080/3044
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>